As is typically the case with my sporadic blog posts, we recently ran into a situation where, as a colleague likes to put it, we ended up far down the rabbit hole. There are a few lessons to be learned here so I’m going to explain a bit about the troubleshooting process, not just how we eventually fixed the issue.
The initial complaint as reported was that one set of laptops was not working properly, but only during the first class period, and only in one classroom. Now, it’s not uncommon for mobile devices to take a little while to wake up, but in this case the machines were turning on an hour before class started, and again, we were told the problem was very limited in scope.
We were able to duplicate the problem, but again, only during the one period, and even stranger, only when the class was in the room. We use Meraki access points, and based on this particular AP not seeing its mesh neighbors all that well, we initially thought we were dealing with some localized RF interference. After extensive troubleshooting of the RF environment we could not find any explanation for the apparent interference. The AP was seemingly overloaded during this period of time, but there was no obvious cause.
One thing that did stand out was certain wireless clients were transferring an abnormal amount of data. We noted this with the intent to revisit it later, but because we had no reports of issues from other locations, we concluded this was unrelated to our current situation.
That was a mistake. We assumed that because there were no other problems reported, there were no other problems. As it turns out, nobody else was using laptops at this time of day. The lesson here is to check your assumptions.
Once we realized that we could not draw any conclusions from the location of the problem, we expanded our investigation and revisited the abnormally large downloads we had noted earlier. The Meraki APs do an excellent job of reporting per-client traffic, and it quickly became obvious the data was coming from our WSUS patch server.
We had recently rebuilt WSUS, and on Server 2012 R2 there are (were) some important out-of-band patches that, if missing, can result in failed downloads. We reviewed these and confirmed that on a fully-patched 2012 R2 install, these needed updates are now included in the normal security and critical updates that Microsoft pushes out. (Evidently you still must add the .esd MIME type to IIS, but that’s unrelated to this post.)
We next turned to one of the laptops that was experiencing a problem. On Windows 10, getting a human-readable update log requires running the PowerShell command Get-WindowsUpdateLog; this creates the old-style WindowsUpdate.log you may be used to from earlier Windows versions.
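For reference, the command looks something like this (the output path here is just an example; if you leave off -LogPath, the merged log is dropped on your desktop):

# Merge the Windows 10 ETL trace files into one human-readable log
Get-WindowsUpdateLog -LogPath C:\Temp\WindowsUpdate.log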
Searching for errors, we came across the following:
Misc  Validating signature for <file guid> with dwProvFlags 0x00000080:
Misc  Error: 0x80092026 when verifying trust for <file guid>
DownloadManager  File failed postprocessing, error = 80092026
DownloadManager  Failed file: URL = <WSUSServer><File>, Local path = <local path>
DownloadManager  Error 0x80092026 occurred while downloading update; notifying dependent calls.
I should add that many of the affected laptops were not consistently reporting to the new WSUS server. We were not certain whether this was related, but it certainly seemed suspicious.
A number of online searches confirmed that this error (CRYPT_E_SECURITY_SETTINGS) had something to do with a certificate check failing for a particular update. Since our content filter does SSL decryption, we double-checked that we were not decrypting any traffic to Microsoft’s CRL or patch servers: we were not. Of course, if something were being blocked by the content filter we would expect every client to be impacted, but most of our PCs were working fine.
We found different files failing post-processing on different computers, so this did not seem to be related to a specific update. What was more interesting is that despite only one downloaded update failing, WSUS (or BITS) seemingly discarded the entire batch of downloads, so the next time Windows Update ran, it re-downloaded everything.
After further troubleshooting, I stumbled on this thread on Microsoft TechNet. User “lforbes” posted the following:
I found the offending key for those that are interested.
HKEY_USERS\S-1-5-18\Software\Microsoft\Windows\CurrentVersion\WinTrust\Trust Providers\Software Publishing
State SHOULD be 146432 or 0x00023c00
If it switches to 408576 or 0x00063c00 [t]hen it will cause these Crypto Errors
Checking the registry on affected computers confirmed that the key in question was set to 0x00063c00. After changing it back to 0x00023c00, the trust provider errors stopped. I have no idea how he found that key, and what’s amazing is that this TechNet post seems to be the only reference to this error and this registry key out there.
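For anyone else chasing this, here is a rough PowerShell sketch of the kind of check and fix involved. The path and values come straight from the quoted post; the wrapper is mine, untested here, and needs to run elevated since it touches the SYSTEM (S-1-5-18) hive:

# Software Publishing trust provider state for the SYSTEM account
$spKey = 'Registry::HKEY_USERS\S-1-5-18\Software\Microsoft\Windows\CurrentVersion\WinTrust\Trust Providers\Software Publishing'
$state = (Get-ItemProperty -Path $spKey -Name State).State
'Current State: 0x{0:X8}' -f $state
if ($state -eq 0x00063c00) {
    # Flip it back to the known-good value from the TechNet post
    Set-ItemProperty -Path $spKey -Name State -Value 0x00023c00
}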
I didn’t want to just make the change without understanding what it meant, so further research led to this post, which clarifies that the “State” key is a set of bitwise flags for various settings of “Microsoft Trust verification services, which provide a common API for determining whether a specific subject can be trusted.” After a bit of hex-to-binary conversion and a comparison between the two sets of flags, I determined that the difference was that the incorrect setting had enabled the flag to only “allow…items in personal trust database.” Seemingly this flag was preventing a system-level certificate from being used to validate a Microsoft signature. No clue why.
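If you want to skip the binary conversion, the two values differ by exactly one bit, and by my reading of wintrust.h that bit (0x00040000) is WTPF_ALLOWONLYPERTRUST, i.e. the “allow only items in personal trust database” flag:

# XOR the bad value against the good one to isolate the flag that was flipped on
'0x{0:X8}' -f (0x00063c00 -bxor 0x00023c00)   # prints 0x00040000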
But of course things didn’t stop there.
Whether it was related to the original certificate error or not, many of the same PCs that had this error were also not reporting to WSUS. This at least I already had a partial fix for. Years ago I had put together a basic batch file that reset the SusClientId, deleted the SoftwareDistribution folder, and basically forced a full reset of the WSUS client. (I’m not taking credit for figuring this all out, but I have no idea where I first saw the info.)
The process involves running the following as a batch file on the affected PC:
net stop wuauserv
reg delete HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate /v AccountDomainSid /f
reg delete HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate /v PingID /f
reg delete HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate /v SusClientId /f
reg delete HKLM\SOFTWARE\Microsoft\Windows\CurrentVersion\WindowsUpdate /v SusClientIdValidation /f
del c:\windows\softwaredistribution /S /F /Q
net start wuauserv
wuauclt /resetauthorization /detectnow
This stops the Automatic Update service, clears out the registry keys that identify the system to WSUS, clears out the patch download cache, restarts the Automatic Update service, and then re-registers the system with WSUS. There’s one little problem: the “/detectnow” switch was apparently removed in the Windows 10 version of wuauclt.exe, leaving no simple way to force a machine to scan WSUS for updates.
There are a couple of ways to get around this via VBScript or other languages that can call the Windows Update API directly. In our case, however, we were already licensed for WUInstall, which has this functionality built in. We added an additional command at the end of the batch file to force WUInstall to scan for updates, and at this point everything seems to be working properly.
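If you’re not a WUInstall shop, a minimal sketch of the API route looks like this in PowerShell. Microsoft.Update.AutoUpdate is the documented Windows Update Agent COM object, but we didn’t end up needing this ourselves, so treat it as untested:

# Ask the Windows Update Agent to run a detection cycle against its configured
# update source (WSUS in our case), roughly what wuauclt /detectnow used to do
(New-Object -ComObject Microsoft.Update.AutoUpdate).DetectNow()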
Thanks for reading, and “lforbes,” if you’re out there, thank you for posting your fix.