Note: A lot of the information in this post comes from a great MSDN article located here.
Caveat: My client uses 64-bit servers (AMD Opterons) running 64-bit versions of Windows 2003 R2 and BizTalk 2006. IIS is running in 32-bit compatibility mode (as we use SharePoint). I haven't yet worked out whether the CRL problem occurs on 32-bit servers; I definitely haven't noticed it on our 32-bit servers so far.
For two months, my BizTalk application was working fine. The system passed performance testing and was deployed on the Live servers in preparation for final connectivity testing.
Then one Monday last week, the test team complained that they were experiencing sporadic timeouts. On the same day, I was doing some testing on an unrelated BizTalk application on a separate server... and I noticed that I would occasionally get request-response latencies approaching 70 seconds...
Given that I'd also lost access to iTunes Radio that same morning (bah!), I assumed that changes had been made to our proxy server or firewall. I fired up TCPView on the server I was working on, and there was our old friend SYN_SENT: something was blocking access to the CRL again. I spoke to the Tech Support team and discovered that no changes had been made to the proxy server. Leaving them to check for changes to our firewall and security policies, I decided to research why this delay occurs when the call is blocked, and whether there was a way around it. Here's what I discovered (refer to this article for a more in-depth explanation of certificates and CRLs):
One thing I was curious about was the 15-second delay that kept popping up.
The Xceed Software post I'd read referred to a 15-second delay hard-coded into the WinVerifyTrust API call.
Looking through the documentation for WinVerifyTrust I noticed two things:
I'm not about to trace what WinVerifyTrust does to actually check the CRL, but I'd suspect that it ends up delegating to either CertGetCertificateChain or CertVerifyRevocation (and I'd bet that internally, CertGetCertificateChain calls CertVerifyRevocation to verify the CRL for a given certificate).
Suffice it to say that CertGetCertificateChain builds a chain of certificates, starting from the given certificate and working all the way up to the root CA, and can optionally check the revocation status of each certificate in the chain; CertVerifyRevocation, on the other hand, just checks the revocation status of the certificates it is given, without building a chain.
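To make the difference between the two roles concrete, here's a toy Python model. It is not CryptoAPI: Cert, build_chain, get_chain, verify_revocation and the is_revoked callback are all hypothetical names, and a real chain engine also checks signatures, validity periods and key usages. The only point is the shape: one function walks the issuer links up to the root, and the other answers "is this certificate revoked?" for a certificate it's handed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass(frozen=True)
class Cert:
    """Hypothetical, stripped-down certificate: just the names and a CRL URL."""
    subject: str
    issuer: str
    crl_url: Optional[str] = None  # None for a self-signed root


def verify_revocation(cert: Cert, is_revoked: Callable[[Cert], bool]) -> bool:
    """Single-certificate check (the CertVerifyRevocation role): True if not revoked.

    In real life this is where the CRL gets fetched, so it's where a
    blocked CRL URL would make the caller wait.
    """
    return not is_revoked(cert)


def build_chain(leaf: Cert, store: Dict[str, Cert]) -> List[Cert]:
    """Follow issuer names from the leaf up to a self-signed root."""
    chain = [leaf]
    seen = {leaf.subject}
    current = leaf
    while current.issuer != current.subject:
        issuer = store.get(current.issuer)
        if issuer is None:
            raise LookupError(f"issuer {current.issuer!r} not found")
        if issuer.subject in seen:
            raise LookupError("issuer loop detected")
        chain.append(issuer)
        seen.add(issuer.subject)
        current = issuer
    return chain


def get_chain(leaf: Cert, store: Dict[str, Cert],
              is_revoked: Callable[[Cert], bool],
              check_revocation: bool = True) -> List[Cert]:
    """The CertGetCertificateChain role: build the chain, optionally checking each link."""
    chain = build_chain(leaf, store)
    if check_revocation:
        # Skip the self-signed root: there's no issuer above it to publish a CRL for it.
        for cert in chain[:-1]:
            if not verify_revocation(cert, is_revoked):
                raise ValueError(f"{cert.subject!r} has been revoked")
    return chain
```

In this model, turning on check_revocation costs one is_revoked call per certificate below the root, which is roughly why a blocked CRL URL would hurt whichever of CertGetCertificateChain or CertVerifyRevocation WinVerifyTrust ends up calling.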
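And both of them take a timeout setting: CertVerifyRevocation takes, as one of its parameters, a struct called CERT_REVOCATION_PARA (strictly speaking, CertGetCertificateChain takes a CERT_CHAIN_PARA instead, but the timeout member discussed below appears in both). The format of CERT_REVOCATION_PARA is: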
Heh, look, there's a member called dwUrlRetrievalTimeout.
Wonder if that's relevant???
The documentation has this to say:
This member contains the time-out limit, in milliseconds. If zero, the revocation handler's default time-out is used.
And what's the revocation handler's default time-out?
Well, Microsoft doesn't specify this directly... but I noticed that a related knowledge base article uses a value of 15000 milliseconds, i.e. 15 seconds!
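The documented rule for dwUrlRetrievalTimeout is simple enough to write down. This Python sketch is only a model of that rule, not a Windows call: effective_url_timeout_ms and blocked_wait_seconds are hypothetical helpers, and the 15000 ms default is the figure from the knowledge base article rather than a documented constant.

```python
# Not documented directly by Microsoft: this is the value seen in the KB article.
OBSERVED_DEFAULT_TIMEOUT_MS = 15_000


def effective_url_timeout_ms(dw_url_retrieval_timeout: int) -> int:
    """Apply the CERT_REVOCATION_PARA rule: zero means "use the handler's default"."""
    if not 0 <= dw_url_retrieval_timeout <= 0xFFFFFFFF:
        raise ValueError("dwUrlRetrievalTimeout is a DWORD (0 to 0xFFFFFFFF)")
    return dw_url_retrieval_timeout or OBSERVED_DEFAULT_TIMEOUT_MS


def blocked_wait_seconds(dw_url_retrieval_timeout: int) -> float:
    """How long one CRL fetch would wait if the connection never completes (SYN_SENT)."""
    return effective_url_timeout_ms(dw_url_retrieval_timeout) / 1000
```

So a caller that leaves the member at zero gets the same 15-second wait the Xceed post described.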
So that's as far as we can go with that: unless IIS includes an option to configure this timeout, we can't change it (and, as it turns out, it sort of does).
Whilst researching this post, I noticed that one solution that is frequently touted is to modify the following registry key:
HKCU\Software\Microsoft\Windows\CurrentVersion\WinTrust\Trust Providers\Software Publishing\State
But that's not much use, as that's for the Current User (hence the HKCU). Great if I was using my own local user account for the application pools, bad if I'm using a non-interactive user account (which we are). Plus I'm not sure this would work for IIS… maybe I'll try it at some stage.
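For reference, here's a minimal Python sketch of inspecting that value with the standard-library winreg module. Two assumptions to flag: State is treated as a DWORD value under the Software Publishing key, and the bit tested is WTPF_IGNOREREVOKATION (0x200, spelled that way in wintrust.h) as I understand the header, so check both before changing anything. revocation_check_disabled and read_current_user_state are hypothetical helpers. Because it reads HKEY_CURRENT_USER, it only ever tells you about the account running the script, which is exactly the limitation described above.

```python
import sys
from typing import Optional

# Key from the post; "State" is read as a value name under the Software Publishing key.
SOFTWARE_PUBLISHING_KEY = (
    r"Software\Microsoft\Windows\CurrentVersion\WinTrust"
    r"\Trust Providers\Software Publishing"
)
# Assumption: the WTPF_IGNOREREVOKATION flag from wintrust.h (note the SDK's spelling).
WTPF_IGNOREREVOKATION = 0x00000200


def revocation_check_disabled(state: int) -> bool:
    """True if the State bitmask tells WinTrust to skip the revocation check."""
    return bool(state & WTPF_IGNOREREVOKATION)


def read_current_user_state() -> Optional[int]:
    """Read State for the *current* user only (HKCU); None off Windows or if absent."""
    if sys.platform != "win32":
        return None
    import winreg  # Windows-only standard-library module
    try:
        with winreg.OpenKey(winreg.HKEY_CURRENT_USER, SOFTWARE_PUBLISHING_KEY) as key:
            value, _value_type = winreg.QueryValueEx(key, "State")
            return int(value)
    except FileNotFoundError:
        return None
```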
(Note: it looks like Microsoft are aware of this issue, because in Windows Vista/Longhorn there's now a Group Policy setting that lets you set this default timeout for non-interactive processes, i.e. IIS App Pools!)
So what's the solution in this case?
Well, unless the technical support guys can work out what they changed to block CRL access (I suspect they turned on authentication on the proxy), we have four choices:
Then when browsing through the PKI documentation, I noticed a reference to the same registry keys, plus a note saying "this setting was first introduced with MS04-011" (the IIS 5.0 update linked to above).
So it looks like it is possible to set the default timeout.
I haven't tried this, so I can't verify that it works, but to me it's not the correct solution: the CRL should be available, either from AD, from the URL, or by installing it manually. Setting the timeout to a lower value seems to just ignore the problem, and it creates a potential security hole, as you can no longer be sure that the certificate used to sign the code is still valid.
Manually downloading and installing a CRL
Needless to say, I thought I'd have a go at manually downloading the CRL and installing it – and it worked a treat. Problem solved (at least until the next CRL update is needed, in August 2007). Still, it gives us a breather to get it properly sorted.
Finding the URL of the CRL is easy: look in the certificate details for the CRL Distribution Point, and copy the URL from there. In this case, it's the Microsoft Code Signing Public Certification Authority CRL: http://crl.microsoft.com/pki/crl/products/CodeSignPCA.crl
You can put this URL into a web browser and download the CRL.
(Note: if you're doing this in Windows Server 2003, you'll need to add crl.microsoft.com to your list of Trusted Sites; otherwise, you won't be able to download the CRL file.)
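If you'd rather not loosen browser settings on a server, the download can also be scripted. This is a minimal Python sketch using only the standard library; download_crl, crl_filename and looks_like_der are hypothetical helpers. The DER check assumes the distribution point serves a binary (DER) CRL rather than a PEM one, and the sketch assumes the server can reach the URL: urllib picks up proxy settings from the environment, but won't answer an authenticating proxy's challenge without extra handlers. Installing the file is still a separate step.

```python
import posixpath
import urllib.request
from pathlib import Path
from urllib.parse import urlparse

# From the post: the CRL Distribution Point of the Microsoft Code Signing PCA certificate.
CRL_URL = "http://crl.microsoft.com/pki/crl/products/CodeSignPCA.crl"


def crl_filename(url: str) -> str:
    """Use the last path segment of the distribution-point URL as the local file name."""
    name = posixpath.basename(urlparse(url).path)
    if not name:
        raise ValueError(f"no file name in {url!r}")
    return name


def looks_like_der(data: bytes) -> bool:
    """Cheap sanity check: a DER-encoded CRL is an ASN.1 SEQUENCE, so it starts with 0x30."""
    return len(data) > 1 and data[0] == 0x30


def download_crl(url: str, dest_dir: Path, timeout_s: float = 30.0) -> Path:
    """Fetch the CRL, reject anything that isn't DER (e.g. a proxy error page), save it."""
    with urllib.request.urlopen(url, timeout=timeout_s) as response:
        data = response.read()
    if not looks_like_der(data):
        raise ValueError("response doesn't look like a DER-encoded CRL")
    target = Path(dest_dir) / crl_filename(url)
    target.write_bytes(data)
    return target
```

For example, download_crl(CRL_URL, Path(r"C:\Temp")) would save the file as C:\Temp\CodeSignPCA.crl (the folder is just an illustration).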
Once you have the file, you can install it following the instructions here:
And lo and behold, the problem was fixed. At least, it's fixed until August 30th 2007, when CodeSignPCA.crl expires... But by then, I'm sure we'll have found a permanent fix!
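Since this fix has a built-in expiry date, a scheduled reminder check might be worth having. A minimal Python sketch: the August 30th 2007 date is from the post, but the time of day (midnight UTC) is an assumption, so read the real Next Update value from the CRL's details. crl_is_current and days_remaining are hypothetical helpers.

```python
from datetime import datetime, timezone
from typing import Optional

# Date from the post; midnight UTC is an assumption (check the CRL's Next Update field).
CODESIGNPCA_NEXT_UPDATE = datetime(2007, 8, 30, tzinfo=timezone.utc)


def _now_or(now: Optional[datetime]) -> datetime:
    """Use the supplied time if given, else the current UTC time."""
    return now if now is not None else datetime.now(timezone.utc)


def crl_is_current(next_update: datetime, now: Optional[datetime] = None) -> bool:
    """Treat a downloaded CRL as usable only until its Next Update time."""
    return _now_or(now) < next_update


def days_remaining(next_update: datetime, now: Optional[datetime] = None) -> int:
    """Whole days left before Next Update (negative after it has passed)."""
    return (next_update - _now_or(now)).days
```

Run from a scheduled task, days_remaining(CODESIGNPCA_NEXT_UPDATE) could raise an alert a couple of weeks before the manual fix runs out.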
Disclaimer: The opinions expressed herein are my own personal opinions and do not represent my employer's view in any way.