How to set your SLA for certificate authority?
When I was planning a PKI solution for one of my customers, the first thing that came to my mind was how to define the SLA for the certificate authority. That is, how long can a company wait for a failed CA server to be up and running again, and what is the actual damage in the meantime? In other words, what stops working while a CA is down?
The answer is simple. If the CA is down, three things are immediately affected:
- Your ability to issue certificates.
- Your ability to recover keys.
- Your ability to sign CRLs.
In small environments, you usually do not care that much about points 1 and 2, so your focus should go to point 3. Let me start by explaining the urgency of signing CRLs and how this might affect your SLA for the certificate authority.
If you do not sign and publish a new CRL before the current CRL expires, any service that performs CRL checking for your digital certificates will eventually stop working or stop accepting your internally issued certificates. This is a huge thing!
So, point 3 is the most important factor to look at when you have a failed CA. You can do one of two things:
- If you possess the private key of the CA, you can manually sign and publish a CRL.
- You can recover the CA before the current CRL expires.
As point 1 is not an easy or routine thing to do, my focus is on point 2, which is fixing the CA before the current CRL expires.
If I have a CRL validity period of 3 days, then I have at most 3 days until the current CRL expires (this is when the CA publishes a CRL and fails immediately afterwards), and in the worst case as little as one second (this is when the CA published a CRL, ran for 3 days, and failed right before publishing the next CRL).
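To make the arithmetic concrete, here is a minimal sketch of the two extremes described above, assuming no CRL overlap is configured. The function name `recovery_window` and the dates are purely illustrative and do not come from any Windows tool.

```python
from datetime import datetime, timedelta

def recovery_window(published_at, crl_validity, failed_at):
    """Time left between the CA failure and the expiry of the last published CRL.

    Assumes no CRL overlap: the CRL expires exactly crl_validity after publication.
    """
    expires_at = published_at + crl_validity
    if not published_at <= failed_at <= expires_at:
        raise ValueError("failure must happen while the CRL is still valid")
    return expires_at - failed_at

published = datetime(2024, 1, 1, 0, 0)   # illustrative publication time
validity = timedelta(days=3)

# Best case: the CA fails right after publishing the CRL.
best = recovery_window(published, validity, published)
# Worst case: the CA fails one second before the next CRL would be published.
worst = recovery_window(published, validity, published + validity - timedelta(seconds=1))

print(best)   # 3 days, 0:00:00
print(worst)  # 0:00:01
```

The worst case is what matters for an SLA: without overlap, you cannot promise any meaningful recovery time at all.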
CRL Overlap to the rescue
There is an additional configuration that can help with this problem by giving you more time to recover a failed CA before the currently published CRL expires. This option is called CRL Overlap. CRL Overlap can be configured by setting the following two registry values: CRLOverlapUnits and CRLOverlapPeriod.
So, in my previous example, my CRL had a validity period of 3 days. What I can do now is add a CRL Overlap Period of 3 days. With this configuration, my CRL will be valid for a period of 6 Days. However, at 3 days a new CRL will be published as well. This is illustrated in the graphic below:
In the example illustrated in the graphic above, CRL 1 will be valid for a period of 6 days. CRL 2 will be published at Day 3. So, if my CA fails between Day 1 and Day 3, I would still have 3 days (Day 3 through Day 6) to perform an emergency CRL signing or to recover my CA. If my CA fails between Day 3 and Day 6, there is a new CRL (CRL 2) that is valid through Day 9, so I still have at least 3 days to perform an emergency CRL signing or to recover my CA before revocation checking starts to fail. The reason I always have those 3 days is that the CRL Overlap Period extended the validity of each CRL by 3 days and staggered the Next Publish and Next Update times by 3 days.
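The timeline above can be sketched as a small calculation. This is a simplified model, not how the CA computes anything internally: it assumes CRLs are published exactly every publishing period starting at Day 0, that each CRL is valid for the period plus the overlap, and that a CRL due at the moment of failure was still published. The name `time_to_recover` is hypothetical.

```python
from datetime import timedelta

def time_to_recover(failed_after, period, overlap):
    """Time left before the newest published CRL expires.

    failed_after: how long after the first CRL was published the CA failed.
    period:       interval between CRL publications (the CRL period).
    overlap:      extra validity added by the CRL overlap setting.
    """
    completed_cycles = failed_after // period     # timedelta // timedelta -> int
    last_published = completed_cycles * period    # publish time of the newest CRL
    expires = last_published + period + overlap   # its Next Update time
    return expires - failed_after

period = timedelta(days=3)
overlap = timedelta(days=3)

print(time_to_recover(timedelta(days=1), period, overlap))            # 5 days, 0:00:00
print(time_to_recover(timedelta(days=2, hours=23), period, overlap))  # 3 days, 1:00:00
print(time_to_recover(timedelta(days=4), period, overlap))            # 5 days, 0:00:00

# Same failure one hour before the next publication, but without overlap:
print(time_to_recover(timedelta(days=2, hours=23), period, timedelta(0)))  # 1:00:00
```

In this model the remaining time never drops below the overlap itself, which is why the overlap is a natural upper bound for the recovery time you commit to in the SLA.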
So far we have identified the concept of CRL overlap. This is an important thing to consider when planning your SLA for certificate authority. CRL overlap also helps in the following cases:
- Active Directory replication delays.
- CRL distribution from CA server to revocation server delays.
- Temporary network connectivity issues.
- Unexpected server failure.
Under the Certification Services configuration hive in the registry, two values control the overlap period for the base CRL, and two registry values define the overlap period for delta CRL creation:
- CRLOverlapPeriod = REG_SZ: Hours|Minutes (Units)
- CRLOverlapUnits = REG_DWORD: 0x0 (Value)
- CRLDeltaOverlapPeriod = REG_SZ: Hours|Minutes (Units)
- CRLDeltaOverlapUnits = REG_DWORD: 0x0 (Value)
You can verify the settings for the above registry keys on your CA computer with the following commands:
certutil -getreg CA\CRLOv*
certutil -getreg CA\CRLDeltaOv*
According to a TechNet article that applies to both Windows Server 2008 R2 and Windows Server 2012, Microsoft states that the default overlap is 10 percent of the CRL lifetime, up to a maximum of 12 hours, when it is not configured manually. If configured manually, the overlap period cannot exceed the publishing period.
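As a rough illustration of that rule, the following sketch computes the overlap you would end up with for a given CRL period. It only models the rule as stated above; the real CA may apply further adjustments (for example for clock skew). The function name `effective_overlap` is hypothetical, and clamping an oversized manual value to the publishing period is my assumption about how "cannot exceed" plays out.

```python
from datetime import timedelta

MAX_DEFAULT_OVERLAP = timedelta(hours=12)

def effective_overlap(crl_period, configured_overlap=None):
    """Overlap for a CRL with the given publishing period (simplified model)."""
    if configured_overlap is None:
        # Default: 10 percent of the CRL period, capped at 12 hours.
        return min(crl_period / 10, MAX_DEFAULT_OVERLAP)
    # Manual setting: assumed to be limited to the publishing period.
    return min(configured_overlap, crl_period)

print(effective_overlap(timedelta(days=3)))                     # 7:12:00
print(effective_overlap(timedelta(days=7)))                     # 12:00:00
print(effective_overlap(timedelta(days=3), timedelta(days=3)))  # 3 days, 0:00:00
```

Note how small the default is: with a 3-day CRL period you get a little over 7 hours, which is rarely enough time to rebuild a failed CA.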
By now, you should have some idea about CRL overlap and how it can come in handy when defining your SLA for the certificate authority.
CRL Certificates Extensions
To better understand your SLA for the certificate authority, you need to understand a couple of terms related to CRLs. There are three terms used to describe the base and delta CRLs:
Effective Date (aka ThisUpdate)
[The term Effective Date is used in the Windows certificate dialog, while Certutil.exe and the RFC name this field “ThisUpdate”]
This is a mandatory field and it describes the date that a CRL became effective. The effective time, by default, is set to 10 minutes prior to the current date and time to allow for clock synchronization issues.
ThisUpdate = MaximumOf (CurrentTime – ClockSkewMinutes, CANotBefore)
In other words, usually ThisUpdate field value is CurrentTime minus ClockSkewMinutes (10 minutes by default). However, there is an exception when CA certificate is renewed. In this case, CurrentTime minus ClockSkewMinutes may occur prior to CA certificate validity. In this case, ThisUpdate field value equals NotBefore value of the CA certificate.
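A minimal sketch of this formula, covering both the normal case and the CA renewal exception. The function name and the dates are illustrative; only the 10-minute default skew comes from the text above.

```python
from datetime import datetime, timedelta

def this_update(current_time, ca_not_before, clock_skew=timedelta(minutes=10)):
    """ThisUpdate = MaximumOf(CurrentTime - ClockSkewMinutes, CANotBefore)."""
    return max(current_time - clock_skew, ca_not_before)

ca_not_before = datetime(2024, 1, 1, 9, 0)   # start of the CA certificate validity

# Normal case: the CA certificate has been valid for a long time.
normal = this_update(datetime(2024, 3, 1, 12, 0), ca_not_before)
# Renewal case: the CRL is issued 5 minutes after the CA certificate became valid,
# so subtracting the skew would land before NotBefore.
renewed = this_update(datetime(2024, 1, 1, 9, 5), ca_not_before)

print(normal)   # 2024-03-01 11:50:00
print(renewed)  # 2024-01-01 09:00:00
```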
Next CRL Publish
This is a non-critical (optional) extension, which means that an application is not required to consume it. It indicates the date and time when the CA will publish a new CRL. When a Windows computer uses a CRL for certificate verification, it also examines the Next CRL Publish extension. If the Next CRL Publish date is already in the past, it connects to the CRL distribution points (referenced in the certificate) and attempts to download a newer CRL.
The time after the Next CRL Publish and before the Next Update is a buffer that allows Windows computers to retrieve a new CRL before the current one has actually expired, and a buffer for you to recover a failed CA.
NextCRLPublish (Base CRL) = MinimumOf (CurrentTime + CRLPeriod, CANotAfter)
NextCRLPublish (Delta CRL) = MinimumOf (CurrentTime + CRLDeltaPeriod, CANotAfter)
Note: There is a feature called CRL Prefetching that allows a certificate consumer to look at the Next CRL Publish extension and fetch a newer CRL, if one is available, during the time between Next CRL Publish and Next Update. How CRL Prefetching works is beyond the scope of this blog post, but it is worth knowing that if the CRL is locally cached, and under certain conditions, the download of a new CRL might be skipped even if the Next CRL Publish date is already in the past.
If CRLDeltaPeriod is equal to zero, Delta CRL is not published. CRL cannot be valid after CA certificate expiration.
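The two Next CRL Publish formulas, including the zero delta period case, can be sketched like this. The function names and dates are illustrative, and returning `None` to represent "no delta CRL is published" is my own modelling choice.

```python
from datetime import datetime, timedelta

def next_crl_publish(current_time, crl_period, ca_not_after):
    """Base CRL: MinimumOf(CurrentTime + CRLPeriod, CANotAfter)."""
    return min(current_time + crl_period, ca_not_after)

def next_delta_crl_publish(current_time, crl_delta_period, ca_not_after):
    """Delta CRL: same rule, or None when CRLDeltaPeriod is zero (no delta CRL)."""
    if crl_delta_period == timedelta(0):
        return None
    return min(current_time + crl_delta_period, ca_not_after)

now = datetime(2024, 6, 1, 0, 0)
far_expiry = datetime(2030, 1, 1, 0, 0)    # CA certificate valid for years
near_expiry = datetime(2024, 6, 3, 0, 0)   # CA certificate expires in 2 days

print(next_crl_publish(now, timedelta(days=3), far_expiry))        # 2024-06-04 00:00:00
print(next_crl_publish(now, timedelta(days=3), near_expiry))       # 2024-06-03 00:00:00
print(next_delta_crl_publish(now, timedelta(days=1), far_expiry))  # 2024-06-02 00:00:00
print(next_delta_crl_publish(now, timedelta(0), far_expiry))       # None
```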
Next Update
This is a mandatory field. It is the date and time that a Windows client considers to be the expiration date of the CRL. From an operational viewpoint, this is the most critical information. If this date passes, Windows computers will invalidate certificates that are checked against this CRL. You have to recover a failed CA before the date specified in this field.
NextUpdate (Base CRL) = MinimumOf(NextCRLPublish + InterimBaseCRLOverlap, CANotAfter)
NextUpdate (Delta CRL) = MinimumOf(NextCRLPublish + InterimDeltaCRLOverlap, CANotAfter)
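Here is a minimal sketch of the Next Update formula, which is effectively your recovery deadline. It treats the interim overlap as a given input rather than deriving it, and the function name and dates are illustrative only.

```python
from datetime import datetime, timedelta

def next_update(next_publish, interim_overlap, ca_not_after):
    """NextUpdate = MinimumOf(NextCRLPublish + InterimCRLOverlap, CANotAfter)."""
    return min(next_publish + interim_overlap, ca_not_after)

next_publish = datetime(2024, 1, 4, 0, 0)   # CRL issued Jan 1 with a 3-day period
overlap = timedelta(days=3)

# CA certificate valid for years: the CRL expires 3 days after Next CRL Publish.
deadline = next_update(next_publish, overlap, datetime(2030, 1, 1))
# CA certificate expires first: the CRL cannot outlive it.
capped = next_update(next_publish, overlap, datetime(2024, 1, 5))

print(deadline)  # 2024-01-07 00:00:00
print(capped)    # 2024-01-05 00:00:00
```

The same function applies to the delta CRL if you pass the delta Next CRL Publish time and the delta overlap instead.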
It is important to ask yourself, “What if my CA is down?” Planning your SLA for the certificate authority matters a great deal, and I hope this post helped you figure out how to set it for your CA and your whole PKI deployment.