How to design a PKI hierarchy
Design a PKI hierarchy
I want to share my experience of how to design a PKI hierarchy and what to look for when you are planning one. PKI projects are 90% planning and 10% implementation and maintenance, so you should expect to put a lot of time and effort into planning a PKI hierarchy.
We will assume that we are planning a PKI hierarchy for a company called Contoso and go through each of the planning steps. The facts in this example are there to help you map the process to your own scenario.
- CA management strategy implemented by Contoso Corporate. Because Contoso runs centralized IT administration, a central team in the head office will define the PKI design.
- Existing PKI in the Contoso network. There is already a single enterprise CA in the Contoso network, mainly used for Microsoft Office Live Communicator Server.
Applications that use PKI
Before we dig into how to design a PKI hierarchy, we need to establish the need for one. A PKI deployment is typically launched when one or more applications that depend on a PKI are introduced. This leads to defining requirements such as who will manage the applications, the number of users, how certificates are distributed, and how the applications use certificates.
The PKI-enabled applications in the Contoso International network are:
- 802.1x port-based authentication.
- Digital signatures.
- Encrypting File System (EFS).
- Web authentication and encryption.
- IP security.
- Smart card logon.
- Secure e-mail.
Holders of Certificates
Certificate holders are those who request and require certificates. Certificate holders are:
- Network Devices.
- Security requirements: the CA hierarchy must enforce the Contoso security policy. Since many partners are involved, the CA hierarchy should also enforce any security requirements for external partners. This is done by implementing strong security measures such as physical security and OS hardening.
- Administration requirements: centralized administration requires centralized CA servers in the root Active Directory domain of Contoso. Management tasks will be handled by the IT team in the main office.
- Availability requirements: installing several CA servers and maintaining certificate templates on each server increases the availability of certificate issuing services. Spreading CA servers across different geographical locations also prevents a natural disaster that hits one data center from affecting the other CA servers.
Business requirements for designing PKI hierarchy include internal and external access requirements, availability requirements, and legal requirements.
- External access requirements: to issue certificates to partners, at least one CA should be accessible from the internet. An external publishing mechanism can be used to implement web publishing and to authenticate partners against Active Directory. Publication of the certificate revocation lists (CRLs) and CA certificates should be taken into consideration so that external parties can validate certificates.
- Legal requirements: certificate authorities should inform certificate holders and requesters about any legal requirements and obligations attached to the use of issued certificates. By defining a Certification Practice Statement (CPS), Contoso International can define legal requirements for certificate enrollment, use, and revocation. The CPS can also define the liability of Contoso in the event of a breach of security.
Define Legal Requirements
This step defines the legal requirements that apply to certificates issued by the Contoso certificate authorities. Legal requirements are published in the Contoso security policy, certificate policy, and certification practice statement (CPS). The CPS must be published on all certificate authorities and made available to all users and computers that require certificates from the Contoso PKI.
CA Hierarchy Design
Since Contoso technical support is centralized, and because Contoso offices are separated into child domains under the forest root domain, CA servers will be hosted in the forest root domain and will issue certificates to certificate recipients hosted in child domains.
CA servers will be centralized in Contoso data centers (hierarchy by location) for the following reasons:
- Each region is connected to a Contoso data center.
- Business requirements call for CA availability in the event of a disaster in one data center.
- Distribution, management, and enrollment of certificates can be localized.
Defining PKI Management Staff
- CA administrator. Responsible for managing the configuration of the certification authority computer, including defining the CA’s property settings and certificate managers. A user is delegated this role through the assignment of the Manage CA permission at the CA. This role is assigned to the infrastructure team in the Contoso head office.
- CA officer. Responsible for certificate management. Also known as the certificate manager. Tasks include certificate revocation, issuance, and deletion. In addition, the certificate manager extracts archived private keys for recovery by a key recovery agent. A user is given this role through the assignment of the Issue and Manage Certificates permission at the CA. This role is assigned to the infrastructure team in the Contoso head office.
- Backup operator. Responsible for the backup and recovery of the CA database and CA configuration settings. A user is delegated this role through the assignment of the Back Up Files and Directories or the Restore Files and Directories user rights in the Group Policy Object (GPO) applied to the certification authority server. This role is assigned to the IT operations team in the Contoso head office.
- Auditor. Responsible for defining the events generated on each certification authority server and for reviewing the security logs for events related to PKI management and operations. A user is given this role through the assignment of the Manage Auditing and Security Log user right in the GPO applied to the certification authority server. This role is assigned to the security team.
Defining PKI Hardware Resources
Designing a PKI hierarchy involves planning hardware resources. Hardware resources depend on the level of security required and the type and number of PKI applications involved.
Storing the CA keys on Hardware Security Modules (HSMs) is the most secure option, but also the most expensive. At Contoso, CA keys will be kept in the local computer store, which avoids the extra cost and provides a satisfactory level of security.
The offline root CA is to be hosted on a virtual machine, which eliminates the need to purchase new hardware for it. The initial implementation of the Contoso PKI project consists of an offline root CA hosted on a virtual machine and one online policy/issuing CA server. All CA keys are stored locally by choosing a Microsoft CSP to generate the keys.
In summary, the only hardware required for the initial implementation is the online issuing CA server, since the offline root CA will be hosted on a virtual machine. No other issuing CA servers will be installed in the first phase of the PKI implementation.
Number of Tiers
Designing a PKI hierarchy involves knowing your PKI tiers. The Contoso IT team has decided that a two-tier PKI topology is the most suitable for the organization. A two-tier hierarchy consists of an offline root CA and one or more issuing CAs. The issuing CA will hold the roles of both issuing and policy certification authority.
To ensure security in a two-tier hierarchy, the root CA is deployed as a standalone root CA. This allows an organization to deploy the root CA offline; that is, the CA is removed from the network to give the computer an additional layer of security.
In a multi-tier CA hierarchy, it does not matter which second-tier CA issues the certificates to computers, users, services, or network devices. All that matters is that the certificate issued by the second-tier CA chains to a trusted root CA, which is the offline root CA in this configuration.
To enhance the availability of certificate services, two or more issuing CAs must exist at the second tier. This prevents certificate services from being unavailable due to a single point of failure. The number of issuing CAs depends on the organization’s requirements. For example, a CA hierarchy can have different CAs for each geographic region, each sector or business unit, or each identified certificate policy used to validate a certificate’s subject. In the initial implementation, one issuing CA will be installed. Later, other issuing CA servers will be installed.
Choosing an Architecture
In this section, I will talk about choosing your architecture, placing your CA servers, and how to think about validity periods. The more certificates distributed, the more CAs required. A common design places issuing CAs at major hub sites in the network topology to provide regional site availability. This also provides high availability for certificate services, since clustering is not supported for CA servers at the time of writing this post. Having multiple issuing CAs ensures that the organization can still issue certificates if one of the issuing CAs is down. Placing the issuing CAs in geographically separated hubs ensures that a major disaster hitting one hub will not bring the whole PKI system down.
Determining Certificate Validity Periods and renewal strategy
A certificate has a pre-defined validity period that consists of a start date/time, and an end date/time. An issued certificate’s validity period cannot be changed after certificate issuance. Determining the validity period at each tier of the CA hierarchy, including the validity period of the certificates issued to users, computers, services, or network devices, is a primary step when defining a CA hierarchy.
The recommended strategy for determining certificate validity periods is to start with the certificates issued to users, computers, services, or network devices by issuing CAs. The main point to remember is that a CA should not issue a certificate whose validity exceeds the remaining lifetime of the CA certificate (in practice, the CA will not issue such a certificate). Although the standards allow it, this scenario can cause certificates that still have validity remaining to become unusable when the issuing CA’s certificate expires. Make sure the CA has enough remaining lifetime on its certificate to issue certificates with the required validity periods. A good rule of thumb is to make the CA certificate validity period at least twice the validity period of any CA-issued certificate.
In addition to doubling the validity period, we can follow best practice and have the CA renew its CA certificate when half of its validity period has elapsed. The first time we renew an issuing CA certificate (after five years in this scenario), we renew it with the original key pair. After the next five years pass, we renew the CA certificate with a new key pair. This ensures that the same key pair is never used for longer than the intended original validity period of 10 years.
Note: The validity period configured for issued certificates on an issuing CA sets the maximum validity period for any issued certificate, but this does not mean that every issued certificate will have that maximum validity. A CA administrator can define certificate templates that have shorter certificate validity periods.
In this example, an issuing CA has a certificate (version 1) valid from t=0 to t=10 with a public/private key pair called [key pair X]. The CA renews its certificate (version 2) with the same [key pair X], valid from t=5 to t=15. The CA renews its certificate again (version 3) with a new [key pair Y], valid from t=10 to t=20.
Note also that the CA issues certificates with a validity of 5 years, which is half the validity period of the issuing CA certificate.
Suppose a user gets a certificate at t=3. It is signed with CA [key pair X] and is valid until t=8. During this period (t=3 to t=8), the certificate can be used, since the CA certificate that issued it (version 1) is valid until t=10.
Suppose also that a user gets a certificate at t=7. It is signed with CA [key pair X] and is valid until t=12. During this period (t=7 to t=12), the certificate can be used, since the CA certificate that issued it (version 2) is valid until t=15.
Remember that when a CA renews its certificate, it keeps the old certificate in its store so that certificates issued under the old one can continue to be used for the rest of their validity.
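The renewal timeline can be checked with a few lines of Python. This is only a sketch of the example: the names `CaCert`, `signing_key`, and `covering_ca_cert` are made up for illustration, times are in years, and the rule "the CA signs with the key pair of its newest valid certificate" is an assumption taken from the example, not a description of any particular CA product.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class CaCert:
    version: int
    key_pair: str
    not_before: int  # years since t=0
    not_after: int


# CA certificate versions from the example.
CA_CERTS = [
    CaCert(version=1, key_pair="X", not_before=0, not_after=10),
    CaCert(version=2, key_pair="X", not_before=5, not_after=15),
    CaCert(version=3, key_pair="Y", not_before=10, not_after=20),
]

LEAF_VALIDITY_YEARS = 5  # validity of issued certificates in the example


def signing_key(issued_at: int) -> str:
    """Key pair of the newest CA certificate that is already valid at issuance."""
    started = [c for c in CA_CERTS if c.not_before <= issued_at]
    return max(started, key=lambda c: c.version).key_pair


def covering_ca_cert(issued_at: int) -> Optional[CaCert]:
    """First CA certificate with the signing key that spans the whole leaf lifetime."""
    expires_at = issued_at + LEAF_VALIDITY_YEARS
    key = signing_key(issued_at)
    for cert in CA_CERTS:
        if (cert.key_pair == key
                and cert.not_before <= issued_at
                and cert.not_after >= expires_at):
            return cert
    return None  # no CA certificate covers the full requested lifetime
```

Calling `covering_ca_cert(3)` returns version 1 and `covering_ca_cert(7)` returns version 2, which matches the two user certificates described in the example.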
In the case of a standalone CA, the configured validity period defines the validity period for all CA-issued certificates. In the case of an enterprise CA, it acts as a maximum value for any CA-issued certificate. An issued certificate is always assigned the lesser of the remaining validity period of the CA certificate and the configured maximum validity period. In other words, if you define the maximum validity period as four years and the CA only has three years remaining in its certificate’s validity period, the validity period of a newly issued certificate is three years.
In the case of an enterprise CA, another variable enters the picture. Enterprise CAs issue certificates based on certificate templates, and each certificate template has its own configured validity period. The applied validity period for certificates issued by an enterprise CA is the minimum of the CA certificate’s remaining validity period, the CA’s maximum validity period setting, and the certificate template’s validity period.
To make this concrete, suppose the CA certificate expires in 3 years, the CA is configured (via the Certutil.exe utility) with a validity period of 4 years for issued certificates, and a certain certificate template is configured with a validity period of 5 years. The result is an issued certificate with a validity of 3 years.
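The "minimum of three values" rule is easy to express as a small function. The sketch below is illustrative only; the function name and its parameters are assumptions, and it works in whole years for simplicity.

```python
from typing import Optional


def effective_validity_years(ca_remaining: float,
                             ca_max_setting: float,
                             template_validity: Optional[float] = None) -> float:
    """Validity applied to a newly issued certificate.

    ca_remaining      -- years left on the CA certificate
    ca_max_setting    -- maximum validity configured on the CA
    template_validity -- validity configured on the certificate template
                         (enterprise CA only; None for a standalone CA)
    """
    limits = [ca_remaining, ca_max_setting]
    if template_validity is not None:
        limits.append(template_validity)
    return min(limits)
```

With the numbers from the example, `effective_validity_years(3, 4, 5)` gives 3.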
Part of designing a PKI hierarchy is understanding how CRL files are distributed and consumed. When any application, user, or process validates a certificate, it makes sure that the certificate chains to a root that is trusted on that system, is time-valid, and contains the specific functional capabilities for which it is being presented. If these checks pass, the certificate is considered valid.
The certificate revocation list (CRL) is then checked to make sure that certificates that would otherwise be considered valid have not been revoked. The CRL itself is a file, signed by a CA, that contains a list of revoked certificates. The list identifies the revoked certificates by serial number and includes the revocation date and the reason for revocation. An application that performs certificate validation decides whether to check for revoked certificates as part of its validation process. When it does, it retrieves the current CRL from one of the URLs specified in the CRL distribution point (CDP) extension of the certificate being validated. After the CRL is retrieved, it is typically cached until it expires, and the application performs further revocation checks against the cached copy, which eliminates the need to retrieve the CRL for each revocation check. Once expired, the CRL becomes invalid, which forces the download of a new CRL.
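The caching behavior can be sketched as follows. This is a simplified model and not how any real validation library is implemented: `Crl`, `CrlCache`, and `fake_fetch` are hypothetical names, the CRL is reduced to a set of serial numbers plus an expiry time, and signature checking is left out entirely.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet


@dataclass(frozen=True)
class Crl:
    revoked_serials: FrozenSet[int]
    next_update: float  # time at which this CRL expires


class CrlCache:
    """Re-uses a downloaded CRL until it expires, then fetches a new one."""

    def __init__(self, fetch: Callable[[str], Crl]) -> None:
        self._fetch = fetch
        self._cache: Dict[str, Crl] = {}
        self.downloads = 0  # counts how often the CRL was actually retrieved

    def is_revoked(self, serial: int, cdp_url: str, now: float) -> bool:
        crl = self._cache.get(cdp_url)
        if crl is None or now >= crl.next_update:
            # Nothing cached, or the cached CRL has expired: download again.
            crl = self._fetch(cdp_url)
            self.downloads += 1
            self._cache[cdp_url] = crl
        return serial in crl.revoked_serials


def fake_fetch(url: str) -> Crl:
    """Hypothetical stand-in for retrieving the CRL from a CDP URL."""
    return Crl(revoked_serials=frozenset({0x1001}), next_update=100.0)
```

Two checks made before the expiry time trigger a single download; a check after the expiry time triggers a second one.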
The following protocols can be used when defining publication points:
- LDAP URLs
- HTTP URLs
- FTP URLs
- File URLs
The decision as to which protocols to implement for CRL or CA certificate publication depends on the frequency at which CRLs are published, the protocols allowed to traverse network firewalls, and the network’s operating systems. To ensure maximum availability, the URLs should be ordered so that the protocol most commonly used for CRL or CA certificate retrieval is listed first in the CDP extension. Other protocols are then listed in order of usage. Usually, the certificate-chaining engine tries the first URL in the list, times out after 30 seconds, and then moves to the next URL in the list, so the ordering is very important.
Since the PKI solution to be implemented is aimed at serving the internal Contoso network, the LDAP URL will be listed first, followed by an HTTP URL. An externally accessible HTTP URL should be implemented to meet possible external (partner) needs.
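To see why ordering matters, here is a small sketch that adds up the time lost to timeouts before a client reaches a URL it can actually use. It assumes the 30-second per-URL timeout quoted in the text and a strictly sequential retry; the URLs and the function name are placeholders, not real endpoints.

```python
from typing import Optional, Sequence, Set

TIMEOUT_SECONDS = 30  # per-URL timeout quoted in the text

# Placeholder URLs for illustration only.
LDAP_URL = "ldap://dc.contoso.example/crl"
HTTP_URL = "http://pki.contoso.example/crl"


def delay_before_success(cdp_urls: Sequence[str],
                         reachable: Set[str]) -> Optional[int]:
    """Seconds spent waiting on timeouts before the first reachable URL is tried.

    Returns None if no URL in the list is reachable for this client.
    """
    waited = 0
    for url in cdp_urls:
        if url in reachable:
            return waited
        waited += TIMEOUT_SECONDS
    return None
```

For an external client that can only reach the HTTP URL, `delay_before_success([LDAP_URL, HTTP_URL], {HTTP_URL})` is 30 seconds, while listing the HTTP URL first reduces the delay to 0. For internal clients that can reach LDAP, the LDAP-first order costs nothing.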
- CRLs should not be published to Active Directory when the CRL publication period is shorter than the replication convergence time for the Active Directory forest. Because of this, the CRL publication period should be longer than one day.
- ldap:///CN=,CN= means the closest domain controller.
For issuing CAs, it is recommended to use different CRL lifetimes depending on the recipient of the certificate. For example, some organizations configure the CRL lifetime to be 24 hours for the user CAs and one week for the e-mail and computer CAs.
The purpose of the difference is to balance the need for timely CRL updates when a certificate is revoked against the performance impact of frequent CRL retrieval and directory replication.
When a user authentication certificate is revoked, some companies want the revocation to take effect as quickly as possible. Because a CRL is cached until it expires, a short lifetime ensures that clients pick up the current revocation status more quickly.
This works well if there is one issuing CA for users and another for other certificate recipients. In the Contoso network, however, the same CA will issue certificates to all certificate recipients, so the CRL lifetime will be one week for the issuing CA. To compensate for not having a short CRL lifetime for user certificates, user accounts will be disabled rather than relying on certificate revocation alone.
CRL publishing for offline CAs is a manual process. As a best practice, delta CRLs will be disabled on the root CA and enabled on the issuing CA servers, so that clients get more recent revocation information without downloading the whole base CRL each time. Clients start by downloading a base CRL and then download delta CRLs as they are published, until the next base CRL is published.
The delta CRL publication period will be two days on issuing CAs, and delta CRLs will be disabled for the root CA. Thus, two delta CRLs will be available between two base CRL publications.
CRL Lifetime overlap
A CRL has an established lifetime, and a new CRL must be published before the old CRL expires. There is a buffer included in the CRL publication interval to define a specific amount of time for which existing CRLs will remain valid after the next scheduled CRL publication. The purpose of this overlap period is to provide time for manual publication and replication of the newly created CRL prior to the expiration of the original CRL, and to avoid leaving a gap in the availability of a valid CRL. The default overlap period is 10 percent of the CRL lifetime period, with a maximum of 12 hours.
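The default overlap rule (10 percent of the lifetime, capped at 12 hours) can be written as a one-line calculation. The function name below is an assumption for illustration; only the rule itself comes from the text.

```python
from datetime import timedelta

MAX_DEFAULT_OVERLAP = timedelta(hours=12)


def default_overlap(crl_lifetime: timedelta) -> timedelta:
    """10 percent of the CRL lifetime, but never more than 12 hours."""
    return min(crl_lifetime / 10, MAX_DEFAULT_OVERLAP)
```

For a one-week CRL, 10 percent would be 16.8 hours, so the 12-hour cap applies. For a one-day CRL, the overlap is 2 hours and 24 minutes.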
Determining Certificates Key Length
Planning the key length of all certificates involved in your PKI solution is an important part of designing a PKI hierarchy. Normally, a key size of 4096 bits would be recommended for security reasons, especially for the root CA. However, this may create incompatibility problems with, for example, Cisco-based network products (depending on the version of Cisco IOS in use), since many third-party products have problems handling key sizes larger than 2048 bits. And since network equipment can be integrated into solutions such as 802.1x for validation and compliance, key size matters.
For this reason, the key lengths will be as follows:
- Root CA: 4096
- Issuing/Policy CA: 2048
Another key factor in designing a PKI hierarchy is planning how to revoke certificates. Before certificates are enrolled, the PKI management team should know how to revoke them. Any certificate (except the root CA certificate itself) should have a pointer to a valid CRL. The CRL distribution point is included in the certificate’s extensions and cannot be modified after a certificate is enrolled.
If an application is going to verify a certificate against the CRL and no valid CRL is available, the revocation check does not work and the certificate cannot be used for the transaction. If the application has properly implemented CRL checking, no authentication, encryption, or signing is allowed with this certificate until a valid CRL is available again.
The following points should be verified:
- CDPs (CRL distribution points) should be modified on offline CAs. It is a common mistake not to modify the default CRL distribution point of an isolated stand-alone CA. Because a root or intermediate CA is typically disconnected from the network, PKI-enabled clients cannot validate the issued certificates against the default CRL distribution point on the CA server. To make the CRL of an offline stand-alone CA publicly available, the CRL should be published manually. An online CA on a computer that is joined to an Active Directory domain or forest automatically publishes the CRL to Active Directory so that it is accessible through LDAP. Alternatively, the CRL can be made available through an HTTP URL that points to a location on a web server.
- Relative LDAP CDPs are to be used instead of Fully Qualified LDAP CDPs. Depending on certificate types that are issued with a CA, the order of the CRL distribution points is important. For authentication certificates, it is beneficial to have a CRL or fully-qualified LDAP CRL distribution point as the first entry in the list of distribution points. If a relative LDAP CRL distribution point is specified, a client contacts the domain controller that is closest, according to the Active Directory site structure, to get the CRL. Fully-qualified LDAP CRL distribution points eliminate latency issue that may occur until the CRL has been replicated in Active Directory. For non-authentication certificates, you may want to use LDAP because LDAP is more fault-tolerant in an Active Directory environment compared to tolerance in a single-instance HTTP server. Because of the nature of Contoso network and firewall rules applied, relative LDAP CDPs are to be used.
- HTTP CRLs are to be used next. It is also an option to set both an LDAP and an HTTP CRL distribution point URL to support clients that are Active Directory-aware as well as clients that are not running Windows and are not Active Directory-aware. This is also important for a joint project with a partner that needs to access a PKI-enabled application on our side.
- CA Root Certificate Revocation. Since a root CA certificate has no parent CA that could maintain the CRL, there is no need to specify a CRL distribution point for the root CA certificate itself. To revoke a root CA, all certificates that have been issued by the root CA must be revoked instead. The following points should be noted:
- Offline CAs must continue to publish CRLs.
- A root CA certificate should have an empty CRL distribution point.
- Publishing CRLs. If certificates are exchanged with external entities, CRLs must be available at a location that is accessible for all internal and external entities. To satisfy this requirement, in this case the CRL is usually published in the organization’s perimeter network.
Part of designing a PKI hierarchy is understanding the security measures of your solution. Different approaches may be applied through physical or technical protection techniques, as described below:
- Keeping the root CA disconnected from the network. The root CA will be hosted on a virtual server. The virtual machine image will be kept in a secure, protected place in the Contoso main data center, with a copy in the disaster recovery data center.
- Disable any unneeded services hosted on the root CA server.
- Use a dedicated server to host the offline Root CA.
Enrollment strategy is also an important part of designing a PKI hierarchy. Different enrollment strategies are to be implemented depending on the certificate template and the purpose of the certificate to be issued. To reduce the total cost of ownership, automatic enrollment for Active Directory users is the method to be used for user certificates. This means that the issuing policy relies on the user’s Active Directory credentials to identify the subject of the certificate request. This is the initial design measure to be implemented for the enrollment strategy.
CRL partitioning is another main reason why administrators often renew an issuing CA certificate. When a CA certificate is renewed, the CA will use the new key as well as any unexpired previous keys corresponding to previous certificates when generating revocation information. Therefore, a CA may be using multiple keys at the same time and will publish multiple CRLs corresponding to those keys.
CRL partitioning is used to reduce the size of the base CRL. As an example, consider the following timeline for the issuing CA (t in days):
- At t = 0, the CA has a current CA certificate named [certificate A] that is valid for the coming 5 years (until t = 1825).
- At t = 2, the CA revokes two user certificates [certificates X and Y] that were signed with [certificate A], and issues a base CRL (0) that contains the serial numbers of [certificates X and Y].
- At t = 10, the CA renews its certificate with a new key pair [certificate B]. Since [certificate A] is still valid, the CA publishes two CRLs: one signed with [certificate A], named CRL (1), and one signed with [certificate B], named CRL (2). Note that CRL (1) contains the serial numbers of [certificates X and Y], while the base CRL (2) is empty.
- At t = 12, the CA issues a user certificate named [certificate Z] to a user.
- At t = 15, the CA revokes [certificate Z] and issues CRL (3), which contains only the serial number of [certificate Z].
To explain how this algorithm works, suppose the following:
- An application is presented with [certificate X], which is revoked and was signed with CA [certificate A]. The application looks at the AIA extension of [certificate X] and pulls [certificate A]. It then looks at the CDP extension of [certificate X] and pulls base CRL (1), since base CRL (1) is signed with [certificate A]. The application sees that the certificate is revoked and denies the request. All is working fine.
- An application is presented with [certificate Z], which is revoked and was signed with CA [certificate B]. The application looks at the AIA extension of [certificate Z] and pulls [certificate B]. It then looks at the CDP extension of [certificate Z] and pulls base CRL (3), since base CRL (3) is signed with [certificate B]. The application sees that the certificate is revoked and denies the request.
When the CRL grows too large, renewing the CA certificate with a new key pair reduces the CRL size. This feature is called CRL partitioning. The CRL signed with the new CA certificate only contains revoked certificates that were signed with the new key, that is, certificates issued since the CA key renewal.
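The effect of partitioning can be illustrated with a short sketch that groups revoked certificates by the CA certificate that signed them, giving one CRL per CA key. The function and variable names are illustrative assumptions; the data mirrors the example with certificates X, Y, and Z.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def partition_crls(revocations: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group revoked certificates by the CA certificate that signed them.

    Each item is (revoked certificate, signing CA certificate).
    The result maps each CA certificate to the contents of its own CRL.
    """
    crls: Dict[str, List[str]] = defaultdict(list)
    for cert_name, signing_ca_cert in revocations:
        crls[signing_ca_cert].append(cert_name)
    return dict(crls)


# Revocations from the example: X and Y were signed with certificate A,
# Z was signed with certificate B after the key renewal.
EXAMPLE_REVOCATIONS = [("X", "A"), ("Y", "A"), ("Z", "B")]
```

`partition_crls(EXAMPLE_REVOCATIONS)` yields one list for certificate A containing X and Y, and a separate list for certificate B containing only Z, so a client validating [certificate Z] never has to download the entries for X and Y.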
90% of any PKI project is planning and design, while 10% is implementation. Getting the PKI right from the beginning is important because it is extremely hard to change things later. You should expect to put a lot of time and effort into figuring out how to design a PKI hierarchy. I hope this article has shown you what to look at and what to consider when doing so.