Part 4: On difficulties of certificate revocation

In my last blog, we talked about the generic PKI framework and how it is adapted for vehicular communication, and then about the need to revoke certificates under certain circumstances. In this blog, we look at the general revocation issue.

This discussion is about revocation in general; it is not necessarily unique to the V-PKI system for vehicular communication (V2X). But the fundamental aspects are common, so I do consider it important to be aware of them in the context of V2X as well.

As mentioned in the last blog, there are situations where certificates need to be revoked. When a certificate is revoked, this fact needs to be propagated to others who may use it, so that they can correctly recognize that this certificate is no longer valid and ignore all messages that contain it.

How are certificates revoked?

At this point, the question becomes – how do “others” (certificate users) get to know that a certain certificate has been revoked? There have been multiple solutions to this. I’d say it’s a series of unsuccessful attempts to address this problem. To address problems in the original solution, a follow-up solution was created, which sort of fixed those problems but inadvertently created a new one. Then another follow-up solution was created to address that new problem, but it created yet another one, and on and on… The end result as of today is that the standardized revocation mechanism is incomplete (IMO). Then the whole approach to revocation changed direction (the new direction is also incomplete), which we talk about later in this blog. It’s one of those cases where a solution works in theory but not in reality.

For much of this discussion, I use internet browsers and TLS session setup with web servers as a use case. It’s rather unrelated to V2X communication, but the fundamentals of the problem and solutions also apply to V2X.

When it comes to revocation checking in internet browsers (Chrome, Edge, Firefox, etc.), the situation ended up quite convoluted as browser vendors started to implement their own proprietary revocation solutions. Their objectives are the same (make revocation effective in practice), so they do similar things but in different, incompatible ways. Now, browser vendors mostly ignore the standardized solution and have the upper hand in how revocation is actually done, and it looks like we’re stuck with this situation (their solutions basically bundle revocation information and push it to browsers as a software update).

The historical evolution of certificate revocation is shown in the following figure. Detailed discussion of all these can be found here. In this blog, I stay at a high level without going too much into details.

Fig. 4.1 Historical Evolution of Certificate Revocation Mechanisms

Certificate Revocation List (CRL)

The original revocation mechanism is called the Certificate Revocation List (CRL). In this solution, CAs regularly generate a list of revoked certificates and make it available to certificate users who may encounter those revoked certificates.

The problems with the CRL include: (1) scalability, (2) availability, (3) timeliness, and (4) dependence on the certificate user to retrieve it:

  • Scalability – A CRL can become big, so downloading multiple CRLs over time consumes the users’ storage and requires upkeep. It’s simply not scalable. And if you get a list of revoked certificates but never access any of the corresponding websites, handling the CRL ends up being wasted effort.
  • Availability – when you try to contact the CRL server and it’s not reachable or available, then you’re out of luck (“please try again later…” 😮 ).
  • Timeliness – if the CRL content is not up-to-date, then its content becomes questionable. If the user (browser) believes it is up-to-date, it may end up making a wrong judgement about a certificate’s validity. For example, if you download the latest CRL – which happens to have been generated 3 days ago – it doesn’t reflect whatever happened in the 3 days since it was generated. If a certain certificate was revoked yesterday, it’s not in the CRL. You’re looking at stale (and inaccurate) information.
  • User-dependency – even if the latest CRL is available, if you don’t download it, you don’t have the latest one. This creates a time lag similar to the timeliness issue above.
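
To make the timeliness problem concrete, here is a minimal sketch of a CRL lookup. The `Crl` class, the `check_with_crl` function, and the one-day `max_age` threshold are all my own illustrative assumptions, not part of any CRL library or standard. The point is only that "not on the list" means little when the list itself is old.

```python
from dataclasses import dataclass, field
from datetime import datetime, timedelta


@dataclass
class Crl:
    """A toy CRL: when it was generated and which serial numbers it lists."""
    generated_at: datetime
    revoked_serials: set = field(default_factory=set)


def check_with_crl(serial, crl, now, max_age=timedelta(days=1)):
    """Return 'revoked', 'stale', or 'not listed' for one certificate serial."""
    if serial in crl.revoked_serials:
        return "revoked"        # a listed serial is definitely revoked
    if now - crl.generated_at > max_age:
        return "stale"          # absence from an old list proves little
    return "not listed"


now = datetime(2024, 1, 10)
# The "latest" CRL we could download was generated 3 days ago.
crl = Crl(generated_at=now - timedelta(days=3), revoked_serials={1001, 1002})

print(check_with_crl(1001, crl, now))   # revoked
print(check_with_crl(2002, crl, now))   # stale
```

Serial 2002 may well have been revoked yesterday; a 3-day-old list simply cannot tell us.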

Online Certificate Status Protocol (OCSP)

To address the problems of the CRL, OCSP was introduced (although it really addresses only the scalability aspect). What doesn’t change is that certificate users still need to initiate the necessary checking.

In the case of browsers, when you try to access a website, it triggers a TLS handshake (https). During the handshake, the web server gives you a server certificate. Your browser contacts the OCSP Responder (a fancy name for a server that keeps track of the revocation status of certificates) and asks whether the server certificate you just received from the web server is still valid. The response is “yes”/“no”/“I don’t know” (“good/revoked/unknown” according to the OCSP spec).

The obvious pro of the OCSP approach is that it solves the scalability issue I mentioned above – you’re now querying the status of a single specific certificate rather than blindly downloading a CRL and checking it. But you (browser) are still responsible for initiating the revocation status check for that certificate. On top of that, it introduces a new privacy problem – since you’re checking the revocation status of a specific website’s certificate with the OCSP Responder, you’re effectively revealing your browsing behavior. For example, if you’re accessing an adult content website, you’re essentially saying: “Hey Mr. OCSP Responder, I’m trying to access this xxx.com site. Is this certificate still valid?” Then this OCSP Responder knows that you’re visiting that website. The Responder may not know your name, but it knows your IP address, which can reveal certain privacy-related info (e.g. your approximate location and your browsing behavior) [Note: there are technologies such as VPN or Tor to mask your location, but that’s another story…].

Another problem OCSP introduces is that the OCSP interaction between the certificate user and the Responder is a blocking event. In other words, if the Responder doesn’t respond to your request, your browser has to wait. You may end up staring at a blank screen instead of the website content. It’s not a good user experience, to say the least (and I’d imagine browser vendors care about this kind of negative experience as it can hurt user perception and eventually market share). To address this, browser vendors introduced logic to set a timer and ignore this “no response” condition, which is called “soft failure.” In soft failure, the browser times out waiting for the Responder’s response. Then it assumes everything is fine and proceeds to render the website on the screen. It’s essentially ignoring the “no response” condition. IMO, it’s a “duh…” moment. :-0 If you’re going to ignore the negative result, why bother checking to begin with? But unfortunately, browser vendors seem to care more about the user experience than about security mechanisms (or so I think).
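
The soft-failure logic can be sketched in a few lines. This is not how any particular browser is implemented; `should_load_page`, the `hard_fail` switch, and the way I lump “unknown” together with a timeout are assumptions made purely for illustration.

```python
def should_load_page(query_responder, hard_fail=False):
    """Decide whether a toy browser proceeds after an OCSP check.

    query_responder is a hypothetical callable that returns 'good',
    'revoked' or 'unknown', or raises TimeoutError when the Responder
    does not answer in time.
    """
    try:
        status = query_responder()
    except TimeoutError:
        status = None                 # no answer at all

    if status == "good":
        return True
    if status == "revoked":
        return False
    # 'unknown' or no answer: soft-fail proceeds anyway, hard-fail blocks.
    return not hard_fail


def unreachable_responder():
    # Stands in for a Responder that is down, or whose traffic is blocked.
    raise TimeoutError("OCSP Responder did not answer")


print(should_load_page(lambda: "revoked"))                      # False
print(should_load_page(unreachable_responder))                  # True (soft-fail)
print(should_load_page(unreachable_responder, hard_fail=True))  # False
```

The second call is the uncomfortable one: anyone who can block the traffic to the Responder turns a revoked certificate into an accepted one.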

OCSP Stapling

So the OCSP solution created new problems while addressing a certain problem in the CRL. The next solution to this situation is what’s called OCSP stapling. As the name implies, it’s a variation of the original OCSP discussed above.

The twist in this stapling solution is that it addresses the privacy-related issue mentioned above. The certificate owner (e.g. a web server in the case of internet browsing) attaches the OCSP Responder’s check result to the certificate during the TLS session establishment. This way, you (browser) don’t need to ask the Responder for the revocation status, so you don’t reveal your browsing behavior anymore. The name is an analogy: the certificate owner “staples” the certificate and its revocation check result together and gives them to the certificate user (you / browser). To put it another way, OCSP Stapling turned the tables and made the certificate owner responsible for checking the certificate’s status with the OCSP Responder.

OCSP Must-Staple

To go one step further, the OCSP Must-Staple option requires that certificate holders (web servers) support the OCSP stapling solution discussed above. It is a flag in the certificate itself that tells browsers to expect a stapled OCSP response and to reject the connection if it is missing. It’s essentially mandating that web servers carry the burden of doing the OCSP check with the Responder, rather than having certificate users (you/browser) do that work as in the original OCSP. As you may imagine, it’s more work for web hosts. It’s probably not surprising that this solution hasn’t gained much traction among web servers.
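
Here is a rough sketch of the browser-side decision with stapling and Must-Staple. The `StapledResponse` fields and `accept_connection` are hypothetical names of mine (a real OCSP response carries more fields than this), and for simplicity the sketch just proceeds when nothing is stapled and Must-Staple is not set, whereas a real browser might fall back to its own check.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass
class StapledResponse:
    status: str               # 'good', 'revoked' or 'unknown'
    produced_at: datetime     # when the Responder produced it
    valid_for: timedelta      # how long the browser may rely on it


def accept_connection(stapled: Optional[StapledResponse],
                      must_staple: bool,
                      now: datetime) -> bool:
    """Return True if a toy browser would continue the TLS session."""
    if stapled is None:
        # Nothing stapled: acceptable only if the certificate doesn't demand it.
        return not must_staple
    is_fresh = stapled.produced_at <= now <= stapled.produced_at + stapled.valid_for
    return is_fresh and stapled.status == "good"


now = datetime(2024, 1, 10, 12, 0)
fresh = StapledResponse("good", now - timedelta(days=1), timedelta(days=4))

print(accept_connection(fresh, must_staple=True, now=now))   # True
print(accept_connection(None, must_staple=True, now=now))    # False
print(accept_connection(None, must_staple=False, now=now))   # True
```

Note that the browser never contacts the Responder here, which is exactly the privacy gain of stapling.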

The Fundamental Issue of Revocation

Even after all these historical revocation mechanisms, the fundamental issue remains unaddressed. It’s the whole idea of the revocation check itself, because it is a blacklist-based approach. In revocation checking, you ask a question: “Is this certificate revoked?” The answer can be yes or no. If the answer is yes, then it IS revoked. No ambiguities at all. However, if the answer is no, does that mean that it is still valid? – not necessarily. Asking the question this way introduces ambiguity when you get a negative response. Even if you get a negative response, the source that says so (CRL or OCSP Responder) may not yet know that the certificate has actually been revoked – it’s the timing issue we discussed earlier in the CRL part. So asking a question to somebody without being certain that he/she knows the real (up-to-date) answer only creates doubt. On top of that, how are you expected to react when you get an “I don’t know” answer from the OCSP Responder, or no answer at all (e.g. network problem / server unreachable)? A protocol has to cover those conditions somehow, and leaving them ambiguous doesn’t look like good protocol design practice in any case (IMO).

Shift Toward Short-Term Certificates

After this series of trials and errors, the standards body decided to change direction on this whole revocation issue. Instead of proactively trying to determine the validity status in real-time (or semi-real-time), the new direction is to make the certificate validity period short so that certificates naturally expire soon. In TLS certificates, the dominant validity period is 90 days, and the idea is to bring it down to a few days (~4 days). This 4-day period is equivalent to the validity of an OCSP response. So we can say that the short-term certificate tries to replicate the OCSP behavior without proactive checking.

This idea is called “Short-Term, Automatically Renewed” (STAR). It is based on the “Automatic Certificate Management Environment” (ACME) mechanism, which automates the issuance of certificates to web servers.

So the idea is this: if something bad happens and a certificate needs to be revoked, then just “don’t worry and be happy.” If you sit around for a maximum of ~4 days, that certificate will naturally expire and go away. Problem solved! You don’t need to do anything to proactively revoke those certificates. It’s a very simple idea, and it actually sounds like a good solution. After all, we can move away from the whole revocation business once and for all.

But shortening the validity period means that the CA has to issue many more certificates at a much shorter cadence (every 4 days instead of every 90 days). It’s simply more work for the CA to generate many more certificates and for certificate owners to transfer, store, use, and discard them. Certificate users (browsers) also need to discard the expired ones every time they receive a new one from the certificate owners. If these were real-world physical materials, this would be pretty much analogous to creating much more garbage over time.
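
A quick back-of-the-envelope calculation shows how much the workload grows. It assumes, for simplicity, that each certificate is replaced exactly when it expires (in practice renewal happens earlier, so the real numbers would be somewhat higher); `certificates_per_year` is just an illustrative helper.

```python
def certificates_per_year(validity_days: float) -> float:
    """Certificates needed to cover one year back-to-back, ignoring overlap."""
    return 365 / validity_days


current = certificates_per_year(90)   # today's dominant 90-day validity
short = certificates_per_year(4)      # the ~4-day short-term validity

print(f"{current:.0f} vs {short:.0f} certificates per server per year, "
      f"about {short / current:.1f}x more")
```

So each web server goes from roughly 4 certificates a year to roughly 91, and the CA sees that increase multiplied across every server it serves.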

Because it forces CAs and certificate owners to do more work, it’s probably not surprising that this short-term certificate idea hasn’t gained much traction in reality, as discussed in the paper mentioned earlier.

Certificate Usage and Handling in V-PKI (V2X Comm.) System

The V-PKI system has adopted the short-term certificate concept discussed above. Given the lack of traction in web traffic (TLS) usage, V2X communication may be the first system that actually implements this concept.

In the V-PKI/V2X system, the V-PKI generates and issues multiple short-term certificates that are valid at the same time within a given period (some literature suggests examples such as 20 certificates valid for 7 days). This allows vehicles to periodically switch certificates during the validity period. One possible usage scenario is for a vehicle to use a certificate, switch to another one after a matter of minutes, and repeat that process. Then after a week (7 days), the vehicle obtains a new set of certificates from the V-PKI system, and this cycle continues.
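
The rotation could look something like the sketch below. The 20-certificate pool and 7-day validity come from the example above; the 5-minute switching interval, the simple round-robin order, and the name `pick_certificate` are assumptions of mine (a real system may well switch at randomized times or in randomized order).

```python
from datetime import datetime, timedelta

POOL_VALIDITY = timedelta(days=7)


def pick_certificate(pool, batch_start, now, switch_every=timedelta(minutes=5)):
    """Pick the certificate in use at `now` by cycling through the pool."""
    if not (batch_start <= now < batch_start + POOL_VALIDITY):
        raise ValueError("pool not valid now; a new batch is needed from the V-PKI")
    slot = (now - batch_start) // switch_every   # number of switches so far
    return pool[slot % len(pool)]


pool = [f"cert-{i:02d}" for i in range(20)]
start = datetime(2024, 1, 1)

print(pick_certificate(pool, start, start))                           # cert-00
print(pick_certificate(pool, start, start + timedelta(minutes=5)))    # cert-01
print(pick_certificate(pool, start, start + timedelta(minutes=100)))  # cert-00
```

With these numbers the vehicle cycles through all 20 certificates every 100 minutes and reuses them until the week is over.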

The idea behind this short-period certificate recycling in V2X is to protect users’ (vehicle owners’) privacy by making it difficult to identify and track vehicles over time and space. As each certificate has its own unique ID unrelated to the others, switching between them can prevent the identification and tracking of vehicles (owners and their locations, whereabouts, mobility patterns, etc.). An analogy would be to have multiple face masks and keep changing them. Then you can’t tell that me (wearing one mask) and me (wearing another mask) are the same person. One simple example is that the location where your vehicle stays overnight is likely the place you live, and the location where it stays during the day on weekdays is likely your place of work. If V2X communication makes it easier for someone to find out where you live and work, then it’s certainly a privacy concern.

Regarding certificate revocation in V2X, as mentioned in the previous section, the certificates issued to certificate owners (vehicles) are not revoked – they are simply left alone to expire naturally when the short validity period ends. In the case of events such as positive detection of a vehicle’s misbehavior, the V-PKI system refuses to issue certificates in the next cycle. This way, the vehicle in question is evicted from the V2X communication channel.
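
As a toy illustration of eviction by non-renewal, consider the sketch below. `ToyVPki` and its methods are hypothetical and have nothing to do with the actual ETSI or IEEE signaling; the certificate IDs are random strings so that they carry no link to the vehicle.

```python
import secrets


class ToyVPki:
    """A toy back end that stops issuing to vehicles flagged as misbehaving."""

    def __init__(self, batch_size=20):
        self.batch_size = batch_size
        self.evicted = set()

    def report_misbehavior(self, vehicle_id):
        # Nothing is revoked here; the vehicle is only marked for non-renewal.
        self.evicted.add(vehicle_id)

    def issue_weekly_batch(self, vehicle_id):
        if vehicle_id in self.evicted:
            return []                 # refuse issuance in the next cycle
        return [secrets.token_hex(8) for _ in range(self.batch_size)]


pki = ToyVPki()
print(len(pki.issue_weekly_batch("vehicle-A")))   # 20
pki.report_misbehavior("vehicle-A")
print(len(pki.issue_weekly_batch("vehicle-A")))   # 0
```

Note the catch: the batch issued before the report stays usable until it expires, so the eviction takes effect only at the next cycle (up to 7 days later in the example above).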

These misbehavior detection and privacy protection mechanisms are topics in themselves, which I plan to cover in separate future blogs. Specifically, I’m going to write about my thoughts on misbehavior detection.

Part 3: Security solution – 2 (vehicle-PKI (V-PKI) system)

In this blog, we talk about the V-PKI system, which is the back-end system for issuing digital certificates to vehicles.

In my last blog, we talked about digital certificates and how they are used to guarantee the sender’s authenticity and content integrity, in combination with signature algorithms and public key cryptography. At the end of that discussion, I raised the question of how the receiving end (i.e. Bob) obtains the digital certificate to begin with. In this blog, we talk about the mechanism behind it.

It should be noted that digital certificates are used for many different purposes, including establishing a session key to encrypt web traffic (“https” or “TLS”). In fact, V2X communication simply reuses this framework and defines a system based on it.

Public key infrastructure (PKI)

In short, Public Key Infrastructure (PKI) is the back-end system that manages digital certificates. The word “manage” involves generation, delivery, distribution, use, revocation, and other related functionalities. It’s conceptually similar to CRUD (Create, Read, Update, Delete) in database operations. It’s sort of a lifecycle management system for digital certificates. As the name implies, PKI is an infrastructure (consisting of hardware, software, procedures/protocols, policies/rules, underlying communication, etc.) that facilitates the use of digital certificates.

A PKI system consists of multiple entities called Certificate Authorities (CAs). They are organized in a hierarchical manner, with the highest one called the Root CA (RCA). One level below the RCA are one or more Intermediate CAs (ICAs), and the next level below is called the Issuing CA, which issues certificates to the end entities (certificate holders). The figure below shows this hierarchical concept.

Fig.3-1: CA hierarchy

Certificates issued by an entity at one level guarantee the authenticity of the entity one level below. For example, a certificate the RCA (issuer) issues to the ICA guarantees the authenticity of the ICA (certificate holder). Similarly, this chain of guarantee is propagated to the next level, all the way down to the end entities, which are web servers in the case of websites (TLS/SSL). The trustworthiness of the RCA is known a priori, which is the essential starting point of this PKI concept. This propagation of trust, bound by the certificates, is called the chain of trust. To put it another way, a PKI system is essentially a mechanism that binds the public key in a certificate to its owner.
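
The structure of the chain of trust can be sketched as a walk from the end entity up to a trusted root. Please note this sketch is structural only: it follows issuer names and deliberately omits the signature verification that a real implementation must do at every step. The `Cert` class, `build_chain`, and the example names are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cert:
    subject: str   # who the certificate was issued to
    issuer: str    # who issued (signed) it


def build_chain(leaf, known_certs, trusted_roots, max_depth=10):
    """Follow issuer links from `leaf` up to a trusted root.

    Returns the list of names along the chain, or None if no trusted
    root is reached. Signature checks are omitted in this sketch.
    """
    chain = [leaf.subject]
    current = leaf
    for _ in range(max_depth):
        if current.issuer in trusted_roots:
            chain.append(current.issuer)
            return chain
        parent = known_certs.get(current.issuer)
        if parent is None:
            return None               # issuer unknown: chain is broken
        chain.append(parent.subject)
        current = parent
    return None                       # too deep (or a loop): give up


known_certs = {
    "ICA": Cert("ICA", issuer="RCA"),
    "Issuing CA": Cert("Issuing CA", issuer="ICA"),
}
leaf = Cert("www.example.com", issuer="Issuing CA")

print(build_chain(leaf, known_certs, trusted_roots={"RCA"}))
# ['www.example.com', 'Issuing CA', 'ICA', 'RCA']
```

The `trusted_roots` set plays the role of the a priori trust in the RCA: everything else is trusted only because a path leads back to it.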

V-PKI systems

For vehicular communication (V2X), this generic PKI framework is tailored to the purpose of issuing certificates to vehicles. In this case, the intermediate CA is not explicitly defined. There are two entities: the Registration Authority (RA; the ETSI specifications call this the Enrolment Authority, EA) and the Authorization Authority (AA). As V-PKI system entities, both the RA and the AA are issued their own certificates by the RCA to guarantee their authenticity.

Of the RA and the AA, the latter is the issuing CA, as it is the one that issues certificates to vehicles. The RA is responsible for registering end entities (vehicles, in this case) with the V-PKI system, through the registration procedure defined by the signaling protocol in the ETSI ITS standard. The AA issues certificates to a vehicle only after its successful registration with the RA. The following figure is an ETSI-defined V-PKI architecture, which illustrates this relationship.

Fig. 3-2 V-PKI Architecture (ETSI ITS)

The equivalent V-PKI architecture is defined by IEEE 1609.2. This serves as the base for the ETSI C-ITS architecture. As shown in the figure below, the IEEE 1609.2 architecture defines more system entities with more complex interactions among them. The fundamental idea behind this more complex architecture is to ensure the privacy and anonymity of vehicles in such a way that no single system entity can identify vehicles (and their owners) by itself without colluding with another entity. This is achieved by splitting up functionalities and limiting information visibility/accessibility.

Fig. 3-3 V-PKI Architecture (IEEE 1609.2)

Misbehavior detection and certificate revocation

Aside from issuing certificates to end entities – as a part of managing certificates – another important function of the PKI system is the revocation of certificates. Certificates are revoked for multiple reasons. In the case of a web host, its certificate can be revoked when the web server is compromised, the website itself is shut down, the company behind the website goes out of business or ceases to exist due to an acquisition, etc.

In the case of the V-PKI system, certificates issued to vehicles need to be revoked when a vehicle is compromised by an adversary. A compromised vehicle can endanger road safety, for example by transmitting wrong information to mislead other vehicles in the vicinity. In such cases, the affected vehicle needs to be evicted from the communication. This is achieved by revoking the certificates issued to that vehicle.

The receiving entities (vehicles) that are aware of the revocation status of a given vehicle can correctly detect and ignore all messages transmitted by the vehicle that owns the revoked certificates. This way, surrounding vehicles can protect themselves against potentially misleading or wrong information transmitted by this vehicle.

At least this is the theory. However, in reality, it is not so simple. In our next blog, we go a bit deeper into this difficulty.

Part 2: Security solution of V2X (1) – digital certificates

In this blog, we’re going to talk about the security solution of V2X communication.

In my last post, we talked about the background of the V2X communication and the reason why basic services are broadcast in clear (i.e., sent to any and all recipients within the communication range without encryption). This doesn’t sound like much of a protection — in fact, there’s no protection at all. After all, any device (legitimate or not) capable of receiving messages in that communication channel can receive, store, and analyze them, either in real time or later on. Furthermore, any device that can transmit in that channel can do so, irrespective of whether those entities and messages are legitimate or not.

In other words, basic messages are wide open to eavesdroppers and adversaries who send bogus messages.

This sounds bad from a security point of view. Thus, it requires some sort of assurance, at minimum, that messages (1) are transmitted by genuine vehicles, and (2) have genuine contents. With this, receiving entities can at least verify the authenticity of the sender and the integrity of those messages.

The solution to achieve that is to use digital signatures, backed by digital certificates issued by a public key infrastructure (PKI). To put it another way, verifying these 2 points is the only security solution in the basic services of V2X communication.

Of course, there are services in V2X above and beyond the basic services, such as remote driving, platooning, etc. For these services, either unicast or multicast can be used. In fact, remote driving is probably one of the few use cases in V2X where unicast is used. In those cases, encryption can be applied to guarantee the confidentiality of the communication. Even in those cases, message integrity through the use of digital signatures is an important element of the security of V2X communication.

Digital signature

A digital signature ensures that the message is unaltered (i.e. it has not been tampered with). It uses public-key cryptography (PKC), for example the elliptic curve digital signature algorithm (ECDSA).

PKC involves the use of two keys, a public key and a private key. They are like two sides of a mirror. When used in encryption, the keys used for encryption and decryption are different. The same principle applies to signatures: the key used to generate a signature and the one used to verify it are different. The following figure illustrates this operation.

Fig.2.1 Signature Generation and Verification

In this figure, Alice sends a message to Bob. After generating the message to send, Alice generates a signature using the message and the private key (often denoted as sk, for “secret key”) as the inputs to a signature generation algorithm, such as ECDSA. Alice appends the resulting signature to the message and sends them to Bob.

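Upon receiving this message (incl. the signature), Bob extracts the message part and inputs it, along with the public key, to the verification algorithm. The resulting value is denoted as signature’. Finally, Bob compares the received signature (attached to the received message) against the calculated signature’. If they match, Bob can be assured that the received message has not been tampered with. (This is a conceptual simplification: with ECDSA, the verification algorithm takes the message, the received signature, and the public key together and simply outputs valid or invalid.)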
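
To see the "sign with one key, verify with the other" principle in action, here is a small runnable sketch. To keep it self-contained I swapped in a toy textbook RSA signature instead of ECDSA, with a key pair built from two well-known Mersenne primes. The key size, the absence of padding, and the function names are all illustrative assumptions; please don’t use anything like this for real security.

```python
import hashlib

# Toy RSA key pair (illustration only; far too weak for real use).
P = 2**61 - 1                          # a known Mersenne prime
Q = 2**89 - 1                          # another known Mersenne prime
N = P * Q                              # modulus, part of both keys
E = 65537                              # public exponent  (Bob has this)
D = pow(E, -1, (P - 1) * (Q - 1))      # private exponent (only Alice has this)


def digest(message: bytes) -> int:
    """Hash the message and reduce it to a number below the modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % N


def sign(message: bytes) -> int:
    """Alice: generate a signature from the message and the private key."""
    return pow(digest(message), D, N)


def verify(message: bytes, signature: int) -> bool:
    """Bob: check the signature using the message and the public key."""
    return pow(signature, E, N) == digest(message)


message = b"speed=50km/h; position=(35.0, 139.0)"
signature = sign(message)

print(verify(message, signature))                     # True
print(verify(b"speed=150km/h; ...", signature))       # False (content altered)
```

Bob never needs the private key: he only needs the message, the signature, and Alice’s public values (`N` and `E` here). How he gets hold of those in a trustworthy way is the open question.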

So far so good. But now the question becomes how Bob obtained Alice’s public key to begin with. That’s where PKI comes in, which probably takes a blog by itself. We’ll discuss it in the next blog.