Part 4: On difficulties of certificate revocation

In my last blog, we talked about the generic PKI system framework and how it is adapted for vehicular communication. We then discussed the need to revoke certificates under certain circumstances. In this blog, we look at the general revocation issue.

This discussion is about revocation in general and is not unique to the V-PKI system for vehicular communication (V2X). But the fundamental aspects are common, so I consider it important to be aware of them in the context of V2X as well.

As mentioned in the last blog, there are situations where certificates need to be revoked. When a certificate is revoked, this fact needs to be propagated to everyone who may use it, so that they can correctly recognize that the certificate is no longer valid and ignore all messages that contain it.

How are certificates revoked?

At this point, the question becomes: how do “others” (certificate users) find out that a certain certificate has been revoked? There have been multiple solutions to this. I’d say it’s a series of unsuccessful attempts to address the problem. A follow-up solution was created to address problems in the original solution. It sort of fixed them, but inadvertently created a new problem. Then another follow-up solution was created to address that new problem, but it created yet another one, and on and on… The end result as of today is that the standardized revocation mechanism is incomplete (IMO). Then the whole approach to revocation changed direction (which is also incomplete), as we discuss later in this blog. It’s one of those cases where a solution works in theory but not in practice.

For much of this discussion, I use internet browsers and TLS session setup with web servers as a use case. It’s rather unrelated to V2X communication, but the fundamentals of the problem and solutions also apply to V2X.

Revocation checking in internet browsers (Chrome, Edge, Firefox, etc.) ended up quite convoluted as browser vendors started to implement their own proprietary revocation solutions. Their objective is the same (to make revocation effective in practice), so they do similar things. However, each does them differently, and the results are incompatible. Today, browser vendors mostly ignore the standardized solution and have the upper hand in how revocation is actually done, and it looks like we’re stuck with this situation (their solutions basically embed revocation information in the browser and push it out as a software update).

The historical evolution of certificate revocation is shown in the following figure. Detailed discussion of all these can be found here. In this blog, I stay at a high level without going too much into details.

Fig. 4.1 Historical Evolution of Certificate Revocation Mechanisms

Certificate Revocation List (CRL)

The original revocation mechanism is called Certificate Revocation List (CRL). In this solution, CAs regularly generate a list of revoked certificates and make it available to certificate users that may use those revoked certificates.
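To make the CRL mechanism concrete, here is a minimal Python sketch. The `Crl` structure and `check_with_crl` function are hypothetical simplifications. A real X.509 CRL (RFC 5280) is a CA-signed ASN.1 structure, and signature checking is omitted here. The `this_update`/`next_update` fields mirror the CRL's thisUpdate/nextUpdate fields. The dates are illustrative only. The example also shows why "not listed" is a weaker answer than "valid":

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Crl:
    """Hypothetical, simplified CRL: a snapshot of revoked serial numbers."""
    this_update: datetime       # when the CA generated this list
    next_update: datetime       # when the CA says the next list will be published
    revoked_serials: frozenset  # serial numbers revoked as of this_update


def check_with_crl(serial: int, crl: Crl, now: datetime) -> str:
    """Look up one certificate serial number in a downloaded CRL."""
    if serial in crl.revoked_serials:
        return "revoked"
    if now > crl.next_update:
        return "stale-crl"  # the list is past its nextUpdate; fetch a new one
    # Not listed only means "not revoked as of this_update", not "valid now".
    return "not-listed"


# Illustrative dates: the CRL was generated 3 days before "now".
now = datetime(2024, 5, 10, 12, 0)
crl = Crl(
    this_update=now - timedelta(days=3),
    next_update=now + timedelta(days=4),
    revoked_serials=frozenset({1001}),
)
print(check_with_crl(1001, crl, now))  # revoked
print(check_with_crl(1002, crl, now))  # not-listed, even if 1002 was revoked yesterday
print(check_with_crl(1002, crl, now + timedelta(days=5)))  # stale-crl
```

Note that every lookup happens against the whole downloaded list, whether or not the user ever visits the sites those certificates belong to.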

The problems with CRLs include: (1) scalability, (2) availability, (3) timeliness, and (4) dependence on the certificate user to retrieve them:

  • Scalability – CRLs can become big. Downloading multiple CRLs over time consumes users’ storage and requires upkeep; it’s simply not scalable. And even if you get a list of revoked certificates, if you never access any of the corresponding websites, handling the CRL ends up being wasted effort.
  • Availability – when you try to contact the CRL server and it’s not reachable or available, then you’re out of luck (“please try again later…” 😮 ).
  • Timeliness – if the CRL content is not up to date, its content becomes questionable. If the user (browser) believes it is up to date, it may end up making a wrong judgement about a certificate’s validity. For example, if you download the latest CRL, which happens to have been generated 3 days ago, it doesn’t reflect anything that happened in those 3 days. If a certain certificate was revoked yesterday, it’s not in the CRL. You’re looking at stale (and inaccurate) information.
  • User-dependency – even if the latest CRL is available, if you don’t download it, you don’t have the latest one. This also causes a time lag similar to the one just mentioned above.

Online Certificate Status Protocol (OCSP)

OCSP was introduced to address the problems of CRLs, though it addresses only the scalability aspect. What doesn’t change is that certificate users still need to initiate the check themselves.

In the case of browsers, when you try to access a website, it triggers a TLS handshake (HTTPS). During the handshake, the web server gives you its server certificate. Your browser then contacts the OCSP Responder (a fancy name for a server that keeps track of the revocation status of certificates) and asks whether the server certificate you just received from the web server is still valid. The response is “yes”/“no”/“I don’t know” (“good”/“revoked”/“unknown” according to the OCSP spec).

The obvious pro of the OCSP approach is that it solves the scalability issue mentioned above. You’re now querying the status of a single specific certificate rather than blindly downloading a CRL and checking it. But you (the browser) are still responsible for initiating the revocation status check for that certificate. On top of that, OCSP introduces a new privacy problem. Since you’re checking the revocation status of a specific website’s certificate with the OCSP Responder, you’re effectively revealing your browsing behavior. For example, if you’re accessing an adult content website, you’re essentially saying: “Hey Mr. OCSP Responder, I’m trying to access this xxx.com site. Is this certificate still valid?” The Responder then knows that you’re visiting that website. It may not know your name, but it knows your IP address. From that, it can infer privacy-related information such as your approximate location, in addition to your browsing behavior [Note: there are technologies such as VPN or Tor to mask your location, but that’s another story…].
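The query-and-answer exchange described above, including what the Responder gets to see, can be sketched as follows. This is a hypothetical in-memory model, not a real OCSP client. Real OCSP (RFC 6960) identifies a certificate by a CertID made of hashes of the issuer's name and public key plus the serial number, and the Responder signs its answer. Both details are abstracted here into an `(issuer, serial)` pair. The IP addresses are from documentation ranges.

```python
from enum import Enum


class CertStatus(Enum):
    GOOD = "good"        # "yes": not revoked, as far as the responder knows
    REVOKED = "revoked"  # "no"
    UNKNOWN = "unknown"  # "I don't know", e.g. a certificate it has no record of


class ToyOcspResponder:
    """Hypothetical in-memory responder keyed by (issuer, serial number)."""

    def __init__(self, known, revoked):
        self.known = set(known)
        self.revoked = set(revoked)
        self.query_log = []  # everything the responder can learn about its clients

    def query(self, client_ip: str, issuer: str, serial: int) -> CertStatus:
        # Each request ties a client address to the certificate of the site being visited.
        self.query_log.append((client_ip, issuer, serial))
        key = (issuer, serial)
        if key not in self.known:
            return CertStatus.UNKNOWN
        return CertStatus.REVOKED if key in self.revoked else CertStatus.GOOD


responder = ToyOcspResponder(known={("CA-1", 7), ("CA-1", 8)}, revoked={("CA-1", 8)})
for serial in (7, 8, 9):
    print(serial, responder.query("203.0.113.5", "CA-1", serial).value)
# 7 good
# 8 revoked
# 9 unknown
print(responder.query_log[0])  # ('203.0.113.5', 'CA-1', 7)
```

The `query_log` is the privacy problem in a nutshell: answering the question requires the Responder to see who is asking about which site.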

Another problem OCSP introduces is that the OCSP interaction between the certificate user and the Responder is a blocking event. In other words, if the Responder doesn’t respond to your request, your browser has to wait, and you may end up staring at a blank screen instead of the website content. That’s not a good user experience, to say the least. I’d imagine browser vendors care about this kind of negative experience, since it can hurt user perception and, eventually, market share. To address this, browser vendors introduced logic that sets a timer and ignores the “no response” condition, which is called “soft failure.” In soft failure, the browser times out waiting for the Responder’s response. It then assumes everything is fine and proceeds to render the website on the screen. In effect, it ignores the “no response” condition. IMO, it’s a “duh…” moment. :-0 If you’re going to ignore the negative result, why bother checking to begin with? Unfortunately, browser vendors seem to care more about user experience than about security mechanisms (or so I think).
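The soft-failure logic can be sketched in a few lines. `browser_proceeds` and the two responder functions are hypothetical. A timeout is modeled as the responder callable raising Python's built-in `TimeoutError`. Mapping any answer other than "good" (including "unknown") to rejection is a simplifying assumption, not how every browser behaves.

```python
from typing import Callable


def browser_proceeds(serial: int, ask_responder: Callable[[int], str], soft_fail: bool) -> bool:
    """Return True if the browser goes on to render the site.

    ask_responder is a hypothetical callable returning "good", "revoked" or
    "unknown", or raising TimeoutError when the browser's timer expires.
    """
    try:
        status = ask_responder(serial)
    except TimeoutError:
        # "No response": soft-fail treats it as if everything were fine.
        return soft_fail
    return status == "good"


def honest_responder(serial: int) -> str:
    return "revoked" if serial == 666 else "good"


def unreachable_responder(serial: int) -> str:
    raise TimeoutError("responder did not answer in time")


print(browser_proceeds(666, honest_responder, soft_fail=True))        # False
print(browser_proceeds(666, unreachable_responder, soft_fail=True))   # True
print(browser_proceeds(666, unreachable_responder, soft_fail=False))  # False
```

The second call is the "duh" case: a revoked certificate is accepted simply because no answer arrived. It also shows why anyone able to block traffic to the Responder can defeat soft-fail checking.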

OCSP Stapling

As such, the OCSP solution created a new problem while addressing a certain problem with CRLs. The next solution to this situation is what’s called OCSP stapling. As the name implies, it’s a variation of the original OCSP discussed above.

The twist in the stapling solution is that it addresses the privacy issue mentioned above. The certificate owner (e.g., a web server in the case of internet browsing) voluntarily attaches the OCSP Responder’s check result to the certificate during TLS session establishment. This way, you (the browser) don’t need to ask the Responder for the revocation status, so you no longer reveal your browsing behavior to it. The name is an analogy: the certificate owner “staples” the certificate and its revocation check result together and hands them to the certificate user (you / the browser). To put it another way, OCSP Stapling turned the tables and made the certificate owner responsible for checking the certificate’s status with the OCSP Responder.
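On the browser side, stapling turns the revocation check into a purely local evaluation. The following is a minimal sketch of that step. `ServerCredentials` is a hypothetical stand-in for whatever the handshake delivers. The responder-signature check that a real client must perform is omitted. The freshness window follows the thisUpdate/nextUpdate fields of an OCSP response.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional


@dataclass(frozen=True)
class OcspResponse:
    serial: int
    status: str             # "good", "revoked" or "unknown"
    this_update: datetime
    next_update: datetime   # the response should not be trusted after this time
    # A real response carries the responder's signature; verifying it is omitted here.


@dataclass(frozen=True)
class ServerCredentials:
    """Hypothetical bundle the server sends during the handshake."""
    cert_serial: int
    stapled: Optional[OcspResponse]  # None if the server did not staple


def check_stapled(creds: ServerCredentials, now: datetime) -> str:
    """Evaluate a stapled response locally, without contacting the Responder."""
    resp = creds.stapled
    if resp is None:
        return "no-staple"  # the browser would fall back to its own OCSP query
    if resp.serial != creds.cert_serial:
        return "mismatch"   # the staple belongs to a different certificate
    if not (resp.this_update <= now <= resp.next_update):
        return "outdated-staple"
    return resp.status


now = datetime(2024, 5, 10, 12, 0)
fresh = OcspResponse(42, "good", now - timedelta(hours=6), now + timedelta(days=3))
print(check_stapled(ServerCredentials(42, fresh), now))  # good
print(check_stapled(ServerCredentials(42, None), now))   # no-staple
```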

OCSP Must-Staple

To go one step further, the OCSP Must-Staple option (a certificate extension defined in RFC 7633) requires that certificate holders (web servers) support the OCSP stapling solution discussed above. A browser that honors it should reject the connection if no stapled response is provided. It essentially mandates that web servers carry the burden of OCSP checking with the Responder, rather than having certificate users (you/the browser) do that work as in the original OCSP. As you may imagine, it’s more work for web hosts, so it’s probably not surprising that adoption of this solution on web servers has been limited.
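What Must-Staple changes on the browser side is how a missing staple is handled. The `Certificate` class and `accept_connection` function are hypothetical. The `must_staple` flag stands in for the RFC 7633 TLS Feature extension carrying `status_request`. Treating "no staple, no Must-Staple" as acceptance models soft-fail behavior and is an assumption for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Certificate:
    serial: int
    must_staple: bool  # stands in for the RFC 7633 TLS Feature extension (status_request)


def accept_connection(cert: Certificate, stapled_status: Optional[str]) -> bool:
    """Browser-side decision; stapled_status is None when the server sent no staple."""
    if stapled_status is None:
        # Without Must-Staple, this sketch models soft-fail behavior (accept).
        # With Must-Staple, a missing staple is treated as a hard failure.
        return not cert.must_staple
    return stapled_status == "good"


print(accept_connection(Certificate(1, must_staple=False), None))   # True
print(accept_connection(Certificate(2, must_staple=True), None))    # False
print(accept_connection(Certificate(2, must_staple=True), "good"))  # True
```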

The Fundamental Issue of Revocation

Even after all these historical revocation mechanisms, the fundamental issue remains unaddressed. It lies in the idea of revocation checking itself, because it is a blacklist-based approach. In revocation checking, you ask the question: “Is this certificate revoked?” If the answer is yes, then it IS revoked. No ambiguity at all. However, if the answer is no, does that mean the certificate is still valid? Not necessarily. This way of asking introduces ambiguity whenever you get a negative response. The source that says so (CRL or OCSP Responder) may simply not know yet that the certificate has been revoked. This is the timing issue we discussed in the CRL part. Asking somebody a question without being certain that he/she knows the real (up-to-date) answer only creates doubt. On top of that, how are you expected to react when you get an “I don’t know” answer from the OCSP Responder? The conditions that lead to that situation (e.g., network problems or an unreachable server) also need to be covered. It doesn’t look like good protocol design practice in any case (IMO).
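The asymmetry of a blacklist answer can be made explicit in code. This is a hypothetical sketch, not part of any standard. `interpret` converts an answer into what can actually be concluded. For a "not revoked" answer, it also returns the blind spot: the time since the source last updated, during which revocations are invisible to it.

```python
from datetime import datetime, timedelta
from enum import Enum
from typing import Optional, Tuple


class Conclusion(Enum):
    REVOKED = "revoked"                                    # conclusive
    NOT_REVOKED_AS_OF = "not revoked as of source update"  # only as fresh as the source
    UNDETERMINED = "undetermined"                          # "I don't know": needs a policy


def interpret(answer: str, source_updated_at: datetime,
              now: datetime) -> Tuple[Conclusion, Optional[timedelta]]:
    """Turn a blacklist answer into what can actually be concluded.

    The second element is the blind spot: revocations made during this
    window are invisible to the source that gave the answer.
    """
    if answer == "revoked":
        return Conclusion.REVOKED, None
    if answer == "not revoked":
        return Conclusion.NOT_REVOKED_AS_OF, now - source_updated_at
    return Conclusion.UNDETERMINED, None


now = datetime(2024, 5, 10, 12, 0)
conclusion, blind_spot = interpret("not revoked", now - timedelta(days=3), now)
print(conclusion.name, blind_spot.days)  # NOT_REVOKED_AS_OF 3
```

Only the "revoked" branch yields a definite answer. Every other branch hands the caller either a time window of uncertainty or an outright "undetermined".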

Shift Toward Short-Term Certificates

After this series of trials and errors, standards bodies decided to change direction on the whole revocation issue. Instead of actively trying to determine the validity status in real time (or semi-real time), the new direction is to make certificate validity periods short so that certificates naturally expire soon. For TLS certificates, the dominant validity period is 90 days, and the idea is to bring it down to a few days (~4 days). This ~4-day period is comparable to the validity period of an OCSP response. So we can say that a short-term certificate tries to replicate OCSP behavior without any active checking.

This idea is called “Short-Term, Automatically Renewed” (STAR) certificates. It builds on the “Automatic Certificate Management Environment” (ACME) protocol, which automates the issuance of certificates to web servers.

So the idea is this: if something bad happens and a certificate needs to be revoked, just “don’t worry and be happy.” If you sit around for a maximum of ~4 days, that certificate will naturally expire and go away. Problem solved! You don’t need to do anything to actively revoke those certificates. It’s a very simple idea, and it actually sounds like a good solution. After all, we could move away from the whole revocation business once and for all.
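With short-term certificates, the validity check collapses into a clock comparison, and the damage window after a compromise is bounded by the remaining lifetime. A minimal sketch follows. The `Cert` class and helper names are hypothetical, and the dates are illustrative.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Cert:
    not_before: datetime
    not_after: datetime


def is_valid(cert: Cert, now: datetime) -> bool:
    # No revocation lookup at all: validity is a plain clock comparison.
    return cert.not_before <= now <= cert.not_after


def exposure_after_compromise(cert: Cert, compromised_at: datetime) -> timedelta:
    # Without revocation, a compromised certificate stays usable until it expires.
    return max(cert.not_after - compromised_at, timedelta(0))


issued = datetime(2024, 1, 1)
compromised = issued + timedelta(days=1)
for lifetime_days in (90, 4):
    cert = Cert(issued, issued + timedelta(days=lifetime_days))
    print(lifetime_days, exposure_after_compromise(cert, compromised).days)
# 90 89
# 4 3
```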

Shortening the validity period, however, means that CAs have to issue many more certificates at a much shorter cadence (every ~4 days instead of every 90 days). It’s simply more work for CAs to generate many more certificates, and for certificate owners to transfer, store, use, and discard them. Certificate users (browsers) also need to discard expired certificates every time they receive new ones from the certificate owners. If these were physical materials, this would be pretty much analogous to creating a lot more garbage over time.
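As a rough back-of-the-envelope count, assume each certificate is replaced exactly when it expires (no overlapping renewals). Under that assumption, the number of certificates a single certificate owner (e.g., one web server) needs per year grows as follows:

```latex
\frac{365\ \text{days}}{90\ \text{days}} \approx 4.1\ \text{certificates/year}
\qquad\longrightarrow\qquad
\frac{365\ \text{days}}{4\ \text{days}} = 91.25\ \text{certificates/year},
\qquad
\text{a factor of } \frac{90}{4} = 22.5.
```

In practice, certificates are usually renewed somewhat before they expire, so the real count would be at least this high.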

Because it forces CAs and certificate owners to do more work, it’s probably not surprising that this short-term certificate idea hasn’t gained much traction in practice, as discussed in the paper mentioned earlier.

Certificate Usage and Handling in V-PKI (V2X Comm.) System

The V-PKI system has adopted the short-term certificate concept discussed above. Given the concept’s lack of traction in web (TLS) traffic, V2X communication may be the first system that actually implements it.

In the V-PKI/V2X system, the V-PKI generates and issues multiple short-term certificates that are valid at the same time within a given period (some literature suggests examples such as 20 certificates valid for 7 days). This allows vehicles to switch certificates periodically during their validity period. One possible usage scenario is for a vehicle to use one certificate, switch to another on the order of minutes, and repeat that process. Then, after a week (7 days), the vehicle obtains a new set of certificates from the V-PKI system, and the cycle continues.
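The usage scenario above can be sketched as a pool of simultaneously valid certificates that the vehicle cycles through. All names are hypothetical. The batch of 20 certificates valid for 7 days follows the example in the text. The 5-minute switching interval and the simple round-robin order are assumptions chosen for illustration.

```python
import secrets
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class PseudonymCert:
    cert_id: str          # random identifier, unrelated to the other certificates
    not_before: datetime
    not_after: datetime


def issue_batch(start: datetime, count: int = 20, validity: timedelta = timedelta(days=7)):
    """Hypothetical V-PKI issuance: `count` certificates valid over the same window."""
    return [PseudonymCert(secrets.token_hex(8), start, start + validity) for _ in range(count)]


def cert_in_use(batch, batch_start: datetime, now: datetime,
                switch_every: timedelta = timedelta(minutes=5)) -> PseudonymCert:
    """Pick the certificate to sign with at `now`, switching every `switch_every`."""
    slot = (now - batch_start) // switch_every  # timedelta // timedelta -> int
    cert = batch[slot % len(batch)]
    if not (cert.not_before <= now <= cert.not_after):
        raise LookupError("batch expired: obtain a new set of certificates from the V-PKI")
    return cert


start = datetime(2024, 5, 6, 8, 0)
batch = issue_batch(start)
first = cert_in_use(batch, start, start + timedelta(minutes=1))
later = cert_in_use(batch, start, start + timedelta(minutes=7))
print(first.cert_id != later.cert_id)  # True: a different certificate after the switch
```

Cycling back to the first certificate after 20 slots (100 minutes here) is purely a simplification. How and when to switch is a design question in its own right.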

The idea behind this short-period certificate recycling in V2X is to protect users’ (vehicle owners’) privacy by making it difficult to identify and track vehicles over time and space. Since each certificate has its own unique ID unrelated to the others, switching among them helps prevent the identification and tracking of vehicles (their owners, locations, whereabouts, mobility patterns, etc.). An analogy would be to have multiple face masks and keep changing them. An observer then cannot tell that the person wearing one mask and the person wearing another are the same person. One simple example: the location where your vehicle stays overnight is likely where you live, and where it stays during the day on weekdays is likely where you work. If V2X communication makes it easier for someone to learn your home and work locations, it’s certainly a privacy concern.

Regarding certificate revocation in V2X, as mentioned in the previous section, certificates issued to certificate owners (vehicles) are not revoked. They are simply left to expire naturally after their short validity period ends. When an event such as the positive detection of a vehicle’s misbehavior occurs, the V-PKI system refuses to issue certificates to that vehicle in the next cycle. This way, the vehicle in question is evicted from the V2X communication channel.
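Eviction by non-renewal can be sketched as a V-PKI that never revokes anything and simply declines the next batch. `ToyVpki` and its methods are hypothetical. The certificate IDs are random placeholders, and the batch size of 20 follows the example given earlier.

```python
import secrets
from dataclasses import dataclass, field


@dataclass
class ToyVpki:
    """Hypothetical V-PKI that never revokes; it simply stops re-issuing."""
    flagged: set = field(default_factory=set)  # vehicles with confirmed misbehavior

    def report_misbehavior(self, vehicle_id: str) -> None:
        self.flagged.add(vehicle_id)

    def issue_next_batch(self, vehicle_id: str, count: int = 20) -> list:
        if vehicle_id in self.flagged:
            # No new certificates; the vehicle drops out once its current batch expires.
            return []
        # Random IDs, so the certificate IDs themselves reveal nothing about the vehicle.
        return [secrets.token_hex(8) for _ in range(count)]


vpki = ToyVpki()
print(len(vpki.issue_next_batch("vehicle-A")))  # 20
vpki.report_misbehavior("vehicle-A")
print(len(vpki.issue_next_batch("vehicle-A")))  # 0
```

The vehicle remains able to use its current batch until that batch expires. This is the same bounded exposure window as with short-term TLS certificates.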

Misbehavior detection and privacy protection mechanisms are topics in themselves, which I plan to cover in separate future blogs. Specifically, I’m going to write about my thoughts on misbehavior detection.

Part 2: Security solution of V2X (1) – digital certificates

In this blog, we’re going to talk about the security solution of V2X communication.

In my last post, we talked about the background of V2X communication and why basic services are broadcast in the clear (i.e., sent to any and all recipients within communication range without encryption). This doesn’t sound like much protection; in fact, there’s no protection at all. After all, any device (legitimate or not) capable of receiving messages on that communication channel can receive, store, and analyze them, either in real time or later on. Furthermore, any device that can transmit on that channel can do so, irrespective of whether the device and its messages are legitimate.

In other words, basic messages are wide open to eavesdroppers and adversaries who send bogus messages.

This sounds bad from a security point of view. At minimum, we need some assurance that (1) messages are transmitted by genuine vehicles, and (2) their contents are genuine. With this, receiving entities can at least verify the authenticity of the sender and the integrity of those messages.

The solution to achieve that is to use digital signatures, backed by certificates issued by a public key infrastructure (PKI). In other words, the only security provided for the basic services in V2X communication is the verification of these two points.

Of course, there are V2X services above and beyond the basic services, such as remote driving and platooning. For these services, either unicast or multicast can be used. In fact, remote driving is probably one of the few V2X use cases where unicast is used. In those cases, encryption can be applied to guarantee the confidentiality of the communication. Even then, message integrity through the use of digital signatures remains an important element of V2X communication security.

Digital signature

A digital signature ensures that a message has not been altered (i.e., has not been tampered with). It uses public-key cryptography (PKC), for example the Elliptic Curve Digital Signature Algorithm (ECDSA).

PKC involves the use of two keys, a public key and a private key, which form a mathematically linked pair. When PKC is used for encryption, the key used to encrypt and the key used to decrypt are different. The same principle applies to signatures: the key used to generate a signature and the key used to verify it are different. The following figure illustrates this operation.

Fig.2.1 Signature Generation and Verification

In this figure, Alice sends a message to Bob. After composing the message, Alice generates a signature by feeding the message and her private key (often denoted sk, for “secret key”) into a signature generation algorithm such as ECDSA. Alice appends the resulting signature to the message and sends them to Bob.

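Upon receiving the message (incl. the signature), Bob separates the message part from the signature. He then inputs the message, the received signature, and Alice’s public key into the verification algorithm. The algorithm computes a check value from these inputs, denoted as signature’ in the figure. Note that Bob cannot regenerate Alice’s signature itself, since that would require her private key. In ECDSA, the check value is the x-coordinate of a curve point derived from the message hash, the signature, and the public key. Finally, Bob compares this value against the received signature (in ECDSA, against its r component). If they match, Bob can be assured that the received message has not been tampered with and that it was signed by the holder of Alice’s private key.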
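To make Fig. 2.1 concrete, here is an educational pure-Python sketch of ECDSA over the NIST P-256 curve, using only the standard library. The function names (`keygen`, `sign`, `verify`, etc.) are my own, not a library API. The code is not constant-time and must not be used for real security; a vetted cryptographic library is the right tool for that. It shows that `verify` does not regenerate Alice's signature. Instead, it recomputes a curve point from the message hash, the signature, and the public key, and checks that point's x-coordinate against `r`.

```python
import hashlib
import secrets

# NIST P-256 (secp256r1) domain parameters.
P = 0xffffffff00000001000000000000000000000000ffffffffffffffffffffffff
A = P - 3
N = 0xffffffff00000000ffffffffffffffffbce6faada7179e84f3b9cac2fc632551
G = (0x6b17d1f2e12c4247f8bce6e563a440f277037d812deb33a0f4a13945d898c296,
     0x4fe342e2fe1a7f9b8ee7eb4a7c0f9e162bce33576b315ececbb6406837bf51f5)


def point_add(p1, p2):
    """Affine point addition; None represents the point at infinity."""
    if p1 is None:
        return p2
    if p2 is None:
        return p1
    (x1, y1), (x2, y2) = p1, p2
    if x1 == x2 and (y1 + y2) % P == 0:
        return None
    if p1 == p2:
        lam = (3 * x1 * x1 + A) * pow(2 * y1, -1, P) % P
    else:
        lam = (y2 - y1) * pow(x2 - x1, -1, P) % P
    x3 = (lam * lam - x1 - x2) % P
    return (x3, (lam * (x1 - x3) - y1) % P)


def scalar_mult(k, point):
    """Double-and-add: computes k * point."""
    result = None
    while k:
        if k & 1:
            result = point_add(result, point)
        point = point_add(point, point)
        k >>= 1
    return result


def hash_to_int(message: bytes) -> int:
    return int.from_bytes(hashlib.sha256(message).digest(), "big")


def keygen():
    sk = secrets.randbelow(N - 1) + 1  # private (secret) key
    return sk, scalar_mult(sk, G)      # public key pk = sk * G


def sign(sk: int, message: bytes):
    z = hash_to_int(message)
    while True:
        k = secrets.randbelow(N - 1) + 1  # fresh per-signature nonce; reusing it leaks sk
        r = scalar_mult(k, G)[0] % N
        s = pow(k, -1, N) * (z + r * sk) % N
        if r and s:
            return r, s


def verify(pk, message: bytes, signature) -> bool:
    r, s = signature
    if not (1 <= r < N and 1 <= s < N):
        return False
    z = hash_to_int(message)
    w = pow(s, -1, N)
    point = point_add(scalar_mult(z * w % N, G), scalar_mult(r * w % N, pk))
    return point is not None and point[0] % N == r


sk, pk = keygen()                 # Alice's key pair
msg = b"speed=50km/h"
sig = sign(sk, msg)               # Alice signs with sk
print(verify(pk, msg, sig))                # True: Bob verifies with pk
print(verify(pk, b"speed=120km/h", sig))   # False: tampered message
```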

So far so good. But now the question becomes: how did Bob obtain Alice’s public key to begin with? That’s where PKI comes in, which probably deserves a blog of its own. We’ll discuss it in the next blog.