Understanding Bleichenbacher's CRYPTO'98 Attack on RSA Encryption
In 1998, Daniel Bleichenbacher published a method for cracking RSA encryption with PKCS#1v15 padding, given access to a “padding oracle”. I learned of this attack via the cryptopals challenges, where it is featured in challenge #47 and #48.
The instructions for the challenge state: “We recommend you just use the raw math from the paper and not spend too much time trying to grok how the math works.” However, I was not able to solve the challenge without understanding the math involved. For other software practitioners who want to implement, or at least understand, Bleichenbacher’s chosen-ciphertext RSA attack, this article may serve as a more gentle introduction than jumping directly into the research paper.
Prerequisites first, then the details of the attack will follow:
What is PKCS#1v15 padding?
PKCS #1 is a standard for how to use the RSA algorithm to encrypt data. Version 1.5 was published in 1998.
It specifies the following format for a plaintext data block:[1]
Note that the padding bytes are at the beginning of the block; this will be important later on.
What is a “padding oracle”?
A padding oracle is anything which lets an attacker know whether an arbitrary ciphertext block decrypts to a correctly padded plaintext block, or not. With PKCS#1v15 padding, this means that the first byte must be zero, the second byte must be 2, and so on, as shown in the above diagram.
Of course, the attacker doesn’t have the decryption key, or they wouldn’t need to bother messing around with a padding oracle. The attacker can make up any ciphertext block they want, give it to the oracle, and the oracle tells them: “yes, after I decrypt that, the padding looks OK”, or “no, it doesn’t”. For clarity, the padding oracle must have the decryption key, or it couldn’t answer the attacker accurately.
Again: the padding oracle doesn’t directly tell the attacker what any ciphertext block decrypts to; it just tells the attacker whether the decrypted block has the right padding or not.
The most common type of padding oracle which exists in the real world is a network service which receives encrypted requests from clients and returns a status code or message. To be used as a padding oracle, the service must return a different error code or message for a request which fails because of bad padding, or one which fails for any other reason. At the very least, something about the response must be different between requests with good or bad padding, or it’s not a padding oracle. (…which is a good thing, if you are operating that service! A padding oracle is only useful for people who want to crack encryption.)
Because an attacker can only get a very small amount of information by checking a padding oracle once, they will need to check the oracle many, many times, on many different crafted ciphertext blocks, before they can gather enough information to (potentially) crack the encryption.
Refresh my memory on how RSA encryption works, please?
Why, of course! Look up “modular arithmetic” first if you need a refresher on that. Then, here is a highly abbreviated summary of RSA encryption:
- Find two large primes, and call them and . The math for encryption and decryption will be done , where .
- Take a fixed number (the “exponent”); 3 was frequently used in the past, but now 65517 is most common. and together are the “public key”.
- Find a number which relates in a special way to and (see Wikipedia for details). and together are the “private key”.
- To encrypt: . ( is “ciphertext”, is “message”. It may seem strange to take a power of a message, but remember that the message can be thought of as a very large binary number, and we can take powers of that number just like any other. By the way, the message must be a smaller number than . If it’s too big, it must be broken up into blocks, and then the blocks must be encrypted individually.)
- To decrypt: .
⸻But why does it work out that ? In other words, why are the encryption and decryption formulas inverses?
I’m not going to delve into that, since it is not the point of this article. But if you are curious, consider learning the basics of group theory. You don’t have to go extremely deep; once you grasp basic group theory, it will become easy to understand why the above equation is true. This textbook is recommended: Contemporary Abstract Algebra, by Joseph Gallian (chapters 1-8 are enough to understand the principle of RSA encryption).[2]
Overview of the attack
In short, to crack an RSA ciphertext , the attacker must find a series of ciphertext blocks, derived from , which have correct padding when decrypted (as revealed by the padding oracle). For each such block which is found, the attacker can narrow down the possible values of the original plaintext . When only one possibility remains, then the attacker knows exactly what was; the encryption has been cracked.
In a bit more detail:
- Keep a set of ranges which cover all the possible values of . Call this set . Initialize it to cover all correctly padded messages.
- Find an integer (call it a “multiplier”), where is a correctly padded message.
- Convert to a new set of ranges , which includes all the possible values of for which would be correctly padded.
- Combine and , to make a new set of ranges which may be smaller than both of them:
- If the set includes only one -value, congratulations. That’s the original message. If not, go back to step 2, find a different value of which works, and repeat.
If you want to implement this attack, do these things first:
-
Get set up to compute with bignums. For some programming languages, the standard library will include a suitable bignum library. For others, you will have to find one. You only need the most basic operations: addition, subtraction, multiplication, division, remainder after division (“mod”), modular exponentiation, and comparison.
-
Write some code to manipulate sets of bignum ranges; specifically, to initialize such a set from a list of ranges, to take the union or intersection of such sets, to test if they contain exactly one integer, and to test if they contain exactly one contiguous range. Test your code thoroughly enough to be confident that it works. Tip 1: Beware of the possibility that a set could contain overlapping or adjoining ranges. Either canonicalize the sets after every operation, or carefully ensure that your code works even if the representation of a set is non-canonical. Tip 2: Bleichenbacher’s paper is written in terms of closed ranges; in other words, ranges which include both the lower and upper endpoints. But the code is simpler if you use half-open ranges, which include the lower endpoint but exclude the upper endpoint.
How can we manipulate a ciphertext to multiply the plaintext by ?
Good question! Indeed, in step 2 we need to find an integer , where is correctly padded. But we don’t know what is (yet), and anyways, our padding oracle takes ciphertexts as input (not plaintexts).
Given that we have , which is the encryption of , we need a way to find the encryption of (call that ), for any candidate value of . If we pass to the padding oracle, and it returns “true”, then we will know that is correctly padded, and that is a valid multiplier. Then we can proceed to step 3. Here’s the trick:
Remember the formula for RSA encryption:
If we apply the same encryption formula to , we get:
In other words, to test whether an candidate integer is a valid multiplier, multiply the ciphertext by , then pass it to the padding oracle.
How do we convert to a set of intervals which bound the value of the original plaintext?
Remember, the PKCS#1v15 padding scheme requires the first two bytes of the padded message to be 0x00 0x02
. Therefore, if the message is interpreted as a big-endian integer, then messages with correct padding will fall in the range (0x2000000…) up to (0x2FFFFFF…).
We need to give those endpoints names. Call the number (0x1000000…) . Then the range is .[3]
So the question is: for which values of , would fall into that range?
Definitely, must be big enough that is greater than ; needs to be big enough that its remainder after division by falls into .
Call the number of times that goes into ‘’ (for “remainder” after division by ). Then:
Although we know the value of , we don’t know the exact value of (yet). We just know that is correctly padded. So in the above equation, the smallest possible value of is:
And the largest possible value of is:
So all possible values of consistent with the fact that is a valid multiplier are:
We’re making progress! But we don’t know the value of . We know it must be at least 1 or more, but that’s not saying much. The above inequality doesn’t allow us to find a range of -values unless we know .
The breakthrough comes from realizing that we can turn the inequality around and solve it for . But wait a minute; we don’t know , so how can we solve for a specific value of ? Well, while we don’t know the exact value of , we do have the set which tells us possible values of . For each range in the set , we can take its endpoints and feed them into the above inequality to get a range of possible values for .
Take all the possible -values which you just derived, and for each one, solve for a new range of possible -values. Take the union of all those ranges to get the new set .
That reasoning may seem convoluted, but it’s not circular. Although we are using possible values of to obtain possible values of , and then possible values of to obtain possible values of , we don’t just get back to where we started. The discovery of each valid multiplier provides new information, and as a result, we do actually get a new and different set of possibilities , distinct from . (Hopefully is not just a superset of ; every time we set , we hope that the resulting sets become smaller and smaller.)
How do we find , the first value of ?
Just do a linear, brute-force search. Save yourself a few CPU cycles by starting from the smallest value which could potentially cause to wrap around and come back to .
Once we have , how do we find efficiently?
On page 5 (“Step 2.c”), the paper describes a procedure for taking a range from , and using it to narrow down the search for more -values. It’s based on the same inequality as the reasoning above:
Use the ranges in to find possible values of in that inequality; then use those values of to constrain the possible values of . The paper states that this method should be used when there is only one contiguous range in .
This method is far more efficient than doing a brute-force search over all integers to find -values. It’s not just that you find new -values more quickly; for some reason which I don’t understand, it seems that -values found using this strategy tend to narrow down the size of very effectively.
I did actually try implementing Bleichenbacher’s attack without this search strategy, just doing an incrementing linear search over all integers. That code found lots of -values, but the resulting sets seemed random, and the smaller became, the longer it took to find another set which would actually help to reduce the size of . As a result, cracking an encrypted message took approximately forever (or at least, longer than I was willing to wait).
One more tip
If you are implementing Bleicherbacher’s attack, remember that the inequalities in the paper are expressed in terms of real number division, but your bignum library probably gives you truncating integer division. The difference can and will throw endpoints (especially upper endpoints) out by one and cause your attack code to fail.
Reason carefully on those inequalities to avoid this.
Lessons?
- Don’t create padding oracles.
- More generally, don’t give attackers information; avoid even subtle leaks of information in error messages and so on.
- Don’t use PKCS#1v15 padding.
- More generally, don’t use outdated cryptosystems which are known to be crackable.
Thanks for reading! If you want to learn more about this attack, this is a good time to dive into the original research paper. Or if you haven’t done it before, step up to the plate and take on cryptopals challenge #47 and #48.
[1] The RFC says this padding format is only for “public key operations”; that essentially means “for encryption, but not for signing”. ⏎
[2] If you want to order a copy, it doesn't necessarily need to be the newest edition. Personally, I learned from the 8th edition of Contemporary Abstract Algebra. The newest edition may be several times more expensive than the previous one. ⏎
[3] The notation indicates that the range includes but excludes ; this is called a “half-open” range. ⏎