How to truncate CBC ciphertext
Suppose you have a large CBC ciphertext, say AES-256-CBC encrypted, and you only need the first n bytes of the plaintext. You might know that it is a text document, and you want to display a preview. Or you might want to determine its format by some magic bytes at the beginning of the file.
You could of course decrypt the whole thing and only use the first n bytes, but that wastes time on data you never look at.
I'll show two ways of truncating CBC ciphertext so that you don't need to decrypt the whole file to get the first bytes. The idea is very simple: remove everything from the ciphertext except the first few blocks. But there is one gotcha, and that's padding, so I'll write a little about that too. To reproduce the examples, you need a *nix shell, the OpenSSL command line utility, hexdump, basenc (from GNU coreutils) and dd.
Let's prepare an example plaintext and ciphertext:
$ echo -n "Lorem ipsum dolor sit amet, consetetur" > lorem.txt
$ openssl enc -aes-256-cbc \
-in lorem.txt \
-out lorem.enc \
-iv be154e2343408caa1f11ab3445bdd34c \
-K be154e2343408caa1f11ab3445bdd34cbe154e2343408caa1f11ab3445bdd34c
Decryption is done block by block, so instead of decrypting the first n bytes, we decrypt the first m blocks, where m is the smallest integer such that m * { block size } >= n (the block size is 16 bytes in our case). In other words, m is n / 16, rounded up. In this tutorial, I'll decrypt the first block.
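The block count is a ceiling division, which shell integer arithmetic can express directly. The following is a minimal sketch; the names blocks_needed and BLOCK_SIZE are made up for illustration and are not part of OpenSSL or dd.

```shell
BLOCK_SIZE=16   # AES block size in bytes

# blocks_needed N: print the smallest m with m * BLOCK_SIZE >= N
# (hypothetical helper, assumes N is a positive integer)
blocks_needed() {
    echo $(( ($1 + BLOCK_SIZE - 1) / BLOCK_SIZE ))
}

blocks_needed 4    # prints 1
blocks_needed 16   # prints 1
blocks_needed 17   # prints 2
```

The result can be passed to dd as the count argument together with bs=16.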
The naive approach
If we only want to decrypt the first block, we should be able to remove everything else from the ciphertext:
$ dd if=lorem.enc of=lorem-truncated.enc bs=16 count=1
And decrypt:
$ openssl enc -aes-256-cbc -d \
-in lorem-truncated.enc \
-iv be154e2343408caa1f11ab3445bdd34c \
-K be154e2343408caa1f11ab3445bdd34cbe154e2343408caa1f11ab3445bdd34c
bad decrypt
408F800302000000:error:1C800064:Provider routines:ossl_cipher_unpadblock:bad decrypt:providers/implementations/ciphers/ciphercommon_block.c:107:
Okay, that didn't work. The last command results in a "bad decrypt" error message, raised in OpenSSL's providers/implementations/ciphers/ciphercommon_block.c:107. We can see in the source code that OpenSSL looks at the last byte of the given block buf and raises an error if this byte is 0 or greater than the block size, i.e. greater than 16 (0x10).
Let's play a little with OpenSSL to find out why this happens.
Playing with OpenSSL
To simplify the following shell commands, we can make tiny shell scripts for encryption and decryption:
# enc.sh
openssl enc -aes-256-cbc \
-iv be154e2343408caa1f11ab3445bdd34c \
-K be154e2343408caa1f11ab3445bdd34cbe154e2343408caa1f11ab3445bdd34c
# dec.sh
openssl enc -aes-256-cbc -d \
-iv be154e2343408caa1f11ab3445bdd34c \
-K be154e2343408caa1f11ab3445bdd34cbe154e2343408caa1f11ab3445bdd34c
Okay, so the first question is: Is buf a block of the plaintext or the ciphertext? Shouldn't it be part of the ciphertext because the error occurred during decryption? Let's find out.
The truncated ciphertext looks like this:
$ hexdump -C lorem-truncated.enc
00000000 81 90 91 f9 c1 33 1b fb 6d 49 3f d5 c5 04 43 89 |.....3..mI?...C.|
00000010
We can change the last byte to something between 0x01 and 0x10, say 0x0a, to see if the error message changes:
$ echo 819091f9c1331bfb6d493fd5c504430a | basenc --base16 -d | ./dec.sh
Nope, still the same.
Okay, now the same with the plaintext. Currently, we have:
$ hexdump -C lorem.txt
00000000 4c 6f 72 65 6d 20 69 70 73 75 6d 20 64 6f 6c 6f |Lorem ipsum dolo|
00000010 72 20 73 69 74 20 61 6d 65 74 2c 20 63 6f 6e 73 |r sit amet, cons|
00000020 65 74 65 74 75 72 |etetur|
00000026
Since we truncate the second and third block of the ciphertext anyway, we ignore the corresponding plaintext blocks for the moment. We can reproduce the problem with only one input block:
$ echo 4c6f72656d20697073756d20646f6c6f \
| basenc --base16 -d \
| ./enc.sh \
| dd bs=16 count=1 status=none \
| ./dec.sh
bad decrypt
408F800302000000:error:1C800064:Provider routines:ossl_cipher_unpadblock:bad decrypt:providers/implementations/ciphers/ciphercommon_block.c:107:
Now our little experiment. We change the last byte to 0x0a:
$ echo 4c6f72656d20697073756d20646f6c0a \
| basenc --base16 -d \
| ./enc.sh \
| dd bs=16 count=1 status=none \
| ./dec.sh
bad decrypt
408F800302000000:error:1C800064:Provider routines:ossl_cipher_unpadblock:bad decrypt:providers/implementations/ciphers/ciphercommon_block.c:112:
Okay, we still get a "bad decrypt" error. But: It is raised now in line 112 instead of line 107! It seems that we passed the check in ciphercommon_block.c:106 and now run into a different problem. Let's play a little with this to make sure we got it right:
$ echo 4c6f72656d20697073756d20646f6c00 \
| basenc --base16 -d \
| ./enc.sh \
| dd bs=16 count=1 status=none \
| ./dec.sh
# Last byte is 0x00 -> error in line 107
$ echo 4c6f72656d20697073756d20646f6c10 \
| basenc --base16 -d \
| ./enc.sh \
| dd bs=16 count=1 status=none \
| ./dec.sh
# Last byte is 0x10 (block size) -> error in line 112
$ echo 4c6f72656d20697073756d20646f6c11 \
| basenc --base16 -d \
| ./enc.sh \
| dd bs=16 count=1 status=none \
| ./dec.sh
# Last byte is 0x11 (> block size) -> error in line 107
Great. So the decryption fails after a block has been decrypted; the problem is the format of the plaintext. First of all, the last byte of the plaintext has to be greater than 0 and smaller than or equal to the block size.
Now, if the last byte is valid, why is an error raised in ciphercommon_block.c:112? The for loop looks at the last pad bytes of the block, where pad is the value of the last byte itself. If one of these bytes is not equal to pad, an error is raised. So if pad is 0x01, the for loop only looks at the last byte itself. Let's try it:
$ echo 4c6f72656d20697073756d20646f6c01 \
| basenc --base16 -d \
| ./enc.sh \
| dd bs=16 count=1 status=none \
| ./dec.sh
Lorem ipsum dol
That works! But, of course, we lost the last byte to satisfy OpenSSL. If pad is 0x02, the for loop looks at the last two bytes and requires both of them to be 0x02:
$ echo 4c6f72656d20697073756d20646f0202 \
| basenc --base16 -d \
| ./enc.sh \
| dd bs=16 count=1 status=none \
| ./dec.sh
Lorem ipsum do
Above, we tried 0x10 as last byte. Now we know that if the last byte is 0x10 (the block size), then every byte of the block has to be 0x10:
$ echo 10101010101010101010101010101010 \
| basenc --base16 -d \
| ./enc.sh \
| dd bs=16 count=1 status=none \
| ./dec.sh
As expected, no error occurs, but also no plaintext is left.
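The two checks we just observed can be re-created in a few lines of shell. This is only a sketch of the behavior seen in the experiments, not OpenSSL's actual code: the function name pkcs7_unpad_hex is hypothetical, and it assumes its argument is one decrypted 16-byte block written as 32 lowercase hex digits.

```shell
# pkcs7_unpad_hex HEX: strip the padding from one decrypted block.
# HEX is assumed to be 32 lowercase hex digits (one 16-byte block).
# Prints the unpadded bytes as hex, or returns 1 if the padding is invalid.
pkcs7_unpad_hex() {
    hex=$1
    len=$(( ${#hex} / 2 ))                    # block length in bytes
    last=$(printf '%s' "$hex" | tail -c 2)    # last byte as two hex digits
    pad=$(( 0x$last ))

    # First check: the last byte must be in the range 1..block size.
    if [ "$pad" -eq 0 ] || [ "$pad" -gt "$len" ]; then
        echo "bad padding: last byte out of range" >&2
        return 1
    fi

    # Second check: the other pad-1 trailing bytes must equal the last byte.
    i=1
    while [ "$i" -lt "$pad" ]; do
        start=$(( (len - 1 - i) * 2 + 1 ))
        byte=$(printf '%s' "$hex" | cut -c "$start-$(( start + 1 ))")
        if [ "$byte" != "$last" ]; then
            echo "bad padding: byte mismatch" >&2
            return 1
        fi
        i=$(( i + 1 ))
    done

    # Print everything in front of the padding (possibly nothing).
    keep=$(( (len - pad) * 2 ))
    if [ "$keep" -gt 0 ]; then
        printf '%s' "$hex" | cut -c "1-$keep"
    else
        echo
    fi
}

pkcs7_unpad_hex 4c6f72656d20697073756d20646f6c01   # prints 4c6f72656d20697073756d20646f6c
pkcs7_unpad_hex 4c6f72656d20697073756d20646f0202   # prints 4c6f72656d20697073756d20646f
pkcs7_unpad_hex 4c6f72656d20697073756d20646f6c0a || echo "rejected"
```

The first failure branch corresponds to the error we saw in line 107, the second one to the error in line 112.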
Padding
We found out that the last bytes of our plaintext block must be one of these, or else OpenSSL complains:
01
0202
030303
...
10101010101010101010101010101010
These are padding bytes. When encrypting data, OpenSSL appends them to the plaintext so that the length of the data is a multiple of 16 bytes. After all, block ciphers need blocks of a fixed length as input, 16 bytes in the case of AES. If the plaintext length is already a multiple of 16 bytes, a whole block of 0x10 padding bytes is added. So the last byte is always a padding byte. If this last byte is B, the last B bytes are removed from the plaintext after decryption.
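To make the rule concrete, here is a small sketch that prints the padding bytes for a given plaintext length. The function name pkcs7_padding and the variable BLOCK_SIZE are invented for this example; the function only does the arithmetic and does not call OpenSSL.

```shell
BLOCK_SIZE=16   # AES block size in bytes

# pkcs7_padding LEN: print, as hex, the padding bytes that get appended
# to a plaintext of LEN bytes (hypothetical helper for illustration).
pkcs7_padding() {
    pad=$(( BLOCK_SIZE - $1 % BLOCK_SIZE ))   # always in the range 1..16
    i=0
    while [ "$i" -lt "$pad" ]; do
        printf '%02x' "$pad"
        i=$(( i + 1 ))
    done
    echo
}

pkcs7_padding 38   # prints 0a0a0a0a0a0a0a0a0a0a
pkcs7_padding 15   # prints 01
pkcs7_padding 16   # prints 10101010101010101010101010101010
```

The 38-byte case matches our lorem.txt: it needs ten bytes of 0x0a to fill the third block.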
This is one of many possible padding schemes, called PKCS #7. You might get a little confused when you try to find its specification. At least, I did. PKCS #7 is the name of a standard for a format to store encrypted data in. It describes something called "envelope encryption", which means data is encrypted, and then the encryption key is itself encrypted and sent together with the encrypted data. RFC 2315 describes this process and mentions the above padding scheme in passing. Even though symmetric encryption doesn't necessarily have anything to do with envelope encryption, the padding scheme is now called PKCS #7. Basically the same scheme (only for 8-byte blocks) is described in PKCS #5 (RFC 8018) for password-based encryption and in RFC 1423 for DES-CBC encryption, each time in a very specific context. To add some more confusion, PKCS #7 is today obsoleted by something called CMS, described in RFC 5652. And this specification describes the same padding scheme.
We can see the padding bytes that OpenSSL added during encryption by decrypting with the -nopad flag:
$ openssl enc -aes-256-cbc -d \
-in lorem.enc \
-iv be154e2343408caa1f11ab3445bdd34c \
-K be154e2343408caa1f11ab3445bdd34cbe154e2343408caa1f11ab3445bdd34c \
-nopad \
| hexdump -C
00000000 4c 6f 72 65 6d 20 69 70 73 75 6d 20 64 6f 6c 6f |Lorem ipsum dolo|
00000010 72 20 73 69 74 20 61 6d 65 74 2c 20 63 6f 6e 73 |r sit amet, cons|
00000020 65 74 65 74 75 72 0a 0a 0a 0a 0a 0a 0a 0a 0a 0a |etetur..........|
00000030
Back to the truncating problem
I think the best option for truncating the ciphertext is to use the -nopad flag. Then OpenSSL treats padding bytes as plaintext, meaning it neither expects nor removes any padding bytes:
$ openssl enc -aes-256-cbc -d \
-in lorem-truncated.enc \
-iv be154e2343408caa1f11ab3445bdd34c \
-K be154e2343408caa1f11ab3445bdd34cbe154e2343408caa1f11ab3445bdd34c \
-nopad
Lorem ipsum dolo
Great, we are done. The naive approach was almost right; we simply had to add -nopad.
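Put together, the -nopad approach fits into a small function. This is a sketch under a few assumptions: preview, KEY and IV are names chosen for this example, the key and IV are the ones used throughout this tutorial, and N must not exceed the plaintext length, otherwise padding bytes end up in the output.

```shell
KEY=be154e2343408caa1f11ab3445bdd34cbe154e2343408caa1f11ab3445bdd34c
IV=be154e2343408caa1f11ab3445bdd34c

# preview FILE N: print the first N plaintext bytes of an AES-256-CBC
# encrypted FILE without decrypting the rest (hypothetical helper).
preview() {
    m=$(( ($2 + 15) / 16 ))          # number of 16-byte blocks covering N bytes
    dd if="$1" bs=16 count="$m" 2>/dev/null \
        | openssl enc -aes-256-cbc -d -nopad -K "$KEY" -iv "$IV" \
        | head -c "$2"
}
```

With the lorem.enc from the beginning, preview lorem.enc 5 should print Lorem.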
But just for fun, I want to show you another possibility I thought of before I knew the -nopad option existed.
We know that the last block of the ciphertext contains the encrypted padding bytes. So we can simply keep this last block. However, the last block can only be decrypted properly if the second-to-last ciphertext block is also present because, in CBC mode, every decrypted block is XORed with the preceding ciphertext block. So we keep the last two blocks. This means that our truncated ciphertext has at least three blocks. Our original plaintext had only three blocks to begin with, so to demonstrate the procedure, we need a slightly longer plaintext.
$ echo "Lorem ipsum dolor sit amet, consetetur sadipscing elitr, sed" > lorem.txt
$ openssl enc -aes-256-cbc \
-in lorem.txt \
-out lorem.enc \
-iv be154e2343408caa1f11ab3445bdd34c \
-K be154e2343408caa1f11ab3445bdd34cbe154e2343408caa1f11ab3445bdd34c
$ hexdump -C lorem.enc
00000000 81 90 91 f9 c1 33 1b fb 6d 49 3f d5 c5 04 43 89 |.....3..mI?...C.|
00000010 a2 47 23 26 af 09 04 6e f3 f6 1a b7 16 b9 95 18 |.G#&...n........|
00000020 93 9c e7 f4 23 a2 2e c6 c5 49 e1 73 65 52 cc 92 |....#....I.seR..|
00000030 48 b3 8f c6 61 50 50 52 1d bf bf 43 b4 c8 d8 8a |H...aPPR...C....|
00000040
Our new lorem-truncated.enc contains the first block and the last two blocks of lorem.enc:
$ dd if=lorem.enc of=lorem-truncated.enc bs=16 count=1
$ dd if=lorem.enc bs=16 skip=2 >> lorem-truncated.enc
$ hexdump -C lorem-truncated.enc
00000000 81 90 91 f9 c1 33 1b fb 6d 49 3f d5 c5 04 43 89 |.....3..mI?...C.|
00000010 93 9c e7 f4 23 a2 2e c6 c5 49 e1 73 65 52 cc 92 |....#....I.seR..|
00000020 48 b3 8f c6 61 50 50 52 1d bf bf 43 b4 c8 d8 8a |H...aPPR...C....|
00000030
OpenSSL can decrypt this file without errors:
$ openssl enc -aes-256-cbc -d \
-in lorem-truncated.enc \
-iv be154e2343408caa1f11ab3445bdd34c \
-K be154e2343408caa1f11ab3445bdd34cbe154e2343408caa1f11ab3445bdd34c \
| hexdump -C
00000000 4c 6f 72 65 6d 20 69 70 73 75 6d 20 64 6f 6c 6f |Lorem ipsum dolo|
00000010 46 a3 d7 ab 1b 48 3f e6 ff db 4c 12 a0 de bf ff |F....H?...L.....|
00000020 67 20 65 6c 69 74 72 2c 20 73 65 64 0a |g elitr, sed.|
0000002d
The second block is garbage because OpenSSL XORed the decrypted block with the first block of lorem-truncated.enc. But it should have been XORed with the second block of lorem.enc, which isn't there anymore.
But we can of course simply remove the garbage block and the last block from the plaintext:
$ openssl enc -aes-256-cbc -d \
-in lorem-truncated.enc \
-iv be154e2343408caa1f11ab3445bdd34c \
-K be154e2343408caa1f11ab3445bdd34cbe154e2343408caa1f11ab3445bdd34c \
| dd bs=16 count=1 status=none
Lorem ipsum dolo
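For files of arbitrary size, the two dd calls can be wrapped in a function that works out the skip value itself. Again, this is just a sketch: truncate_keep_tail is a made-up name, and the function assumes the input is CBC ciphertext whose size is a multiple of 16 bytes.

```shell
# truncate_keep_tail SRC DST M: write the first M blocks plus the last two
# blocks of SRC to DST (hypothetical helper; assumes the size of SRC is a
# multiple of 16 bytes).
truncate_keep_tail() {
    src=$1; dst=$2; m=$3
    total=$(( $(wc -c < "$src") / 16 ))     # number of blocks in SRC
    if [ $(( m + 2 )) -ge "$total" ]; then
        cp "$src" "$dst"                    # nothing to cut out
        return
    fi
    dd if="$src" of="$dst" bs=16 count="$m" 2>/dev/null
    dd if="$src" bs=16 skip=$(( total - 2 )) 2>/dev/null >> "$dst"
}
```

After decrypting the result, only the first M blocks of the output are usable; the block after them is garbage and the last one belongs to the end of the original plaintext.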