The examples that we’ve given so far have used binary keys. These keys need to be stored in some fashion –  on a hard drive or floppy disk for instance. Password-based encryption (PBE) uses a password as the key. This is convenient because the security then rests with the user rather than in a physical medium. Unfortunately, password-based encryption in general isn’t as secure as algorithms with binary keys like TripleDES or Blowfish. To demonstrate this, let’s say we use Blowfish with a key size of 128 bits. That gives us a keyspace of 2128. Password-based encryption typically uses ASCII characters. An average user’s password might be 6 characters in length. If it’s entirely lowercase, there are only 266 possible keys, or roughly 228. Adding capital characters and symbols helps quite a bit, but it cannot even approach the keyspace of a good symmetric algorithm.

To add to their insecurity, most passwords are simple everyday words. If an attacker were trying to crack a password, they would likely try every English word, which could be done rather quickly on a fast computer. This is known as a dictionary attack and is surprisingly successful. There are a number of tools out there that will automate a dictionary attack –  some even test combinations of words, differing capitalization, and the use of numbers and some symbols. It’s quite interesting to try one of these tools out and see how secure a password is. A good password should be at least 8 characters long and use capitalization, numbers, and symbols, like “m1Nnes0+a”. It’s important that the password be simple enough to memorize, however, as you should never write a password down.

Password encryption uses a combination of hashing and normal symmetric encryption. The password is hashed using a message digest algorithm like SHA-1 (we’ll discuss this topic more in Chapter 6), and then the resulting hash is used to construct a binary key for an algorithm like Blowfish as shown in the following diagram:

As we mentioned, one of the problems with password-based encryption is that it is possible to create a precompiled list of passwords, hash them, and have a set of keys ready to try on encrypted data. This allows an attacker to try all normally used keys very quickly, and so determine if any of them decrypt the data. There are two techniques used to combat this attack: salting and iteration counts. A salt is a random value appended to the password before it is hashed to create a key, and the iteration count is the number of times the salt and password are hashed. Let’s examine each of these in some detail and see how they improve security.

### Salt

By adding a random piece of data to the password before hashing it, we greatly increase the number of possible keys created from a single password. Without a salt, the word “sasquatch” hashes to a specific value. If we add 8 bytes of random data before hashing it, then “sasquatch” can hash to 264 possible values. That means that it is effectively impossible to create a precompiled-key list dictionary, as it would be far too large. Instead, the attacker is forced to perform a hashing computation rather than a simple lookup on the data, which will take quite some time.

The salt is stored with the data that is encrypted. Each time a new piece of data is encrypted, a new salt is generated. This means that the same phrase will encrypt to a different value each time it is encrypted, even if the same password is used every time. To decrypt, the salt must be extracted from the encrypted data, and then combined with the password to create the decryption key.

In just a moment we’ll show an example of using salt and password-based encryption that should help make this process clearer.

### Iteration Count

The iteration count is an attempt to increase the time that an attacker will have to spend to test possible passwords. If we have an iteration count of a thousand, we need to hash the password a thousand times, which is a thousand times more computationally expensive than doing it just once. So now our attacker will have to spend 1000 times more computational resources to crack our password-based encryption.

Let’s write a simple class that does password-based encryption and decryption. We’ll use an iteration count of 1000, and a random 8-byte or 64-bit salt. When we write the ciphertext out, the first 64 bits will be the salt that we need to use to create the key to decrypt it. When we decrypt, we’ll use those 64 bits as the salt and we’ll only decrypt the ciphertext that begins after the 64th bit. Note that the salt isn’t being kept secret –  it’s just different for each batch of text that we’re going to encrypt.

We want the output of the example to be displayable on the screen. To accomplish that, we’re going to BASE 64 encode the output, which transforms binary data into ASCII characters.

### BASE64 Encoding

Binary data is typically stored in bytes of 8-bits. Standard ASCII is only 7 bits though, so if we want to display binary as ASCII, we’re going to lose at least one bit per byte. BASE64 encoding is a way of overcoming this problem. 8-bit bytes are converted to 6-bit chunks and then into characters. Six bits are used so that some control characters can be used indicating when the data ends. The encoded characters can then be displayed on the screen and converted back into binary with no difficulty. Of course, since we’re moving from an 8-bit chunk to a 6-bit chunk, we’re going to have more chunks –  3 bytes becomes 4 characters and vice-versa.

There is a BASE64 encoder and decoder in the sun.misc package. Since this is not included in a java.* package, its location could change in a future release of Java.

For that reason, we have provided a BASE64 encoder and decoder in Appendix C of this book.

We can use it as a drop-in replacement for the sun.misc implementation, by changing the import statement in PBE.java from:

to

and include the BASE64 classes in the classpath.

Our code example will have two options: encryption and decryption. Encryption will require a password and some plaintext, and decryption a password and some encrypted data. We’ll create a salt for the encryption and prepend it to the ciphertext after it’s been encrypted, like so:

When decrypting, we’ll take that block of encrypted data and separate it into the salt and the ciphertext. Then we can use the password and the salt to initialize a cipher that can decrypt the ciphertext into the original plaintext message like this:

Password-based encryption in Java requires that we use two new classes in the `javax.crypto.spec` package, `PBEKeySpec` and `PBEParameterSpec`, as well as `javax.crypto.SecretKeyFactory`.

### PBEKeySpec

We use `PBEKeySpec` to create a key based on a password, using an instance of `SecretKeyFactory`. This is one of the few classes in the JCE that we can actually call the standard constructor on. It takes a char array as an argument due to the fact that passwords are typically not stored as strings, since strings are immutable in Java, and there are times when you would like to be sure that a password is cleared out when you’re done using it. For more information, see Chapter 2 on writing secure code.

### SecretKeyFactory

In order to actually use the `PBEKeySpec` as a key, we need to run it through a `SecretKeyFactory`, which will generate the key given a key specification, by calling `generateSecret()`.

We create an instance of SecretKeyFactory by calling getInstance() with the name of the algorithm we need. Here’s an example of creating a `SecretKey` from a `PBEKeySpec`:

### PBEParameterSpec

To create a cipher that uses PBE, we need to pass the salt and the iteration count to the cipher on instantiation. `PBEParameterSpec` is a wrapper for that salt and iteration count. It is given to the cipher on initialization, along with the key. Assuming the salt and iterations are already defined, here’s how we would get a cipher, using `PBEParameterSpec`:

### PBE Algorithm Names and Providers

You may have noticed the strange name for the algorithm that we used to initialize our key factory and cipher above, `PBEWithSHAAndTwofish-CBC`. This specifies password-based encryption using SHA as a message digest and Twofish in CBC mode as the symmetric encryption algorithm. There are a great number of possible password-based encryption algorithms, including but not limited to:

Most of the above algorithms are equivalently secure. The only possible exception is `PBEWithMD5AndDES`. Since DES uses a 56-bit key, it’s possible that a password would be more secure than the DES key it creates. We recommend using one of the other algorithms if you have a choice in your application.

Unfortunately, no crypto provider supports all PBE algorithms and here we’ll use the Bouncy Castle provider to get support for `PBEWithSHAAndTwofish-CBC`.

Now let’s take a look at our complete example of PBE. It uses SHA-1 and Twofish to encrypt or decrypt some text using a password. The iteration count is set to 1000 and the salt is randomly chosen on encryption. The salt is then written to the first 8 bytes of the output text.

### PBE Example Code

When decrypting, this program will read the first 8 bytes of the ciphertext and use that as the salt:

If you’re not using a Sun-based VM, comment out the import sun.misc.* line shown above, and uncomment the following line. This will switch to our BASE64 encoders:

### Running the Example

After compiling, we can run the example. For example, to encrypt we could type:

This will encrypt the string “Hello World!” using PBE with the password of “sasquatch”. We should see output similar to the following:

We can decrypt the output with:

and of course, we get our original message returned to us:

Now let’s look at a different method of encryption.

Professional Java Security : Symmetric Encryption Part 3