Data Encryption Standard (DES): a pure python module to (de)crypt using this standard

Let's talk again about cryptography. This time let's deal with Data Encryption Standard (DES). Here we won't tell you what every programmer dreams to implement the DES algorithm :-) But if you are studying the basic concepts about encryption, inevitably you come across DES ...

Note: the programming code discussed in this article is available on GitHub at this repository. More specifically the source/des.py and source/nbitarray.py modules.

Premise

The cryptographic concepts that are implemented in DES have deep roots back in time. It has been known for centuries that if you use substitution algorithms [1] to hide the meaning of a text, an attacker who receives a ciphertext can study the frequency of the symbols present in it.

In fact, each language has constant statistical frequencies of its letters, or pairs of letters, or triples ... In the hypothesis that:

a text has been encrypted;
the original language of the plaintext is known;
the ciphertext is long enough to contains quantities of symbols with statistical significance;

an attacker can make very strong assumptions about which symbol replaces what. Slowly reconstructing the plain text and, even worse, rebuilding the used key. Which makes him able to decipher any other messages composed using the same encryption key.

Those who study cryptography understand this problem. For example L.S.Hill used the matrix calculation to disperse the immersed statistical information in the text to be encrypted. It is the Hill's encryption method, which I mentioned in this other article.

More generally in 1945 C. Shannon [2] laid the groundwork for the use of symmetric cryptographic algorithms in which are present the properties of:

diffusion, i.e. every single bit of the cipher text must depend statistically by several bits of the text to be encrypted; in this way they alter the frequencies of recurrence of symbols between plaintext and ciphertext;
confusion, it must not be possible to reconstruct the relationship between the cipher text and the encryption key; i.e. the encryption must not take place with operations that admit an inverse relationship [3].

In 1973 H. Feistel and his colleagues at IBM brought Shannon's theories into live designing Lucifer, an IBM cryptographic product.

In 1977 Feistel's slightly modified work was accepted and published as official standard (FIPS) of the National Institute of Standards and Technology (NIST) US, with the name Data Encription Standard [4]: the story of DES had begun.

DES was superseded in 2005 when it was officially withdrawn by NIST. Up until this date, it has been by far the most used symmetric encryption standard. And also the most attacked. Already in 1997 it was publicly demonstrated a decryption of a DES message, without using the key. Over the years, the increase in computing power of computers and the ever greater sophistication of the methods of analysis have led to the limit the capabilities of the DES, despite the work to make it more robust (Triple DES). So that in 2001 NIST published its heir: the Advanced Encryption Standard (AES), and, as mentioned, withdrew it in 2005.

However it must be said that Feistel's work remains a milestone, and inspired subsequent similar algorithms, such as e.g. Blowfish, released in 1993.

Observations

The DES algorithm is relatively complex.

The text Cryptography and Network Security by Behrouz A. Forouzan, at chapter 6 [5] gives a clear description of how this algorithm works.

The general structure is illustrated in the following figure.

Text to be encrypted (plaintex 64 bit) undergoes an initial permutation and then undergoes 16 times in the treatment (round) described in the figure below.

Basically the right half of the text is combined with the subkey Ki in the des function, the detail of which is described in the previous figure. The result of this operation is put in exclusive or (xor) with the half left of the incoming text, to then be output as the right half. While the left half of the output text is a copy of the right half of the incoming text.

Only round 16 is an exception, as there is no final trade between the two halves of text [6].

The generation of the subkeys K1 ... K16 is another important point, and has the following logic.

The parity bits (positions 8, 16, 24, ...) are removed from the 64-bit key. The remaining 56 bits are the actual key to use for encryption. This is divided into two halves (28 bits each), and at each generation phase of a subkey, the two halves are rotated to the left by one or two bits (depending on the turn) and merged into a compressor that returns 48 bits: the desired subkey.

Implementing this algorithm in Python is relatively complex. The point critical is the decision of how to implement a bit array. At first glance the simpler way is to use bytearray: byte vectors, with each element consisting of 8 bits. Except that you run into half keys of 28 bits, hence 3.5 bytes.

Personally, for simplicity of reading and studing, we preferred to implement a special support class, the NBitArray, which uses a bytearray as a container for the bits. For documentation, below we report the first lines of the module, which indicates the main operations available:

# :filename: nbitarray.py         a naive buffer
#
# main methods:
#
#   import nbitarray as nba
#
#   ba = nba.NBitArray(list_of_hex)    # create instance
#   bb = nba.NBitArray(list_of_bits)   # create instance
#   len(ba)                    # number of bits
#   ba[ndx]                    # bit at index ndx
#   ba[ndx] = bit_as_integer   # set bit at index ndx
#   ba == bb                   # eq operator
#   str(ba)                    # string of bits
#   ba + bb                    # concatenation operator
#   ba ^ bb                    # xor operator (ba and bb of the same length)
#   ba << n                    # left shift, note: ba.__lshift__(n, circular=True) does a circular left shift
#   ba >> n                    # right shift, note: ba.__rshift__(n, circular=True) does a circular right shift
#   ba.get_byte(bit_ndx|byte_ndx=)  # return one byte as integer from indicated position
#   ba.set_byte(x, bit_ndx|byte_ndx=, lenght=)  # set x as one byte at indicated position for the indicated length in bits
#   ba.permutate(permutation_table)  # return a permutated NBitArray obeying to the given permutation table. ...
#                                    #  ... permutation table is a list of integers where index indicate the position of the output bit ...
#                                    #  ... and value at the index is the position of the input bit.
#   ba.bit_list()              # return the bit array content as a list of integers with values 0|1
#   ba.hex()                   # return the bit array content as a string of hex numbers
#   ba.swap_lr()               # return an NBitArray with left and right halves inverted. len(ba) must be even

With such a class the operations requests are simple to make. For example, this:

import nbitarray as nba

#...<cut>...

class Key(object):
    #...<cut>...

    def __init__(self, key):
        self._key64  = nba.NBitArray(key)
        self._keys48 = self.calculate_keys()
        self._curr = -1

    def calculate_keys(self):
        '''calculate all 16 keys of 48 bits to use during rounds'''
        try:
            key56 = self.key_core     # drop parity bits and permutate
        except:
            raise ValueError('cannot calculate keys')
        result = []
        left  = key56[:len(key56)//2]
        right = key56[len(key56)//2:]
        for n in range(1, 17):          # rem.round is from 1 to 16 included
            if n in type(self)._one_key_shifting:
                left  = left.__lshift__(1, circular=True)
                right = right.__lshift__(1, circular=True)
            else:
                left  = left.__lshift__(2, circular=True)
                right = right.__lshift__(2, circular=True)
            tmp56 = left + right
            key48 = tmp56.permutate(type(self)._key_compression)
            result.append(key48)
        return result

    #...<cut>...

is the structure of a class that implements the computation of subkeys.

When instantiated with a 64-bit key, it is stored in _key64, while in _keys48 there is a list with the 16 subkeys.

The subkeys are calculated by the calculate_keys method which discards the parity bits [7], split the cipher key into two equal parts and then loop 16 times making the necessary circular shifts, recombining the key in tmp56 and compressing it by applying the permutation table _key_compression, creating the key48 subkey which is then added to the list.

As you can see, with about twenty lines of code you get the subkeys to be used for DES encryption. OK. If we are to be completely honest, the whole Key class takes about fifty lines :-) More than twenty, but still not shocking.

The whole des.py module, which implements the encryption and decryption of 64-bit text is about 400 lines of heavily commented code.

While the support module nbitarray.py is less than 300 lines long.

Conclusions and caveats

Implementing code for DES (de)encryption greatly helps to understand the details of how it works. It isn't a small commitment, but the satisfaction, once finished, is proportional to the effort made.

Here I spend a word of warning about the contents of the tables to be used to carry out the various operations. DES is a whole series of permutations, possibly with expansion or compression of input bits. Permutations that are documented by numeric vectors that match the output bit with the input bit [8].

While the S-Boxes, which I have not described here, are tables that put in relation a sextet of bits in input with a quartet of bits in output.

Good. Getting a value wrong in one of these (many) tables is really easy. And it is definitely expensive to hunt down these kinds of mistakes, since we programmers tend to assume that we have the wrong code, and not the wrong data. Not only. We also found errors in the tables written in the book indicated above. So, if you want to work personally at this exercise, it will be better to equip yourself with a couple of essential tools:

the document of the standard, where the tables are correct;
and a friend/relative with a lot of patience, who will take action after you have wrote the various tables; he will have to dictate you the values one by one to double check them.

Enjoy. ldfa

[1]

That is, constantly changing a PS symbol with another CS. The encryption key is formed by the set of all pairs of symbols (PS, CS) and is usable both to produce the ciphertext starting from the plaintext, and vice versa: the plaintext starting from the ciphertext. For this reason we speak of symmetric encryption. Because you use the same key for both encryption than to decryption.

[2]	It's the Shannon's report entitled A mathematical theory of cryptography published in confidential form in 1945 and made public in 1957.

[3]	This is where Hill's encryption is lacking.

[4]	The relative document that defines it can be downloaded from this URL.

[5]	This chapter can be found on the Internet on Cleveland State University's teaching server.

[6]	Or, as done in py_naive_cryptology's des.py, the 16th round is standard, only to be followed by a further exchange of left/right parts of the text.

[7]	This is done by the key_core property, whose listing is not shown here. See the code of the module des.py.

[8]

A permutation table is a vector (or, in Python, a list) of integers. For example pt = [2, 3, 4, 5, 6, 7, 8, 1] can be a permutation table valid for an 8-bit byte, with the convention that positions of the bits and the indices of the vector take the values from 1 to 8, counting from left to right. In this case the index of an element identifies the position of the output bit, while its value element identifies the position of the input bit. Suppose we have the byte b = [1,0,0,1,1,0,0,0]. By applying pt, the bit in position 2 must pass in position 1, the 3 in 2, and so on until the bit in position 1 passes into position 8. Our permuted byte will be pb = [0,0,1,1,0,0,0,1].