NKS-SJSU: Template Page

Research Paper

By Haile Eyob

Cryptographic Randomness

Presentation

Abstract

1. Introduction

2. How Cryptography Works

3. Classic Cryptography

3.1 Caesar Cipher

3.2 Vigenere Cipher

3.3 Rotor Machines

4. Symmetric Cryptography

4.1 One-Time Pad

4.2 Data Encryption Standard

5. Public-Key Cryptography

6. Pseudorandom Numbers

6.1 Pseudorandom Numbers Generation

6.2 Linear Congruential Generators

6.3 Linear Feedback Shift Register

6.4 Blum-Blum-Shub Random Number Generator

7. Random Number Generation Using Rule 30

7.1 Intrinsic Randomness

7.2 Properties of Rule 30 Cellular Automata

8. Encryption Using Cellular Automata

9. Conclusion

10. References

Abstract

The concept of using cellular automata computing in the fields of cryptography has been identified as a possible technology that may bring forward a new hope for computationally simple and reliable algorithms.

Wolfram has studied Rule 30 in relation to cryptography and random number generation. The use of cellular automata computing is still limited and information security is mainly based on other encryption methods for its development.

1. Introduction

Cryptography is a science of using mathematics to encrypt and decrypt data. Cryptography enables us to store sensitive information or transmit so that the intended recipient cannot read it. Data that can be read and understood without any special measures is called plaintext. The method of disguising plaintext in such a way as to hide its substance is called encryption. Encrypting plaintext results in unreadable text called ciphertext. The process of reverting ciphertext to its original plaintext is called decryption.

Encryption and decryption require the use of some secret information, or a key. Depending on the encryption mechanism used, the same key might be used for both encryption and decryption. In other mechanisms, public key cryptography for example, the keys used for encryption and decryption are different.

2. How Cryptography Works

A cryptographic algorithm, or cipher, is a mathematical function used in the encryption and decryption process. A cryptographic algorithm uses a key to encrypt the plaintext. The security of encrypted data depends on the strength of the cryptographic algorithm and the secrecy of the key.
Mathematically,
                 Encryption is given by: E(M) = C

          and Decryption is given by: D(C) = M
where M is the encrypted message,
          C is the ciphertext,
           E is the encryption function, and
           D is the decryption function
A cryptographic algorithm and all possible keys and all the protocols that make it work comprises a cryptosystem.

3. Classic Cryptography

3.1 Caesar Cipher

One of the classic and simple examples of a substitution cipher is the Caesar cipher. The Caesar cipher is a shift cipher since the ciphertext alphabet is derived from the plaintext alphabet by shifting each letter a certain number of units. For example, “A” when shifted three units to the right is encrypted as “D.”

The encryption algorithm is:

C = E(P) = (P + K) mod(N); for k an integer, 1 £ k £ N - 1. N is the number of letters in the text language. The decryption algorithm is:

P = D(C) = (C - K) mod(N).

Cryptanalysis is easily performed using brute force. We simply try all the N - 1 possible keys. The secret key can also be inferred using frequency analysis, based on the knowledge of the frequency of the various letters in the text language.

Figure 1 shows the relative Frequency of Letters in English language. The most frequent letter in the English language is e, then letter ‘t’ and son on.

Figure 1 Relative Frequency of Letters in English Text [8].

3.2 Vigenere Cipher

One of the main problems with simple substitution ciphers is that they are so vulnerable to frequency analysis. One of the most common approaches is to suppress the normal frequency data by using more than one alphabet to encrypt the message. The Vigenere cipher is a polyalphabetic cipher. A polyalphabetic substitution cipher involves the use of two or more cipher alphabets. Instead of there being a one-to-one there is a one-to-many relationship between each letter and its substitutes. Figure 2 shows the substitution based on Vigenère cipher.

Figure 2. The modern Vigenère tableau [8].

The cipher letter is the letter in the tableau determined by the keyword column and the plaintext row. For example if “S” is the key and the plaintext letter is “D” then the ciphertext letter is “V.”

By looking for common factors in the displacement of the various sequences. One should be able to make a good guess of the keyword length. If the keyword length is k, then the cipher consists of k monoalphabetic substitution ciphers. One can use the frequency characteristics of the language to find out the key.

3.3 Rotor Machine

Another technique of encryption was the Rotor Machines. The Enigma Machine was an example of a rotor based cipher machine. Each rotor in a rotor cipher machine permutes the letters of the alphabet. The rotors are mechanically linked together, so that the first rotor advances one position with each press of a key; the second rotor advances one position after the first completes one rotation; and so forth.

Figure 3. Three-Rotor Machine [8].

Each cylinder is independent of the others. A single cylinder represents a monoalphabetic substitution. Each algorithm has a period of N, where N is the number of letters in the language. The result is that we have N^k different substitutions, where k is the number of cylinders used. For example in figure 1, since there are three cylinders, the number of substitutions is 26³ = 17576.

The multiple stages of encryption makes cryptanalysis difficult than the previous ciphers.

3. Symmetric Cryptography

In conventional cryptography, also called secret-key or symmetric-key encryption, one key is used for both encryption and decryption. Symmetric-key encryption is one of the most common types of encryption prior to the emergence of public-key encryption. One of the most secure symmetric ciphers is the one-time pad.

4.1 One-time pad.

In this cryptosystems, a random key as large ass the message is shared between the send and receiver. To compute the ciphertext the sender calculates as follows:

C = M Å K

The cipher text C is computationally indistinguishable from a random string. To decrypt the ciphertext the receiver computes as follows:

C Å K = (M Å K) Å K = M Å (K Å K) = M Å 0 = M

The one-time pad is computationally efficient and secure. The problem is the sender and the receiver must share a very large secret key. The security of the cryptosystem depends on the fact that the secret key K is used only once.

4.2 The Data Encryption Standard (DES)

The Data Encryption Standard is an example of a conventional cryptosystem that is widely employed. DES uses a unique 56-bit key to transform a 64-bit plaintext message into a 64-bit ciphertext message [NBS77]. The three operations used in DES are, XOR, substitution, and permutation.

When used for communication, both sender and receiver must know the same secret key, which can be used to encrypt and decrypt the message. The same algorithm is used with the same key to convert ciphertext back to plaintext. The DES consists of 16 rounds of operations that mix the data and key together using permutation and substitution. The data and the key are scrambled so that every bit of the ciphertext depends on every bit of the data and every bit of the key. After sufficient rounds, there should be no correlation between the ciphertext and either the original data or key.

Figure 4. Key Generation for Simplified DES [7].

A simplified DES designed by Ed Schaefer from Santa Clara University is presented in figure 2. Only the key generations is shown. The key is 10 bits long. The two keys K₁ and K₂ are generated by applying first permutation followed by shifting some bits of the original key.

5. Public-Key Cryptography

The main problem in symmetric cryptosystem is getting the sender and receiver to agree on the secret key. If they are in separate locations, they must use some kind of communication medium to prevent the disclosure of the secret key. Anyone who intercepts the key can later decrypt the ciphertext using that key. The cryptosystem must deal with key management issues. Because all keys in a secret-key cryptosystem must remain secret, secret-key cryptography often has difficulty providing secure key management, especially in open systems with a large number of users.

The concept of public-key cryptography was introduced in 1976 by Whitfield Diffie and Martin Hellman to solve the key management problem. Each person gets a pair of different but related keys: the public key and the private key. The public key is published while the private key is kept secret. Anyone can send a confidential message using public information, but the message can only be decrypted by someone who has the private key. It is not possible to determine the secret key from the public key.

Public key cryptosystems ensure secrecy between communicating parties without the need to distribute secret keys. The most famous public key cryptosystem is that devised by Rivest, Shamir, and Adleman (RSA). With RSA, the scheme relies on the inability to factor n where n = pq, and p and q are large strong prime numbers. When p and q are of roughly equal length, the modulus becomes harder to factor. Factoring is assumed to be a hard problem upon which several public-key cryptosystems are based. But no known polynomial time algorithm [5] will get deciphering keys given the public information.

In RSA algorithm encryption and decryption take the following forms:

C º M^e mod n

M º C^d mod n            (C^d º (M^e)^d mod n º M^ed mod n )

where M - plaintext

C - ciphertext

e - encryption key

d - decryption key

Finding the factors of an RSA modulus means finding the decryption key. If an adversary finds out the factors of n, he or he can easily compute the decryption key, using the following steps:

     n = p ´ q

f(n) = (p – 1)(q – 1)

\ d º e^-¹ mod(f(n))

The RSA algorithm has the following characteristics [8]:

It is computationally infeasible to determine the decryption key given only knowledge of the algorithm and the encryption key.

Either of the two keys can be used for encryption, with the other used for decryption.

6. Pseudorandom Numbers

The numbers generated by software functions are referred to as pseudorandom numbers because the sequence is deterministic. Given a particular function and a seed value, the same sequence of numbers will be generated by the pseudorandom number function. If the pseudorandom number generation function is well designed, the sequence of numbers appears to be statistically random.

A random number [9] is a number chosen from some specified distribution such that selection of a large set of these numbers reproduces the underlying distribution. Such numbers are also required to be independent, so that there are no correlations between successive numbers. The quality of random numbers can be measured in a variety of ways. One common method is to compute the entropy, in a series of numbers. The higher the entropy in a series of numbers is, the more difficult it is to predict a given number on the basis of the preceding numbers in the series.

6.1 Pseudorandom Number Generation

It is difficult to produce true random numbers in software. Physical noise generator [9], such as pulse detectors pf ionizing radiation events, gas discharge tubes, and leaky capacitors, are of potential source. Other source could be hardware device, such as a noisy diode, the keystroke timings measured in microseconds, the spinning of hard disks. A good source of entropy is a radioactive source. The points in time at which a radioactive source decays is completely unpredictable.

6.2 Linear Congruential Generator (LCG)

The widely used pseudorandom number generator is the linear congruential generator. LCG produces a pseudorandom sequence of numbers according to the linear recurrence relation:

x_n = ax_n-1+ b mod m

x₀ is the secret seed, and a, b, and m are integers which characterize the generator. The modulus m is an upper bound on the number of different values x_ican take on. The next result, x_n+1, depends only on the previous integer, x_n. We can have a full period if a, band m are carefully chosen.

Many multiplicative linear congruential generators are descendants of the infamous RANDU (IBM), where m = 2³¹, a = 65539(2¹⁶ +3), b = 0, and x₀ = 1. The non-prime modulus was selected to facilitate the mod operation and the multiplier was selected primarily because of the simplicity of its binary representation. RANDU does not have a full period and it has some distinctly non-random characteristics.

Figure 5. Distribution of points obtained from RANDU

Figure 5 shows the distribution of points obtained from RANDU. This LCG produces a fine lattice structure suggesting a strong correlation between the numbers generated. The points all lined up on parallel planes. All linear congruential random number generators have this flaw.

If m is prime and c = 0, then for certain values of a (for example, a = 7^5 = 16807), the period of the generating function is m – 1, with only the value of 0 missing. For 32-bit arithmetic, a convenient prime would be 2^31 – 1. The bigger the modulus is the maximum the randomness we get. The sequence is still deterministic.

6.3 Linear Feedback Shift Register (LFSR)

A Linear Feedback Shift Register is a mechanism for generating a sequence of binary bits. The register consists of a series of cells that are set by most often the secret key. The behavior of the register is regulated by a clock and at each clocking instant, the contents of the cells of the register are shifted right by one position, and the exclusive-or of a subset of the cell contents is placed in the leftmost cell. One bit of output is usually derived during this update procedure.

Figure 6. Linear Feedback Shift Register (LFSR)
In an LFSR, the bits contained in selected positions in the shift register are combined in some sort of function and the result is fed back into the register's input bit. By definition, the selected bit values are collected before the register is clocked and the result of the feedback function is inserted into the shift register during the shift, filling the position that is emptied because of the shift.

Since there are 2ⁿ possible states for an n bit register, the maximum period for a LFSR is given as 2ⁿ - 1, since any LFSR initialized with a string of 0s will produce an infinite string of 0s as output.

6.4 Blum-Blum-Shub pseudorandom generator (BBS)

The Blum-Blum-Shub pseudorandom bit generator is a cryptographically secure pseudonumber bit generator (CSPRBG) under the assumption that integer factorization is intractable [3]. The assumption is that it is infeasible to factor the modulus m if it is the product of two distinct primes of approximately equal length that are both congruent to 3 modulo 4 (such composites are called Blum integers). The problem of factoring composites is widely assumed to be an infeasible task, and Blum integers are thought to be among the hardest composites to factor.

The BBS algorithm is as follows:

Generate two large secret and distinct random primes p and q,                                                                  p º q º 3 mod 4, and compute n = p´q.

Select a random integer s (the seed) in [1, n-1] such that gcd(s; n) = 1, and                                                compute x₀ = s² mod n

For i from 1 to k do the following:
      x_i = x²_i-1 mod n.

            z_i = the least significant bit of x_i.

The output sequence is z₁; z₂;, z₃ . . . , z_k.

Given n, but not p and q, the problem of determining whether or not x_i is a quadratic residue (Quadratic Residue Problem) is intractable. The security lies in the difficulty of factoring n, unpredictable to the left (previous bit) and unpredictable to the right (next bit). One disadvantage is that it is computationally intensive compared with the traditional techniques. This

7. Random Number Generation Using RULE 30

The security of many cryptographic systems depends upon the generation of random numbers. The number generated must be big and the probability of any particular value being selected must be sufficiently small. Random numbers obtained from a physical process are in principle good for cryptography. A noisy diode, the keystroke timings measured in microseconds, the spinning of hard disks, noise from the leak current of a diode or transistor, least significant bits of audio inputs, times between interrupts, etc. are all good sources of randomness.

The problem with these systems [7] is that there could be nonrandomness in the generated sequence. The underlying physical processes might be random, but many kinds of measuring instruments are between the **** computer and the physical process. These physical processes [9] seem random and appear to follow no definite rules, and to be governed merely by probabilities. However, all fundamental physical laws, at least outside of quantum mechanics, are thought to be deterministic. The randomness can be attributed to the effects of external random input.

As described earlier, the sequence generated by any multiplicative linear congruential generator shows some regularities. All multiplicative linear congruential generator produce lattice structure that suggest strong correlations between the numbers generated. In LFSR, when the maximum period of a LFSR is reached (the length of the output sequence), the sequence starts repeating. Thus, the sequence produced shows correlation or regularity. This property [3] is common to all sequences having the general form x_n+1 = f(x_n), when f transforms a finite set to itself.

Recognizing patterns [10] in the sequences generated by BBS random sequence generator, or factoring the modulus in RSA are assumed to require more than polynomial time. But their cryptanalysis is not known to be NP-Complete.

7.1 Intrinsic Randomness

In intrinsic randomness generation, every cell in effect actively contributes to the randomness. This randomness occurs as a consequence of the dynamics of the system, even though the initial conditions are simple. It may well be that this kind of chaos is central to physical phenomena such as fluid turbulence. Any changes done anywhere in the system affects all parts of the cellular automata, which is a good property for cryptography.

The patterns generated by rule 30 cellular automata yield a complicated pattern even if the ca is started from a single nonzero site. Whether there is randomness in the initial conditions, rule 30 , randomness s always generated. With rule 22, for example, there is randomness only if initial condition is randomness. Rule 30 does not repeat with any short period or show any obvious structure for almost all initial conditions.

Figure 7. Pattern produced by Rule 30

In rule 30 cellular automaton, every cell in effect actively contributes to the randomness. But in a system that just amplifies randomness from the environment. The more components that are involved in the process of amplification, the slower it will typically be to get each new piece of random output.

7.2 Properties of Rule 30 Cellular Automata

A cellular automaton is a discrete dynamical system. Space, time, and the states of the system are discrete. Each point in a regular spatial lattice, called a cell, can have any one of a finite number of states. The states of the cells in the lattice are updated according to a local rule. The state of a cell at a given time depends only on its own state one time step previously, and the states of its nearby neighbors at the previous time step. Thus the state of the entire lattice advances in discrete time steps. Since the update rule is simple, local, and discrete, it can be executed in easily-constructed massively-parallel hardware at extraordinary speeds, without round-off errors.

Rule 30 can expressed in binary as (0001110)₂.

The rules associated with rule 30a re:

111     110     101     100     011     010     001     000

0           0         0          1          1          1          1          0

Rule 30: a_i’ = a_i-1 Å (a_i Ú a_i+1)

Rule 30 – the pattern produced always repeats, but the period of repetition can rapidly increase rapidly with the size of the system. Keeping a limited number of digits at each step makes it inevitable that the sequences produced will eventually repeat.

8. Encryption Using Cellular Automata

The inherent properties of rule 30 to generate randomness so easily and simply would make it an excellent cipher candidate for cryptography. The security of a cryptographic system based on rule 30 relies on the difficulty of finding the seed from a time sequence of cell values. The idea [9] is to generate an encrypting sequence by sampling the evolution of the CA, starting from initial conditions that are define by a key.

Figure 8. Encryption using a repeated encryption key [9].

Figure 8 is a simple example of substitution cipher, where is the encryption sequence is repeated keyword as long as the message.

The encryption sequence is: 1000000

The message is:                   1100110011001100110011001100110011001100

(Encryption sequence)         0001000000100000010000001000000100000010

The ciphertext is:                   1101110011101100100011000100110111001110

Cryptanalysis is simple: one needs to use frequency analysis if one knows the language text used. Sometimes at the start of a message the presence of greetings might help to find out the relationship by comparing the message and its corresponding ciphertext.

The idea is to generate an encrypting sequence by sampling the evolution of the CA, starting from initial conditions that are define by a key. In figure 9, the key is the initial condition.

Figure 9. Encryption using rule 60 [9].

The cryptanalysis is based on the properties of additive cellular automata. In additive cellular automaton, the nested structures make it possible to recognize regularities. An additive cellular automaton is a cellular automaton whose rule is compatible with an addition of states. The addition is derived from modular arithmetic. Additive rules allow the evolution for different initial conditions to be computed independently, then the results combined by simply adding.

Figure 10. Evolution of rule 60 both downward and sideways [9].

In additive cellular automata the underlying rule allows to deduce the form of a particular row and column from the row above and from the column to the right respectively. One can find the form of a column in the cellular automaton, if given some segment of the encrypting sequence corresponding to a particular column. These information is enough to deuce the form of a row in the original cellular automata, and the latter leads to the key.

Figure 11 shows the encryption based on rule 30 cellular automata. Here the encryption sequence is taken as the center column of the automata starting with a simple initial condition, a single black color and a width of 15 cells.

The sequences appear random with respect to all analyses. In order to determine the color of a cell from the colors of its neighbor columns is the same as enumerating all possible initial conditions.

The security of a cryptographic system based on rule 30 relies on the difficulty of finding the seed from a time sequence of cell values. The problem of finding whether initial conditions exist to make the cellular automaton produce a certain outcome is NP-complete. Wolfram argues that the only problem factoring of integers is not NP-complete. Problems whose only solution algorithms have at least exponential order are called intractable [1]. NP problems are intractable with deterministic (conventional or serial) computers, but can be solved using non-deterministic (massively parallel) computers, in NP time.

Figure 11. Encryption using rule 30 [9].

Since rule 30 is not additive we cannot do cryptanalysis based on the additive rules. However, it has one sided additive property. A cryptanalysts needs two adjacent columns to the right instead of one to produce a single column.

The idea is to generate an encrypting sequence by sampling the evolution of the CA, starting from initial conditions that are define by a key. In additive cellular automaton, the nested structures make it possible to recognize regularities.

Figure 12. Evolution of rule 30 both downward and sideways [9].

In a) the ordinary evolution of row is shown. In b) show evolution to the to the left starting form two columns. This leads to some level of cryptanalysis because of the one-sided additive property of rule 30. c) shows how a second column can be filled from a row of cells to the right.

9. Conclusion

The security of many cryptographic systems depends upon the generation of random numbers. The number generated must be big and the probability of any particular value being selected must be sufficiently small.

Cryptography is extremely sensitive to the properties of random number generators. For cryptographic randomness [7]:

The sequence should be unpredictable. It must be computationally infeasible to predict the next random bit will be given complete knowledge of the algorithm generating the sequence and all the previous bits in the stream.

A sequence is real random if it cannot be reliably reproduced. If we can run the sequence generator twice with the exact same input we will get two completely unrelated random sequences.

We have looked at various techniques of generating random numbers. Random sequences produced by physical processes, since they are governed by physical laws, are deterministic. The sequence generated by LCG looks random. To avoid shorter period, one has to make the modulus greater. Having maximal repetition does not imply maximum randomness. One can essentially recover the seed from the sequence in polynomial time [7].

The cellular automaton generator’s behavior [7] appears to be quite random. However, there is a known plaintext attack against these generators. He continues to say that it is proved that the output of a CA can also be generated by LFSR of equal length and therefore no more secure.

10. References

[1] Epp, S. Discrete Mathematics with Applications. Pacific Grove. Brooks, 1995.

[2] Knudsen, J. Java Cryptography. Sebastopol: O’Reilly, 1998.

[3] Knuth, D. The Art of Computer Programming. Vol II. 1981.

[4] Menezes, A. van Oorschot, P., Vanstone, S. Handbook of Applied Cryptography CRC, 1997.

[5] National Bureau of Standards. Data Encryption Standard. US Department of

Commerce, FIPS, pub. 46, January 1977.
[6] Schaefer, E. An introduction to cryptography. Santa Clara University

eschaefer@scuacc.scu.edu

[7] Schneier, B. Applied Cryptography Wiley, 1996.

[8] Stallings, W. Cryptography and Internet Security: Principles and Practices. Upper Saddle River: Prentice, 1999.

[9] Wolfram, S. A New Kind of Science. Canada: Wolfram Media, 2002.

[10] Wolfram, S. Cellular Automata and Complexity. 1994.

All NKS-SJSU Applets are Open Source Shareware, Papers are Copyrighted to their Authors.