Tea, milk, no
sugar. Thanks

Stuart Rutter, a developer at Square Enix in London. Enjoys cryptography, search algorithms, competitive coding, and start-up ventures.

Javascript
100%

PHP 5.x
100%

Objective-C (iOS)
100%

HTML5 & CSS3
100%

Node JS
90%

NRDBMS
90%

RDBMS
90%

Adobe CS
80%

Python
60%

Ruby
60%

Cryptography
50%

Chess
30%

Disclaimer: Howsoever expressed or interpereted, the opinions found herein do not represent those of my professional employer.

Pigeon message cipher identified

March 17th 2013

I was at the National Archives in Kew recently with the intention of researching the British WW2 cipher machine TypeX. Firstly to see what type of message indicator was used, and secondly to find out more about the structure of messages enciphered using a TypeX.

Furtunately WO 208/5109 came up with a few answers.

Bearing in mind that these are setup and usage instructions for the TypeX Mark VI - which was a portable hand-wound derivative of the TypeX.

before the message is commenced, a five-letter message setting on the scrambler, as indicated by the letters appearing at the windows in the cover of the scrambler, should be selected and the corresponding disguised indicator noted at the top of the message form.
Using the left-hand while turning the handle with the right, type the message exactly as it appears on the message form, giving the address, serial number, date and text of the message, the space key being depressed between every word of the plain language message. The message printer automatically spaces after every five letters of cypher have been printed. At the conclusion of the message sufficient letters must be added to the cypher message to make the last group into a complete five-letter group. Therefore note the counter reading and depress the space key, letters shift key, or figure shift key or a combination of the these until the last figure shown on the counter is either 5 or 0 (both coloured red). No use should be made of other keys for this purpose as their printing in the decyphered message might possibly lead to confusion. Finally, stick the tape on the pad and write the message setting in the manuscript as the first and final groups of the message.

The last bit of this extract is the important bit - it details that when constructing a cipher message using the TypeX, the first and last groups of the message are a five letter indicator. Just like we see in our pigeon message.

Preparation of message for transmission – Withdraw and detach the tape from the message printer and insert in manuscript, as the first group, the disguised message setting used at the commencement of the message. If the message is complete in one section write this disguised message setting and groups of five Q's must be inserted appropriately in the spaces left for that purpose. Each section must begin and end with the disguised message setting, in manuscript, appropriate to it. The tape should then be gummed to a message form so that there are ten groups in each line. This is done in order to facilitate the counting of the total number of groups.

So, I am fairly confident in saying that we are dealing with a message enciphered using a TypeX. What makes this all the more interesting is that I cannot find reference that the TypeX had ever been cracked by the enemy. So for someone to break this code, it would also carry the prize of being the first to crack a message enciphered using a TypeX.

At the very least, we can be happy about the fact that it is unlikely a one-time-pad cipher, because then we truly could presume it indecipherable.

comments powered by Disqus

TypeX Emulator in Javascript

March 13th 2013

This is a Javascript implementation of the British Rotor Cipher Machine - TypeX.

The code for this I have put on GitHub, usage is as follows.

Enciphering

var demo1 = TypeX.init('01234', '00100', 'AOAKN', 'This is the string to be encoded');
//demo1 == KLHESNYNIMQAZHIZROBHDZHKWRQFFRTY

Deciphering

var demo2 = TypeX.init('01234', '00100', 'AOAKN', 'KLHESNYNIMQAZHIZROBHDZHKWRQFFRTY');
//demo2 == THISXISXTHEXSTRINGXTOXBEXENCODED

Brute Force Decipher

The code in brute_force.js loops through every possible rotor position and orientation. It then inserts (in batches) the results into a MongoDB collection.

To run this Javascript you will need Node with the native Mongo module installed.

//Rotor orientations (5^2)
for (r1 = 0; r1 <= 1; r1++) {
for (r2 = 0; r2 <= 1; r2++) {
for (r3 = 0; r3 <= 1; r3++) {
for (r4 = 0; r4 <= 1; r4++) {
for (r5 = 0; r5 <= 1; r5++) {

//Rotor positions (5^7)
for (p1 = 0; p1 <=7; p1++) {
for (p2 = 0; p2 <=7; p2++) {
for (p3 = 0; p3 <=7; p3++) {
for (p4 = 0; p4 <=7; p4++) {
for (p5 = 0; p5 <=7; p5++) {

rotorPos = p1.toString() + p2 + p3 + p4 + p5;
rotorOri = r1.toString() + r2 + r3 + r4 + r5;

//Continue if rotor positions are not unique
if((/(\d)(?=.*\1)/).test(rotorPos)) continue;

var cipher_text = TypeX.init(rotorPos, rotorOri, 'AOAKN', 'KLHESNYNIMQAZHIZROBHDZHKWRQFFRTY');

The brute force method assumes that we know the indicator key.

In the above example, there are 215,000 possible rotor settings, after inserting them into our Mongo collection we can then reduce them by searching for cribs or by ordering them by a common bigram / trigram count.

Searching for ngrams

The ngram function I have included in brute_force.js uses an array of common English bigrams / trigrams that were outlined by an updated Mayzner frequency list, available here.

ngram = function(str,n) {
   var ngrams = [[],[],
                       ["th","he","in","er","an"], // 2grams
                       ["the","and","ing","ion","tio"] // 3grams
                ];

   var total = 0;
   for (var i in ngrams[n]) {
       var re = new RegExp(ngrams[n][i], 'gi');
       if((didMatch = str.match(re))) {
           total += didMatch.length;
       }
   }
   return total;

};

By adding this count to our collection when doing a brute force attack, we can more easily distinguish between a string that contains predominantly valid English words and one that contains random letters simply by sorting the results by their ngram count.

Further work

The majority of this code is based on that of the Enigma - their workings are very similar, with the exception of the rotor stepping intervals, and the rotor / stator arrangement. Also, rotors could be inserted in reverse.

A very helpful example, and one which I used to base my code on is available here (in C), which was a help in finding the rotor stepping intervals and reflector setup. However, without much solid documentation on the TypeX to go off we cannot assume that these are correct. For example, the reflector settings used: YRUHQSLDPXNGOKMIEBFZCWVJAT, is that of the Enigma Reflector "B". I know that the TypeX was very similar to the Enigma but I very much doubt it would have used the same reflector configuration.

Without a TypeX wiring diagram we cannot be sure what the correct settings are. Permuting through all possible rotor configurations is the easy bit, although if we cannot determine the reflector setup and stepping intervals of the rotors then the results would surely end up being spurious.

WW2 Pigeon Code - a machine cipher? Quite likely..

March 1st 2013

I think that the biggest hint toward finding out which method of encryption had been used to encode our pigeon message is the five letter indicator group seen at the beginning and end of the cipher text. That along with a 26 letter alphabet would allow us to tentatively rule out some possibilities.

Nick Pelling has, as ever, been hot on the trail of such enciphering possibilities, and has come to the probable conclusion that it isn't a Slidex.

I am now under the impression, however, that a cipher machine had been used... for reasons I shall try and articulate below.

Cipher machines that used a 5 letter indicator are the American ECM Mark II, and British TypeX.

The M-209 possibly used a five letter indicator by using the first letter twice to set the first two rotors. In the Operating Instructions for CSP-1500 (aka M-209):

In choosing a message indicator the operator writes down a random selection of five letters, subject to the limitations of the letters actually present on the key wheels. Since the first of these five letters governs the setting of the first two wheels, counting from left to right, any letter from A to Z, except W, can be used as the first letter of the indicator. For the second letter, governing wheel number 3, the choice ranges from A to X, omitting W. For wheel number 4 (third letter) any letter front A to U may be selected, for wheel number 5 (fourth letter) any letter from A to S, and for wheel number 6 (fifth letter) any letter from A to Q. An example of such a message indicator would be HILQD and the wheels would be adjusted so that the letters HHILQD would be lined up along the bench mark.

From the operating instructions of the ECM Mark II:

The message indicator consists of five letters selected at random by the operator. Bona fide five-letter words will not be used as message indicators even though such words occur by chance. The message indicator will be different for each message or part. When it is necessary, as in the case of a service, to reencipher a particular message or part, or any portion thereof, a different message indicator will be selected. The message indicator is used to determine the message rotor alignment.

The TypeX was similar to the ECM in that it had five rotors, but the last two rotors (stators) did not advance, so it was in effect equivalent to a three rotor Enigma with the added complexity of variable stepping between rotors, and rotors could be inserted in reverse.

Image reference: Kew HS 7/71.

TypeX models (previous to 1941) didn't have a plugboard, instead the two stators served as this complexity. There were also portable versions for use in the field which were similar to the American M-209.

The TypeX was used by the British, similarly America had their ECM or SIGABA cipher machine, yet for Allied operations where the Allied Forces were to share intelligence there was no common cipher machine, one that could encode / decode each other's messages.

The British wanted the Americans to use their TypeX, yet the Americans didn't want to share their top secret ECM Mark II with the British. The ECM was never broken during WWII, which goes to stand why they were so protective of their cipher machine.

So to bridge the gap between the TypeX / ECM and as a way of sharing Allied intelligence another cipher machine was developed called the Combined Cipher Machine (CCM). The CCM was developed in America and was in use by the British and Allied forces by April 1944. CCM was therefore built to be compatible with both the British TypeX and American ECM via adaptations to their existing machines, borne the TypeX 23 and the ASAM 5.

An interesting snippet of information on the TypeX offered by the Offices of The Cypher Policy Board, which is a retrospect to the use of cipher machines during WWII (Kew HS 7/71):

“we have fought this war on one type of Cypher Machine only, namely, TypeX”

Unlike SOE agents, who relied heavily on WOK's, Playfair, and One-Time-Pads, it was imperative that the Army had quick methods of encoding and relaying information, one without the time consuming and error prone nature of these hand written substitution ciphers. Even the double transposition cipher fell under the category of being too cumbersome to be practical.

For low-grade ciphers like double transposition and Linex, a numerical indicator was used which referenced keys held by both parties, similarly a WOK, Playfair, and OTP used very different indicators than what we see in this message.

A five letter indicator such as AOAKN therefore fits the bill for a five rotor cipher machine quite nicely. This would indicate the initial rotor configuration (similar to that of ECM's message indicator above), and in combination with the time stamp seen at the end of the message which would be used by the receiver to look up the rotor orientation and rotor positions from a pre-compiled book of day keys.

The TypeX (or TypeX with CCM adaptation) is therefore a largely plausible candidate which could have been used to encode our message.

As the below image suggests (Kew HW 40/89), there could easily be added complexities to this message, e.g. a disguised indicator, or letter shifts. We also cannot be sure which model of TypeX this was encoded on.

A cryptographic book mystery.

February 11th 2013

Having now read a number of books on cryptography, it seems that a few authors in particular have left hidden mysteries within their pages.

Dictionary codes, or numbered / lettered pages can be seen at regular intervals, so far I have come across four books which contain these intriguing alphanumeric patterns.

The four books that I have found to contain these dictionary codes are:

  1. Fletcher Pratt – Secret And Urgent, The story of codes and ciphers (1939)
  2. John Laffin - Codes and Ciphers (1964)
  3. Alexander D'Agapeyeff – Codes and Ciphers (1939 and 1949 editions)
  4. Alexander D'Agapeyeff & E. C. R. Hadfield – Maps (1942)

I'm sure that there are other books out there that contain these dictionary codes, although the reason that they are there is indeed puzzling. Not in any of the aforementioned titles is there an explanation, or hint toward their use by their authors.

Could they be some form of easter egg, set as a challenge for the reader?

Maybe it was the in-thing to do, when writing a book on the subject of cryptography, that only certain circles of authors were privy to.

I am no antiquarian, so to my untrained eye they could be some form of appendices, however, there is a consistency of between 8 or 16 pages from one letter to the next. In D'Agapeyeff's book Codes and Ciphers, the first edition has a spacing of 8 pages, whereas subsequent editions have 16.

Do you know of any other cryptographic books containing these mysterious dictionary codes? Or do you have any information as to why they exist?

Here are some examples:

Codes and Ciphers by John Laffin (1964)

A numerical postfix, 2-9, at intervals of 16 pages, starting on page 7.

Codes and Ciphers by Alexander D'Agapeyeff (1939)

An alphabetical postfix, B-U (omitting J), at intervals of 8 pages, starting on page 9.

Codes and Ciphers by Alexander D'Agapeyeff (1949, revised and reset edition)

An alphabetical postfix, B-I, at intervals of 16 pages, starting on page 17.

Secret and Urgent, The story of Codes and Ciphers by Fletcher Pratt (1939)

An alphabetical postfix, B-S, (omitting J) at intervals of 16 pages, starting on page 17.

Maps by Alexander D'Agapeyeff & E.C.R. Hadfield (1942)

An alphabetical postfix, B-I, at intervals of 16 pages, starting on page 17.

D'Agapeyeff Cipher not solved, FACT!

January 4th 2013

The D'Agapeyeff cipher is a cryptography challenge set by Alexander D'Agapeyeff in First Editions of his book Codes and Ciphers. It has since been removed from later editions and is yet to be broken.

75628 28591 62916 48164 91748 58464 74748 28483 81638 18174 74826 26475 83828 49175 74658 37575 75936 36565 81638 17585 75756 46282 92857 46382 75748 38165 81848 56485 64858 56382 72628 36281 81728 16463 75828 16483 63828 58163 63630 47481 91918 46385 84656 48565 62946 26285 91859 17491 72756 46575 71658 36264 74818 28462 82649 18193 65626 48484 91838 57491 81657 27483 83858 28364 62726 26562 83759 27263 82827 27283 82858 47582 81837 28462 82837 58164 75748 58162 92000

Upon trawling the internet for clues and attempts at breaking this cipher, I was led to a blog article Alexander D'Agapeyeff Cryptogram is SOLVED, which to all intents and purposes seems like a reasonable attempt, however the author couldn't have been more wrong!

Here is an explanation of the authors workings and why it is ultimately flawed.

Firstly he has split the cipher digits into bigrams:

75 62 82 85 91 62 91 64 81 64 91 74 85 84 64 74 74 82 84 83 81 63 81 81 74 74 82 62 64 75 83 82 84 91 75 74 65 83 75 75 75 93 63 65 65 81 63 81 75 85 75 75 64 62 82 92 85 74 63 82 75 74 83 81 65 81 84 85 64 85 64 85 85 63 82 72 62 83 62 81 81 72 81 64 63 75 82 81 64 83 63 82 85 81 63 63 63 04 74 81 91 91 84 63 85 84 65 64 85 65 62 94 62 62 85 91 85 91 74 91 72 75 64 65 75 71 65 83 62 64 74 81 82 84 62 82 64 91 81 93 65 62 64 84 84 91 83 85 74 91 81 65 72 74 83 83 85 82 83 64 62 72 62 65 62 83 75 92 72 63 82 82 72 72 83 82 85 84 75 82 81 83 72 84 62 82 83 75 81 64 75 74 85 81 62 92 00 0

And then by doing a frequency count of the first and second digits we can see that:

First:

  • 8 (80)
  • 6 (56)
  • 7 (41)
  • 9 (18)
  • 0 (4)

Second:

  • 2 (46)
  • 5 (45)
  • 4 (43)
  • 3 (39)
  • 1 (33)

I have listed these in frequency order, however the order used to construct a grid in the authors implementation was 8,6,7,9,0 and 2,4,5,1,3. I do not know what the reasoning behind this order was, but for the time being I'm going to roll with it.

 24513 8ABCDE 6FGHIKI/J Combined 7LMNOP 9QRSTU 0VWXYZ

Using this grid, we then take the original cipher and substitute the digit pairs for their corresponding letters in the grid (e.g. 7,5 = N):

NFACT FTGDG TMCBG MMABE DKDDM MAFGN EABTN MHENN NUKHN DKDNC NNGFA QCMKA NMEDH DBCGC GCCKA LFEFD DLDGK NADGE KACDK KKWMD TTBKC BHGCH FRFFC TCTMT LNGHN OHEFG MDABF AGTDU HFGBB TECMT DHLME ECAEG FLFHF ENQLK AALLE ACBNA DELBF AENDG NMCDF Q

Coincidentally (in my opinion), this generates a cipher text containing the keyword FACT. Tendentious as it may be, it seems like a genuine enough keyword, so lets go with it.

The next step of working was to overlay the word FACT with the above cipher text and extract the corresponding values (ignoring the first block of letters assumed to be N = null, followed by FACT the keyword):

FACT FTGD GTMC BGMM ABED KDDM MAFG

I've not listed them all for the sakes of brevity. From this we can see that column F is: F, G, B.. Column A: T, T, G.. etc.

The next step is where this working is significantly flawed. By laying the keyword FACT on top of the alphabet and then transposing the corresponding letter from our columns F,A,C,T.

FACT FACT FACT FACT FACT FACT FA ABCD EFGH IJKL MNOP QRST UVWX YZ

The author has then taken the columns F,A,C,T, and read the corresponding value from the above alphabet in a rather peculiar way:

F:

F = A1st letter, 1st group G = E1st letter, 2nd group B = I1st letter, 3rd group

A:

T = B2nd letter, 1st group T = F2nd letter, 2nd group G = J2nd letter, 3rd group

C:

G = C3rd letter, 1st group M = G3rd letter, 2nd group M = K3rd letter, 3rd group

T:

D = D4th letter, 1st group C = H4th letter, 2nd group M = L4th letter, 3rd group

When we reconstruct this it is no surprise that we then get the alphabet in return! Oh dear me.

FTGDABCD GTMCEFGH