I recently saw some fancy Perl one-liners and ed/sed scripts that do incredible things with little to no effort. This inspired me to create one myself: a one-liner that decrypts a substitution cipher. In short, a substitution cipher is one of the simpler (and less secure) ciphers. It works by replacing each letter of the alphabet with another one. Even though brute-forcing it is impractical due to its large keyspace (26! ≈ 2^88 keys), it is vulnerable to cryptanalysis. Counting how often each letter occurs in the ciphertext allows one to replace the n-th most frequent cipher letter with the n-th most frequent letter of a given language [1].
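To make the idea concrete, here is a minimal sketch of a substitution cipher built with tr. The key `example_key` is an arbitrary illustration of mine, not the key behind the ciphertext in this post, and the awk line merely estimates the keyspace size as log2(26!).

```shell
#!/bin/sh
# Toy substitution cipher; example_key is an arbitrary illustrative key (an assumption).
plain_alphabet='abcdefghijklmnopqrstuvwxyz'
example_key='qwertyuiopasdfghjklzxcvbnm'

# Encrypt: each letter of plain_alphabet becomes the letter at the same position in example_key.
ciphertext=$(printf 'hello world\n' | tr "$plain_alphabet" "$example_key")

# Decrypt: swap the two sets.
recovered=$(printf '%s\n' "$ciphertext" | tr "$example_key" "$plain_alphabet")

# Keyspace size in bits: log2(26!) computed as a sum of logarithms.
keyspace_bits=$(LC_ALL=C awk 'BEGIN { for (i = 2; i <= 26; i++) bits += log(i) / log(2); printf "%.1f", bits }')

printf '%s\n%s\n%s\n' "$ciphertext" "$recovered" "$keyspace_bits"
```

This should print `itssg vgksr`, then `hello world`, then `88.4`, which is where the figure of roughly 2^88 keys comes from.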
The following is the ciphertext, stored in the file 00_chiffrat2.txt:
ac tpr p hkamec qdng gpw af pxkan, pfg cei qndqlr tiki
rckalafm ceakciif. tafrcdf roace, ear qeaf fvbbnig afcd ear
hkiprc af pf iuudkc cd irqpxi cei sani tafg, rnaxxig jvaqlnw
cekdvme cei mnprr gddkr du saqcdkw opfradfr, cedvme fdc
jvaqlnw ifdvme cd xkisifc p rtakn du mkaccw gvrc ukdo ifcikafm
pndfm tace eao.
cei epnntpw roinc du hdanig qphhpmi pfg dng kpm opcr. pc dfi
ifg du ac p qdndvkig xdrcik, cdd npkmi udk afgddk garxnpw, epg
hiif cpqlig cd cei tpnn. ac gixaqcig raoxnw pf ifdkodvr upqi,
odki cepf p oicki tagi: cei upqi du p opf du phdvc udkcw-uasi,
tace p eipsw hnpql odvrcpqei pfg kvmmignw epfgrdoi uipcvkir.
tafrcdf opgi udk cei rcpakr. ac tpr fd vri ckwafm cei nauc.
isif pc cei hirc du caoir ac tpr ringdo tdklafm, pfg pc
xkirifc cei iniqckaq qvkkifc tpr qvc duu gvkafm gpwnamec
edvkr. ac tpr xpkc du cei iqdfdow gkasi af xkixpkpcadf udk
epci tiil. cei unpc tpr risif unamecr vx, pfg tafrcdf, ted tpr
ceakcw-fafi pfg epg p spkaqdri vnqik phdsi ear kamec pflni,
tifc rndtnw, kircafm risikpn caoir df cei tpw. df ipqe
npfgafm, dxxdraci cei nauc-repuc, cei xdrcik tace cei ifdkodvr
upqi mpbig ukdo cei tpnn. ac tpr dfi du cedri xaqcvkir teaqe
pki rd qdfckasig cepc cei iwir udnndt wdv phdvc teif wdv odsi.
ham hkdceik ar tpcqeafm wdv, cei qpxcadf hifipce ac kpf
The one-liner to decrypt the given ciphertext:
tr $(sed 's/[.[:space:],:\-]//g' 00_chiffrat2.txt | fold -w1 | sort | uniq -c | sort -nr | awk '{ printf "%s", $2 }') 'etoainshrldwfcugympbvkzq' < 00_chiffrat2.txt
To be fair, this is a pretty long one-liner and it also looks rather obfuscated. Let's break it down step by step and see what the individual parts do.
Since command substitution, $(…), is used, it is best to start there. This
is also the part that does the most work, namely calculating the frequency
of the letters:
sed 's/[.[:space:],:\-]//g' 00_chiffrat2.txt | fold -w 1 | sort | uniq -c | sort -nr | awk '{ printf "%s", $2 }'
Let's break this down even further and start with the sed part:
sed 's/[.[:space:],:\-]//g' 00_chiffrat2.txt
Here we use sed to sanitize the ciphertext with «s», the substitution
command of sed. We replace every matching character with nothing, thereby
deleting it. The general form is s/regexp/replacement/flags. Our bracket
expression deletes «.», «[:space:]» (whitespace), «,», «:» and «-». The hyphen
stands last inside the brackets, where it is taken literally; the «\» in front
of it is not an escape inside a bracket expression but a literal character, so
backslashes would be deleted as well. The flag «g» means global: every match
on a line is replaced, not just the first one.
This leaves us with the following output, which contains only letters:
actprphkamecqdnggpwafpxkanpfgceiqndqlrtikirckalafmceakciiftafrcdfroaceearqeaffvbbnigafcdearhkiprcafpfiuudkccdirqpxiceisanitafgrnaxxigjvaqlnwcekdvmeceimnprrgddkrdusaqcdkwopfradfrcedvmefdcjvaqlnwifdvmecdxkisifcprtakndumkaccwgvrcukdoifcikafmpndfmtaceeao
ceiepnntpwroincduhdanigqphhpmipfgdngkpmopcrpcdfiifgduacpqdndvkigxdrcikcddnpkmiudkafgddkgarxnpwepghiifcpqligcdceitpnnacgixaqcigraoxnwpfifdkodvrupqiodkicepfpoickitagiceiupqidupopfduphdvcudkcwuasitacepeipswhnpqlodvrcpqeipfgkvmmignwepfgrdoiuipcvkirtafrcdfopgiudkceircpakractprfdvrickwafmceinaucisifpcceihircducaoiractprringdotdklafmpfgpcxkirifcceiiniqckaqqvkkifctprqvcduugvkafmgpwnamecedvkractprxpkcduceiiqdfdowgkasiafxkixpkpcadfudkepcitiilceiunpctprrisifunamecrvxpfgtafrcdftedtprceakcwfafipfgepgpspkaqdrivnqikphdsiearkamecpflnitifcrndtnwkircafmrisikpncaoirdfceitpwdfipqenpfgafmdxxdraciceinaucrepucceixdrciktaceceiifdkodvrupqimpbigukdoceitpnnactprdfiducedrixaqcvkirteaqepkirdqdfckasigcepcceiiwirudnndtwdvphdvcteifwdvodsihamhkdceikartpcqeafmwdvceiqpxcadfhifipceackpf
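The effect of the sed step can be tried on a small input. The sketch below feeds the first two lines of the ciphertext through the same substitution; the variable names are mine. It also shows that sed works line by line, so the line breaks themselves survive.

```shell
#!/bin/sh
# Two sample lines copied from the start of the ciphertext.
sample=$(printf '%s\n' \
  'ac tpr p hkamec qdng gpw af pxkan, pfg cei qndqlr tiki' \
  'rckalafm ceakciif. tafrcdf roace, ear qeaf fvbbnig afcd ear')

# Same substitution as in the post: delete punctuation and whitespace within each line.
cleaned=$(printf '%s\n' "$sample" | sed 's/[.[:space:],:\-]//g')
printf '%s\n' "$cleaned"

# sed edits one line at a time, so two input lines remain two output lines.
line_count=$(printf '%s\n' "$cleaned" | wc -l | tr -d ' ')
printf 'lines: %s\n' "$line_count"
```

Because the newlines are kept, an empty line in the input stays an empty line in the output.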
For the next step we need to put each letter on its own line. To do this, we
pipe the output of the previous sed command into fold
using the «|» symbol. The -w 1 option sets the width to 1 column
instead of the default 80 columns. This is required to sort the letters alphabetically
afterwards:
sed 's/[.[:space:],:\-]//g' 00_chiffrat2.txt | fold -w 1
The output:
a
c
t
p
r
…
a
c
k
p
f
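A quick way to see what fold does is to run it on a five-letter string. This is only a sketch with a made-up input.

```shell
#!/bin/sh
# fold -w 1 breaks the input after every single column,
# so each character ends up on its own line.
letters=$(printf 'actpr\n' | fold -w 1)
printf '%s\n' "$letters"

# Five characters in, five lines out.
letter_lines=$(printf '%s\n' "$letters" | wc -l | tr -d ' ')
printf 'lines: %s\n' "$letter_lines"
```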
Now we can sort all letters of the cipher alphabetically by piping everything into sort:
sed 's/[.[:space:],:\-]//g' 00_chiffrat2.txt | fold -w 1 | sort
The result is:
a
a
a
a
…
x
x
x
x
x
Using uniq with the -c flag, we can count how often each letter occurs:
sed 's/[.[:space:],:\-]//g' 00_chiffrat2.txt | fold -w 1 | sort | uniq -c
Now every letter has its count displayed to its left. The lone 1 at the top has no letter next to it; it presumably counts an empty line in the input:
1
73 a
3 b
103 c
82 d
59 e
66 f
35 g
14 h
110 i
2 j
57 k
9 l
25 m
41 n
21 o
81 p
30 q
60 r
13 s
31 t
30 u
26 v
21 w
19 x
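The reason for sorting before counting is that uniq only merges adjacent identical lines. The sketch below uses a made-up five-letter input; the awk and paste calls are only there to print the counts compactly.

```shell
#!/bin/sh
letters=$(printf '%s\n' a b a b a)

# Without sorting, no two equal lines are adjacent, so every run has length 1.
unsorted=$(printf '%s\n' "$letters" | uniq -c | awk '{ print $1 ":" $2 }' | paste -s -d ' ' -)

# After sorting, equal letters sit next to each other and are counted together.
sorted=$(printf '%s\n' "$letters" | LC_ALL=C sort | uniq -c | awk '{ print $1 ":" $2 }' | paste -s -d ' ' -)

printf '%s\n%s\n' "$unsorted" "$sorted"
```

The first line printed is `1:a 1:b 1:a 1:b 1:a`, the second `3:a 2:b`.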
Let's sort the list by descending frequency, again using sort, but this time
with the flags -n (compare numerically) and -r (reverse the order, so the largest count comes first):
sed 's/[.[:space:],:\-]//g' 00_chiffrat2.txt | fold -w 1 | sort | uniq -c | sort -nr
Sorted with the highest frequency at the top:
110 i
103 c
82 d
81 p
73 a
66 f
60 r
59 e
57 k
41 n
35 g
31 t
30 u
30 q
26 v
25 m
21 w
21 o
19 x
14 h
13 s
9 l
3 b
2 j
1
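Letters with equal counts (such as u and q with 30 each) end up in whatever order the sort implementation falls back to. The following sketch uses a few counts copied from the table and shows, as an optional variation that the one-liner does not use, how to break ties explicitly with sort keys.

```shell
#!/bin/sh
# A few counts shaped like the uniq -c output (values copied from the table).
counts=$(printf '%s\n' ' 30 q' '110 i' ' 30 u' '103 c')

# -n compares the leading number by value, -r puts the largest value first.
by_count=$(printf '%s\n' "$counts" | LC_ALL=C sort -nr | awk '{ print $1 }' | paste -s -d ' ' -)

# Optional: break ties explicitly (here alphabetically by letter) instead of
# relying on the fallback ordering of the sort implementation.
tie_broken=$(printf '%s\n' "$counts" | LC_ALL=C sort -k1,1nr -k2,2 | awk '{ print $2 }' | paste -s -d ' ' -)

printf '%s\n%s\n' "$by_count" "$tie_broken"
```

Since tied letters have nearly the same frequency anyway, their relative order is exactly the kind of position that may need manual swapping later.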
The last step in our letter frequency analysis is to create a string on a single line. This string lists the letters in order, starting with the most frequent and ending with the least frequent. Here we leverage the power of AWK, which is a programming language in its own right.
sed 's/[.[:space:],:\-]//g' 00_chiffrat2.txt | fold -w 1 | sort | uniq -c | sort -nr | awk '{ printf "%s", $2 }'
This gets us a letter frequency representation:
icdpafrekngtuqvmwoxhslbj
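The awk step can be checked in isolation. The sketch below uses four lines shaped like the sorted counts; the last one mimics the count that has no letter next to it.

```shell
#!/bin/sh
# Lines shaped like the sorted uniq -c output; the last one has a count but no letter.
sorted_counts=$(printf '%s\n' '110 i' '103 c' ' 82 d' '  1 ')

# printf without "\n" glues the second field of every line together.
# A line that has only a count has an empty second field and contributes nothing.
order=$(printf '%s\n' "$sorted_counts" | awk '{ printf "%s", $2 }')
printf '%s\n' "$order"
```

This prints `icd`. Note that the awk program itself never emits a newline, so the resulting string ends without one.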
Finally, we can use tr, which lets us substitute each letter from the first string (the calculated frequency order of the ciphertext) with the letter at the same position in the second string (the relative letter frequency order of English). For the sake of readability, let's insert the calculated letter frequency string instead of displaying the entire command substitution:
tr 'icdpafrekngtuqvmwoxhslbj' 'etoainshrldwfcugympbvkzq' < 00_chiffrat2.txt
The decrypted text:
it was a bright cold day in april, and the clocks were striking thirteen. winston smith, his chin nuzzled into his breast in an effort to escape the vile wind, slipped quickly through the glass doors of victory mansions, though not quickly enough to prevent a swirl of gritty dust from entering along with him.
the hallway smelt of boiled cabbage and old rag mats. at one end of it a coloured poster, too large for indoor display, had been tacked to the wall. it depicted simply an enormous face, more than a metre wide: the face of a man of about forty-five, with a heavy black moustache and ruggedly handsome features. winston made for the stairs. it was no use trying the lift. even at the best of times it was seldom working, and at present the electric current was cut off during daylight hours. it was part of the economy drive in preparation for hate week. the flat was seven flights up, and winston, who was thirty-nine and had a varicose ulcer above his right ankle, went slowly, resting several times on the way. on each landing, opposite the lift-shaft, the poster with the enormous face gazed from the wall. it was one of those pictures which are so contrived that the eyes follow you about when you move. big brother is watching you, the caption beneath it ran.
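The positional mapping done by tr can be verified on a fragment of the ciphertext. This sketch reuses the two strings from the command above; only the variable names are mine.

```shell
#!/bin/sh
cipher_order='icdpafrekngtuqvmwoxhslbj'   # frequency order computed from the ciphertext
english_order='etoainshrldwfcugympbvkzq'  # target order used in the post

# tr replaces the n-th character of the first set with the n-th character of the second.
decrypted=$(printf 'ac tpr p hkamec qdng gpw af pxkan\n' | tr "$cipher_order" "$english_order")
printf '%s\n' "$decrypted"
```

For example, «a» is the fifth letter of the first string and «i» the fifth of the second, so «ac» becomes «it»; the whole fragment comes out as `it was a bright cold day in april`.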
Notes
Keep in mind that the two strings representing the letter frequencies need to be the same length when using tr, because tr maps characters by position. Otherwise the mapping won't line up. The ordering I used differs slightly from the published relative letter frequencies of the English language: I swapped letters that have similar frequencies. Here the theory doesn't match reality. Using a longer text might solve this.
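A small guard can catch a length mismatch before tr is run. The sketch below is only an illustration: `full_english_order` is one commonly cited 26-letter ordering and is an assumption of mine, not the hand-tuned string used above, so the trimmed result will not decrypt this particular ciphertext cleanly.

```shell
#!/bin/sh
cipher_order='icdpafrekngtuqvmwoxhslbj'
# One commonly cited English frequency ordering (an assumption; published orderings differ slightly).
full_english_order='etaoinshrdlcumwfgypbvkjxqz'

# Trim the reference ordering to as many letters as actually occur in the ciphertext.
english_order=$(printf '%s\n' "$full_english_order" | cut -c "1-${#cipher_order}")

# ${#var} expands to the length of the string stored in var.
if [ "${#cipher_order}" -eq "${#english_order}" ]; then
  printf 'lengths match: %s\n' "${#cipher_order}"
else
  printf 'length mismatch: %s vs %s\n' "${#cipher_order}" "${#english_order}" >&2
fi
```

The ciphertext only contains 24 distinct letters, which is why both strings in the one-liner are 24 characters long rather than 26.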
[1] Paar, C. (2016). Kryptografie verständlich. Springer-Verlag, Berlin, Heidelberg, pp. 7-10. ISBN: 9783662492970. DOI: 10.1007/978-3-662-49297-0.