Angular shitstorm: What's your opinion on the controversial plans for Angular 2.0?
Getting Clojure

Clojure: a whistle stop tour

NealeSwinnerton
clojure.1

Neale Swinnerton guides us through sequences, lazy evaluation and higher order functions, without going around in circles. Or maybe we will?

 More and more people are getting on board with Clojure for functional programming. Used right, it can speed up development, and simply the coding process – and although it sits on the JVM, looks nothing like Java – which may seal the deal for many a developer. In this article from August’s JAX Magazine, Neale Swinnerton gives us a deep-dive into this increasingly popular language. 

Clojure is a dynamic programming language, that targets the Java Virtual Machine and, increasingly, other virtual machines such as the CLR and various Javascript engines.

 

It’s a dialect of Lisp and thus predominantly a functional language with a strong focus on solving the sorts of problems developers face day-to-day designing modern applications. It has powerful built-in immutable data structures and consistent abstractions for processing data in those structures.

This tutorial will introduce some of the key concepts in Clojure. We will start by defining a simple abstraction to encrypt and decrypt text. We will then develop several implementations of the abstraction and explore sequences, lazy evaluation, higher order functions and much more.We will meet common functions from the standard library and see how Clojure provides Alan Perlis’ idea that: It is better to have 100 functions operate on one data structure than 10 functions on 10 data structures.

Defining our data

We’re going to be dealing with text. Let’s start off by defining an alphabet of allowed characters. For clarity in the examples, let’s assume we have an alphabet which is a set containing only the 26 uppercase characters. This can be defined thus:

 

(def ALPHABET (into (hash-set) "ABCDEFGHIJKLMNOPQRSTUVWXYZ"))

 

LISTING 1

(deftype SubstitutionCipher [efn dfn]
 Encrypter ; provide implementation of Encrypter
 (encrypt [this text]
 (apply str (map efn (keep ALPHABET text))))
 (decrypt [this text]
 (apply str (map dfn (keep ALPHABET text)))))
LISTING 2

(defn- to-int
 “Convert char to an index starting at 0. if x is nil, return nil†
 [x]
 (when x
 (- (int x) (int A)))) ; subtract the ASCII value of 'A' from x


(defn- to-char
 “Convert an index into a char A-Z, if x is nil, return nil†
 [x]
 (when x
 (char (+ (int x) (int A))))) ; add the ASCII value of 'A' to x, convert to char
(defn- char-offset
 "Offset the character x by n, handling wrapping over the end of the alphabet."
 [n x]
 (-> (to-int x) (+ n) (mod 26) Math/abs to-char))

A few things are happening here. Firstly def is defining a var (a storage location basically) named ALPHABET. Next we are invoking the function into, passing two arguments, firstly (hash-set), which is a function call returning an empty set, and secondly a string containing the characters required. This is our first meeting with a very powerful feature in Clojure, namely sequences. There are dozens of standard functions that process sequences. In this case, into takes the given sequence (in this case the string, a sequence of characters) and adds each element to the set. Any duplicates would be eliminated. We could have generated the list of characters programmatically, but in this case it seems clearer to simply list them.

The choice of a set here is deliberate, because sets in Clojure have a useful property – in addition to being a data structure, they also act as a function that tests whether its ’ argument is a member of the set, if the element is a member the element is returned, else nil is return. In clojure everything is ‘truthy’ apart from nil and false.

 

(def abc (hash-set A B C)) ; a set containing chars A B C
(abc A) => A ; a ‘truthy’ value.
(abc Z) => nil ; a falsey value.

We’ll see how useful this is later.

Defining our abstraction

We need two operations. We define a protocol called Encrypter thus:

 

(defprotocol Encrypter
(encrypt [this text]) ; ‘this’ refers to the implementing type
(decrypt [this text]))

Protocols in Clojure are somewhat analogous to interfaces in Java, but avoid some of the drawbacks. For instance, it is possible to extend protocols to types that have no knowledge of the protocol.

Next we define a type which partially implements the mechanics of the process, as shown in Listing 1.

We are assuming here that we need to apply some transformation to each character of the plain text sequence to generate the cipher text sequence. Applying a function to each character of a sequence is achieved with the map function. map takes a function and a sequence and it returns a new sequence, where each element of the new sequence is the result of invoking the function on each element of the original sequence.

SubstitutionCipher is a type that take two arguments, both functions, an encrypt function (efn) and a decrypt function (dfn) and we implement each function in the protocol by map-ing the appropriate function over each element of the input sequence. We exercise a little defensive programming by passing the text through the keep function. keep is a variant of map, but it generates a sequence where each element is the function (the ALPHABET set in this case) applied to each element of the input sequence. Only values returning a non-nil value will be retained. This means we will ignore any characters not in our known alphabet. Finally we convert the cipher sequence back into a string for display.

Now we must define the transformation functions. A very simple transformation is ‘ROT13’ which imagines the alphabet arranged evenly around a circle, each character maps to the character opposite it on the circle. This has the useful property that the encryption is symmetric – the encrypt and decrypt functions are the same. We define some helper functions as shown in Listing 2: The char-offset function takes an offset and a character and returns the offsetted character, handling wrapping over the end of the alphabet. The implementation of char-offset uses the -> threading operator. This is some syntactic sugar, useful when applying multiple operations in turn. It translates to ‘take the first function call and thread it into the next call as the first argument, then take that and thread it into the next call. So the following occurs:

Now we must define the transformation functions. A very simple transformation is ‘ROT13’ which imagines the alphabet arranged evenly around a circle, each character maps to the character opposite it on the circle. This has the useful property that the encryption is symmetric – the encrypt and decrypt functions are the same. We define some helper functions as shown in Listing 2:

The char-offset function takes an offset and a character and returns the offsetted character, handling wrapping over the end of the alphabet. The implementation of char-offset uses the -> threading operator. This is some syntactic sugar, useful when applying multiple operations in turn. It translates to ‘take the first function call and thread it into the next call as the first argument, then take that and thread it into the next call. So the following occurs:

 

(-> (a) (b z) (c y) (d)) <=> (d (c (b a z) y)

So a function that created a rot13 encryptor/decryptor would be

 

(defn rot13-cipher []
(let [fn (partial char-offset 13)]
(->SubstitutionCipher fn fn)))

 

LISTING 3 

(deftype PolySubstitutionCipher [key]
 Encrypter
 (encrypt [this text]
 (apply str
 (map char-offset
 (keep (comp to-int ALPHABET)
 (cycle key)) text)))
 (decrypt [this text]
 (apply str
 (map char-offset
 (keep (comp negative to-int ALPHABET)
 (cycle key)) text))))

(defn vigenere-cipher [key]
 (->PolySubstitutionCipher key))

Here we create the actual encrypt/decrypt function and assign in the local name ‘fn’. partial is a function than returns a new function which calls the function char-offset with first argument ‘13’ and any other arguments passed in. Essentially we are capturing the offset such that the containing type doesn’t need to know about it. We then pass fn to the constructor of the ‘SubstitutionCipher’ type as both the encrypt and decrypt function. We can invoke the cipher thus:

 

(def rot13 (rot13-cipher))
(assert (= "PLAINTEXT" (decrypt rot13 (encrypt rot13 "PLAINTEXT"))))

when the encrypt or decrypt functions are called they are dispatched according to the type of their first argument, we could supply a different type or even extend the protocol to a java type that knew nothing about Clojure.

If we were so inclined we could generalise the offset by passing it as an argument. This would allow us to create an arbitrary caesar cipher, but note that we need to supply a different decrypt function because the encryption is not symmetric if the offset isn’t 13.

 

(defn caesar-cipher [n]
(let [efn (partial char-offset n)
dfn (partial char-offset (- n))]
(->SubstitutionCipher efn dfn)))

If a simple offset cipher wasn’t sufficient, we could generate a lookup table and create a deranged cipher:

 

(defn deranged-cipher []
(let [m (zipmap ALPHABET (shuffle ALPHABET))
m' (set/map-invert m)]
(->SubstitutionCipher (partial get m) (partial get m'))))

Here we use the zipmap function to create a new map (a key-> value data store, not to be confused with the map function earlier). zipmap takes two sequences. It returns a map where each element of the first sequence maps to the equivalent value in the second sequence. We then invert the map to generate the reverse lookup table. Note that the actual mappings are captured in the function, so only by having the particular instance of the SubstitutionCipher could you call decrypt and get correct results.

The ciphers discussed so far all share a weakness that each letter in the plaintext always maps to the same ciphertext letter.

 

(encrypt (rot13-cipher) "HELLO") => "URYYB")

This leaves them vulnerable to frequency analysis, where with knowledge of letter frequencies in the source language it’s possible to discover the mappings. The Vigenere cipher (aka ‘the undecipherable key’) addresses this by varying the map-ping for each letter in the plaintext, a so called polyalphabetic cipher. A repeating key phrase is used to determine the offset of each letter. To achieve this we need to alter our algorithm slightly. We do this with a new type, but implement the same protocol, as shown in Listing 3.

Note that in Listing 3, this type still implements the original protocol, so callers of this cipher are non the wiser.

We handle repeating the key phrase with the cycle function, which produces an infinite lazy sequence of the elements of key, returning to the start of the sequence as necessary. The ‘lazy’ part is crucial, each element is only realized as necessary.

A useful feature of the map function is that it can process more than one sequence at a time. Passing two sequences to mapresults in the mapping function being called with two arguments, one from each sequence. The function terminates when any input sequence is exhausted. Finally we compose a higher order function as the mapping function containing the functionality to check for membership of ALPHABET, convert to-int and (for decrypt) alter the direction of the offset. An invocation of this cipher, looks similar to those seen earlier…

 

(def venc (vigenere-cipher "PETER PIPER PICKED A PECK OF PICKLED PEPPER"))
(decrypt venc (encrypt venc "PLAINTEXT")) => "PLAINTEXT"
(encrypt venc "PLAINTEXT") => "EPTMEIMMX" ; letter frequency is obscured.

That concludes the whistle stop tour of Clojure, you’ve learnt about sequences, lazy evaluation and higher order functions. Find out more at the official Clojure site and the community maintained documentation website [1] [2].

References

[1] Official Clojure Site: http://clojure.org

[2] The Community maintained Documentation site: http://clojure-doc.org


Author
Neale Swinnerton has been writing code most days for the last 30 years or so. He is currently working at a startup using clojure to solve data science and machine learning problems to make the world a better place. Follow him on Twitter
Comments
comments powered by Disqus