Getting Clojure

Clojure: a whistle stop tour

NealeSwinnerton
clojure.1

Neale Swinnerton guides us through sequences, lazy evaluation and higher order functions, without going around in circles. Or maybe we will?

 More
and more people are getting on board with Clojure for functional
programming. Used right, it can speed up development, and simply
the coding process – and although it sits on the JVM, looks nothing
like Java – which may seal the deal for many a developer. In this
article from August’s JAX Magazine, Neale Swinnerton gives us a
deep-dive into this increasingly popular language. 

Clojure is a dynamic programming language, that targets the Java
Virtual Machine and, increasingly, other virtual machines such as
the CLR and various Javascript engines.

 

It’s a dialect of Lisp and thus predominantly a
functional language with a strong focus on solving the sorts of
problems developers face day-to-day designing modern applications.
It has powerful built-in immutable data structures and consistent
abstractions for processing data in those structures.

This tutorial will introduce some of the key
concepts in Clojure. We will start by defining a simple abstraction
to encrypt and decrypt text. We will then develop several
implementations of the abstraction and explore sequences, lazy
evaluation, higher order functions and much more.We will meet
common functions from the standard library and see how Clojure
provides Alan Perlis’ idea that: It is better to have 100
functions operate on one data structure than 10 functions on 10
data structures.

Defining our data

We’re going to be dealing with text. Let’s start
off by defining an alphabet of allowed characters. For clarity in
the examples, let’s assume we have an alphabet which is a set
containing only the 26 uppercase characters. This can be defined
thus:

 

(def ALPHABET (into (hash-set) "ABCDEFGHIJKLMNOPQRSTUVWXYZ"))

 

LISTING 1

(deftype SubstitutionCipher [efn dfn]
 Encrypter ; provide implementation of Encrypter
 (encrypt [this text]
 (apply str (map efn (keep ALPHABET text))))
 (decrypt [this text]
 (apply str (map dfn (keep ALPHABET text)))))
LISTING 2

(defn- to-int
 “Convert char to an index starting at 0. if x is nil, return nil†
 [x]
 (when x
 (- (int x) (int A)))) ; subtract the ASCII value of 'A' from x


(defn- to-char
 “Convert an index into a char A-Z, if x is nil, return nil†
 [x]
 (when x
 (char (+ (int x) (int A))))) ; add the ASCII value of 'A' to x, convert to char
(defn- char-offset
 "Offset the character x by n, handling wrapping over the end of the alphabet."
 [n x]
 (-> (to-int x) (+ n) (mod 26) Math/abs to-char))

A few things are happening here. Firstly
def is defining a var (a storage location basically) named
ALPHABET. Next we are invoking the function into, passing two
arguments, firstly (hash-set), which is a function call
returning an empty set, and secondly a string containing the
characters required. This is our first meeting with a very powerful
feature in Clojure, namely sequences. There are dozens of standard
functions that process sequences. In this case, into takes
the given sequence (in this case the string, a sequence of
characters) and adds each element to the set. Any duplicates would
be eliminated. We could have generated the list of characters
programmatically, but in this case it seems clearer to simply list
them.

The choice of a set here is deliberate, because
sets in Clojure have a useful property – in addition to being a
data structure, they also act as a function that tests whether its
’ argument is a member of the set, if the element is a member the
element is returned, else nil is return. In clojure everything is
‘truthy’ apart from nil and false.

 

(def abc (hash-set A B C)) ; a set containing chars A B C
(abc A) => A ; a ‘truthy’ value.
(abc Z) => nil ; a falsey value.

We’ll see how useful this is later.

Defining our abstraction

We need two operations. We define a protocol
called Encrypter thus:

 

(defprotocol Encrypter
(encrypt [this text]) ; ‘this’ refers to the implementing type
(decrypt [this text]))

Protocols in Clojure are somewhat analogous to
interfaces in Java, but avoid some of the drawbacks. For instance,
it is possible to extend protocols to types that have no knowledge
of the protocol.

Next we define a type which partially implements
the mechanics of the process, as shown in Listing 1.

We are assuming here that we need to apply some
transformation to each character of the plain text sequence to
generate the cipher text sequence. Applying a function to each
character of a sequence is achieved with the
map
function. map takes a function and a
sequence and it returns a new sequence, where each element of the
new sequence is the result of invoking the function on each element
of the original sequence.

SubstitutionCipher is a type that take
two arguments, both functions, an encrypt function (efn) and a
decrypt function (dfn) and we implement each function in the
protocol by
map-ing the appropriate
function over each element of the input sequence. We exercise a
little defensive programming by passing the text through the

keep function. keep is a
variant of map, but it generates a sequence where each element is
the function (the ALPHABET set in this case) applied to each
element of the input sequence. Only values returning a non-nil
value will be retained. This means we will ignore any characters
not in our known alphabet. Finally we convert the cipher sequence
back into a string for display.

Now we must define the transformation functions.
A very simple transformation is ‘ROT13’ which imagines the alphabet
arranged evenly around a circle, each character maps to the
character opposite it on the circle. This has the useful property
that the encryption is symmetric – the encrypt and decrypt
functions are the same. We define some helper functions as shown in
Listing 2: The char-offset function takes an offset and a
character and returns the offsetted character, handling wrapping
over the end of the alphabet. The implementation of
char-offset uses the -> threading operator. This is
some syntactic sugar, useful when applying multiple operations in
turn. It translates to ‘take the first function call and thread it
into the next call as the first argument, then take that and thread
it into the next call. So the following occurs:

Now we must define the transformation functions.
A very simple transformation is ‘ROT13’ which imagines the alphabet
arranged evenly around a circle, each character maps to the
character opposite it on the circle. This has the useful property
that the encryption is symmetric – the encrypt and decrypt
functions are the same. We define some helper functions as shown in
Listing 2:

The char-offset function
takes an offset and a character and returns the offsetted
character, handling wrapping over the end of the alphabet. The
implementation of
char-offset uses the ->
threading operator. This is some syntactic sugar, useful when
applying multiple operations in turn. It translates to ‘take the
first function call and thread it into the next call as the first
argument, then take that and thread it into the
next 
call. So the following occurs:

 

(-> (a) (b z) (c y) (d)) <=> (d (c (b a z) y)

So a function that created a rot13
encryptor/decryptor would be

 

(defn rot13-cipher []
(let [fn (partial char-offset 13)]
(->SubstitutionCipher fn fn)))

 

LISTING 3 

(deftype PolySubstitutionCipher [key]
 Encrypter
 (encrypt [this text]
 (apply str
 (map char-offset
 (keep (comp to-int ALPHABET)
 (cycle key)) text)))
 (decrypt [this text]
 (apply str
 (map char-offset
 (keep (comp negative to-int ALPHABET)
 (cycle key)) text))))

(defn vigenere-cipher [key]
 (->PolySubstitutionCipher key))

Here we create the actual encrypt/decrypt
function and assign in the local name ‘fn’.
partial
is a function than returns a new function which calls the
function char-offset with first argument ‘13’ and any other
arguments passed in. Essentially we are capturing the offset such
that the containing type doesn’t need to know about it. We then
pass fn to the constructor of the ‘SubstitutionCipher’
type as both the encrypt and decrypt function. We can invoke the
cipher thus:

 

(def rot13 (rot13-cipher))
(assert (= "PLAINTEXT" (decrypt rot13 (encrypt rot13 "PLAINTEXT"))))

when the encrypt or decrypt functions are called
they are dispatched according to the type of their first argument,
we could supply a different type or even extend the protocol to a
java type that knew nothing about Clojure.

If we were so inclined we could generalise the
offset by passing it as an argument. This would allow us to create
an arbitrary caesar cipher, but note that we need to supply
different decrypt function because the encryption is
not symmetric if the offset isn’t 13.

 

(defn caesar-cipher [n]
(let [efn (partial char-offset n)
dfn (partial char-offset (- n))]
(->SubstitutionCipher efn dfn)))

If a simple offset cipher wasn’t sufficient, we
could generate a lookup table and create a deranged
cipher:

 

(defn deranged-cipher []
(let [m (zipmap ALPHABET (shuffle ALPHABET))
m' (set/map-invert m)]
(->SubstitutionCipher (partial get m) (partial get m'))))

Here we use the zipmap
function to create a new map (a key-> value data store,
not to be confused with the
map function
earlier).
zipmap takes two sequences. It
returns a map where each element of the first sequence maps to the
equivalent value in the second sequence. We then invert the map to
generate the reverse lookup table. Note that the actual mappings
are captured in the function, so only by having the particular
instance of the SubstitutionCipher could you call decrypt and get
correct results.

The ciphers discussed so far all share a
weakness that each letter in the plaintext always maps to the same
ciphertext letter.

 

(encrypt (rot13-cipher) "HELLO") => "URYYB")

This leaves them vulnerable to frequency
analysis, where with knowledge of letter frequencies in the source
language it’s possible to discover the mappings. The Vigenere
cipher (aka ‘the undecipherable key’) addresses this by varying the
map-
ping for each letter in the plaintext, a so
called polyalphabetic cipher. A repeating key phrase is used to
determine the offset of each letter. To achieve this we need to
alter our algorithm slightly. We do this with a new type, but
implement the same protocol, as shown in Listing 3.

Note that in Listing 3, this type still
implements the original protocol, so callers of this cipher are non
the wiser.

We handle repeating the key phrase with
the
cycle function, which produces an
infinite lazy sequence of the elements of key, returning to the
start of the sequence as necessary. The ‘lazy’ part is crucial,
each element is only realized as necessary.

A useful feature of the map function is that it
can process more than one sequence at a time. Passing two sequences
to mapresults in the mapping function being called
with 
two arguments, one from each sequence. The
function terminates when any input sequence is exhausted. Finally
we compose a higher order function as the mapping function
containing the functionality to check for membership of ALPHABET,
convert to-int and (for decrypt) alter the direction of the offset.
An invocation of this cipher, looks similar to those seen
earlier…

 

(def venc (vigenere-cipher "PETER PIPER PICKED A PECK OF PICKLED PEPPER"))
(decrypt venc (encrypt venc "PLAINTEXT")) => "PLAINTEXT"
(encrypt venc "PLAINTEXT") => "EPTMEIMMX" ; letter frequency is obscured.

That concludes the whistle stop tour of Clojure,
you’ve learnt about sequences, lazy evaluation and higher order
functions. Find out more at the official Clojure site and the
community maintained documentation website [1] [2].

References

[1] Official Clojure Site: http://clojure.org

[2] The Community maintained Documentation site:
http://clojure-doc.org


Author
Neale Swinnerton has been writing code most days for the last 30 years or so. He is currently working at a startup using clojure to solve data science and machine learning problems to make the world a better place. Follow him on Twitter
Comments
comments powered by Disqus