Akka anti-patterns: Too many actors
Manuel Bernhardt’s Akka anti-patterns series continues. This time, he takes a closer look at a very frequent anti-pattern found in codebases written by developers who have just discovered the actor model: having too many actors. Whilst Akka is designed to run large numbers of actors, doing so isn’t always the best approach.
It occurred to me that I haven’t yet written about this very frequent anti-pattern. It tends to show up in codebases written by developers who have just discovered the actor model.
There are two ways in which you can have too many actors:
- designing your system with too many different actor types, many of which are unnecessary
- creating an overwhelming number of actors at runtime when this isn’t necessary, or is even counterproductive
Let’s have a look at these two in detail.
Too many actor types
The line of thinking goes roughly like this: “We have actors, therefore everything must be an actor”.
The actor model makes it easier to write asynchronous applications. It does this by providing the illusion of synchronous execution inside an actor: there’s no need to worry about concurrent access to an actor’s state, because only the actor itself can access it, and messages are processed one at a time.
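To make the “one message at a time” guarantee concrete, here is a toy model in Java. It is a hypothetical sketch, not how Akka actually implements mailboxes or dispatchers; `ToyActor` and `ToyActorDemo` are names made up for this illustration. Messages are queued in a mailbox and drained sequentially on a shared thread pool, so the actor’s behavior never runs concurrently with itself, even when many threads send messages at once.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executor;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.Consumer;

// Toy actor (hypothetical, NOT Akka's implementation): a mailbox drained by at
// most one task at a time on a shared executor acting as the "dispatcher".
final class ToyActor<M> {
    private final ConcurrentLinkedQueue<M> mailbox = new ConcurrentLinkedQueue<>();
    private final AtomicBoolean scheduled = new AtomicBoolean(false);
    private final Executor dispatcher;
    private final Consumer<M> behavior;

    ToyActor(Executor dispatcher, Consumer<M> behavior) {
        this.dispatcher = dispatcher;
        this.behavior = behavior;
    }

    // Asynchronous send: enqueue the message and return immediately.
    void tell(M message) {
        mailbox.add(message);
        trySchedule();
    }

    // Submit a processing run unless one is already scheduled or running.
    private void trySchedule() {
        if (!mailbox.isEmpty() && scheduled.compareAndSet(false, true)) {
            dispatcher.execute(this::processMailbox);
        }
    }

    // Process messages strictly one after another, then release the flag and
    // re-check for messages that arrived after the last poll.
    private void processMailbox() {
        M message;
        while ((message = mailbox.poll()) != null) {
            behavior.accept(message);
        }
        scheduled.set(false);
        trySchedule();
    }
}

final class ToyActorDemo {
    // Many sender threads increment a plain int held by the actor.
    static int countConcurrently(int senders, int messagesPerSender) {
        ExecutorService dispatcher = Executors.newFixedThreadPool(4);
        CountDownLatch done = new CountDownLatch(senders * messagesPerSender);
        int[] count = {0}; // actor state: no locks, no atomics
        ToyActor<String> counter = new ToyActor<>(dispatcher, msg -> {
            count[0]++;
            done.countDown();
        });
        for (int i = 0; i < senders; i++) {
            new Thread(() -> {
                for (int j = 0; j < messagesPerSender; j++) counter.tell("inc");
            }).start();
        }
        try {
            done.await(); // wait until every message has been processed
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        } finally {
            dispatcher.shutdown();
        }
        return count[0];
    }
}
```

Because the counter is only ever touched from inside the behavior, a plain `int` is enough. No lock or atomic counter is needed, even with several threads sending messages concurrently.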
That being said, not everything needs to be done asynchronously. Method calls that are purely CPU-bound, and that aren’t “blocking” in the sense of monopolizing the CPU for a long time (as, for example, calculating the value of PI would), need not be executed asynchronously.
What I see quite frequently are codebases with many different actors interacting with one another without doing anything that benefits much from being done asynchronously or concurrently. In these designs, the same state has to be held by each of those actors or passed along in each message.
There are two disadvantages to this approach:
- you don’t gain anything in terms of performance; on the contrary, there’s an overhead associated with creating messages and passing them around
- with each actor type and its associated messages, the system becomes more difficult to understand and to maintain
When designing actor systems it is therefore useful to think in terms of what really needs to be asynchronous, mainly:
- calls to external systems (outside of your JVM)
- calls to blocking operations (legacy APIs, heavy computations, …)
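As a rough illustration of where the asynchronous boundary could sit, the Java sketch below keeps the CPU-bound steps as plain methods and only makes the call to an external system asynchronous. `OrderService`, the CSV input format and the simulated payment call are all hypothetical. A real external call would go through an HTTP or database client rather than a `CompletableFuture` that just returns a string.

```java
import java.util.Arrays;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;

// Hypothetical example: only the call leaving the JVM is asynchronous.
final class OrderService {
    private final ExecutorService ioPool; // pool reserved for external calls

    OrderService(ExecutorService ioPool) {
        this.ioPool = ioPool;
    }

    // CPU-bound and quick: a plain synchronous method, no actor needed.
    static int[] parseQuantities(String csv) {
        return Arrays.stream(csv.split(","))
                .map(String::trim)
                .mapToInt(Integer::parseInt)
                .toArray();
    }

    // Also CPU-bound and quick: a plain method.
    static int totalPrice(int[] quantities, int unitPrice) {
        return Arrays.stream(quantities).sum() * unitPrice;
    }

    // Stand-in for a call to an external payment system (hypothetical).
    // This is the part that is worth making asynchronous.
    private CompletableFuture<String> charge(int amount) {
        return CompletableFuture.supplyAsync(() -> "charged " + amount, ioPool);
    }

    CompletableFuture<String> placeOrder(String csv, int unitPrice) {
        int amount = totalPrice(parseQuantities(csv), unitPrice); // synchronous
        return charge(amount);                                    // asynchronous
    }
}
```

None of the parsing or pricing logic would gain anything from being split into separate actors exchanging messages; that would only add message types and the cost of passing them around.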
Too many runtime actors for the use-case
The line of thinking goes roughly like this: “The more actors we have, the faster things will go”.
It is true that actors are lightweight and you can run millions of them on a single JVM. You can. But should you?
The short answer is: not always – it depends on what you’re doing with your actors.
If your system has many long-lived objects that each hold a bit of state and interact with one another now and then, you may well end up with a million actors, and that’s a legitimate use case that Akka supports very well. You could, for example, have a system with a large number of users where each user is represented by an actor. A raw Akka actor takes only about 300 bytes of heap, so it is entirely possible to create millions of them on a single machine and leave them running side by side without much to worry about. And if you end up with so many actors, or with actors holding so much state, that they no longer fit in the memory of a single machine, Cluster Sharding makes it easy to distribute them across several machines.
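Taking the 300-bytes-per-actor figure at face value, a back-of-the-envelope estimate of the baseline heap cost of one million actors, before counting any state they hold, is:

```latex
10^{6}\ \text{actors} \times 300\ \text{B/actor} = 3 \times 10^{8}\ \text{B} \approx 300\ \text{MB} \quad (\approx 286\ \text{MiB})
```

By the same arithmetic, ten million actors would need about 3 GB for this baseline alone.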
If, however, you have a few actor types involved in computing something, say parsing an XML document, it is questionable whether you should create a million of them (either directly or via a router).
A CPU has a certain number of cores (hardware threads) at its disposal, and the processing of messages by Akka actors is scheduled by ExecutionContexts, often backed by some kind of thread pool. By default, this is the fork-join-executor, backed by the ForkJoinPool introduced in Java 7.
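The size of the thread pool behind a dispatcher is configurable. The fragment below follows the dispatcher configuration format shown in the Akka documentation; the name `parsing-dispatcher` and all the numbers are illustrative assumptions, not recommendations. For the fork-join-executor, the number of threads is derived from the number of available cores multiplied by `parallelism-factor`, then clamped between `parallelism-min` and `parallelism-max`.

```hocon
parsing-dispatcher {
  type = Dispatcher
  executor = "fork-join-executor"
  fork-join-executor {
    # threads = ceil(available cores * parallelism-factor), clamped to [min, max]
    parallelism-min = 2
    parallelism-factor = 1.0
    parallelism-max = 8
  }
  # max number of messages an actor processes before the thread moves on to another actor
  throughput = 100
}
```

In the classic API, an actor can be assigned to such a dispatcher with `Props.withDispatcher("parsing-dispatcher")`. The defaults of the built-in `akka.actor.default-dispatcher` have changed between Akka versions, so check the reference configuration of the version you use.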
Now, for all of its technical prowess, the fork-join pool is not some magical entity that renders the laws of physics obsolete. If you have one million actors each parsing an XML document (already loaded into memory) and 4 hardware threads, your system won’t perform much better than if you had just 4 actors parsing those XML documents (assuming homogeneous load). In fact, it will perform better with just 4 actors, because there is only minimal overhead in terms of scheduling and memory juggling. By the way, if your actor system only has a few actors, check out the affinity pool dispatcher, which tries to run the same actor on the same thread each time.
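For the affinity pool, the Akka documentation shows a dispatcher configured with the `affinity-pool-executor`. The following is a sketch along those lines. The dispatcher name and the numbers are assumptions, and because the available options may differ between Akka versions, check the dispatcher documentation for yours:

```hocon
affinity-pool-dispatcher {
  type = Dispatcher
  executor = "affinity-pool-executor"
  affinity-pool-executor {
    # threads = ceil(available cores * parallelism-factor), clamped to [min, max]
    parallelism-min = 4
    parallelism-factor = 1
    parallelism-max = 8
  }
  throughput = 100
}
```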
In essence, having many actors doesn’t necessarily make things faster.
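In plain Java terms, the “4 actors on 4 hardware threads” setup amounts to bounding the number of workers by the number of cores instead of creating one unit of work per document. The sketch below is hypothetical: `countOpeningTags` is a toy stand-in for real XML parsing. A fixed thread pool plays the role that a small, fixed set of actors would play in an Akka system, for example behind a pool router such as `RoundRobinPool` in the classic API.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

final class BoundedParsing {
    // Hypothetical stand-in for parsing: count opening tags in an in-memory
    // document, skipping closing tags, processing instructions and comments.
    static int countOpeningTags(String xml) {
        int count = 0;
        for (int i = 0; i + 1 < xml.length(); i++) {
            char next = xml.charAt(i + 1);
            if (xml.charAt(i) == '<' && next != '/' && next != '?' && next != '!') count++;
        }
        return count;
    }

    // A fixed number of workers (e.g. one per core); worker w handles
    // documents w, w + workers, w + 2 * workers, ...
    static long parseAll(List<String> documents, int workers) {
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                final int offset = w;
                results.add(pool.submit(() -> {
                    long tags = 0;
                    for (int i = offset; i < documents.size(); i += workers) {
                        tags += countOpeningTags(documents.get(i));
                    }
                    return tags;
                }));
            }
            long total = 0;
            for (Future<Long> f : results) total += f.get();
            return total;
        } catch (InterruptedException | ExecutionException e) {
            throw new IllegalStateException(e);
        } finally {
            pool.shutdown();
        }
    }
}
```

The exact executor matters less than the bound: the amount of parallel work is capped at what the hardware can actually run, so there are only as many workers to schedule as there are cores.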
That’s it for this anti-pattern. If you are curious about others, check out the list of anti-patterns.
This post was originally published on Manuel Bernhardt’s blog.