Rewriting Playdar: C++ to Erlang, massive savings
I’ve heard many anecdotes and claims about how many lines of code are saved when you write in Erlang instead of [C++/other language]. I’m happy to report that I now have first-hand experience and some data to share.
I initially wrote Playdar in C++ (using the Boost and Asio libraries), starting back in February this year. I was fortunate to be working with some experienced developers who helped me come to terms with C++. There were three of us hacking on it regularly up until a few months ago, and despite being relatively new to C++, I’ll say that we ended up with a well-designed and robust codebase, all things considered.
On Feeling Smug
I’ll admit I felt rather smug making it all work in C++ with Boost and Asio. Getting it to build on all three platforms and dynamically load extensions (DLLs etc) at runtime in a cross-platform way was also quite satisfying (I had plenty of help with that side of things). I learned a lot about C++, Boost, Asio and CMake. But, as the codebase grew, I began to seriously question my decision to use C++.
My initial reasons for choosing C++ were twofold:
- Distribution – shipping the Erlang VM didn’t sound like fun
- Taglib – *the* library to read metadata from audio files (mp3, m4a, ogg etc) is C++
It turns out Playdar is naturally a good fit for Erlang – it does lots in parallel, and lots of stuff it does is asynchronous and event based. Even with all the stuff you get with Boost, multithreaded stuff in C++ is inelegant, to put it kindly.
SLOCed and Loaded
Anyway, a couple of weeks ago I sat down to re-implement Playdar from scratch in Erlang. I thrashed out the guts of it in a couple of days, and by the end of the week I almost had it at 1:1 feature parity with the C++ codebase. There’s still a bit of C++ left – code to interface with taglib.
Using the SLOCcount tool (SLOC = source lines of code) I counted the lines of code in various modules from both codebases. Here are the results:
| Module | Erlang Version | C++ Version | Savings |
| Core Daemon | 1,100 | 4,491 | 75% |
| Library + Scanner | 197 + 167 (.cpp) | 1,355 | 73% |
| LAN Resolver | 105 | 427 | 75% |
| P2P | 463 | 1,762 | 74% |
| TOTAL | 2,032 | 8,035 | 75% |
75% fewer lines of code using Erlang compared to C++ to implement the same thing – not too shabby :)
The second time around writing in Erlang I knew exactly what I was building, so it’s unfair to compare development time of the two codebases, but given how fast I can type I reckon I saved a good few hours of just pounding the keyboard to input the code (and countless hours of debugging: Erlang tends to work first time, really). Well, I’m not sure if “saved” is the right word, considering it was working in C++ already, but it’s my time to waste :)
If you count the third party code bundled with both codebases (excluding Boost/Asio!) then the Erlang codebase saves a whopping 92%. I’m more interested in the savings in code I had to write, however.
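For anyone who wants to check the arithmetic, here is a small C++ sketch that recomputes the savings column from the table above as 100 × (1 − Erlang SLOC / C++ SLOC). It reads the "197 + 167" entry as 364 lines in total for Library + Scanner, which is an assumption about what that cell means; the struct and function names are made up for this illustration.

```cpp
#include <cassert>
#include <vector>

// One row of the SLOC table. Names here are illustrative only.
struct ModuleCount {
    const char* name;
    int erlang_sloc;  // lines in the Erlang rewrite
    int cpp_sloc;     // lines in the original C++ codebase
};

// Figures copied from the table. "197 + 167" is assumed to mean
// 197 lines of Erlang plus 167 lines of remaining C++ glue.
const std::vector<ModuleCount> kCounts = {
    {"Core Daemon", 1100, 4491},
    {"Library + Scanner", 197 + 167, 1355},
    {"LAN Resolver", 105, 427},
    {"P2P", 463, 1762},
};

// Percentage of lines saved by going from old_sloc to new_sloc.
double savings_percent(int new_sloc, int old_sloc) {
    assert(old_sloc > 0);
    return 100.0 * (1.0 - static_cast<double>(new_sloc) / old_sloc);
}

// Sums both columns so the TOTAL row can be recomputed.
ModuleCount totals(const std::vector<ModuleCount>& rows) {
    ModuleCount sum{"TOTAL", 0, 0};
    for (const ModuleCount& row : rows) {
        sum.erlang_sloc += row.erlang_sloc;
        sum.cpp_sloc += row.cpp_sloc;
    }
    return sum;
}
```

Per module this works out to roughly 75.5%, 73.1%, 75.4% and 73.7%, and about 74.7% overall (2,032 against 8,035 lines), so the rounded figures in the table hold up.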
Memory and CPU Usage
I’ve done some preliminary comparisons between both projects, and when it comes to CPU and memory usage they are pretty similar. The Erlang codebase uses slightly more memory than C++ at the moment, but I’m convinced I can get that down to at least as low as the C++ project was. I picked up a few optimization tricks from my three-part Million-user comet experiment in Erlang earlier this year. I’ll post more about this if I learn any new tricks.
One thing I’ve realised about the Erlang codebase is that I’ve used processes to encapsulate state (active queries, specifically) where I didn’t really need to. It seemed sensible at the time, but it’s probably just a waste of memory. I’m going to change it to spawn processes to get the work done (i.e. a process that runs the query) but not necessarily just to maintain state.
Distribution to the desktop
C++
You just have to make sure that you build everything and ship any DLLs along with it, plus checks in the installer for the system libraries needed (runtime DLLs). Oh, and make sure you don’t change the plugin binary interface in the main app, or new plugins will crash and burn when you load them. Add a check for that. Oh and be careful about compiling taglib and stuff with mingw and the rest with VC++, or things might mysteriously crash. Also I heard a horror story about allocating memory in plugin code but deallocating it in the main app when the plugin was compiled against a different stdlib than the main app. This is all par for the course, and the experienced C++ developers I asked for help had no trouble making it work. Size of installable package: 2.5MB
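To make the "add a check for that" step concrete, here is a minimal sketch of a plugin ABI version check. It is not Playdar's actual plugin interface: the descriptor struct, the version constant and the function names are all hypothetical, and the platform-specific loading (dlopen, LoadLibrary and friends) is left out. The idea is only that each plugin reports the interface version it was built against, and the host refuses anything that doesn't match.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical: bumped by hand whenever the plugin-facing interface changes.
constexpr std::uint32_t kHostAbiVersion = 3;

// Hypothetical descriptor each plugin would export from its shared library.
struct PluginDescriptor {
    std::uint32_t abi_version;  // interface version the plugin was built against
    const char* name;           // human-readable plugin name
};

enum class LoadDecision { Accept, RejectAbiMismatch, RejectMalformed };

// Decide whether a freshly loaded plugin is safe to call into.
LoadDecision check_plugin(const PluginDescriptor* desc) {
    if (desc == nullptr || desc->name == nullptr) {
        return LoadDecision::RejectMalformed;
    }
    if (desc->abi_version != kHostAbiVersion) {
        return LoadDecision::RejectAbiMismatch;
    }
    return LoadDecision::Accept;
}

// Short label for log messages.
const char* describe(LoadDecision decision) {
    switch (decision) {
        case LoadDecision::Accept: return "accept";
        case LoadDecision::RejectAbiMismatch: return "abi mismatch";
        case LoadDecision::RejectMalformed: return "malformed descriptor";
    }
    assert(false && "unhandled LoadDecision");
    return "unknown";
}
```

A check like this only catches interface drift. It does nothing for the mixed-compiler and mismatched-stdlib problems mentioned above, which still need build discipline.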
Erlang
Compiling, and building/loading plugins in the Erlang codebase is straightforward on all platforms, as is often the way with VMs. I was against shipping the Erlang VM originally because I figured it would be a lot of hassle and increase the download size substantially. Packaging an Erlang app for the desktop involves taking the installed VM directory structure and stripping out all the docs, source and parts of the Erlang stdlib we don’t use, then packaging it along with the compiled Playdar code. CouchDB does something like this too, and RabbitMQ ships the Erlang VM without stripping unneeded libs. We’ll work on packaging some more (for all platforms), but to date Max has crafted a package that contains the necessary bits of the Erlang VM, a sexy Prefpane to start/stop the daemon on OS X, and the compiled Playdar code all weighing in under 10MB.
We’ll put together a Windows installer soon that’ll probably be around the same size. A 10MB download isn’t so bad nowadays, and I expect we can optimize the packaging process some more. Linux users will get a package that depends on the Erlang VM in their package manager.
Seems like shipping Erlang apps to the desktop isn’t so hard after all.
tl;dr
Someone rewrote a C++ app in Erlang: 75% fewer lines of code for the same functionality.
You should read this blog post about Playdar, by Paul Lamere, and take a look at the Playdar website.
C++ codebase (deprecated)
Erlang codebase
Playdar is the future, and the future is written in Erlang :)
14 Comments to Rewriting Playdar: C++ to Erlang, massive savings
Heretic!
Now I finally have no excuse to get into Erlang and start contributing. ;-)
The few days I spent learning C++ and tinkering with Playdar was all for nothing… nothing!
~75% is definitely way more than I was expecting after you mentioned this post in the pub.
Norman, your job is now to reimplement everything you’ve ever done in C++ in Erlang. Ready?… Go.
Now I can feel like my money wasn’t wasted on the Erlang book since everyone and their dog is talking about Scala.
EP’s thoughts…
Rewriting Playdar: C++ to Erlang, massive savings. The code size shrank to a quarter, and memory and CPU usage are about the same. The C++ and Erlang code are also available to look at….
But… but…
I was curious to see if erlang had some kind of support for binding to c libraries, for how you got taglib on erlang, but you’re just spawning a process and talking on stdio! That’s cheating! :p
That’s about the biggest nit I can find to pick. And I have a C++ server that’s using python right now by running ./server myfifo so I can’t really say anything :)
Pretty awesome. I guess I’m with Norman, a big pain in the ass VM was my last excuse. I’m out of excuses!
Wah-hey it broke my code, let’s try html-escaping:
./server < myfifo | script.py > myfifo
It probably doesn’t help that I don’t know Erlang, but I don’t understand why any time I read other people’s Erlang code I see all these one letter variable names and weird, inconsistent capitalization. E.g.:
handle_info({udp, _Sock, {A,B,C,D}=Ip, _InPortNo, Packet}, State) ->
?LOG(debug, "received msg: ~s", [Packet]),
{struct, L} = mochijson2:decode(Packet),
case proplists:get_value(<>,L) of
<> ->
Qid = proplists:get_value(<>, L),
case resolver:qid2pid(Qid) of
Qpid when is_pid(Qpid) ->
{struct, L2} = proplists:get_value(<>, L),
What is all this A,B,C,D, Qid, L, L2, etc.? And why is your C++ code so vertically spaced out in some places?
unsigned short port = DEFAULT_LAN_PORT;
string ip;
if( v.type() == str_type )
{
ip = v.get_str();
}
else if( v.type() != array_type )
{
continue;
}
else
{
This could as easily and as (more?) readably be written
unsigned short port = DEFAULT_LAN_PORT;
string ip;
if( v.type() == str_type ) { ip = v.get_str(); }
else if( v.type() != array_type ) { continue; }
else {
Voila! 50% savings, C++ versus . . . C++.
By the way, I’m not trying to knock your choice in language by any means. From everything I’ve heard, Erlang is a great language. And there may very well be a difference between it and C++ in terms of conciseness. But when I see all these line count comparisons sometimes I get a sense that it’s less that one language offers more conciseness than another and more that the author _wants_ it to, and thus codes differently such that the conclusion they wish to draw is supported.
If you want to interface to external code there are three options. Ports (stdin/stdout piping) are dead simple and prevent a crash in your external code from taking down the Erlang VM, at the cost of some speed due to serialization/de-serialization. Linked-in drivers can load up DLLs and shared libraries and make their functions available with none of the port overhead, at the risk of a null pointer dereference or some other bug in the library taking down the whole Erlang VM. Finally, there are nodes that interface for a specific language. This last option is basically a process running in language X (C, Python [including one option that uses the Twisted event loop], Ruby, etc.) that knows how to speak the Erlang node protocol and can pretend to be another distributed node in the system. It is less well known than the other two but is often a good one to look at: you get somewhat faster data transfer by only needing to convert data structures to something specific for your preferred language when/if you actually need the data, and you can call specific functions across the node boundary (e.g. call an Erlang function from Python or call a C function from Erlang).
The reason the C++ code exists and is run as a separate process (for taglib) is because that’s one of the three Erlang ways to integrate with external code. It’s the simplest and cleanest way. evgen covered the three ways in his comment above. I’d actually claim that as one of the great things about Erlang – it’s easy to interface with external code in a standard, supported way that makes the external code look like an Erlang process (Ports).
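As a rough illustration of what the external side of a port can look like, here is a C++ sketch of the length-prefixed framing Erlang uses when a port is opened with the {packet, 2} option: every message is preceded by a 2-byte big-endian length. Whether Playdar's taglib helper uses this framing or something line-based is not stated here, so treat the choice of {packet, 2} as an assumption; the function names are invented for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Reads one frame: a 2-byte big-endian length, then that many payload bytes.
// Returns false on end of input or a truncated frame.
bool read_packet(std::istream& in, std::string& payload) {
    unsigned char header[2];
    if (!in.read(reinterpret_cast<char*>(header), 2)) {
        return false;
    }
    const std::size_t len =
        (static_cast<std::size_t>(header[0]) << 8) | header[1];
    payload.assign(len, '\0');
    if (len > 0 && !in.read(&payload[0], static_cast<std::streamsize>(len))) {
        return false;
    }
    return true;
}

// Writes one frame and flushes, so the Erlang side sees it immediately.
// Returns false if the payload does not fit in a 2-byte length.
bool write_packet(std::ostream& out, const std::string& payload) {
    if (payload.size() > 0xFFFF) {
        return false;
    }
    const unsigned char header[2] = {
        static_cast<unsigned char>(payload.size() >> 8),
        static_cast<unsigned char>(payload.size() & 0xFF)};
    out.write(reinterpret_cast<const char*>(header), 2);
    out.write(payload.data(), static_cast<std::streamsize>(payload.size()));
    out.flush();
    return static_cast<bool>(out);
}

// Convenience for tests and logging: frame a payload into a string.
std::string encode_packet(const std::string& payload) {
    std::ostringstream buffer;
    const bool ok = write_packet(buffer, payload);
    assert(ok && "payload too large for a 2-byte length header");
    (void)ok;
    return buffer.str();
}
```

A helper program built on this would loop on read_packet(std::cin, request), do its work (say, reading tags from a file path carried in the request), and answer with write_packet(std::cout, reply). On Windows the standard streams would also need to be switched to binary mode first. If the helper crashes, the Erlang side just sees the port close, which is the isolation benefit described above.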
Regarding the SLOCcount for the LAN plugin, I adjusted the C++ linecount down when collecting these stats because I didn’t implement the PING/PONG stuff in the Erlang code (i.e. I removed that code from C++ then counted the lines). So I still think it’s a reasonable comparison.
I’ll admit the style/newline proliferation in some of the C++ code will have inflated the line-count a little, and it could certainly be written with fewer newlines (and less readability, some would say), but we’re still in the right ballpark.
Playdar is often network/IO bound, but it also does a lot concurrently with plugins doing things in parallel then notifying the main resolver when they find something. Erlang style concurrency is perfect for this.
@James
Concerning “inconsistent” capitalization, Erlang is 100% consistent with capitalization. It is *enforced*. In Erlang, variables *always* start with a capital letter (or an underscore, as in _Sock), whereas unquoted “atoms” *always* start with a lower-case letter. An atom (like anything in the world that is *truly* an atom) is something that is meant to be indivisible: you can’t reduce it. An atom is like a variable name where you use the name itself; there is no value associated with the name.
Concerning “L” versus “L2”, Erlang is a single-assignment language. These are variables, since they are capitalized, but the naming is used to show versioning of variables (i.e. making change explicit). Within the same *scope*, once a value is bound to a variable, the variable cannot be reassigned. This design philosophy is meant to eliminate whole categories of programming errors, which is important since *reliability* is Erlang’s primary goal. With the multiple assignment that most languages use, it’s almost as if you have to track the state of variables in addition to the state of objects, because the same name can be bound to different values at different times within the same scope. Erlang’s need for distributed programming in order to allow fail-over and similar features requires reliable concurrency. Reliable concurrency can’t happen if you have to track a lot of messy state. Therefore, at every opportunity, Erlang tries to be as stateless as possible. Only a process as a whole has state, which it keeps by continuously passing its variables back to itself through a recursive function that acts as a main loop (this doesn’t run out of stack or memory because tail-call optimization is required).
Concerning “A”, “B”, “C”, and “D”, it looks like the author is pattern-matching in order to assign values to these, so that if the function is passed the IP address 127.0.0.1 (as the tuple {127,0,0,1}), the result is: A=127, B=0, C=0, D=1. Since you didn’t include the entire function code, I don’t see where these variables are used unless I dig into the source myself.
“I’ve used processes to encapsulate state (active queries, specifically) where I didn’t really need to. It seemed sensible at the time …” – what are you using now instead? ETS or Mnesia? The OO/Actor equation seems to encourage the encapsulation of state in processes. After the experience you had there, any suggestions on how to think one’s way out of that? Back to a separation of instructions and data, at least half way? I have come to think that Mnesia is more integral than it looks at first glance. Even though it ‘feels’ too big to be the standard way of handling state, without its transactions something is missing. ETS tables are not sufficient. Does the abolition of locks and synchronization simply require transactions for common state handling, or is it merely a limit on applicability where shared state is part of the requirements?