I’ve been building something in Erlang recently, provisionally called IRCCloud.com (mention this post if you request an invite!) - it’s an in-browser IRC client that stays connected for you all the time, so you never miss the conversation. You can reopen your browser later and still have all the backlog. IRC is damn useful, and James and I are building IRCCloud to give you the advantages of IRC bouncer-esque functionality, with the ease of just opening a webpage.

The Erlang backend is connected to various IRC servers on behalf of our users, so it’s critical that I can deploy new versions of the app without restarting. Erlang’s capability to do live upgrades to running applications means it’s possible to deploy new versions of the backend without disconnecting everybody. I’ve done a fair amount of Erlang before, but I’ve only recently managed to get proper live-updates to running OTP applications working.

Previous upgrade experience ranged from just copying over a new beam and running l(my_module). in the shell, to manually triggering code_change like this:

sys:suspend(Pid),
{module, my_module} = code:load_file(my_module),
sys:change_code(Pid, my_module, undefined, [], 10000),
sys:resume(Pid).

If you want to start doing complex updates that change internal state, amongst other things, you really want to be doing it in the proper OTP way.
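To make "updates that change internal state" concrete, here is a minimal sketch of a gen_server whose state gains a field between two versions. The module name counter_server, the record and its fields are hypothetical and have nothing to do with the IRCCloud code; the point is only to show where the state conversion lives when the upgrade goes through OTP.

```erlang
-module(counter_server).
-behaviour(gen_server).
-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

%% State layout of the NEW version: it adds the 'started' field.
%% The old version (hypothetically) kept its state as {state, Count}.
-record(state, {count = 0, started}).

start_link() ->
    gen_server:start_link({local, ?MODULE}, ?MODULE, [], []).

init([]) ->
    {ok, #state{started = erlang:localtime()}}.

handle_call(get, _From, State) ->
    {reply, State#state.count, State};
handle_call(_Other, _From, State) ->
    {reply, {error, unknown_call}, State}.

handle_cast(incr, State) ->
    {noreply, State#state{count = State#state.count + 1}};
handle_cast(_Msg, State) ->
    {noreply, State}.

handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

%% Runs while the process is suspended, either via release_handler or
%% via a manual sys:change_code/4.
%% Downgrade: strip the new field so the old code sees its old layout.
code_change({down, _Vsn}, #state{count = Count}, _Extra) ->
    {ok, {state, Count}};
%% Upgrade: the old 2-tuple state becomes the new record.
code_change(_OldVsn, {state, Count}, _Extra) ->
    {ok, #state{count = Count, started = undefined}};
%% Anything else is already in the current layout.
code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

With a plain l(my_module) this conversion never runs, so the new code would be handed a state tuple in the old shape.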

I started with this excellent series of articles about Erlybank, on spawnlink.com. I definitely recommend following along with the last couple in the series to get a feel for packaging up Erlang releases. I’m not going to provide a step-by-step tutorial in this post, since Mitchell already did a great job in those articles.

Unfortunately I fell at the last hurdle - I ran into the same problem as Ricardo in the comments, namely that release_handler was looking for the .appup file in the wrong place. I don’t know why this happens, but it seems that because I had the Erlang system itself as a “release” (R13B etc), that somehow screwed things up. I found one other post about the same problem. If you know what’s going on here, please leave a comment and put me out of my misery.

Upgrade to R14!

Let me get this out of the way first: upgrade your Erlang to R14 right now, since R13B has a bug that prevents release_handler from figuring out which modules need updating when you install a new release. I lost some time to that, although consequently I did end up taking a nice tour of the sasl/release_handler/systools code. Oh, and also “rebar generate” (mentioned later in this post) will fail with some distro-packaged versions of R13B04, with a message like: ERROR: Unable to generate spec: read file info /usr/lib/erlang/man/man5/modprobe.d.5 failed.

First Target System

So I decided to roll my own First Target System. The fine manual says:

Often it is not desirable to use an Erlang/OTP system as is. A developer may create new Erlang/OTP compliant applications for a particular purpose, and several original Erlang/OTP applications may be irrelevant for the purpose in question. Thus, there is a need to be able to create a new system based on a given Erlang/OTP system, where dispensable applications are removed, and a set of new applications that are included in the new system. Documentation and source code is irrelevant and is therefore not included in the new system.

Another advantage of creating your own target system is that you get a guaranteed environment to deploy your code into - you can simply untar your system onto any machine (of the same platform/architecture) and it will run just like it ran on your test machine. Same exact version of Erlang, same exact libraries, same config.

After following along with the manual, I had a shiny new target system - but much to my dismay I had an R13B directory under releases/, and trying to do an application upgrade resulted in the same problem as before: release_handler was still looking for an appup file under releases/R13B/ instead of releases/myapp/.

I tried copying my .appup to R13B, and creating a valid blank appup, to no avail. So..

Rebar to the rescue… sort of

Rebar is an Erlang build tool with a few nifty tricks (thanks Dave and the gang). Documentation/examples are still a little sparse, but one thing it does have in spades is a way to build Erlang target systems in one command: ./rebar generate

To use rebar to make a target system you’ll need a working OTP style application in the usual layout (src,priv,ebin,include) with a valid .app file. Here are the rebar release handling docs.
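For reference, a valid .app file is a single Erlang term along these lines. This is only a sketch: the description, the module names and the applications list are placeholders, not the real IRCCloud values. The vsn string is what gets checked and bumped when packaging later releases.

```erlang
%% ebin/irccloud.app (illustrative; module names are hypothetical)
{application, irccloud,
 [{description, "Example application resource file"},
  {vsn, "1"},
  %% Every module shipped with the application must be listed here
  {modules, [irccloud_app, irccloud_sup]},
  {registered, []},
  %% Applications that must be started before this one
  {applications, [kernel, stdlib, sasl]},
  %% Callback module implementing the application behaviour
  {mod, {irccloud_app, []}}]}.
```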

Okay. So now we have a nice target system (courtesy of rebar); your rel/<appname> directory will look like this:

drwxr-xr-x  2 rj rj   21 2010-09-15 02:49 bin
drwxr-xr-x  8 rj rj   70 2010-09-15 02:48 erts-5.8
drwxr-xr-x  2 rj rj   37 2010-09-15 02:49 etc
drwxr-xr-x 27 rj rj 4096 2010-09-15 02:49 lib
drwxr-xr-x  3 rj rj   17 2010-09-15 02:49 log
drwxr-xr-x  3 rj rj   43 2010-09-15 02:48 releases

The lib/ directory contains stdlib, sasl, kernel, your application and any other deps you specified; the releases/ directory is reassuringly empty; bin/ contains a handy nodetool script to start and stop your app; sasl is configured to log to a file. Rebar just saved you a few hours of fiddling to get things up and running. But you’re not out of the woods yet..

Deploying updates to rebar-packaged target system

This was the tricky non-obvious bit. Rebar was great at getting the first system with my app deployed, however it doesn’t really offer much to help you install an update to your running system.

I wrote a script regen.sh to use instead of “rebar generate”, which post-processes the directory structure rebar generates, to make it more amenable to deploying upgrades:

#!/bin/bash
echo "This will nuke rel/irccloud and regenerate using rebar.. [enter to continue]"
read
rm -rf rel/irccloud
set -e
./rebar compile
./rebar generate
cd rel/irccloud/lib/
echo -n "Unpacking .ez files"
for f in *.ez
do
    echo -n "."
    # Unpack each archived application in place, then drop the archive
    unzip "$f" > /dev/null
    rm "$f"
done
echo
cd ../releases/
# Get the version of the only release in the system, our new app:
VER=`find . -maxdepth 1 -type d | grep -vE '^\.$' | head -n1 | sed 's/^\.\///g'`
echo "Ver: ${VER}, renaming .rel + .boot files correctly"
cd "${VER}"
mv irccloud.boot start.boot
mv irccloud.rel "irccloud-${VER}.rel"
cd ../../../
echo "OK"

This does three important things:

  1. Unpack the *.ez files (zip files of applications in lib/), since upgrades with release_handler didn’t work otherwise;
  2. rename irccloud.boot to start.boot, since systools has “start.boot” hardcoded as the correct name of this file for packages;
  3. rename irccloud.rel to irccloud-<VERSION>.rel, since this is the standard layout for future packages.

NB: I think I also had to change the bin/irccloud script to -boot with $RUNNER_BASE_DIR/releases/$APP_VSN/start instead of $RUNNER_BASE_DIR/releases/$APP_VSN/$SCRIPT

Tooling up for easier releases

Now in the top level directory (alongside rel,src,ebin..) I made a releases/ directory, from which I package up new versions ready for deployment.

I’m using a bash script, and some erlang, to facilitate packaging up new releases. Note that it adds, using erl -pz, the paths to the current .app file and beams, and the previous .app file and beams.

#!/bin/bash
set -e
ERL_ROOT=$1
OLDVER=$2
VER=$3
cd ..
./rebar compile
cd -
OLDEBIN="${ERL_ROOT}/lib/irccloud-${OLDVER}/ebin/"
echo "Fetching previous rel file: irccloud-${OLDVER}.rel from ${ERL_ROOT}/releases/${OLDVER}"
cp "${ERL_ROOT}/releases/${OLDVER}/irccloud-${OLDVER}.rel" "irccloud-${OLDVER}.rel"

erl -pz ../ebin/ -pz "${OLDEBIN}" -pz ../deps/*/ebin -noshell \
-run release_helper go irccloud $VER ../ebin $OLDVER "${OLDEBIN}" \
| grep -v 'Source code not found'

echo "Release ${VER} packaged"

release_helper.erl

% Check some stuff, write the .rel file, generate boot scripts and relup, make tar
-module(release_helper).
-export([go/1]).

-define(RELAPPS, [  kernel, 
                    stdlib, 
                    sasl, 
                    crypto, 
                    ssl, 
                    inets,
                    public_key, 
                    compiler,
                    syntax_tools,
                    edoc,
                    eunit,
                    xmerl,
                    epgsql, 
                    mochiweb]).

appver(A) when is_atom(A) ->
  application:load(A),
  io:format("Version of '~p'..", [A]),
  {value, {A, _, Ret}} = lists:keysearch(A, 1, application:loaded_applications()),
  io:format("~p~n", [Ret]),
  Ret.

check_appfile_version(File, AppName, ExpectedVer) ->
  io:format("Verifying .app file contains correct version..",[]),
  {ok, [{application, AppName, Props}]} = file:consult(File),
  case proplists:get_value(vsn, Props) of
    ExpectedVer -> io:format("ok~n",[]), ok;
    FoundVer    -> io:format("FAIL~nApp file contains ver: ~s but expected: ~s~n", 
                             [FoundVer, ExpectedVer]),
                   halt(1)
  end.

go(Args) -> 
  [Name, Version, Ebin, PVersion, _PEbin] = Args,
  io:format("release_helper running for: ~p, oldversion: ~s, new version: ~s~n", 
            [Name, PVersion, Version]),
  ok        = check_appfile_version(Ebin ++ "/" ++ Name ++ ".app", 
                                    list_to_atom(Name), Version),
  Erts      = erlang:system_info(version),
  Vsn       = io_lib:format("~s-~s", [Name, Version]),
  PrevVsn   = io_lib:format("~s-~s", [Name, PVersion]),
  io:format("version: '~s'~n", [Version]),
  AppVers   = [{A,appver(A)} || A <- ?RELAPPS],
  Rel       = 
"{release, 
    {\"~s\", \"~s\"}, 
    {erts, \"~s\"}, 
    [   {~w, \"~s\"},~n        "
    ++
    string:join([io_lib:format("{~w, \"~s\"}",[A,AV]) || {A,AV} <- AppVers], 
                ",\n        ")
    ++
"\n    ]\n}.\n",
  RelText   = io_lib:format(Rel, [Name, Version, Erts, list_to_atom(Name), Version]),
  Relfile   = io_lib:format("~s.rel", [Vsn]),
  io:format("Writing ~s~n", [Relfile]),
  {ok, Fs}  = file:open(Relfile, [write]),
  %% RelText is already formatted, so write it as data, not as a format string
  io:put_chars(Fs, RelText),
  file:close(Fs),
  io:format("make_script(~s)..~n", [Vsn]),
  ok = systools:make_script(Vsn),
  case PVersion of 
    "0" -> io:format("Previous version 0, not generating relup!~n", []);
    _   -> ok = systools:make_relup(Vsn, [PrevVsn], [PrevVsn])
  end,
  ok = systools:make_tar(Vsn),
  halt().

Making a second release

Here's how I currently make a new release that can be deployed with a live-upgrade:

  1. Fix bugs, change some stuff, add features etc;
  2. Update ebin/irccloud.app to increase the version number, update the modules list if needed;
  3. Create/modify ebin/irccloud.appup to tell release_handler how to upgrade from the previous version to this new version. (See: Appup Cookbook);
  4. cd releases/
  5. ./release_helper.sh "../rel/irccloud" "1" "2" # (where "1" was the version of the first release using rebar generate, "2" is the new one we want to package)

Now I have irccloud-2.tar.gz in the releases/ directory, ready to be deployed. My packaging scripts also tag the release in git and a few other things I've omitted for clarity.
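For step 3, an appup file is a single Erlang term listing upgrade and downgrade instructions per previous version. The following is a minimal sketch for going from "1" to "2"; the module names irccloud_util and irccloud_conn are hypothetical stand-ins for a stateless module and a gen_server whose state changed.

```erlang
%% ebin/irccloud.appup (illustrative; module names are hypothetical)
{"2",
 %% Upgrade instructions: how to get from "1" to "2"
 [{"1", [{load_module, irccloud_util},
         %% 'advanced' means code_change/3 is called, with [] as Extra
         {update, irccloud_conn, {advanced, []}}]}],
 %% Downgrade instructions: how to get from "2" back to "1"
 [{"1", [{update, irccloud_conn, {advanced, []}},
         {load_module, irccloud_util}]}]
}.
```

load_module simply swaps the code, while update suspends the processes running the module, loads the new code and then resumes them.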

Deploying the release

  1. $ cp irccloud-2.tar.gz ../rel/irccloud/releases # or copy to wherever you put your target system on the production box instead of ../rel/irccloud
  2. erl> release_handler:unpack_release("irccloud-2").
  3. erl> release_handler:install_release("2").
  4. erl> release_handler:which_releases().
  5. erl> release_handler:make_permanent("2").

Step 3 does the upgrade, and runs the appup stuff - hope you tested on your development rig before deploying to production!
If you wrote your .appup properly, it is also safe to downgrade - just do release_handler:install_release("1"). after step 3.

Note, you must run erl with "-boot start_sasl" to use release_handler. I ran this from the same shell as "bin/irccloud console" which already had sasl running.
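The shell steps above could be wrapped in a small helper so that a failed unpack or install stops the process early. This is only a sketch: deploy_helper and upgrade/2 are hypothetical names, and it assumes sasl is already running on the node, as described above.

```erlang
-module(deploy_helper).
-export([upgrade/2]).

%% RelName is the tarball name without ".tar.gz", e.g. "irccloud-2".
%% Vsn is the release version string, e.g. "2".
upgrade(RelName, Vsn) ->
    case release_handler:unpack_release(RelName) of
        {ok, Vsn} ->
            install(Vsn);
        {ok, Other} ->
            %% The tarball did not contain the version we expected
            {error, {unexpected_version, Other}};
        {error, Reason} ->
            {error, {unpack_failed, Reason}}
    end.

install(Vsn) ->
    case release_handler:install_release(Vsn) of
        {ok, _FromVsn, _Descr} ->
            %% Only make the release permanent once the install succeeded
            release_handler:make_permanent(Vsn);
        Other ->
            {error, {install_failed, Other}}
    end.
```

Called as deploy_helper:upgrade("irccloud-2", "2"). from the node's shell. Doing the steps by hand has one advantage this sketch gives up: you can inspect the running system between install_release and make_permanent, and roll back if something looks wrong.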

Mochiweb caveat

I'm using the excellent Mochiweb HTTP library in my application. Here's the child specification from my top level supervisor:

{irccloud_web,
    {irccloud_web, start, [WebConfig]},
     permanent, 5000, worker, dynamic}

Note that it says 'dynamic' in place of the modules list. Here's what the fine manual says:

Modules is used by the release handler during code replacement to determine which processes are using a certain module. As a rule of thumb Modules should be a list with one element [Module], where Module is the callback module, if the child process is a supervisor, gen_server or gen_fsm. If the child process is an event manager (gen_event) with a dynamic set of callback modules, Modules should be dynamic. See OTP Design Principles for more information about release handling.

What it doesn't say is that release_handler will time out and screw up your install when dynamic is specified, because it asks the process itself for its list of modules and gets no reply, unless you answer that request correctly.

Here's how to appease release_handler if you have a dynamic modules list: http://github.com/RJ/mochiweb/commit/931c5fb769be844c307a51596898ca6c55998219
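The general idea can be sketched as a gen_server that answers the module-list request. The message shape used here ({Label, From, get_modules}, replied to like an ordinary call) is an assumption based on how release_handler_1 queries such processes in sasl; check it against the sasl source of your OTP version. The module name dyn_worker is hypothetical.

```erlang
-module(dyn_worker).
-behaviour(gen_server).
-export([start_link/0]).
-export([init/1, handle_call/3, handle_cast/2, handle_info/2,
         terminate/2, code_change/3]).

start_link() ->
    gen_server:start_link(?MODULE, [], []).

init([]) ->
    {ok, nostate}.

handle_call(_Req, _From, State) ->
    {reply, ok, State}.

handle_cast(_Msg, State) ->
    {noreply, State}.

%% Assumed protocol: release_handler sends a gen-style call whose label
%% is its own pid rather than '$gen_call', so a gen_server sees it in
%% handle_info/2. From is the usual {Pid, Tag} pair.
handle_info({_Label, From, get_modules}, State) ->
    %% Reply with the modules this process is currently running
    gen_server:reply(From, [?MODULE]),
    {noreply, State};
handle_info(_Info, State) ->
    {noreply, State}.

terminate(_Reason, _State) ->
    ok.

code_change(_OldVsn, State, _Extra) ->
    {ok, State}.
```

Without such a clause the request is silently dropped by the catch-all handle_info, and the caller waits until its call times out.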

In Conclusion

In general, this was a rather painful experience.

What I'd like to be able to do is something like this:

  1. "rebar make_tar newver=<newversion>"
  2. version in the .app file is automatically incremented for me, modules list is updated. (open $EDITOR for confirmation/additional tweaks)
  3. Unless I manually wrote it already, .appup is automatically populated with {load_module, Mod} for each module that changed since last version was packaged, opened in $EDITOR so I can tweak/review.
  4. irccloud-<newver>.tar.gz is created
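Item 3 of that wish list could be approximated by comparing the compiled beams of the previous and the new version. The following is a rough sketch, not an existing rebar feature; appup_sketch and its function names are hypothetical. It relies on beam_lib:md5/1, which checksums the code of a module without attributes such as the compilation date.

```erlang
-module(appup_sketch).
-export([changed_modules/2, instructions/2]).

%% Modules present in both ebin directories whose code differs.
%% Modules that exist only in NewEbin are skipped: they would need
%% add_module rather than load_module.
changed_modules(OldEbin, NewEbin) ->
    NewBeams = filelib:wildcard(filename:join(NewEbin, "*.beam")),
    lists:sort(
      [Mod || Beam <- NewBeams,
              {ok, {Mod, NewMd5}} <- [beam_lib:md5(Beam)],
              %% A missing or unreadable old beam does not match the
              %% pattern, so that module is filtered out here
              {ok, {_, OldMd5}} <- [beam_lib:md5(old_path(OldEbin, Beam))],
              OldMd5 =/= NewMd5]).

old_path(OldEbin, NewBeam) ->
    filename:join(OldEbin, filename:basename(NewBeam)).

%% A naive instruction list: just reload every changed module.
instructions(OldEbin, NewEbin) ->
    [{load_module, M} || M <- changed_modules(OldEbin, NewEbin)].
```

The result is only a starting point for manual review: a module whose process state changed needs an update instruction instead of load_module, which is why opening the generated file in $EDITOR would still matter.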

Maybe I've missed something, and there's a much easier way to package subsequent releases with rebar, or something else I've overlooked.
How do you package and deploy your Erlang applications to do live-updates?