hacks

ssh hack: connect directly to machine via a firewall box

UPDATED 23/03/2009: added the “-q0” option to clean up netcat after the session terminates, and left another useful ssh tip in the comments.

It’s common to have to ssh to a firewall/gateway machine, then ssh on to the machine you want to work on inside a server network.
Typically you’d do this from your local machine:
$ ssh firewall.example.com
Password:
$ ssh my-private-host

I finally got bored of doing this, and created the following file, /usr/bin/sssh:

#!/bin/bash
# Tunnel the ssh connection through the firewall box using netcat.
ssh -o ProxyCommand="ssh -q firewall.example.com nc -q0 %h %p" "$@"

Now I can use the sssh command to connect to hosts using the firewall machine as a proxy. Like most good hacks, this uses netcat.

E.g.:
$ sssh 10.1.2.3
This connects me directly to a machine on the server network, via the firewall box. Since it passes all parameters to ssh (the "$@" bit), you can do port forwards and X-forwarding as usual too:

$ sssh -L 5432:localhost:5432 my-vm

This lets me tunnel the port for a PostgreSQL instance running on my development VM (my-vm) in a single command. I have all my keys installed, so no passwords are needed. I estimate this will save me about 60 seconds every day.
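For anyone who would rather not hard-code the gateway, here is a minimal sketch of the same idea as a shell function. The proxy_command helper and the SSSH_GATEWAY variable are names made up for illustration; only the ssh and nc options come from the script above.

```shell
#!/bin/bash
# Hypothetical helper: print the ProxyCommand string for a given gateway host.
# %h and %p are left for ssh to fill in with the target host and port.
proxy_command() {
    local gateway=$1
    printf 'ssh -q %s nc -q0 %%h %%p' "$gateway"
}

# sssh as a function. SSSH_GATEWAY is an assumed variable name; it falls back
# to the firewall host used in the post. Quoted "$@" hands every argument to
# ssh unchanged, even ones containing spaces.
sssh() {
    ssh -o ProxyCommand="$(proxy_command "${SSSH_GATEWAY:-firewall.example.com}")" "$@"
}
```

Dropped into .bashrc, this should behave like the script, and setting SSSH_GATEWAY for a single call would route that one connection through a different gateway.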


Monday, November 17th, 2008 hacks 8 Comments

On bulk loading data into Mnesia

Consider this a work-in-progress; I will update this post if I find a ‘better’ way to do fast bulk loading.

The time has come to replace my ets-based storage backend with something non-volatile. I considered a dets/ets hybrid, but I really need this to be replicated to at least a second node for HA / failover. Mnesia beckoned.

The problem:

  • 15 million [fairly simple] records
  • 1 Mnesia table: bag, disc_copies, just 1 node, 1 additional index
  • Hardware is a quad-core 2GHz CPU, 16GB RAM, 8x 74GB 15k rpm SCSI disks in RAID-6
  • Takes ages* to load and spews a load of “Mnesia is overloaded” warnings

* My definition of ‘takes ages’: Much longer than PostgreSQL \copy or MySQL LOAD DATA INFILE

At this point all I want is a quick way to bulk-load some data into a disc_copies table on a single node, so I can get on with running some tests.

Here is the table creation code:
mnesia:create_table(subscription,
[
{disc_copies, [node()]},
{attributes, record_info(fields, subscription)},
{index, [subscribee]}, %index subscribee too
{type, bag}
]
)

The subscription record is fairly simple:
#subscription{subscriber = {resource, user, 123}, subscribee = {resource, artist, 456}}

I’m starting erlang like so:
erl +A 128 -mnesia dir '"/home/erlang/mnesia_dir"' -boot start_sasl

The interesting thing there is really the +A 128 – this spreads the cpu load better between the 4 cores.

Attempt 0) ‘by the book’ one transaction to rule them all

Something like this:
mnesia:transaction(fun()-> [ mnesia:write(S) || S <- Subs ] end)

Time taken: Too long, I gave up after 12 hours
Number of “Mnesia overloaded” warnings: lots
Conclusion: Must be a better way
TODO: actually run this test and time it.

Attempt 1) dirty_write

There isn’t really any need to do this in a transaction, so I tried dirty_write.
[ mnesia:dirty_write(S) || S <- Subs ]

And here’s the warning in full:
=ERROR REPORT==== 13-Oct-2008::16:53:57 ===
Mnesia('mynode@myhost'): ** WARNING ** Mnesia is overloaded: {dump_log,
write_threshold}

Time taken: 890 secs
Number of “Mnesia overloaded” warnings: lots
Conclusion: Workable, but nothing to boast about. Those warnings are annoying

Attempt 2) dirty_write, defer index creation

A common trick with a traditional RDBMS is to bulk load the data into the table and add the indexes afterwards. In some scenarios this avoids costly incremental index update operations. If you are doing the load in one gigantic transaction it shouldn’t matter, and I’m not really sure how mnesia works under the hood (something I plan to rectify if I end up using it for real).
I tried a similar approach by commenting out the {index, [subscribee]} line above, doing the load, then calling mnesia:add_table_index(subscription, subscribee) to add the index once all the data was loaded. Note that mnesia was still building the primary index on the fly, but that can’t be helped.
Time taken: 883 secs (679s load + 204s index creation)
Number of “Mnesia overloaded” warnings: lots
Conclusion: Insignificant, meh

Attempt 3) mnesia:ets() trickery

This is slightly perverted, but I tried it because I suspected that incrementally updating the on-disk data wasn’t especially optimal. The idea is to make a ram_copies table and use the mnesia:ets() function to write directly to the ets table (doesn’t get much faster than ets). The table can then be converted to disc_copies. There are caveats; to quote The Fine Manual:

Call the Fun in a raw context which is not protected by a transaction. The Mnesia function call is performed in the Fun are performed directly on the local ets tables on the assumption that the local storage type is ram_copies and the tables are not replicated to other nodes. Subscriptions are not triggered and checkpoints are not updated, but it is extremely fast.

I can live with that. I don’t mind if replication takes a while to set up when I put this into production; I’ll gladly take any optimisations I can get at this stage (testing/development).

Loading a list of subscriptions looks like this:
mnesia:ets(fun()-> [mnesia:dirty_write(S) || S <- Subs] end).
And to convert this into disc_copies once data is loaded in:
mnesia:change_table_copy_type(subscription, node(), disc_copies).

Time taken: 745 secs (699s load + 46s convert to disc_copies)
Number of “Mnesia overloaded” warnings: none!
Conclusion: Fastest yet, bit hacky

Summary

At least the ets() trick doesn’t spew a million warnings. I also need to examine the output of mnesia:dump_to_textfile and see if loading data from that format is any faster.
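To put the timings side by side, here is a rough throughput calculation as a small shell sketch. It treats "15 million" as an exact record count, which is an assumption, and the throughput function is a name invented for this illustration.

```shell
#!/bin/bash
# Hypothetical helper: whole records per second (integer division).
throughput() {
    local records=$1 seconds=$2
    echo $(( records / seconds ))
}

RECORDS=15000000   # "15 million" from the post, assumed exact here

printf 'dirty_write:              %d records/sec\n' "$(throughput "$RECORDS" 890)"
printf 'dirty_write, index later: %d records/sec\n' "$(throughput "$RECORDS" 883)"
printf 'mnesia:ets() trick:       %d records/sec\n' "$(throughput "$RECORDS" 745)"
```

That works out to roughly 16,853, 16,987 and 20,134 records per second respectively, so the ets() trick is about 20% faster than plain dirty_write.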

TODO:

  • Examine / test using the dump_to_textfile method
  • Run full transactional load and time it
  • Try similar thing with PostgreSQL


Monday, October 13th, 2008 hacks, programming 7 Comments

Updated bash PS1

Made a minor tweak to my .bashrc after browsing dotfiles.org for some ideas. One neat trick I gleaned was detecting when the exit code of the last command ($?) was non-zero and altering the prompt. This will be useful for seeing at a glance whether some enormous load of output from make ended in success or failure.

Note the prompt goes red on failure

Here are the bits from my updated .bashrc:

# define useful aliases for color codes
sh_norm="\[\033[0m\]"
sh_black="\[\033[0;30m\]"
sh_darkgray="\[\033[1;30m\]"
sh_blue="\[\033[0;34m\]"
sh_light_blue="\[\033[1;34m\]"
sh_green="\[\033[0;32m\]"
sh_light_green="\[\033[1;32m\]"
sh_cyan="\[\033[0;36m\]"
sh_light_cyan="\[\033[1;36m\]"
sh_red="\[\033[0;31m\]"
sh_light_red="\[\033[1;31m\]"
sh_purple="\[\033[0;35m\]"
sh_light_purple="\[\033[1;35m\]"
sh_brown="\[\033[0;33m\]"
sh_yellow="\[\033[1;33m\]"
sh_light_gray="\[\033[0;37m\]"
sh_white="\[\033[1;37m\]"

# pick the host colour: red for production, yellow for staging, green otherwise
case `hostname` in
    "livehost"|"production_server"|"sauron")
        HOSTCOLOUR=${sh_red}
        ;;
    "staging-node")      HOSTCOLOUR=${sh_yellow} ;;
    *)              HOSTCOLOUR=${sh_green} ;;
esac

# remember whether the last command failed, before the prompt is drawn
export PROMPT_COMMAND='if [ $? -ne 0 ]; then ERROR_FLAG=1; else ERROR_FLAG=; fi'
# colours are expanded now; the ${ERROR_FLAG:+...} parts stay literal and are expanded at each prompt
export PS1="${sh_white}\u@${HOSTCOLOUR}\h${sh_norm}\w\n${sh_norm}"'${ERROR_FLAG:+'"${sh_light_red}"'}\$${ERROR_FLAG:+'"${sh_norm}"'} '



I’m also using the hostname to decide what colour the host appears in the prompt. My home directory, and thus .bashrc, is mounted on most hosts I log in to, and this serves as a reminder if I’m logged in to a production host. Green is the default, and it’s overridden for various special hosts.
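The red-on-failure part hinges on the ${ERROR_FLAG:+word} expansion, which yields word only when ERROR_FLAG is set and non-empty. Here is a minimal sketch of just that mechanism; render_symbol is a hypothetical helper, and the <red> and </red> markers stand in for the real escape sequences so the effect is visible as plain text.

```shell
#!/bin/bash
# Hypothetical demo: <red> and </red> stand in for the real colour escapes.
render_symbol() {
    local ERROR_FLAG=$1
    # ${ERROR_FLAG:+word} expands to word only if ERROR_FLAG is set and non-empty.
    printf '%s\n' "${ERROR_FLAG:+<red>}\$${ERROR_FLAG:+</red>}"
}

render_symbol ""   # last command succeeded
render_symbol 1    # last command failed
```

The first call prints a bare $ and the second prints <red>$</red>, which is the same switch the PS1 makes on every prompt.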


Saturday, October 11th, 2008 hacks 2 Comments

Transcoding HTTP mp3 streaming proxy in bash

Here’s how to make a proxy for streaming mp3s. It transcodes on-the-fly to 64kbps MP3 using lame. When transcoding is finished, it calls the ./posthandler.sh script, which can either just delete the file, or potentially archive it so you don’t need to transcode it again.

#!/bin/bash
# Read the HTTP request line, e.g. "GET /http://host/file.mp3 HTTP/1.0"
read method url version

# Strip the trailing carriage return sent by HTTP clients
CR=$'\r'
method="${method%$CR}"
url="${url%$CR}"
version="${version%$CR}"

echo -ne "HTTP/1.0 200 OK\r\nContent-type: audio/mpeg\r\n\r\n"

BR=64 # bitrate to transcode to
PIPE="/tmp/$$.pipe"
mkfifo "$PIPE"

OUTFILE="./tmp.$$.$BR.mp3"
rm -f "$OUTFILE"
url=`echo "$url" | sed 's/\///'` # drop the leading slash of the request path
echo "** GET $url" >&2

nohup lynxsource "$url" \
    | (lame --preset cbr $BR --mp3input - - 2>/dev/null \
      && (echo "** Finished transcoding $url" >&2 ; \
          ./posthandler.sh "$OUTFILE"&))\
    | tee -i "$PIPE" > "$OUTFILE" &

cat < "$PIPE"
rm "$PIPE"


One interesting limitation seems to be the buffer size of a FIFO in Linux. Even though the transcoding step is pretty quick, if a client is connected the transcoding only manages to fill the pipe a couple of hundred kilobytes ahead of what is being read.

The -i flag to `tee` means it ignores interrupts, and will finish transcoding the file and call the posthandler even if the client disconnects.
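The tee-into-a-FIFO arrangement is easier to see in isolation. Below is a minimal sketch, assuming a throwaway temp directory; fanout_demo and the file names are invented for illustration, and a single line of text stands in for the transcoded audio.

```shell
#!/bin/bash
# Hypothetical demo: one producer feeds both a FIFO (the "client") and a file (the "archive").
fanout_demo() {
    local dir pipe out
    dir=$(mktemp -d) || return 1
    pipe="$dir/stream.pipe"
    out="$dir/copy.txt"
    mkfifo "$pipe"

    # tee writes each chunk to the FIFO and, via stdout, to the archive file.
    printf 'hello\n' | tee -i "$pipe" > "$out" &

    cat < "$pipe"   # the "client" reads the live stream
    wait            # let the background pipeline finish
    cat "$out"      # the archived copy holds the same data
    rm -rf "$dir"
}
```

Calling fanout_demo prints the line twice, once from the FIFO and once from the saved copy.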

Run it like this:

while [ 1 ]; do nc -vlp 8080 -c './transstreamer.sh' ; done

Then hit up a url of your choice using your awesome new proxy:

mpg321 "http://localhost:8080/http://freedownloads.last.fm/download/105468518/Letters%2BFrom%2BThe%2BBoatman.mp3"
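The upstream URL travels as the path of the request, so the script only has to read the request line, strip the carriage return and drop the leading slash. A minimal sketch of that parsing step on its own follows; request_url is a hypothetical helper and example.com is a placeholder host.

```shell
#!/bin/bash
CR=$'\r'

# Hypothetical helper: read one HTTP request line on stdin, print the upstream URL.
request_url() {
    local method url version
    read -r method url version
    url="${url%"$CR"}"        # drop a trailing carriage return, if any
    printf '%s\n' "${url#/}"  # drop the single leading slash of the request path
}

printf 'GET /http://example.com/song.mp3 HTTP/1.0\r\n' | request_url
```

This prints http://example.com/song.mp3, which is the value the proxy would then fetch and transcode.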

Not the most scalable solution, but a mildly amusing quick hack.


Monday, September 29th, 2008 hacks, programming No Comments