Practical Bioinformatics: the halting thought process of a working bioinformatician

2 Jul 2012

Things I’m working on today

Amazon AWS

I'm using Amazon AWS for so many things these days.  I use EC2 to host a MIRA Assembler image that will assemble anything quickly and cheaply.  Today's task is just an EC2 instance that users can RDP into to deploy Acrobat X.  I know, it's not very BNFO, but if you can run Acrobat on it, you can run ClustalX or MAST on it.

Apache Solr

I don't know much about Solr yet, except that it's Apache's search server, built on Lucene.  It's available as a Tomcat servlet, but I'm trying to spin it up using the Blaze Appliance by initcron.  I'm very much into appliances these days.  Even on my desktop machine (6-core Phenom II), the AMD-V hardware virtualization extensions are a great way to run things in VirtualBox without a big performance penalty.  I like the idea of Amazon's CloudSearch, but my index is so small that it's probably not worth paying for.  Solr looks like a good way to index and search things over NFS and CIFS shares in the lab.

FreeNAS 8.0.4-p3

I do occasionally sully my hands with non-cloud storage.  My current in-lab NAS is a Western Digital ShareSpace.  It's a 4 TB unit, which is plenty of space even in RAID5, but the throughput is not fast at all.  It maxes out at 12 MB/sec for reads over NFS, and writes manage about 4 MB/sec at best.  Since I had an older IBM RAID server with 12 SAS drives in it, I figured I'd put FreeNAS on that and use it to replace the ShareSpace.  It works great, ZFS is an excellent filesystem, and the snapshots mesh nicely with Amazon S3.

The problem I ran into was moving the data between the two.  I mounted the old and new filesystems, and then tried to copy the contents over:

cp --force --preserve --recursive /mnt/old/* /mnt/new/

I kept getting prompts to continue, even though I specified --force.  Turns out that, at least in CentOS 6.2, 'cp' is aliased to 'cp -i' in an attempt to keep you from overwriting things unintentionally, and with GNU cp, --force doesn't switch off the -i prompt.  The easy fix?

\cp --force --preserve --recursive /mnt/old/* /mnt/new/

The \ prefix bypasses the alias for that one command, with no need for 'unalias'. Now things copy smoothly. And throughput is amazing: I've seen write speeds up to 50 MB/sec, and pulling data is much faster than that.
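If you want to see the alias trick in isolation before pointing it at real data, here is a minimal sketch. It assumes throwaway temp directories in place of /mnt/old and /mnt/new, and it defines the 'cp -i' alias itself rather than relying on the distro default. The directory and variable names are made up for illustration. It uses the short flags -f -p -R, which should behave like --force --preserve --recursive on GNU cp.

```shell
#!/bin/sh
# Sketch only: throwaway directories stand in for /mnt/old and /mnt/new.

# bash ignores aliases inside scripts unless asked; other shells lack shopt.
shopt -s expand_aliases 2>/dev/null || true
alias cp='cp -i'   # assumption: mirrors the interactive default described above

workdir=$(mktemp -d)
mkdir "$workdir/old" "$workdir/new1" "$workdir/new2"
echo "fresh" > "$workdir/old/data.txt"
echo "stale" > "$workdir/new1/data.txt"   # existing file that would trigger a prompt
echo "stale" > "$workdir/new2/data.txt"

# Print the alias definition so you can see what plain 'cp' would run.
alias cp

# Option 1: a leading backslash quotes the word, so the alias is not expanded.
\cp -f -p -R "$workdir"/old/* "$workdir/new1/"

# Option 2: the 'command' builtin also skips alias lookup.
command cp -f -p -R "$workdir"/old/* "$workdir/new2/"

result1=$(cat "$workdir/new1/data.txt")
result2=$(cat "$workdir/new2/data.txt")
echo "new1: $result1, new2: $result2"

rm -rf "$workdir"
```

Both forms only affect the one command they prefix, so the alias is still there for the next interactive 'cp'. One thing to watch with the real copy: a '/mnt/old/*' glob skips top-level dotfiles under default shell settings, so check for hidden files before retiring the old NAS.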