Practical Bioinformatics: the halting thought process of a working bioinformatician


The Record Separator

I teach people Perl, and I enjoy it very much, since it makes me think about Perl from a beginner's perspective.  Imagine you didn't have the advantage of CPAN and BioPerl.  As a bioinformatician, you accumulate a lot of FASTA files, in which each record starts with a header line like the one below, followed by one or more lines of sequence:

>gi|5524211|gb|AAD44166.1| cytochrome b [Elephas maximus maximus]

You have to find a way to read this file into your Perl script.  You know the usual suspects: open, filehandles, and the while(<$fh>) construction.  But Perl's default behavior is to handle input one line at a time, which leads to loops that have to finish off the previous record (store the sequence collected so far) whenever they hit a new header, before doing any new work.  Modules and BioPerl are widely available to just read the damn thing in, but remember, you’re a new Perl programmer.

Enter the input record separator, ‘$/’.  It lets you change that default line-by-line behavior to something else.  If Perl is reading line by line, that means it is breaking up the input at “\n” newline characters.  But it doesn’t have to be that way.  Try using this statement in your declarations (note the double quotes, which are needed for \n to be read as a newline):

local $/ = "\n>";

Suddenly, your input is broken up into whatever’s between the > symbols, when they start a new line.  Now, instead of feeding in sequences line-by-line, you’re reading them in one sequence at a time.  You can use your Perl arsenal to handle any operations you need on the sequence, and deal with them in a simple, procedural way.
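
Here is a minimal sketch of what that can look like in practice.  The sample data, the in-memory filehandle, and the names read_fasta and %seq_for are my own assumptions for illustration; a real script would open a FASTA file on disk instead.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for a FASTA file.
my $fasta = ">seq1 first example\nACGT\nTTGA\n>seq2 second example\nGGCC\n";

# Read FASTA text one record at a time and return a hash ref of id => sequence.
sub read_fasta {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";

    local $/ = "\n>";             # each read now returns one whole record
    my %seq_for;
    while (defined(my $record = <$fh>)) {
        chomp $record;            # strips a trailing "\n>" if present
        $record =~ s/^>//;        # only the first record still starts with ">"
        my ($header, @lines) = split /\n/, $record;
        my ($id) = split ' ', $header;    # first word of the header line
        $seq_for{$id} = join '', @lines;  # glue the sequence lines together
    }
    close $fh;
    return \%seq_for;
}

my $seqs = read_fasta($fasta);
print "$_\t$seqs->{$_}\n" for sort keys %$seqs;
```

Each pass through the loop sees a complete header-plus-sequence record, so there is no leftover state to carry from one iteration to the next.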

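Because the assignment uses local, the change only lasts until the enclosing block ends (or until the end of the file, if you declare it at the top level).  After that, everything is returned to normal.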
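
To see the scoping at work, here is a small sketch that changes $/ inside a subroutine and checks it afterwards.  The name count_records and the sample string are hypothetical, chosen only for this illustration.

```perl
use strict;
use warnings;

# Count FASTA-style records in a string, changing $/ only inside this sub.
sub count_records {
    my ($text) = @_;
    open my $fh, '<', \$text or die "Cannot open in-memory handle: $!";
    local $/ = "\n>";             # in effect only until this sub returns
    my $count = 0;
    while (defined(my $record = <$fh>)) {
        $count++;
    }
    close $fh;
    return $count;
}

my $n = count_records(">a\nAC\n>b\nGT\n");
my $restored = $/ eq "\n" ? "yes" : "no";
print "records: $n, default separator restored: $restored\n";
```

Keeping the local declaration inside a sub or a bare block is one way to make sure the rest of the script still reads line by line.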

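Some words of caution: if you declare this at the beginning of your file, it changes something you can’t see, namely the behavior of chomp.  chomp removes a trailing copy of whatever $/ currently holds, so it will now strip the “\n>” from the end of each record and will leave a plain trailing newline alone.  Also be aware that the very first record still begins with a >, because no earlier separator was there to swallow it.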
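
A short sketch of how chomp follows $/.  The strings here are made-up examples, not output from a real file.

```perl
use strict;
use warnings;

my $record = ">seq1\nACGT\n>";    # what one read looks like when $/ is "\n>"
my $plain  = "ACGT\n";            # an ordinary line ending in a newline

{
    local $/ = "\n>";
    chomp $record;                # removes the trailing "\n>"
    chomp $plain;                 # removes nothing: no "\n>" at the end
}

my $line = "ACGT\n";
chomp $line;                      # default $/ again: removes the "\n"

print "record ends with separator: ", ($record =~ /\n>\z/ ? "yes" : "no"), "\n";
print "plain still has newline: ",    ($plain  =~ /\n\z/  ? "yes" : "no"), "\n";
print "line still has newline: ",     ($line   =~ /\n\z/  ? "yes" : "no"), "\n";
```

Note that $record still begins with > after the chomp, which is why a first record usually needs that character stripped separately.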
