Friday, October 31, 2008

Running MATLAB on UNIX

nohup matlab -nodisplay -nosplash -nodesktop -nojvm -r "matlab_command;exit;" > logfile

The nohup command essentially lets you run MATLAB from a remote terminal without worrying about dropped connections or other hangups. However, it sometimes doesn't behave as expected on some UNIX systems, so it might be better to use the 'screen' command.

A simple tutorial on how to use the screen command is here.

All you need to do is type the following from your terminal:
>screen %This will open up a new screen session (Duh!)
>(type your favorite commands)

You can now comfortably disconnect your session and reconnect to it any time.

If you want to detach from this screen and get back to the original terminal, press Ctrl+a and then d.

To reconnect to a screen session simply type
>screen -r

This will either bring up the screen session (if you have just one session going) or give you a list of screen ids.

To connect to a particular screen session, pass its id from that list:
>screen -r <screen id>

Wednesday, October 29, 2008

Hash functions for sequence scanning

INPUT: A set of sequences (DNA/Protein etc.)
OUTPUT: A motif matrix of all possible k-mers and gapped elements (dimers for example) in the set of sequences

MATLAB doesn't have any built-in hashing functions that run in O(1) time. You would want something that can do a quick array index lookup for each k-mer or dimer into the motif matrix. There are several hacks you can pull off.
  1. You can use a for loop. This simply sucks. Wayyyy too slow.
  2. If you are scanning DNA sequences then you can encode A = 1, C = 2, G = 3, T = 4 ... This way every k-mer automatically becomes a number (ACGT becomes 1234, for example) which can be used as an index into a sparse matrix. You can then prune the sparse matrix to remove indices that do not match any k-mer sequence. This is extremely fast. However, it doesn't work for dimers, very long k-mers, or more complex sequence elements such as regular expressions. It also won't work for protein sequences because there are 21 amino acids, so you would start generating very large array indices for k-mers with k>8.
  3. I feel the best option, though, is to use the Java hash table object, ht = java.util.Hashtable
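The encoding in option (2) can be sketched in a few lines of Java. This is only an illustration: the class name KmerIndex and the radix parameter are made up for this example. Radix 10 gives the plain digit reading described above; a smaller radix such as 5 is an assumed variation that would leave fewer unused indices to prune.

```java
// Hypothetical helper: turns a DNA k-mer into a numeric array index.
final class KmerIndex {
    // Digit assigned to each base, as in the text: A=1, C=2, G=3, T=4.
    static int digit(char base) {
        switch (Character.toUpperCase(base)) {
            case 'A': return 1;
            case 'C': return 2;
            case 'G': return 3;
            case 'T': return 4;
            default:
                throw new IllegalArgumentException("not a DNA base: " + base);
        }
    }

    // Read the k-mer as a number in the given radix.
    // A long overflows for long k-mers (beyond k = 18 with radix 10).
    static long encode(String kmer, int radix) {
        long index = 0;
        for (int i = 0; i < kmer.length(); i++) {
            index = index * radix + digit(kmer.charAt(i));
        }
        return index;
    }

    public static void main(String[] args) {
        System.out.println(encode("ACGT", 10)); // prints 1234
        System.out.println(encode("ACGT", 5));  // prints 194
    }
}
```

The overflow note in the comment is the same problem mentioned above for proteins: the index grows with every extra position, so long k-mers quickly run past what a numeric index can hold.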
More on (3) ...

You create the hash table object as ht = java.util.Hashtable. Check out the member functions here.

The keys would be the k-mers/dimers etc. and the values would be the motif matrix indices. The only problem with this is that put() adds only a single (key, value) pair per call and get() returns the value for a single key. So it would be better to write Java code that takes a set of k-mers, adds them to the hash table and returns their indices ... basically a vectorized version of get() and put().

I need to do this.
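
A rough sketch of what such a helper might look like is below. The class name KmerTable and the methods putAll/getAll are hypothetical, and so are two conventions I picked for the example: new k-mers get the next free 1-based index (to match MATLAB indexing), and 0 means "not in the table".

```java
import java.util.Arrays;
import java.util.Hashtable;

// Hypothetical batch wrapper around java.util.Hashtable.
final class KmerTable {
    private final Hashtable<String, Integer> table = new Hashtable<>();

    // Add each k-mer not seen before, giving it the next free 1-based index.
    // Returns the index of every input k-mer, in input order.
    public int[] putAll(String[] kmers) {
        int[] indices = new int[kmers.length];
        for (int i = 0; i < kmers.length; i++) {
            Integer idx = table.get(kmers[i]);
            if (idx == null) {
                idx = table.size() + 1;
                table.put(kmers[i], idx);
            }
            indices[i] = idx;
        }
        return indices;
    }

    // Look up many k-mers in one call; 0 marks a k-mer that is not stored.
    public int[] getAll(String[] kmers) {
        int[] indices = new int[kmers.length];
        for (int i = 0; i < kmers.length; i++) {
            Integer idx = table.get(kmers[i]);
            indices[i] = (idx == null) ? 0 : idx;
        }
        return indices;
    }

    public static void main(String[] args) {
        KmerTable t = new KmerTable();
        int[] added = t.putAll(new String[] {"ACG", "CGT", "ACG"});
        System.out.println(Arrays.toString(added)); // prints [1, 2, 1]
        int[] found = t.getAll(new String[] {"CGT", "TTT"});
        System.out.println(Arrays.toString(found)); // prints [2, 0]
    }
}
```

To call something like this from MATLAB, the class would presumably need to be declared public, compiled, and put on the Java class path (for example with javaaddpath). The whole loop then runs on the Java side, with one call from MATLAB per batch instead of one call per k-mer.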