Monday, March 16, 2009

Appending to .MAT files

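You can append a variable to an existing .mat file by passing the '-append' option to save, where oFname holds the file name and 'var' is the name of the variable: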

>> save(oFname,'var','-append');

Consider 2 scenarios:
1) The variable 'var' is being added to the .mat file for the first time
2) The variable 'var' already exists in the .mat file and is being overwritten or updated

If 'var' takes up a lot of memory, i.e. it is a large matrix or array, (2) is slower than (1) by orders of magnitude.
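To see the difference on your own machine, here is a minimal timing sketch. The file name append_demo.mat, the variable names small and bigVar, and the matrix size are all made up for illustration; the actual timings will depend on your MATLAB version, .mat file format and disk.

```matlab
% Hypothetical file and variable names, chosen only for this illustration.
oFname = fullfile(tempdir, 'append_demo.mat');

small = 1;
save(oFname, 'small');             % create the file so there is something to append to

bigVar = rand(2000);               % 2000-by-2000 doubles, about 32 MB in memory

tic;
save(oFname, 'bigVar', '-append'); % scenario 1: 'bigVar' is new to the file
tFirst = toc;

bigVar = rand(2000);               % new contents, same variable name
tic;
save(oFname, 'bigVar', '-append'); % scenario 2: 'bigVar' already exists in the file
tOverwrite = toc;

fprintf('first append: %.3f s, overwrite: %.3f s\n', tFirst, tOverwrite);
```

Running the two timed calls a few times each gives a steadier comparison than a single run.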

Moral of the story: As far as possible, avoid overwriting or updating a variable that already exists in a .mat file, especially if the variable takes up a lot of memory.

Sparse vectors - ALWAYS use Column Vectors

I was working on some 'signal' data that I obtained from a ChIP-seq experiment that measures the binding affinity of a transcription factor to every nucleotide in the human genome. I was trying to manipulate this signal data using sparse vectors in MATLAB.

Most of the time I use column vectors by default. For some reason I decided to switch to row vectors. What a difference!

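An empty (all-zeros) sparse column vector of length 2 million takes only a few bytes of memory. However, an empty sparse row vector of the same length gave me an 'out of memory' error. The reason is that MATLAB stores sparse matrices in compressed column format, which keeps one pointer per column: a column vector has a single column, while a row vector of length 2 million has 2 million columns, so it needs memory proportional to its length even when it contains no nonzeros. While I was aware of the space efficiency of column-based sparse matrices in MATLAB, this was the first time I actually observed such a vast difference.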
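A quick way to check this yourself is to compare the sizes reported by whos. This is only a sketch: the variable names are made up, and I use a smaller length than the 2 million above so that the row vector is sure to fit in memory. The gap grows with the length.

```matlab
n = 1e5;                  % illustrative length, smaller than the 2 million in the text

colVec = sparse(n, 1);    % all-zeros sparse column vector: 1 column
rowVec = sparse(1, n);    % all-zeros sparse row vector: n columns

infoCol = whos('colVec');
infoRow = whos('rowVec');

% The row vector carries one column pointer per column, so it is far larger
% even though neither vector stores any nonzero entries.
fprintf('column vector: %d bytes\n', infoCol.bytes);
fprintf('row vector:    %d bytes\n', infoRow.bytes);
```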

Moral of the story: If you are manipulating sparse vectors, ALWAYS use column vectors!