When dealing with extremely large files such as entire genomes, it is often impractical to fit everything in memory. For situations like this, MATLAB has an extremely slick function called memmapfile.
The main advantages are
- The file is not loaded into memory all at once; only the parts you actually access are read.
- You can access the entire file, or a portion of it, as if it were a standard MATLAB array using indexing operations. Say the file holds the sequence for an entire genome, stored one byte per nucleotide. If you write a = memmapfile('genome.dat'), then a.Data(1:10) gives you the first 10 nucleotides of the genome (as uint8 values by default; wrap the result in char() to see the letters).
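- It can map a file that holds a single data type or a mix of data types (set through the Format option).
- Access is typically faster than with fread and fwrite, especially when you jump around inside a large file.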
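To make the points above concrete, here is a minimal sketch. It assumes a tiny made-up file (genome_demo.dat, written to the temp folder just for this demo) with one byte per nucleotide; the file name and its contents are hypothetical, and a real genome file would of course be far larger.

```matlab
% Write a small hypothetical test file, one byte per nucleotide.
fname = fullfile(tempdir, 'genome_demo.dat');   % demo file name (assumption)
fid = fopen(fname, 'w');
fwrite(fid, 'ACGTACGTTTGACCA', 'uint8');
fclose(fid);

% Map the whole file. With the 'uint8' format, Data is a column of bytes.
m = memmapfile(fname, 'Format', 'uint8');

% Index it like an ordinary array; char() turns the bytes back into letters.
first10 = char(m.Data(1:10))';

% Map only a portion: skip the first 8 bytes and expose the next 4.
p = memmapfile(fname, 'Offset', 8, 'Format', 'uint8', 'Repeat', 4);
region = char(p.Data)';

disp(first10)
disp(region)

% Release the maps before deleting the demo file.
clear m p
delete(fname);
```

The Offset and Repeat options are what let you look at a window of the file without touching the rest, which is handy when you only care about one region of a chromosome.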
4 comments:
Anshul -
Thanks for sharing this with the community. I'm curious if you or your readers also have ASCII (text) files that you want to be able to read with a method like memmapfile.
@Scott: Yes absolutely. I use the memmap function mainly on massive chromosome FASTA files. I have to convert them to binary first in order to memory map them. Would be fantastic to be able to use it with ASCII files as well.
That is interesting and potentially very useful. Thanks for sharing this information!
@Will: You're welcome! I hope to find and post several MATLAB gems that are hidden in the documentation.