tag:blogger.com,1999:blog-33695783173873005622024-03-12T19:44:36.383-07:00MATLAB for compbioAnshulhttp://www.blogger.com/profile/02178466793315780705noreply@blogger.comBlogger13125tag:blogger.com,1999:blog-3369578317387300562.post-2275898938114013812011-01-13T16:06:00.001-08:002011-01-13T16:09:02.278-08:00<br /><br />Compiling matlab to a standalone with no display option<br /><br />You might often want to compile MATLAB files to a standalone executable using the mcc command. However, by default you will get annoying warning messages about no display being available. To avoid these messages, use the compiler directive -R with the nodisplay option.<div><br /></div><div>Before MATLAB 2010b:</div><div>mcc -R -nodisplay ...</div><div><br /></div><div>For MATLAB 2010b, the documentation says the syntax should be the same, but it isn't. You need to drop the leading "-", i.e.</div><div>mcc -R nodisplay ...</div>Anshulhttp://www.blogger.com/profile/02178466793315780705noreply@blogger.com2tag:blogger.com,1999:blog-3369578317387300562.post-69584624517379446882010-12-22T00:30:00.000-08:002010-12-22T00:32:21.739-08:00<br /><br />isdeployed( )<br /><br />isdeployed() is a handy function in MATLAB to check whether a piece of MATLAB code is running as a standalone deployed application or in native MATLAB.Anshulhttp://www.blogger.com/profile/02178466793315780705noreply@blogger.com0tag:blogger.com,1999:blog-3369578317387300562.post-68235213177051647172010-06-29T11:23:00.000-07:002010-06-29T11:30:29.432-07:00<br /><br />Truncating or rounding off a decimal value/array to user-specified number of decimal places<br /><br />Sometimes you want to round long floating point numbers to keep just the first few digits after the decimal point. The easy way to do this is<br /><br />xr = round(x/n) * n<br /><br />where<br />x = original floating point number (or array)<br />n = 10^(-[number of digits after decimal])<br /><br />To truncate instead of rounding, use fix in place of round.<br /><br />e.g. 
x=1.5673454, n = 0.01 (2 digits after decimal point)<br />xr = 1.57Anshulhttp://www.blogger.com/profile/02178466793315780705noreply@blogger.com1tag:blogger.com,1999:blog-3369578317387300562.post-27661096932933528552010-04-17T01:25:00.000-07:002010-04-17T01:27:20.126-07:00<br /><br />Passing data in and out of MATLAB and Python<br /><br />I came across this great package that allows direct data exchange between MATLAB and Python.<div><br /></div><div><a href="http://vader.cse.lehigh.edu/~perkins/pymex.html">http://vader.cse.lehigh.edu/~perkins/pymex.html</a></div>Anshulhttp://www.blogger.com/profile/02178466793315780705noreply@blogger.com1tag:blogger.com,1999:blog-3369578317387300562.post-15605711545403053182010-04-05T11:13:00.000-07:002010-04-05T11:19:20.669-07:00<br /><br />How to solve MCR cache access problems on a cluster<br /><br />Often when I run compiled MATLAB applications on a cluster, I get the error message <b><br /><br />"Could not access the MCR component cache."<br /></b><p>This tends to happen because MATLAB is not able to access the MCR cache directory. By default this directory is in your home directory. When a large number of compiled MATLAB programs start or run simultaneously (e.g. you submit a job array), the load on the file system becomes too great, which gives rise to the problem.</p><p>The simplest way to solve this problem is to point the MCR_CACHE_ROOT environment variable to a local temporary directory on each node of the cluster.<br /></p> <pre>export MCR_CACHE_ROOT=$TMPDIR<br /></pre> <p>This redirects the cache to a temp directory that is able to handle the traffic.</p>Anshulhttp://www.blogger.com/profile/02178466793315780705noreply@blogger.com6tag:blogger.com,1999:blog-3369578317387300562.post-77239589984784495102010-01-09T21:37:00.001-08:002010-01-09T21:48:52.369-08:00<br /><br />High density scatter plots<br /><br />The scatter(x,y) function in MATLAB is useful to visualize the joint distribution of two variables x and y. 
But this function breaks down (gets too slow and memory intensive) if the number of data points in x/y is large.<br /><br />A nice trick to visualize high density scatter plots is to bin the data and smooth the 2-D histogram. Then one can use the image function or surf function with alpha transparency to view the joint distribution. Darker regions could represent a high density of points and lighter regions a low density of points.<br /><br />R and several other programming languages have built-in functions for this. It is a little surprising that MATLAB doesn't have it built in yet. Anyway, <a href="http://bioinformatics.oxfordjournals.org/cgi/content/abstract/20/5/623">here</a> is a paper that gives a very efficient way of creating these smoothed high-density scatter plots and <a href="http://www.mathworks.com/matlabcentral/fileexchange/13352-smoothhist2d">here</a> is an implementation.Anshulhttp://www.blogger.com/profile/02178466793315780705noreply@blogger.com2tag:blogger.com,1999:blog-3369578317387300562.post-13228070168990193642009-10-29T17:36:00.000-07:002009-10-29T18:18:05.914-07:00<br /><br />PSI-BLAST and BLAST background probabilities<br /><br />This post is not directly related to MATLAB but I felt it was important to post this.<br /><br />I recently realized that it is not trivial to find the background amino acid probabilities that are used in BLAST and PSI-BLAST. Google didn't find them. None of the papers referenced in the BLAST papers actually have the frequencies in a tabular form. I would have thought this should have been documented by NCBI in BLAST help or something! Anyway, after a few hours of searching and reading papers and eventually code, I found the actual values used. They can be found in this file<br /><br /><a href="http://www.ncbi.nlm.nih.gov/IEB/ToolBox/C_DOC/lxr/source/tools/blastkar.c">http://www.ncbi.nlm.nih.gov/IEB/ToolBox/C_DOC/lxr/source/tools/blastkar.c</a><br /><br />Below are the tables which contain the frequencies. 
They need to be normalized (divide by the sum of the frequencies = 1000) to convert the frequencies to probabilities.<br /><br /><a href="http://spreadsheets.google.com/ccc?key=0Am6FxqAtrFDwdDh6WWhabTRyaThJNFBDMV9LZmJkVVE&hl=en">Google doc spreadsheet</a><br /><br />NOTE: PSI-BLAST uses the Robinson values by default<br /><br /><pre>#if STD_AMINO_ACID_FREQS == Dayhoff_prob
/* M. O. Dayhoff amino acid background frequencies */
static BLAST_LetterProb Dayhoff_prob[] = {
    { 'A', 87.13 },
    { 'C', 33.47 },
    { 'D', 46.87 },
    { 'E', 49.53 },
    { 'F', 39.77 },
    { 'G', 88.61 },
    { 'H', 33.62 },
    { 'I', 36.89 },
    { 'K', 80.48 },
    { 'L', 85.36 },
    { 'M', 14.75 },
    { 'N', 40.43 },
    { 'P', 50.68 },
    { 'Q', 38.26 },
    { 'R', 40.90 },
    { 'S', 69.58 },
    { 'T', 58.54 },
    { 'V', 64.72 },
    { 'W', 10.49 },
    { 'Y', 29.92 }
};
#endif

#if STD_AMINO_ACID_FREQS == Altschul_prob
/* Stephen Altschul amino acid background frequencies */
static BLAST_LetterProb Altschul_prob[] = {
    { 'A', 81.00 },
    { 'C', 15.00 },
    { 'D', 54.00 },
    { 'E', 61.00 },
    { 'F', 40.00 },
    { 'G', 68.00 },
    { 'H', 22.00 },
    { 'I', 57.00 },
    { 'K', 56.00 },
    { 'L', 93.00 },
    { 'M', 25.00 },
    { 'N', 45.00 },
    { 'P', 49.00 },
    { 'Q', 39.00 },
    { 'R', 57.00 },
    { 'S', 68.00 },
    { 'T', 58.00 },
    { 'V', 67.00 },
    { 'W', 13.00 },
    { 'Y', 32.00 }
};
#endif

#if STD_AMINO_ACID_FREQS == Robinson_prob
/* amino acid background frequencies from Robinson and Robinson */
static BLAST_LetterProb Robinson_prob[] = {
    { 'A', 78.05 },
    { 'C', 19.25 },
    { 'D', 53.64 },
    { 'E', 62.95 },
    { 'F', 38.56 },
    { 'G', 73.77 },
    { 'H', 21.99 },
    { 'I', 51.42 },
    { 'K', 57.44 },
    { 'L', 90.19 },
    { 'M', 22.43 },
    { 'N', 44.87 },
    { 'P', 52.03 },
    { 'Q', 42.64 },
    { 'R', 51.29 },
    { 'S', 71.20 },
    { 'T', 58.41 },
    { 'V', 64.41 },
    { 'W', 13.30 },
    { 'Y', 32.16 }
};
#endif

static BLAST_LetterProb nt_prob[] = {
    { 'A', 25.00 },
    { 'C', 25.00 },
    { 'G', 25.00 },
    { 'T', 25.00 }
};
</pre>Anshulhttp://www.blogger.com/profile/02178466793315780705noreply@blogger.com0tag:blogger.com,1999:blog-3369578317387300562.post-46486640779557894352009-03-16T02:50:00.000-07:002009-03-16T03:00:44.979-07:00<br /><br />Appending to .MAT 
files<br /><br />You can append variables to a .mat file using<br /><br />>> save(oFname,'var','-append');<br /><br />Consider 2 scenarios:<br />1) The variable 'var' is being added to the .mat file for the first time<br />2) The variable 'var' already exists in the .mat file and is being overwritten or updated<br /><br />If 'var' takes up a lot of memory, i.e. it is a large matrix or array, (2) is slower than (1) by orders of magnitude.<br /><br /><span style="font-weight: bold;">Moral of the story:</span> As far as possible, avoid overwriting or updating a variable in a .mat file, especially if the variable takes up a lot of memory.Anshulhttp://www.blogger.com/profile/02178466793315780705noreply@blogger.com0tag:blogger.com,1999:blog-3369578317387300562.post-611593551368961682009-03-16T00:47:00.000-07:002009-03-16T01:38:55.781-07:00<br /><br />Sparse vectors - ALWAYS use Column Vectors<br /><br />I was working on some 'signal' data that I obtained from a ChIP-seq experiment that measures the binding affinity of a transcription factor to every nucleotide in the human genome. I was trying to manipulate this signal data using sparse vectors in MATLAB.<br /><br />Most of the time I use column vectors by default. For some reason I decided to switch to row vectors. What a difference!<br /><br />An empty (all-zeros) sparse column vector of length 2 million barely takes a few bytes of memory. However, an empty sparse row vector of the same length gives an 'out of memory' error. The reason is that MATLAB stores sparse matrices in compressed column format, which keeps one pointer per column whether or not that column contains any nonzeros, so a 1-by-n sparse row vector pays for n columns. 
While I was aware of the space efficiency of column-based sparse matrices in MATLAB, this was the first time I actually observed such a vast difference.<br /><br /><span style="font-weight: bold;">Moral of the story</span>: If you are manipulating sparse vectors, ALWAYS use column vectors!Anshulhttp://www.blogger.com/profile/02178466793315780705noreply@blogger.com0tag:blogger.com,1999:blog-3369578317387300562.post-71134553841626578232009-02-28T17:48:00.000-08:002009-03-16T01:36:23.792-07:00<br /><br />Dealing with massive files with limited memory<div style="text-align: left;">When dealing with extremely massive files such as entire genomes, it is pretty much impossible to fit them in memory. For situations like this MATLAB has an extremely slick function called <a href="http://www.mathworks.com/access/helpdesk/help/techdoc/index.html?/access/helpdesk/help/techdoc/ref/memmapfile.html">memmapfile</a>.<br /><br />The main advantages are<br /><ul><li>The file is not loaded into memory in its entirety</li><li>You can access the entire file or a portion of the file as if it were a standard MATLAB array using indexing operations. Let's say the file had the sequence for an entire genome. 
Now if you say a = memmapfile('genome.dat') then doing something like a.Data(1:10) gives you the first 10 bytes of the file (the default format is uint8), i.e. the first 10 nucleotides if the file stores one character per nucleotide. Wrap the result in char() to see the letters.</li><li>It can map files that contain a single data format or multiple formats<br /></li><li>It is much faster than <tt>fread</tt> and <tt>fwrite</tt>.</li></ul>This is extremely useful for handling large binary files.<br /></div>Anshulhttp://www.blogger.com/profile/02178466793315780705noreply@blogger.com4tag:blogger.com,1999:blog-3369578317387300562.post-29798007616038912722009-01-24T10:56:00.000-08:002009-02-05T11:58:18.655-08:00<br /><br />Vectorized ROC curve code + AUC<br /><br />ROC curves are often used to display the predictive performance of binary classifiers. The area under the ROC curve (AUC) is a way to compare various classifiers. A perfect classifier has an AUC of 1 and a completely bogus (random) classifier has an AUC of 0.5. You can read more about ROC curves <a href="http://en.wikipedia.org/wiki/Receiver_operating_characteristic">here</a>. There is a ton of code for plotting ROC curves and calculating AUC, but most of it uses 'for' loops. And as we all know, loops slow everything down in MATLAB. 
You can download my vectorized code for plotting multiple ROC curves from multiple classifiers and calculating the AUC for each.<br /><br /><a href="http://sites.google.com/site/anshulkundaje/icode/calc_roc.m">Download Link</a>Anshulhttp://www.blogger.com/profile/02178466793315780705noreply@blogger.com0tag:blogger.com,1999:blog-3369578317387300562.post-31458977860206747632008-10-31T05:49:00.001-07:002009-02-05T11:37:41.588-08:00<br /><br />Running MATLAB on UNIX<br /><br />nohup matlab -nodisplay -nosplash -nodesktop -nojvm -r "matlab_command;exit;" > logfile<br /><br />Here matlab_command is the MATLAB command (or the name of the m-file) that you want to run.<br /><br />The nohup command essentially allows you to run MATLAB from a remote terminal without worrying about connection drops or other hang-up issues. However, it doesn't always behave as expected on some UNIX systems. It might be better to use the 'screen' command.<br /><br />A simple tutorial on how to use the screen command is <a href="http://kb.iu.edu/data/acuy.html" target="blank">here.</a><br /><br />All you need to do is type the following in your terminal<br />>screen %This will open up a new screen (Duh!)<br />>Type your favorite commands<br /><br />You can now comfortably disconnect your session and reconnect to it any time.<br /><br />If you want to get out of this screen and back to the original terminal, press Ctrl+a followed by d<br /><br />To reconnect to a screen session simply type<br />>screen -r<br /><br />This will either bring up the screen session (if you have just one session going) or give you a list of screen ids.<br /><br />To connect to a particular screen session<br />> screen -r <screen_id><br />Anshulhttp://www.blogger.com/profile/02178466793315780705noreply@blogger.com0tag:blogger.com,1999:blog-3369578317387300562.post-82398371297686927932008-10-29T19:58:00.000-07:002008-10-29T20:15:33.856-07:00<br /><br />Hash functions for sequence scanning<br /><br />INPUT: A set of sequences (DNA/Protein etc.)<br />OUTPUT: A motif matrix of all possible <span 
style="font-style: italic;">k</span>-mers and gapped elements (dimers for example) in the set of sequences<br /><br />MATLAB doesn't have any built-in hashing functions that run in O(1) time. You would want something that can do a quick array index lookup for each <span style="font-style: italic;">k</span>-mer or dimer into the motif matrix. There are several hacks you can pull off.<br /><ol><li>You can use a for loop. This simply sucks: way too slow.<br /></li><li>If you are scanning DNA sequences then you can encode A = 1, C = 2, G = 3, T = 4 ... In this way every <span style="font-style: italic;">k</span>-mer automatically becomes a number which can be used as an index into a sparse matrix. You can then prune the sparse matrix to remove indices that do not match any <span style="font-style: italic;">k</span>-mer sequence. This is extremely fast. However, it doesn't work for dimers, very long <span style="font-style: italic;">k</span>-mers, or more complex sequence elements such as regular expressions. It also won't work for protein sequences, because there are 20 standard amino acids and so you would start generating very large array indices for <span style="font-style: italic;">k</span>-mers with <span style="font-style: italic;">k</span>>8.<br /></li><li>I feel the best option though is to use the Java hash object ht = java.util.Hashtable</li></ol>More on (3) ...<br /><br />You create the hash table object as ht = java.util.Hashtable . Check out the member functions <a href="http://java.sun.com/j2se/1.4.2/docs/api/java/util/Hashtable.html">here</a><br /><br />The keys would be the <span style="font-style: italic;">k</span>-mers/dimers etc. and the values would be the motif matrix indices. The only problem with this is that you can add only one (key, value) pair at a time and get the value for only one key at a time. So it would be better to write Java code that takes a set of <span style="font-style: italic;">k</span>-mers, adds them to the hash table and returns their indices ... basically a vectorized version of get() and put().<br /><br />I need to do this.Anshulhttp://www.blogger.com/profile/02178466793315780705noreply@blogger.com9
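The "vectorized get() and put()" idea from the hash-function post could look roughly like the sketch below. This is only an illustration under a few assumptions: the class name KmerIndex and the methods putAll, getAll and size are made up here and are not from the original post, indices are handed out 1-based in order of first appearance so they can be used directly as MATLAB row indices, and java.util.HashMap is swapped in for java.util.Hashtable. The point is that the per-k-mer loop runs inside Java instead of in MATLAB.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical helper (not from the post): batch put/get of k-mers. */
public class KmerIndex {
    // Maps each k-mer string to its 1-based row index in the motif matrix.
    private final Map<String, Integer> index = new HashMap<>();

    /**
     * Adds every k-mer that has not been seen before, giving it the next
     * free index. Returns the index of each input k-mer, in input order.
     */
    public int[] putAll(String[] kmers) {
        int[] out = new int[kmers.length];
        for (int i = 0; i < kmers.length; i++) {
            Integer idx = index.get(kmers[i]);
            if (idx == null) {
                idx = index.size() + 1;   // 1-based, to match MATLAB indexing
                index.put(kmers[i], idx);
            }
            out[i] = idx;
        }
        return out;
    }

    /** Looks up many k-mers in one call; unknown k-mers map to 0. */
    public int[] getAll(String[] kmers) {
        int[] out = new int[kmers.length];
        for (int i = 0; i < kmers.length; i++) {
            Integer idx = index.get(kmers[i]);
            out[i] = (idx == null) ? 0 : idx;
        }
        return out;
    }

    /** Number of distinct k-mers stored so far. */
    public int size() {
        return index.size();
    }
}
```

Once the compiled class is on MATLAB's Java class path (for example via javaaddpath), it should be possible to pass a cell array of k-mer strings to putAll in a single call and get back a vector of indices, instead of calling put() once per k-mer.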