The Mira Assembler
We have installed the Mira de novo assembler.
For documentation on the program, see the Mira homepage at http://sourceforge.net/apps/mediawiki/mira-assembler/index.php?title=Main_Page
Read carefully how the assembly should be done with your specific dataset.
Mira uses a fairly large amount of memory. Fortunately, it ships with a small program, miramem, that estimates how much memory an assembly will need. Run it once you have logged in to my-mgrid and answer its questions. When you request memory for your run on the grid (in the job script), take the estimate given by miramem and add 5 GB to it.
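The arithmetic above can be sketched as a small shell helper. This is only an illustration: the function name mira_mem_request and the 12 GB figure are made up for the example, and the estimate is assumed to be rounded up to whole gigabytes before it is passed in.

```shell
#!/bin/sh
# Add the 5 GB safety margin to a miramem estimate given in whole GB.
# Usage: mira_mem_request ESTIMATE_GB
# (mira_mem_request is a hypothetical helper name, not part of Mira.)
mira_mem_request() {
    estimate_gb=$1
    margin_gb=5
    echo "$((estimate_gb + margin_gb))G"
}

# Example: suppose miramem reported roughly 12 GB (made-up figure).
mira_mem_request 12
```

The printed value is what you would put in the memory request of your job script. Which resource name the grid expects there (for example h_vmem on many Grid Engine setups) is an assumption to check with whoever administers the grid.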
Mira has an array of "switches" that turn many features on and off. Most of them are preset to what Mira assumes most people want, which is not necessarily what you want. One such switch is the filter for long read names. Mira itself can handle names of more than 40 characters, but it does not let them through unless you tell it to. The reason is that some other programs people might want to use later on (I am not sure which ones) do not accept long names, so Mira gives you the opportunity to deal with this early on instead of having to redo the whole assembly at a later stage.
Anyway, it's easy to fix. Add -MI:somrnl=0 to your command line (anywhere should work, I think; I put it right after the -job option).
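As a rough sketch of where the switch goes, the helper below only prints a command line instead of running Mira, so you can check it first. The --project/--job spelling follows the Mira 3 documentation as far as I know, and the project name and job settings are placeholders; adjust all of them to your installed version and dataset.

```shell
#!/bin/sh
# Print a Mira command line with the long-read-name switch turned off.
# Usage: build_mira_cmd PROJECT JOB_SETTINGS
# (build_mira_cmd is a hypothetical helper; nothing is executed.)
build_mira_cmd() {
    project=$1
    job=$2
    # -MI:somrnl=0 is placed right after the job option.
    echo "mira --project=$project --job=$job -MI:somrnl=0"
}

# Placeholder project name and job settings, for illustration only.
build_mira_cmd myproject denovo,genome,accurate,454
```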
Another thing Mira doesn't like is NFS-mounted directories, such as those on the mykopat grid. As far as I understand it, Mira fears that if several computers need to communicate during an assembly, it will slow down by a factor of 10 or more. On the grid, you can get around this by forcing the job to run on a single machine. Add #$ -q *@my-mgrid2 (or whichever machine is best suited) to the initial list of commands in your job script, and Mira should decide that you are not running on an NFS system.
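To show where the queue line sits, here is a minimal sketch that prints the top of a job script for a given machine. The helper name make_mira_job is made up for this example, and the printed script is deliberately incomplete: your memory request and the actual Mira command still have to be added.

```shell
#!/bin/sh
# Print the start of a job script that pins the run to one machine.
# Usage: make_mira_job HOST
# (make_mira_job is a hypothetical helper, not a grid or Mira tool.)
make_mira_job() {
    host=$1
    # \$ keeps a literal "$" so the line is written as a "#$" directive.
    cat <<EOF
#!/bin/sh
#\$ -q *@${host}
# ... memory request and your mira command line go here ...
EOF
}

# my-mgrid2 is the machine named above; use whichever suits your run.
make_mira_job my-mgrid2
```

Redirect the output to a file (for example make_mira_job my-mgrid2 > mira_job.sh) and fill in the remaining lines before submitting it.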