The Mira Assembler

We have installed the Mira de novo assembler.

For documentation on the program, see the Mira homepage at http://sourceforge.net/apps/mediawiki/mira-assembler/index.php?title=Main_Page

Read carefully how the assembly should be done for your specific type of data set.

The first thing to say about this assembler is that it should not be your first choice for large data sets. A set of around 60M reads would take more than a week of run time and require more than 200 GB of RAM. Even a set of around 15M reads still takes about 80 hours and 50 GB of RAM to finish, whereas Velvet, for example, can handle the 60M set in 24-48 hours.

Mira's strong point, it would seem, lies instead in its ability to handle reads from different sequencing technologies at the same time. Got 5M reads of 454, Solexa and Sanger mixed together? Then Mira is your friend.

Mira uses fairly large amounts of memory. Luckily, a small program called miramem is included to estimate the memory needs. Just run it once you have logged in to my-mgrid and answer the questions. Take the estimate given by miramem and add 5-10 GB to it when you request memory for your run on the grid (in the job script). The estimates from miramem seem somewhat optimistic at times, so you may have to increase the allocated memory even further.
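The padding rule above amounts to a small piece of arithmetic. The sketch below assumes a miramem estimate of 50 GB and a 10 GB margin. Both values are placeholders for illustration, not measurements from a real run.

```shell
#!/bin/sh
# Pad a miramem estimate before requesting memory in the job script.
# Both numbers are hypothetical; substitute what miramem reports for your data.
miramem_estimate_gb=50   # estimate printed by miramem (assumed value)
margin_gb=10             # safety margin, 5-10 GB as suggested in the text

request_gb=$(( miramem_estimate_gb + margin_gb ))
echo "Request at least ${request_gb}G of memory for this run"
```

How the resulting figure is passed to the grid depends on how the scheduler is configured, so check the local grid documentation for the right resource name.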

Mira has an array of "switches" that turn features on and off. Most are preset to what Mira assumes most people want, which does not necessarily coincide with what you want. One such switch is the filter for long read names. While Mira itself can handle names of more than 40 characters, it does not let them through unless you tell it to. The reason is that some other programs people might want to use later on (not sure which ones, actually) do not accept long names, so Mira gives the user the opportunity to fix the names early on instead of having to redo the assembly completely at a later stage.

Anyway, it's easy to fix. Add -MI:somrnl=0 to your command line (anywhere works, I think; I did it right after the -job option).
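As a rough sketch, a command line with the long-name filter switched off might be put together as follows. The project name and the job specification are made-up placeholders, and the --project/--job spelling is assumed from Mira 3 usage; only -MI:somrnl=0 comes from the text above. The script only prints the command, so you can check it before running it for real.

```shell
#!/bin/sh
# Assemble a Mira command line with the read-name length check disabled.
project="myassembly"               # hypothetical project name
job="denovo,genome,accurate,454"   # hypothetical job specification

mira_cmd="mira --project=$project --job=$job -MI:somrnl=0"
echo "$mira_cmd"

# Uncomment the next line to actually start the assembly:
# $mira_cmd
```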

Another thing Mira doesn't like is NFS-mounted directories, such as those on the mykopat grid. As far as I understand it, Mira fears that if several computers need to communicate during an assembly, it will slow down by a factor of 10 or more. On the grid, this can be circumvented by forcing the job to run on only one machine. Add #$ -q *@my-mgrid2 (or whichever machine is best suited) to the initial list of directives in your job script, and Mira should decide you are not running on an NFS system. If it persists in claiming you're on NFS, you can add -MI:sonfs=no to the command line, which tells it to override the NFS abort. As long as you have told Mira to run on only one machine, it will not have any NFS-related problems, even though the software itself might still think so.
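Put together, a job script along these lines could look like the sketch below. The queue directive and the -MI:sonfs=no switch are taken from the text. The project name, the job specification and the force_nfs_override variable are placeholders invented for the example, and the --project/--job spelling is assumed from Mira 3 usage. The script prints the command rather than running it.

```shell
#!/bin/sh
#$ -q *@my-mgrid2
# The directive above pins the job to a single machine on the grid.

project="myassembly"               # hypothetical project name
job="denovo,genome,accurate,454"   # hypothetical job specification
force_nfs_override="yes"           # set to "no" if Mira stops complaining about NFS

mira_opts="--project=$project --job=$job"
if [ "$force_nfs_override" = "yes" ]; then
    # Tell Mira not to abort when it believes it is running on NFS.
    mira_opts="$mira_opts -MI:sonfs=no"
fi

echo "mira $mira_opts"

# Uncomment the next line to actually start the assembly:
# mira $mira_opts
```

Keeping the override behind a variable makes it easy to try the run first with only the queue directive and add the switch only if Mira still refuses to start.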

Last modified on Oct 10, 2011, 9:34:50 AM