Changes between Version 4 and Version 5 of Software/Mira

Timestamp:: 10/10/11 09:27:48 (14 years ago)
Author:: martenl
Comment:: --

Software/Mira

-              v4
+              v5
 Read carefully how the assembly should be done with your specific dataset.
+Mira uses fairly large amounts of memory. Luckily, there is a small program included to estimate the memory needs. It's called miramem.
+Just run it once you have logged in to my-mgrid and answer the questions. Take the estimate given by miramem and add 5GB to it when
+you request memory for your run on the grid (in the job script). The estimates by miramem seem somewhat optimistic at times, so you may have to increase the allocated memory even more than that. Also, Mira takes a long time to run. I ran it on a data set of 15M reads and allocated 36 hours, which was not enough. So don't be frugal.
+The first thing to be said about this assembler is that it should not be your first choice when dealing with large data sets. Sets in the range of 60M reads would take more than a week of run time and require more than 200 GB of RAM. Even sets in the range of 15M reads still take about 80 hours and 50 GB of RAM to finish, while Velvet, for example, can deal with the 60M set in 24-48 hours.
+The strong point of Mira, it would seem, lies instead in its ability to deal with reads from different sequencing technologies at the same time. Got 5M reads of 454, Solexa and Sanger mixed together? Then Mira is your friend.
+So Mira uses fairly large amounts of memory. Luckily, there is a small program included to estimate the memory needs. It's called miramem.
+Just run it once you have logged in to my-mgrid and answer the questions. Take the estimate given by miramem and add 5-10GB to it when
+you request memory for your run on the grid (in the job script). The estimates by miramem seem somewhat optimistic at times, so you may have to increase the allocated memory even more than that.
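The rule of thumb above (miramem estimate plus a 5-10 GB margin) can be written down as a small helper. This is only a sketch: the function name `memory_request_gb` and its parameters are made up for illustration, and the default margin of 10 GB is simply the upper end of the range suggested above.

```python
import math


def memory_request_gb(miramem_estimate_gb, margin_gb=10):
    """Return the memory (in whole GB) to request in the job script.

    miramem_estimate_gb: the estimate printed by miramem, in GB.
    margin_gb: safety margin added on top (5-10 GB per the text above).
    Both names are hypothetical; they are not part of Mira or miramem.
    """
    if miramem_estimate_gb < 0 or margin_gb < 0:
        raise ValueError("estimate and margin must be non-negative")
    # Round up so the request never falls below estimate + margin.
    return math.ceil(miramem_estimate_gb + margin_gb)


if __name__ == "__main__":
    # Example: miramem estimates 42.3 GB, and we add a 5 GB margin.
    print(memory_request_gb(42.3, margin_gb=5))
```

Since the miramem estimates can be optimistic, treat the result as a starting point and raise it if the job runs out of memory.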
 Mira uses an array of "switches" that turn a lot of features on and off. Most of these are pre-set to what Mira assumes most people want, which does not necessarily coincide with what you want. One such switch is the filter for long read names. While Mira itself can handle names of over 40 characters, it does not let them through unless you tell it to. The reason is that some other programs people might want to use later on (not sure which ones actually) do not accept long names, so Mira wants to give the user the opportunity to adjust the names early on instead of having to redo the assembly completely at a later stage.
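To see in advance whether a data set will trip the long-name filter, the read names can be checked before starting the assembly. The sketch below assumes FASTA input and assumes that the read name is the first whitespace-separated token after the `>`. The helper name `long_read_names` is made up for illustration, and the limit of 40 characters is taken from the description above.

```python
def long_read_names(fasta_lines, max_len=40):
    """Return the read names longer than max_len characters.

    fasta_lines: any iterable of lines in FASTA format (assumed input format).
    max_len: name length limit; 40 follows the description above.
    """
    too_long = []
    for line in fasta_lines:
        if not line.startswith(">"):
            continue  # sequence line, not a header
        fields = line[1:].split()
        # Assumption: the name is the first token of the header line.
        name = fields[0] if fields else ""
        if len(name) > max_len:
            too_long.append(name)
    return too_long


if __name__ == "__main__":
    example = [
        ">read_1 some description",
        "ACGTACGT",
        ">" + "r" * 45,
        "TTGACA",
    ]
    for name in long_read_names(example):
        print(len(name), name)
```

If the list comes back empty, the filter should not matter for that data set. Otherwise you can either rename the reads or tell Mira to accept long names.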