Scata HowTo

Register and log in

Before you can use SCATA, you need to register with your email address and log in to the system. An email address is required because some actions take a while to perform, and the system emails the results to you once they are done. (SLU users do not need to register; they can log in with their AD username and password.)

Prepare your data

Before you begin to upload your data to run an analysis, make sure you have prepared all files you need:

  • A dataset consisting of one or more Fasta files with the sequences and, optionally, the corresponding Fasta-formatted quality files with quality data for your sequences. The sequences and quality records must appear in the same order in the two files. (See DataSet)
  • A file listing all tags you have used and want to include in the analysis, formatted as a semicolon-separated list. (See TagSet)
  • Optionally, a file with the reference sequences you want to include in the analysis to identify clusters where possible. (See ReferenceSet)
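
The same-order requirement for sequence and quality files can be checked locally before uploading. The following is a minimal sketch, not part of SCATA: it assumes that each record starts with a ">" header line and that the first word of the header is the record identifier. The function names are hypothetical.

```python
from typing import Iterable, List, Optional, Tuple


def fasta_ids(lines: Iterable[str]) -> List[str]:
    """Return record identifiers (first word after '>') in file order."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            ids.append(header.split()[0] if header else "")
    return ids


def first_order_mismatch(
    seq_lines: Iterable[str], qual_lines: Iterable[str]
) -> Optional[Tuple[int, Optional[str], Optional[str]]]:
    """Return None if both inputs list the same IDs in the same order.

    Otherwise return (position, sequence_id, quality_id) for the first
    difference; a missing record on one side is reported as None.
    """
    seq_ids = fasta_ids(seq_lines)
    qual_ids = fasta_ids(qual_lines)
    for pos, (seq_id, qual_id) in enumerate(zip(seq_ids, qual_ids)):
        if seq_id != qual_id:
            return (pos, seq_id, qual_id)
    if len(seq_ids) != len(qual_ids):
        pos = min(len(seq_ids), len(qual_ids))
        seq_id = seq_ids[pos] if pos < len(seq_ids) else None
        qual_id = qual_ids[pos] if pos < len(qual_ids) else None
        return (pos, seq_id, qual_id)
    return None


# In-memory example; with real data, pass two open file handles instead.
sequences = [">read1 sample A", "ACGTACGT", ">read2", "GGCC"]
qualities = [">read1", "30 30 30 30 30 30 30 30", ">read2", "25 25 25 25"]
print(first_order_mismatch(sequences, qualities))
```

Because open text files iterate line by line, the same function works on file handles, for example `first_order_mismatch(open("reads.fas"), open("reads.qual"))`, where both file names are placeholders.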

Upload your data

Go to the respective sections of SCATA and upload your data files. Once the files have been checked, you will get emails with the results of the checks, and the files will be available for analysis.

Create a parameter set

While your data files are being processed, you can create a parameter set for use in your analysis. Several parameters can be altered, but in most cases it should suffice to check that the following parameters have the desired values. The defaults are sane in most cases.

  • Clustering distance. Adjust if you want more or less stringent clustering.
  • Minimum alignment
  • Homopolymer collapsing

Submit your job!

Once all your uploads have been verified, go to the Jobs tab and create a new job. Select the dataset(s), reference set(s), tag set and parameter set you want to use, then submit the job. Once the job is done, you will get an email telling you that the results are available for download as a zip file.

Results

The result file contains a number of files. All .txt files are easy to import into Excel as semicolon-separated tables.

  • all_clusters_runID.txt contains a general summary of the run as well as all clusters from the run, including identifications where possible. The corresponding .fas-file contains the consensus sequences of the clusters.
  • all_tag_by_cluster_runID.txt is a semicolon separated data matrix for all tags and all clusters. Values are normalised abundances of each cluster for each tag (rows sum to 1).
  • all_tags_runID.txt contains summaries for all tags in the experiment.
  • Directories with summaries per tag, as well as cluster alignments.
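
Besides Excel, the semicolon-separated tables can be read with a short script. The sketch below parses a tag-by-cluster matrix and reports any tag whose row does not sum to 1. The layout is an assumption: a header row with cluster names, then one row per tag with the tag name in the first column. The real file may contain additional lines, so inspect it first. The function names and sample values are made up for the example.

```python
import csv
import io
import math
from typing import Dict, List, TextIO, Tuple


def read_abundance_matrix(handle: TextIO) -> Tuple[List[str], Dict[str, List[float]]]:
    """Parse a semicolon-separated matrix (assumed layout: header row of
    cluster names, then one row per tag with the tag name first)."""
    reader = csv.reader(handle, delimiter=";")
    header = next(reader)
    clusters = header[1:]
    rows: Dict[str, List[float]] = {}
    for record in reader:
        if not record or not record[0].strip():
            continue  # skip blank lines
        rows[record[0]] = [float(value) for value in record[1:]]
    return clusters, rows


def rows_not_summing_to_one(rows: Dict[str, List[float]], tol: float = 1e-6) -> List[str]:
    """Return the tags whose abundances do not sum to 1 within tol."""
    return [
        tag
        for tag, values in rows.items()
        if not math.isclose(sum(values), 1.0, abs_tol=tol)
    ]


# Made-up sample data standing in for all_tag_by_cluster_runID.txt.
sample = "tag;cluster_0;cluster_1\nT1;0.25;0.75\nT2;0.5;0.4\n"
clusters, rows = read_abundance_matrix(io.StringIO(sample))
print(clusters)
print(rows_not_summing_to_one(rows))
```

With a real result file, replace the `io.StringIO` object with an open file handle, for example `open("all_tag_by_cluster_runID.txt", newline="")`, substituting your own run ID.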

The parameter set also includes the following:

  • Read quality parameters (read length, various quality cutoffs). The defaults should work fine, although you might want to check the minimum length parameter and adjust it to be slightly below the shortest read you expect to find in your sample.
  • Primer sequence and similarity cutoff. Make sure it matches the primer you have used!
  • Primer trimming, to cut off the invariable part of your amplicons that comes from the primer.
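
When choosing a value for the minimum length parameter, it can help to look at the read lengths actually present in a Fasta file. The following is a minimal sketch, not part of SCATA, assuming standard Fasta layout in which a sequence may span several lines; the function name is hypothetical.

```python
from typing import Iterable, List


def read_lengths(lines: Iterable[str]) -> List[int]:
    """Return the sequence length of every record, in file order."""
    lengths: List[int] = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


# In-memory example; with real data, pass an open file handle instead.
example = [">read1", "ACGTACGT", "ACGT", ">read2", "GGCCGG"]
lengths = read_lengths(example)
print(min(lengths), max(lengths))
```

The shortest read in a file is not necessarily a read you expect to keep, since very short reads are often artefacts. Treat the output as a starting point for choosing the cutoff rather than as the value to enter.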

Last modified on Jun 16, 2011, 11:00:39 AM