Scata HowTo
Register and log in
Before you can use SCATA, you need to register with your email address and log in to the system. An email address is required because some actions take a while to perform, and the system emails the results to you once they are done. (SLU users do not need to register; they can log in with their AD username and password.)
Prepare your data
Before you begin to upload your data to run an analysis, make sure you have prepared all files you need:
- Dataset consisting of one or more Fasta files with the sequences and, optionally, the corresponding Fasta-formatted quality files with quality data for your sequences. The sequences and quality values must appear in the same order in the two files. (See DataSet)
- A file with all tags you have used and want to include in the analysis, formatted as a semicolon-separated list. (See TagSet)
- Optionally, a file with the reference sequences you want to include in the analysis in order to identify clusters, where possible. (See ReferenceSet)
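The requirement that sequences and quality values appear in the same order can be checked before uploading. The following is a minimal sketch, not part of SCATA: it assumes that both files use ">" header lines and that the record identifier is the header text up to the first whitespace. The function names `read_ids` and `first_mismatch` are made up for this example.

```python
def read_ids(lines):
    """Return the record identifiers of a Fasta-style file, in file order."""
    ids = []
    for line in lines:
        if line.startswith(">"):
            header = line[1:].strip()
            # Assumption: the identifier is the header up to the first whitespace.
            ids.append(header.split()[0] if header else "")
    return ids


def first_mismatch(fasta_lines, qual_lines):
    """Return the index of the first record whose IDs differ, or None if all match."""
    fasta_ids = read_ids(fasta_lines)
    qual_ids = read_ids(qual_lines)
    for index, (fasta_id, qual_id) in enumerate(zip(fasta_ids, qual_ids)):
        if fasta_id != qual_id:
            return index
    # Same IDs so far, but one file has extra records.
    if len(fasta_ids) != len(qual_ids):
        return min(len(fasta_ids), len(qual_ids))
    return None
```

Both functions accept any iterable of lines, so open file objects can be passed directly, for example `first_mismatch(open("reads.fas"), open("reads.qual"))` with your own file names. A return value of `None` means the two files list the same records in the same order.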
Upload your data
Go to the different sections of SCATA and upload your data files. Once they have been checked, you will get emails with the results of the checks, and the files will be available for analysis.
Create a parameter set
While your data files are being processed, you can create a parameter set for use in your analysis. Several parameters can be altered, but it usually suffices to check that the following parameters have the desired values. The defaults are sensible in most cases.
- Clustering distance. Adjust if you want more or less stringent clustering.
- Minimum alignment
- Homopolymer collapsing
Submit your job!
Once all your uploads have been verified, go to the Jobs tab and create a new job. Select the dataset(s), reference set(s), tag set and parameter set you want to use, and submit the job. Once the job is done, you will get an email telling you that the results are available for download as a zip file.
Results
The result file contains a number of files. All .txt files are easy to import into Excel as semicolon-separated tables.
all_clusters_runID.txt
contains a general summary of the run as well as all clusters from the run, including identifications where possible. The corresponding .fas-file contains the consensus sequences of the clusters.
all_tag_by_cluster_runID.txt
is a semicolon-separated data matrix for all tags and all clusters. Values are normalised abundances of each cluster for each tag (rows sum to 1).
all_tags_runID.txt
contains summaries for all tags in the experiment.
- Directories with summaries per tag, as well as cluster alignments.
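Besides Excel, the tag-by-cluster matrix can be read with a short script. The sketch below is an illustration under assumptions that this page does not confirm: the first row is taken to be a header with cluster names, and each following row a tag name followed by its abundances. Check this against your own file before relying on it. The function names are made up for this example.

```python
import csv
import io


def load_tag_by_cluster(handle):
    """Parse a semicolon-separated matrix into {tag: {cluster: abundance}}.

    Assumed layout: a header row whose first cell labels the tag column and
    whose remaining cells are cluster names, then one row per tag.
    """
    reader = csv.reader(handle, delimiter=";")
    header = next(reader)
    clusters = header[1:]
    matrix = {}
    for row in reader:
        if not row:
            continue  # skip blank lines
        tag, values = row[0], row[1:]
        matrix[tag] = {cluster: float(value) for cluster, value in zip(clusters, values)}
    return matrix


def rows_not_summing_to_one(matrix, tolerance=1e-6):
    """Return the tags whose normalised abundances do not sum to 1."""
    return [tag for tag, row in matrix.items()
            if abs(sum(row.values()) - 1.0) > tolerance]


# Small made-up example in place of a real all_tag_by_cluster_runID.txt file.
example = io.StringIO("Tag;cluster_1;cluster_2\nA;0.25;0.75\nB;0.5;0.5\n")
matrix = load_tag_by_cluster(example)
print(rows_not_summing_to_one(matrix))
```

For a real result file, pass an open file handle instead of the `io.StringIO` example. A non-empty list from `rows_not_summing_to_one` suggests that the assumed layout does not match the file.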
Further parameters to check when creating a parameter set:
- Read quality parameters (read length, various quality cutoffs). The defaults should work fine, although you might want to check the minimum length parameter and adjust it to be slightly below the shortest read you expect to find in your sample.
- Primer sequence and similarity cutoff. Make sure the sequence matches the primer you have used!
- Primer trimming, which cuts off the invariable part of your amplicons that comes from the primer.
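To help choose the minimum length parameter, it can be useful to look at the read lengths actually present in a Fasta file. The following is a minimal sketch, not part of SCATA, with function names made up for this example. It only reports the lengths found in the file; deciding which reads are real amplicons and which are too short to keep remains a judgement call.

```python
def read_lengths(lines):
    """Return the sequence length of every record in a Fasta file, in file order."""
    lengths = []
    current = None  # length of the record being read; None before the first header
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if current is not None:
                lengths.append(current)
            current = 0
        elif current is not None:
            # Sequences may be wrapped over several lines.
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def shortest_reads(lengths, n=5):
    """Return the n smallest read lengths, in ascending order."""
    return sorted(lengths)[:n]
```

For example, `shortest_reads(read_lengths(open("reads.fas")), 10)` lists the ten shortest read lengths in a file of your own.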