Commit 43e28b81 authored by Jerome Mariette

add pyrocleaner documentation

parent 58f42d9c
From the <a href=''>pyrocleaner site</a>:
Pyrocleaner is intended to clean the reads contained in an sff file in order to ease the assembly process.
It filters sequences on several criteria: length, complexity, number of undetermined bases
(which has been shown to correlate with poor quality) and multiple-copy reads. It can also clean paired-end
sff files, generating on one side an sff file with the validated paired ends and on the other the sequences that can be
used as shotgun reads.
Mariette J, Noirot C, Klopp C. Assessment of replicate bias in 454 pyrosequencing and a multi-purpose read-filtering tool. <a href=''>BMC Research Notes 2011, 4:149.</a>
<li>input file: This program accepts sff/fasta/fastq files as input.</li>
<li>qual file list: If a fasta file is used as input, a qual file can be provided (it has to be named input_file.qual or input_file.fasta.qual).</li>
<li>ng6cfg file: The NG6 run config file the analysis belongs to.</li>
<li>project id: The NG6 project id the analysis belongs to.</li>
Outputs a cleaned file in the same format as the input and a log file.
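The qual file naming rule above can be illustrated with a minimal Python sketch. The function name find_qual_file is hypothetical and is not part of pyrocleaner. It also assumes that "input_file" means either the fasta name without its extension or the full fasta name, and that the first form is checked first.

```python
from pathlib import Path
from typing import Optional


def find_qual_file(fasta_path: str) -> Optional[Path]:
    """Return the qual file that accompanies a fasta file, or None.

    Hypothetical helper: the lookup order is an assumption, not
    pyrocleaner's documented behaviour.
    """
    fasta = Path(fasta_path)
    candidates = [
        fasta.with_suffix(".qual"),             # e.g. reads.qual
        fasta.with_name(fasta.name + ".qual"),  # e.g. reads.fasta.qual
    ]
    for candidate in candidates:
        if candidate.is_file():
            return candidate
    return None
```

For an input named reads.fasta, this sketch looks for reads.qual and then reads.fasta.qual in the same directory.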
<li>cleaning options, choose one or more between:
<li>--clean-length-std, Filter reads shorter than the mean minus x standard deviations and reads longer than the mean plus x standard deviations</li>
<li>--clean-length-win, Filter reads with a length in between [x:y]</li>
<li>--clean-ns, Filter reads with too many Ns</li>
<li>--clean-duplicated-reads, Filter duplicated reads</li>
<li>--clean-complexity-win, Filter low complexity reads computed on a sliding window</li>
<li>--clean-complexity-full, Filter low complexity reads computed on the whole sequence</li>
<li>--clean-quality, Filter low quality reads</li>
<li>recursion limit: Recursion limit when computing duplicated reads,</li>
<li>border limit: Minimal length between the spacer and the read extremity (used with --clean-pairends option),</li>
<li>missmatch: Maximum number of mismatched nucleotides (used with --clean-pairends option),</li>
<li>std: Number of standard deviations to use (used with --clean-length-std option),</li>
<li>min length: Minimal length (used with --clean-length-win option),</li>
<li>max length: Maximal length (used with --clean-length-win option),</li>
<li>ns percent: Percentage of N to use to filter reads (used with --clean-ns option),</li>
<li>duplication limit: Limit size difference (used with --clean-duplicated-reads),</li>
<li>window: The window size (used with --clean-complexity-win),</li>
<li>step: The window step (used with --clean-complexity-win),</li>
<li>complexity: Minimal complexity/length ratio (used with --clean-complexity-win and --clean-complexity-full),</li>
<li>others: Can be: --aggressive, --acpus,</li>
<li>analyse name: The analysis name to display in NG6,</li>
<li>analyse description: The analysis description to display in NG6,</li>
<li>results archive name: The results archive name to display in NG6.</li>
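As an illustration of how the std and ns percent parameters could drive the --clean-length-std and --clean-ns filters, here is a minimal Python sketch. It is not pyrocleaner's implementation: the function names, the default values, the use of the population standard deviation and the inclusive thresholds are all assumptions made for the example.

```python
from statistics import mean, pstdev
from typing import Dict


def filter_by_length_std(reads: Dict[str, str], std: float = 2.0) -> Dict[str, str]:
    """Keep reads whose length lies within mean +/- std * standard deviation.

    Assumption: population standard deviation, inclusive bounds.
    """
    if not reads:
        return {}
    lengths = [len(seq) for seq in reads.values()]
    mu = mean(lengths)
    sigma = pstdev(lengths)
    low, high = mu - std * sigma, mu + std * sigma
    return {name: seq for name, seq in reads.items() if low <= len(seq) <= high}


def filter_by_ns(reads: Dict[str, str], ns_percent: float = 4.0) -> Dict[str, str]:
    """Keep reads whose percentage of N bases does not exceed ns_percent.

    Assumption: empty reads are discarded, and the threshold is inclusive.
    """
    kept = {}
    for name, seq in reads.items():
        if not seq:
            continue
        n_pct = 100.0 * seq.upper().count("N") / len(seq)
        if n_pct <= ns_percent:
            kept[name] = seq
    return kept
```

Both functions take a mapping from read name to sequence and return the reads that pass, so they can be chained in the same way several cleaning options can be combined.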
\ No newline at end of file