Readme file

Commit 4050b89b authored 4 years ago by Margot Zahm
--- a/README.md
+++ b/README.md
+# nf-benchgapcloser
+Pipeline to benchmark gapclosing tools. It compares:
+* behavior for gaps of known and unknown length
+* expected length
+* identity of filled gaps
+* running time
+* max memory used
+It can generate random reads and sequences or take given data as input.
+
+## Quick Start
+1. Install [`Nextflow`](https://www.nextflow.io/)
+2. Install [`Singularity`](https://sylabs.io/guides/3.6/user-guide/) for full pipeline reproducibility
+3. Generate the singularity image
+```
+cd benchgapcloser
+singularity build Singularity.img Singularityfile
+```
+4. Run the process on a small dataset
+```
+```
+5. Run the process with your data
+```
+nextflow run benchgapcloser/main.nf --all_gapcloser [--assembly 'assembly.fa' --reads 'reads.fq' --reads_coord 'reads_pos.bed']
+```
+
+## Usage
+```
+nextflow run main.nf [options] --all_gapcloser [--assembly 'assembly.fa' --reads 'reads.fq' --reads_coord 'reads_pos.bed']
+    Mandatory argument:
+      --all_gapcloser           Runs all gapclosing tools for benchmark (GMcloser, LR_Gapcloser and TGS-GapCloser)
+        or 
+      --GM_gapcloser            Runs GMcloser only
+        or 
+      --LR_gapcloser            Runs LR_Gapcloser only
+        or
+      --TGS_gapcloser           Runs TGS-GapCloser only
+
+    Scaffold options:
+      --assembly [file]         Path to a fasta file containing a single sequence without gaps. If not specified, a random sequence is generated.
+      --scaffold_length [int]   Length of randomly generated sequence (default: 30Mb).
+      --contig_length [str]     Contig length distribution (mean and stdev, default: '300000 50000')
+      --gap_length [str]        Gap length distribution (mean and stdev, default: '20000 5000')
+
+    Reads options:
+      --reads [file]            Path to fastq file of reads. If not specified, reads are generated using BadReads.
+      --reads_coord [file]      Path to a bed file of read coordinates on the assembly. A fourth column with the read ID is required.
+                                Mandatory when the --reads option is specified.
+      --quantity [str]          Read depth to generate (default: '50x')
+      --length [str]            Fragment length distribution (mean and stdev, default: '15000,13000')
+      --identity [str]          Sequencing identity distribution (mean, max and stdev, default: '100,100,0')
+      --error_model [str]       Can be "nanopore", "pacbio", "random" or a model filename (default: 'random')
+      --qscore_model [str]      Can be "nanopore", "pacbio", "random", "ideal" or a model filename (default: 'random')
+      --glitches [str]          Read glitch parameters (rate, size and skip, default: '0,0,0') [more info](https://github.com/rrwick/Badread#glitches)
+      --junk_reads [int]        This percentage of reads will be low-complexity junk (default: 0) [more info](https://github.com/rrwick/Badread#junk-and-random-reads)
+      --random_reads [int]      This percentage of reads will be random sequence (default: 0) [more info](https://github.com/rrwick/Badread#junk-and-random-reads)
+      --chimeras [int]          Percentage at which separate fragments join together (default: 0) [more info](https://github.com/rrwick/Badread#chimeras)
+      --start_adapter_seq [str] Adapter sequence for read starts (default: '')
+      --end_adapter_seq [str]   Adapter sequence for read ends (default: '')
+
+    General:
+      --seed [int]              Random number generator seed for deterministic output (default: different output each time)
+      --outdir [str]            Output directory (default: './results/')
+```
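+
+As a sketch of an alternative to a long command line, the same options can be kept in a parameters file and passed through Nextflow's generic `-params-file` option. This assumes the pipeline reads the options above as ordinary Nextflow `params`; the file name `params.yaml` and the seed value are illustrative and not part of the pipeline.
+```
+# params.yaml (hypothetical file name)
+# Run with: nextflow run main.nf -params-file params.yaml
+all_gapcloser: true            # same effect as passing --all_gapcloser
+assembly: 'assembly.fa'
+reads: 'reads.fq'
+reads_coord: 'reads_pos.bed'
+seed: 42                       # arbitrary value, chosen only for illustration
+outdir: './results/'
+```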
+
+## Input files
+The only required parameter is one of `--all_gapcloser`, `--GM_gapcloser`, `--LR_gapcloser` or `--TGS_gapcloser`. It selects the gapcloser tool(s) to run. With no other options, the pipeline generates a random sequence and corresponding random reads.
+
+To run the pipeline on your own sequence, use the `--assembly` parameter. The assembly must be a fasta file containing a single sequence without gaps. If you have a multi-fasta file, split it and run the pipeline on each sequence.
+
+To run the pipeline on your own reads, use the `--reads` parameter. Reads cannot be specified without the associated assembly. You must also give the coordinates of the reads on your assembly in BED format with the `--reads_coord` option.
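+
+For illustration, a read coordinates file could look like the sketch below: tab-separated BED columns for sequence name, start (0-based), end and read ID. The sequence name, coordinates and read IDs are made up. It is also an assumption that the first column must match the sequence name in your fasta file and that the fourth column must match the read names in your fastq file.
+```
+scaffold_1	0	14820	read_0001
+scaffold_1	9500	27310	read_0002
+scaffold_1	21000	33950	read_0003
+```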
+
+## Output files
+Output files are stored in the output directory specified by the `--outdir` option (default: `./results`). It contains:
+* `report.html`: An HTML report showing the efficiency of each gapcloser.
+* `pipeline_trace.txt`: A table of each process run by Nextflow, with information such as memory used and running time.
+* `data/`: A directory with the CSV files used to generate the report, the assembly gap-closed by each gapcloser, and the sequence and reads generated by or given to the pipeline.
+* `images/`: A directory with images of each gap and the reads mapped on these regions. There is one subdirectory per gapcloser tool.
+* `plots/`: A directory with the plots generated for the report.
+* `plots/`: A directory with plots generated for the report.
+
+## Dependencies
+If you do not use the Singularity image, install the following before running the workflow.
+
+### Gapclosing tools
+* [GMcloser](https://sourceforge.net/projects/gmcloser/)
+* [LR_Gapcloser](https://github.com/CAFS-bioinformatics/LR_Gapcloser)
+* [TGS-GapCloser](https://github.com/BGI-Qingdao/TGS-GapCloser)
+
+Careful: These tools have dependencies of their own that are not listed in this Dependencies section. Please check their requirements when you install them.
+
+### Other tools
+* [badread](https://github.com/rrwick/Badread)
+* [bedtools](https://bedtools.readthedocs.io)
+* [blat](https://github.com/djhshih/blat)
+* [bowtie2](http://bowtie-bio.sourceforge.net/bowtie2)
+* [samtools](http://www.htslib.org/download/)
+
+### Python modules
+* [biopython](https://biopython.org/)
+* [cython](https://cython.org/)
+* [GenomeView](https://github.com/nspies/genomeview)
+* [numpy](https://numpy.org/)
+* [pysam](https://pysam.readthedocs.io/en/latest/index.html)
+* [pytz](http://pytz.sourceforge.net/)
+* [scipy](https://www.scipy.org/)
+
+### R libraries
+* [ggplot2](https://rdrr.io/cran/ggplot2/)
+* [ggpubr](https://rdrr.io/cran/ggpubr/)
+* [rmarkdown](https://rdrr.io/cran/rmarkdown/)
\ No newline at end of file