From ecbe823af0d7e9a417a6523509e81b41797fe8f2 Mon Sep 17 00:00:00 2001
From: Thomas Faraut <Thomas.Faraut@inra.fr>
Date: Fri, 5 Jul 2019 17:12:34 +0200
Subject: [PATCH] new readme

---
 README.md | 48 +++++++++++++++++++-----------------------------
 1 file changed, 19 insertions(+), 29 deletions(-)

diff --git a/README.md b/README.md
index 693020c..6443c71 100644
--- a/README.md
+++ b/README.md
@@ -23,7 +23,7 @@ and additional third party softwares.
 Clone this repository:
 
     git clone --recursive https://forgemia.inra.fr/genotoul-bioinfo/cnvpipelines.git
-
+
 Then, copy `application.properties.example` into `application.properties`. Configuration will be edited in the next step.
 
 Third party softwares can best be installed using conda.
@@ -42,16 +42,6 @@ export PERL5LIB="$CONDA_HOME/envs/cnv/lib/perl5"
 
 ### 6. Additional softwares to install
 
-You must install pysamstats from the master branch of their repository to have a compatibility to pysam 0.14 (required by other components of the pipeline):
-
-    pip install git+https://github.com/alimanfoo/pysamstats.git
-
-For svtyper, you need to have the parallel python3-recoded version. For now (awaiting pull request), install it like this:
-
-    pip install git+https://github.com/florealcab/svtyper.git
-
-You must install [genomestrip](http://software.broadinstitute.org/software/genomestrip/) and [lumpy](https://github.com/arq5x/lumpy-sv) using their install procedure.
-
 You also need to install RepBase (http://www.girinst.org/server/archive/RepBase21.12/ - choose this version, as more recent ones are not compatible with version 4.0.6 of RepeatMasker). Download the RepBase-derived RepeatMasker libraries (repeatmaskerlibraries-20160829.tar.gz) and uncompress the archive in your save folder. It will create a Library folder. Then define the path to the Library folder inside the application.properties file (see below).
 
 If you run simulations, you need additional python modules: matplotlib and seaborn. Once you have loaded your conda environment, install them as follows:
@@ -60,7 +50,7 @@ If you run simulation, you need additional python modules: matplotlib and seabor
 
 Special case of genologin cluster (genotoul):
 
-* Lumpy is already available through bioinfo/lumpy-v0.2.13. Just add it in the application.properties file. 
+* Lumpy is already available through bioinfo/lumpy-v0.2.13. Just add it in the application.properties file.
 * For genomestrip, you can use this folder: `/usr/local/bioinfo/src/GenomeSTRiP/svtoolkit_2.00.1774` (see configuration part, sv_dir point)
 
 ### 7. Future logins
@@ -70,7 +60,7 @@ For future logins, you must reactivate all conda environments. This means launch
 
     export PATH=$CONDA_HOME/bin:$PATH
     source activate cnv
     export PERL5LIB="$CONDA_HOME/envs/cnv/lib/perl5"
-
+
 Where `$CONDA_HOME` is the folder in which you installed miniconda in the previous step.
 
@@ -80,7 +70,7 @@ To do simulations, you need to compile pirs, which is included as submodule of y
 
     make
-
+
 
 ## Configuration
 
 Configuration should be edited in the `application.properties` file. Sections and parameters are described below.
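Before filling in the configuration sections that follow, make sure you are editing your own copy of the properties file, as described in the installation section above. A minimal sketch, assuming you are at the root of the cloned cnvpipelines repository:

    # make a private copy of the template shipped with the repository
    cp application.properties.example application.properties
    # then edit application.properties with the parameters described below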
@@ -117,7 +107,7 @@ This section must be filled only if you don't use local as batch system type (se
 ##### Command
 
     ./cnvpipelines.py run refbundle -r {fasta} -s {species} -w {working_dir}
-
+
 With:
 `fasta`: the path of the reference fasta file
 `species`: species name, according to the [NCBI Taxonomy database](http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html)
@@ -142,15 +132,15 @@ With:
 ##### Command
 
     ./cnvpipelines.py run align -r {fasta} -s {samples} -w {working_dir}
-
+
 With:
 `fasta`: the path of the reference fasta file
 `samples`: a YAML file describing, for each sample, its name and fastq files (reads1, and optionally reads2). Example:
-
+
     Sample_1:
        reads1: /path/to/reads_1.fq.gz
        reads2: /path/to/reads_2.fq.gz
-
+
     Sample_2:
        reads: /path/to/reads.fq.gz
 Where `Sample_1` and `Sample_2` are the sample names.
@@ -158,7 +148,7 @@ Where `Sample_1` and `Sample_2` are the sample names.
 `working_dir`: the folder in which to store data
 
 ##### Optional arguments
-
+
 `-p`: for each rule, show the shell command run.
 `-n`: dry run: show which rules will be launched without running anything.
 `--keep-wdir`: in dry run mode, don't remove the working dir after launch
@@ -174,7 +164,7 @@ Where `Sample_1` and `Sample_2` are the sample names.
 ##### Command
 
     ./cnvpipelines.py run detection -r {fasta} -s {samples} -w {working_dir} -t {tools}
-
+
 With:
 `fasta`: the path to the fasta file (with all files of the reference bundle in the same folder).
 `samples`: a file with, on each line, the path to a bam file to analyse.
@@ -182,7 +172,7 @@ With:
 `working_dir`: the folder in which to store data.
 `tools`: list of tools, space separated. Choose among genomestrip, delly, lumpy and pindel.
 ##### Optional arguments
-
+
 `-b INT`: size of batches (default: -1, to always make only 1 batch)
 `--chromosomes CHRS`: list of chromosomes to study, space separated. Regex accepted (using the [python syntax](https://docs.python.org/3/library/re.html#regular-expression-syntax)). Default: all valid chromosomes of the reference
 `--force-all-chromosomes`: ignore filtering if `--chromosomes` is not set
@@ -196,13 +186,13 @@ With:
 `--out-step STEP`: specify the output rule file to only run the workflow until the associated rule (run the whole workflow if not specified)
 `--cluster-config FILE`: path to a cluster config file (replaces the cluster config present in the configuration)
-
+
 
 #### Merge batches
 
 ##### Command
 
     ./cnvpipelines.py run mergebatches -w {working_dir}
-
+
 With:
 `working_dir`: the detection run output folder. Must have at least 2 batches to run correctly for now.
 
@@ -224,7 +214,7 @@ We integrate to cnvpipelines popsim, our tool to simulate a population with vari
 ##### Command
 
     ./cnvpipelines.py run simulation -nb {nb_inds} -r {reference} -sp {species} -t {tools} -w {working_dir}
-
+
 With:
 `nb_inds`: number of individuals to generate.
 `reference`: fasta file to use as reference for the simulated individuals.
@@ -233,7 +223,7 @@ With:
 `working_dir`: the folder in which to store data.
 
 ##### Description of variants
-
+
 `-s {svlist}`: a file describing the size distributions of variants. If not given, a default distribution is used.
 Structure of the file (tab separated columns):
 >DEL minLength maxLength proba -> Create DELetion(s).
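To make the svlist format above concrete, here is a minimal sketch that writes such a file; the numeric values are invented for illustration, only the DEL type shown above is used, and it assumes the leading `>` in the description is display formatting only, with one line per size range of a given type. The file name `svlist.tsv` is a placeholder:

    # build a tab-separated svlist file (type, minLength, maxLength, proba)
    printf 'DEL\t100\t1000\t0.7\nDEL\t1000\t10000\t0.3\n' > svlist.tsv
    # pass it to the simulation command with: -s svlist.tsv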
@@ -297,7 +287,7 @@ Example:
 ##### Command
 
     ./cnvpipelines.py rerun -w {working_dir}
-
+
 With:
 `working_dir`: the folder where data is stored
 
@@ -319,7 +309,7 @@ With:
 #### Full clean
 
     ./cnvpipelines.py clean -w {working_dir}
-
+
 With:
 `working_dir`: the folder where data is stored
 
@@ -330,7 +320,7 @@ With:
 #### Soft clean
 
     ./cnvpipelines.py soft-clean -w {working_dir}
-
+
 With:
 `working_dir`: the folder where data is stored
 
@@ -343,7 +333,7 @@ With:
 If snakemake crashes or was killed, the workflow may still be locked. You can unlock it by:
 
     ./cnvpipelines.py unlock -w {working_dir}
-
+
 With:
 `working_dir`: the folder where data is stored
 
--
GitLab
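As a usage note, the unlock and rerun commands described above can be chained to recover from an interrupted run. A minimal sketch, with `/work/my_project` standing in for your actual working directory:

    # release the snakemake lock left by a crashed or killed run
    ./cnvpipelines.py unlock -w /work/my_project
    # then resume the workflow in the same working directory
    ./cnvpipelines.py rerun -w /work/my_project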