From ecbe823af0d7e9a417a6523509e81b41797fe8f2 Mon Sep 17 00:00:00 2001
From: Thomas Faraut <Thomas.Faraut@inra.fr>
Date: Fri, 5 Jul 2019 17:12:34 +0200
Subject: [PATCH] new readme

---
 README.md | 48 +++++++++++++++++++-----------------------------
 1 file changed, 19 insertions(+), 29 deletions(-)

diff --git a/README.md b/README.md
index 693020c..6443c71 100644
--- a/README.md
+++ b/README.md
@@ -23,7 +23,7 @@ and additional third party softwares.
 Clone this repository:
 
     git clone --recursive https://forgemia.inra.fr/genotoul-bioinfo/cnvpipelines.git
-
+
 Then, copy `application.properties.example` into `application.properties`. Configuration will be edited in the next step.
 
 Third party softwares can best be installed using conda.
@@ -42,16 +42,6 @@ export PERL5LIB="$CONDA_HOME/envs/cnv/lib/perl5"
 
 ### 6. Additional softwares to install
 
-You must install pysamstats from the master branch of their repository to have a compatibility to pysam 0.14 (required by other components of the pipeline):
-
-    pip install git+https://github.com/alimanfoo/pysamstats.git
-
-For svtyper, you need to have the parallel python3-recoded version. For now (awaiting pull request), install it like this:
-
-    pip install git+https://github.com/florealcab/svtyper.git
-
-You must install [genomestrip](http://software.broadinstitute.org/software/genomestrip/) and [lumpy](https://github.com/arq5x/lumpy-sv) using their install procedure.
-
 You also need to install RepBase (http://www.girinst.org/server/archive/RepBase21.12/ - choose this version, as more recent ones are not compatible with version 4.0.6 of RepeatMasker). Download the RepBase-derived RepeatMasker libraries (repeatmaskerlibraries-20160829.tar.gz) and uncompress the archive in your save folder. It will create a Library folder. Then define the path to the Library folder inside the application.properties file (see below).
 
 If you run simulations, you need additional python modules: matplotlib and seaborn. Once you have loaded your conda environment, install them as follows:
@@ -60,7 +50,7 @@ If you run simulation, you need additional python modules: matplotlib and seabor
 
 Special case of genologin cluster (genotoul):
 
-* Lumpy is already available through bioinfo/lumpy-v0.2.13. Just add it in the application.properties file. 
+* Lumpy is already available through bioinfo/lumpy-v0.2.13. Just add it in the application.properties file.
 * For genomestrip, you can use this folder: `/usr/local/bioinfo/src/GenomeSTRiP/svtoolkit_2.00.1774` (see configuration part, sv_dir point)
 
 ### 7. Future logins
@@ -70,7 +60,7 @@ For future logins, you must reactivate all conda environments. This means launch
 
     export PATH=$CONDA_HOME/bin:$PATH
     source activate cnv
     export PERL5LIB="$CONDA_HOME/envs/cnv/lib/perl5"
-
+
 Where `$CONDA_HOME` is the folder in which you installed miniconda in the previous step.
 
@@ -80,7 +70,7 @@ To do simulations, you need to compile pirs, which is included as submodule of y
 
     make
-
+
 
 ## Configuration
 
 Configuration should be edited in the `application.properties` file. Sections and parameters are described below.
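Before filling in the configuration sections that follow, make sure you are editing your own copy of the properties file, as described in the installation section above. A minimal sketch, assuming you are at the root of the cloned cnvpipelines repository:

    # make a private copy of the template shipped with the repository
    cp application.properties.example application.properties
    # then edit application.properties with the parameters described below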
@@ -117,7 +107,7 @@ This section must be filled only if you don't use local as batch system type (se
 ##### Command
 
     ./cnvpipelines.py run refbundle -r {fasta} -s {species} -w {working_dir}
-
+
 With:
 `fasta`: the path of the reference fasta file
 `species`: species name, according to the [NCBI Taxonomy database](http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html)
@@ -142,15 +132,15 @@ With:
 ##### Command
 
     ./cnvpipelines.py run align -r {fasta} -s {samples} -w {working_dir}
-
+
 With:
 `fasta`: the path of the reference fasta file
 `samples`: a YAML file describing, for each sample, its name and fastq files (reads1, and optionally reads2). Example:
-
+
     Sample_1:
        reads1: /path/to/reads_1.fq.gz
        reads2: /path/to/reads_2.fq.gz
-
+
     Sample_2:
        reads: /path/to/reads.fq.gz
 Where `Sample_1` and `Sample_2` are the sample names.
@@ -158,7 +148,7 @@ Where `Sample_1` and `Sample_2` are the sample names.
 `working_dir`: the folder in which to store data
 
 ##### Optional arguments
-
+
 `-p`: for each rule, show the shell command run.
 `-n`: dry run: show which rules will be launched without running anything.
 `--keep-wdir`: in dry run mode, don't remove the working dir after launch
@@ -174,7 +164,7 @@ Where `Sample_1` and `Sample_2` are the sample names.
 ##### Command
 
     ./cnvpipelines.py run detection -r {fasta} -s {samples} -w {working_dir} -t {tools}
-
+
 With:
 `fasta`: the path to the fasta file (with all files of the reference bundle in the same folder).
 `samples`: a file with, on each line, the path to a bam file to analyse.
@@ -182,7 +172,7 @@ With:
 `working_dir`: the folder in which to store data.
 `tools`: list of tools, space separated. Choose among genomestrip, delly, lumpy and pindel.
 ##### Optional arguments
-
+
 `-b INT`: size of batches (default: -1, to always make only 1 batch)
 `--chromosomes CHRS`: list of chromosomes to study, space separated. Regex accepted (using the [python syntax](https://docs.python.org/3/library/re.html#regular-expression-syntax)). Default: all valid chromosomes of the reference
 `--force-all-chromosomes`: ignore filtering if `--chromosomes` is not set
@@ -196,13 +186,13 @@ With:
 `--out-step STEP`: specify the output rule file to only run the workflow until the associated rule (run the whole workflow if not specified)
 `--cluster-config FILE`: path to a cluster config file (replaces the cluster config present in the configuration)
-
+
 
 #### Merge batches
 
 ##### Command
 
     ./cnvpipelines.py run mergebatches -w {working_dir}
-
+
 With:
 `working_dir`: the detection run output folder. Must have at least 2 batches to run correctly for now.
 
@@ -224,7 +214,7 @@ We integrate to cnvpipelines popsim, our tool to simulate a population with vari
 ##### Command
 
     ./cnvpipelines.py run simulation -nb {nb_inds} -r {reference} -sp {species} -t {tools} -w {working_dir}
-
+
 With:
 `nb_inds`: number of individuals to generate.
 `reference`: fasta file to use as reference for the simulated individuals.
@@ -233,7 +223,7 @@ With:
 `working_dir`: the folder in which to store data.
 
 ##### Description of variants
-
+
 `-s {svlist}`: a file describing the size distributions of variants. If not given, a default distribution is used.
 Structure of the file (tab separated columns):
 >DEL minLength maxLength proba -> Create DELetion(s).
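To make the svlist format above concrete, here is a minimal sketch that writes such a file; the numeric values are invented for illustration, only the DEL type shown above is used, and it assumes the leading `>` in the description is display formatting only, with one line per size range of a given type. The file name `svlist.tsv` is a placeholder:

    # build a tab-separated svlist file (type, minLength, maxLength, proba)
    printf 'DEL\t100\t1000\t0.7\nDEL\t1000\t10000\t0.3\n' > svlist.tsv
    # pass it to the simulation command with: -s svlist.tsv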
@@ -297,7 +287,7 @@ Example:
 ##### Command
 
     ./cnvpipelines.py rerun -w {working_dir}
-
+
 With:
 `working_dir`: the folder where data is stored
 
@@ -319,7 +309,7 @@ With:
 #### Full clean
 
     ./cnvpipelines.py clean -w {working_dir}
-
+
 With:
 `working_dir`: the folder where data is stored
 
@@ -330,7 +320,7 @@ With:
 #### Soft clean
 
     ./cnvpipelines.py soft-clean -w {working_dir}
-
+
 With:
 `working_dir`: the folder where data is stored
 
@@ -343,7 +333,7 @@ With:
 If snakemake crashes or was killed, the workflow may still be locked. You can unlock it by:
 
     ./cnvpipelines.py unlock -w {working_dir}
-
+
 With:
 `working_dir`: the folder where data is stored
 
--
GitLab
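As a usage note, the unlock and rerun commands described above can be chained to recover from an interrupted run. A minimal sketch, with `/work/my_project` standing in for your actual working directory:

    # release the snakemake lock left by a crashed or killed run
    ./cnvpipelines.py unlock -w /work/my_project
    # then resume the workflow in the same working directory
    ./cnvpipelines.py rerun -w /work/my_project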