You must install pysamstats from the master branch of its repository to be compatible with pysam 0.14 (required by other components of the pipeline).
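For example, a minimal sketch, assuming pip and git are available in the active conda environment and that the upstream repository is the one hosted on GitHub:

```
# install pysamstats directly from the master branch (repository URL assumed)
pip install git+https://github.com/alimanfoo/pysamstats.git
```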
You must install [genomestrip](http://software.broadinstitute.org/software/genomestrip/) and [lumpy](https://github.com/arq5x/lumpy-sv) following their respective installation procedures.
You also need to install RepBase (http://www.girinst.org/server/archive/RepBase21.12/ - choose this version, as more recent ones are not compatible with version 4.0.6 of RepeatMasker). Download the RepBase-derived RepeatMasker libraries (repeatmaskerlibraries-20160829.tar.gz) and uncompress the archive in the folder where you want to keep them. This will create a Library folder. Then define the path to this Library folder in the application.properties file (see below).
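For example (paths are placeholders; adapt them to the folder where you keep the downloaded archive):

```
# uncompress the RepeatMasker libraries archive; this creates the Library folder
cd /path/to/save_folder
tar -xzf repeatmaskerlibraries-20160829.tar.gz
# /path/to/save_folder/Library is then the path to declare in application.properties
```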
If you run simulations, you need additional Python modules: matplotlib and seaborn. Once your conda environment is loaded, install them as follows:
...
...
@@ -60,7 +50,7 @@ If you run simulation, you need additional python modules: matplotlib and seabor
Special case of genologin cluster (genotoul):
* Lumpy is already available through the bioinfo/lumpy-v0.2.13 module. Just add it to the application.properties file.
* For genomestrip, you can use this folder: `/usr/local/bioinfo/src/GenomeSTRiP/svtoolkit_2.00.1774` (see the configuration section, `sv_dir` entry)
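For illustration only, the corresponding entries in `application.properties` could look like the sketch below; `sv_dir` is the key mentioned above, while the key used to declare the Lumpy module is an assumption (check the configuration section for the exact names):

```
# illustrative excerpt of application.properties (key names to be checked against the configuration section)
sv_dir = /usr/local/bioinfo/src/GenomeSTRiP/svtoolkit_2.00.1774
lumpy = bioinfo/lumpy-v0.2.13
```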
### 7. Future logins
...
...
@@ -70,7 +60,7 @@ For future logins, you must reactivate all conda environments. This means launch
export PATH=$CONDA_HOME/bin:$PATH
source activate cnv
export PERL5LIB="$CONDA_HOME/envs/cnv/lib/perl5"
Where `$CONDA_HOME` is the folder in which you installed miniconda in the previous step.
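For example, if miniconda was installed in your home directory (the exact path below is an assumption; adapt it to your install location):

```
export CONDA_HOME="$HOME/miniconda3"
```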
...
...
@@ -80,7 +70,7 @@ To do simulations, you need to compile pirs, which is included as submodule of y
make
## Configuration
Configuration should be edited in the `application.properties` file. Sections and parameters are described below.
...
...
@@ -117,7 +107,7 @@ This section must be filled only if you don't use local as batch system type (se
##### Command
./cnvpipelines.py run refbundle -r {fasta} -s {species} -w {working_dir}
With:
`fasta`: the path to the reference fasta file
`species`: species name, according to the [NCBI Taxonomy database](http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html)
...
...
@@ -142,15 +132,15 @@ With:
##### Command
./cnvpipelines.py run align -r {fasta} -s {samples} -w {working_dir}
With:
`fasta`: the path to the reference fasta file
`samples`: a YAML file describing, for each sample, its name and fastq files (reads1, and optionally reads2). Example:
Sample_1:
    reads1: /path/to/reads_1.fq.gz
    reads2: /path/to/reads_2.fq.gz
Sample_2:
    reads: /path/to/reads.fq.gz
Where `Sample_1` and `Sample_2` are the sample names.
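For example, assuming the file above is saved as `samples.yml` and the reference fasta is `reference.fa` (all paths here are placeholders), the command becomes:

```
./cnvpipelines.py run align -r reference.fa -s samples.yml -w align_wd
```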
...
...
@@ -158,7 +148,7 @@ Where `Sample_1` and `Sample_2` are samples name.
`working_dir`: the folder in which to store data
##### Optional arguments
`-p`: for each rule, show the shell command being run.
`-n`: dry run: show which rules would be launched without running anything.
`--keep-wdir`: in dry run mode, don't remove the working dir after launch
...
...
@@ -174,7 +164,7 @@ Where `Sample_1` and `Sample_2` are samples name.
`fasta`: the path to the fasta file (with all files of the reference bundle in the same folder).
`samples`: a file listing, one per line, the path to each bam file to analyse.
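For example (paths are placeholders):

```
/path/to/sample1.bam
/path/to/sample2.bam
```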
...
...
@@ -182,7 +172,7 @@ With:
`tools`: list of tools, space separated. Choose among genomestrip, delly, lumpy and pindel.
##### Optional arguments
`-b INT`: size of batches (default: -1, which always makes a single batch)
`--chromosomes CHRS`: list of chromosomes to study, space separated. Regex accepted (using the [python syntax](https://docs.python.org/3/library/re.html#regular-expression-syntax)). Default: all valid chromosomes of the reference
`--force-all-chromosomes`: ignore filtering if `--chromosomes` is not set
...
...
@@ -196,13 +186,13 @@ With:
`--out-step STEP`: specify the output rule file to run the workflow only up to the associated rule (the whole workflow is run if not specified)
`--cluster-config FILE`: path to a cluster config file (overrides the cluster config present in the configuration)
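The expected content of the cluster config file depends on your scheduler; the sketch below is only an illustration of a Snakemake-style cluster config, where `__default__` holds fallback resources and individual rules can override them. All key and rule names here are assumptions, not taken from the pipeline:

```
__default__:          # fallback resources applied to every rule (keys are assumed)
    mem: "8G"
    time: "24:00:00"
delly:                # hypothetical per-rule override for a heavier step
    mem: "16G"
```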
#### Merge batches
##### Command
./cnvpipelines.py run mergebatches -w {working_dir}
With:
`working_dir`: the output folder of the detection run. For now, it must contain at least 2 batches to run correctly.
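For example, pointing to the working directory of a previous detection run (the path is a placeholder):

```
./cnvpipelines.py run mergebatches -w /path/to/detection_wd
```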
...
...
@@ -224,7 +214,7 @@ We integrate to cnvpipelines popsim, our tool to simulate a population with vari