Commit ecbe823a authored by Thomas Faraut

new readme

parent 4abab084
Clone this repository:
```bash
git clone --recursive https://forgemia.inra.fr/genotoul-bioinfo/cnvpipelines.git
```
Then, copy `application.properties.example` to `application.properties`. The configuration will be edited in the next step.
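For example, from the root of the cloned repository:

```bash
cp application.properties.example application.properties
```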
Third party software can best be installed using conda.
### 6. Additional software to install
You must install pysamstats from the master branch of its repository, for compatibility with pysam 0.14 (required by other components of the pipeline):
```bash
pip install git+https://github.com/alimanfoo/pysamstats.git
```
For svtyper, you need the parallel, Python 3 version. For now (awaiting the pull request), install it like this:
```bash
pip install git+https://github.com/florealcab/svtyper.git
```
You must install [genomestrip](http://software.broadinstitute.org/software/genomestrip/) and [lumpy](https://github.com/arq5x/lumpy-sv) using their own install procedures.
You also need to install RepBase (http://www.girinst.org/server/archive/RepBase21.12/; choose this version, as more recent ones are not compatible with version 4.0.6 of RepeatMasker). Download the RepBase-derived RepeatMasker libraries (repeatmaskerlibraries-20160829.tar.gz) and uncompress the archive in your save folder; this will create a `Library` folder. Then set the path to this `Library` folder in the `application.properties` file (see below).
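For example, assuming the archive was downloaded into your save folder:

```bash
cd /path/to/save_folder
tar -xzf repeatmaskerlibraries-20160829.tar.gz   # creates the Library folder
```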
If you run simulations, you need additional Python modules: matplotlib and seaborn. Once your conda environment is loaded, install them like this:
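The command itself is presumably:

```bash
# assumed form of the command; run inside the activated cnv environment
pip install matplotlib seaborn
```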
Special case of the genologin cluster (genotoul):
* Lumpy is already available through `bioinfo/lumpy-v0.2.13`. Just add it to the `application.properties` file.
* For genomestrip, you can use this folder: `/usr/local/bioinfo/src/GenomeSTRiP/svtoolkit_2.00.1774` (see the configuration section, `sv_dir` entry).
### 7. Future logins
For future logins, you must reactivate all conda environments. This means launching:
```bash
export PATH=$CONDA_HOME/bin:$PATH
source activate cnv
export PERL5LIB="$CONDA_HOME/envs/cnv/lib/perl5"
```
Where `$CONDA_HOME` is the folder in which you installed miniconda in the previous step.
To do simulations, you need to compile pirs, which is included as a submodule of your clone of this repository:
```bash
make
```
## Configuration
Configuration is edited in the `application.properties` file. Sections and parameters are described below.
##### Command
```bash
./cnvpipelines.py run refbundle -r {fasta} -s {species} -w {working_dir}
```
With:
* `fasta`: the path of the reference fasta file
* `species`: the species name, according to the [NCBI Taxonomy database](http://www.ncbi.nlm.nih.gov/Taxonomy/taxonomyhome.html)
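A hypothetical invocation (paths and species name are illustrative):

```bash
./cnvpipelines.py run refbundle -r genome.fa -s "Bos taurus" -w work/refbundle
```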
##### Command
```bash
./cnvpipelines.py run align -r {fasta} -s {samples} -w {working_dir}
```
With:
* `fasta`: the path of the reference fasta file
* `samples`: a YAML file describing, for each sample, its name and fastq files (reads1, and optionally reads2). Example:
```yaml
Sample_1:
  reads1: /path/to/reads_1.fq.gz
  reads2: /path/to/reads_2.fq.gz
Sample_2:
  reads: /path/to/reads.fq.gz
```
Where `Sample_1` and `Sample_2` are the sample names.
* `working_dir`: the folder in which to store data
##### Optional arguments
* `-p`: for each rule, show the shell command that is run.
* `-n`: dry run: show which rules would be launched, without running anything.
* `--keep-wdir`: in dry run mode, don't remove the working dir after launch.
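For instance, a dry run that also prints the shell commands (paths are illustrative):

```bash
./cnvpipelines.py run align -r genome.fa -s samples.yml -w work/align -n -p
```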
##### Command
```bash
./cnvpipelines.py run detection -r {fasta} -s {samples} -w {working_dir} -t {tools}
```
With:
* `fasta`: the path to the fasta file (with all files of the reference bundle in the same folder).
* `samples`: a file with, on each line, the path to a bam file to analyse.
* `tools`: list of tools, space separated. Choose among genomestrip, delly, lumpy and pindel.
##### Optional arguments
* `-b INT`: size of batches (default: -1, to always make only 1 batch)
* `--chromosomes CHRS`: list of chromosomes to study, space separated. Regexes are accepted (using the [python syntax](https://docs.python.org/3/library/re.html#regular-expression-syntax)); see the example after this list. Default: all valid chromosomes of the reference
* `--force-all-chromosomes`: ignore filtering if `--chromosomes` is not set
* `--out-step STEP`: specify the output rule file to run the workflow only up to the associated rule (the whole workflow is run if not specified)
* `--cluster-config FILE`: path to a cluster config file (overrides the cluster config present in the configuration)
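A hypothetical invocation restricting detection to numbered chromosomes via a regex (paths, tool choices and chromosome naming are illustrative):

```bash
./cnvpipelines.py run detection -r genome.fa -s bams.list -w work/detection -t genomestrip delly --chromosomes "chr[0-9]+"
```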
#### Merge batches
##### Command
```bash
./cnvpipelines.py run mergebatches -w {working_dir}
```
With:
* `working_dir`: the detection run output folder. For now, it must contain at least 2 batches to run correctly.
We integrated popsim, our tool to simulate a population with variants, into cnvpipelines.
##### Command
```bash
./cnvpipelines.py run simulation -nb {nb_inds} -r {reference} -sp {species} -t {tools} -w {working_dir}
```
With:
* `nb_inds`: the number of individuals to generate.
* `reference`: the fasta file to use as the reference for simulated individuals.
* `working_dir`: the folder in which to store data.
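A hypothetical invocation (species name, tools and paths are illustrative):

```bash
./cnvpipelines.py run simulation -nb 10 -r genome.fa -sp "Bos taurus" -t delly lumpy -w work/simulation
```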
##### Description of variants
* `-s {svlist}`: a file describing the size distributions of variants. If not given, a default distribution is used.
Structure of the file (tab separated columns):
> DEL minLength maxLength proba -> Create DELetion(s).
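A sketch of one such DEL line, with illustrative values (columns tab separated; deletions between 100 bp and 10 kb, with an assumed probability of 1.0):

```
DEL	100	10000	1.0
```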
##### Command
```bash
./cnvpipelines.py rerun -w {working_dir}
```
With:
* `working_dir`: the folder in which data is stored
#### Full clean
```bash
./cnvpipelines.py clean -w {working_dir}
```
With:
* `working_dir`: the folder in which data is stored
#### Soft clean
```bash
./cnvpipelines.py soft-clean -w {working_dir}
```
With:
* `working_dir`: the folder in which data is stored
If snakemake crashes or is killed, the workflow may remain locked. You can unlock it with:
```bash
./cnvpipelines.py unlock -w {working_dir}
```
With:
* `working_dir`: the folder in which data is stored