Cnvpipelines is a workflow to detect Copy Number Variation (CNV) variants (DEL, INV, MULTICOPY). It is built as a Snakemake workflow with a wrapper that eases job submission.
## In development...
This workflow is still in development. For now, only DEL variants are available. Also, genomestrip still fails; please use only the other tools for the moment.
## Requirements
All tools used by the workflow must be available in your PATH (they can be loaded as modules or added to PATH in the config file, see below):
- delly
- lumpy
- pindel
- svtyper
- samtools
- sambamba (confirm?)
- bedtools
- parallel
For genomestrip, you must define the `sv_dir` parameter in the configuration (see below).
Other dependencies:
- python3 >= 3.4
- python 2.7
Python 3 modules required:
- pysam
- pybedtools
- numpy
- pandas (might be deleted, not really used in lib/svfilter.py... Thomas ?)
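If these Python 3 modules are not installed yet, they can usually be obtained with pip; a minimal sketch, assuming `pip3` points to your Python 3 installation:

```bash
pip3 install pysam pybedtools numpy pandas
```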
Then, copy `application.properties.example` to `application.properties`. The configuration will be edited in the next step.
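For example:

```bash
cp application.properties.example application.properties
```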
## Configuration
Configuration should be edited in the `application.properties` file. Sections and parameters are described below.
### Global section
**batch_system_type**: `local` to run all jobs locally, or `slurm` or `sge` to submit on a cluster (depending on which cluster you have) (default: `local`).
**modules**: list of modules to load before launching the workflow, space separated.
**paths**: list of paths to add to the global PATH environment variable.
**jobs**: maximum number of jobs to submit concurrently (default: 999).
**sv_dir**: absolute path to the `svtoolkit` folder (for genomestrip).
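As an illustration only, the global section might look like the following (all values and paths are placeholders; take the exact section header and syntax from `application.properties.example`):

```ini
# hypothetical values -- adapt to your environment
batch_system_type = slurm
modules = bioinfo/samtools bioinfo/bedtools
paths = /opt/tools/bin
jobs = 200
sv_dir = /opt/svtoolkit
```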
### Cluster section
This section must only be filled in if you do not use `local` as the batch system type (see above).
**submission_mode**: `drmaa` to submit jobs through the DRMAA API, or `cluster` to submit jobs through shell commands.
**submission_command**: if you chose `cluster` for `submission_mode`, you must specify the command used to submit jobs (e.g. `srun`, `qsub`).
**drmaa**: if you chose `drmaa` for `submission_mode`, you must specify the absolute path to the DRMAA library on the cluster.
**native_submission_options**: options passed to the submission command. Should be kept as is in most cases.
**config**: absolute path to the config file defining, for each rule, the amount of memory and number of cluster threads to request (you should use the `cluster.yaml` file as a model). Can be kept as is.
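For instance, a cluster section using plain shell submission could look like the sketch below (values are placeholders; check `application.properties.example` for the exact syntax, and leave `native_submission_options` and `config` at their defaults unless you have a reason to change them):

```ini
# hypothetical values -- adapt to your cluster
submission_mode = cluster
submission_command = srun
# or, with DRMAA:
# submission_mode = drmaa
# drmaa = /usr/lib/libdrmaa.so
```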
## Run
### Run a new workflow
```bash
./cnvpipelines.py run -r {fasta} -s {samples} -w {working_dir} -t {tools}
```
With:
- `fasta`: the path to the fasta file (with all files of the reference bundle in the same folder).
- `samples`: a file listing, one per line, the path to each bam file to analyse.
- `working_dir`: the folder in which to store data.
- `tools`: the list of tools to use, space separated.
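For example, to run delly and lumpy on two samples (all paths below are placeholders):

```bash
# samples.list contains one bam path per line, e.g.:
#   /data/bams/sample1.bam
#   /data/bams/sample2.bam
./cnvpipelines.py run -r /data/ref/genome.fasta -s samples.list -w /work/cnv -t delly lumpy
```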
Optional arguments:
- `-p`: for each rule, show the shell command that is run.
- `-n`: dry run: show which rules would be launched without running anything.
- `--cluster-config`: override the default cluster config file (see above) with a new one.
- `-c`: clean after launch: keep only filtered results.
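For instance, to preview which rules would be launched and the shell command for each, without running anything (placeholder paths again):

```bash
./cnvpipelines.py run -r /data/ref/genome.fasta -s samples.list -w /work/cnv -t delly lumpy -n -p
```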