POPSIM 0.2
Created by: Floreal Cabanettes
Contact: floreal.cabanettes@inra.fr
Popsim is a population CNV simulator. It generates FASTQ reads that can be used as input to a
CNV detection workflow, such as the jflow CNV detection workflow suites.
###################
# How to install? #
###################
1# Download the software from our repository using git:
git clone --recursive git+ssh://floreal@scm.mulcyber.toulouse.inra.fr//var/lib/gforge/chroot/scmrepos/git/popsim/popsim.git
Don't forget the "--recursive" option!
2# Compile the pirs software:
cd pirs
make
################
# Requirements #
################
- Python 2
- Python 3
- Biopython for Python 3
###############
# How to use? #
###############
Run the script build_pop.py
Required parameters:
- nb-inds: the number of individuals you want
- reference: the reference fasta file used to build individuals
- sv-list: a file that lists the structural variants to build (see below)
Optional parameters:
- coverage: mean coverage of the reads for each individual (default: 15)
- output-directory: directory where outputs will be written (see below) (default: <current>/res)
- tmp-directory: temporary directory (default: <current>/tmp)
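As an illustration, the parameters above can be assembled into a command line from Python. This is only a sketch: it assumes each parameter is passed as a long option named after it (for example "--nb-inds"), which this README does not state, and the helper name build_pop_command and the file names are hypothetical. Check the script's own help output for the real option syntax.

```python
def build_pop_command(nb_inds, reference, sv_list, coverage=None,
                      output_directory=None, tmp_directory=None):
    """Return an argument list for build_pop.py (hypothetical helper).

    ASSUMPTION: every parameter is a long option of the form
    "--<parameter-name> <value>". This is not confirmed by the README.
    """
    # Required parameters are always present.
    cmd = ["build_pop.py",
           "--nb-inds", str(nb_inds),
           "--reference", reference,
           "--sv-list", sv_list]
    # Optional parameters are added only when a value is given, so that
    # the script's own defaults apply otherwise.
    optional = [("--coverage", coverage),
                ("--output-directory", output_directory),
                ("--tmp-directory", tmp_directory)]
    for flag, value in optional:
        if value is not None:
            cmd += [flag, str(value)]
    return cmd


if __name__ == "__main__":
    # Hypothetical input file names.
    print(" ".join(build_pop_command(10, "reference.fa", "sv_list.txt",
                                     coverage=30)))
```

The returned list can be prefixed with the appropriate Python interpreter and passed to subprocess.run once the option names have been verified.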
###################################
# How to create the SV list file? #
###################################
This file describes the SVs you want to create, and their sizes:
DEL startLength [endLength] [increment] - Create DELetion(s).
DUP startLength [endLength] [increment] - Create tandem DUPlication(s).
INV startLength [endLength] [increment] - Create in-place INVersion(s).
INR startLength [endLength] [increment] - Create INsertions from a Random source region. Each instance has a new source.
If endLength is not specified, it defaults to the value of startLength (i.e. one event will be created).
If increment is not specified, it defaults to 1.
Use one line per SV type, as above.
IMPORTANT: only deletions are implemented so far.
We use SVsim to generate the positions of the SVs: https://github.com/GregoryFaust/SVsim
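To make the length rules concrete, here is a minimal sketch of how one line of the SV list file could be expanded into a list of event lengths. It is not the parser used by Popsim or SVsim. It assumes that endLength is inclusive and that one event is created per length, which is consistent with the "one event" default described above but is not confirmed here. The function name parse_sv_line is hypothetical.

```python
SV_TYPES = ("DEL", "DUP", "INV", "INR")


def parse_sv_line(line):
    """Parse "TYPE startLength [endLength] [increment]" (hypothetical helper).

    Returns (sv_type, lengths), where lengths is the list of event sizes.
    """
    fields = line.split()
    if not 2 <= len(fields) <= 4:
        raise ValueError("expected: TYPE startLength [endLength] [increment]")
    sv_type = fields[0]
    if sv_type not in SV_TYPES:
        raise ValueError("unknown SV type: %s" % sv_type)
    start = int(fields[1])
    # endLength defaults to startLength, increment defaults to 1.
    end = int(fields[2]) if len(fields) > 2 else start
    increment = int(fields[3]) if len(fields) > 3 else 1
    if end < start or increment < 1:
        raise ValueError("need endLength >= startLength and increment >= 1")
    # ASSUMPTION: endLength is inclusive, one event per length.
    return sv_type, list(range(start, end + 1, increment))


if __name__ == "__main__":
    print(parse_sv_line("DEL 100 500 100"))
    print(parse_sv_line("DEL 1000"))
```

Under these assumptions, the line "DEL 100 500 100" describes five deletions of 100, 200, 300, 400 and 500 bases, and "DEL 1000" describes a single deletion of 1000 bases.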
###########
# Outputs #
###########
The output directory contains:
- A VCF file (genotypes.vcf) that lists, for each SV, its position and the genotype of each individual
- For each individual, two FASTQ files of paired-end reads (INDIV_%d_100_180_(1|2).fq.gz)
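When feeding these outputs to a downstream workflow, it can help to group the FASTQ files by individual. The sketch below does this with a regular expression derived from the file name pattern above. The function name group_fastq_by_individual is hypothetical, and the example file names are made up for illustration.

```python
import re
from collections import defaultdict

# Matches INDIV_<number>_100_180_<1 or 2>.fq.gz, as described above.
FASTQ_NAME = re.compile(r"^INDIV_(\d+)_100_180_([12])\.fq\.gz$")


def group_fastq_by_individual(filenames):
    """Map individual number -> {mate number (1 or 2): file name}."""
    pairs = defaultdict(dict)
    for name in filenames:
        match = FASTQ_NAME.match(name)
        if match:  # other files, such as genotypes.vcf, are ignored
            individual = int(match.group(1))
            mate = int(match.group(2))
            pairs[individual][mate] = name
    return dict(pairs)


if __name__ == "__main__":
    # Illustrative file names only.
    names = ["genotypes.vcf",
             "INDIV_1_100_180_1.fq.gz", "INDIV_1_100_180_2.fq.gz",
             "INDIV_2_100_180_1.fq.gz", "INDIV_2_100_180_2.fq.gz"]
    for individual, mates in sorted(group_fastq_by_individual(names).items()):
        print(individual, mates[1], mates[2])
```

In practice, the list of names would come from os.listdir() on the output directory.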
###############
# Limitations #
###############
Because pirs works only on Intel CPUs, this software is compatible with Intel CPUs only. It was tested on AMD CPUs and failed.
#####################
# External software #
#####################
This software uses several external programs:
- SVsim (included): https://github.com/GregoryFaust/SVsim
- pirs (included): https://github.com/galaxy001/pirs
- SeqIO from BioPython: http://biopython.org/wiki/SeqIO