> **`/!\`:** Act with care; this workflow uses significant memory if you increase the values in `.masterconfig`. We recommend keeping the default settings and running a test first.
> **`/!\`:** For now, do not run multiple splits at once.
> **`/!\`:** The Transduplication and Reciprocal Translocation sections in the `visor_sv_type.yaml` config file are placeholders; do not use them yet.
### 1. Set up
Clone the Git repository and switch to the `dev_lpiat` branch:
```bash
git clone https://forgemia.inra.fr/pangepop/MSpangepop.git
cd MSpangepop
git checkout dev_lpiat
```
- Add a `.fasta.gz` file; an example can be found in the repository.
- Edit the `.masterconfig` file in the `.config/` directory with your sample information.
- Edit the `visor_sv_type.yaml` file with the mutations you want.
- Edit line 17 of `job.sh` and line 13 of `./config/snakemake_profile/clusterconfig.yaml` with your email.
The workflow has two parts: `split` and `simulate`. Always run `split` first; once it is done (it is very quick), run `simulate`.
> **Nb 1:** to create a visual representation of the workflow, use `dag` instead of `dry`. Open the generated `.dot` file with a [viewer](https://dreampuf.github.io/GraphvizOnline/) that supports the format.
> **Nb 2:** The first execution of the workflow will be slow because the container images need to be pulled.
> **Nb 3:** The workflow is in two parts because we want to execute the simulations chromosome by chromosome. Snakemake cannot retrieve the number of chromosomes in one go and needs to index and split first.
> **Nb 4:** Since the cbib does not support `python:3.9.7`, we cannot use the cookiecutter config; use `cbib_job.sh` to run the workflow there.
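
To illustrate Nb 3: the number of chromosomes is only known after the input has been read once. The sketch below is not part of the workflow; `list_chromosomes` is a hypothetical helper that uses only the Python standard library and assumes each chromosome is a separate `>` record in the `.fasta.gz` file.

```python
import gzip


def list_chromosomes(fasta_gz_path):
    """Return the record names found in a gzipped FASTA file.

    Hypothetical helper for illustration only: the name is taken to be
    the first whitespace-delimited word after '>' on each header line.
    """
    names = []
    with gzip.open(fasta_gz_path, "rt") as handle:
        for line in handle:
            if line.startswith(">"):
                fields = line[1:].split()
                # Keep an empty name rather than failing on a bare '>' header.
                names.append(fields[0] if fields else "")
    return names
```

Calling `len(list_chromosomes("input.fasta.gz"))` on your own input (the file name here is a placeholder) gives the number of per-chromosome simulations to expect after the split.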
## B. Run locally
- Ensure `snakemake` and `singularity` are installed on your machine, then run the workflow.

If the workflow cannot download images from the container registry, install `Docker`, log in with your credentials, and rerun the workflow:
```
docker login -u "<your_username>" -p "<your_token>" "registry.forgemia.inra.fr"
```
Variant generation is inspired by [VISOR](https://github.com/davidebolo1993/VISOR).
You can extract a VCF from the graph using the `vg deconstruct` command. It is not implemented in the pipeline.
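
Since this step is left to the user, here is a minimal sketch of how it could be scripted. It assumes `vg` is on your `PATH`, that the graph file is in a format `vg deconstruct` accepts, and that the reference paths share a common name prefix. The function names, file names and prefix are hypothetical; `-P` is the `vg deconstruct` option that selects reference paths by prefix.

```python
import shutil
import subprocess


def build_deconstruct_cmd(graph_path, ref_prefix):
    """Build the command line for `vg deconstruct`.

    Assumption: paths whose names start with `ref_prefix` are the ones
    to use as the reference (selected with -P).
    """
    return ["vg", "deconstruct", "-P", ref_prefix, graph_path]


def deconstruct_to_vcf(graph_path, ref_prefix, vcf_path):
    """Run `vg deconstruct` and write its standard output (a VCF) to vcf_path."""
    if shutil.which("vg") is None:
        raise RuntimeError("vg was not found on PATH")
    with open(vcf_path, "w") as out:
        subprocess.run(
            build_deconstruct_cmd(graph_path, ref_prefix),
            stdout=out,
            check=True,  # raise if vg exits with a non-zero status
        )
```

For example, `deconstruct_to_vcf("graph.gfa", "chr", "variants.vcf")` would write the VCF next to the graph; adjust the prefix to match the path names in your own graph.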
You can use the script `workflow/scripts/split_path.sh` to cut the final FASTA into chromosome-level FASTA files.
```bash
./split_fasta.sh input.fasta /path/to/output_directory
```
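
For readers who prefer Python, the same idea can be sketched as follows. This is not the repository's script and may differ from what it does: `split_fasta` is a hypothetical function that assumes one chromosome per `>` record and names each output file after the first word of the header.

```python
import gzip
from pathlib import Path


def split_fasta(fasta_path, out_dir):
    """Write each record of a FASTA file to its own file in out_dir.

    Illustrative sketch: accepts plain or gzipped input, and returns the
    list of files written. Records with identical names would overwrite
    each other.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    opener = gzip.open if str(fasta_path).endswith(".gz") else open
    written = []
    out = None
    try:
        with opener(fasta_path, "rt") as handle:
            for line in handle:
                if line.startswith(">"):
                    # A new record starts: close the previous output file.
                    if out is not None:
                        out.close()
                    fields = line[1:].split()
                    name = fields[0] if fields else f"record_{len(written) + 1}"
                    target = out_dir / f"{name.replace('/', '_')}.fasta"
                    out = open(target, "w")
                    written.append(target)
                # Lines before the first header are skipped.
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return written
```

Usage mirrors the shell call above: `split_fasta("input.fasta", "/path/to/output_directory")`.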
pandas, msprime, argparse, os, multiprocessing, yaml, Bio.Seq
singularity, snakemake
vg:1.60.0, bcftools:1.12, bgzip:latest, tabix:1.7.