Commit e55f00b8 authored by Christophe Klopp

Update paperCorrection.md Minor corrections

parent 8cd82ad7
......@@ -26,32 +26,31 @@ bibliography: paper.bib
# Summary
With the availability of cheap long read sequences and efficient genome assembly software packages, novel genome assemblies are made available frequently. It is not rare to produce multiple assemblies of the same or closely related species in a project. These assemblies being produced independently, raw assembly file include contigs, scaffolds or chromosomes in random order, orientation and with different sequence names. Depending on the availability of genomic long range information (Hi-C, optical maps, linked reads) the assemblies are sometime not at chromosome scale. One option to bring further the assembly state is to scaffold these assemblies using a reference genome. In addition, for related species, scientist are interested in comparing assemblies with each others, and focusing at chromosome level. In these cases, dot-plots are a simple and efficient approach to find large genomic rearrangements.
In both case, GenomOrder allow the dot-plot production or the scaffolding of your assembly in a fast and simple way.
The pipeline is implemented with Nextflow and can be run with Docker, code is available on Github.
With the availability of cheap long-read sequences and efficient genome assembly software packages, novel genome assemblies are made available frequently. It is not rare to produce multiple assemblies of the same or closely related species in a project. Because these assemblies are produced independently, raw assembly files contain contigs, scaffolds or chromosomes in arbitrary order and orientation, and with different sequence names. Depending on the availability of long-range genomic information (Hi-C, optical maps, linked reads), the assemblies are sometimes not at chromosome scale. One option to bring them closer to chromosome scale is to scaffold them using a reference genome. In addition, scientists are interested in comparing assemblies with each other, often focusing on the chromosome level to find large genomic rearrangements. Dot-plots are a simple and efficient approach to find these rearrangements.
In both cases, GenomOrder can scaffold assemblies and generate chromosome-scale dot-plots in a fast and simple way.
The pipeline is implemented in Nextflow and can be run with Docker; the code is available on GitHub.
# Statement of need
Produced genome assemblies often require to be scaffolded or at least compared to a reference. The fastest method for that requires the use of alignment and visualization tools that are not always compatible and need multiple parameters. In addition, if multiple assemblies are produced, it would be necessary to perform again numerous analyzes for each of the new assemblies. This is why GenomOrder was designed with the aim of making assembly alignments visualization and scaffolding more accessible and faster. That tool has to be used in combination with DGenies which already have numerous studies [Cabanettes:2018]. The major advantages of GenomOrder is that it allows quick and easy production of multiple alignments and rearrangements of assemblies while being reproducible.
Today, novel genome assemblies are often organized as chromosomes. If a reference genome is already available, common practice is to follow its naming, ordering and orientation conventions for novel assemblies. If multiple assemblies are produced, this has to be performed for each of them. GenomOrder aims to make it easier to organize assemblies in the same way as a reference. It includes reference-based scaffolding when the assemblies are not already in chromosomes, as well as chromosome-scale dot-plot visualization. GenomOrder reference scaffolding and visualization are based on DGenies [@Cabanettes:2018]. GenomOrder is quick and easy to use.
# Material and Methods
## Features
This pipeline is implemented in Nextflow, a portable, reproducible, scalable and parallelizable workflow framework for pipelines [@Di:2017]. With Nextflow, GenomOrder pipeline is designed to parallelize and automate a list of process in a single command line, thus improving reproducibility and traceability while allowing rapid production. In addition, as the pipeline is developped with multiple different features, it allow users to customize command-line options to fit the desired behaviour.
This pipeline is implemented in Nextflow, a portable, reproducible, scalable and parallelizable workflow framework [@Di:2017]. With Nextflow, the GenomOrder pipeline parallelizes and automates a list of processes in a single command line, thus improving reproducibility and traceability while allowing rapid production. In addition, as the pipeline is developed with multiple features, it allows users to customize command-line options to fit the desired behaviour.
GenomOrder is developed with two main modules: genomic assembly reorganisation and comparison of assemblies to a reference.
GenomOrder can reorder, reorient and rename sequences from up to five assemblies according to the given reference assembly. If the reference assembly is in chromosomes and the other assemblies are not, genomeorder can scaffold the assemblies in chromosomes.
GenomOrder can reorder, reorient and rename sequences from up to five assemblies according to a given reference. If the reference assembly is in chromosomes and the other assemblies are not, GenomOrder can scaffold the assemblies into chromosomes.
Given a list of chromosomes, GenomOrder will align the chromosomes sharing the same name and produce all-vs-all chromosome visualization archives for http://dgenies.toulouse.inra.fr/ [FIGURE X]. In addition, GenomOrder can simply align multiple assemblies against a given reference assembly and quickly produce dot-plot archives for http://dgenies.toulouse.inra.fr/ [FIGURE Y].
## Workflow
[FIGURE Z]
1. Input. GenomOrder require at least one assembly fasta file and one reference fasta file to produce an alignment and the resulting DGenies visualization files. Optionally, users can provide 4 more assemblies to be aligned to reference. Additionaly one option allow the users to scaffold the input assemblies against the reference, and one other is used to align chromosome from the different input assemblies against their equivalent in other assemblies.
1. Input. GenomOrder requires at least one assembly FASTA file and one reference FASTA file to produce an alignment and the resulting DGenies visualization files. Optionally, users can provide up to four other assemblies to be aligned to the reference. One option allows users to scaffold the input assemblies against the reference, and another aligns chromosomes having the same name to produce dot-plots.
2. Align and produce DGenies backup files. Assemblies are aligned to reference using minimap 2. Fasta file are then indexed and alignment file is sorted to produce an archive that can be given as input to DGenies for the dot-plot visualization.
2. Align and produce DGenies backup files. Assemblies are aligned to the reference using minimap2. FASTA files are then indexed and the alignment file is sorted to produce an archive that can be given as input to DGenies for dot-plot visualization.
3. Arrange assembly. If the input parameter '--arrange' is set to True, the input assembly is reordered in accordance with the reference. The index, FASTA and alignment files are passed to a Python script that reorganizes the input assembly.
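
The arrange step can be illustrated with a minimal Python sketch. This is not the script shipped with GenomOrder: it assumes the alignments are available as PAF lines (the default minimap2 output format), assigns each contig to the reference sequence with the most matched bases, chooses the orientation by majority strand, and orders contigs by their match-weighted mean start on the reference. The function names and the `chr1_1`-style naming scheme are hypothetical, and the sketch only reorders, reorients and renames; it does not join contigs into scaffolds.

```python
from collections import defaultdict

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def parse_paf(lines):
    """Yield (query, strand, target, target_start, matches) from PAF lines."""
    for line in lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 12:
            continue  # skip empty or malformed lines
        yield fields[0], fields[4], fields[5], int(fields[7]), int(fields[9])


def plan_arrangement(paf_lines, reference_order):
    """Return (target, strand, position, query) tuples sorted in reference order."""
    # (query, target) -> [plus_matches, minus_matches, match-weighted start sum]
    stats = {}
    for query, strand, target, tstart, matches in parse_paf(paf_lines):
        entry = stats.setdefault((query, target), [0, 0, 0])
        entry[0 if strand == "+" else 1] += matches
        entry[2] += tstart * matches

    # For each contig keep the reference sequence with the most matched bases.
    best = {}  # query -> (total_matches, target, strand, mean_start)
    for (query, target), (plus, minus, weighted_sum) in stats.items():
        total = plus + minus
        if total == 0:
            continue
        if query not in best or total > best[query][0]:
            strand = "+" if plus >= minus else "-"
            best[query] = (total, target, strand, weighted_sum / total)

    rank = {name: i for i, name in enumerate(reference_order)}
    plan = [
        (target, strand, position, query)
        for query, (_total, target, strand, position) in best.items()
        if target in rank
    ]
    plan.sort(key=lambda item: (rank[item[0]], item[2]))
    return plan


def arrange(sequences, plan):
    """Return (new_name, sequence) pairs, reoriented and renamed (hypothetical names)."""
    counters = defaultdict(int)
    arranged = []
    for target, strand, _position, query in plan:
        counters[target] += 1
        seq = sequences[query]
        if strand == "-":
            seq = reverse_complement(seq)
        arranged.append((f"{target}_{counters[target]}", seq))
    return arranged
```

Calling `plan_arrangement(paf_lines, ["chr1", "chr2"])` and then `arrange(sequences, plan)`, where `sequences` maps contig names to their sequences, yields the contigs in reference order. Contigs without any alignment to a listed reference sequence are left out of the plan, so a real implementation would need to decide how to report them.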
......@@ -63,16 +62,16 @@ Four configuration and file containing pipeline options and parameters are provi
## Installation and execution
GenomOrder can be cloned from https://forgemia.inra.fr/seqoccin/GenomOrder (A CHANGER SI ON PASSE SUR GITHUB GENOTOUL). A Docker container is available to help in running the pipeline easier. The pipeline can be run localy or with an high-performance computing environments (cluster). In addition, if all specific dependencies are installed, pipeline can be run without the docker container.
Further information about running the pipeline are available on github, in the config file, or with the '--help' command.
GenomOrder can be cloned from https://forgemia.inra.fr/seqoccin/GenomOrder (TO BE CHANGED IF WE MOVE TO THE GENOTOUL GITHUB). A Docker container is available to run the pipeline without installing further dependencies. The pipeline can be run locally or in a high-performance computing environment (cluster). If all required dependencies are installed, the pipeline can be run without the Docker container.
Further information about how to run the pipeline is available on GitHub, in the config file, or with the '--help' option.
## Output and Error Handling
Output files are arranged in the given ouput folder with '--output'. Each new folder contain only the principal output. Nextflow creates its own work folder to produce the intermediate output. Process logs and stored in the specific folder of each runs. If the run stop due to an error, user can fix this error and then run the pipeline with the initial command and '-resume' option.
Output files are stored in the output folder specified with the '--output' parameter. Each new folder contains only the principal output. Nextflow creates its own work folder for intermediate output. Process logs are stored in a specific folder for each run. If the run stops due to an error, the user can fix the error and then rerun the pipeline with the initial command and the '-resume' option.
## Conclusion and discussions
The GenomOrder pipeline is user-friendly and provide a one-step analysis tool. Required options and parameters are limited and easily understandable. In addition, it will be easily possible to implement new options and functionalities requested by users, in order to satisfy their needs.
The GenomOrder pipeline is user-friendly and provides a one-step analysis tool. Required options and parameters are few and easy to understand. In addition, new options and functionalities requested by users can easily be implemented.
# References
......