Commit 87bf8bea authored by Floreal Cabanettes

Add README and LICENCE + requirements file

parent a40f987f
A tool for identifying genes and introns by mapping de novo RNA-seq contigs onto a reference genome.
This tool also identifies indels between the contigs and the reference genome.
## How does it work?
1. Minimap2 is used to map the contigs onto the reference genome.
2. We parse the matches and keep those in which at least 90% of the contig sequence matches the reference (this threshold can be customized).
3. For each contig:
- the mapping bounds define the gene position
- the splits define the exons: the `N` operations of the CIGAR string are the introns, the matching parts are the exons
- we save all indels found in the CIGAR string
4. All genes, exons and indels are saved into a single GTF file.
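
Steps 2 and 3 above can be illustrated with a minimal sketch in pure Python. It is not the tool's actual code: the function names (`parse_cigar`, `matched_fraction`, `annotate`) are hypothetical, and the definition of the matching fraction (aligned bases divided by total contig length, clipped bases included) is an assumption about how the 90% threshold might be computed. Coordinates are 0-based, half-open intervals on the reference.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM specification).
CIGAR_RE = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, operation) tuples."""
    return [(int(length), op) for length, op in CIGAR_RE.findall(cigar)]


def matched_fraction(cigar):
    """Fraction of the contig that is aligned to the reference.

    Assumption: aligned bases are M/=/X, and the contig length is the sum of
    all operations that consume the query (M, =, X, I) plus clipped bases (S, H).
    """
    ops = parse_cigar(cigar)
    aligned = sum(n for n, op in ops if op in "M=X")
    total = sum(n for n, op in ops if op in "M=XISH")
    return aligned / total if total else 0.0


def annotate(ref_start, cigar):
    """Walk the CIGAR string and return (gene, exons, introns, indels).

    gene, exons and introns are (start, end) intervals on the reference;
    indels are (kind, reference_position, length) tuples.
    """
    pos = ref_start          # current position on the reference
    exon_start = ref_start   # start of the exon being extended
    exons, introns, indels = [], [], []
    for length, op in parse_cigar(cigar):
        if op in "M=X":      # aligned block: extends the current exon
            pos += length
        elif op == "D":      # deletion in the contig: consumes reference
            indels.append(("deletion", pos, length))
            pos += length
        elif op == "I":      # insertion in the contig: consumes no reference
            indels.append(("insertion", pos, length))
        elif op == "N":      # skipped region: closes an exon, opens an intron
            exons.append((exon_start, pos))
            introns.append((pos, pos + length))
            pos += length
            exon_start = pos
        # S, H and P do not consume the reference and are ignored here
    exons.append((exon_start, pos))
    return (ref_start, pos), exons, introns, indels


if __name__ == "__main__":
    cigar = "5S50M2I30M1000N40M3D10M"
    if matched_fraction(cigar) >= 0.90:
        gene, exons, introns, indels = annotate(100, cigar)
        print("gene:", gene)
        print("exons:", exons)
        print("introns:", introns)
        print("indels:", indels)
```

With a real BAM file, the start position and CIGAR string would come from each alignment record (for example through pysam); the sketch takes them as plain arguments so that it has no dependencies.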
This program requires at least Python 3.5.
Some Python modules are required:
- pysam == 0.14.*
- PyYAML == 3.13
Install them with a single command:
pip3 install -r requirements.txt
## How to use it?
./ -r {reference_file} -a {contigs} -o {output_folder}
`reference_file`: path to the reference FASTA file
`contigs`: path to the contigs FASTA file
`output_folder`: path to the output folder where the results will be stored. If it does not exist, it will be created.
### Optional arguments
`-m {map}`: path to an existing BAM file, if the mapping has already been computed. It is used instead of running minimap2.
`-q {min-overlap}`: minimum percentage of the contig length that must match the reference (other matches are ignored). Default: 90.
If minimap2 or samtools is not in your PATH, set their paths in miniannotator.yml.
You can also set the number of threads used by minimap2 there.
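
The command-line options described above could be declared with `argparse` roughly as follows. This is only a sketch, not the program's actual code: the short flags and the default of 90 come from this README, while the `dest` names, the `float` type for `-q` and the helper `ensure_output_dir` are assumptions made for illustration.

```python
import argparse
import os


def build_parser():
    """Declare the options listed in this README (destination names are hypothetical)."""
    parser = argparse.ArgumentParser(
        description="Identify genes, introns and indels from mapped RNA-seq contigs."
    )
    parser.add_argument("-r", dest="reference", required=True,
                        help="path to the reference FASTA file")
    parser.add_argument("-a", dest="contigs", required=True,
                        help="path to the contigs FASTA file")
    parser.add_argument("-o", dest="output", required=True,
                        help="output folder (created if it does not exist)")
    parser.add_argument("-m", dest="bam", default=None,
                        help="existing BAM file to use instead of running minimap2")
    parser.add_argument("-q", dest="min_overlap", type=float, default=90.0,
                        help="minimum percentage of the contig matching the reference")
    return parser


def ensure_output_dir(path):
    """Create the output folder if it does not exist yet."""
    os.makedirs(path, exist_ok=True)
    return path


if __name__ == "__main__":
    args = build_parser().parse_args()
    ensure_output_dir(args.output)
    print(args)
```

Marking `-r`, `-a` and `-o` as required reflects the usage line, where all three are given; `-m` and `-q` stay optional with the documented default.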