Commit 342ba619 authored by Celine Noirot

Merge branch 'dev' into 'master'

MR before tag 2.1

See merge request !11
parents 7a403a1c cf858367
.nextflow*
work/
functional_tests/__pycache__
functional_tests/*.pyc
\ No newline at end of file
# recipe for building the metagWGS and eggnog_mapper singularity images and deploying them to the registry
image:
name: quay.io/singularity/singularity:v3.4.0
entrypoint: [""]
stages:
- build
- deploy
# Build the Singularity containers metagWGS.sif and eggnog_mapper.sif
singularity-image:
stage: build
script:
- singularity build metagWGS.sif env/Singularity_recipe_metagWGS
- singularity build eggnog_mapper.sif env/Singularity_recipe_eggnog_mapper
artifacts:
paths:
- metagWGS.sif
- eggnog_mapper.sif
only:
changes:
- .gitlab-ci.yml
- env/*
# Push the built images metagWGS.sif and eggnog_mapper.sif to the registry
deploy:
stage: deploy
script:
- singularity push --docker-username "${CI_REGISTRY_USER}" --docker-password "${CI_REGISTRY_PASSWORD}" metagWGS.sif oras://"$CI_REGISTRY_IMAGE"/"$CI_PROJECT_NAME":"$CI_COMMIT_TAG"
- singularity push --docker-username "${CI_REGISTRY_USER}" --docker-password "${CI_REGISTRY_PASSWORD}" eggnog_mapper.sif oras://"$CI_REGISTRY_IMAGE"/eggnog_mapper:"$CI_COMMIT_TAG"
only:
changes:
- .gitlab-ci.yml
- env/*
......@@ -35,7 +35,7 @@ metagWGS is split into different steps that correspond to different parts of
* `07_taxo_affi`
* taxonomically affiliates the genes ([Samtools](http://www.htslib.org/) + [aln2taxaffi.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/aln2taxaffi.py))
* taxonomically affiliates the contigs ([Samtools](http://www.htslib.org/) + [aln2taxaffi.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/aln2taxaffi.py))
* counts the number of reads and contigs, for each taxonomic affiliation, per taxonomic level ([Samtools](http://www.htslib.org/) + [merge_idxstats_percontig_lineage.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/merge_idxstats_percontig_lineage.py) + [quantification_by_contig_lineage.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/quantification_by_contig_lineage.py))
* counts the number of reads and contigs, for each taxonomic affiliation, per taxonomic level ([Samtools](http://www.htslib.org/) + [merge_contig_quantif_perlineage.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/merge_contig_quantif_perlineage.py) + [quantification_by_contig_lineage.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/quantification_by_contig_lineage.py)) (see the sketch after this list)
* `08_binning` from [nf-core/mag 1.0.0](https://github.com/nf-core/mag/releases/tag/1.0.0)
* makes binning of contigs ([MetaBAT2](https://bitbucket.org/berkeleylab/metabat/src/master/))
* assesses bins ([BUSCO](https://busco.ezlab.org/) + [metaQUAST](http://quast.sourceforge.net/metaquast) + [summary_busco.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/summary_busco.py) and [combine_tables.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/combine_tables.py) from [nf-core/mag](https://github.com/nf-core/mag))
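The counting step listed under `07_taxo_affi` above boils down to a per-lineage groupby. A minimal sketch of the idea, using a made-up per-contig table (contig names, counts and the single lineage column are illustrative; the real scripts also carry tax ids and split the lineage per taxonomic level):

import pandas as pd

# Toy per-contig table: mapped read count, mean depth and consensus lineage per contig
# (in the pipeline these come from samtools idxstats, mosdepth and aln2taxaffi.py).
per_contig = pd.DataFrame({
    "contig": ["k141_1", "k141_2", "k141_3"],
    "mapped": [120, 30, 55],
    "depth": [10.5, 3.2, 6.0],
    "consensus_lineage": ["Bacteria;Firmicutes", "Bacteria;Firmicutes", "Bacteria;Proteobacteria"],
})

# One output row per lineage: contig names joined, contigs counted, reads summed, depth averaged.
per_lineage = (per_contig
               .groupby("consensus_lineage")
               .agg(name_contigs=("contig", ";".join),
                    nb_contigs=("contig", "count"),
                    nb_reads=("mapped", "sum"),
                    depth=("depth", "mean"))
               .reset_index())
print(per_lineage)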
......
#!/usr/bin/env python3
"""----------------------------------------------------------------------------
Script Name: filter_diamond_hits.py
Description: Keep best diamond hits for each query gene/protein
based on the best bitscore, and filter out queries with low identity and low coverage
Adapted from best_bitscore_diamond.py script of Joanna Fourquet
Input files: Diamond output file (.m8)
Created By: Jean Mainguy
Date: 2021-08-02
-------------------------------------------------------------------------------
"""
# Metadata
__author__ = 'Mainguy Jean - Plateforme bioinformatique Toulouse'
__copyright__ = 'Copyright (C) 2021 INRAE'
__license__ = 'GNU General Public License'
__version__ = '0.1'
__email__ = 'support.bioinfo.genotoul@inra.fr'
__status__ = 'dev'
from argparse import ArgumentParser, ArgumentDefaultsHelpFormatter, FileType
import logging
import sys
import csv
def get_hits_with_highest_bitscore(hits):
highest_bitscore = max([float(hit['bitScore']) for hit in hits])
return [hit for hit in hits if float(hit['bitScore']) == highest_bitscore]
def get_all_hits_per_query(blast_result_file, header_list):
    # Assumption: hits are already sorted by query in the diamond output.
    # Both commands should output the same number of lines:
    # cut -f1 blast_result_file | uniq | wc -l
    # cut -f1 blast_result_file | sort | uniq | wc -l
with open(blast_result_file) as in_fl:
result_reader = csv.DictReader(in_fl, delimiter='\t', fieldnames=header_list)
query_ids_processed = []
current_query_id = None
hits = []
for hit in result_reader:
if not current_query_id:
current_query_id = hit['queryId']
if current_query_id and current_query_id != hit['queryId']:
yield hits
hits = []
current_query_id = hit['queryId']
                assert current_query_id not in query_ids_processed, f"Queries are not sorted in the blast result: query {current_query_id} is found in different parts of the file."
query_ids_processed.append(current_query_id)
hits.append(hit)
if current_query_id:
yield hits
def is_identity_and_coverage_ok(hit, min_identity, min_coverage):
    # Query coverage of the hit, expressed as a percentage so it is comparable with min_coverage (default 70).
    qcovhsp = 100 * (int(hit["queryEnd"]) - int(hit["queryStart"]) + 1) / int(hit['queryLength'])
if float(hit['percIdentity']) >= min_identity or qcovhsp >= min_coverage:
return True
return False
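# Illustrative example (made-up values): a hit with queryStart=1, queryEnd=80 and
# queryLength=100 covers 100 * (80 - 1 + 1) / 100 = 80% of the query; with
# percIdentity=65 and the default thresholds (min_identity=60, min_coverage=70)
# the hit is kept.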
def parse_arguments():
"""Parse script arguments."""
    parser = ArgumentParser(description="Keep hits with the best bitscore for each query and discard hits with both low identity and low coverage.",
formatter_class=ArgumentDefaultsHelpFormatter)
parser.add_argument('aln_input_file',
help="File with blast/diamond matches expected format m8 \
\nqueryId, subjectId, percIdentity, alnLength, mismatchCount, gapOpenCount,\
queryStart, queryEnd, subjectStart, subjectEnd, eVal, bitScore")
parser.add_argument('-o', '--output_file', type=str,
default="best_hit.tsv", help=("string specifying output file path"))
    parser.add_argument('-i', '--min_identity', default=60, type=float,
                        help="minimum percentage of identity required to keep a hit")
    parser.add_argument('-c', '--min_coverage', default=70, type=float,
                        help="minimum percentage of query coverage required to keep a hit")
parser.add_argument("-v", "--verbose", help="increase output verbosity",
action="store_true")
args = parser.parse_args()
return args
def main():
args = parse_arguments()
if args.verbose:
logging.basicConfig(format="%(levelname)s: %(message)s", level=logging.DEBUG)
logging.info('Mode verbose ON')
else:
logging.basicConfig(format="%(levelname)s: %(message)s")
headers = "queryId subjectId percIdentity alnLength mismatchCount gapOpenCount queryStart queryEnd subjectStart subjectEnd eVal bitScore queryLength subjectLength subjectTitle"
header_list = headers.split(' ')
blast_result = args.aln_input_file
outfile = args.output_file
min_coverage = args.min_coverage
min_identity = args.min_identity
best_hit_count = 0
query_count_with_low_hit = 0
with open(outfile, 'w') as out_fl:
writer = csv.DictWriter(out_fl, fieldnames=header_list, delimiter='\t')
for query_i, query_hits in enumerate(get_all_hits_per_query(blast_result, header_list)):
if query_i % 10000 == 0:
logging.info(f'{query_i} queries processed... ')
correct_hits = [hit for hit in query_hits if is_identity_and_coverage_ok(
hit, min_identity, min_coverage)]
if not correct_hits:
query_count_with_low_hit += 1
continue
best_hits = get_hits_with_highest_bitscore(correct_hits)
for best_hit in best_hits:
best_hit_count += 1
writer.writerow(best_hit)
    logging.info(f'{query_count_with_low_hit} queries ({100*query_count_with_low_hit/(query_i+1):.2f}%) have no hit passing the identity ({min_identity}%) or coverage ({min_coverage}%) thresholds')
    logging.info(f'{best_hit_count} best hits of {query_i + 1 - query_count_with_low_hit} queries have been written to {outfile}.')
if __name__ == '__main__':
main()
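# Example invocation (illustrative file names):
#   python filter_diamond_hits.py diamond_matches.m8 -o best_hits.tsv -i 60 -c 70 -v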
......@@ -89,21 +89,21 @@ concat_diamond_files = pd.DataFrame()
# Concatenate diamond files.
for (diamond_idx,diamond_path) in enumerate(diamond_files):
diamond_file = pd.read_csv(diamond_path, delimiter='\t', decimal='.', header=None)
diamond_file.loc[:,1] = 'https://www.ncbi.nlm.nih.gov/protein/' + diamond_file.loc[:,1]
group_diamond_file = diamond_file.groupby(diamond_file.columns[0])\
.agg({diamond_file.columns[14] : ';'.join, diamond_file.columns[1] : ','.join})\
.reset_index()\
.reindex(columns=diamond_file.columns)
res_diamond_file = group_diamond_file.iloc[:,[0,1,14]]
diamond_columns = ["qseqid","sseqid","pident","length","mismatch","gapopen","qstart","qend","sstart","send","evalue","bitscore","qlen","slen","stitle"]
diamond_file = pd.read_csv(diamond_path, delimiter='\t', decimal='.', header=None, names=diamond_columns)
diamond_file.loc[:,"sseqid"] = 'https://www.ncbi.nlm.nih.gov/protein/' + diamond_file.loc[:,"sseqid"]
group_diamond_file = diamond_file.groupby("qseqid")\
.agg({"stitle" : ';'.join, "sseqid" : ','.join})\
.reset_index()\
.reindex(columns=diamond_file.columns)
res_diamond_file = group_diamond_file.loc[:,["qseqid","sseqid","stitle"]]
concat_diamond_files = pd.concat([concat_diamond_files, res_diamond_file])
# Merge counts, annotation and diamond results.
merge_annot = pd.merge(counts_file,concat_eggnog_mapper_files,left_on="seed_cluster",right_on='#query_name', how='left')
merge = pd.merge(merge_annot,concat_diamond_files,left_on="seed_cluster",right_on=concat_diamond_files.columns[0], how='left')
merge = pd.merge(merge_annot,concat_diamond_files,left_on="seed_cluster",right_on="qseqid", how='left')
merge.drop('#query_name', inplace=True, axis=1)
merge.drop(merge.columns[28], inplace=True, axis=1)
res_merge = merge.rename(columns = {1: 'diamond_db_id', 14: 'diamond_db_description'})
merge.drop("qseqid", inplace=True, axis=1)
# Write merge data frame in output file.
res_merge.to_csv(args.output_file, sep="\t", index=False)
merge.to_csv(args.output_file, sep="\t", index=False)
#!/usr/bin/env python
"""--------------------------------------------------------------------
Script Name: merge_idxstats_percontig_lineage.py
Description: merge idstats and .percontig.tsv files for one sample.
Input files: idxstats file and percontig.tsv file.
Script Name: merge_contig_quantif_perlineage.py
Description: merge quantifications and lineage into one matrix for one sample.
Input files: idxstats file, depth from mosdepth (bed.gz) and lineage percontig.tsv file.
Created By: Joanna Fourquet
Date: 2021-01-19
-----------------------------------------------------------------------
......@@ -37,11 +37,14 @@ print(str(datetime.now()))
# Manage parameters.
parser = argparse.ArgumentParser(description = 'Script which \
merge idstats and .percontig.tsv files for one sample.')
merge quantifications and lineage into one matrix for one sample.')
parser.add_argument('-i', '--idxstats_file', required = True, \
help = 'idxstats file.')
parser.add_argument('-m', '--mosdepth_file', required = True, \
help = 'depth per contigs from mosdepth (regions.bed.gz).')
parser.add_argument('-c', '--percontig_file', required = True, \
help = '.percontig.tsv file.')
......@@ -56,42 +59,56 @@ args = parser.parse_args()
# Recovery of idxstats file.
idxstats = pd.read_csv(args.idxstats_file, delimiter='\t', header=None)
idxstats.columns = ["contig","len","mapped","unmapped"]
# Recovery of mosdepth file; remove start/end columns
mosdepth = pd.read_csv(args.mosdepth_file, delimiter='\t', header=None,compression='gzip')
mosdepth.columns = ["contig","start","end","depth"]
mosdepth.drop(["start","end"], inplace=True,axis=1)
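# For reference, each regions.bed.gz line from mosdepth is expected to hold contig, start,
# end and mean depth, e.g. (illustrative): k141_5   0   42000   12.66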
# Recovery of .percontig.tsv file.
percontig = pd.read_csv(args.percontig_file, delimiter='\t', dtype=str)
# Merge idxstats and .percontig.tsv files.
merge = pd.merge(idxstats,percontig,left_on=0,right_on='#contig', how='outer')
merge = pd.merge(idxstats,percontig,left_on='contig',right_on='#contig', how='outer')
# Add depth
merge = pd.merge(merge,mosdepth,left_on='contig',right_on='contig', how='outer')
# Fill NaN values to keep unmapped contigs.
merge['consensus_lineage'] = merge['consensus_lineage'].fillna('Unknown')
merge['tax_id_by_level'] = merge['tax_id_by_level'].fillna(1)
merge['consensus_tax_id'] = merge['consensus_tax_id'].fillna(1)
# Group by lineage and sum number of reads and contigs.
res = merge.groupby(['consensus_lineage','consensus_tax_id', 'tax_id_by_level']).agg({0 : [';'.join, 'count'], 2: 'sum'}).reset_index()
res.columns=['lineage_by_level', 'consensus_tax_id', 'tax_id_by_level', 'name_contigs', 'nb_contigs', 'nb_reads']
print(res.head())
# Fill the NaN by 0.
res = merge.groupby(['consensus_lineage','consensus_tax_id', 'tax_id_by_level']).agg({'contig' : [';'.join, 'count'], 'mapped': 'sum', 'depth': 'mean'}).reset_index()
res.columns=['lineage_by_level', 'consensus_tax_id', 'tax_id_by_level', 'name_contigs', 'nb_contigs', 'nb_reads', 'depth']
# Fill NaN values with 0.
res.fillna(0, inplace=True)
# Split by taxonomic level
res_split_tax_id = res.join(res['tax_id_by_level'].str.split(pat=";",expand=True))
res_split_tax_id.columns=['consensus_lineage', 'consensus_taxid', 'tax_id_by_level', 'name_contigs', 'nb_contigs', 'nb_reads', "superkingdom_tax_id", "phylum_tax_id", "order_tax_id", "class_tax_id", "family_tax_id", "genus_tax_id", "species_tax_id"]
res_split_tax_id.columns=['consensus_lineage', 'consensus_taxid', 'tax_id_by_level', 'name_contigs', 'nb_contigs', 'nb_reads', 'depth', "superkingdom_tax_id", "phylum_tax_id", "order_tax_id", "class_tax_id", "family_tax_id", "genus_tax_id", "species_tax_id"]
res_split_tax_id.fillna(value='no_affi', inplace = True)
print(res_split_tax_id.head())
res_split = res_split_tax_id.join(res_split_tax_id['consensus_lineage'].str.split(pat=";",expand=True))
res_split.columns=['consensus_lineage', 'consensus_taxid', 'tax_id_by_level', 'name_contigs', 'nb_contigs', 'nb_reads', "superkingdom_tax_id", "phylum_tax_id", "order_tax_id", "class_tax_id", "family_tax_id", "genus_tax_id", "species_tax_id", "superkingdom_lineage", "phylum_lineage", "order_lineage", "class_lineage", "family_lineage", "genus_lineage", "species_lineage"]
res_split.columns=['consensus_lineage', 'consensus_taxid', 'tax_id_by_level', 'name_contigs', 'nb_contigs', 'nb_reads', 'depth', "superkingdom_tax_id", "phylum_tax_id", "order_tax_id", "class_tax_id", "family_tax_id", "genus_tax_id", "species_tax_id", "superkingdom_lineage", "phylum_lineage", "order_lineage", "class_lineage", "family_lineage", "genus_lineage", "species_lineage"]
res_split.fillna(value='no_affi', inplace = True)
level_superkingdom = res_split.groupby(['superkingdom_tax_id','superkingdom_lineage']).agg({'name_contigs' : [';'.join], 'nb_contigs' : 'sum', 'nb_reads' : 'sum'}).reset_index()
level_superkingdom.columns=['tax_id_by_level','lineage_by_level','name_contigs','nb_contigs', 'nb_reads']
level_phylum = res_split.groupby(['phylum_tax_id','phylum_lineage']).agg({'name_contigs' : [';'.join], 'nb_contigs' : 'sum', 'nb_reads' : 'sum'}).reset_index()
level_phylum.columns=['tax_id_by_level','lineage_by_level','name_contigs','nb_contigs', 'nb_reads']
level_order = res_split.groupby(['order_tax_id','order_lineage']).agg({'name_contigs' : [';'.join], 'nb_contigs' : 'sum', 'nb_reads' : 'sum'}).reset_index()
level_order.columns=['tax_id_by_level','lineage_by_level','name_contigs','nb_contigs', 'nb_reads']
level_class = res_split.groupby(['class_tax_id','class_lineage']).agg({'name_contigs' : [';'.join], 'nb_contigs' : 'sum', 'nb_reads' : 'sum'}).reset_index()
level_class.columns=['tax_id_by_level','lineage_by_level','name_contigs','nb_contigs', 'nb_reads']
level_family = res_split.groupby(['family_tax_id','family_lineage']).agg({'name_contigs' : [';'.join], 'nb_contigs' : 'sum', 'nb_reads' : 'sum'}).reset_index()
level_family.columns=['tax_id_by_level','lineage_by_level','name_contigs','nb_contigs', 'nb_reads']
level_genus = res_split.groupby(['genus_tax_id','genus_lineage']).agg({'name_contigs' : [';'.join], 'nb_contigs' : 'sum', 'nb_reads' : 'sum'}).reset_index()
level_genus.columns=['tax_id_by_level','lineage_by_level','name_contigs','nb_contigs', 'nb_reads']
level_species = res_split.groupby(['species_tax_id','species_lineage']).agg({'name_contigs' : [';'.join], 'nb_contigs' : 'sum', 'nb_reads' : 'sum'}).reset_index()
level_species.columns=['tax_id_by_level','lineage_by_level','name_contigs','nb_contigs', 'nb_reads']
levels_columns=['tax_id_by_level','lineage_by_level','name_contigs','nb_contigs', 'nb_reads', 'depth']
level_superkingdom = res_split.groupby(['superkingdom_tax_id','superkingdom_lineage']).agg({'name_contigs' : [';'.join], 'nb_contigs' : 'sum', 'nb_reads' : 'sum', 'depth': 'mean'}).reset_index()
level_superkingdom.columns=levels_columns
level_phylum = res_split.groupby(['phylum_tax_id','phylum_lineage']).agg({'name_contigs' : [';'.join], 'nb_contigs' : 'sum', 'nb_reads' : 'sum', 'depth': 'mean'}).reset_index()
level_phylum.columns=levels_columns
level_order = res_split.groupby(['order_tax_id','order_lineage']).agg({'name_contigs' : [';'.join], 'nb_contigs' : 'sum', 'nb_reads' : 'sum', 'depth': 'mean'}).reset_index()
level_order.columns=levels_columns
level_class = res_split.groupby(['class_tax_id','class_lineage']).agg({'name_contigs' : [';'.join], 'nb_contigs' : 'sum', 'nb_reads' : 'sum', 'depth': 'mean'}).reset_index()
level_class.columns=levels_columns
level_family = res_split.groupby(['family_tax_id','family_lineage']).agg({'name_contigs' : [';'.join], 'nb_contigs' : 'sum', 'nb_reads' : 'sum', 'depth': 'mean'}).reset_index()
level_family.columns=levels_columns
level_genus = res_split.groupby(['genus_tax_id','genus_lineage']).agg({'name_contigs' : [';'.join], 'nb_contigs' : 'sum', 'nb_reads' : 'sum', 'depth': 'mean'}).reset_index()
level_genus.columns=levels_columns
level_species = res_split.groupby(['species_tax_id','species_lineage']).agg({'name_contigs' : [';'.join], 'nb_contigs' : 'sum', 'nb_reads' : 'sum', 'depth': 'mean'}).reset_index()
level_species.columns=levels_columns
# Write merge data frame in output files.
res.to_csv(args.output_name + ".tsv", sep="\t", index=False)
......
......@@ -57,7 +57,7 @@ with open(args.list_of_kaiju_files) as fkaiju_list:
kaiju_files = fkaiju_list.read().split()
# Merge kaiju results for all samples.
for (kaiju_idx,kaiju_path) in enumerate(kaiju_files):
for (kaiju_idx,kaiju_path) in enumerate(sorted(kaiju_files)):
print(kaiju_idx)
if(kaiju_idx==0):
merge = pd.read_csv(kaiju_path, delimiter='\t', dtype=str)
......
......@@ -3,7 +3,7 @@
"""--------------------------------------------------------------------
Script Name: quantification_by_contig_lineage.py
Description: make table where each line is a lineage and for each
sample there are two columns: nb contigs and nb reads.
sample there are four columns: name of contigs, nb contigs, nb reads and depth.
Input files: List of merged files (idxstats+.percontig.csv).
Created By: Joanna Fourquet
Date: 2021-01-19
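# Illustrative output layout for two hypothetical samples S1 and S2 (one row per lineage):
#   lineage_by_level, tax_id_by_level,
#   name_contigs_S1, nb_contigs_S1, nb_reads_S1, depth_S1,
#   name_contigs_S2, nb_contigs_S2, nb_reads_S2, depth_S2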
......@@ -27,6 +27,7 @@ try:
import re
import sys
import pandas as pd
import os
from datetime import datetime
except ImportError as error:
print(error)
......@@ -57,27 +58,28 @@ with open(args.list_of_input_files) as finput_list:
sample_files = finput_list.read().split()
# Merge results for all samples by lineage.
for (sample_idx,sample_path) in enumerate(sample_files):
for (sample_idx,sample_path) in enumerate(sorted(sample_files)):
print(sample_idx)
if(sample_idx==0):
merge = pd.read_csv(sample_path, delimiter='\t', dtype=str)
sample_name = sample_path
if('consensus_tax_id' in merge.columns): merge.drop('consensus_tax_id', inplace=True, axis=1)
sample_name = os.path.splitext(sample_path)[0]
else:
sample_results = pd.read_csv(sample_path, delimiter='\t', dtype=str)
merge = pd.merge(merge,sample_results,left_on=["tax_id_by_level","lineage_by_level"],right_on=["tax_id_by_level","lineage_by_level"], how='outer', suffixes=('_' + sample_name,''))
sample_name = sample_path
if('consensus_tax_id' in merge.columns): merge.drop('consensus_tax_id', inplace=True, axis=1)
print (merge.head())
sample_name = os.path.splitext(sample_path)[0]
if('consensus_tax_id' in merge.columns): merge.drop('consensus_tax_id', inplace=True, axis=1)
# Rename columns corresponding to the last sample file.
sample_name = sample_path
sample_name = os.path.splitext(sample_path)[0]
merge.rename(columns = {'name_contigs': 'name_contigs_' + sample_name, \
'nb_contigs': 'nb_contigs_' + sample_name,\
'nb_reads': 'nb_reads_' + sample_name},inplace=True)
'nb_reads': 'nb_reads_' + sample_name,\
'depth': 'depth_' + sample_name},inplace=True)
# Fill the NaN by 0.
# Fill NaN values with 0.
merge.fillna(0, inplace=True)
print("Write " + args.output_file)
# Write merge data frame in output file.
merge.to_csv(args.output_file, sep="\t", index=False)
......@@ -12,8 +12,9 @@ process {
cpus = { 1 * task.attempt }
memory = { 2.GB * task.attempt }
errorStrategy = { task.exitStatus in [1,143,137,104,134,139] ? 'retry' : 'finish' }
maxRetries = 4
errorStrategy = 'finish'
//{ task.exitStatus in [1,143,137,104,134,139] ? 'retry' : 'finish' }
maxRetries = 1
maxErrors = '-1'
container = 'file://metagwgs/env/metagwgs.sif'
withName: cutadapt {
......@@ -43,14 +44,10 @@ process {
memory = { 50.GB * task.attempt }
cpus = 25
}
withName: metaspades {
withName: assembly {
memory = { 110.GB * task.attempt }
cpus = 20
}
withName: megahit {
cpus = 20
memory = { 100.GB * task.attempt }
}
withName: quast {
cpus = 4
memory = { 8.GB * task.attempt }
......@@ -112,4 +109,7 @@ process {
withLabel: eggnog {
container = 'file://metagwgs/env/eggnog_mapper.sif'
}
withLabel: mosdepth {
container = 'file://metagwgs/env/mosdepth.sif'
}
}
......@@ -45,14 +45,10 @@ process {
memory = { 2.GB * task.attempt }
cpus = 20
}
withName: metaspades {
withName: assembly {
memory = { 60.GB * task.attempt }
cpus = 14
}
withName: megahit {
cpus = 20
memory = { 60.GB * task.attempt }
}
withName: quast {
cpus = 3
memory = { 8.GB * task.attempt }
......
process.executor = 'slurm'
includeConfig 'singularity.config'
singularity.runOptions = "-B /work/bank/ -B /bank -B /work2 -B /work -B /save -B /home -B /work/project"
singularity.runOptions = "-B /work/bank/ -B /bank -B /work2 -B /work -B /save -B /home -B /work/project -B /usr/local/bioinfo"
process.queue = 'workq'
process.executor = 'slurm'
includeConfig 'singularity.config'
singularity.runOptions = "-B /work/bank/ -B /bank -B /work2 -B /work -B /save -B /home -B /work/project"
singularity.runOptions = "-B /work/bank/ -B /bank -B /work2 -B /work -B /save -B /home -B /work/project -B /usr/local/bioinfo"
process.queue = 'testq'
process {
......@@ -41,14 +41,10 @@ process {
memory = { 36.GB * task.attempt }
cpus = 4
}
withName: metaspades {
withName: assembly {
memory = { 10.GB * task.attempt }
cpus = 8
}
withName: megahit {
cpus = 8
memory = { 10.GB * task.attempt }
}
withName: quast {
cpus = 2
memory = { 2.GB * task.attempt }
......
process.executor = 'slurm'
includeConfig 'singularity.config'
singularity.runOptions = "-B /work/bank/ -B /bank -B /work2 -B /work -B /save -B /home -B /work/project"
singularity.runOptions = "-B /work/bank/ -B /bank -B /work -B /work2 -B /save -B /home -B /work/project -B /usr/local/bioinfo"
process.queue = 'workq'
process {
......@@ -15,18 +15,18 @@ process {
maxErrors = '-1'
withName: cutadapt {
cpus = 3
memory = { 1.GB * task.attempt }
cpus = 3
memory = { 1.GB * task.attempt }
}
withName: sickle {
memory = { 1.GB * task.attempt }
memory = { 1.GB * task.attempt }
}
withLabel: fastqc {
cpus = 6
memory = { 1.GB * task.attempt }
cpus = 6
memory = { 1.GB * task.attempt }
}
withName: multiqc {
memory = { 2.GB * task.attempt }
memory = { 2.GB * task.attempt }
}
withName: host_filter {
memory = { 20.GB * task.attempt }
......@@ -38,17 +38,13 @@ process {
cpus = 6
}
withName: kaiju {
memory = { 100.GB * task.attempt }
memory = { 50.GB * task.attempt }
cpus = 4
}
withName: metaspades {
withName: assembly {
memory = { 10.GB * task.attempt }
cpus = 8
}
withName: megahit {
cpus = 8
memory = { 10.GB * task.attempt }
}
withName: quast {
cpus = 2
memory = { 2.GB * task.attempt }
......@@ -75,8 +71,8 @@ process {
memory = { 1.GB * task.attempt }
}
withName: diamond {
cpus = 2
memory = { 8.GB * task.attempt }
cpus = 8
memory = { 10.GB * task.attempt }
}
withName: get_software_versions {
memory = { 1.GB * task.attempt }
......
......@@ -37,14 +37,10 @@ process {
memory = { 10.GB * task.attempt }
cpus = 2
}
withName: metaspades {
withName: assembly {
memory = { 2.GB * task.attempt }
cpus = 3
}
withName: megahit {
cpus = 3
memory = { 2.GB * task.attempt }
}
withName: quast {
cpus = 2
memory = { 2.GB * task.attempt }
......
......@@ -35,7 +35,7 @@ metagWGS is split into different steps that correspond to different parts of
* `07_taxo_affi`
* taxonomically affiliates the genes ([Samtools](http://www.htslib.org/) + [aln2taxaffi.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/aln2taxaffi.py))
* taxonomically affiliates the contigs ([Samtools](http://www.htslib.org/) + [aln2taxaffi.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/aln2taxaffi.py))
* counts the number of reads and contigs, for each taxonomic affiliation, per taxonomic level ([Samtools](http://www.htslib.org/) + [merge_idxstats_percontig_lineage.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/merge_idxstats_percontig_lineage.py) + [quantification_by_contig_lineage.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/quantification_by_contig_lineage.py))
* counts the number of reads and contigs, for each taxonomic affiliation, per taxonomic level ([Samtools](http://www.htslib.org/) + [merge_contig_quantif_perlineage.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/merge_contig_quantif_perlineage.py) + [quantification_by_contig_lineage.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/quantification_by_contig_lineage.py))
* `08_binning` from [nf-core/mag 1.0.0](https://github.com/nf-core/mag/releases/tag/1.0.0)
* makes binning of contigs ([MetaBAT2](https://bitbucket.org/berkeleylab/metabat/src/master/))
* assesses bins ([BUSCO](https://busco.ezlab.org/) + [metaQUAST](http://quast.sourceforge.net/metaquast) + [summary_busco.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/summary_busco.py) and [combine_tables.py](https://forgemia.inra.fr/genotoul-bioinfo/metagwgs/-/blob/dev/bin/combine_tables.py) from [nf-core/mag](https://github.com/nf-core/mag))
......@@ -45,7 +45,7 @@ A report html file is generated at the end of the workflow with [MultiQC](https:
The pipeline is built using [Nextflow](https://www.nextflow.io/docs/latest/index.html#), a bioinformatics workflow tool to run tasks across multiple compute infrastructures in a very portable manner.
Two [Singularity](https://sylabs.io/docs/) containers are available making installation trivial and results highly reproducible.
Three [Singularity](https://sylabs.io/docs/) containers are available, making installation trivial and results highly reproducible.
## Documentation
......
......@@ -32,7 +32,7 @@ A directory called `metagwgs` containing all source files of the pipeline have b
## III. Install Singularity