Skip to content
Snippets Groups Projects
Commit 21fe7968 authored by mariabernard's avatar mariabernard
Browse files

1000RNASEQ chicken ASE: correct wildcard constraint, rmdup command line and...

1000RNASEQ chicken ASE: correct wildcard constraint, rmdup command line and add fasta index  as input in SplitCigar
parent c06b562c
No related branches found
No related tags found
No related merge requests found
...@@ -8,7 +8,7 @@ Align RNASeq reads on masked genome ...@@ -8,7 +8,7 @@ Align RNASeq reads on masked genome
''' '''
wildcard_constraints: wildcard_constraints:
properlyPaired_rmDup = '_pp_|_' properlyPaired = '_pp_|_'
STAR_INDEX_FILE=['genomeParameters.txt', 'chrName.txt', 'chrLength.txt', 'chrStart.txt', 'chrNameLength.txt', 'exonGeTrInfo.tab', 'geneInfo.tab', 'transcriptInfo.tab', 'exonInfo.tab', 'sjdbList.fromGTF.out.tab', 'sjdbInfo.txt', 'sjdbList.out.tab', 'Genome', 'SA', 'SAindex'] STAR_INDEX_FILE=['genomeParameters.txt', 'chrName.txt', 'chrLength.txt', 'chrStart.txt', 'chrNameLength.txt', 'exonGeTrInfo.tab', 'geneInfo.tab', 'transcriptInfo.tab', 'exonInfo.tab', 'sjdbList.fromGTF.out.tab', 'sjdbInfo.txt', 'sjdbList.out.tab', 'Genome', 'SA', 'SAindex']
...@@ -157,18 +157,20 @@ def get_properly_paired_bam(wildcards): ...@@ -157,18 +157,20 @@ def get_properly_paired_bam(wildcards):
return "Results/STAR_Aln_2/" + wildcards.sample + "_rg_genomic.bam" return "Results/STAR_Aln_2/" + wildcards.sample + "_rg_genomic.bam"
# note : This tool uses the READ_NAME_REGEX and the OPTICAL_DUPLICATE_PIXEL_DISTANCE options as the primary methods to identify and differentiate duplicate types. Set READ_NAME_REGEX to null to skip optical duplicate detection, e.g. for RNA-seq or other data where duplicate sets are extremely large and estimating library complexity is not an aim
rule RmDuplicates: rule RmDuplicates:
input: input:
bam = get_properly_paired_bam bam = get_properly_paired_bam
output: output:
bam = "Results/STAR_Aln_2/{sample}_rg_genomic{properlyPaired}.bam", bam = "Results/STAR_Aln_2/{sample}_rg_genomic{properlyPaired}rmdup.bam",
metrics = "Results/STAR_Aln_2/{sample}_rg_genomic{properlyPaired}.metrics.txt", metrics = "Results/STAR_Aln_2/{sample}_rg_genomic{properlyPaired}.metrics.txt",
params: params:
mem=str(int(config["rmdup"]["mem"].replace("G","")) -4 )+"G" if int(config["rmdup"]["mem"].replace("G","")) -4 > 1 else config["rmdup"]["mem"] , mem=str(int(config["rmdup"]["mem"].replace("G","")) -4 )+"G" if int(config["rmdup"]["mem"].replace("G","")) -4 > 1 else config["rmdup"]["mem"] ,
jar=find_jar("picard.jar") jar=find_jar("picard.jar")
shell: shell:
""" """
java -Xmx{params.mem} -jar {params.jar} MarkDuplicates READ_NAME_REGEX=null I={input.bam} O={output.bam} REMOVE_DUPLICATES=true CREATE_INDEX=true VALIDATION_STRINGENCY=SILENT M={output.metrics} java -Xmx{params.mem} -jar {params.jar} MarkDuplicates CREATE_INDEX=false READ_NAME_REGEX=null I={input.bam} O={output.bam} REMOVE_DUPLICATES=true VALIDATION_STRINGENCY=SILENT M={output.metrics}
""" """
rule remove_multimap: rule remove_multimap:
...@@ -184,9 +186,11 @@ rule remove_multimap: ...@@ -184,9 +186,11 @@ rule remove_multimap:
rule SplitNCigarReads: rule SplitNCigarReads:
input: input:
fasta = 'Results/genomeMasked/' + os.path.splitext(os.path.basename(config['fasta_ref']))[0] + '_masked.fa', fasta = 'Results/genomeMasked/' + os.path.splitext(os.path.basename(config['fasta_ref']))[0] + '_masked.fa',
idx = 'Results/genomeMasked/' + os.path.splitext(os.path.basename(config['fasta_ref']))[0] + '_masked.fa.fai',
dict = 'Results/genomeMasked/' + os.path.splitext(os.path.basename(config['fasta_ref']))[0] + '_masked.dict',
bam = "Results/STAR_Aln_2/{sample}_rg_genomic{properlyPaired}rmdup_uniq.bam" bam = "Results/STAR_Aln_2/{sample}_rg_genomic{properlyPaired}rmdup_uniq.bam"
output: output:
bam = "Results/STAR_Aln_2/{sample}_rg_genomic{properlyPairedrmdup_uniq_split.bam" bam = "Results/STAR_Aln_2/{sample}_rg_genomic{properlyPaired}rmdup_uniq_split.bam"
params: params:
mem=config["SplitNCigarReads"]["mem"], mem=config["SplitNCigarReads"]["mem"],
jar=find_jar("GenomeAnalysisTK.jar") jar=find_jar("GenomeAnalysisTK.jar")
......
0% Loading or .
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment