Add a process merging quantification and annotation files
Inputs:
- output of quantification_table
- output of eggNOG-mapper process
Use the Python pandas library, with this code from Céline:
module load system/Python-3.7.4
On genotoul, to test this and get the pandas dependency, I went through a virtual environment:
$ module load system/Python-3.7.4
$ virtualenv -p python3 test
$ source test/bin/activate
$ pip install pandas
$ python
Test code in Python:
import pandas as pd
matrice = pd.read_csv("/work2/genphyse/NED_metaG/processing/ExpoMycoPig/analysis/Quantif_genes_sum.tsv.quantif.sorted", delimiter='\t', decimal='.', header=None)
# if you know what each column corresponds to, it may be worth setting the header (see doc https://pandas.pydata.org/docs/user_guide/io.html#io-read-csv-table)
matrice.head()
matrice.dtypes
print(matrice.shape)
(3157004, 34)
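The header suggestion in the comment above could look like the sketch below. It assumes the first column holds the gene identifier and the remaining columns hold one count per sample. The column names (`gene_id`, `sample_A`, `sample_B`) and the tiny in-memory table are hypothetical stand-ins, not the real layout of the quantification file.

```python
import io

import pandas as pd

# Tiny stand-in for the quantification table (assumed layout:
# gene identifier first, then one count column per sample).
tsv = "gene_1\t10\t0\ngene_2\t3\t7\n"

# Hypothetical column names; replace them with the real sample names.
column_names = ["gene_id", "sample_A", "sample_B"]

# Passing names= gives the columns explicit labels instead of 0, 1, 2, ...
matrice = pd.read_csv(io.StringIO(tsv), sep="\t", header=None, names=column_names)
print(matrice.dtypes)
```

With named columns, the later merge could use `left_on="gene_id"` instead of `left_on=0`.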
eggnode_map1 = pd.read_csv("/work2/genphyse/NED_metaG/processing/ExpoMycoPig/analysis/diamond_analysis/SC1802-114605_AATGCCTC-TCGATCCA-BHLGV2DSXX_L004.annotated.faa_maNOG_one2one.emapper.annotations", delimiter='\t', decimal='.',skiprows=4)
eggnode_map2 = pd.read_csv("/work2/genphyse/NED_metaG/processing/ExpoMycoPig/analysis/diamond_analysis/SC1802-114586_CCAAGTCT-AAGGATGA-BHLGV2DSXX_L004.annotated.faa_maNOG_one2one.emapper.annotations", delimiter='\t', decimal='.',skiprows=4)
merge_egg_node = pd.concat([eggnode_map1, eggnode_map2])
merge = pd.merge(matrice,merge_egg_node,left_on=0,right_on='#query_name', how='left')
merge.head()
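Since this is a left merge, genes without any eggNOG-mapper annotation stay in the table with empty annotation columns. One possible way to count them is the `indicator` option of `pd.merge`, sketched here on small made-up frames. The `description` column and the gene names are hypothetical; only `#query_name` and the integer column label `0` come from the code above.

```python
import pandas as pd

# Made-up quantification table with default integer column labels.
matrice = pd.DataFrame({0: ["gene_1", "gene_2", "gene_3"], 1: [10, 3, 5]})

# Made-up annotation table; "description" is a hypothetical column.
annotations = pd.DataFrame(
    {"#query_name": ["gene_1", "gene_3"], "description": ["kinase", "transporter"]}
)

# indicator=True adds a "_merge" column saying where each row came from.
merged = pd.merge(
    matrice,
    annotations,
    left_on=0,
    right_on="#query_name",
    how="left",
    indicator=True,
)

# Rows tagged "left_only" had no matching annotation.
n_unannotated = int((merged["_merge"] == "left_only").sum())
print(f"{n_unannotated} of {len(merged)} genes have no annotation")
```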
Writing the output (check whether the delimiters etc. need to be changed ...)
merge.to_csv("write.output")
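By default, `to_csv` writes comma-separated values and adds the row index as a first column. If a tab-separated file matching the inputs is wanted, something like the sketch below might do. The data, the column names and the `NA` placeholder are assumptions for illustration, and the output goes to an in-memory buffer rather than a real file.

```python
import io

import pandas as pd

# Made-up merged table; None stands for a gene without annotation.
merged = pd.DataFrame(
    {"gene_id": ["gene_1", "gene_2"], "count": [10, 3], "annotation": ["kinase", None]}
)

# Write tab-separated text, without the row index, marking missing values as "NA".
buffer = io.StringIO()
merged.to_csv(buffer, sep="\t", index=False, na_rep="NA")

output_text = buffer.getvalue()
print(output_text)
```

Replacing `buffer` with a file path would write the same content to disk.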
Edited by Celine Noirot