Idea: add the species contribution for each gene family / KEGG / EC and compute a pathway abundance table
As in HUMAnN3 or PICRUSt2, functional abundance is expressed per community (sample) and/or per taxon, at the gene family (function) level or at the pathway level. Pathway abundance is more complicated than simply summing the abundances of the individual functions that make up the pathway. See the HUMAnN2 publication; PICRUSt2 does something similar.
We could do something equivalent with the following steps:
- normalise gene abundance by gene length
- sum gene depth at the seed_ortholog / KEGG / EC level, recording the contribution of each gene (normally/ideally associated with one species), and optionally detail the contribution of each species to that sum
- use PICRUSt2 pathway_pipeline.py to compute pathway abundance, which takes pathway graph reference files into account.
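
The first two steps could be sketched roughly as below. This is only an illustration under assumptions, not the HUMAnN or PICRUSt2 method: the input tuple layout, the function name `function_abundance`, and the choice of reads per kilobase as the length normalisation are all hypothetical, and the example genes are made-up values.

```python
from collections import defaultdict


def function_abundance(genes):
    """Sum length-normalised gene depth per function and per (function, species).

    `genes` is an iterable of tuples (hypothetical layout):
    (gene_id, length_bp, read_count, function_id, species)
    where function_id may be a seed_ortholog, a KEGG KO or an EC number.
    """
    total = defaultdict(float)       # function -> summed depth
    by_species = defaultdict(float)  # (function, species) -> summed depth
    by_gene = defaultdict(dict)      # function -> {gene_id: depth}

    for gene_id, length_bp, read_count, function_id, species in genes:
        if length_bp <= 0:
            raise ValueError(f"gene {gene_id} has a non-positive length")
        # Assumed normalisation: reads per kilobase of gene.
        depth = read_count / (length_bp / 1000)
        total[function_id] += depth
        by_species[(function_id, species)] += depth
        by_gene[function_id][gene_id] = depth

    return dict(total), dict(by_species), dict(by_gene)


# Made-up example: two genes share one KO, a third gene has another KO.
genes = [
    ("g1", 1000, 50, "K00001", "Escherichia coli"),
    ("g2", 2000, 50, "K00001", "Bacteroides fragilis"),
    ("g3", 500, 10, "K00002", "Escherichia coli"),
]
total, by_species, by_gene = function_abundance(genes)
```

The per-function totals would still have to be written in whatever input format the chosen pathway tool expects, and the per-species table is only meaningful once each gene has a reliable taxonomic assignment.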
This needs a bit more thought (how to associate genes with taxonomy; whether to use PICRUSt2, HUMAnN3, or develop our own script for the final pathway abundance computation ...)