Idea: compute statistics to help define an abundance filtering threshold
Gene abundance tables are often filtered on abundance and/or prevalence, as are taxonomy tables (contigs or bins). We could compute a graphic or table that filters genes based on a minimum abundance (from 3 to 15 reads? or from 0.5 to 1 depth) in at least X samples (from 2 samples to 10% of all samples). This graphic/table needs to return:
- the number/percentage of features kept (obviously)
- the number/percentage of total abundance kept
Ideally, we could remove a lot of noise (rare features) for a minimal loss of reads/depth.
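
To make the idea concrete, here is a minimal sketch in plain Python of what such a summary could look like. Everything in it is an assumption for illustration: the function name `filtering_summary`, the input layout (a dict mapping each feature to its per-sample abundances), and the toy values are hypothetical and not part of any existing tool. A feature is kept when its abundance is at least `min_abundance` in at least `min_samples` samples.

```python
from itertools import product


def filtering_summary(table, abundance_thresholds, prevalence_thresholds):
    """Summarise what each (abundance, prevalence) threshold pair would keep.

    table: dict mapping feature name -> list of per-sample abundances
           (hypothetical layout, chosen only for this sketch).
    """
    n_features = len(table)
    total_abundance = sum(sum(values) for values in table.values())
    rows = []
    for min_abundance, min_samples in product(abundance_thresholds,
                                              prevalence_thresholds):
        # Keep a feature if enough samples reach the abundance threshold.
        kept = [
            values for values in table.values()
            if sum(1 for v in values if v >= min_abundance) >= min_samples
        ]
        kept_abundance = sum(sum(values) for values in kept)
        rows.append({
            "min_abundance": min_abundance,
            "min_samples": min_samples,
            "features_kept": len(kept),
            "features_kept_pct":
                100 * len(kept) / n_features if n_features else 0.0,
            "abundance_kept_pct":
                100 * kept_abundance / total_abundance if total_abundance else 0.0,
        })
    return rows


# Toy data (made up): 4 genes x 4 samples, read counts.
toy_table = {
    "gene_A": [120, 95, 130, 110],
    "gene_B": [10, 0, 12, 8],
    "gene_C": [1, 0, 0, 2],
    "gene_D": [0, 0, 3, 0],
}

summary = filtering_summary(toy_table,
                            abundance_thresholds=[3, 15],
                            prevalence_thresholds=[2])
for row in summary:
    print(row)
```

On this toy table, requiring 3 reads in at least 2 samples keeps 2 of the 4 genes (50%) while retaining roughly 98.8% of the total reads, which is the kind of trade-off the table or graphic should expose. A prevalence threshold expressed as a fraction (e.g. 10% of all samples) would need converting to a sample count before being passed in.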