BOURBEILLON Julie
committed
# QuaDS
QuaDS (Qualitative/quantitative Descriptive Statistics) is a Python implementation of the [catdes()](http://factominer.free.fr/factomethods/description-des-modalites.html) function from the [FactoMineR R package](http://factominer.free.fr), with extras.
To use the pipeline, first `clone` the git repository or download it.
The pipeline has been written for Python 3.9.
Create and activate a conda environment to run the pipeline:

    conda create -n environment_quads
    conda activate environment_quads
    conda install python=3.9

The pipeline relies on several libraries, which are listed in the `requirement.txt` file.
Alternatively, the dependencies can be installed from that file using pip.
For the example test, the data file is in the repository's `/data` directory.
Set the following parameters in `config_file.yml`:
- directory of data and results
- names of datafile, and output files
- separator of your datafile
- presence of an index in your datafile
- list of qualitative variables names
- list of quantitative variables names
- different thresholds of the tests
- colors of the visual output
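
As a rough illustration of how the parameters listed above fit together, here is a minimal Python sketch that models them and runs a few sanity checks. Every field name in it (`data_dir`, `has_index`, `thresholds`, and so on) is hypothetical and chosen for this example only; the real keys are the ones defined in `config_file.yml`. It also assumes that the thresholds are p-value cut-offs between 0 and 1.

```python
from dataclasses import dataclass, field


@dataclass
class QuadsConfig:
    # Hypothetical field names; check config_file.yml for the real keys.
    data_dir: str
    results_dir: str
    data_file: str
    separator: str = ","
    has_index: bool = False
    qualitative_vars: list = field(default_factory=list)
    quantitative_vars: list = field(default_factory=list)
    thresholds: dict = field(default_factory=dict)  # assumed p-value cut-offs
    colors: dict = field(default_factory=dict)


def validate(cfg):
    """Return a list of human-readable problems (empty if none were found)."""
    problems = []
    if not cfg.qualitative_vars and not cfg.quantitative_vars:
        problems.append("no variables listed")
    overlap = set(cfg.qualitative_vars) & set(cfg.quantitative_vars)
    if overlap:
        problems.append(f"listed as both qualitative and quantitative: {sorted(overlap)}")
    for name, value in cfg.thresholds.items():
        if not 0 < value < 1:
            problems.append(f"threshold {name!r} should lie strictly between 0 and 1")
    return problems


good = QuadsConfig("data", "results", "example.csv",
                   qualitative_vars=["colour"], quantitative_vars=["height"],
                   thresholds={"chi2": 0.05})
bad = QuadsConfig("data", "results", "example.csv",
                  qualitative_vars=["height"], quantitative_vars=["height"],
                  thresholds={"chi2": 5})
```

Here `validate(good)` finds nothing to report, while `validate(bad)` flags the variable listed twice and the out-of-range threshold.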
For your qualitative analysis, you may obtain up to 4 output files:
- Chi2.csv: contains the results of the Chi-squared test of independence, which assesses whether a variable is dependent on the factor.
- fisher_exact.csv: an alternative to the Chi-squared when the expected frequencies in the contingency table between the variable and the factor are very low (<5). It also tests the dependency between the variable and the factor.
- qualitative_results.csv: for dependent variables, this file describes the dependency of the factor levels. For each factor level, it indicates whether the variable modality is:
  - over-represented
  - under-represented
  - not significant
  - not present
- weight.csv: indicates the contribution of the qualitative variables to the factor levels
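
The choice between the Chi-squared test and Fisher's exact test described above can be sketched for the simplest case, a 2x2 contingency table, using only the Python standard library. This is an illustration and not the QuaDS implementation: the function names and the `min_expected` parameter are made up for this example, no continuity correction is applied, and larger tables would normally be handled by a statistics library such as SciPy (`scipy.stats.chi2_contingency`, `scipy.stats.fisher_exact`).

```python
import math


def expected_counts(table):
    """Expected cell counts of a 2x2 table under independence."""
    (a, b), (c, d) = table
    n = a + b + c + d
    rows = (a + b, c + d)
    cols = (a + c, b + d)
    return [[r * col / n for col in cols] for r in rows]


def chi2_pvalue_2x2(table):
    """Pearson chi-squared p-value for a 2x2 table (1 degree of freedom)."""
    exp = expected_counts(table)
    stat = sum((table[i][j] - exp[i][j]) ** 2 / exp[i][j]
               for i in range(2) for j in range(2))
    # Survival function of the chi-squared distribution with 1 degree of freedom.
    return math.erfc(math.sqrt(stat / 2))


def fisher_pvalue_2x2(table):
    """Two-sided Fisher exact p-value for a 2x2 table."""
    (a, b), (c, d) = table
    r1, r2, c1 = a + b, c + d, a + c
    n = r1 + r2

    def prob(x):
        # Hypergeometric probability of seeing x in the top-left cell.
        return math.comb(r1, x) * math.comb(r2, c1 - x) / math.comb(n, c1)

    p_obs = prob(a)
    lo, hi = max(0, c1 - r2), min(r1, c1)
    # Sum over all tables that are at most as probable as the observed one.
    return sum(prob(x) for x in range(lo, hi + 1)
               if prob(x) <= p_obs * (1 + 1e-9))


def independence_test(table, min_expected=5):
    """Use Fisher's exact test when any expected count is below min_expected."""
    if min(min(row) for row in expected_counts(table)) < min_expected:
        return "fisher_exact", fisher_pvalue_2x2(table)
    return "chi2", chi2_pvalue_2x2(table)
```

For instance, `independence_test([[3, 1], [1, 3]])` falls back to Fisher's exact test because every expected count is 2, whereas `independence_test([[20, 10], [10, 20]])` uses the Chi-squared test.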
For your quantitative analysis, you may obtain up to 5 output files:
- normality.csv: contains the results of the Shapiro-Wilk test, which assesses whether each quantitative variable meets the normality assumption within the compared factor levels.
- homoscedasticity.csv: contains the results of Bartlett's test, which verifies the equality of variances between the factor levels.
- anova.csv: when the assumptions are met, this test determines whether at least one factor level differs significantly for each quantitative variable.
- kruskal_wallis.csv: the non-parametric alternative to ANOVA, used when at least one assumption is not met.
- quantitative_results.csv: describes the variables that show significant differences (according to the ANOVA or Kruskal-Wallis results), but only when the homoscedasticity assumption is verified. If homoscedasticity is not verified, Kruskal-Wallis is applied but no further description of the factor levels is performed. This file indicates whether the variable mean is:
  - above the overall average
  - below the overall average
  - not significantly different from the overall average
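
The decision flow described above can be summarised in a few lines of Python. This sketch works on p-values that are assumed to have been computed already (Shapiro-Wilk per factor level, Bartlett across levels); it does not run the tests itself. The function names, the single shared `alpha` threshold, and the convention that an assumption holds when `p >= alpha` are all assumptions made for this example; the actual thresholds are those set in `config_file.yml`.

```python
def choose_test(normality_pvalues, homoscedasticity_pvalue, alpha=0.05):
    """Pick the comparison test for one quantitative variable.

    normality_pvalues: Shapiro-Wilk p-values, one per factor level.
    homoscedasticity_pvalue: Bartlett p-value across the factor levels.
    alpha: hypothetical shared threshold for both assumption checks.
    """
    normal = all(p >= alpha for p in normality_pvalues)
    equal_variances = homoscedasticity_pvalue >= alpha
    # ANOVA only when both assumptions hold; otherwise Kruskal-Wallis.
    return "anova" if (normal and equal_variances) else "kruskal_wallis"


def levels_are_described(test_pvalue, homoscedasticity_pvalue, alpha=0.05):
    """Factor levels are described only for significant variables
    whose homoscedasticity assumption is verified."""
    return test_pvalue < alpha and homoscedasticity_pvalue >= alpha
```

With these conventions, a variable that is normal in every level but fails Bartlett's test goes to Kruskal-Wallis and gets no per-level description, even if the Kruskal-Wallis result is significant.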
## Visuals
Once the result tables have been generated, you can visualise them.
## Deactivation of conda
When you have finished using the pipeline, deactivate the conda environment:

    conda deactivate