Johann Confais · e1b40e12
--- a/REPET-V3.0-tutorial.md
+++ b/REPET-V3.0-tutorial.md
+* [Practical course: Transposable Elements identification with The REPET package](#practical-course-transposable-elements-identification-with-the-repet-package)
+  * [Run the REPET pipelines](#run-the-repet-pipelines)
+    * [Setup The REPET package environment](#setup-the-repet-package-environment)
+    * [Start TEdenovo pipeline](#start-tedenovo-pipeline)
+      * [Alternatively, you can launch the TEdenovo pipeline step by step:](#alternatively-you-can-launch-the-tedenovo-pipeline-step-by-step)
+    * [Post TEdenovo pipeline](#post-tedenovo-pipeline)
+      * [Parse MCL clustering results (TEdenovo step 8): create a list (tabulated file) with 2 columns "Cluster_id TE_id"](#parse-mcl-clustering-results-tedenovo-step-8-create-a-list-tabulated-file-with-2-columns-cluster_id-te_id)
+      * [Get all the annotations done by PASTEC (TEdenovo, step 5) on the Consensus](#get-all-the-annotations-done-by-pastec-tedenovo-step-5-on-the-consensus)
+      * [Get the multiple-alignment used to build the consensus](#get-the-multiple-alignment-used-to-build-the-consensus)
+    * [Start TEannot pipeline](#start-teannot-pipeline)
+      * [Alternatively, you can launch the TEannot.py pipeline step by step:](#alternatively-you-can-launch-the-teannotpy-pipeline-step-by-step)
+    * [Post TEannot pipeline](#post-teannot-pipeline)
+      * [TEdenovo consensus library classification corresponding to Chig_refTEs.fa](#tedenovo-consensus-library-classification-corresponding-to-chig_reftesfa)
+      * [Concatenate all gff files of genome annotation in one](#concatenate-all-gff-files-of-genome-annotation-in-one)
+      * [Compute statistics of TE genome annotation](#compute-statistics-of-te-genome-annotation)
+      * [Compute and plot the consensuses coverage](#compute-and-plot-the-consensuses-coverage)
+      * [Select consensus for the second round of TEannot](#select-consensus-for-the-second-round-of-teannot)
+  * [Results analysis](#results-analysis)
+    * [TEdenovo (and post TEdenovo) most interesting output files](#tedenovo-and-post-tedenovo-most-interesting-output-files)
+      * [TEdenovo output directories](#tedenovo-output-directories)
+      * [TEdenovo consensus library](#tedenovo-consensus-library)
+      * [TEdenovo consensus library after filtering of “noCat” consensus built using less than 10 copies and consensus classified as SSR – This library is used as input of TEannot pipeline](#tedenovo-consensus-library-after-filtering-of-nocat-consensus-built-using-less-than-10-copies-and-consensus-classified-as-ssr-this-library-is-used-as-input-of-teannot-pipeline)
+      * [Classification of TEdenovo consensus library (All consensuses including SSR and noCat built with less than 10 HSPs) according to Wicker classification nomenclature](#classification-of-tedenovo-consensus-library-all-consensuses-including-ssr-and-nocat-built-with-less-than-10-hsps-according-to-wicker-classification-nomenclature)
+      * [Classification statistics (All consensuses including SSR and noCat built with less than 10 HSPs)](#classification-statistics-all-consensuses-including-ssr-and-nocat-built-with-less-than-10-hsps)
+      * [MCL clustering output files](#mcl-clustering-output-files)
+    * [TEannot (and post TEannot) most interesting output files](#teannot-and-post-teannot-most-interesting-output-files)
+      * [TEannot output directories](#teannot-output-directories)
+      * [Genome annotation file](#genome-annotation-file)
+      * [Classification of TEdenovo consensus library corresponding to Chig_refTEs.fa](#classification-of-tedenovo-consensus-library-corresponding-to-chig_reftesfa)
+      * [Genome annotation global statistics file](#genome-annotation-global-statistics-file)
+      * [TE annotation statistics per consensus](#te-annotation-statistics-per-consensus)
+  * [Annexes](#annexes)
+    * [Additional commands](#additional-commands)
+* [Practical course: Manual curation of the transposable elements library](#practical-course-manual-curation-of-the-transposable-elements-library)
+  * [Compilation of consensus information : classification, genome annotation statistics, MCL clustering](#compilation-of-consensus-information-classification-genome-annotation-statistics-mcl-clustering)
+  * [Consensus annotation (from PASTEC classifier) using IGV genome browser](#consensus-annotation-from-pastec-classifier-using-igv-genome-browser)
+  * [Display multiple alignment of HSP used to build the consensus using Jaview](#display-multiple-alignment-of-hsp-used-to-build-the-consensus-using-jaview)
+  * [Plot genome copies related to a consensus](#plot-genome-copies-related-to-a-consensus)
+
+# Practical course: Transposable Elements identification with [The REPET package](https://forgemia.inra.fr/urgi-anagen/wiki-repet/-/wikis/REPET-V2.5-tutorial)
+
+```plaintext
+This tutorial was written by Joelle Amselem and Nathalie Choisne as part of the Elixir and URGI
+TE annotation training sessions, using URGI cloud virtual machines.
+Note that REPET v2.5 was run here on a Colletotrichum higginsianum dataset.
+Adapt the command paths to your own environment.
+```
+
+## Run the REPET pipelines
+
+### Setup [The REPET package](https://forgemia.inra.fr/urgi-anagen/wiki-repet/-/wikis/REPET-V2.5-tutorial) environment
+
+* Connect to the virtual machine containing the REPET installation:
+
+`ssh -XY guestFormation@IP`
+
+* By default, your home directory is "/home/guestFormation".
+* To start a new project, create a folder named after the project, « Chig »:
+
+`mkdir Chig`
+
+* Move into the project directory, then copy and source the environment file used by the REPET software:
+
+`cd Chig `\
+`cp ~/data/setEnv.sh ./ `\
+`. setEnv.sh`
+
+\- Check the database parameters in the « setEnv.sh » configuration file:
+
+`more setEnv.sh`
+
+\- Test the connection to the MySQL database with these parameters:
+
+`mysql -h $REPET_HOST -u $REPET_USER -p$REPET_PW $REPET_DB`
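+
+Before connecting, you may want to check that the variables exported by setEnv.sh are all set. The helper below is a minimal sketch; its name, `check_repet_env`, is hypothetical and not part of REPET. It only tests the four variables used in the mysql command:
+
+```shell
+# Hypothetical helper, not part of REPET: reports which database variables
+# from setEnv.sh are empty or unset, and returns 1 if any is missing.
+check_repet_env() {
+  missing=0
+  for v in REPET_HOST REPET_USER REPET_PW REPET_DB; do
+    eval "val=\${$v:-}"          # portable indirect expansion of variable $v
+    if [ -z "$val" ]; then
+      echo "not set: $v" >&2
+      missing=1
+    fi
+  done
+  return "$missing"
+}
+```
+
+Usage: `. setEnv.sh && check_repet_env && mysql -h $REPET_HOST -u $REPET_USER -p$REPET_PW $REPET_DB`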
+
+### Start TEdenovo pipeline
+
+* Create a directory to launch TEdenovo
+
+`mkdir TEdenovo`\
+`cd TEdenovo`
+
+* Create a symbolic link (ln -s) to the input FASTA file of the genomic sequences. The genome FASTA file must be named “project_name.fa” (here, Chig.fa).
+
+`ln -s ~/data/Chig.fa Chig.fa`
+
+* Create symbolic links (ln -s) to the databanks used for the similarity-based classification:
+
+`ln -s ~/data/ProfilesBankForREPET_Pfam27.0_GypsyDB.hmm`\
+`ln -s ~/data/rRNA_Eukaryota.fsa`\
+`ln -s ~/data/repbase22.05_aaSeq_cleaned_TE.fa`\
+`ln -s ~/data/repbase22.05_ntSeq_cleaned_TE.fa`
+
+* Copy the configuration file « TEdenovo.cfg » into your TEdenovo working directory:
+
+(The original TEdenovo.cfg is available at “$REPET_PATH/config/TEdenovo.cfg”)
+
+`cp ~/data/TEdenovo.cfg ./`
+
+\- Check that the configuration file is properly filled in before launching TEdenovo. In particular, the bank file names must match the names of the links created above:
+
+`gedit TEdenovo.cfg >/dev/null 2>&1 &`
+
+```plaintext
+[repet_env]
+repet_version: 2.5
+...
+repet_job_manager: slurm
+
+[project]
+project_name: Chig
+project_dir: /home/guestFormation/Chig/TEdenovo
+…
+[detect_features]
+…
+TE_BLRn: yes
+TE_BLRtx: yes
+TE_nucl_bank: repbase22.05_ntSeq_cleaned_TE.fa
+TE_BLRx: yes
+TE_prot_bank: repbase22.05_aaSeq_cleaned_TE.fa
+TE_HMMER: yes
+TE_HMM_profiles:  ProfilesBankForREPET_Pfam27.0_GypsyDB.hmm
+…
+rDNA_BLRn: yes
+rDNA_bank: rRNA_Eukaryota.fsa
+```
+
+* **The TEdenovo pipeline consists of 8 steps that can be launched with a single command line**:
+
+`nohup launch_TEdenovo.py -P Chig -C TEdenovo.cfg -f MCL >& TEdenovo.log &`
+
+**P**: project name
+
+**C**: configuration file
+
+**f**: clustering program used to group the consensuses into families
+
+* Useful commands to monitor the progress of the steps:
+
+\- job status (under Slurm)
+
+`squeue`
+
+\- the log files, e.g.:
+
+`more TEdenovo.log`\
+`tail TEdenovo.log`
+
+#### Alternatively, you can launch the TEdenovo pipeline step by step:
+
+`nohup TEdenovo.py -P name -C config.cfg -S step -[specific-step-param]`
+
+![400px-TEdenovo_1-2](uploads/bbba43a19a359edc2fc81efe2c21056d/400px-TEdenovo_1-2.png)
+
+* Step 1: The genomic sequences are cut into chunks and grouped into batches
+
+`nohup TEdenovo.py -P Chig -C TEdenovo.cfg -S 1 >& runS1.log &`
+
+* Step 2: The genome is aligned to itself using BLAST
+
+`nohup TEdenovo.py -P Chig -C TEdenovo.cfg -S 2 -s Blaster >& runS2.log &`
+
+![400px-TEdenovo_3](uploads/5907451fc26f37fa8cdf4b4d8a08e7d8/400px-TEdenovo_3.png)
+
+* Step 3: The repeated HSPs found by BLAST are clustered by Grouper, Recon and/or Piler
+
+`nohup TEdenovo.py -P Chig -C TEdenovo.cfg -S 3 -s Blaster -c Grouper >& runS3G.log &`
+
+`nohup TEdenovo.py -P Chig -C TEdenovo.cfg -S 3 -s Blaster -c Recon >& runS3R.log &`
+
+`nohup TEdenovo.py -P Chig -C TEdenovo.cfg -S 3 -s Blaster -c Piler >& runS3P.log &`
+
+![400px-TEdenovo_4](uploads/b7eddeae30ffb0ce2e7d13992aff646f/400px-TEdenovo_4.png)
+
+* Step 4: A multiple alignment is computed for each cluster, and a consensus sequence is derived from each multiple alignment
+
+`nohup TEdenovo.py -P Chig -C TEdenovo.cfg -S 4 -s Blaster -c Grouper -m Map >& runS4G.log &`
+
+`nohup TEdenovo.py -P Chig -C TEdenovo.cfg -S 4 -s Blaster -c Recon -m Map >& runS4R.log &`
+
+`nohup TEdenovo.py -P Chig -C TEdenovo.cfg -S 4 -s Blaster -c Piler -m Map >& runS4P.log &`
+
+![400px-TEdenovo_5-6-7](uploads/d5a0b42979d45f73a6fb1f25ceb3b6fa/400px-TEdenovo_5-6-7.png)
+
+* Step 5: Particular features are detected on each consensus, such as structural features or homology with known TEs, HMM profiles or host genes
+
+`nohup TEdenovo.py -P Chig -C TEdenovo.cfg -S 5 -s Blaster -c GrpRecPil -m Map >& runS5.log &`
+
+MySQL tables are created; they contain the evidence used by the PASTEC classifier to annotate the consensuses.
+
+* Step 6: The consensuses are classified according to Wicker's TE classification
+
+`nohup TEdenovo.py -P Chig -C TEdenovo.cfg -S 6 -s Blaster -c GrpRecPil -m Map >& runS6.log &`
+
+* Step 7: SSR consensuses and under-represented unclassified ("noCat") consensuses are filtered out
+
+`nohup TEdenovo.py -P Chig -C TEdenovo.cfg -S 7 -s Blaster -c GrpRecPil -m Map >& runS7.log &`
+
+* Step 8: The consensuses are clustered into families to facilitate manual curation using Blastclust or MCL
+
+`nohup TEdenovo.py -P Chig -C TEdenovo.cfg -S 8 -s Blaster -c GrpRecPil -m Map -f MCL >& runS8.log &`
+
+### Post TEdenovo pipeline
+
+#### Parse MCL clustering results (TEdenovo step 8): create a list (tabulated file) with 2 columns "Cluster_id TE_id"
+
+`cd ~/Chig/TEdenovo/Chig_Blaster_GrpRecPil_Map_TEclassif_Filtered_MCL`\
+`gawk -F"_MCL|_Chig" '{if(/>/){gsub(">","",$0);print "MCL\t"$2"\t"$1"_Chig"$3}}' Chig_sim_denovoLibTEs_filtered_MCL.fa \`\
+`| sort -nk2,2 \`\
+`| gawk -F"\t" '{print $1$2"\t"$3}' > Chig_sim_denovoLibTEs_filtered_MCL.lst`
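+
+The same parsing can be wrapped in a reusable function. This is a sketch under the assumption that every header of the MCL library has the form `>classif_MCL<n>_Chig-...` (e.g. `>noCat_MCL2_Chig-B-G1-Map20`); the function name `mcl_fasta_to_list` is hypothetical. It uses POSIX awk, ignores sequence lines explicitly, and sorts numerically on the cluster number, then by consensus name, so that the output is reproducible:
+
+```shell
+# Hypothetical helper: mcl_fasta_to_list < Chig_sim_denovoLibTEs_filtered_MCL.fa
+# Prints "MCL<n><TAB><TE_id>", TE_id being the header without its "_MCL<n>" tag.
+mcl_fasta_to_list() {
+  awk -F'_MCL|_Chig' '
+    /^>/ {
+      sub(/^>/, "")                  # drop ">" (the fields are re-split)
+      print $2 "\t" $1 "_Chig" $3    # cluster number, then rebuilt TE name
+    }' |
+  LC_ALL=C sort -k1,1n -k2,2 |       # numeric cluster number, then TE name
+  awk -F'\t' -v OFS='\t' '{ print "MCL" $1, $2 }'
+}
+```
+
+Usage: `mcl_fasta_to_list < Chig_sim_denovoLibTEs_filtered_MCL.fa > Chig_sim_denovoLibTEs_filtered_MCL.lst`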
+
+#### Get all the annotations done by PASTEC (TEdenovo, step 5) on the Consensus
+
+A GFF3 file is created for each analysis output of step 5 (detect features). These GFF annotation files can be viewed in a genome browser such as IGV:
+
+* Copy the configuration file « CreateGFF3sForClassifFeatures.cfg » into your working directory:
+
+`cd ~/Chig/TEdenovo `\
+`cp ~/data/CreateGFF3sForClassifFeatures.cfg ./`
+
+* Check if the configuration file is properly filled before launching CreateGFF3sForClassifFeatures:
+
+`gedit CreateGFF3sForClassifFeatures.cfg >/dev/null 2>&1 &`
+
+```plaintext
+[repet_env]
+...
+repet_job_manager: slurm
+
+[project]
+project_name: Chig
+project_dir: /home/guestFormation/Chig/TEdenovo
+[gff3_TEdenovo_options]
+add_classif_infos: yes
+TR: yes
+polyA: yes
+ORF: yes
+TE_BLRn: yes
+TE_BLRtx: yes
+TE_BLRx: yes
+HG_BLRn: no
+rDNA_BLRn: yes
+tRNA: no
+Profiles: yes
+SSR: yes
+[gff3_TEannot_options]
+project_name_teannot: Chig
+annotated_copies: no
+[other]
+original_HSP: yes
+```
+
+* Launch the CreateGFF3sForClassifFeatures:
+
+`. ~/data/setEnv.sh `\
+`nohup CreateGFF3sForClassifFeatures.py -C CreateGFF3sForClassifFeatures.cfg -f Chig_Blaster_GrpRecPil_Map_TEclassif_Filtered/Chig_sim_denovoLibTEs_filtered.fa -v 3 >& CreateGFF3sForClassifFeatures.log &`
+
+**C**: configuration file
+
+**f**: consensus sequences (FASTA file) produced by TEdenovo
+
+A new directory "Visualization_Files" is created
+
+* Reverse-complement the coordinates of the "\*_reversed" consensuses
+
+Indeed, the consensus annotations used for the classification are computed before step 6, where consensuses on the negative strand are reverse-complemented (and renamed with the "_reversed" suffix). The coordinates of these annotations are not reversed in the database tables, so the GFF files produced by CreateGFF3sForClassifFeatures.py in release 2.5 need a patch (the fix is planned for the next release, REPET v3).
+
+\- Create a new directory for reverse-complemented GFF\
+`cd Visualization_Files/; mkdir gff_reversed`
+
+\- Create a file with 2 columns: consensus name and length\
+`cut -f1,2 ../Chig_Blaster_GrpRecPil_Map_TEclassif_Filtered/classifFileFromList.classif > Chig_sim_denovoLibTEs_filtered.len`
+
+\- Reverse-complement the coordinates: for a consensus of length L, a feature located at start..end is moved to (L-end+1)..(L-start+1) and its strand is swapped\
+`for file in *.gff3;`\
+`do`\
+`grep "^#" "$file" > "gff_reversed/$file";`\
+`while read TE len;`\
+`do gawk -F"\t" -v te="$TE" -v len="$len" 'index($1, te){if($1 ~ /_reversed/){rstart=len-$5+1; rend=len-$4+1; rstr=$7; if($7 == "+"){rstr="-"}; if($7 == "-"){rstr="+"}; OFS="\t"; print $1,$2,$3,rstart,rend,$6,rstr,$8,$9}else{print $0}}' "$file"; done < Chig_sim_denovoLibTEs_filtered.len >> "gff_reversed/$file";`\
+`done`
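+
+The loop above launches gawk once per consensus and per GFF file. A single-pass alternative is sketched below; `flip_reversed_gff` is a hypothetical helper (not part of REPET) that assumes column 1 of the GFF3 files is exactly the consensus name listed in the .len file. Unlike the loop, it also keeps features of consensuses absent from the .len file, unchanged:
+
+```shell
+# Hypothetical helper: flip_reversed_gff <len_file> <gff3_file>
+# len_file: tab-separated "consensus_name  length" (one line per consensus).
+# For features on a "*_reversed" consensus, coordinates are mirrored
+# (start' = len - end + 1, end' = len - start + 1) and the strand is swapped.
+flip_reversed_gff() {
+  awk -F'\t' -v OFS='\t' '
+    NR == FNR { len[$1] = $2; next }      # 1st file: store consensus lengths
+    /^#/ { print; next }                  # keep GFF3 header/comment lines
+    ($1 ~ /_reversed$/) && ($1 in len) {
+      L = len[$1]
+      s = L - $5 + 1; e = L - $4 + 1      # mirror coordinates on the consensus
+      $4 = s; $5 = e
+      if ($7 == "+") $7 = "-"; else if ($7 == "-") $7 = "+"
+    }
+    { print }' "$1" "$2"
+}
+```
+
+Usage (from Visualization_Files): `for f in *.gff3; do flip_reversed_gff Chig_sim_denovoLibTEs_filtered.len "$f" > "gff_reversed/$f"; done`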
+
+#### Get the multiple-alignment used to build the consensus
+
+The "original_HSP: yes" option in the CreateGFF3sForClassifFeatures.cfg configuration file creates a new directory, "Original_HSP_fastaAlignment", containing symbolic links to the multiple alignments used to build the consensuses.\
+These files can be loaded and browsed in Jalview. Note that they are not reverse-complemented, and that a base is kept in the consensus only if it is shared by at least 2 HSPs.
+
+`TEdenovo/Visualization_Files/Original_HSP_fastaAlignment/*.fa_aln`
+
+### Start TEannot pipeline
+
+* Copy the configuration file « TEannot.cfg » into your working directory:
+
+(The original TEannot.cfg file is available at $REPET_PATH/config/TEannot.cfg)
+
+`cd ; cd Chig`\
+`mkdir TEannot/; cd TEannot/`\
+`cp ~/data/TEannot.cfg ./`
+
+* Check if the configuration file is properly filled before launching TEannot:
+
+`gedit TEannot.cfg >/dev/null 2>&1 &`
+
+```plaintext
+[repet_env]
+...
+repet_job_manager: slurm
+
+[project]
+project_name: Chig
+project_dir: /home/guestFormation/Chig/TEannot
+…
+[export]
+…
+gff3_merge_redundant_features: yes
+gff3_compulsory_match_part: yes
+gff3_with_genomic_sequence: no
+gff3_with_TE_length: yes
+gff3_with_classif_info: yes
+classif_table_name: Chig_sim_consensus_classif
+```
+
+* Link to the TEdenovo consensus library
+
+This library contains the consensuses remaining after filtering out the “noCat” consensuses built from fewer than 10 copies and the consensuses classified as SSR.
+
+`ln -s ../TEdenovo/Chig_Blaster_GrpRecPil_Map_TEclassif_Filtered/Chig_sim_denovoLibTEs_filtered.fa Chig_refTEs.fa`
+
+* Link to the input fasta file of the genomic sequences
+
+`ln -s ~/data/Chig.fa`
+
+* Source the environment before launching the REPET pipeline (needed if you opened a new terminal window after running TEdenovo):
+
+`. ~/Chig/setEnv.sh`
+
+* **The TEannot pipeline consists of 8 steps that can be launched with a single command line:**
+
+`nohup launch_TEannot.py -P Chig -C TEannot.cfg -e >& TEannot.log &`
+
+**P**: project name
+
+**C**: configuration file
+
+#### Alternatively, you can launch the TEannot.py pipeline step by step:
+
+`nohup TEannot.py -P name -C config.cfg -S step -[specific-step-param]`
+
+![400px-TEannot_1-2-3](uploads/bd18a459024a8da599a30eabcea5bffd/400px-TEannot_1-2-3.png)
+
+* Step 1: The first step prepares all the data banks required in the next steps
+
+`nohup TEannot.py -P Chig -C TEannot.cfg -S 1 >& runS1.log &`
+
+* Step 2: aligns the reference TE sequences on each genomic chunk via BLASTER (high sensitivity, followed by MATCHER) AND/OR REPEATMASKER (cutoff at 200) AND/OR CENSOR (high sensitivity)
+
+`nohup TEannot.py -P Chig -C TEannot.cfg -S 2 -a BLR >& runS2BLR.log &`
+
+`nohup TEannot.py -P Chig -C TEannot.cfg -S 2 -a RM >& runS2RM.log &`
+
+`nohup TEannot.py -P Chig -C TEannot.cfg -S 2 -a CEN >& runS2CEN.log &`
+
+* Step 2 bis: same as step 2, but on randomized genomic sequences, to compute the filtering threshold
+
+`nohup TEannot.py -P Chig -C TEannot.cfg -S 2 -a BLR -r >& runS2BLRr.log &`
+
+`nohup TEannot.py -P Chig -C TEannot.cfg -S 2 -a RM -r >& runS2RMr.log &`
+
+`nohup TEannot.py -P Chig -C TEannot.cfg -S 2 -a CEN -r >& runS2CENr.log &`
+
+* Step 3: filters and combines the HSPs obtained at step 2, i.e. the TE annotations
+
+`nohup TEannot.py -P Chig -C TEannot.cfg -S 3 -c BLR+RM+CEN >& runS3.log &`
+
+![400px-TEannot_4-5](uploads/c1181974802711024711d716a4d0332d/400px-TEannot_4-5.png)
+
+* Step 4: search for satellites on the genomic sequences via TRF, Mreps and RepeatMasker
+
+`nohup TEannot.py -P Chig -C TEannot.cfg -S 4 -s TRF >& runS4TRF.log &`
+
+`nohup TEannot.py -P Chig -C TEannot.cfg -S 4 -s Mreps >& runS4Mreps.log &`
+
+`nohup TEannot.py -P Chig -C TEannot.cfg -S 4 -s RMSSR >& runS4RMSSR.log &`
+
+* Step 5: merges the SSR annotations from the 3 programs used at the previous step
+
+`nohup TEannot.py -P Chig -C TEannot.cfg -S 5 >& runS5.log &`
+
+* Step 6: compares the genomic sequences with a data bank (nucleotides or amino acids, in FASTA format, e.g. Repbase Update)
+
+(optional) Useful when the TEs are too degenerate to build "reliable" consensuses
+
+`nohup TEannot.py -P Chig -C TEannot.cfg -S 6 -b tblastx >& runS6btx.log &`
+
+`nohup TEannot.py -P Chig -C TEannot.cfg -S 6 -b blastx >& runS6bx.log &`
+
+![400px-TEannot_7](uploads/65ea06363d2e045d12f7c345aeb37380/400px-TEannot_7.png)
+
+* Step 7: performs successive procedures such as removal of redundant TE, removal of SSR annotations included into TE annotations and "long join procedure"
+
+`nohup TEannot.py -P Chig -C TEannot.cfg -S 7 >& runS7.log &`
+
+* Step 8: export annotations to GFF3 format
+
+`nohup TEannot.py -P Chig -C TEannot.cfg -S 8 -o GFF3 >& runS8.log &`
+
+### Post TEannot pipeline
+
+#### TEdenovo consensus library classification corresponding to Chig_refTEs.fa
+
+`cd ~/Chig/TEannot`\
+`gawk '{if(/>/){gsub(">","",$0);print}}' Chig_refTEs.fa >Chig_refTEs.lst`
+
+`egrep -f Chig_refTEs.lst ../TEdenovo/Chig_Blaster_GrpRecPil_Map_TEclassif/classifConsensus/Chig_sim_withoutRedundancy_negStrandReversed_WickerH.classif > Chig_refTEs.classif`
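+
+`egrep -f` interprets each consensus name as a regular expression (the "." in a name such as DTX-comp_Chig-B-P13.15-Map8 matches any character) and also matches names that only contain a listed name. A stricter alternative, sketched below with the hypothetical function name `select_classif`, keeps only the lines whose first column is exactly one of the listed names:
+
+```shell
+# Hypothetical helper: select_classif <names.lst> <file.classif>
+# Prints the .classif lines whose 1st column exactly matches a listed name.
+select_classif() {
+  awk -F'\t' '
+    NR == FNR { keep[$1]; next }   # 1st file: one consensus name per line
+    $1 in keep                     # 2nd file: exact match on the name column
+  ' "$1" "$2"
+}
+```
+
+Usage: `select_classif Chig_refTEs.lst ../TEdenovo/Chig_Blaster_GrpRecPil_Map_TEclassif/classifConsensus/Chig_sim_withoutRedundancy_negStrandReversed_WickerH.classif > Chig_refTEs.classif`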
+
+#### Concatenate all gff files of genome annotation in one
+
+The outputs of TEannot step 8 are genome annotations in GFF3 format (and/or gameXML):
+
+`cd ~/Chig/TEannot `\
+`cat Chig_GFF3chr/*.gff3 | grep -v "^##" > Chig_refTEs.gff`\
+`rm -r Chig_GFF3chr Chig_gameXMLchr`
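+
+The command above drops all the "##" header lines, including the `##gff-version 3` line that the GFF3 specification expects as the first line of a file; many tools accept the file anyway. If you need a strictly valid GFF3 file, a hypothetical helper such as `merge_gff3` below writes a single version line followed by the features of all input files (run it before removing Chig_GFF3chr):
+
+```shell
+# Hypothetical helper: merge_gff3 <out.gff> <in1.gff3> [in2.gff3 ...]
+# Writes one "##gff-version 3" line, then all non-comment lines of the inputs.
+merge_gff3() {
+  out=$1; shift
+  {
+    echo '##gff-version 3'
+    cat "$@" | grep -v '^#'      # drop per-file headers and comments
+  } > "$out"
+}
+```
+
+Usage: `merge_gff3 Chig_refTEs.gff Chig_GFF3chr/*.gff3`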
+
+#### Compute statistics of TE genome annotation
+
+* Launch the "PostAnalyzeTELib.py" script to compute statistics on the genome annotation obtained with the TEdenovo consensus library.
+
+`. ~/Chig/setEnv.sh `\
+`nohup PostAnalyzeTELib.py -a 3 -g 50819261 -p Chig_chr_allTEs_nr_noSSR_join_path -s Chig_refTEs_seq -v 2 >& runPostAnalyze.log &`
+
+**g**: genome length in bp (here, the Chig genome).
+
+**p**: project name + "_chr_allTEs_nr_noSSR_join_path"
+
+**s**: Project name + "_refTEs_seq"
+
+#### Compute and plot the consensuses coverage
+
+* Launch "plotCoverage.py". Each output image file (plotCoverage/\*.png) corresponds to a plot of the coordinates of the copies on their respective TE consensus sequence.
+
+`mkdir plotCoverage`
+
+`python $PYTHONPATH/SMART/Java/Python/plotCoverage.py -i Chig_refTEs.gff -f gff3 -q Chig_refTEs.fa --merge -l grey -o plotCoverage/Chig >& runPlotCoverage.log &`
+
+`rm *.Rout`
+
+**i**: Genome annotation file (gff).
+
+**f**: the file format
+
+**q**: the consensus sequences used in the TEannot
+
+**o**: output directory and file-name prefix (here, the project name)
+
+#### Select consensus for the second round of TEannot
+
+* Launch "GetSpecificTELibAccordingToAnnotation.py" to select 3 subsets of the consensus library used in the first TEannot run (files suffixed _FullLengthCopy, _FullLengthFrag and _OneCopyAndMore):
+
+`nohup GetSpecificTELibAccordingToAnnotation.py -i Chig_chr_allTEs_nr_noSSR_join_path.annotStatsPerTE.tab -t Chig_refTEs_seq -v1 >& GetSpecificTELibAccordingToAnnotation.log &`
+
+**i**: Output file of PostAnalyzeTELib.py (statistics per consensus).
+
+**t**: MySQL table containing the consensus sequences
+
+* Get the number of consensuses in each category:
+
+`egrep -c ">" Chig_chr_allTEs_nr_noSSR_join_path.annotStatsPerTE_*.fa`
+
+```plaintext
+Chig_chr_allTEs_nr_noSSR_join_path.annotStatsPerTE_FullLengthCopy.fa:55
+Chig_chr_allTEs_nr_noSSR_join_path.annotStatsPerTE_FullLengthFrag.fa:51
+Chig_chr_allTEs_nr_noSSR_join_path.annotStatsPerTE_OneCopyAndMore.fa:101
+```
+
+* Get the list of consensuses with at least one full-length fragment in the genome:
+
+`egrep ">" Chig_chr_allTEs_nr_noSSR_join_path.annotStatsPerTE_FullLengthFrag.fa |sed 's/>//' > Chig_chr_allTEs_nr_noSSR_join_path.annotStatsPerTE_FullLengthFrag.lst`
+
+```plaintext
+DHX-incomp-chim_Chig-B-G92-Map4_reversed
+DHX-incomp_Chig-B-G87-Map12
+DHX-incomp_Chig-B-R43-Map4
+DTX-comp_Chig-B-G32-Map20_reversed
+DTX-comp_Chig-B-G48-Map19_reversed
+DTX-comp_Chig-B-G49-Map20_reversed
+DTX-comp_Chig-B-G52-Map5_reversed
+DTX-comp_Chig-B-G53-Map20_reversed
+DTX-comp_Chig-B-P13.15-Map8
+...
+```
+
+* This list can be used to restrict any of the previous result files to these consensuses:
+
+`grep -F -f Chig_chr_allTEs_nr_noSSR_join_path.annotStatsPerTE_FullLengthFrag.lst A_result_file > A_result_file_FLF`
+
+## Results analysis
+
+### TEdenovo (and post TEdenovo) most interesting output files
+
+`cd /home/guestFormation/Chig/TEdenovo`
+
+#### TEdenovo output directories
+
+```plaintext
+Chig_db							step1: chunks and batches
+Chig_Blaster						step2: Blaster results
+Chig_Blaster_Grouper					step3: Grouper clustering
+Chig_Blaster_Recon					step3: Recon clustering
+Chig_Blaster_Piler					step3: Piler clustering
+Chig_Blaster_Grouper_Map				step4: Multiple alignment for each Grouper cluster
+Chig_Blaster_Recon_Map					step4: Multiple alignment for each Recon cluster
+Chig_Blaster_Piler_Map					step4: Multiple alignment for each Piler cluster
+Chig_Blaster_GrpRecPil_Map_TEclassif/detectFeatures/	step5: Output of all programs used to detect features
+Chig_Blaster_GrpRecPil_Map_TEclassif/classifConsensus	step6: consensus classification
+Chig_Blaster_GrpRecPil_Map_TEclassif_Filtered/		step7: consensus filtered for SSR and under-represented noCat
+Chig_Blaster_GrpRecPil_Map_TEclassif_Filtered_MCL	step8: MCL clustering of consensus
+```
+
+#### TEdenovo consensus library
+
+`Chig_Blaster_GrpRecPil_Map_TEclassif/classifConsensus/Chig_sim_withoutRedundancy_negStrandReversed_WickerH.fa`
+
+```plaintext
+>noCat_Chig-B-G1-Map20
+AGGTAGCAGGTAAATTGCCAGCCCTCATCTAGTATTTTGCTAGTCTCTAACCTATTTAGG
+…
+>SSR_Chig-B-G10-Map20
+TAATTTATATATATAGTAAGCTGTATATTATATTAATCTATATATAATTTAGTACCTTTC
+...
+>RLX-incomp_Chig-B-G102-Map10
+GAATTTCTTTCCAGAGTGCTTAGGAATTTCTAAGTAAGTTATTTTCCTTTATATAGGTTG
+…
+```
+
+#### TEdenovo consensus library after filtering of “noCat” consensus built using less than 10 copies and consensus classified as SSR – This library is used as input of TEannot pipeline
+
+`Chig_Blaster_GrpRecPil_Map_TEclassif_Filtered/Chig_sim_denovoLibTEs_filtered.fa`
+
+```plaintext
+>noCat_Chig-B-G1-Map20
+AGGTAGCAGGTAAATTGCCAGCCCTCATCTAGTATTTTGCTAGTCTCTAACCTATTTAGG
+…
+>RLX-incomp_Chig-B-G102-Map10
+GAATTTCTTTCCAGAGTGCTTAGGAATTTCTAAGTAAGTTATTTTCCTTTATATAGGTTG
+…
+```
+
+#### Classification of TEdenovo consensus library (All consensuses including SSR and noCat built with less than 10 HSPs) according to Wicker classification nomenclature
+
+`Chig_Blaster_GrpRecPil_Map_TEclassif/classifConsensus/Chig_sim_withoutRedundancy_negStrandReversed_WickerH.classif`
+
+\- Legend
+
+```plaintext
+Seq_name	length	strand	status	class_classif	order_classif	completeness	evidence
+```
+
+```plaintext
+...
+RXX-TRIM-chim_Chig-B-G163-Map3	892	.	PotentialChimeric	I	TRIM	NA	CI=40; struct=(TElength: <700bps; TermRepeats: termLTR: 442); other=(TermRepeats: termTIR: 441; SSRCoverage=0.15)
+noCat_Chig-B-G166-Map3	912	.	ok	noCat	noCat	NA	CI=NA; struct=(SSRCoverage=0.20)
+DTX-incomp_Chig-B-G16-Map9_reversed	1590	-	ok	II	TIR	incomplete	CI=37; coding=(TE_BLRtx: Mariner-13_SS:ClassII:TIR:Tc1-Mariner:?: 10.98%, Mariner-3_SS:ClassII:TIR:Tc1-Mariner:?: 7.13%, Mariner-4_SS:ClassII:TIR:Tc1-Mariner:?: 8.26%, Mariner-6_SS:ClassII:TIR:Tc1-Mariner:?: 7.13%, Mariner1_AO:ClassII:TIR:Tc1-Mariner:?: 16.66%; profiles: PF03184.14_DDE_1_NA_EN_20.1: 91.71%(91.71%)); struct=(TElength: >1000bps); other=(SSR: (TA)11_end; SSRCoverage=0.51)
+noCat_Chig-B-G177-Map3	1089	.	ok	noCat	noCat	NA	CI=NA; struct=(SSRCoverage=0.63)
+```
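+
+Columns 5 and 6 (class_classif and order_classif) can be tallied directly from the .classif file, for instance to cross-check the classification statistics. A minimal sketch with the hypothetical function name `count_by_class_order`, assuming a tab-separated file without a header line (a header line, if present, would be counted as one extra entry):
+
+```shell
+# Hypothetical helper: count_by_class_order <file.classif>
+# Prints "class<TAB>order<TAB>count", sorted, one line per class/order pair.
+count_by_class_order() {
+  awk -F'\t' -v OFS='\t' '
+    NF >= 6 { n[$5 OFS $6]++ }          # key = class_classif TAB order_classif
+    END { for (k in n) print k, n[k] }' "$1" | LC_ALL=C sort
+}
+```
+
+Usage: `count_by_class_order Chig_Blaster_GrpRecPil_Map_TEclassif/classifConsensus/Chig_sim_withoutRedundancy_negStrandReversed_WickerH.classif`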
+
+#### Classification statistics (All consensuses including SSR and noCat built with less than 10 HSPs)
+
+`Chig_Blaster_GrpRecPil_Map_TEclassif/classifConsensus/Chig_sim_withoutRedundancy_negStrandReversed_WickerH.classif_stats.txt`
+
+```plaintext
+DIRS incomp: 1 (0.62%)
+DIRS potential chimeric*: 1 (0.62%)
+DIRS total (RYX): 1 (0.62%)
+LARD potential chimeric*: 3 (1.85%)
+LARD total (RXX-LARD): 4 (2.47%)
+LINE comp: 4 (2.47%)
+LINE incomp: 4 (2.47%)
+LINE potential chimeric*: 2 (1.23%)
+LINE total (RIX): 8 (4.94%)
+LTR comp: 2 (1.23%)
+LTR incomp: 26 (16.05%)
+LTR potential chimeric*: 2 (1.23%)
+LTR total (RLX): 28 (17.28%)
+TRIM potential chimeric*: 1 (0.62%)
+TRIM total (RXX-TRIM): 3 (1.85%)
+
+ClassI + noCat order: 12 (7.41%)
+ClassI + one order: 44 (27.16%)
+ClassI potential chimeric*: 9 (5.56%)
+ClassI total (RXX): 56 (34.57%)
+
+Helitron incomp: 4 (2.47%)
+Helitron potential chimeric*: 1 (0.62%)
+Helitron total (DHX): 4 (2.47%)
+MITE total (DXX-MITE): 3 (1.85%)
+TIR comp: 11 (6.79%)
+TIR incomp: 20 (12.35%)
+TIR potential chimeric*: 1 (0.62%)
+TIR total (DTX): 31 (19.14%)
+
+ClassII + one order: 38 (23.46%)
+ClassII potential chimeric*: 2 (1.23%)
+ClassII total (DXX): 38 (23.46%)
+
+PotentialHostGene total: 3 (1.85%)
+SSR total: 20 (12.35%)
+
+Nb Potential chimeric*: 11 (6.79%)
+
+Nb noCat at class and order levels (noCat): 45 (27.78%)
+
+	-------------------------Summary--------------------------------
+
+RXX: 56 (34.57%)
+DXX: 38 (23.46%)
+PotentialHostGene: 3 (1.85%)
+SSR: 20 (12.35%)
+noCat: 45 (27.78%)
+TOTAL: 162 (100.00%)
+```
+
+#### MCL clustering output files
+
+\- Clustering statistics (the values 1, 2, ..., n in the first column correspond to the MCL clusters MCL1, MCL2, ..., MCLn):
+
+`Chig_Blaster_GrpRecPil_Map_TEclassif_Filtered_MCL/Chig_sim_denovoLibTEs_filtered_MCL_statsPerCluster.tab`
+
+```plaintext
+cluster	sequencesNb	sizeOfSmallestSeq	sizeOfLargestSeq	averageSize	medSize
+1	10	1828	18549	8169	6870
+2	5	444	3020	1484	892
+3	5	1489	7092	3138	2384
+4	4	1590	1879	1782	1831
+5	4	2969	7645	5530	5753
+```
+
+\- Global clustering statistics:
+
+`Chig_Blaster_GrpRecPil_Map_TEclassif_Filtered_MCL/Chig_sim_denovoLibTEs_filtered_MCL_globalStatsPerCluster.txt`
+
+```plaintext
+nb of clusters: 28
+nb of clusters with 1 sequence: 4
+nb of clusters with 2 sequences: 13
+nb of clusters with >2 sequences: 11 (48 sequences)
+nb of sequences: 78
+nb of sequences in the largest cluster: 10
+nb of sequences in the smallest cluster: 1
+size of the smallest sequence: 439
+size of the largest sequence: 33401
+average sequences size: 4365
+median sequences size: 2405
+```
+
+\- Consensus library with headers containing the cluster name (MCL1, MCL2, ..., MCLn):
+
+`Chig_Blaster_GrpRecPil_Map_TEclassif_Filtered_MCL/Chig_sim_denovoLibTEs_filtered_MCL.fa`
+
+```plaintext
+>noCat_MCL2_Chig-B-G1-Map20
+AGGTAGCAGGTAAATTGCCAGCCCTCATCTAGTATTTTGCTAGTCTCTAACCTATTTAGG
+…
+>RLX-incomp_MCL12_Chig-B-G102-Map10
+GAATTTCTTTCCAGAGTGCTTAGGAATTTCTAAGTAAGTTATTTTCCTTTATATAGGTTG
+...
+```
+
+\- List (tab-separated file) with 2 columns, "Cluster_id TE_id", created in the Post TEdenovo pipeline section:
+
+`Chig_Blaster_GrpRecPil_Map_TEclassif_Filtered_MCL/Chig_sim_denovoLibTEs_filtered_MCL.lst`
+
+```plaintext
+MCL1	DHX-incomp_Chig-B-G2-Map20
+MCL1	DHX-incomp_Chig-B-G87-Map12
+MCL1	DHX-incomp_Chig-B-R43-Map4
+MCL1	DHX-incomp-chim_Chig-B-G92-Map4_reversed
+MCL1	DTX-incomp_Chig-B-G3-Map8
+MCL1	DTX-incomp_Chig-B-G43-Map20_reversed
+MCL1	DTX-incomp_Chig-B-G59-Map3_reversed
+MCL1	DTX-incomp_Chig-B-R36-Map3
+MCL1	DTX-incomp-chim_Chig-B-G88-Map7
+MCL1	RLX-incomp_Chig-B-P28.6-Map5
+MCL2	DTX-incomp_Chig-B-R39-Map11
+MCL2	DXX-MITE_Chig-B-R23-Map20
+MCL2	noCat_Chig-B-G1-Map20
+MCL2	noCat_Chig-B-P1.5-Map20
+MCL2	RXX-TRIM-chim_Chig-B-G163-Map3
+... 
+```
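+
+Some of the global clustering statistics can be recomputed from this .lst file, e.g. to check a list you have edited during manual curation. A minimal sketch with the hypothetical function name `mcl_cluster_summary`, counting the clusters and their sizes:
+
+```shell
+# Hypothetical helper: mcl_cluster_summary <file.lst>
+# Input: tab-separated "Cluster_id<TAB>TE_id" lines.
+mcl_cluster_summary() {
+  awk -F'\t' '
+    { size[$1]++ }                      # number of consensuses per cluster
+    END {
+      for (c in size) {
+        nc++
+        if (size[c] == 1) s1++
+        else if (size[c] == 2) s2++
+        else { big++; bigseq += size[c] }
+      }
+      printf "clusters: %d\n1 seq: %d\n2 seqs: %d\n>2 seqs: %d (%d sequences)\n", nc, s1, s2, big, bigseq
+    }' "$1"
+}
+```
+
+Usage: `mcl_cluster_summary Chig_Blaster_GrpRecPil_Map_TEclassif_Filtered_MCL/Chig_sim_denovoLibTEs_filtered_MCL.lst`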
+
+### TEannot (and post TEannot) most interesting output files
+
+`cd /home/guestFormation/Chig/TEannot`
+
+#### TEannot output directories
+
+```plaintext
+Chig_db			step1: chunks and batches
+Chig_TEdetect		step2 to 7: Censor, RepeatMasker, Blaster on genome sequences and combined results
+Chig_TEdetect_rnd	step2 : Censor, RepeatMasker, Blaster on random genome sequences and threshold file
+Chig_SSRdetect		step4 & 5 : TRF, Mreps and RepeatMaskerSSR on genome sequences and combined SSR results
+Chig_GFF3chr		step8: A gff3 file for each genome sequence annotated
+Chig_gameXMLchr		step8: A gamexml file for each genome sequence annotated
+```
+
+#### Genome annotation file
+
+`Chig_refTEs.gff`
+
+```plaintext
+unitig_10	Chig_REPET_TEs	match	15811	15915	0.0	+	.	ID=ms2134_unitig_10_DTX-comp_Chig-B-G52-Map5_reversed;Target=DTX-comp_Chig-B-G52-Map5_reversed 2 107;TargetLength=1865;Identity=73.3
+unitig_10	Chig_REPET_TEs	match_part	15811	15915	0.0	+	.	ID=mp2134-1_unitig_10_DTX-comp_Chig-B-G52-Map5_reversed;Parent=ms2134_unitig_10_DTX-comp_Chig-B-G52-Map5_reversed;Target=DTX-comp_Chig-B-G52-Map5_reversed 2 107;Identity=73.3
+unitig_10	Chig_REPET_TEs	match	17936	18124	0.0	-	.	ID=ms2135_unitig_10_DTX-comp_Chig-B-G53-Map20_reversed;Target=DTX-comp_Chig-B-G53-Map20_reversed 1691 1880;OtherTargets=DTX-comp_Chig-B-G48-Map19_reversed 1894 1705
+unitig_10	Chig_REPET_TEs	match_part	17936	18124	0.0	-	.	ID=mp2135-1_unitig_10_DTX-comp_Chig-B-G53-Map20_reversed;Parent=ms2135_unitig_10_DTX-comp_Chig-B-G53-Map20_reversed;Target=DTX-comp_Chig-B-G53-Map20_reversed 1691 1880
+unitig_10	Chig_REPET_TEs	match	24809	26695	0.0	+	.	ID=ms2136_unitig_10_DTX-comp_Chig-B-P13.15-Map8;Target=DTX-comp_Chig-B-P13.15-Map8 5 1891;TargetLength=1892;Identity=100.0
+unitig_10	Chig_REPET_TEs	match_part	24809	26695	0.0	+	.	ID=mp2136-1_unitig_10_DTX-comp_Chig-B-P13.15-Map8;Parent=ms2136_unitig_10_DTX-comp_Chig-B-P13.15-Map8;Target=DTX-comp_Chig-B-P13.15-Map8 5 1891;Identity=100.0
+unitig_10	Chig_REPET_TEs	match	178240	178319	0.0	+	.	ID=ms2137_unitig_10_DHX-incomp_Chig-B-G2-Map20;Target=DHX-incomp_Chig-B-G2-Map20 5638 5718;TargetLength=12963;Identity=71.6
+unitig_10	Chig_REPET_TEs	match_part	178240	178319	0.0	+	.	ID=mp2137-1_unitig_10_DHX-incomp_Chig-B-G2-Map20;Parent=ms2137_unitig_10_DHX-incomp_Chig-B-G2-Map20;Target=DHX-incomp_Chig-B-G2-Map20 5638 5718;Identity=71.6
+...
+```
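+
+For a quick per-consensus overview of this file, you can count the `match` features and sum their genomic lengths for each Target consensus. This is a rough sketch with the hypothetical function name `gff_match_summary`: overlapping matches are not merged, so the summed length can differ from the coverage computed by PostAnalyzeTELib.py:
+
+```shell
+# Hypothetical helper: gff_match_summary <Chig_refTEs.gff>
+# Prints "TE<TAB>nb_of_match_features<TAB>summed_genomic_length".
+gff_match_summary() {
+  awk -F'\t' -v OFS='\t' '
+    $3 == "match" {
+      if (match($9, /Target=[^ ;]+/)) {
+        te = substr($9, RSTART + 7, RLENGTH - 7)   # name after "Target="
+        n[te]++
+        bp[te] += $5 - $4 + 1                       # genomic span of the match
+      }
+    }
+    END { for (te in n) print te, n[te], bp[te] }' "$1" | LC_ALL=C sort
+}
+```
+
+Usage: `gff_match_summary Chig_refTEs.gff | head`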
+
+#### Classification of TEdenovo consensus library corresponding to Chig_refTEs.fa
+
+`Chig_refTEs.classif`
+
+```plaintext
+RLX-incomp_Chig-B-G102-Map10	467	.	ok	I	LTR	incomplete	CI=7; coding=(TE_BLRx: Copia-1_DPer-I_1p:ClassI:LTR:Copia:?: 10.51%, Copia-5_DAn-I_1p:ClassI:LTR:Copia:?: 10.54%; profiles: _RNaseH_copia_NA_RH_NA: 63.50%(63.50%)); struct=(TElength: <700bps); other=(SSRCoverage=0.44)
+noCat_Chig-B-G129-Map17	489	.	ok	noCat	noCat	NA	CI=NA; struct=(SSRCoverage=0.20)
+RXX_Chig-B-G14-Map15	2202	.	ok	I	noCat	NA	CI=33; coding=(profiles: _RT_maggy_NA_RT_NA: 33.78%(33.78%)); other=(SSRCoverage=0.57)
+noCat_Chig-B-G15-Map20	2239	.	ok	noCat	noCat	NA	CI=NA; struct=(SSRCoverage=0.41)
+RXX-TRIM-chim_Chig-B-G163-Map3	892	.	PotentialChimeric	I	TRIM	NA	CI=40; struct=(TElength: <700bps; TermRepeats: termLTR: 442); other=(TermRepeats: termTIR: 441; SSRCoverage=0.15)
+...
+```
+
+#### Genome annotation global statistics file
+
+`Chig_chr_allTEs_nr_noSSR_join_path.globalAnnotStatsPerTE.txt`
+
+```plaintext
+nb of sequences: 104
+nb of matched sequences: 101
+cumulative coverage: 3528765 bp
+coverage percentage: 6.94%
+
+total nb of TE fragments: 3036
+total nb full-length fragments: 411 (13.54%)
+total nb of TE copies: 2785
+total nb full-length copies: 448 (16.09%)
+families with full-length fragments: 51 (49.04%)
+ with only one full-length fragment: 10
+ with only two full-length fragments: 7
+ with only three full-length fragments: 9
+ with more than three full-length fragments: 25
+families with full-length copies: 55 (52.88%)
+ with only one full-length copy: 12
+ with only two full-length copies: 7
+ with only three full-length copies: 7
+ with more than three full-length copies: 29
+mean of median identity of all families: 85.57 +- 9.93
+mean of median length percentage of all families: 25.02 +- 32.68
+```
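+
+The percentages in this file can be checked by hand. The worked check below is only an illustration. It suggests that the fragment and copy percentages are relative to the total numbers of fragments and copies. The family percentages are relative to `nb of sequences` (104), which appears to be the number of consensuses used for the annotation.
+
+```math
+\begin{aligned}
+\frac{411}{3036} &\approx 13.54\,\% &&\text{(full-length fragments)}\\
+\frac{448}{2785} &\approx 16.09\,\% &&\text{(full-length copies)}\\
+\frac{51}{104} &\approx 49.04\,\%, \quad 51 = 10 + 7 + 9 + 25 &&\text{(families with full-length fragments)}\\
+\frac{55}{104} &\approx 52.88\,\%, \quad 55 = 12 + 7 + 7 + 29 &&\text{(families with full-length copies)}
+\end{aligned}
+```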
+
+#### TE annotation statistics per consensus
+
+`Chig_chr_allTEs_nr_noSSR_join_path.annotStatsPerTE.tab`
+
+```plaintext
+TE	length	covg	frags	fullLgthFrags	copies	fullLgthCopies	meanId	sdId	minId	q25Id	medId	q75Id	maxId	meanLgth	sdLgth	minLgth	q25Lgth	medLgth	q75Lgth	maxLgth	meanLgthPerc	sdLgthPerc	minLgthPerc	q25LgthPerc	medLgthPerc	q75LgthPerc	maxLgthPerc
+DHX-incomp-chim_Chig-B-G92-Map4_reversed	18549	116688	64	4	60	4	85.21	9.39	66.90	78.70	84.75	93.50	100.00	1946.32	4893.32	28	47.00	59.50	157.00	18551	10.49	26.38	0.15	0.25	0.32	0.85	100.01
+DHX-incomp_Chig-B-G2-Map20	12963	39422	158	0	154	1	78.78	7.91	60.40	73.20	78.80	84.00	100.00	255.99	1270.14	29	48.00	64.00	97.00	12970	1.97	9.80	0.22	0.37	0.49	0.75	100.05
+DHX-incomp_Chig-B-G87-Map12	11268	166093	319	11	314	11	84.64	7.81	61.10	79.20	83.05	90.90	100.00	529.07	2168.88	26	39.00	53.00	77.00	11240	4.70	19.25	0.23	0.35	0.47	0.68	99.75
+DHX-incomp_Chig-B-R43-Map4	6103	21451	131	1	129	1	74.07	8.01	58.90	68.50	73.90	79.20	99.90	166.33	523.52	41	70.00	95.00	147.00	5963	2.73	8.58	0.67	1.15	1.56	2.41	97.71
+DTX-comp_Chig-B-G24-Map4_reversed	1866	459	2	0	2	0	87.50	4.95	84.00	84.00	87.50	91.00	91.00	229.50	167.58	111	111.00	229.50	348.00	348	12.30	8.98	5.95	5.95	12.30	18.65	18.65
+...
+```
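+
+To find the families that contribute most to the TE annotation, you can sort this table on its `covg` column (column 3, genome coverage in bp). The sketch below uses a hypothetical function name. It skips the header line and keeps the TE name, coverage, copies and full-length copies (columns 1, 3, 6 and 7).
+
+```bash
+# Hypothetical helper (not part of REPET): print the N consensuses with the
+# highest genome coverage from an annotStatsPerTE.tab file (default N=10).
+top_by_coverage() {
+  local file=$1 n=${2:-10}
+  tail -n +2 "$file" \
+    | sort -t $'\t' -k3,3nr \
+    | head -n "$n" \
+    | cut -f1,3,6,7
+}
+
+# Usage:
+# top_by_coverage Chig_chr_allTEs_nr_noSSR_join_path.annotStatsPerTE.tab 20
+```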
+
+## Annexes
+
+### Additional commands
+
+* If you need to restart the REPET pipeline, you must delete all the folders created by REPET and clear the `jobs` table from the database
+
+`cd Chig`\
+`# delete one or both pipeline directories, depending on which pipeline you relaunch`\
+`rm -r TEannot`\
+`rm -r TEdenovo`
+
+`. setEnv.sh`
+
+`mysql -h $REPET_HOST -u $REPET_USER -p$REPET_PW $REPET_DB`
+
+`mysql> show tables;`
+
+`mysql> select * from jobs;`
+
+`mysql> delete from jobs;`
+
+`mysql> exit`
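+
+If you prefer not to open an interactive session, you can clear the table with a single command, because `mysql -e` executes the given statement and exits. The sketch below assumes the `REPET_*` variables exported by `setEnv.sh`, as above. The function name is hypothetical.
+
+```bash
+# Hypothetical helper (not part of REPET): empty the "jobs" table in one shot.
+# Relies on REPET_HOST, REPET_USER, REPET_PW and REPET_DB exported by setEnv.sh.
+clear_repet_jobs() {
+  mysql -h "$REPET_HOST" -u "$REPET_USER" -p"$REPET_PW" "$REPET_DB" \
+        -e "DELETE FROM jobs;"
+}
+
+# Usage, from the project directory:
+# . setEnv.sh && clear_repet_jobs
+```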
+
+* To delete all the tables, if you are relaunching both pipelines:
+
+`ListAndDropTables.py -l "*" -C TEdenovo.cfg -d "*" -v 3`
+
+\->Deleting 30 tables corresponding to '\*'
+
+* To delete only the TEannot tables, if you are relaunching only TEannot:
+
+`ListAndDropTables.py -l "Chig_chk_" -d "Chig_chk_"`
+
+\->Deleting 9 tables corresponding to 'Chig_chk_'
+
+`ListAndDropTables.py -l "Chig_chr_" -d "Chig_chr_"`
+
+\->Deleting 4 tables corresponding to 'Chig_chr_'
+
+`ListAndDropTables.py -l "Chig_refTEs" -d "Chig_refTEs"`
+
+\->Deleting 2 tables corresponding to 'Chig_refTEs'
+
+# Practical course: Manual curation of the transposable elements library
+
+### Compilation of consensus information: classification, genome annotation statistics, MCL clustering
+
+* Create a tab-separated table containing all the useful information for each consensus
+
+\- Sort the MCL cluster list, the TEannot statistics file and the TEdenovo classification file on the consensus ID
+
+`cd ~/Chig/TEannot`
+
+`sort -k2,2 ../TEdenovo/Chig_Blaster_GrpRecPil_Map_TEclassif_Filtered_MCL/Chig_sim_denovoLibTEs_filtered_MCL.lst > tmp.mcl`
+
+`gawk -F"\t" '{OFS="\t"; if($6>0){print $1,$2,$3,$4,$5,$6,$7,$8,$22}}' Chig_chr_allTEs_nr_noSSR_join_path.annotStatsPerTE.tab | sort -k1,1 >tmp.stat`
+
+`sort -k1,1 Chig_refTEs.classif > tmp.classif`
+
+\- Join column 1 of tmp.stat with column 2 of tmp.mcl
+
+`join -t $'\t' -1 1 -2 2 tmp.stat tmp.mcl > tmp.stat.mcl`
+
+\- Join column 1 of tmp.stat.mcl with column 1 of tmp.classif
+
+`join -t $'\t' -1 1 -2 1 tmp.stat.mcl tmp.classif > tmp.stat.mcl.classif`
+
+\- Add a header to the final tab file
+
+`echo -e "ID\tLength\tGenome_coverage(bp)\tFragments\tFLFragments\tCopies\tFLCopies\tMeanId(%)\tMeanLengthPerc(%)\tMCLcluster\tLength\tstrand\tStatus\tClass\tOrder\tCompleteness\tEvidences" | cat - tmp.stat.mcl.classif > Chig_refTEs_stat_mcl_classif.tsv`
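+
+`join` silently drops every line whose key has no partner in the other file. A consensus missing from the MCL list or from the classification file will therefore not appear in the final table. The following check assumes the temporary files above are still in the working directory, and its function name is hypothetical. It uses `join -v 1`, which prints the unpairable lines of the first file instead of the joined ones. You can expect the first list to report `TE`. This entry is the header line of the statistics file: awk compares the text `copies` with 0 as a string, so the header passes the `$6>0` filter, but it has no partner in tmp.mcl.
+
+```bash
+# Hypothetical check (not part of REPET): list consensuses lost by the joins.
+# join -v 1 prints the lines of file 1 that have no match in file 2.
+report_unpaired() {
+  local dir=${1:-.}
+  echo "In tmp.stat but not in tmp.mcl:"
+  join -t $'\t' -v 1 -1 1 -2 2 "$dir/tmp.stat" "$dir/tmp.mcl" | cut -f1
+  echo "In tmp.stat.mcl but not in tmp.classif:"
+  join -t $'\t' -v 1 -1 1 -2 1 "$dir/tmp.stat.mcl" "$dir/tmp.classif" | cut -f1
+}
+
+# Usage, from ~/Chig/TEannot:
+# report_unpaired .
+```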
+
+### Consensus annotation (from the PASTEC classifier) using the IGV genome browser
+
+* Here we use the IGV genome browser ([download](https://software.broadinstitute.org/software/igv/download)) to display the consensus annotations
+
+All the annotations on each consensus produced by TEdenovo step 5 (structural features, homology with known TEs, HMM profiles, etc.) have been extracted into GFF files with `CreateGFF3sForClassifFeatures.py` (see the Post TEdenovo pipeline section). When the PASTEC classifier reverse-complemented a consensus on the basis of the evidence found, the coordinates (start, end) and strand of its features were reverse-complemented as well. The GFF files are in the `~/TEdenovo/Visualization_Files/gff_reversed/` directory.
+
+* Launch IGV
+
+\- There are 3 ways to launch IGV: \
+the application installed on your computer, Java Web Start, or IGV on your VM (through the x2go client: `igv.sh &`)
+
+* Load the consensus sequences:
+
+```plaintext
+Menu Genomes -> Load Genome from File ...
+TEdenovo/Visualization_Files/Chig_sim_denovoLibTEs_filtered.fa
+```
+
+* Load the GFF files, one per annotation track:
+
+```plaintext
+Menu File -> Load from File ... 
+TEdenovo/Visualization_Files/gff_reversed/
+					Chig_TE_BLRx.gff3
+					Chig_TE_BLRtx.gff3
+					Chig_TE_BLRn.gff3
+					Chig_Profiles.gff3
+					Chig_TR.gff3
+					Chig_SSR.gff3
+					Chig_ORF.gff3
+					Chig_polyA.gff3
+					Chig_rDNA_BLRn.gff3
+```
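+
+Instead of loading each track through the menus, you can have IGV run a batch script. IGV is started with `igv.sh -b <script>`, and `new`, `genome` and `load` are documented batch commands. The helper below is a sketch with a hypothetical function name. It writes such a script for every `.gff3` file in the directory. Check the command names against the documentation of your IGV version, and note that IGV may need to create a `.fai` index next to the FASTA file.
+
+```bash
+# Hypothetical helper (not part of REPET): write an IGV batch script that
+# loads the consensus FASTA as the genome, then every GFF3 track of a directory.
+write_igv_batch() {
+  local fasta=$1 gff_dir=$2 out=$3 gff
+  {
+    echo "new"
+    echo "genome $fasta"
+    for gff in "$gff_dir"/*.gff3; do
+      [ -e "$gff" ] || continue   # skip the literal pattern if no .gff3 exists
+      echo "load $gff"
+    done
+  } > "$out"
+}
+
+# Usage (paths as in this tutorial):
+# write_igv_batch TEdenovo/Visualization_Files/Chig_sim_denovoLibTEs_filtered.fa \
+#                 TEdenovo/Visualization_Files/gff_reversed igv_batch.txt
+# igv.sh -b igv_batch.txt
+```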
+
+* Save the IGV session
+
+```plaintext
+Menu File -> Save Session ... 
+TEdenovo/Visualization_Files/gff_reversed/igv_session.xml
+```
+
+* Reload a saved IGV session
+
+```plaintext
+Menu File -> Open Session ... 
+TEdenovo/Visualization_Files/gff_reversed/igv_session.xml
+```
+
+### Display the multiple alignment of HSPs used to build a consensus, using Jalview
+
+* Here we use Jalview ([Download](http://www.jalview.org/Download)) to display the multiple alignments used to build the consensuses
+
+If the key `original_HSP: yes` is set in the `[other]` section of `~/TEdenovo/CreateGFF3sForClassifFeatures.cfg`, the `~/TEdenovo/Visualization_Files/Original_HSP_fastaAlignment/` directory has been created. It contains symbolic links to the original consensus alignments built at TEdenovo step 4.
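+
+Before you open an alignment in Jalview, it can help to know how many HSPs each consensus was built from. The following sketch assumes the alignments are FASTA-formatted, with one `>` header per HSP, and uses a hypothetical function name.
+
+```bash
+# Hypothetical helper (not part of REPET): print each alignment file name with
+# its number of sequences (HSPs), largest first. [ -f ] and grep follow symlinks.
+count_hsps() {
+  local f
+  for f in "$1"/*; do
+    [ -f "$f" ] || continue
+    printf '%s\t%s\n' "$(basename "$f")" "$(grep -c '^>' "$f")"
+  done | sort -t $'\t' -k2,2nr
+}
+
+# Usage:
+# count_hsps ~/TEdenovo/Visualization_Files/Original_HSP_fastaAlignment
+```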
+
+* Launch Jalview
+
+\- There are 3 ways to launch Jalview: \
+the application installed on your computer, Java Web Start, or Jalview on your VM (through the x2go client: `jalview &`)
+
+* Close all the internal windows of the project opened by default, then load an alignment:
+
+```plaintext
+Menu File -> Input Alignment -> From File -
+```
+
+* Set up your display preferences
+
+```plaintext
+Menu Tools -> Preferences -> Visual
+```
+
+![400px-Jalview_preference_Visual](uploads/0cb91663076ce458b154c99692629a91/400px-Jalview_preference_Visual.png)
+
+```plaintext
+Menu Tools -> Preferences -> Colours
+```
+
+![400px-Jalview_preference_Colours](uploads/8ff7e2a2bd55c3416599c777c04f7209/400px-Jalview_preference_Colours.png)
+
+### Plot genome copies related to a consensus
+
+* The output images of plotCoverage.py are saved in:
+
+```plaintext
+~/TEannot/plotCoverage/*
+```
\ No newline at end of file