% Sylvain Jasson committed Jun 02, 2017
\chapter{The main Spell-QTL pipeline}
\section{General view}

\begin{figure}[h]
  \centering
  \includesvg[width=\columnwidth]{images/Spell-pipeline2}
  \caption{The main Spell-QTL pipeline}\label{fig:pipeline}
\end{figure}

The overall software organization is shown in figure \vref{fig:pipeline}. This chapter first describes the main purpose of each program, then the required input files.

\section{Software suite details}
\subsection{\texttt{spell-pedigree}}
\begin{itemize}
  \item Computes the transition matrices for the Continuous Time Hidden Markov Models (CTHMM). They are the $T_d$ matrices in formula \vref{eq:pop}. The number of hidden states is the order of the matrix.
  \item These computations depend on one another, so the program can only run sequentially.
  \item Outputs a data file that can be fed to \texttt{spell-marker}.
\end{itemize}

\subsection{\texttt{spell-marker}}
\begin{itemize}
  \item Computes the 1-point Parental Origin Probabilities by Bayesian inference for all markers.
  \item Each marker is independent, so it can run in various ways:
  \begin{itemize}
    \item Sequentially,
    \item Multithreaded,
    \item Scheduling jobs on {\em Sun Grid Engine},
    \item Sending jobs to remote machines via \texttt{ssh}.
  \end{itemize}
  \item Outputs a data file that can be fed to \texttt{spell-qtl}.
  \item Can also output the raw Parental Origin Probabilities.
\end{itemize}

\subsection{\texttt{spell-qtl}}
\begin{itemize}
  \item Performs the QTL analysis {\em per se}.
  \item Can also output the n-point Parental Origin Probabilities along the linkage groups.
  \item Can run most computations concurrently on a multicore computer.
  \item Computation results are cached on disk (and/or in RAM).
\end{itemize}

\section{Input files}
\subsection{Pedigree}
\subsubsection{File format}
See the \texttt{spell-pedigree} man page (at \vref{spell-pedigree:description}).

\subsubsection{File sample}
\lstinputlisting[numbers=left,
  frame=single,
  breaklines=false,
  caption={[Pedigree (.ped input file)]Pedigree (selected lines from example1.ped from three\_parents\_F2 example)},
  %label=file:pedigree,
  linerange={1-12,107-112}
  ]
  {input_files/example1.ped}
Note that
\begin{itemize}
  \item the first line is expected to be a header only and will be ignored by \texttt{spell-pedigree};
  \item only four columns are used; any additional column will be silently ignored by \texttt{spell-pedigree}.
\end{itemize}

\subsection{Marker observations}
\subsubsection{File format}
\texttt{spell-marker} understands a few common formats, based on the MapMaker RAW format (without traits):
\begin{itemize}
  \item A line beginning with \texttt{data type} followed by ignored text (\textit{e.g.} line 1 in sample \vref{file:gen}).
  \item A line containing four integer values: number of individuals, number of markers, and two ignored values (\textit{e.g.} line 2 in sample \vref{file:gen}).
  \item A line per marker beginning with the starred (*) marker name, followed by a space and by the allele observed or inferred for each individual (one character per individual).
  (\textit{e.g.} lines 3--39 in sample \vref{file:gen})
\end{itemize}
The built-in allele codes are:
\begin{description}
  \item[02] SNP observations, where 0 and 2 are homozygous and 1 is heterozygous. This observation type is relevant for any individual in the pedigree, including parents. \texttt{spell-marker} will then infer the possible genotypes and the possible states in the CTHMM.
  \item[ABHCD] MapMaker-like inferred Parental Origin observations. These are relevant for the products of crosses between inbred lines. Consider the cross $A|A \times B|B$:
  \begin{itemize}
    \item The child is typed A and the allele A is not dominant. The only possible genotype is $A|A$. This is encoded by the character \texttt{A} in MapMaker.
    \item The child is typed A and the allele A is dominant. The possible genotypes are $A|A$, $A|B$ and $B|A$. This is encoded by the character \texttt{D} in MapMaker.
    \item The child is typed B and the allele B is not dominant. The only possible genotype is $B|B$. This is encoded by the character \texttt{B} in MapMaker.
    \item The child is typed B and the allele B is dominant. The possible genotypes are $A|B$, $B|A$ and $B|B$. This is encoded by the character \texttt{C} in MapMaker.
    \item The child is typed AB (the alleles A and B are codominant). The possible genotypes are $A|B$ and $B|A$. This is encoded by the character \texttt{H} in MapMaker.
    \item The child is not typed. The possible genotypes are $A|A$, $A|B$, $B|A$ and $B|B$. This is encoded by the character \texttt{-} in MapMaker.
  \end{itemize}
  The parental origin letters can be overridden on the command line.
  \item[CP] Outbred observations as defined in Carthagene.
  These observations are relevant for all known-phase situations, including cases where one parent is homozygous, when 3 or 4 different alleles are present. Consider the cross $A|B \times C|D$: the possible child genotypes are $A|C$, $A|D$, $B|C$ and $B|D$. The Carthagene format enables the user to express any subset of these 4 possibilities using a single hexadecimal digit (0-f).
  \begin{center}
    \begin{tabular}{cc}
      Code & Possible genotypes \\
      \hline
      1 & $A|C$ \\
      2 & $A|D$ \\
      3 & $A|C$, $A|D$ \\
      4 & $B|C$ \\
      5 & $A|C$, $B|C$ \\
      6 & $A|D$, $B|C$ \\
      7 & $A|C$, $A|D$, $B|C$ \\
      8 & $B|D$ \\
      9 & $A|C$, $B|D$ \\
      a & $A|D$, $B|D$ \\
      b & $A|C$, $A|D$, $B|D$ \\
      c & $B|C$, $B|D$ \\
      d & $A|C$, $B|C$, $B|D$ \\
      e & $A|D$, $B|C$, $B|D$ \\
      0 or f or - & $A|C$, $A|D$, $B|C$, $B|D$ \\
    \end{tabular}
  \end{center}
\end{description}
Note that the \textbf{CP} and \textbf{ABHCD} formats imply genotype inference made by the user. Depending on the generation, \texttt{spell-marker} will perform further genotype inference and HMM state inference using the pedigree.

Other allele codes can be defined via a JSON file (see appendix \vref{spell-marker:marker-observation-format-specification} for the format and \vref{spell-marker:example-the-02-abhcd-and-cp-formats} for sample files).

\subsubsection{File sample}
\lstinputlisting[numbers=left,
  frame=single,
  breaklines=false,
  caption={[Marker alleles (.gen input file)]Marker alleles (example1\_F2.gen from three\_parents\_F2 example)},
  label=file:gen
  %linerange={1-8,35-39}
  ]
  {input_files/example1_F2.gen}
Note that
\begin{itemize}
  \item in line 1, \texttt{F2} after \texttt{data type} is irrelevant for \texttt{spell-marker};
  \item in line 2, \texttt{0 0} after \texttt{100 37} is irrelevant for \texttt{spell-marker}.
\end{itemize}

\subsection{Genetic map}
\subsubsection{File format}
One line per linkage group (space separated):
\begin{itemize}
  \item Starred (*) name for this linkage group
  \item Number of markers in the linkage group
  \item Name of the first marker
  \item Series of distances in cM, each followed by the name of the next marker
\end{itemize}

\subsubsection{File sample}
\lstinputlisting[numbers=left,
  frame=single,
  breaklines=false,
  caption={[Genetic Map (.map input file)] Genetic map (example1.map from three\_parents\_F2 example)},
  label=file:map]
  {input_files/example1.map}

\subsection{Trait observations}
\subsubsection{File format}
As in the MapMaker RAW format, without header: one line per trait, beginning with the starred (*) trait name followed by space-separated observations (one numerical observation per individual; \texttt{-} means unobserved).

\subsubsection{File sample}
\lstinputlisting[numbers=left,frame=single,breaklines=false,caption={[Trait observations (.phen input file)]Trait observations (example1\_F2.phen from three\_parents\_F2 example)}]{input_files/example1_F2.phen}
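
As an illustration only, a trait observation line can be read with a few lines of Python. This is a minimal sketch and not part of the Spell-QTL suite: the function name \texttt{parse\_trait\_line} is hypothetical, and the sketch assumes that the star is directly attached to the trait name and that an unobserved value (\texttt{-}) is represented as NaN.

\begin{lstlisting}[language=Python, frame=single, breaklines=false, caption={[Reading a trait line (Python sketch)]Reading a trait line (illustrative Python sketch, not part of Spell-QTL)}]
import math


def parse_trait_line(line):
    """Split '*name v1 v2 ...' into (name, list of floats).

    Hypothetical helper: assumes the star is attached to the
    trait name and maps '-' (unobserved) to NaN.
    """
    fields = line.split()
    if not fields or not fields[0].startswith("*"):
        raise ValueError("a trait line must begin with a starred name")
    name = fields[0][1:]  # drop the leading star
    values = [math.nan if f == "-" else float(f) for f in fields[1:]]
    return name, values
\end{lstlisting}

A caller would typically also check that the number of values equals the number of individuals declared in the marker observation file, since there is one observation per individual.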