\chapter{The main Spell-QTL pipeline}

\section{General view}

\begin{figure}[h] 
  \centering
  \includesvg[width=\columnwidth]{images/Spell-pipeline2}
  \caption{The main Spell-QTL pipeline}\label{fig:pipeline}
\end{figure}

The overall organization of the software is shown in figure~\vref{fig:pipeline}.

\section{Minimal session}

A minimal Spell-QTL analysis session takes three commands. For instance, to run example 1 from the package:
\begin{itemize}
\item \texttt{spell-pedigree -wd my\_directory -n my\_name -p example1.ped}
\item \texttt{spell-marker -wd my\_directory -n my\_name -m F2:A/B example1\_F2.gen -m F2C:A/C example1\_F2C.gen -o F2,F2C}
\item \texttt{spell-qtl -wd my\_directory -n my\_name -P auto -p F2 example1\_F2.phen -p F2C example1\_F2C.phen -gm example1.map}
\end{itemize}

To reproduce these commands, make sure that the input files are available to the programs. You can either copy them into your test directory or use an absolute or relative path on your command line.


\section{Software suite details}
\subsection{\texttt{spell-pedigree}}
\begin{itemize}
	\item Computes the transition matrices for the Continuous Time Hidden Markov Models (CTHMM). They are the $T_d$ matrices in formula \vref{eq:pop}. The number of hidden states is of course the order of the matrix. 
        \item These computations depend on one another, so \texttt{spell-pedigree} can only run sequentially.
        \item Outputs a data file that can be fed to \texttt{spell-marker}.
\end{itemize}
\subsection{\texttt{spell-marker}}
\begin{itemize}
	\item Computes the 1-point Parental Origin Probabilities by Bayesian inference for all markers.
        \item Markers are processed independently, so the computation can run in several ways:
                \begin{itemize}
                  \item Sequentially,
                  \item Multithreaded,
                  \item Scheduling jobs on {\em Sun Grid Engine},
                  \item Sending jobs to remote machines via \texttt{ssh}.
                \end{itemize}
        \item Outputs a data file that can be fed to \texttt{spell-qtl}.
        \item Can also output the raw Parental Origin Probabilities.
\end{itemize}
\subsection{\texttt{spell-qtl}}
\begin{itemize}
	\item Performs the QTL analysis {\em per se}.
        \item Can also output the n-point Parental Origin Probabilities along the linkage groups.
        \item Can run most computations concurrently on a multicore computer.
        \item Computation results are cached on disk (and/or in RAM).
\end{itemize}

\section{Input files}
\subsection{Pedigree}
\subsubsection{File format}
See the \texttt{spell-pedigree} man page (\vref{spell-pedigree:description}).
\subsubsection{File sample}
\lstinputlisting[numbers=left,
		frame=single,
		breaklines=false,
		caption={[Pedigree (.ped input file)]Pedigree (selected lines from example1.ped from three\_parents\_F2 example)},
		%label=file:pedigree,
		linerange={1-12,107-112}
		]
		{input_files/example1.ped}

Note that: \begin{itemize}
\item The first line is expected to be a header and is ignored by \texttt{spell-pedigree}.
\item Only four columns are used; any additional column is silently ignored by \texttt{spell-pedigree}.
\end{itemize}


\subsection{Marker observations}

\subsubsection{File format}
\texttt{spell-marker} understands a few common formats, all based on the MapMaker RAW format (without traits):
\begin{itemize}
\item A line beginning with \texttt{data type}, followed by text that is ignored (\textit{e.g.} line 1 in sample \vref{file:gen}).
\item A line containing four integer values: the number of individuals, the number of markers, and two ignored values (\textit{e.g.} line 2 in sample \vref{file:gen}).
\item One line per marker, beginning with the starred (\texttt{*}) marker name, followed by a space and by the allele observed or inferred for each individual, one character per individual (\textit{e.g.} lines 3--39 in sample \vref{file:gen}).
\end{itemize}
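
As an illustration of this layout, a hypothetical file for 4 individuals and 2 markers could look like the sketch below. The marker names \texttt{M1} and \texttt{M2} and the observations are invented for this example and are not taken from the example data set; the observation characters assume the \textbf{ABHCD} code.

\begin{verbatim}
data type F2
4 2 0 0
*M1 ABHA
*M2 AH-B
\end{verbatim}

In such a file, each marker line should carry as many observation characters as the number of individuals announced on the second line.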

The built-in allele codes are:
\begin{description}
\item[02] SNP observations, where 0 and 2 are homozygous and 1 is heterozygous. This observation type is relevant for any individual in the pedigree, including parents. \texttt{spell-marker} then infers the possible genotypes and the possible states in the CTHMM.
\item[ABHCD] MapMaker-like observations with inferred parental origin. These are relevant for the progeny of crosses between inbred lines. Consider the cross $A|A \times B|B$:
\begin{itemize}
\item The child is typed A and the allele A is not dominant. The only possible genotype is $A|A$. This is encoded by the character \texttt{A} in MapMaker.
\item The child is typed A and the allele A is dominant. The possible genotypes are $A|A$, $A|B$ and $B|A$. This is encoded by the character \texttt{D} in MapMaker.
\item The child is typed B and the allele B is not dominant. The only possible genotype is $B|B$. This is encoded by the character \texttt{B} in MapMaker.
\item The child is typed B and the allele B is dominant. The possible genotypes are $A|B$, $B|A$ and $B|B$. This is encoded by the character \texttt{C} in MapMaker.
\item The child is typed AB (the alleles A and B are codominant). The possible genotypes are $A|B$ and $B|A$. This is encoded by the character \texttt{H} in MapMaker.
\item The child is not typed. The possible genotypes are $A|A$, $A|B$, $B|A$ and $B|B$. This is encoded by the character \texttt{-} in MapMaker.
\end{itemize}
The parental origin letters can be overridden on the command line.
\item[CP] Outbred observations as defined in Carthagene. These observations are relevant for all known-phase situations, including cases where one parent is homozygous or where 3 or 4 different alleles are present. Consider the cross $A|B \times C|D$: the possible child genotypes are $A|C$, $A|D$, $B|C$ and $B|D$. The Carthagene format lets the user express any subset of these 4 possibilities with a single hexadecimal digit (0-f).

\begin{center}
\begin{tabular}{cc}
Code & Possible genotypes \\
\hline
1    & $A|C$ \\
2    & $A|D$     \\
3    & $A|C$,$A|D$      \\
4    & $B|C$      \\
5    & $A|C$,$B|C$      \\
6    & $A|D$,$B|C$      \\
7    & $A|C$,$A|D$,$B|C$      \\
8    & $B|D$      \\
9    & $A|C$,$B|D$      \\
a    & $A|D$,$B|D$      \\
b    & $A|C$,$A|D$,$B|D$      \\
c    & $B|C$,$B|D$     \\
d    & $A|C$,$B|C$,$B|D$       \\
e    & $A|D$,$B|C$,$B|D$  \\
0 or f or -    & $A|C$,$A|D$,$B|C$,$B|D$      \\
\end{tabular}
\end{center}
\end{description}
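
The \textbf{CP} table above is consistent with reading each code as a 4-bit mask, one bit per possible genotype. The following worked equation is only a sketch of that reading; the indicator notation $\delta_{X|Y}$ is introduced here for illustration and is not used elsewhere in this document:
\[
\mathrm{code} = 1\,\delta_{A|C} + 2\,\delta_{A|D} + 4\,\delta_{B|C} + 8\,\delta_{B|D},
\]
where $\delta_{X|Y} = 1$ if the genotype $X|Y$ is possible and $0$ otherwise. For instance, the code \texttt{b} is $11 = 8 + 2 + 1$ and therefore stands for $A|C$, $A|D$ and $B|D$, while the code \texttt{d} is $13 = 8 + 4 + 1$ and stands for $A|C$, $B|C$ and $B|D$, in agreement with the table. The code \texttt{0} is the one exception to this reading: like \texttt{f} and \texttt{-}, the table lists it as leaving all four genotypes possible.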

Note that the \textbf{CP} and \textbf{ABHCD} formats imply that the user has already performed some genotype inference. Depending on the generation, \texttt{spell-marker} performs further genotype inference and HMM state inference using the pedigree.

Other allele codes can be defined via a JSON file (see appendix \vref{spell-marker:marker-observation-format-specification} for the format and \vref{spell-marker:example-the-02-abhcd-and-cp-formats} for sample files).

\subsubsection{File sample}
\lstinputlisting[numbers=left,
		frame=single,
		breaklines=false,
		caption={[Marker alleles (.gen input file)]Marker alleles  (example1\_F2.gen from three\_parents\_F2 example)},
		label=file:gen
		%linerange={1-8,35-39}
		]
		{input_files/example1_F2.gen}

Note that \begin{itemize}
\item In line 1, \texttt{F2} after \texttt{data type} is ignored by \texttt{spell-marker}.
\item In line 2, \texttt{0 0} after \texttt{100 37} is ignored by \texttt{spell-marker}.
\end{itemize}


\subsection{Genetic map}
\subsubsection{File format}
One line per linkage group, made of space-separated fields:
\begin{itemize}
\item The starred (\texttt{*}) name of the linkage group
\item The number of markers in the linkage group
\item The name of the first marker
\item A series of pairs, each giving a distance in cM followed by the name of the next marker
\end{itemize}
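
As a sketch of this layout, a hypothetical linkage group with three markers could be written as follows. The group name \texttt{G1}, the marker names and the distances are invented for illustration and do not come from the example data set:

\begin{verbatim}
*G1 3 M1 10.5 M2 7.2 M3
\end{verbatim}

Under this reading, a linkage group with $n$ markers is described by $2n + 1$ space-separated fields: the group name, the marker count, $n$ marker names and $n - 1$ distances. The line above has $n = 3$ and therefore $7$ fields.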
               
\subsubsection{File sample}
\lstinputlisting[numbers=left,
		frame=single,
		breaklines=false,
		caption={[Genetic Map (.map input file)] Genetic map (example1.map from three\_parents\_F2 example)},
		label=file:map]
		{input_files/example1.map}

\subsection{Trait observations}
\subsubsection{File format}
As in the MapMaker RAW format, but without a header: one line per trait, beginning with the starred (\texttt{*}) trait name, followed by space-separated observations (one numerical observation per individual; \texttt{-} means unobserved).
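
For illustration, a hypothetical trait file for four individuals could look like the sketch below. The trait names and values are invented for this example:

\begin{verbatim}
*height 12.3 11.8 - 13.1
*yield 4.2 - 3.9 4.0
\end{verbatim}

Here the third individual has no observation for \texttt{height} and the second has none for \texttt{yield}.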
 
\subsubsection{File sample}
\lstinputlisting[numbers=left,frame=single,breaklines=false,caption={[Trait observations (.phen input file)]Trait observations (example1\_F2.phen from three\_parents\_F2 example)}]{input_files/example1_F2.phen}