Commit 9671a832 authored by Sylvain Jasson

input file samples

parent afe0b42e
\documentclass[openright,oneside,10pt,DIV=11]{scrreprt}
\usepackage[toc,page]{appendix}
\usepackage{url}
\usepackage{listings}
\section{Software suite details}
\subsection{\texttt{spell-pedigree}}
\begin{itemize}
\item Computes the transition matrices of the continuous-time hidden Markov models (the $T_d$ matrices in formula \ref{eq:pop}).
\item These computations depend on one another, so \texttt{spell-pedigree} can only run sequentially.
\item Outputs a data file that can be fed to \texttt{spell-marker}.
\end{itemize}
\section{Input files}
\subsection{Pedigree}
\subsubsection{File format}
See the \texttt{spell-pedigree} man page (appendix \ref{ch:spell:predigree}).
\subsubsection{File sample}
\begin{lstlisting}[frame=single,caption={Pedigree (.ped)}]
Founders,1,0,0
Founders,2,0,0
F3,11,8,8
\end{lstlisting}
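The pedigree format itself is specified in the man page. The Python sketch below only shows how the sample above could be read. It assumes, from the sample alone, that each line holds four fields: a generation label, an individual number, and the numbers of the individual's two parents, with 0 meaning no parent (founders). The names \texttt{Individual}, \texttt{read\_pedigree} and \texttt{founders} are hypothetical and are not part of the suite.
\begin{lstlisting}[frame=single,language=Python,caption={Hypothetical pedigree reader (Python sketch)}]
import csv
import io
from typing import Dict, List, NamedTuple


class Individual(NamedTuple):
    generation: str  # column 1, e.g. "Founders" or "F3" (assumed meaning)
    ident: int       # column 2: individual number
    parent1: int     # column 3: first parent, 0 = none (assumed)
    parent2: int     # column 4: second parent, 0 = none (assumed)


def read_pedigree(text: str) -> Dict[int, Individual]:
    """Hypothetical reader for comma-separated .ped lines."""
    pedigree: Dict[int, Individual] = {}
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # csv yields [] for blank lines
        generation, ident, p1, p2 = (field.strip() for field in row)
        pedigree[int(ident)] = Individual(generation, int(ident), int(p1), int(p2))
    return pedigree


def founders(pedigree: Dict[int, Individual]) -> List[int]:
    """Numbers of the individuals whose two parent fields are 0."""
    return sorted(i for i, ind in pedigree.items()
                  if ind.parent1 == 0 and ind.parent2 == 0)


ped = read_pedigree("Founders,1,0,0\nFounders,2,0,0\nF3,11,8,8\n")
print(founders(ped))  # [1, 2]
\end{lstlisting}
The reader performs no consistency checks. For instance, it does not verify that parents such as individual 8 are declared.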
\subsection{Marker observations}
\subsubsection{File format}
\texttt{spell-marker} understands a few common formats based on the MapMaker RAW format (without traits):
\begin{itemize}
\item A line beginning with {\tt data type}, followed by ignored text.
\item A line containing four integers: the number of markers, the number of individuals, and two ignored values.
\item One line per marker: the marker name prefixed by a star (*), a space, then the observed or inferred allele of each individual (one character per individual).
\end{itemize}
Built-in allele codes are:
\begin{description}
\item[02] SNP observations, where 0 and 2 denote the two homozygous genotypes and 1 the heterozygous one. This observation type is relevant for any individual in the pedigree.
\item[ABHCD] MapMaker-like observations of inferred parental origin. These are relevant for products of crosses between inbred lines. Let us consider the cross $A|A \times B|B$:
\begin{itemize}
\item The child is typed A and the allele A is not dominant. The only possible genotype is $A|A$. This is encoded by the character {\tt A} in MapMaker.
\item The child is typed A and the allele A is dominant. The possible genotypes are $A|A$, $A|B$ and $B|A$. This is encoded by the character {\tt D} in MapMaker.
\item The child is typed B and the allele B is not dominant. The only possible genotype is $B|B$. This is encoded by the character {\tt B} in MapMaker.
\item The child is typed B and the allele B is dominant. The possible genotypes are $A|B$, $B|A$ and $B|B$. This is encoded by the character {\tt C} in MapMaker.
\item The child is typed AB (alleles A and B are codominant). The possible genotypes are $A|B$ and $B|A$. This is encoded by the character {\tt H} in MapMaker.
\item The child is not typed. The possible genotypes are $A|A$, $A|B$, $B|A$ and $B|B$. This is encoded by the character {\tt -} in MapMaker.
\end{itemize}
The parental-origin letters can be overridden on the command line.
\item[CP] Outbred observations as defined in CarthaGene. These observations are relevant for all known-phase situations. This includes cases where one parent is homozygous and cases where 3 or 4 different alleles are present. Let us consider the cross $A|B \times C|D$: the possible child genotypes are $A|C$, $A|D$, $B|C$ and $B|D$. The CarthaGene format lets the user express any subset of these 4 possibilities with a single hexadecimal digit (0 to f):
\begin{center}
\begin{tabular}{cc}
Code & Possible genotypes \\
\hline
1 & $A|C$ \\
2 & $A|D$ \\
3 & $A|C$,$A|D$ \\
4 & $B|C$ \\
5 & $A|C$,$B|C$ \\
6 & $A|D$,$B|C$ \\
7 & $A|C$,$A|D$,$B|C$ \\
8 & $B|D$ \\
9 & $A|C$,$B|D$ \\
a & $A|D$,$B|D$ \\
b & $A|C$,$A|D$,$B|D$ \\
c & $B|C$,$B|D$ \\
d & $A|C$,$B|C$,$B|D$ \\
e & $A|D$,$B|C$,$B|D$ \\
0 or f or - & $A|C$,$A|D$,$B|C$,$B|D$ \\
\end{tabular}
\end{center}
\end{description}
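The CP table can be read as a bit mask. Take $A|C$, $A|D$, $B|C$ and $B|D$ as the bits 1, 2, 4 and 8. Each code is then the sum of the bits of the genotypes it allows; for instance {\tt d} $= 13 = 1 + 4 + 8$ allows $A|C$, $B|C$ and $B|D$. The Python sketch below decodes the ABHCD letters and the CP digits into sets of genotypes under that reading. It only illustrates the tables of this section and is not the decoder used by \texttt{spell-marker}. The names \texttt{ABHCD}, \texttt{CP\_BITS} and \texttt{decode\_cp} are hypothetical, and command-line overrides of the letters are not modelled.
\begin{lstlisting}[frame=single,language=Python,caption={Hypothetical decoding of built-in allele codes (Python sketch)}]
from typing import Dict, FrozenSet, Tuple

# ABHCD letters for the cross A|A x B|B, as listed above.
ABHCD: Dict[str, FrozenSet[str]] = {
    "A": frozenset({"A|A"}),                # A typed, A not dominant
    "B": frozenset({"B|B"}),                # B typed, B not dominant
    "H": frozenset({"A|B", "B|A"}),         # A and B codominant
    "D": frozenset({"A|A", "A|B", "B|A"}),  # A typed, A dominant
    "C": frozenset({"A|B", "B|A", "B|B"}),  # B typed, B dominant
    "-": frozenset({"A|A", "A|B", "B|A", "B|B"}),  # not typed
}

# CP digit for the cross A|B x C|D: one bit per possible child genotype.
CP_BITS: Tuple[Tuple[int, str], ...] = (
    (1, "A|C"), (2, "A|D"), (4, "B|C"), (8, "B|D"),
)


def decode_cp(code: str) -> FrozenSet[str]:
    """Possible child genotypes for one CP character (0-9, a-f or -)."""
    if code == "-":
        mask = 0xF  # missing observation: all four genotypes
    elif len(code) == 1:
        mask = int(code, 16) or 0xF  # 0 is equivalent to f
    else:
        raise ValueError(f"expected a single character, got {code!r}")
    return frozenset(g for bit, g in CP_BITS if mask & bit)


print(sorted(decode_cp("d")))  # ['A|C', 'B|C', 'B|D']
\end{lstlisting}
The same bit-mask view shows why {\tt 0}, {\tt f} and {\tt -} coincide: all three leave every genotype possible.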
Other allele codes can be defined in a JSON file (see appendix \ref{ch:spell:marker} for the format and sample files).
\subsubsection{File sample}
\begin{lstlisting}[frame=single,caption={Marker alleles (.gen)}]
data type random example
42 42 0 0
*M1 21221-0212-122-20000-101022220100202102200
*M2 012120-0221010220101112222101211122120211-
*M3 0221022000211200012112-020000-101221222202
*M4 00112-021012200110101221221222112120100120
*M5 2211-00211121002221--2-20002102-1011220211
*M6 01211-201202221121002-12211200000001011001
*M7 202212202-00--10-101221200-112001-110-220-
*M8 222-22-02102002002220112-021-022--12012-11
*M9 0210-10-1122212-21000-2200-121200200222211
...
\end{lstlisting}
Note that the data type stated on the {\tt data type} line is ignored.
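For illustration, the three rules listed above translate into a short Python reader. It checks the declared numbers of markers and individuals. Beyond what is stated here, it assumes that the marker name and its observation string are separated by whitespace and that the observation string contains no spaces. The function \texttt{read\_gen} is hypothetical and is not part of the suite; it leaves the allele characters undecoded.
\begin{lstlisting}[frame=single,language=Python,caption={Hypothetical marker file reader (Python sketch)}]
from typing import Dict, Tuple


def read_gen(text: str) -> Tuple[int, Dict[str, str]]:
    """Hypothetical reader for the .gen layout described above.

    Returns the number of individuals and a dict mapping each marker name
    (without its star) to its string of one-character observations.
    Allele characters are returned as-is, not decoded.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if len(lines) < 2 or not lines[0].startswith("data type"):
        raise ValueError("expected a 'data type' line and a counts line")
    # Four integers: markers, individuals, two ignored values.
    n_markers, n_individuals, _, _ = (int(v) for v in lines[1].split())
    markers: Dict[str, str] = {}
    for line in lines[2:]:
        if not line.startswith("*"):
            raise ValueError(f"marker line must start with '*': {line!r}")
        name, observations = line[1:].split(None, 1)
        if len(observations) != n_individuals:
            raise ValueError(
                f"{name}: {len(observations)} observations, expected {n_individuals}")
        markers[name] = observations
    if len(markers) != n_markers:
        raise ValueError(f"{len(markers)} markers, expected {n_markers}")
    return n_individuals, markers


n, markers = read_gen("data type f2 intercross\n2 3 0 0\n*M1 02-\n*M2 1-2\n")
print(n, markers)  # 3 {'M1': '02-', 'M2': '1-2'}
\end{lstlisting}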
\subsection{Genetic map}
\subsubsection{File format}
One line per linkage group (space-separated):
\begin{itemize}
\item Name of the linkage group, prefixed by a star (*)
\item A series of pairs: the distance in cM to the next marker, followed by the name of that marker
\end{itemize}
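Below is a minimal Python sketch of reading such lines. Part of the item list is elided in this excerpt, so the layout used here is an assumption inferred from the sample below. After the starred name come the number of markers and the name of the first marker, then alternating distances (cM) and marker names. The function \texttt{read\_map} is hypothetical. It also accumulates the distances into positions relative to the first marker.
\begin{lstlisting}[frame=single,language=Python,caption={Hypothetical genetic map reader (Python sketch)}]
from typing import Dict, List, Tuple


def read_map(text: str) -> Dict[str, List[Tuple[str, float]]]:
    """Hypothetical .map reader: group name -> [(marker, position in cM)].

    Assumed token layout (inferred from the sample):
    *Name  n_markers  first_marker  d1 marker2  d2 marker3 ...
    Positions are cumulative distances from the first marker.
    """
    groups: Dict[str, List[Tuple[str, float]]] = {}
    for line in text.splitlines():
        tokens = line.split()
        if not tokens:
            continue  # skip blank lines
        if not tokens[0].startswith("*") or len(tokens) < 3:
            raise ValueError(f"malformed linkage group line: {line!r}")
        name, count = tokens[0][1:], int(tokens[1])
        rest = tokens[3:]
        if len(rest) % 2:
            raise ValueError(f"{name}: distance without a following marker")
        markers = [(tokens[2], 0.0)]
        # Remaining tokens alternate: distance (cM) to next marker, marker name.
        for dist, marker in zip(rest[0::2], rest[1::2]):
            markers.append((marker, markers[-1][1] + float(dist)))
        if len(markers) != count:
            raise ValueError(f"{name}: {len(markers)} markers, expected {count}")
        groups[name] = markers
    return groups


print(read_map("*Chrom1 3 M11 10.5 M12 30.3 M13")["Chrom1"])
\end{lstlisting}
On the first sample line this places M11, M12 and M13 at 0, 10.5 and about 40.8 cM.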
\subsubsection{File sample}
\begin{lstlisting}[frame=single,breaklines=false,caption={Genetic map (.map)}]
*Chrom1 3 M11 10.5 M12 30.3 M13
*Chrom2 17 M21 5.5 M22 0 M23 2 M24 5 M25 8 M26 11 M27 2.2 M28 2.5 M29 ...
\sloppy
\input{spell-pedigree.tex}
\fussy
\chapter{\texttt{spell-marker} man page} \label{ch:spell:marker}
\input{spell-marker.tex}
\chapter{\texttt{spell-qtl} man page}
\input{spell-qtl.tex}