Commit 9671a832 authored by Sylvain Jasson

input file samples

parent afe0b42e
\documentclass[openright,twoside,10pt,DIV=11]{scrreprt}
\documentclass[openright,oneside,10pt,DIV=11]{scrreprt}
\usepackage[toc,page]{appendix}
\usepackage{url}
\usepackage{listings}
......@@ -286,7 +286,7 @@ Global software organization is displayed in figure \ref{fig:pipeline}. More det
\section{Software suite details}
\subsection{\texttt{spell-pedigree}}
\begin{itemize}
\item Computes the transition matrices for the Continuous Time Hidden Markov Models.
\item Computes the transition matrices for the Continuous Time Hidden Markov Models (the $T_d$ matrices in formula \ref{eq:pop}).
\item These computations depend on one another, so they can only run sequentially.
\item Outputs a data file that can be fed to \texttt{spell-marker}.
\end{itemize}
......@@ -313,9 +313,9 @@ Global software organization is displayed in figure \ref{fig:pipeline}. More det
\section{Input files}
\subsection{Pedigree}
\subsection{File format}
\subsubsection{File format}
See the \texttt{spell-pedigree} man page (appendix \ref{ch:spell:predigree}).
\subsection{File sample}
\subsubsection{File sample}
\begin{lstlisting}[frame=single,caption={Pedigree (.ped)}]
Founders,1,0,0
Founders,2,0,0
......@@ -330,8 +330,76 @@ F3,10,8,8
F3,11,8,8
\end{lstlisting}
\subsection{Marker observations}
\subsubsection{File format}
\texttt{spell-marker} understands a few common formats based on the MapMaker RAW format (without traits):
\begin{itemize}
\item A line beginning with {\tt data type}, followed by ignored text.
\item A line containing four integer values: the number of markers, the number of individuals, and two ignored values.
\item One line per marker, beginning with the starred (*) marker name, followed by a space and the allele observed or inferred for each individual (one character per individual).
\end{itemize}
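The three rules above can be sketched as a small reader. This is only an illustration in Python, not the parser used by \texttt{spell-marker}: the function name \texttt{parse\_marker\_file} and the sample data are hypothetical, and the sketch assumes that blank lines may be skipped and that every marker line carries exactly one character per individual.

```python
def parse_marker_file(text):
    """Read MapMaker-RAW-like marker observations (hypothetical helper)."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    # Rule 1: the first line begins with "data type"; the rest is ignored.
    if not lines or not lines[0].startswith("data type"):
        raise ValueError("first line must begin with 'data type'")
    # Rule 2: markers, individuals, and two ignored integer values.
    n_markers, n_individuals, _, _ = (int(v) for v in lines[1].split())
    # Rule 3: "*name" followed by one character per individual.
    observations = {}
    for ln in lines[2:]:
        if not ln.startswith("*"):
            continue  # assumption: silently skip non-marker lines
        name, alleles = ln[1:].split(None, 1)
        alleles = alleles.strip()
        if len(alleles) != n_individuals:
            raise ValueError(f"marker {name}: expected {n_individuals} characters")
        observations[name] = alleles
    return n_markers, n_individuals, observations


# Made-up sample, much smaller than a real file.
sample = """data type random example
2 5 0 0
*M1 0212-
*M2 21-00
"""
n_markers, n_individuals, observations = parse_marker_file(sample)
```

The sketch deliberately does not check the number of marker lines against the declared marker count.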
Built-in allele codes are:
\begin{description}
\item[02] SNP observations, where 0 and 2 are homozygous and 1 is heterozygous. This observation type is relevant for any individual in the pedigree.
\item[ABHCD] MapMaker-like inferred parental-origin observations. These are relevant for the products of crosses between inbred lines. Consider the cross $A|A \times B|B$:
\begin{itemize}
\item The child is typed A and the allele A is not dominant. The only possible genotype is $A|A$. This is encoded by the character {\tt A} in MapMaker.
\item The child is typed A and the allele A is dominant. The possible genotypes are $A|A$, $A|B$ and $B|A$. This is encoded by the character {\tt D} in MapMaker.
\item The child is typed B and the allele B is not dominant. The only possible genotype is $B|B$. This is encoded by the character {\tt B} in MapMaker.
\item The child is typed B and the allele B is dominant. The possible genotypes are $A|B$, $B|A$ and $B|B$. This is encoded by the character {\tt C} in MapMaker.
\item The child is typed AB (the alleles A and B are codominant). The possible genotypes are $A|B$ and $B|A$. This is encoded by the character {\tt H} in MapMaker.
\item The child is not typed. The possible genotypes are $A|A$, $A|B$, $B|A$ and $B|B$. This is encoded by the character {\tt -} in MapMaker.
\end{itemize}
The parental origin letters can be overridden on the command line.
\item[CP] Outbred observations as defined in Carthagene. These observations are relevant for all known-phase situations, including cases where one parent is homozygous or where 3 or 4 different alleles are present. Consider the cross $A|B \times C|D$: the possible child genotypes are $A|C$, $A|D$, $B|C$ and $B|D$. The Carthagene format lets the user express any subset of these 4 possibilities using a single hexadecimal digit (0-f).
\begin{center}
\begin{tabular}{cc}
Code & Possible genotypes \\
\hline
1 & $A|C$ \\
2 & $A|D$ \\
3 & $A|C$,$A|D$ \\
4 & $B|C$ \\
5 & $A|C$,$B|C$ \\
6 & $A|D$,$B|C$ \\
7 & $A|C$,$A|D$,$B|C$ \\
8 & $B|D$ \\
9 & $A|C$,$B|D$ \\
a & $A|D$,$B|D$ \\
b & $A|C$,$A|D$,$B|D$ \\
c & $B|C$,$B|D$ \\
d & $A|C$,$B|C$,$B|D$ \\
e & $A|D$,$B|C$,$B|D$ \\
0 or f or - & $A|C$,$A|D$,$B|C$,$B|D$ \\
\end{tabular}
\end{center}
\end{description}
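The ABHCD and CP codes above both map one character to a set of possible genotypes, which can be sketched as follows. This is an illustrative Python fragment, not code from \texttt{spell-marker}. The names \texttt{ABHCD}, \texttt{CP\_BITS} and \texttt{decode\_cp} are hypothetical, and the bit order (1 for $A|C$, 2 for $A|D$, 4 for $B|C$, 8 for $B|D$) is inferred from the table rather than taken from the Carthagene documentation.

```python
# Possible genotypes for the cross A|A x B|B, as listed for the ABHCD codes.
ABHCD = {
    "A": {"A|A"},
    "D": {"A|A", "A|B", "B|A"},
    "B": {"B|B"},
    "C": {"A|B", "B|A", "B|B"},
    "H": {"A|B", "B|A"},
    "-": {"A|A", "A|B", "B|A", "B|B"},
}

# Assumed bit order for the cross A|B x C|D: bit 0 = 1, bit 1 = 2, bit 2 = 4, bit 3 = 8.
CP_BITS = ("A|C", "A|D", "B|C", "B|D")


def decode_cp(code):
    """Return the set of genotypes allowed by one CP character (hypothetical helper)."""
    if len(code) != 1 or code not in "0123456789abcdef-":
        raise ValueError(f"not a CP code: {code!r}")
    # "-" and 0 both mean "no information", i.e. all four genotypes.
    mask = 0 if code == "-" else int(code, 16)
    if mask == 0:
        return set(CP_BITS)
    return {genotype for bit, genotype in enumerate(CP_BITS) if mask & (1 << bit)}
```

For instance, \texttt{decode\_cp("6")} combines the genotypes of codes 2 and 4, which matches the corresponding table row.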
Other allele codes can be defined via a JSON file (see appendix \ref{ch:spell:marker} for the format and sample files).
\subsubsection{File sample}
\begin{lstlisting}[frame=single,caption={Marker alleles (.gen)}]
data type random example
42 42 0 0
*M1 21221-0212-122-20000-101022220100202102200
*M2 012120-0221010220101112222101211122120211-
*M3 0221022000211200012112-020000-101221222202
*M4 00112-021012200110101221221222112120100120
*M5 2211-00211121002221--2-20002102-1011220211
*M6 01211-201202221121002-12211200000001011001
*M7 202212202-00--10-101221200-112001-110-220-
*M8 222-22-02102002002220112-021-022--12012-11
*M9 0210-10-1122212-21000-2200-121200200222211
...
\end{lstlisting}
Note that the text following {\tt data type} on the first line is ignored.
\subsection{Genetic map}
\subsection{File format}
\subsubsection{File format}
One line per linkage group (space separated) :
\begin{itemize}
\item Starred(*) name for this linkage group
......@@ -340,7 +408,7 @@ One line per linkage group (space separated) :
\item A series of distances in cM, each followed by the name of the next marker
\end{itemize}
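A linkage-group line could be read along the following lines. This is a hedged Python sketch, not the reader used by the software suite. The function name \texttt{parse\_map\_line} is hypothetical, and the sketch assumes, judging from the sample file, that the second field is the number of markers and the third field is the name of the first marker.

```python
def parse_map_line(line):
    """Split one linkage-group line into its parts (hypothetical helper)."""
    fields = line.split()
    if len(fields) < 3 or not fields[0].startswith("*"):
        raise ValueError("expected '*name count first_marker ...'")
    group = fields[0][1:]        # starred linkage-group name
    n_markers = int(fields[1])   # assumption: declared number of markers
    names = [fields[2]]          # assumption: name of the first marker
    distances = []               # distances in cM between consecutive markers
    rest = fields[3:]
    if len(rest) % 2 != 0:
        raise ValueError("distances and marker names must come in pairs")
    for dist, name in zip(rest[0::2], rest[1::2]):
        distances.append(float(dist))
        names.append(name)
    if len(names) != n_markers:
        raise ValueError(f"{group}: declared {n_markers} markers, found {len(names)}")
    return group, names, distances


group, names, distances = parse_map_line("*Chrom1 3 M11 10.5 M12 30.3 M13")
```

Keeping the raw inter-marker distances, instead of summing them into positions, avoids committing to a particular origin for each linkage group.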
\subsection{File sample}
\subsubsection{File sample}
\begin{lstlisting}[frame=single,breaklines=false,caption={Genetic map (.map)}]
*Chrom1 3 M11 10.5 M12 30.3 M13
*Chrom2 17 M21 5.5 M22 0 M23 2 M24 5 M25 8 M26 11 M27 2.2 M28 2.5 M29 ...
......@@ -355,7 +423,7 @@ One line per linkage group (space separated) :
\sloppy
\input{spell-pedigree.tex}
\fussy
\chapter{\texttt{spell-marker} man page}
\chapter{\texttt{spell-marker} man page} \label{ch:spell:marker}
\input{spell-marker.tex}
\chapter{\texttt{spell-qtl} man page}
\input{spell-qtl.tex}
......