Commit f584bb9b authored by Edlira Nano

corrected compar + manual + readme_file

git-svn-id: https://subversion.renater.fr/masschroq/trunk@2278 e4b6dbb4-9209-464b-83f7-6257456c460c
parent d7572f42
......@@ -11,7 +11,8 @@
\usepackage{amsmath}
\usepackage{amsfonts}
\usepackage{amssymb}
%\usepackage{amsthm}
\usepackage{amsthm}
\usepackage{varioref}
\usepackage{mathrsfs}
%\usepackage[all, 2cell, emtex]{xy}
......@@ -112,7 +113,6 @@
firstline={#1},
lastline={#2}]{peptide_example_tsv_file.txt}}
\usepackage{appendix}
\newcommand{\bi}{\begin{itemize}}
......@@ -126,10 +126,16 @@
\newcommand{\mexe}{\texttt{masschroq}}
\newcommand{\ttt}[1]{\texttt{#1}}
\theoremstyle{plain}
\newtheorem{note}{Note}
\labelformat{note}{ Note~#1}
\pagestyle{headings} %pour mettre des entetes avec les titres des
%sections en haut de page
\begin{document}
\begin{titlepage}
\begin{latexonly}
......@@ -311,12 +317,6 @@ peptide, and allows matching of previously missed peaks in some cases.
For more details on the post-matching mode and when to use it, see section
\ref{peak_matching}.
\textdbend The masschroqML schema has changed in {\Mv}: the current
schema version is $1.2$ (the same version number as
MassChroQ). Older masschroqML files stay functional with the new
schema, but new masschroqML files (containing the post matching
feature for example) will not work with older schema versions.
\subsection{The comparison result file}
The \emph{tsv} result files produced by {\M} are well suited for
automatic statistical analysis in external tools, like the \emph{R}
......@@ -332,6 +332,13 @@ from now on this has the effect of creating three \emph{tsv} files :
the \emph{\_pep}, the \emph{\_prot} and the \emph{\_compar} extension
files.
\subsection{New masschroqML schema}
The masschroqML schema has changed in {\Mv}: the current
schema version is $1.2$ (the same version number as
MassChroQ). Older masschroqML files stay functional with the new
schema, but new masschroqML files (containing the post matching
feature for example) will not work with older schema versions.
\section{{\M} features overview}\label{groups-sec}
{\M} has been designed to perform quantification on a wide range of
......@@ -485,14 +492,16 @@ on the command-line console.
Please refer to the \ttt{masschroq\_readme.pdf} file that comes with your
Windows installation for further details.
\textdbend The \emph{masschroqML} format is an XML format which in Windows are
by default opened with Internet Explorer, and are impossible to
\begin{note}
The \emph{masschroqML} format is an XML format. On Windows systems, XML files are
opened by default with Internet Explorer or Notepad, which makes them difficult or impossible to
edit. When you need to edit these files, we strongly recommend that you
open them with text editors (other than Notepad, try it, you
will see why), for example the free \emph{Notepad++} editor which
will see why), for example the free open source \emph{Notepad++} text editor which
offers syntax highlighting with a nice look. And if you cannot stand
Windows problems anymore, we recommend you the latest Ubuntu Desktop
edition (all the advantages of Linux but nice and easy graphical use).
\end{note}
\subsection{SVN repository}
The subversion repository located at
......@@ -596,7 +605,8 @@ Running \ttt{masschroq} on this input file will produce a new XML input file nam
\verb!parsed-peptides_input_file.xml! containing the original
\verb!input_file.xml! with all the identified peptides integrated and
organized in the masschroqML format.
\textdbend At this point {\M} automatically continues execution on the
\begin{note}
At this point {\M} automatically continues execution on the
newly generated \verb!parsed-peptides_input_file.xml! performing the
analysis instructions it contains. If you do not want {\M} to continue
the analysis, but only parse the peptide files, you should use the
......@@ -608,7 +618,7 @@ This will produce a new XML input file named
\verb!parsed-peptides_input_file.xml! containing the original
\verb!input_file.xml! with all the identified peptides integrated and
organized in the masschroqML format but will not continue analysis on it.
\end{note}
\subsection{Temporary working directory option}
While it parses the mzXML/mzML data files, {\M} writes the spectra
......@@ -631,10 +641,12 @@ masschroq --tmp-dir DIRECTORY input_file.xml
{\M} will then perform analysis on \ttt{input\_file.xml} and will put the
temporary files in \ttt{DIRECTORY} instead of its current working directory.
\textdbend The total size of temporary files produced during an
\begin{note}
The total size of temporary files produced during an
analysis is very close to the total size of the data
files (mzXML/mzML files) being analyzed. When specifying another
working directory, be careful not to choose one that has size limitations.
\end{note}
\chapter{How {\M} works}\label{works-sec}
In this section we give an in-depth explanation of {\M}'s
......@@ -832,7 +844,8 @@ The intensity of XICs in {\M} can be represented in two different ways:
The user can choose one of these representations in the XIC extraction
parameters.
\textdbend In {\M} we have purposefully chosen to
\begin{note}
In {\M} we have purposefully chosen to
perform quantification on the extracted XICs as explained above,
rather than on feature detection on the 2D virtual image which many
other software tools use. Indeed, the latter needs high resolution in MS
......@@ -840,6 +853,7 @@ mode in order to be able to identify isotopic profiles. By contrast,
quantification based on XICs can be used with low-resolution as well as with
high-resolution mass spectrometers by simply adapting the window size
of XIC extraction.
\end{note}
\section{XIC filtering}
The following XIC filters are implemented in {\M} :
......@@ -900,10 +914,12 @@ The Zivy peak detection method has widely replaced the Moulon one in
practice in our laboratory, giving much more accurate and precise
results.
\textdbend The Zivy peak detection method is a peak localization
\begin{note}
The Zivy peak detection method is a peak localization
method: its purpose is to determine the peak
positions and the peak boundaries on the signal. Peak intensities and
peak areas are then computed on the original unaltered signal.
\end{note}
This method uses morphological opening and closing signal transforms
with small flat linear structural elements (also known as respectively
......@@ -914,12 +930,14 @@ morphological transforms have been widely used in image processing to
remove noise and to detect peaks or edges showing their efficiency in
particular in noisy signals (see \cite{handbook} and \cite{edge}).
\textdbend On a one-dimensional signal, the open (resp. close) transform
\begin{note}
On a one-dimensional signal, the open (resp. close) transform
with a flat linear structural element (i.e. a segment) of size R is
equivalent to replacing the signal values at every point by the
maximum of the minimum (resp. minimum of the maximum) of all the
points in a neighborhood of radius R (see \cite{leymarie} and
\cite{kluwer}). This is what we do in {\M}.
\end{note}
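The equivalence stated in the note above can be sketched in a few lines of C++. This is only an illustrative sketch, not {\M}'s actual implementation: the function names \ttt{slide}, \ttt{open\_transform} and \ttt{close\_transform} are hypothetical, and clipping the neighborhood at the signal borders is an assumption, since the border handling is not specified here.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: replaces every point by the minimum (take_max == false,
// erosion) or the maximum (take_max == true, dilation) of its neighborhood of
// the given radius. Assumption: the neighborhood is clipped at the borders.
std::vector<double> slide(const std::vector<double>& signal,
                          std::size_t radius, bool take_max)
{
    std::vector<double> out(signal.size());
    for (std::size_t i = 0; i < signal.size(); ++i) {
        const std::size_t lo = (i >= radius) ? i - radius : 0;
        const std::size_t hi = std::min(signal.size() - 1, i + radius);
        assert(lo <= hi); // the neighborhood always contains the point itself
        const auto first = signal.begin() + lo;
        const auto last = signal.begin() + hi + 1;
        out[i] = take_max ? *std::max_element(first, last)
                          : *std::min_element(first, last);
    }
    return out;
}

// Opening: maximum of the minimum (an erosion followed by a dilation).
std::vector<double> open_transform(const std::vector<double>& signal,
                                   std::size_t radius)
{
    return slide(slide(signal, radius, false), radius, true);
}

// Closing: minimum of the maximum (a dilation followed by an erosion).
std::vector<double> close_transform(const std::vector<double>& signal,
                                    std::size_t radius)
{
    return slide(slide(signal, radius, true), radius, false);
}
```

With a radius of $1$, opening the signal $(0, 0, 5, 0, 0)$ returns an all-zero signal: the spike is narrower than the segment and is removed. A plateau three points wide, such as $(0, 3, 3, 3, 0)$, is left unchanged.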
Schematically, as illustrated in the figure below, opening
and closing transforms with flat linear structural elements both smooth and
......@@ -973,12 +991,14 @@ Here is how the Zivy peak detection algorithm works:
\ei
\textdbend In the figure above one can see different interesting
\begin{note}
In the figure above one can see different interesting
cases: for example the very intense peak on the left of the unique
detected peak
is eliminated because its intensity in the open signal does not exceed
the open threshold (too thin to be a relevant peak). This peak is
indeed a noisy pulsing spectrometer effect.
\end{note}
\newpage
\begin{lstlisting}[style=algo, mathescape, caption = The Zivy peak
......@@ -1214,7 +1234,8 @@ generated by other external alignment tools. The user just puts the
\ttt{.time} files in the same directory as the run files, {\M} loads
them and analyzes the runs with the aligned time values automatically.
\textdbend For this to work, the masschroqML input file should not
\begin{note}
For this to work, the masschroqML input file should not
contain an alignment instruction (the <align> tag in the masschroqML
file) that concerns the runs whose \ttt{.time} files have
been provided. Indeed, if an alignment instruction asks
......@@ -1223,6 +1244,7 @@ destroys these \ttt{.time} values and performs the
requested alignment instead. Thus, the alignment instructions in the masschroqML
input files take priority over the preloaded \ttt{.time}
alignment files.
\end{note}
\subsubsection*{Cascade alignments}
......@@ -1522,7 +1544,8 @@ each identified protein;
\ei
\ei
\textdbend The user does not have to fill the \ttt{<protein\_list>} and
\begin{note}
The user does not have to fill the \ttt{<protein\_list>} and
\ttt{<peptide\_list>} blocks:
\bi
\item The {\Xp}, when asked to export its results in masschroqML
......@@ -1534,6 +1557,7 @@ described in the previous section, it will create a new file named
\ttt{parsed-peptides\_input\_file.xml} already containing the above peptide
and protein blocks.
\ei
\end{note}
The peptide and protein list blocks are not mandatory in a masschroqML file.
\newpage
......@@ -1555,16 +1579,20 @@ consisting in two modifications at Nter and K both with mass modification
value of $28$; and label \ttt{iso2} containing Nter and K
modifications each of value $32$.
\textdbend Different \ttt{isotope\_label} elements can be defined, for
\begin{note}
Different \ttt{isotope\_label} elements can be defined, for
example to perform quantification on different groups of differently
labeled samples.
\end{note}
\textdbend The identified peptides in the \ttt{peptide\_list} block
\begin{note}
The identified peptides in the \ttt{peptide\_list} block
do not take into account the isotopic labelings (their MH
value is not the modified one). {\M} computes their isotopic masses
during quantification (if quantification on isotopes has been requested)
using the information contained in the \ttt{isotope\_label\_list}
block if any.
\end{note}
The \ttt{isotope\_label\_list} block is not mandatory in a masschroqML
file.
......@@ -1586,16 +1614,20 @@ moving window used to smooth the MS/MS retention times to $5$. For
details on every alignment parameter see the parameters cheat-sheet on
section \ref{cheat-sheet}.
\textdbend In the \ttt{align} element (lines $57$ and $58$), we give
\begin{note}
In the \ttt{align} element (lines $57$ and $58$), we instruct
{\M} to perform alignment on groups
$G1$ and $G2$ using respectively the \ttt{my\_ms2} method and the
\ttt{my\_obiwarp} method. All the attributes here are mandatory.
\end{note}
\textdbend The \ttt{reference\_data\_id} attribute (lines $57$ and $58$)
\begin{note}
The \ttt{reference\_data\_id} attribute (lines $57$ and $58$)
indicates the data file in the group that will be used as a reference
for the alignment : the other samples of the group will be aligned
towards this reference sample. Thus, the choice of the reference
sample in alignments is very important.
\end{note}
The \ttt{alignments} block is not mandatory in a masschroqML
file.
......@@ -1732,7 +1764,8 @@ we have done in the example above. We have chosen to export results
into a tsv file that will be called \ttt{result1.tsv}, a gnumeric file
that will be called \ttt{result2.gnumeric}, etc.
\textdbend The \ttt{tsv} and \ttt{xhtmltable} outputs
\begin{note}
The \ttt{tsv} and \ttt{xhtmltable} outputs
will create two files: one with extension \ttt{\_pep} containing the
quantification results, and a second one with extension
\ttt{\_prot} listing, for each peptide, the corresponding proteins and
......@@ -1742,6 +1775,7 @@ sheets for each of them. The \ttt{tsv} output will also create a third file with
the extension \ttt{\_compar}, containing the quantification results
sorted in a different way, well suited for a first direct visual
verification of the results.
\end{note}
In the example above three files named
\ttt{results\_pep.tsv}, \ttt{results\_prot.tsv} and
......@@ -1763,7 +1797,8 @@ times of this sample as they appear in the raw data file;
computed aligned retention times.
\ei
\textdbend The \ttt{.time} files are useful for analysis and alignment
\begin{note}
The \ttt{.time} files are useful for analysis and alignment
checking, but they can also be used to avoid realigning samples or to
inject external alignment values. Indeed, {\M} automatically loads
previously generated \ttt{.time} files right after it parses the run
......@@ -1771,6 +1806,7 @@ files. This way, one does not have to repeat alignment on previously
aligned files. One can also use an external alignment tool and inject
its results via \ttt{.time} files in masschroq's analysis. For details
on this see section \ref{repeat-align}.
\end{note}
\newpage
......@@ -1859,9 +1895,11 @@ format. It contains from left to right order:
these deviation values are already smoothed.
\ei
\textdbend Alignment traces are only available for the in-house
\begin{note}
Alignment traces are only available for the in-house
developed MS2 alignment method. The third-party OBi-Warp alignment method
that we have integrated in {\M} does not support trace files.
\end{note}
\newpage
......@@ -1910,20 +1948,26 @@ presented in appendix \ref{pep-app} :
In this example values are separated by tabulations. Other correct
separation characters are comma ``$,$'' and semi-colon ``$;$''.
\textdbend In a given peptide file, separation characters should not
\begin{note}
In a given peptide file, separation characters should not
be mixed: one has to exclusively use the tab, the comma or the
semi-colon separator everywhere in the file. Mixing them will cause masschroq to exit with
a parsing error.
\end{note}
\subsection*{Header specification}
As shown above, the header for the peptide files is:
\peplst{1}{1}
\textdbend The first five columns are mandatory. The sixth column (\ttt{mods}) is optional.
\begin{note}
The first five columns are mandatory. The sixth column (\ttt{mods}) is optional.
\end{note}
\textdbend The header (the first five columns exactly) is mandatory:
\begin{note}
The header (the first five columns exactly) is mandatory:
it must be present in all of your peptide files. If another header is
used, masschroq will exit with a parse error.
\end{note}
The other correct headers, depending on the chosen separation
character are :
......@@ -1956,7 +2000,8 @@ The accepted values for a peptide text file are :
\end{itemize}
\textdbend One line represents a given peptide sequence
\begin{note}
One line represents a given peptide sequence
in a given charge state in the given scan number of the corresponding
run data, identified in the given protein. Hence, more than one line can
be found for the same peptide sequence, with different scan numbers,
......@@ -1964,6 +2009,7 @@ charge states, or protein description values. This also means that for
the same peptide sequence, with same MH, same scan number and same charge
state but belonging to two different proteins, two lines should be put in the
file, one per protein (as in lines 5 and 6 above).
\end{note}
\newpage
......@@ -1972,13 +2018,17 @@ file, one per each protein (as in lines 5 and 6 above).
In this chapter you will find the list of all {\M} parameters with an
explanation, recommended values for them and some practical advice.
\textdbend An archive containing several ready-to-use examples
\begin{note}
An archive containing several ready-to-use examples
illustrating different alignment and quantification methods is
available on the \href{\sitemasschroq}{MassChroQ homepage}.
\end{note}
\textdbend In all the following moving-window filters or transforms,
\begin{note}
In all the following moving-window filters or transforms,
the user is asked to enter a half window size parameter. The corresponding
window in {\M} will then be of size: $2 \times \mathit{half\_window} + 1$.
\end{note}
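For example, taking a half window size of $3$ (an arbitrary value chosen only to illustrate the formula, not a recommended setting), the window covers the current point plus three points on each side:
\[
2 \times 3 + 1 = 7 \text{ points}.
\]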
\section{Alignment parameters}
\subsection*{The \emph{ObiWarp} alignment parameters}
......
......@@ -85,48 +85,48 @@ ComparTsvQuantifResults::setLocaleAndPrecisionForAllStreams()
_compar_output_stream->setRealNumberNotation(QTextStream::SmartNotation);
}
void
ComparTsvQuantifResults::setMatchedPeaks(const std::vector<xicPeak *> * p_v_peak_list)
{
vector<xicPeak *>::const_iterator itp;
for (itp = p_v_peak_list->begin();
itp != p_v_peak_list->end();
++itp)
{
const Peptide * p_peptide(_current_quanti_item->getPeptide());
QString pepId(""), isotope_label(""), sampleId(""), tempArea(""), currentZ, pepSequence("");
_current_sample_id.clear();
const Peptide * p_peptide(_current_quanti_item->getPeptide());
// we only print peptide information in the compar file
if (p_peptide != NULL)
{
vector<xicPeak *>::const_iterator itp;
QString pepId, pepSequence, tempArea, currentZ, protIds;
// we only print peptide data
if (p_peptide != NULL)
currentZ.setNum(getCurrentZ());
pepSequence = p_peptide->getSequence();
// get the current sample id and add it to the _sampleIds list
_current_sample_id = _current_quantify_id;
_current_sample_id.append("_").append(_current_group_id);
_current_sample_id.append("_").append(_current_msrun_id);
const IsotopeLabel * isotope = p_peptide->getIsotopeLabel();
if (isotope != NULL)
{
currentZ.setNum(getCurrentZ());
pepId = p_peptide->getXmlId();
pepId.append("_").append(currentZ);
sampleId = _current_quantify_id;
sampleId.append("_");
sampleId.append(_current_group_id);
sampleId.append("_");
sampleId.append(((*itp)->getMsrun())->getXmlId());
if (p_peptide->getIsotopeLabel() != NULL)
{
isotope_label = p_peptide->getIsotopeLabel()->getXmlId();
sampleId.append("_");
sampleId.append(isotope_label);
}
// adding sample to the sample map
if ( !_sampleIds.contains(sampleId) )
{
_sampleIds << sampleId;
}
QString isotope_label = isotope->getXmlId();
_current_sample_id.append("_").append(isotope_label);
}
if ( !_sampleIds.contains(_current_sample_id) )
{
_sampleIds << _current_sample_id;
}
// construct the pepId (key of _map_peptide_quantification)
pepId = p_peptide->getXmlId();
pepId.append("_").append(currentZ);
mcq_double area((*itp)->get_area());
tempArea = _map_peptide_quantification[pepId][sampleId];
// construct the print information for every detected peak
for (itp = p_v_peak_list->begin(); itp != p_v_peak_list->end(); ++itp)
{
// first get the area and add it to the _map_peptide_quantification
// if no value is found for current sample ID, put an empty string (very important)
// if more than one value is found, append them
tempArea = _map_peptide_quantification[pepId][_current_sample_id];
mcq_double area((*itp)->get_area());
if ( ! tempArea.isEmpty())
{
tempArea.append("|").append(formatCell(area));
......@@ -135,28 +135,22 @@ ComparTsvQuantifResults::setMatchedPeaks(const std::vector<xicPeak *> * p_v_peak
{
tempArea = formatCell(area);
}
_map_peptide_quantification[pepId][sampleId] = tempArea;
_map_peptide_quantification[pepId][_current_sample_id] = tempArea;
// print information
QStringList list;
list << formatCell(p_peptide->getXmlId());
list << formatCell(_current_quanti_item->get_mz());
list << formatCell(currentZ);
pepSequence = p_peptide->getSequence();
list << formatCell(pepSequence);
vector <const Protein *> prots = p_peptide->getProteinList();
unsigned int prot_size = prots.size();
QString protIds("");
if (prot_size > 0)
vector <const Protein *>::iterator it;
for (it = prots.begin(); it != prots.end(); ++it)
{
vector <const Protein *>::iterator it;
for (it = prots.begin();
it != prots.end();
++it)
{
protIds.append((*it)->getXmlId()).append(" ");
}
protIds.append((*it)->getXmlId()).append(" ");
}
list << formatCell(protIds);
_map_peptide_description[pepId] = list;
}
......@@ -170,10 +164,7 @@ ComparTsvQuantifResults::debriefing()
/// print results
this->printHeaders();
QString sampId, pepId, tmpArea;
std::map<QString, std::map<QString, QString> >::const_iterator itpep;
std::map<QString, QString>::const_iterator itsamp;
std::map<QString, QStringList>::const_iterator itfind;
for (itpep = _map_peptide_quantification.begin();
......@@ -181,7 +172,7 @@ ComparTsvQuantifResults::debriefing()
++itpep)
{
QStringList list;
pepId = itpep->first;
QString pepId = itpep->first;
itfind = _map_peptide_description.find(pepId);
if (itfind != _map_peptide_description.end())
......@@ -189,14 +180,16 @@ ComparTsvQuantifResults::debriefing()
list << itfind->second;
}
std::map<QString, QString> inner_map = itpep->second;
for (itsamp = inner_map.begin();
itsamp != inner_map.end();
++itsamp)
QStringList::const_iterator sampleIterator;
for (sampleIterator = _sampleIds.constBegin();
sampleIterator != _sampleIds.constEnd();
++sampleIterator)
{
list << formatCell(itsamp->second);
QString sampId = (*sampleIterator);
QString tmpArea = _map_peptide_quantification[pepId][sampId];
list << formatCell(tmpArea);
}
printLine(list);
}
}
......
......@@ -56,17 +56,16 @@ public:
protected:
/// pure virtual methods
virtual void setOutputFilesAndStreams(const QString & filename);
virtual void setLocaleAndPrecisionForAllStreams();
/// override action on QuantifResultsBase for stored results
/// override methods on QuantifResultsBase
virtual void setMatchedPeaks(const std::vector<xicPeak *> * p_v_peak_list);
/// some printing and formatting methods
virtual void printHeaders();
virtual void printLine(const QStringList & list);
/// Output file for compar results
......@@ -77,13 +76,16 @@ protected:
std::map< QString, std::map<QString, QString> > _map_peptide_quantification;
/// hash map : peptide-z id -> protein id's separated by a space
std::map<QString, QStringList> _map_peptide_description;
/// list of all the sample IDs
/// list of all the sample id-s
QStringList _sampleIds;
private :
/// The streams corresponding to the output file
QTextStream * _compar_output_stream;
/// Unique key identifying the sample (sample id)
QString _current_sample_id;
};
#endif /* COMPAR_TSV_QUANTIF_RESULTS_H_ */
......@@ -181,9 +181,10 @@ QuantifResultsBase::setMatchedPeaks(const std::vector<xicPeak *> * p_v_peak_list
if (p_peptide != NULL)
{
pepId = p_peptide->getXmlId();
if (p_peptide->getIsotopeLabel() != NULL)
const IsotopeLabel * isotope = p_peptide->getIsotopeLabel();
if (isotope != NULL)
{
isotope_label = p_peptide->getIsotopeLabel()->getXmlId();
isotope_label = isotope->getXmlId();
}
pepSequence = p_peptide->getSequence();
pepMods = p_peptide->getMods();
......
......@@ -30,6 +30,45 @@ MassChroQ version 1.2 - \emph{Longteeth Crocklet}}}
Congratulations! MassChroQ version 1.2, called \emph{Longteeth Crocklet},
is now installed on your Windows system.
\section*{What is new in {\Mv}?}
\subsection*{The post-matching feature}
The main novelty in {\Mv} is the introduction of the peptide peak post-matching feature.
Indeed, before version $1.2$, {\M} performed peak
matching during peptide quantification peak by peak. The post-matching mode
adds a peak matching step at the end of the quantification in each
group: for each peptide to be quantified, its previously unmatched
peaks are rematched by taking into account the
retention times of the previously matched peaks of this peptide in the
group. This gives a finer retention time computation for each
peptide, and allows matching of previously missed peaks in some cases.
For more details on the post-matching mode and when to use it, see the MassChroQ manual
that comes with your installation.
\subsection*{The comparison result file}
The \emph{tsv} result files produced by {\M} are well suited for
automatic statistical analysis in external tools, like the \emph{R}
software. But they are not well suited for simple manual or visual
overview or comparison purposes. Therefore, in version $1.2$, a new type of
\emph{tsv} result file has been added: the \emph{comparison result
file}. The comparison result file is well suited for immediate visual
comparison of results and simple manual statistical analysis on them.
The comparison file is automatically generated along with the
existing \emph{tsv} result files: when the user
chooses to export the quantification results in \emph{tsv} format,
three \emph{tsv} files are now created:
the \emph{\_pep}, the \emph{\_prot} and the \emph{\_compar} extension
files.
\subsection*{New masschroqML schema}
The masschroqML schema has changed in {\Mv}: the current
schema version is $1.2$ (the same version number as