Commit 61cbe804 authored by Raphaël Flores's avatar Raphaël Flores

Rework static content. Remove useless linefeeds.

parent a08c0c2d
# Help section
> Note: this application is responsive, meaning that its rendering adapts to the screen size of your device; the location of the items described below may therefore differ from what is stated.
You can seek data by entering one or several term(s) in the main search box (_e.g._ **wheat yield** or **Vitis**).
> Note: there is no boolean operator or quote character to refine your query; every term you enter is used to find data.
......@@ -13,4 +15,3 @@ You can then filter this list of results by using the facets/categories on the l
When the categories used for filtering are too large (_i.e._ they contain more than 10 elements), they appear as a search box in which you can choose among the available terms.
> Note: applying several filters in the same facet/category will search for documents related to any of the selected terms (resulting in an `OR` operator between all the category's terms).
> In contrast, applying filters in different facets/categories will search for documents matching all of these filters (resulting in an `AND` operator between the different facets).
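The `OR`-within-a-facet and `AND`-across-facets behaviour described in the notes above can be pictured with the following minimal Python sketch. It only illustrates the logic and is not the portal's implementation; the `matches` function and the dictionary layout are hypothetical names chosen for this example.

```python
def matches(document, selected):
    """Return True if the document satisfies every facet's selection.

    `document` maps a facet name to the set of terms the document carries;
    `selected` maps a facet name to the set of terms ticked by the user.
    Both structures are assumptions made for this illustration.
    """
    return all(
        # OR inside a facet: sharing one selected term is enough.
        bool(document.get(facet, set()) & terms)
        # AND across facets: every facet with a selection must match.
        for facet, terms in selected.items()
        if terms  # a facet with nothing selected does not restrict results
    )
```

For instance, a document tagged with the taxon `Vitis` matches a selection of `Vitis` and `Populus` in the same facet, but stops matching as soon as a filter from another facet excludes it.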
......@@ -68,7 +68,7 @@ The value must be unique in your own dataset and should be clear enough to help
## description
Description of the entry.
It must contains all the relevant keywords allowing to find your entry and to understand what the entry is.
It must contain all the relevant keywords allowing to find your entry and to understand what the entry is.
It is the most important field for the discoverability of the data since it is the one used to match terms searched by users.
It is up to you to provide the most relevant description allowing to match the entry, but keep in mind that the more precise the description is, the better its ranking in the search tool will be.
......@@ -114,7 +114,7 @@ The value is constrained. You must use one of the following values, the one fitt
## taxon
Genus or species (in the binomial form) of the entity, without the author abbreviation (eg Populus, Vitis vinifera).
Genus or species (in the binomial form) of the entity, without the author abbreviation (_e.g._ Populus, Vitis vinifera).
| Status | Cardinality | Constraints |
| :---: | :---: | :---: |
......@@ -179,20 +179,20 @@ Country from which the entity originally comes from.
## originLatitude
Latitude of the country from which the entity originally comes from.
It must be in a decimal degree format (eg 3.9988889, -53).
It must be part of a decimal degrees format (_e.g._ `3.9988889` for `3.9988889,-53` coordinates).
| Status | Cardinality | Constraints |
| :---: | :---: | :---: |
| Optional | 1 | None |
| Optional | 1 | -90 < Lat. < 90 |
## originLongitude
Longitude of the country from which the entity originally comes from.
It must be in a decimal degree format (eg 3.9988889, -53).
It must be part of a decimal degrees format (_e.g._ `-53` for `3.9988889,-53` coordinates).
| Status | Cardinality | Constraints |
| :---: | :---: | :---: |
| Optional | 1 | None |
| Optional | 1 | -180 < Long. < 180 |
## countryOfCollect
......@@ -205,20 +205,20 @@ Country from which the entity was collected.
## collectLatitude
Latitude of the country from which the entity was collected.
It must be in a decimal degree format (eg 3.9988889, -53).
It must be part of a decimal degrees format (_e.g._ `3.9988889` for `3.9988889,-53` coordinates).
| Status | Cardinality | Constraints |
| :---: | :---: | :---: |
| Optional | 1 | None |
| Optional | 1 | -90 < Lat. < 90 |
## collectLongitude
Longitude of the country from which the entity was collected.
It must be in a decimal degree format (eg 3.9988889, -53).
It must be part of a decimal degrees format (_e.g._ `-53` for `3.9988889,-53` coordinates).
| Status | Cardinality | Constraints |
| :---: | :---: | :---: |
| Optional | 1 | None |
| Optional | 1 | -180 < Long. < 180 |
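As an illustration of the constraints listed in the latitude and longitude tables, here is a small Python sketch that could be used to check values before generating a file. The function names are hypothetical, and the strict inequalities simply mirror the constraints as written above.

```python
def is_valid_coordinate(value, limit):
    """Check that `value` parses as a decimal number strictly inside (-limit, limit)."""
    try:
        number = float(value)
    except (TypeError, ValueError):
        # Not a single decimal number, e.g. "3.9988889,-53" or an empty cell.
        return False
    return -limit < number < limit


def is_valid_latitude(value):
    """Latitude constraint from the table: -90 < Lat. < 90."""
    return is_valid_coordinate(value, 90)


def is_valid_longitude(value):
    """Longitude constraint from the table: -180 < Long. < 180."""
    return is_valid_coordinate(value, 180)
```

Each field holds one half of the coordinate pair only, so a combined value such as `3.9988889,-53` is rejected by both checks.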
[&#8593;](#top)
......@@ -232,20 +232,20 @@ The file(s) can be either sent to us or published in a web folder from where it
## TSV (Tabulation Separated Values)
The order of the fields matters as in any TSV file.
Take care to remove any unnecessary tabulation or return line between fields in order to comply with the expected format.
Take care to remove any tabulation or return line that can be present in the fields content in order to comply with the expected format.
No double quotes are needed.
```csv example
pillarName databaseSource portalURL identifier name description dataURL domain taxon materialType biotopeType countryOfOrigin originLatitude originLongitude countryOfCollect collectLatitude collectLongitude
Pilier Forêt Forest Tree GnpIS https://urgi.versailles.inra.fr/faidare/?germplasmLists=Forest%20BRC https://doi.org/10.15454/0FZNAO 661300375 661300375 is a Populus x generosa accession (number: 661300375, https://doi.org/10.15454/0FZNAO) maintained by the Forest BRC (managed by INRA) and held by INRA-ONF. It is a clone/clone of biological status interspecific cross/croisement interspécifique. This accession is also known as: 0054B165. This accession is part of collection(s): breeding_gispeuplier, mapping_pedigree_0504B. This accession has phenotyping data: bacterial canker resistance test of mapping pedigree 0504B, clonal test of mapping pedigree 0504B in nursery. This accession has genotyping data: Popyomics_Orleans https://urgi.versailles.inra.fr/faidare/germplasm?pui=https://doi.org/10.15454/0FZNAO Plantae Populus x generosa Specimen
Pilier Micro-organisme CIRM-CF http://139.124.42.231/~davnav/BRFM/search_strain2.php BRFM 902 BRFM 902 Pycnoporus sanguineus BRFM 902 GUY110 burnt wood, Macouria Polyporaceae Polyporales Basidiomycota http://139.124.42.231/~davnav/BRFM/fiche.php?BRFM_Number=902 Fungi Pycnoporus sanguineus Wood French Guiana 3.9988889 -53 French Guiana 3.9988889 -53
```csv
pillarName databaseSource portalURL identifier name description dataURL domain taxon materialType biotopeType countryOfOrigin originLatitude originLongitude countryOfCollect collectLatitude collectLongitude
Pilier Forêt Forest Tree GnpIS https://urgi.versailles.inra.fr/faidare/?germplasmLists=Forest%20BRC https://doi.org/10.15454/0FZNAO 661300375 661300375 is a Populus x generosa accession (number: 661300375, https://doi.org/10.15454/0FZNAO) maintained by the Forest BRC (managed by INRA) and held by INRA-ONF. It is a clone/clone of biological status interspecific cross/croisement interspécifique. This accession is also known as: 0054B165. This accession is part of collection(s): breeding_gispeuplier, mapping_pedigree_0504B. This accession has phenotyping data: bacterial canker resistance test of mapping pedigree 0504B, clonal test of mapping pedigree 0504B in nursery. This accession has genotyping data: Popyomics_Orleans https://urgi.versailles.inra.fr/faidare/germplasm?pui=https://doi.org/10.15454/0FZNAO Plantae Populus x generosa Specimen
Pilier Micro-organisme CIRM-CF http://139.124.42.231/~davnav/BRFM/search_strain2.php BRFM 902 BRFM 902 Pycnoporus sanguineus BRFM 902 GUY110 burnt wood, Macouria Polyporaceae Polyporales Basidiomycota http://139.124.42.231/~davnav/BRFM/fiche.php?BRFM_Number=902 Fungi Pycnoporus sanguineus Wood French Guiana 3.9988889 -53 French Guiana 3.9988889 -53
```
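The following Python sketch shows one possible way to produce a line in the expected TSV layout, removing the tabulations and line breaks that may be present in field contents. The field order is taken from the header of the example above; the helper names and the choice of writing missing values as empty strings are assumptions made for this illustration.

```python
import re

# Field order copied from the header line of the TSV example.
FIELDS = [
    "pillarName", "databaseSource", "portalURL", "identifier", "name",
    "description", "dataURL", "domain", "taxon", "materialType",
    "biotopeType", "countryOfOrigin", "originLatitude", "originLongitude",
    "countryOfCollect", "collectLatitude", "collectLongitude",
]


def clean(value):
    """Collapse tabs and line breaks inside a field into single spaces."""
    return re.sub(r"[\t\r\n]+", " ", str(value)).strip()


def to_tsv_line(entry):
    """Serialize one entry (a dict) as a TSV line, in the documented field order."""
    # Assumption: a missing field is written as an empty string.
    return "\t".join(clean(entry.get(field, "")) for field in FIELDS)
```

A line built this way always contains 16 tabulations, one between each pair of the 17 fields, whatever the content of the description.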
## JSON (JavaScript Object Notation)
The order of the fields does not matter. All entries should be aggregated into a single array per file.
```json example
```json
[
{
"pillarName": "Pilier Forêt",
......@@ -299,6 +299,7 @@ The order of the fields does not matter. All entries should be aggregated into a
# Data availability & update
You can generate one or several files containing your public data as long as each of them complies with the format defined above.
We will decide with you on the best way to exchange them on a regular basis: a simple web (or FTP) server is a good solution since it allows us to check if a new version of your files has been produced.
Once they are generated, you will have to provide a way for us to fetch them on a regular basis. We can help you decide on the best way to do this exchange. Using a simple web (or FTP) server is a good solution since it allows us to check if a new version of your files has been produced.
[&#8593;](#top)
# How to join the wheatIS and/or the Plant federations of searchable data?
If you want your information system to be referenced, you have to provide [TSV](#tsv-tabulation-separated-values) or [JSON](#json-javascript-object-notation) files with metadata only.
The metadata format must follow the indications below and we invite you to [contact us](mailto:urgi-contact@inra.fr?subject=%5BWheatIS%2FData%20Discovery%5D)
as soon as possible so that we can provide help and discuss the best way to go ahead.
If you want your information system to be referenced, you have to provide [TSV](#tsv-tabulation-separated-values) or [JSON](#json-javascript-object-notation) files with metadata only. The metadata format must follow the indications below and we invite you to [contact us](mailto:urgi-contact@inra.fr?subject=%5BWheatIS%2FData%20Discovery%5D) as soon as possible so that we can provide help and discuss the best way to go ahead.
Note that since the tool makes a backlink to your information system, we need a URL allowing researchers
to get detailed information about the indexed entry directly in your information system.
Note that since the tool makes a backlink to your information system, we need a URL allowing researchers to get detailed information about the indexed entry directly in your information system.
# Overview of the metadata associated to each searchable entry/document
- a short _[name](#name)_ uniquely identifying the entry, _e.g._ `BTH_Le_Moulon_2000_SetA`
- an _[url](#url)_ linking back to the entry in your own web application, ie.
https://urgi.versailles.inra.fr/ephesis/ephesis/viewer.do#trialCard/trialId=56
- a _[description](#description)_, of the entry that contains all the relevant keywords
allowing to find your entry. All the terms of this field are used by the search tool to allow users to
find entries
- an _[entryType](#entrytype)_ describing the type of the entry, that could be any of the terms listed
in the dedicated section below
- a _[species](#species)_ field, containing the species related to the entry (zero, one or several,
but it is highly recommended to provide at least one)
- a _[node](#node)_, the name of your laboratory/institute, it should be the same for all the entities
you manage
- a _[database](#database)_, the name of the database from which the entry has been extracted. It
can differ from one entry to another if you handle several databases
- an _[url](#url)_ linking back to the entry in your own web application, ie. https://urgi.versailles.inra.fr/ephesis/ephesis/viewer.do#trialCard/trialId=56
- a _[description](#description)_, of the entry that contains all the relevant keywords allowing to find your entry. All the terms of this field are used by the search tool to allow users to find entries
- an _[entryType](#entrytype)_ describing the type of the entry, that could be any of the terms listed in the dedicated section below
- a _[species](#species)_ field, containing the species related to the entry (zero, one or several, but it is highly recommended to provide at least one)
- a _[node](#node)_, the name of your laboratory/institute, it should be the same for all the entities you manage
- a _[database](#database)_, the name of the database from which the entry has been extracted. It can differ from one entry to another if you handle several databases
## Detailed specifications for the metadata fields
### name
[&#8593;](#top)
The value of the `name` field must be unique in your own dataset and should be clear enough to help
scientists to identify at the first glance this entry among the other.
The value of the `name` field must be unique in your own dataset and should be clear enough to help scientists identify this entry at first glance among the others.
| Status | Cardinality | Constraints |
| :---: | :---: | :---: |
......@@ -41,9 +30,7 @@ scientists to identify at the first glance this entry among the other.
### entryType
[&#8593;](#top)
The `entryType` field is not constrained on its value (it is not implemented as an
enumeration), but it is highly recommended to use any of following values fitting the best with
your data :
The `entryType` field is not constrained on its value (it is not implemented as an enumeration), but it is highly recommended to use any of following values fitting the best with your data:
- Genome annotation
- Gene annotation
......@@ -67,10 +54,7 @@ your data :
- File
- Study
> **Note:** looking in the [data-discovery](https://urgi.versailles.inra.fr/data-discovery) portal at
the number of documents matching an entryType (see `Data type` filter on the left side) can help you
to choose the data type having the closest meaning. If you choose a different data type than those
already provided, be aware that this entry type will not appear in the list of possible filters.
> **Note:** looking in the [data-discovery](https://urgi.versailles.inra.fr/data-discovery) portal at the number of documents matching an entryType (see `Data type` filter on the left side) can help you to choose the data type having the closest meaning. If needed, you can still choose a different data type than those already available.
| Status | Cardinality | Constraints |
......@@ -82,21 +66,13 @@ already provided, be aware that this entry type will not appear in the list of p
### description
[&#8593;](#top)
The `description` field is the most important for the discoverability of the data since it is the one
used to match terms from users.
The `description` field is the most important for the discoverability of the data since it is the one used to match terms from users.
It is up to you to provide the most relevant description allowing to match the entry, but keep in
mind that the more precise the description is, better the ranking in the search tool would be.
It is up to you to provide the most relevant description allowing to match the entry, but keep in mind that the more precise the description is, the better its ranking in the search tool will be.
The search tool is based on Elasticsearch. It therefore relies on Apache Lucene indexes in which
the ranking is related to the term frequency/inverse document frequency (the used algorithm is
currently BM25). That means that an entry having a description with a searched term appearing several
times inside it but very rarely in all other documents will be likely returned with a high score.
You can get more information on [similarity in Elasticsearch](https://www.elastic.co/blog/found-similarity-in-elasticsearch).
The search tool is based on Elasticsearch. It therefore relies on Apache Lucene indexes in which the ranking is related to the term frequency/inverse document frequency (the used algorithm is currently BM25). That means that an entry having a description with a searched term appearing several times inside it but very rarely in all other documents will be likely returned with a high score. You can get more information on [similarity in Elasticsearch](https://www.elastic.co/blog/found-similarity-in-elasticsearch).
Also, be aware that we are adding all the other fields contents to the content of the description
when preparing the data for indexation so that the name of the entry, its species etc... can be
searched. It is therefore not necessary to add them explicitly in the description.
Also, be aware that we are adding all the other fields contents to the content of the description when preparing the data for indexation so that the name of the entry, its species etc... can be searched. It is therefore not necessary to add them explicitly in the description.
>>>
Note: be aware that all the data you provide in the file should be open access.
......@@ -111,8 +87,7 @@ Note: be aware that all the data you provide in the file should be open access.
### url
[&#8593;](#top)
The `url` must be a valid URL so that this backlinks to your information system display information
on the entry.
The `url` must be a valid URL so that this backlink to your information system displays information on the entry.
| Status | Cardinality | Constraints |
| :---: | :---: | :---: |
......@@ -123,17 +98,11 @@ on the entry.
### species
[&#8593;](#top)
`species` field is not mandatory, but it can be used to filter data. It can contain zero, one or more
value according to the type of data.
It is recommended to favor the binomial form without the Author abbreviatoin (ie `L.`), eg Vitis
vinifera, Quercus Robur, Triticum aestivum
`species` field is not mandatory, but it can be used to filter data. It can contain zero, one or more value according to the type of data.
It is recommended to favor the binomial form without the Author abbreviation (ie `L.`), _e.g._ _Vitis vinifera_, _Quercus robur_, _Triticum aestivum_.
>>>
Note: for WheatIS federation specifically, the species name must match one of the values below in order
to be available through the [WheatIS search tool](https://urgi.versailes.inra.fr/wheatis), otherwise the
entry will be filtered out. Note that if you have wheat and non wheat data, you don't have to provide
them twice in dedicated files, the loading process will manage to generate the relevant indexes for the
general plant search and the WheatIS search.
Note: for WheatIS federation specifically, the species name must match one of the values below in order to be available through the [WheatIS search tool](https://urgi.versailes.inra.fr/wheatis), otherwise the entry will be filtered out. Note that if you have wheat and non wheat data, you don't have to provide them twice in dedicated files, the loading process will manage to generate the relevant indexes for the general plant search and the WheatIS search.
- Aegilops*
- Hordeum*
......@@ -150,8 +119,7 @@ general plant search and the WheatIS search.
### database
[&#8593;](#top)
The name of the `database` from which the entry has been extracted. It
can differ from one entry to another if you handle several databases.
The name of the `database` from which the entry has been extracted. It can differ from one entry to another if you handle several databases.
| Status | Cardinality | Constraints |
| :---: | :---: | :---: |
......@@ -162,8 +130,7 @@ can differ from one entry to another if you handle several databases.
### node
[&#8593;](#top)
The `node` is the name of you laboratory/institute, it should be the same for all the entities
you manage, ie. INRA-URGI, EBI, IPK, USDA-ARS, CIMMYT...
The `node` is the name of your laboratory/institute. It should be the same for all the entities you manage, _e.g._ INRA-URGI, EBI, IPK, USDA-ARS, CIMMYT...
| Status | Cardinality | Constraints |
| :---: | :---: | :---: |
......@@ -174,17 +141,14 @@ you manage, ie. INRA-URGI, EBI, IPK, USDA-ARS, CIMMYT...
# Formatting
How to format the data to send to us?
You can use either TSV or JSON format. The file(s) can be either sent to us or published in a web folder
from where it will be regularly updated (see [Data availability & update](#data-availability-update)
section). Below you find two kind of examples of what is expected with 2 entries:
You can use either TSV or JSON format. The file(s) can be either sent to us or published in a web folder from where they will be regularly updated (see [Data availability & update](#data-availability-update) section). Below you will find two kinds of examples of what is expected, with 2 entries:
## TSV (Tabulation Separated Values)
[&#8593;](#top)
The order of the fields matters as in any TSV file.
Take care to remove any unnecessary tabulation or return line between fields in order to comply with
the expected format.
No double quotes are needed.
The order of the fields matters as in any TSV file. Take care to remove any tabulation or return line that can be present in the fields content in order to comply with the expected format.
No double quotes are needed.
The header is not needed, it is displayed here only for documentation purpose.
```csv
#name entryType node databaseName url species description
......@@ -227,8 +191,6 @@ The order of the fields does not matter. All entries should be aggregated into a
# Data availability & update
[&#8593;](#top)
You can generate one or several files containing your public data as long as each of them complies
with the format defined above.
You can generate one or several files containing your public data as long as each of them complies with the format defined above.
We will decide with you on the best way to exchange them on a regular basis: a simple web (or FTP)
server is a good solution since it allows us to check if a new version of your files has been produced.
Once they are generated, you will have to provide a way for us to fetch them on a regular basis. We can help you decide on the best way to do this exchange. Using a simple web (or FTP) server is a good solution since it allows us to check if a new version of your files has been produced.
\ No newline at end of file
......@@ -9,9 +9,9 @@ The logo is the property of INRA and you are not allowed to re-use it for your o
## Content
The portal maintained by INRA allows to find public data across a federation of databases. The licences associated to data are therefore defined the institutes in charge of them.
The portal maintained by INRA allows to find public data across a federation of databases. The licences associated to data are therefore defined by the institutes in charge of them.
Users are sole responsible for the searches they carry out, as well as for the interpretation and use they make of the results.
Users are solely responsible for the searches they carry out, as well as for the interpretation and for the use they make of the results.
The access to the web site can be interrupted at any moment and without prior warning in case of force majeure or if the editor decides to terminate its provision of service.
......@@ -31,16 +31,16 @@ The portal links to external web sites. INRA does not take responsibility of the
Technical data (date, hour, IP address of the computer of the visitor, pages viewed) are collected only for the statistical analysis of the usage of the portal.
These data are kept confidential and not transmitted to any other party. They are stored on INRA’s private servers for 5 years.
During visits on URGI web site and GnpIS portal, a cookie can be automatically installed on visitor’s web browsers to retrieve statistics on the pages that are visited and support improvements of the services provided by the web site.
During visits on URGI web site and its hosted applications, a cookie can be automatically installed on visitor’s web browsers to retrieve statistics on the pages that are visited and support improvements of the services provided by the web site.
Visitors can configure their web browsers in order to be informed of the setting of cookies and refuse them.
According to the European Regulation on the protection of personal data (EU Regulation 2016/679), you have the right to access, rectify, oppose and delete information about yourself.
If you wish to exercise this right and obtain information about yourself, please contact us :
- Land mail: URGI - INRA, Route de Saint-Cyr, 78026 Versailles Cedex
- Or via the [contact form](https://urgi.versailles.inra.fr/Contact-us)
If you wish to exercise this right and obtain information about yourself, please contact us:
- By [email](mailto:urgi-contact@inra.fr?subject=%5BData%20Discovery%5D%20GPDR%20request)
- Or via any other way available on our [contact form](https://urgi.versailles.inra.fr/Contact-us)
## Modifications
The editor might change the terms of use and user’s rights without prior warning.
Last update: 2019 June 6th
Last update: 2019 June 25th