Commit 50bbe132 authored by Célia Michotey

Create redame to join RARe federation. GNP-5442.

parent d37c1b26
# How to join the RARe federation of searchable data?
The purpose of this web portal is to facilitate the discoverability of public genetic resources
managed by different laboratories across the world.
Initiated within the **[transPLANT](http://transplantdb.eu/)** (EC FP7, contract number `283496`), **[WheatIS](http://www.wheatis.org)**
(expert working group of the [Wheat Initiative](https://www.wheatinitiative.org/)) then
**[Elixir-fr/IFB](https://www.france-bioinformatique.fr/en/elixir-fr)** (ANR, contract number `11-INBS-0013`)
projects & collaborations, the portal can now index and make findable data of any kind from any species.
If you want your information system to be referenced, you have to provide [TSV](#tsv-tabulation-separated-values) or
[JSON](#json-javascript-object-notation) files containing metadata only.
The metadata format must follow the indications below (see [Data Specifications](#data-specifications)), and we invite you to
[contact us](mailto:urgi-contact@inra.fr?subject=%5BRARe%2FData%20Discovery%5D) as soon as possible so that we can provide help
and discuss the best way to go ahead.
Note that since the tool links back to your information system, we need a URL that lets researchers
get more detailed information about the indexed data directly in your information system.
# Data Specifications
Be aware that all the data you provide in the file should be open access.
An entry/document must be created for each searchable data item.
Each entry/document is described with the following fields:
## pillarName
Name of your pillar.
It should be the same for all the entities you manage.
The value is constrained. You must use one of the following values:
- Pilier Animal
- Pilier Environnement
- Pilier Forêt
- Pilier Micro-organisme
- Pilier Plante
| Status | Cardinality | Constraints |
| :---: | :---: | :---: |
| Mandatory | 1 | One of the listed values |
## databaseSource
Name of the database from which the entry has been extracted.
It can differ from one entry to another if you handle several databases.
| Status | Cardinality | Constraints |
| :---: | :---: | :---: |
| Mandatory | 1 | None |
## portalURL
URL of the website of the `databaseSource` (database from which the entry was extracted).
It must be a valid URL, so that it links back to your own information system.
| Status | Cardinality | Constraints |
| :---: | :---: | :---: |
| Mandatory | 1 | None |
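As a quick pre-submission check, the "valid URL" requirement can be approximated in Python. This is only a sketch: the helper name `is_valid_url` is hypothetical, and it assumes that "valid" means an absolute `http` or `https` URL with a host, which the specification does not spell out.

```python
from urllib.parse import urlparse


def is_valid_url(value: str) -> bool:
    """Return True if value looks like an absolute http(s) URL with a host.

    Assumption: only http and https links are acceptable as backlinks.
    """
    parsed = urlparse(value.strip())
    return parsed.scheme in ("http", "https") and bool(parsed.netloc)


if __name__ == "__main__":
    for candidate in (
        "http://139.124.42.226/~davnav/BRFM/search_strain2.php",
        "urgi.versailles.inra.fr/faidare",  # no scheme, so it is rejected
    ):
        print(candidate, is_valid_url(candidate))
```

The same check applies to any other field that must hold a URL. It does not test whether the page actually exists.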
## identifier
Identifier of the entity.
It is only used to uniquely identify the entry among all data; it is not displayed in the search tool.
| Status | Cardinality | Constraints |
| :---: | :---: | :---: |
| Mandatory | 1 | Unique |
## name
Name of the entry.
The value must be unique in your own dataset and should be clear enough to help
scientists identify this entry among others at first glance.
| Status | Cardinality | Constraints |
| :---: | :---: | :---: |
| Mandatory | 1 | Unique |
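The uniqueness constraints on `identifier` and `name` can be checked before sending a file. The sketch below is illustrative only: `find_duplicates` is a hypothetical helper, and the entries are assumed to be already loaded as Python dictionaries keyed by field name.

```python
from collections import Counter


def find_duplicates(entries, field):
    """Return the sorted values of `field` that occur in more than one entry."""
    counts = Counter(entry[field] for entry in entries)
    return sorted(value for value, n in counts.items() if n > 1)


if __name__ == "__main__":
    entries = [
        {"identifier": "BRFM 902", "name": "BRFM 902"},
        {"identifier": "BRFM 903", "name": "BRFM 903"},
        {"identifier": "BRFM 902", "name": "BRFM 902 (copy)"},
    ]
    for field in ("identifier", "name"):
        print(field, find_duplicates(entries, field))
```

An empty list for both fields means the dataset satisfies the two `Unique` constraints.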
## description
Description of the entry. It must contain all the relevant keywords that allow your entry to be found.
It is the most important field for the discoverability of the data, since it is the one matched against the terms searched by users.
It is up to you to provide the most relevant description, but keep in
mind that the more precise the description is, the better the ranking in the search tool will be.
Since the search tool is based on Elasticsearch, it relies on Apache Lucene indexes, and the ranking
is related to term frequency/inverse document frequency (the algorithm currently used is
BM25). This means that an entry whose description contains a searched term several times, while that term appears
very rarely in all other documents, is likely to be returned with a high score. You can get
more info on [similarity in Elasticsearch](https://www.elastic.co/blog/found-similarity-in-elasticsearch).
Also, be aware that we add the content of all other fields to the description when integrating
the data, so the name of the entry, its species, etc. can also be searched. It is therefore not necessary
to add them explicitly in the description.
| Status | Cardinality | Constraints |
| :---: | :---: | :---: |
| Mandatory | 1 | None |
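To give an intuition of how term frequency and rarity influence the ranking of a description, here is a small Python sketch of a textbook BM25 score for a single term. It is not the Elasticsearch implementation: the function `bm25_score`, the whitespace tokenisation and the three toy descriptions are assumptions made for illustration, and the actual analysis and scoring in Elasticsearch differ in details. The values `k1=1.2` and `b=0.75` are the usual defaults.

```python
import math


def bm25_score(term, doc, corpus, k1=1.2, b=0.75):
    """Score one tokenised document for one term with a textbook BM25 formula."""
    n_docs = len(corpus)
    n_containing = sum(1 for d in corpus if term in d)
    # Rare terms (found in few documents) get a higher inverse document frequency.
    idf = math.log(1 + (n_docs - n_containing + 0.5) / (n_containing + 0.5))
    tf = doc.count(term)
    avg_len = sum(len(d) for d in corpus) / n_docs
    # Long documents are penalised relative to the average document length.
    norm = tf + k1 * (1 - b + b * len(doc) / avg_len)
    return idf * tf * (k1 + 1) / norm


if __name__ == "__main__":
    corpus = [
        "populus accession with bacterial canker resistance data".split(),
        "populus clone canker test canker resistance".split(),
        "pycnoporus sanguineus strain from burnt wood".split(),
    ]
    for doc in corpus:
        print(round(bm25_score("canker", doc, corpus), 3), " ".join(doc))
```

In this toy corpus the second description, which repeats the searched term, scores higher than the first, and the third one, which does not contain the term, scores zero.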
## dataURL
URL of the entity information on the website of the `databaseSource` (database from which the entry was extracted).
It must be a valid URL, so that it links back to the entry in your own information system.
| Status | Cardinality | Constraints |
| :---: | :---: | :---: |
| Mandatory | 1 | None |
## domain
Taxonomic domain of the entity.
The value is constrained. You must use the one of the following values that best fits your data:
- Animalia
- Archaea
- Bacteria
- Chromista
- Fungi
- Plantae
- Protozoa
- Environment sampling
- Consortium
| Status | Cardinality | Constraints |
| :---: | :---: | :---: |
| Mandatory | 1 | One of the listed values |
## taxon
Genus or species (in the binomial form) of the entity, without the author abbreviation
(e.g. Populus, Vitis vinifera).
| Status | Cardinality | Constraints |
| :---: | :---: | :---: |
| Mandatory | 1 | None |
## materialType
Type of material available for this entry.
The value is constrained. You must use the one of the following values that best fits your data:
- Biological liquid
- Budstick/Cutting
- Culture cell/strain
- DNA
- Embryo
- Environmental sample
- Genome library
- Pollen
- Seed
- Specimen
- Tissue sample
- Transcriptome library
| Status | Cardinality | Constraints |
| :---: | :---: | :---: |
| Optional | 1 | One of the listed values |
## biotopeType
Biotope or habitat where the entity mainly lives.
The value is constrained. You must use the one of the following values that best fits your data:
- Animal
- Beverage
- Environment
- Fermented food
- Food
- Fruit
- Fungi
- Hospital
- Human
- Industry
- Laboratory
- Microcosm
- Plant
- Soil
- Water
- Wood
| Status | Cardinality | Constraints |
| :---: | :---: | :---: |
| Optional | 1 | One of the listed values |
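The four constrained fields (`pillarName`, `domain`, `materialType`, `biotopeType`) lend themselves to an automatic check. The following Python sketch copies the value lists from this document. The function `vocabulary_errors` and the dictionary-based entry representation are hypothetical, and the comparison is assumed to be exact and case-sensitive.

```python
ALLOWED_VALUES = {
    "pillarName": {
        "Pilier Animal", "Pilier Environnement", "Pilier Forêt",
        "Pilier Micro-organisme", "Pilier Plante",
    },
    "domain": {
        "Animalia", "Archaea", "Bacteria", "Chromista", "Fungi", "Plantae",
        "Protozoa", "Environment sampling", "Consortium",
    },
    "materialType": {
        "Biological liquid", "Budstick/Cutting", "Culture cell/strain", "DNA",
        "Embryo", "Environmental sample", "Genome library", "Pollen", "Seed",
        "Specimen", "Tissue sample", "Transcriptome library",
    },
    "biotopeType": {
        "Animal", "Beverage", "Environment", "Fermented food", "Food", "Fruit",
        "Fungi", "Hospital", "Human", "Industry", "Laboratory", "Microcosm",
        "Plant", "Soil", "Water", "Wood",
    },
}
# materialType and biotopeType are optional, so an empty value is accepted.
OPTIONAL_FIELDS = {"materialType", "biotopeType"}


def vocabulary_errors(entry):
    """Return a list of messages for constrained fields with a bad value."""
    errors = []
    for field, allowed in ALLOWED_VALUES.items():
        value = entry.get(field)
        if value in (None, ""):
            if field not in OPTIONAL_FIELDS:
                errors.append(f"{field}: missing mandatory value")
        elif value not in allowed:
            errors.append(f"{field}: unexpected value {value!r}")
    return errors


if __name__ == "__main__":
    entry = {"pillarName": "Pilier Forêt", "domain": "Plantae",
             "materialType": "Specimen", "biotopeType": None}
    print(vocabulary_errors(entry))
```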
## countryOfOrigin
Country from which the entity originally comes.
| Status | Cardinality | Constraints |
| :---: | :---: | :---: |
| Optional | 1 | None |
## originLatitude
Latitude of the country from which the entity originally comes.
It must be in decimal degree format (e.g. 3.9988889, -53).
| Status | Cardinality | Constraints |
| :---: | :---: | :---: |
| Optional | 1 | None |
## originLongitude
Longitude of the country from which the entity originally comes.
It must be in decimal degree format (e.g. 3.9988889, -53).
| Status | Cardinality | Constraints |
| :---: | :---: | :---: |
| Optional | 1 | None |
## countryOfCollect
Country in which the entity was collected.
| Status | Cardinality | Constraints |
| :---: | :---: | :---: |
| Optional | 1 | None |
## collectLatitude
Latitude of the country in which the entity was collected.
It must be in decimal degree format (e.g. 3.9988889, -53).
| Status | Cardinality | Constraints |
| :---: | :---: | :---: |
| Optional | 1 | None |
## collectLongitude
Longitude of the country in which the entity was collected.
It must be in decimal degree format (e.g. 3.9988889, -53).
| Status | Cardinality | Constraints |
| :---: | :---: | :---: |
| Optional | 1 | None |
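The decimal degree requirement for the four coordinate fields can be verified with a few lines of Python. This is a sketch under assumptions: the helper names are hypothetical, empty values are accepted because the fields are optional, and the range checks (±90 for latitude, ±180 for longitude) are standard geographic bounds rather than something stated in this specification.

```python
def parse_coordinate(value, limit):
    """Parse a decimal-degree value; return a float, or None if it is empty."""
    if value is None or str(value).strip() == "":
        return None
    number = float(value)  # raises ValueError for e.g. degree/minute/second strings
    if not -limit <= number <= limit:
        raise ValueError(f"{value!r} is outside [-{limit}, {limit}]")
    return number


def parse_latitude(value):
    """Latitude in decimal degrees, between -90 and 90."""
    return parse_coordinate(value, 90.0)


def parse_longitude(value):
    """Longitude in decimal degrees, between -180 and 180."""
    return parse_coordinate(value, 180.0)


if __name__ == "__main__":
    print(parse_latitude("3.9988889"), parse_longitude("-53"))
```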
# Data formatting
How to format the data to send to us?
You can use either TSV or JSON format. The file(s) can be either sent to us or published in a web folder
from where they will be regularly fetched (see the [Data availability & update](#data-availability-update)
section). Below are examples of what is expected in each format, each containing 2 entries:
## TSV (Tabulation Separated Values)
The order of the fields matters, as in any TSV file. Take care to remove any tabulation and line
break from each field in order to comply with the expected format.
No double quotes are needed either.
The header is not needed; it is displayed here only for documentation purposes.
```csv
pillarName	databaseSource	portalURL	identifier	name	description	dataURL	domain	taxon	materialType	biotopeType	countryOfOrigin	originLatitude	originLongitude	countryOfCollect	collectLatitude	collectLongitude
Pilier Forêt	Forest Tree GnpIS	https://urgi.versailles.inra.fr/faidare/?germplasmLists=Forest%20BRC	https://doi.org/10.15454/0FZNAO	661300375	661300375 is a Populus x generosa accession (number: 661300375, https://doi.org/10.15454/0FZNAO) maintained by the Forest BRC (managed by INRA) and held by INRA-ONF. It is a clone/clone of biological status interspecific cross/croisement interspécifique. This accession is also known as: 0054B165. This accession is part of collection(s): breeding_gispeuplier, mapping_pedigree_0504B. This accession has phenotyping data: bacterial canker resistance test of mapping pedigree 0504B, clonal test of mapping pedigree 0504B in nursery. This accession has genotyping data: Popyomics_Orleans	https://urgi.versailles.inra.fr/faidare/germplasm?pui=https://doi.org/10.15454/0FZNAO	Plantae	Populus x generosa	Specimen							
Pilier Micro-organisme	CIRM-CF	http://139.124.42.226/~davnav/BRFM/search_strain2.php	BRFM 902	BRFM 902	Pycnoporus sanguineus BRFM 902 GUY110 burnt wood, Macouria Polyporaceae Polyporales Basidiomycota	http://139.124.42.226/~davnav/BRFM/fiche.php?BRFM_Number=902	Fungi	Pycnoporus sanguineus		Wood	French Guiana	3.9988889	-53	French Guiana	3.9988889	-53
```
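The TSV rules above (fixed field order, no tabulation or line break inside a field, no quoting) can be applied programmatically. The sketch below is one possible way to do it in Python, not a tool provided by the project: `clean` and `to_tsv_line` are hypothetical helpers, and replacing embedded tabs and line breaks with a single space is an assumption about how you may want to sanitise your values.

```python
import re

# Field order taken from the header of the TSV example.
FIELDS = [
    "pillarName", "databaseSource", "portalURL", "identifier", "name",
    "description", "dataURL", "domain", "taxon", "materialType",
    "biotopeType", "countryOfOrigin", "originLatitude", "originLongitude",
    "countryOfCollect", "collectLatitude", "collectLongitude",
]


def clean(value):
    """Collapse tabs and line breaks into single spaces; None becomes ''."""
    if value is None:
        return ""
    return re.sub(r"[\t\r\n]+", " ", str(value)).strip()


def to_tsv_line(entry):
    """Serialise one entry (a dict) as an unquoted, tab-separated line."""
    return "\t".join(clean(entry.get(field)) for field in FIELDS)


if __name__ == "__main__":
    entry = {
        "pillarName": "Pilier Micro-organisme",
        "databaseSource": "CIRM-CF",
        "identifier": "BRFM 902",
        "name": "BRFM 902",
        "description": "Pycnoporus sanguineus\tBRFM 902\nburnt wood",
        "biotopeType": "Wood",
    }
    print(to_tsv_line(entry))
```

Joining the fields by hand rather than through a CSV library keeps the output free of quoting, which matches the "no double quotes" rule. Missing optional fields are written as empty columns so that every line keeps the same number of columns.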
## JSON (JavaScript Object Notation)
The order of the fields does not matter. All entries should be aggregated into a single array per file.
```json
[
{
"pillarName": "Pilier Forêt",
"databaseSource": "Forest Tree GnpIS",
"portalURL": "https://urgi.versailles.inra.fr/gnpis-core/#form/germplasmLists=Forest+BRC",
"identifier": "https://doi.org/10.15454/0FZNAO",
"name": "661300375",
"description": "661300375 is a Populus x generosa accession (number: 661300375, https://doi.org/10.15454/0FZNAO) maintained by the Forest BRC (managed by INRA) and held by INRA-ONF. It is a clone/clone of biological status interspecific cross/croisement interspécifique. This accession is also known as: 0054B165. Its taxon is also known as: P. deltoides x P. trichocarpa, Populus deltoides x Populus trichocarpa, Populus trichocarpa x Populus deltoides, Populus x generosa A. Henry, Populus x interamericana, P. trichocarpa x P. deltoides, P. xgenerosa Henry, P xinteramericana. This accession is part of collection(s): breeding_gispeuplier, mapping_pedigree_0504B. This accession has phenotyping data: bacterial canker resistance test of mapping pedigree 0504B - QTL mapping of bacterial canker resistance, clonal test of mapping pedigree 0504B in nursery - QTL mapping of a list of phenotypic traits. This accession has genotyping data: Popyomics_Orleans",
"dataURL": "https://urgi.versailles.inra.fr/gnpis-core/#accessionCard/pui=https://doi.org/10.15454/0FZNAO",
"domain": "Plantae",
"taxon": "Populus x generosa",
"family": "Salicaceae",
"genus": "Populus",
"species": "Populus x generosa",
"materialType": "Specimen",
"biotopeType": null,
"countryOfOrigin": null,
"latitudeOfOrigin": null,
"longitudeOfOrigin": null,
"countryOfCollect": null,
"latitudeOfCollect": null,
"longitudeOfCollect": null
},
{
"pillarName": "Pilier Micro-organisme",
"databaseSource": "CIRM-CF",
"portalURL": "http://139.124.42.226/~davnav/BRFM/search_strain2.php",
"identifier": "BRFM 902",
"name": "BRFM 902",
"description": "Pycnoporus sanguineus BRFM 902 GUY110 burnt wood, Macouria Polyporaceae Polyporales Basidiomycota",
"dataURL": "http://139.124.42.226/~davnav/BRFM/fiche.php?BRFM_Number=902",
"domain": "Fungi",
"taxon": "Pycnoporus sanguineus",
"family": null,
"genus": null,
"species": null,
"materialType": null,
"biotopeType": "Wood",
"countryOfOrigin": "French Guiana",
"latitudeOfOrigin": "3.9988889",
"longitudeOfOrigin": "-53",
"countryOfCollect": "French Guiana",
"latitudeOfCollect": "3.9988889",
"longitudeOfCollect": "-53"
}
]
```
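Generating the JSON file with a JSON library rather than by hand avoids syntax slips such as missing commas or unquoted strings. The Python sketch below shows one way to aggregate all entries into a single array and to check a file's content before sending it. The helper names `to_json_array` and `load_entries` are hypothetical, and the entries are assumed to be dictionaries.

```python
import json


def to_json_array(entries):
    """Serialise all entries as one JSON array, keeping accented characters."""
    return json.dumps(list(entries), ensure_ascii=False, indent=2)


def load_entries(text):
    """Parse JSON text and check that it is a single array of objects."""
    data = json.loads(text)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(data, list) or not all(isinstance(e, dict) for e in data):
        raise ValueError("expected a single JSON array of objects")
    return data


if __name__ == "__main__":
    entries = [
        {"pillarName": "Pilier Forêt", "name": "661300375", "biotopeType": None},
        {"pillarName": "Pilier Micro-organisme", "name": "BRFM 902",
         "biotopeType": "Wood"},
    ]
    text = to_json_array(entries)
    print(text)
    print(len(load_entries(text)), "entries")
```

`ensure_ascii=False` keeps values such as `Pilier Forêt` readable in the file, which should then be saved as UTF-8. Python's `None` is written as JSON `null`.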
# Data availability & update
You can generate one or several files containing your public data as long as each of them complies
with the format defined above.
Once they are generated, you will have to provide a way for us to fetch them on a regular basis: a
......