Commit fee14d68 authored by Raphaël Flores

Create instructions for loading data into local Elasticsearch using crafted docker image. GNP-5999 GNP-6000.
parent 36ee82cc
data/
.git/
frontend/node_modules/
backend/out
frontend/.gradle
.gradle/
FROM alpine
LABEL Author="Raphaël FLORES <raphael.flores@inrae.fr>"
COPY scripts/createSuggestions.sh scripts/index.sh scripts/createIndexAndAliases4CI.sh scripts/harvestCI.sh scripts/to_bulk.jq /opt/scripts/
COPY scripts/filters/* /opt/scripts/filters/
# COPY dao settings
COPY backend/src/test/resources/fr/inra/urgi/datadiscovery/dao/rare/settings.json /opt/backend/src/test/resources/fr/inra/urgi/datadiscovery/dao/rare/settings.json
COPY backend/src/test/resources/fr/inra/urgi/datadiscovery/dao/wheatis/settings.json /opt/backend/src/test/resources/fr/inra/urgi/datadiscovery/dao/wheatis/settings.json
COPY backend/src/test/resources/fr/inra/urgi/datadiscovery/dao/data-discovery/settings.json /opt/backend/src/test/resources/fr/inra/urgi/datadiscovery/dao/data-discovery/settings.json
# COPY dao mappings
COPY /backend/src/main/resources/fr/inra/urgi/datadiscovery/domain/rare/RareGeneticResource.mapping.json /opt/backend/src/main/resources/fr/inra/urgi/datadiscovery/domain/rare/RareGeneticResource.mapping.json
COPY /backend/src/main/resources/fr/inra/urgi/datadiscovery/domain/wheatis/WheatisGeneticResource.mapping.json /opt/backend/src/main/resources/fr/inra/urgi/datadiscovery/domain/wheatis/WheatisGeneticResource.mapping.json
COPY /backend/src/main/resources/fr/inra/urgi/datadiscovery/domain/data-discovery/WheatisGeneticResource.mapping.json /opt/backend/src/main/resources/fr/inra/urgi/datadiscovery/domain/data-discovery/WheatisGeneticResource.mapping.json
# COPY suggestions settings
COPY backend/src/main/resources/fr/inra/urgi/datadiscovery/domain/suggestions.mapping.json /opt/backend/src/main/resources/fr/inra/urgi/datadiscovery/domain/suggestions.mapping.json
COPY backend/src/test/resources/fr/inra/urgi/datadiscovery/dao/settings-suggestions.json /opt/backend/src/test/resources/fr/inra/urgi/datadiscovery/dao/settings-suggestions.json
RUN apk add --update --no-cache bash curl jq parallel wget grep gzip sed date coreutils
RUN chmod +x /opt/scripts/index.sh
RUN mkdir ~/.parallel && touch ~/.parallel/will-cite
ENTRYPOINT ["/opt/scripts/index.sh"]
\ No newline at end of file
......@@ -183,8 +183,49 @@ by these tests. The documentation is generated in the folder `backend/build/asci
First of all, if you cloned the repository without fetching the data (see the [Data handling](#data-handling) section), make sure to fetch it before running any indexing script.
### TL;DR
Run the dockerized indexing script against your local Elasticsearch, which is expected to be already running (launched with `docker-compose up`). The command below prints the script's usage:
```sh
docker run -t --volume $(pwd)/data:/opt/data/ --volume /tmp/bulk:/tmp/bulk/ --network=container:elasticsearch registry.forgemia.inra.fr/urgi-is/docker-rare/data-discovery-loader:latest --help
```
Example for indexing RARe data:
```sh
docker run -t --volume $(pwd)/data:/opt/data/ --volume /tmp/bulk:/tmp/bulk/ --network=container:elasticsearch registry.forgemia.inra.fr/urgi-is/docker-rare/data-discovery-loader:latest -host elasticsearch -app rare -env dev
```
To spread the load across several CPUs, repeat the value of the `-host` argument to simulate several Elasticsearch nodes. For example, to use 4 CPUs:
```sh
docker run -t --volume $(pwd)/data:/opt/data/ --volume /tmp/bulk:/tmp/bulk/ --network=container:elasticsearch registry.forgemia.inra.fr/urgi-is/docker-rare/data-discovery-loader:latest -host "elasticsearch elasticsearch elasticsearch elasticsearch" -app rare -env dev
```
Output logs should be available in the `/tmp/bulk/rare-dev` directory.
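The repeated `-host` value used to spread the load can also be generated rather than typed by hand. The snippet below is a minimal sketch, not part of the repository: `repeat_host` is a hypothetical helper name, and the only behaviour taken from this README is that `-host` accepts a space-separated list in which the same host may appear several times.

```sh
# Hypothetical helper (not provided by the repository): print HOST repeated
# COUNT times, separated by single spaces, for use as the -host argument.
repeat_host() {
  host="$1"
  count="$2"
  out=""
  i=0
  while [ "$i" -lt "$count" ]; do
    # Prefix with the accumulated list and a space only when it is non-empty
    out="${out:+$out }$host"
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}

# Build the value for 4 CPUs
HOSTS="$(repeat_host elasticsearch 4)"
echo "$HOSTS"
```

The resulting string can then be passed quoted, as `-host "$HOSTS"`, in place of the hand-written list in the `docker run` command. Keeping the quotes matters, since the whole list has to reach the script as a single argument.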
### Portability
#### Docker
The [TL;DR](#TLDR) section above expects the Docker image to have been built and pushed as follows:
```sh
# build the image
docker build -t registry.forgemia.inra.fr/urgi-is/docker-rare/data-discovery-loader:latest .
# Login before pushing the image
docker login registry.forgemia.inra.fr/urgi-is/docker-rare -u <your ForgeMIA username>
# push the built image
docker push registry.forgemia.inra.fr/urgi-is/docker-rare/data-discovery-loader:latest
```
This image should make it easier to index data without having to set up a dedicated environment, which is explained below.
#### UNIX/BSD
Feedback on portability to macOS and other GNU/Linux distributions is very welcome.
On macOS, take care to use the latest GNU Parallel and Bash v4 versions, not the versions provided by default via Brew.
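A quick way to verify the Bash requirement before running the scripts outside Docker is sketched below. It is an illustration only: `has_min_major` is a hypothetical helper name, and the check covers only the `bash` found first in `PATH`. The GNU Parallel version still has to be checked separately.

```sh
# Hypothetical helper: succeed when the found major version is at least
# the required one.
has_min_major() {
  found="$1"
  required="$2"
  [ "$found" -ge "$required" ]
}

# BASH_VERSINFO[0] holds the major version of the bash found first in PATH
bash_major="$(bash -c 'echo "${BASH_VERSINFO[0]}"')"

if has_min_major "$bash_major" 4; then
  echo "bash major version $bash_major: OK"
else
  echo "bash major version $bash_major: too old, version 4 or later is needed" >&2
fi
```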
......
......@@ -6,6 +6,7 @@ ORANGE='\033[0;33m'
BOLD='\033[1m'
RED_BOLD="${RED}${BOLD}"
NC='\033[0m' # No format
export SHELL=/bin/bash
help() {
cat <<EOF
......