# Rare project - Data discovery

[[_TOC_]]

## Contribute

To install the application locally, see the [Setup](#setup) section and the sections that follow it.

### Data

To contribute data to the federation, see the [WheatIS/Plant guide](./HOW-TO-JOIN-WHEATIS-AND-PLANT-FEDERATIONS.md) or the [RARe guide](./HOW-TO-JOIN-RARe-FEDERATION.md).

### Code

To contribute code, see [CONTRIBUTING.md](./CONTRIBUTING.MD).

## Setup

### Requirements

The application itself runs on OpenJDK 8+: <https://openjdk.java.net/install/>

To fetch the data, you need to install Git LFS: <https://git-lfs.github.com/>

The indexing process depends on the following tools, which must be installed and available in your `PATH`:

- JQ 1.6+: <https://stedolan.github.io/jq/>
- GNU Parallel (recent enough version): <https://www.gnu.org/software/parallel/>
- GNU coreutils (sed, date...): <https://www.gnu.org/software/coreutils/>
- GNU GZIP: <https://git.savannah.gnu.org/cgit/gzip.git>
- GNU Bash v4+: <https://www.gnu.org/software/bash/>
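
A quick way to verify these requirements is to check that each command can be found in `PATH`. The sketch below is only an illustration: `check_tools` is a hypothetical helper name, not part of the project's scripts, and it does not check versions.

```sh
# Report which of the given commands cannot be found in PATH.
# Returns 0 when all of them are available, 1 otherwise.
# `check_tools` is a hypothetical helper, not part of the project.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For instance, `check_tools java git-lfs jq parallel sed date gzip bash` prints one `missing: ...` line per tool that still needs to be installed.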

### Data handling

At the moment, all data is located next to the code in the `data` directory. If you only want to look at the code, you can skip downloading this directory at clone time by setting the variable `GIT_LFS_SKIP_SMUDGE=1`, e.g.:

```sh
GIT_LFS_SKIP_SMUDGE=1 git clone git@forgemia.inra.fr:urgi-is/data-discovery.git
```

Once the clone is done, if you want to fetch part of the data (for instance RARe only), run:

```sh
$ git lfs pull -I data/rare/
Downloading LFS objects: 100% (16/16), 8.8 MB | 0 B/s
```

Git may ask you to enable an additional setting, which is fine to accept:

```sh
git config lfs.https://forgemia.inra.fr/urgi-is/data-discovery.git/info/lfs.locksverify true
```

### Backend

The project uses Spring Boot for the backend.

You need to install:

- a recent JDK 11

The Docker images need a fair amount of resources,
so make sure Docker has at least 4 GB of RAM configured (Docker Desktop / Resources / Memory).

The application expects to connect to an Elasticsearch instance running on `http://127.0.0.1:9200`.
To start such an instance, run:

```sh
docker compose up
```

This starts Elasticsearch and a Kibana instance (which lets you explore the data on <http://localhost:5601>).
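
Elasticsearch can take a little while to accept connections after the containers start. A minimal sketch for waiting until it answers is shown below; `retry` is a hypothetical helper (not part of the project), and the attempt count and delay are arbitrary assumptions.

```sh
# Retry a command until it succeeds.
# Usage: retry <attempts> <delay-seconds> <command...>
# `retry` is a hypothetical helper, not part of the project.
retry() {
  attempts="$1"; delay="$2"; shift 2
  n=1
  until "$@"; do
    if [ "$n" -ge "$attempts" ]; then
      return 1   # give up after the last allowed attempt
    fi
    n=$((n + 1))
    sleep "$delay"
  done
}
```

For example, `retry 30 2 curl -fsS -o /dev/null http://127.0.0.1:9200` polls the address given above every 2 seconds, up to 30 times, and returns a non-zero status if Elasticsearch never answers.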

Then at the root of the application, run `./gradlew build` to download the dependencies.
Then run `./gradlew bootRun` to start the app.

You can stop the Elasticsearch and Kibana instances by running:

```sh
docker compose stop
```

Or run the following command to also remove the stopped containers and any networks that were created:

```sh
docker compose down
```

### Frontend

The project uses Angular for the frontend, with the Angular CLI.

You need to install:

- a recent Node.js (e.g. v12 LTS), which is required for Angular 8
- Yarn as a package manager (see the [installation instructions](https://yarnpkg.com/en/docs/install))

Then in the `frontend` directory, run `yarn` to download the dependencies.
Then run `yarn start` to start the app, using the proxy conf to reroute calls to `/api` to the backend.

The application will be available on:

- <http://localhost:4000/rare-dev> for RARe (runs with: `yarn start:rare` or simply `yarn start`)
- <http://localhost:4000/brc4env-dev> for RARe with basket (runs with: `yarn start:brc4env`)
- <http://localhost:4100/wheatis-dev> for WheatIS (runs with: `yarn start:wheatis`)
- <http://localhost:4200/faidare-dev> for Faidare (runs with: `yarn start:faidare`)

See [./frontend/package.json (scripts section)](./frontend/package.json) for other yarn commands.

## Build

To build the app, just run:

```sh
./gradlew assemble
```

or

```sh
./gradlew assemble -Papp=wheatis
```

or

```sh
./gradlew assemble -Papp=brc4env
```

This builds a standalone jar in `backend/build/libs/`, which you can run with the command matching the app you built:

```sh
java -jar backend/build/libs/rare.jar
java -jar backend/build/libs/brc4env.jar
java -jar backend/build/libs/wheatis.jar
java -jar backend/build/libs/faidare.jar
```

The full app is then available at:

- <http://localhost:8080/rare-dev>
- <http://localhost:8180/wheatis-dev>
- <http://localhost:8280/faidare-dev>
- <http://localhost:8580/brc4env-dev>

## CI

The `.gitlab-ci.yml` file describes how GitLab runs the CI jobs.

It uses a base Docker image named `urgi/docker-browsers`
available on [DockerHub](https://hub.docker.com/r/urgi/docker-browsers/)
and [INRA-MIA GitLab](https://forgemia.inra.fr/urgi-is/docker-rare).
The image is based on `openjdk:8` and adds everything needed to run the tests
(i.e. a Chrome binary, run headless in `--no-sandbox` mode).

We install `node` and `yarn` in `/tmp` (this is not the case for local builds)
to avoid symbolic link issues on Docker.

You can approximate what runs on CI by executing:

```sh
docker run --rm -v "$PWD":/home/rare -w /home/rare urgi/docker-browsers ./gradlew build
```

Alternatively, run a gitlab-runner as GitLab CI would do (minus the environment variables and caching system):

```sh
gitlab-runner exec docker test
```

## Documentation

API documentation describing most of the web services can be generated using the
build task `asciidoctor`, which executes tests and generates documentation based on snippets generated
by these tests. The documentation is generated at `backend/build/docs/asciidoc/index.html`:

```sh
./gradlew asciidoctor
```

## Harvest

First of all, if you cloned the repository without fetching the data (see the [Data handling](#data-handling) section), make sure to fetch it before running any indexing script.

### TL;DR

Data (from the master branch) is indexed into your local Elasticsearch using the loader Docker image. Your local Elasticsearch instance must already be running (started with `docker compose up`). The command below passes `--help` to list the loader's options:

```sh
docker run -t --volume $(pwd)/data:/opt/data/ --volume /tmp/bulk:/tmp/bulk/ --network=container:elasticsearch registry.forgemia.inra.fr/urgi-is/docker-rare/data-discovery-loader:latest --help
```

Example for indexing RARe data:

```sh
docker run -t --volume $(pwd)/data:/opt/data/ --volume /tmp/bulk:/tmp/bulk/ --network=container:elasticsearch registry.forgemia.inra.fr/urgi-is/docker-rare/data-discovery-loader:latest -host elasticsearch -app rare -env dev
```

If you need to spread the load over several CPUs, repeat the value of the `-host` argument to simulate several Elasticsearch nodes, e.g. to use 4 CPUs:

```sh
docker run -t --volume $(pwd)/data:/opt/data/ --volume /tmp/bulk:/tmp/bulk/ --network=container:elasticsearch registry.forgemia.inra.fr/urgi-is/docker-rare/data-discovery-loader:latest -host "elasticsearch elasticsearch elasticsearch elasticsearch" -app rare -env dev
```
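
The repeated `-host` value can also be generated rather than typed by hand. The sketch below assumes nothing about the loader beyond the space-separated format shown above; `repeat_host` is a hypothetical helper name.

```sh
# Build a "-host" value that repeats one host name N times,
# separated by single spaces.
# `repeat_host` is a hypothetical helper, not part of the project.
repeat_host() {
  host="$1"; count="$2"
  out=""
  i=0
  while [ "$i" -lt "$count" ]; do
    out="${out:+$out }$host"   # add a space only between entries
    i=$((i + 1))
  done
  printf '%s\n' "$out"
}

repeat_host elasticsearch 4
```

The result can be passed as `-host "$(repeat_host elasticsearch 4)"`, or with `"$(nproc)"` (from GNU coreutils) as the count to match the number of available CPUs.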

If you have modified the indexing scripts, Elasticsearch mappings or settings, use your branch's `CI_COMMIT_REF_SLUG` as the Docker tag instead of `latest` (see <https://docs.gitlab.com/ee/ci/variables/predefined_variables.html>). For example, for branch `story/faidare-fusion`, use the Docker tag `story-faidare-fusion`:

```sh
docker run -t --volume $(pwd)/data:/opt/data/ --volume /tmp/bulk:/tmp/bulk/ --network=container:elasticsearch registry.forgemia.inra.fr/urgi-is/docker-rare/data-discovery-loader:story-faidare-fusion -host "elasticsearch elasticsearch elasticsearch elasticsearch" -app rare -env dev
```
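
To work out the tag for a given branch locally, the slug can be approximated with standard tools. This sketch follows GitLab's documented description of `CI_COMMIT_REF_SLUG` (lowercased, everything except `0-9` and `a-z` replaced with `-`, shortened to 63 bytes, no leading or trailing `-`); `ref_slug` is a hypothetical helper name, and branch names containing non-ASCII characters may not match GitLab's output exactly.

```sh
# Approximate GitLab's CI_COMMIT_REF_SLUG for a branch name.
# `ref_slug` is a hypothetical helper, not part of the project.
ref_slug() {
  printf '%s\n' "$1" \
    | LC_ALL=C tr '[:upper:]' '[:lower:]' \
    | LC_ALL=C sed -e 's/[^a-z0-9]/-/g' \
    | cut -c1-63 \
    | sed -e 's/^-*//' -e 's/-*$//'
}

ref_slug "story/faidare-fusion"   # prints: story-faidare-fusion
```

The printed value can then be used as the image tag in place of `latest`.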

If you need to test the Docker loader with your local changes, look at the job named `build-loader-docker-image` in the `.gitlab-ci.yml` file at the root of the project to see how to build the image with your own Docker tag.

Output logs should be available in the `/tmp/bulk/rare-dev` directory.

### Portability

#### Docker

The [TL;DR](#tldr) section above expects a Docker image to be available on the ForgeMIA Docker registry. You can build and push such an image as follows:

```sh
# build the image
docker build -t registry.forgemia.inra.fr/urgi-is/docker-rare/data-discovery-loader:latest .

# Login before pushing the image
docker login registry.forgemia.inra.fr/urgi-is/docker-rare -u <your ForgeMIA username>

# push the built image
docker push registry.forgemia.inra.fr/urgi-is/docker-rare/data-discovery-loader:latest
```

This makes it easier to index data without having to craft a dedicated environment. Setting up such an environment is explained below.

#### UNIX/BSD

Feedback on portability to macOS and to GNU/Linux distributions is very welcome.

On macOS, take care to use the latest GNU Parallel and Bash v4 versions, not the versions provided by default via Brew.
Don't use zsh!

Install the following packages to be able to run the scripts:

```sh
brew install gnu-sed coreutils parallel
```

Harvesting (i.e. importing JSON documents into Elasticsearch) involves creating the necessary index, aliases and Elasticsearch templates.

To create the index and its aliases, execute the script below for a local dev environment:

```sh
# -app accepts one of: rare, brc4env, wheatis, data-discovery
./scripts/index.sh -app rare --local
```

The `-app` parameter triggers a harvest of the resources stored in the Git LFS subdirectories `data/rare` and `data/faidare`, filtered or not (`wheatis` and `brc4env` rely on `faidare` and `rare` data respectively).

## Indices and aliases

The application uses two physical indices:

- one to store physical resources, containing the main content
- one to store suggestions, used only for the search type-ahead feature

Both indices must be created explicitly before using the application. Otherwise, requests to the web services will return errors.

Each index and alias below refers to the `rare` application in the `dev` environment. The equivalent must be created for the `wheatis` and `data-discovery` apps in the `dev` environment, as well as in the `beta`, `staging` or `prod` environments. For brevity, only `rare-dev` is explained here.
{: .alert .alert-info}

The application doesn't use the physical resources index directly. Instead, it uses two aliases, which must be created before using the application:

- `rare-dev-resource-alias` is the alias used by the application to store and search for documents
- `rare-dev-suggestions-alias` is the alias used by the application to store and search for suggestions; it is used only by the completion service.

In normal operation, these two aliases should refer to physical indices whose names contain a timestamp, such as `rare-dev-tmstp1579877133-suggestions` and `rare-dev-tmstp1579877133-resource-index`. These timestamps allow reindexing data without breaking another running application that uses another timestamp. Since the alias switch is done atomically, data is always visible in the web interface.

Using two aliases is useful when deleting obsolete documents. This is actually done by removing everything and then harvesting the new JSON files again, to re-populate the index from scratch.
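
As an illustration of such an atomic switch, Elasticsearch's `_aliases` endpoint accepts a `remove` and an `add` action in a single request. The sketch below only builds the request body; `alias_switch_payload` is a hypothetical helper, the second timestamp is made up, and the project's indexing scripts may perform the switch differently.

```sh
# Print the JSON body for Elasticsearch's `_aliases` endpoint that moves
# an alias from one physical index to another in a single request.
# `alias_switch_payload` is a hypothetical helper, not part of the project.
alias_switch_payload() {
  alias_name="$1"; old_index="$2"; new_index="$3"
  cat <<EOF
{
  "actions": [
    { "remove": { "index": "$old_index", "alias": "$alias_name" } },
    { "add":    { "index": "$new_index", "alias": "$alias_name" } }
  ]
}
EOF
}

# Index names with hypothetical timestamps, for illustration only.
alias_switch_payload rare-dev-resource-alias \
  rare-dev-tmstp1579877133-resource-index \
  rare-dev-tmstp1579880000-resource-index
```

The output could then be piped to `curl -X POST -H 'Content-Type: application/json' --data-binary @- http://127.0.0.1:9200/_aliases` against the local instance.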

## Spring Cloud config

On bootstrap, the application tries to connect to a remote Spring Cloud config server to fetch its configuration. The details of this remote server are set in the `application.yml` file. By default, it tries to connect to a local server on <http://localhost:8888>, but this can be changed, or configured via the `SPRING_CONFIG_URI` environment variable.

It tries to fetch the configuration for the application name specified in the profile-specific `spring.cloud.config.name` property.
If no such configuration is found, it falls back to the local `application.yml` properties.

To avoid having to run the Spring Cloud config server during development,
all the properties are still available in `application.yml`, even if they are also configured on the remote Spring Cloud server.

If you want to use the Spring Cloud config app locally, see <https://forgemia.inra.fr/urgi-is/data-discovery-config>

The configuration is currently only read on startup, meaning the application has to be restarted if the configuration is changed on the Spring Cloud server. For a dynamic reload without restarting the application, see <http://cloud.spring.io/spring-cloud-static/Finchley.SR1/single/spring-cloud.html#refresh-scope>
to check what has to be changed.

To test a configuration from the config server, you can use a dedicated branch of the `data-discovery-config` project and append the `--spring.cloud.config.label=<branch name to test>` parameter when starting the application's executable jar, or use the corresponding Spring [environment variable](https://docs.spring.io/spring-boot/docs/current/reference/html/spring-boot-features.html#boot-features-external-config-relaxed-binding-from-environment-variables) (i.e. `SPRING_CLOUD_CONFIG_LABEL`). More info on [how to pass a parameter to a Spring Boot app](https://docs.spring.io/spring-boot/docs/current/reference/html/boot-features-external-config.html#boot-features-external-config).
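
As a sketch of the first option, the helper below starts a jar with a given config label. `start_with_label` and the `DRY_RUN` switch are hypothetical conveniences, and `my-test-branch` stands for whatever branch you created in `data-discovery-config`.

```sh
# Start an executable jar against a given Spring Cloud config branch.
# With DRY_RUN=1 the command is only printed, not executed.
# `start_with_label` is a hypothetical helper, not part of the project.
start_with_label() {
  jar="$1"; label="$2"
  set -- java -jar "$jar" "--spring.cloud.config.label=$label"
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Show the command for a hypothetical branch without running it.
DRY_RUN=1 start_with_label backend/build/libs/rare.jar my-test-branch
```

The environment variable form would be `SPRING_CLOUD_CONFIG_LABEL=my-test-branch java -jar backend/build/libs/rare.jar`.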

## Building other apps

By default, the built application is RARe (without basket, i.e. without the possibility to add accessions to a basket and create an accession order in the rare-basket application). This project also allows building other applications (RARe with basket and WheatIS for the moment, but more could come).

To build a different app, specify an `app` property when building. For example, to assemble
the WheatIS app, run the following command:

```sh
./gradlew assemble -Papp=wheatis
```

You can also run the backend WheatIS application using:

```sh
./gradlew bootRun -Papp=wheatis
```

To assemble the RARe app with support for adding accessions to a basket, run the following command:

```sh
./gradlew assemble -Papp=brc4env
```

You can also run the backend RARe application with basket support using:

```sh
./gradlew bootRun -Papp=brc4env
```

Adding this property has the following consequences:

- the generated jar file (in `backend/build/libs`) is named `wheatis.jar` (resp. `brc4env.jar`) instead of `rare.jar`;
- the Spring active profile is `wheatis-app` (resp. `brc4env-app`) instead of `rare-app`;
- the frontend application built and embedded inside the jar file is the WheatIS frontend application (resp. the RARe application with basket support) instead of the RARe frontend application, i.e. the frontend command `yarn build:wheatis` (resp. `yarn build:brc4env`) is executed instead of the command `yarn build:rare`.

Since the active Spring profile is different, all the properties specific to this profile
are applied. In particular:

- the context path of the application is `/wheatis-dev` instead of `/rare-dev`;
- the Elasticsearch prefix used for the index aliases is different.

See the `backend/src/main/resources/application.yml` file for details.

You can change the Elasticsearch index prefix with the following parameter:

```sh
java -jar backend/build/libs/faidare.jar --data-discovery.elasticsearch-prefix="faidare-staging-"
```

For debugging:

```sh
java -jar backend/build/libs/faidare.jar --debug
```

## Configuration

The RARe and RARe with basket applications can be configured to apply implicit filtering on the searches,
aggregations, and pillar list. Currently, only one implicit filter can be added: a filter on the pillar name.

To activate it, add the following YAML configuration under the appropriate profile:

```yaml
rare:
  implicit-terms:
    PILLAR:
      - Pilier Forêt
      - Pilier Micro-organisme
```