# Rare project - Data discovery

- [Rare project - Data discovery](#rare-project---data-discovery)
  - [Contribute](#contribute)
    - [Data](#data)
    - [Code](#code)
  - [Setup](#setup)
    - [Requirements](#requirements)
    - [Data handling](#data-handling)
    - [Backend](#backend)
    - [Frontend](#frontend)
  - [Build](#build)
  - [CI](#ci)
  - [Documentation](#documentation)
  - [Harvest](#harvest)
    - [Portability](#portability)
  - [Indices and aliases](#indices-and-aliases)
  - [Spring Cloud config](#spring-cloud-config)
  - [Building other apps](#building-other-apps)
  - [Configuration](#configuration)

## Contribute

If you only want to install the program locally, skip ahead to the [Setup section and beyond](#setup).

### Data

You might want to know how to contribute data to the federation. If so, have a look at the [WheatIS/Plant guide](./HOW-TO-JOIN-WHEATIS-AND-PLANT-FEDERATIONS.md) or the [RARe guide](./HOW-TO-JOIN-RARe-FEDERATION.md).

### Code

If you want to contribute to the code, that's great too: have a look at [CONTRIBUTING.md](./CONTRIBUTING.MD).

## Setup

### Requirements

The application itself runs on OpenJDK 8+: <https://openjdk.java.net/install/>

To fetch the data, you need to install Git LFS: <https://git-lfs.github.com/>

The indexing process depends on the following tools, which must be installed and available in your `PATH` variable:

- JQ 1.6+: <https://stedolan.github.io/jq/>
- GNU Parallel (recent enough version): <https://www.gnu.org/software/parallel/>
- GNU coreutils (sed, date...): <https://www.gnu.org/software/coreutils/>
- GNU GZIP: <https://git.savannah.gnu.org/cgit/gzip.git>
- GNU Bash v4+: <https://www.gnu.org/software/bash/>
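A quick way to confirm that the tools listed above are reachable from your `PATH` is a small check like the one below. This is only a sketch: the `check_tools` function is hypothetical and not part of the repository, the executable names (`git-lfs`, `jq`, `parallel`...) are assumed to be the usual ones, and it only tests for presence, not for the minimum versions.

```sh
# Hypothetical helper (not part of the repository): report which of the
# given command-line tools cannot be found in PATH.
check_tools() {
  missing=0
  for tool in "$@"; do
    if ! command -v "$tool" >/dev/null 2>&1; then
      echo "missing: $tool" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Executable names are assumptions based on the requirements listed above.
if check_tools java git-lfs jq parallel sed date gzip bash; then
  echo "All required tools were found in PATH."
else
  echo "Install the missing tools before running the indexing scripts." >&2
fi
```

The function returns a non-zero status as soon as one tool is missing, so it can also guard a longer setup script.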

### Data handling

At the moment, all data is located next to the code in the `data` directory. If you only want to look at the code, you can skip this directory at the `git clone` step by setting the variable `GIT_LFS_SKIP_SMUDGE=1`, i.e.:

```sh
GIT_LFS_SKIP_SMUDGE=1 git clone git@forgemia.inra.fr:urgi-is/data-discovery.git
```

Once the clone is done, if you want to fetch some of the data (for instance for RARe only), run:

```sh
$ git lfs pull -I data/rare/
Downloading LFS objects: 100% (16/16), 8.8 MB | 0 B/s
```

Git might ask you to enable additional settings, which is fine:

```sh
git config lfs.https://forgemia.inra.fr/urgi-is/data-discovery.git/info/lfs.locksverify true
```

### Backend

The project uses Spring (5.x) for the backend, with Spring Boot.

You need to install:

- a recent enough JDK8

The Docker images need a fair amount of resources,
so make sure you have at least 4 GB of RAM configured (Docker Desktop / Resources / Memory).

The application expects to connect to an Elasticsearch instance running on `http://127.0.0.1:9200`.
To have such an instance, simply run:

```sh
docker-compose up
```

This starts Elasticsearch and a Kibana instance (which lets you explore the data on <http://localhost:5601>).

Then at the root of the application, run `./gradlew build` to download the dependencies.
Then run `./gradlew bootRun` to start the app.

You can stop the Elasticsearch and Kibana instances by running:

```sh
docker-compose stop
```

or run the following command to also remove the stopped containers and any networks that were created:

```sh
docker-compose down
```

### Frontend

The project uses Angular (8.x) for the frontend, with the Angular CLI.

You need to install:

- a recent enough NodeJS (e.g. v12 LTS), which is required for Angular 8
- Yarn as a package manager (see [how to install it](https://yarnpkg.com/en/docs/install))

Then in the `frontend` directory, run `yarn` to download the dependencies.
Then run `yarn start` to start the app, using the proxy conf to reroute calls to `/api` to the backend.

The application will be available on:

- <http://localhost:4000/rare-dev> for RARe (runs with: `yarn start:rare` or simply `yarn start`)
- <http://localhost:4000/rare-dev> for RARe with basket (runs with: `yarn start:rare-with-basket`)
- <http://localhost:4100/wheatis-dev> for WheatIS (runs with: `yarn start:wheatis`)
- <http://localhost:4200/data-discovery-dev> for DataDiscovery (runs with: `yarn start:data-discovery`)

See [./frontend/package.json (scripts section)](./frontend/package.json) for other yarn commands.

## Build

To build the app, just run:

```sh
./gradlew assemble
```

or

```sh
./gradlew assemble -Papp=wheatis
```

or

```sh
./gradlew assemble -Papp=rare-with-basket
```

This builds a standalone jar in `backend/build/libs/`, which you can run with one of:

```sh
java -jar backend/build/libs/rare.jar
java -jar backend/build/libs/rare-with-basket.jar
java -jar backend/build/libs/wheatis.jar
java -jar backend/build/libs/data-discovery.jar
```

The full app is then running on:

- <http://localhost:8080/rare-dev>
- <http://localhost:8180/wheatis-dev>
- <http://localhost:8280/data-discovery-dev>
- <http://localhost:8580/rare-with-basket-dev>
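The jar names and URLs above follow a fixed mapping per application. As an illustration only, the sketch below captures that mapping in a small shell function. `app_url` is a hypothetical helper name, not part of the repository, and the ports and context paths are simply copied from the list above.

```sh
# Hypothetical helper (not part of the repository): print the local URL
# served by the standalone jar of a given application.
app_url() {
  case "$1" in
    rare)             echo "http://localhost:8080/rare-dev" ;;
    wheatis)          echo "http://localhost:8180/wheatis-dev" ;;
    data-discovery)   echo "http://localhost:8280/data-discovery-dev" ;;
    rare-with-basket) echo "http://localhost:8580/rare-with-basket-dev" ;;
    *) echo "unknown app: $1" >&2; return 1 ;;
  esac
}

# Print the command to run and the URL to open for one application.
app="wheatis"
echo "run:  java -jar backend/build/libs/${app}.jar"
echo "open: $(app_url "$app")"
```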

## CI

The `.gitlab-ci.yml` file describes how GitLab runs the CI jobs.

It uses a base docker image named `urgi/docker-browsers`
available on [DockerHub](https://hub.docker.com/r/urgi/docker-browsers/)
and [INRA-MIA Gitlab](https://forgemia.inra.fr/urgi-is/docker-rare).
The image is based on `openjdk:8` and adds everything needed to run the tests
(i.e. a Chrome binary with a headless Chrome in `--no-sandbox` mode).

We install `node` and `yarn` in `/tmp` (this is not the case for local builds)
to avoid symbolic link issues on Docker.

You can approximate what runs on CI by executing:

```sh
docker run --rm -v "$PWD":/home/rare -w /home/rare urgi/docker-browsers ./gradlew build
```

Or run a gitlab-runner as GitLab CI would (minus the environment variables and caching system):

```sh
gitlab-runner exec docker test
```

## Documentation

API documentation describing most of the web services can be generated using the
build task `asciidoctor`, which executes the tests and generates documentation based on snippets generated
by these tests. The documentation is generated in `backend/build/asciidoc/html5/index.html`.

```sh
./gradlew asciidoctor
```

## Harvest

First of all, if you cloned the repository without fetching the data (see the [Data handling](#data-handling) section), make sure to fetch it before running any indexing script.
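When a clone was made with `GIT_LFS_SKIP_SMUDGE=1`, the files under `data` are small Git LFS pointer files rather than the real content. Such pointer files begin with the line `version https://git-lfs.github.com/spec/v1`. The sketch below uses that fact to detect a file that has not been fetched yet. The `is_lfs_pointer` helper and the example path are hypothetical, not part of the repository.

```sh
# Hypothetical helper (not part of the repository): succeeds when the given
# file is still a Git LFS pointer, i.e. its content has not been downloaded.
is_lfs_pointer() {
  [ -f "$1" ] && head -n 1 "$1" | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Example: warn before indexing if a data file is only a pointer.
# The path below is an assumed example; adapt it to a real file under data/.
data_file="data/rare/example.json.gz"
if is_lfs_pointer "$data_file"; then
  echo "$data_file is an LFS pointer; run 'git lfs pull -I data/rare/' first." >&2
fi
```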

### Portability

Feedback on portability to macOS and other GNU/Linux distributions is very welcome.

On macOS, take care to use the latest GNU Parallel and Bash v4 versions, not the versions provided by default via Brew.
Don't use zsh!

Install the following packages to be able to run the scripts:

```sh
brew install gnu-sed coreutils parallel
```

Harvesting (i.e. importing JSON documents into Elasticsearch) consists of creating the necessary indices, aliases and Elasticsearch templates.

To create the index and its aliases in a local dev environment, execute the script below:

```sh
# Replace "rare" with "wheatis" or "data-discovery" to index another application.
./scripts/index.sh -app rare --local
```

The `-app` parameter triggers a harvest of the resources stored in the Git LFS directories `data/rare`, `data/wheatis` and `data/data-discovery` respectively.
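To index all three applications in one go, the command can be wrapped in a loop. The sketch below only prints the commands (a dry run), so it is safe to try. `harvest_commands` is a hypothetical helper name, and only the `-app` and `--local` options shown above are assumed.

```sh
# Hypothetical helper (not part of the repository): print one indexing
# command per application, using only the options documented above.
harvest_commands() {
  for app in rare wheatis data-discovery; do
    echo "./scripts/index.sh -app $app --local"
  done
}

harvest_commands
# To actually run them from the repository root, stopping at the first failure:
#   harvest_commands | while read -r cmd; do sh -c "$cmd" || break; done
```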

## Indices and aliases

The application uses two physical indices:

- one to store physical resources, containing the main content
- one to store suggestions, used for the search type-ahead feature only

Both indices must be created explicitly before using the application. If not, requests to the web services will return errors.

Each index and alias below refers to the `rare` application in the `dev` environment. The equivalent must be created for the `wheatis` and `data-discovery` apps in the `dev` environment, as well as in the `beta`, `staging` and `prod` environments. For brevity, only `rare-dev` is explained here.
{: .alert .alert-info}

The application doesn't use the physical resources index directly. Instead, it uses two aliases, which must be created before using the application:

- `rare-dev-resource-alias` is the alias used by the application to store and search for documents
- `rare-dev-suggestions-alias` is the alias used by the application to store and search for suggestions, only used for completion service.

In normal operation, these two aliases should refer to physical indices whose names include a timestamp, such as `rare-dev-tmstp1579877133-suggestions` and `rare-dev-tmstp1579877133-resource-index`. These timestamps allow reindexing data without breaking a running application that uses indices with another timestamp. Since the alias switch is done atomically, data is always visible in the web interface.

Using two aliases is useful when deleting obsolete documents. This is actually done by removing everything and then harvesting the new JSON files again, to re-populate the index from scratch.
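The naming scheme and the atomic alias switch can be sketched as follows. This is an illustration, not the project's indexing script: the helper names are hypothetical, the old timestamp is just the example value from above, and the request body relies on the standard Elasticsearch `_aliases` API, which applies all listed actions atomically.

```sh
# Hypothetical helpers (not part of the repository's scripts).

# Build a physical index name such as rare-dev-tmstp1579877133-resource-index.
physical_index() {  # usage: physical_index <app> <env> <timestamp> <suffix>
  echo "$1-$2-tmstp$3-$4"
}

# Print an _aliases request body that moves an alias from one index to another.
alias_switch_body() {  # usage: alias_switch_body <alias> <old_index> <new_index>
  printf '{"actions":[{"remove":{"index":"%s","alias":"%s"}},{"add":{"index":"%s","alias":"%s"}}]}\n' \
    "$2" "$1" "$3" "$1"
}

old_index="$(physical_index rare dev 1579877133 resource-index)"
new_index="$(physical_index rare dev "$(date +%s)" resource-index)"
alias_switch_body rare-dev-resource-alias "$old_index" "$new_index"
# The body could then be sent to the local instance, for example:
#   curl -X POST -H 'Content-Type: application/json' \
#     http://127.0.0.1:9200/_aliases -d "$(alias_switch_body ...)"
```

Because both actions travel in a single request, the alias never points at zero indices, which is what keeps data visible during a reindex.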

## Spring Cloud config

On bootstrap, the application will try to connect to a remote Spring Cloud config server to fetch its configuration. The details of this remote server are filled in the `bootstrap.yml` file. By default, it tries to connect to the local server on <http://localhost:8888> but it can of course be changed, or even configured via the `SPRING_CONFIG_URI` environment variable.

It will try to fetch the configuration for the application name `rare`, and the default profile.
If such a configuration is not found, it will then fall back to the local `application.yml` properties.

To avoid running the Spring Cloud config server every time when developing the application,
all the properties are still available in `application.yml` even if they are configured on the remote Spring Cloud server as well.

If you want to use the Spring Cloud config app locally, see <https://forgemia.inra.fr/urgi-is/data-discovery-config>

The configuration is currently only read on startup, meaning the application has to be restarted if the configuration is changed on the Spring Cloud server. For a dynamic reload without restarting the application, see <http://cloud.spring.io/spring-cloud-static/Finchley.SR1/single/spring-cloud.html#refresh-scope>
to check what has to be changed.

To test configuration from the config server, you can use a dedicated branch of the `data-discovery-config` project and append the `--spring.cloud.config.label=<branch name to test>` parameter when starting the application's executable jar, or use the corresponding Spring [env variable](https://docs.spring.io/spring-boot/docs/current/reference/html/spring-boot-features.html#boot-features-external-config-relaxed-binding-from-environment-variables) (_i.e._ `SPRING_CLOUD_CONFIG_LABEL`). More info on [how to pass a parameter to a Spring Boot app](https://docs.spring.io/spring-boot/docs/current/reference/html/boot-features-external-config.html#boot-features-external-config).
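As a sketch of the environment-variable route, the wrapper below runs any command with the config server URI and branch label set for that command only. The `with_config` name and the `my-test-branch` label are hypothetical. The two variable names are the ones mentioned above.

```sh
# Hypothetical wrapper (not part of the repository): run a command with the
# Spring Cloud config URI and label set in that command's environment only.
with_config() {  # usage: with_config <uri> <label> <command> [args...]
  uri="$1"; label="$2"; shift 2
  SPRING_CONFIG_URI="$uri" SPRING_CLOUD_CONFIG_LABEL="$label" "$@"
}

# Example (assumes the rare jar has been built and a config server is running):
#   with_config http://localhost:8888 my-test-branch java -jar backend/build/libs/rare.jar
```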

## Building other apps

By default, the built application is RARe (without basket, i.e. without the possibility to add accessions to a basket and create an accession order on the rare-basket application). But this project actually allows building other applications (RARe with basket and WheatIS, for the moment, but more could come).

To build a different app, specify an `app` property when building. For example, to assemble
the WheatIS app, run the following command:

```sh
./gradlew assemble -Papp=wheatis
```

You can also run the backend WheatIS application using:

```sh
./gradlew bootRun -Papp=wheatis
```

To assemble the RARe app with support for adding accessions to a basket, run the following command:

```sh
./gradlew assemble -Papp=rare-with-basket
```

You can also run the backend RARe application with basket support using:

```sh
./gradlew bootRun -Papp=rare-with-basket
```

Adding this property has the following consequences:

- the generated jar file (in `backend/build/libs`) is named `wheatis.jar` (resp. `rare-with-basket.jar`) instead of `rare.jar`;
- the Spring active profile in `bootstrap.yml` is `wheatis-app` (resp. `rare-with-basket-app`) instead of `rare-app`;
- the frontend application built and embedded inside the jar file is the WheatIS frontend application (resp. the RARe application with basket support) instead of the RARe frontend application, i.e. the frontend command `yarn build:wheatis` (resp. `yarn build:rare-with-basket`) is executed instead of the command `yarn build:rare`.

Since the active Spring profile is different, all the properties specific to this profile
are applied. In particular:

- the context path of the application is `/wheatis-dev` instead of `/rare-dev`;
- the Elasticsearch prefix used for the index aliases is different.

See the `backend/src/main/resources/application.yml` file for details.

You can adapt the Elasticsearch index prefix used with the following parameter:

```sh
java -jar backend/build/libs/data-discovery.jar --data-discovery.elasticsearch-prefix="data-discovery-staging-"
```

For debugging:

```sh
java -jar backend/build/libs/data-discovery.jar --debug
```

## Configuration

The RARe and RARe with basket applications can be configured to apply implicit filtering to searches,
aggregations, and the pillar list. There is currently only one implicit filter that can be added, which is a filter on the pillar name.

To activate it, add the following YAML configuration under the appropriate profile:

```yaml
rare:
  implicit-terms:
    PILLAR:
      - Pilier Forêt
      - Pilier Micro-organisme
```