# Rare project - Data discovery

- [Rare project - Data discovery](#rare-project---data-discovery)
  - [Contribute](#contribute)
    - [Data](#data)
    - [Code](#code)
  - [Setup](#setup)
    - [Data handling](#data-handling)
    - [Backend](#backend)
    - [Frontend](#frontend)
  - [Build](#build)
  - [CI](#ci)
  - [Documentation](#documentation)
  - [Harvest](#harvest)
    - [Portability](#portability)
  - [Indices and aliases](#indices-and-aliases)
  - [Spring Cloud config](#spring-cloud-config)
  - [Building other apps](#building-other-apps)

## Contribute

If you want to install the program locally, just keep reading from the [Setup section](#setup) onwards.

### Data

If you want to contribute data to the federation, that's great: have a look at the [WheatIS/Plant guide](./HOW-TO-JOIN-WHEATIS-AND-PLANT-FEDERATIONS.md) or the [RARe guide](./HOW-TO-JOIN-RARe-FEDERATION.md) to learn how.

### Code

If you want to contribute code, that's great too: have a look at [CONTRIBUTING.md](./CONTRIBUTING.MD).

## Setup

### Data handling

At the moment, all data is stored next to the code, in the `data` directory, using Git LFS. If you only want to look at the code, you can skip downloading this directory when cloning by setting the `GIT_LFS_SKIP_SMUDGE=1` variable, i.e.:

```sh
GIT_LFS_SKIP_SMUDGE=1 git clone git@forgemia.inra.fr:urgi-is/data-discovery.git
```

Once the clone is done, you can fetch part of the data (for instance, RARe only) by running:

```sh
$ git lfs pull -I data/rare/
Downloading LFS objects: 100% (16/16), 8.8 MB | 0 B/s
```

### Backend

The project uses Spring (5.x) for the backend, with Spring Boot.

You need to install:

- a recent JDK 8
- Docker and Docker Compose, to run Elasticsearch and Kibana locally

The Docker images need quite a lot of resources,
so make sure you have at least 4 GB of RAM allocated to Docker (Docker Desktop / Resources / Memory).

The application expects to connect to an Elasticsearch instance running on `http://127.0.0.1:9200`.
To start such an instance, simply run:

```sh
docker-compose up
```

This starts Elasticsearch and a Kibana instance (the latter lets you explore the data on <http://localhost:5601>).
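
Elasticsearch usually needs some time before it accepts requests. If you start the containers in the background (e.g. `docker-compose up -d`), a small helper such as the hypothetical `wait_for_es` function below (not part of the repository) can poll the URL the application expects until it answers, before you start the backend. It is only a sketch: it assumes `curl` is installed, and the 30 attempts and 2-second delay are arbitrary defaults.

```sh
#!/usr/bin/env bash
# Hypothetical helper, not shipped with the project: poll an Elasticsearch URL
# until it answers with an HTTP success status, or give up after N attempts.
wait_for_es() {
  local url="${1:-http://127.0.0.1:9200}"  # URL the backend expects by default
  local attempts="${2:-30}"                # arbitrary default number of attempts
  local i
  for ((i = 1; i <= attempts; i++)); do
    # -f: fail on HTTP errors, -s: silent, -o: discard the response body
    if curl -fs -o /dev/null "$url"; then
      echo "Elasticsearch is up at $url"
      return 0
    fi
    # Wait before retrying, except after the last attempt
    if ((i < attempts)); then
      sleep 2
    fi
  done
  echo "Elasticsearch still unreachable at $url after $attempts attempts" >&2
  return 1
}
```

Once the function is defined in your shell, you could chain the steps as `docker-compose up -d && wait_for_es && ./gradlew bootRun`.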

Then at the root of the application, run `./gradlew build` to download the dependencies.
Then run `./gradlew bootRun` to start the app.

You can stop the Elasticsearch and Kibana instances by running:

```sh
docker-compose stop
```

or run:

```sh
docker-compose down
```

which also removes the stopped containers, along with any networks that were created.

### Frontend

The project uses Angular (8.x) for the frontend, with the Angular CLI.

You need to install:

- a recent enough NodeJS (i.e. v12 LTS), as required by Angular 8
- Yarn as package manager (see the [Yarn installation guide](https://yarnpkg.com/en/docs/install))

Then, in the `frontend` directory, run `yarn` to download the dependencies.
Then run `yarn start` to start the app; it uses the proxy configuration to reroute calls to `/api` to the backend.

The application will be available on:

- <http://localhost:4000/rare-dev> for RARe (runs with: `yarn start:rare` or simply `yarn start`)
- <http://localhost:4100/wheatis-dev> for WheatIS (runs with: `yarn start:wheatis`)
- <http://localhost:4200/data-discovery-dev> for DataDiscovery (runs with: `yarn start:data-discovery`)

See [./frontend/package.json (scripts section)](./frontend/package.json) for other yarn commands.

## Build

To build the app (RARe by default), just run:

```sh
./gradlew assemble
```

or

```sh
./gradlew assemble -Papp=wheatis
```

To build the WheatIS jar and start it right away, chain both steps:

```sh
./gradlew assemble -Papp=wheatis && java -jar backend/build/libs/wheatis.jar
```

This builds a standalone jar in `backend/build/libs/`, which you can run with one of:

```sh
java -jar backend/build/libs/rare.jar
java -jar backend/build/libs/wheatis.jar
java -jar backend/build/libs/data-discovery.jar
```

The full app then runs on, respectively:

- <http://localhost:8080/rare-dev>
- <http://localhost:8180/wheatis-dev>
- <http://localhost:8280/data-discovery-dev>

## CI

The `.gitlab-ci.yml` file describes how GitLab runs the CI jobs.

It uses a base Docker image named `urgi/docker-browsers`
available on [DockerHub](https://hub.docker.com/r/urgi/docker-browsers/)
and [INRA-MIA GitLab](https://forgemia.inra.fr/urgi-is/docker-rare).
The image is based on `openjdk:8` and adds everything needed to run the tests
(i.e. a Chrome binary with a headless Chrome in `--no-sandbox` mode).

We install `node` and `yarn` in `/tmp` (this is not the case for local builds)
to avoid symbolic link issues with Docker.

You can approximate what runs on CI by executing:

```sh
docker run --rm -v "$PWD":/home/rare -w /home/rare urgi/docker-browsers ./gradlew build
```

You can also run a `gitlab-runner` job as GitLab CI would (minus the environment variables and the caching system):

```sh
gitlab-runner exec docker test
```

## Documentation

An API documentation describing most of the web services can be generated using the
build task `asciidoctor`, which executes the tests and generates the documentation from the snippets produced
by these tests. The documentation is written to `backend/build/asciidoc/html5/index.html`. To generate it, run:

```sh
./gradlew asciidoctor
```

## Harvest

First of all, if you cloned the repository without fetching the data (see the [Data handling](#data-handling) section), make sure to fetch it before running any indexing script.
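
Files that were not fetched stay in the working tree as small Git LFS pointer files, whose first line is `version https://git-lfs.github.com/spec/v1`. The hypothetical `check_lfs_data` function below (not part of the repository) is a sketch that lists such pointer files in a data directory, so that they are not indexed by mistake; it assumes Bash and a `grep` supporting `-r`.

```sh
#!/usr/bin/env bash
# Hypothetical pre-flight check, not shipped with the project: fail if a data
# directory still contains Git LFS pointer files instead of the real content.
check_lfs_data() {
  local dir="${1:-}"
  if [[ -z "$dir" || ! -d "$dir" ]]; then
    echo "Usage: check_lfs_data <existing data directory>" >&2
    return 2
  fi
  local pointers
  # -r: recurse, -l: print file names only, -x: match whole lines, -F: fixed string
  pointers="$(grep -rlxF 'version https://git-lfs.github.com/spec/v1' "$dir" || true)"
  if [[ -n "$pointers" ]]; then
    echo "Git LFS pointers found, fetch the data first: git lfs pull -I ${dir%/}/" >&2
    printf '%s\n' "$pointers" >&2
    return 1
  fi
  echo "No Git LFS pointer file left in $dir"
}
```

For example: `check_lfs_data data/rare && ./scripts/index.sh -app rare --local`.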

### Portability

Feedback on portability to macOS and other GNU/Linux distributions is very welcome.

On macOS, make sure to use recent versions of GNU Parallel and Bash (v4 or later) rather than the versions provided by default; install them via Brew.
Install the following packages to be able to run the scripts:

```sh
brew install gnu-sed coreutils parallel
```

Harvesting (i.e. importing JSON documents into Elasticsearch) consists of creating the necessary index, aliases and Elasticsearch templates, and then importing the documents.

To create the index and its aliases for the local dev environment, run the script below for the application you need:

```sh
./scripts/index.sh -app rare --local
./scripts/index.sh -app wheat --local
./scripts/index.sh -app data-discovery --local
```

The `-app` parameter selects what gets harvested: `rare`, `wheat` and `data-discovery` harvest the resources stored in the Git LFS directories `data/rare`, `data/wheatis` and `data/data-discovery` respectively.

## Indices and aliases

The application uses two physical indices:

- one to store physical resources, containing the main content
- one to store suggestions, used for the search type-ahead feature only

Both indices must be created explicitly before using the application. Otherwise, requests to the web services return errors.

Each index and alias below refers to the `rare` application in the `dev` environment; equivalent ones must be created for the `wheatis` and `data-discovery` applications, and in the `beta`, `staging` and `prod` environments as well. For brevity, only `rare-dev` is explained here.
{: .alert .alert-info}

The application doesn't use the physical indices directly. Instead, it uses two aliases, which must be created before using the application:

- `rare-dev-resource-alias` is the alias used by the application to store and search for documents
- `rare-dev-suggestions-alias` is the alias used by the application to store and search for suggestions, only used by the completion service.

In normal operations, these two aliases should point to physical indices whose names contain a timestamp, such as `rare-dev-tmstp1579877133-suggestions` and `rare-dev-tmstp1579877133-resource-index`. These timestamps allow reindexing data without breaking another running application that uses a different timestamp. Since the alias switch is atomic, the web interface always shows data.

Using two aliases is also useful when deleting obsolete documents. This is actually done by removing everything and then harvesting the new JSON files again, to re-populate the index from scratch.
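
An atomic alias switch like this can be done with Elasticsearch's `_aliases` API, which applies a list of `remove` and `add` actions in a single atomic operation. The hypothetical `build_alias_switch` function below is only a sketch, not the project's harvesting script: it prints the request body that would move both `rare-dev` aliases from the indices of one timestamp to those of another, assuming the indices for both timestamps already exist. The second timestamp, `1579900000`, is made up for illustration.

```sh
#!/usr/bin/env bash
# Hypothetical helper, not part of scripts/: print the body of an Elasticsearch
# _aliases request moving both aliases of an app/environment prefix from the
# indices of an old timestamp to those of a new one, in one atomic request.
build_alias_switch() {
  local prefix="$1"   # e.g. rare-dev
  local old_ts="$2"   # timestamp currently behind the aliases
  local new_ts="$3"   # timestamp of the freshly harvested indices
  cat <<EOF
{
  "actions": [
    { "remove": { "index": "${prefix}-tmstp${old_ts}-resource-index", "alias": "${prefix}-resource-alias" } },
    { "add": { "index": "${prefix}-tmstp${new_ts}-resource-index", "alias": "${prefix}-resource-alias" } },
    { "remove": { "index": "${prefix}-tmstp${old_ts}-suggestions", "alias": "${prefix}-suggestions-alias" } },
    { "add": { "index": "${prefix}-tmstp${new_ts}-suggestions", "alias": "${prefix}-suggestions-alias" } }
  ]
}
EOF
}

# 1579900000 is a made-up "new" timestamp, for illustration only.
body="$(build_alias_switch rare-dev 1579877133 1579900000)"
printf '%s\n' "$body"
# Sending it requires a running Elasticsearch (not done here):
# curl -X POST 'http://127.0.0.1:9200/_aliases' -H 'Content-Type: application/json' -d "$body"
```

Because all four actions travel in one request, clients never see a state where an alias points to no index or to both timestamps at once.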

## Spring Cloud config

On bootstrap, the application tries to connect to a remote Spring Cloud config server to fetch its configuration. The details of this remote server are set in the `bootstrap.yml` file. By default, it tries to connect to a local server on <http://localhost:8888>, but this can of course be changed, for instance via the `SPRING_CONFIG_URI` environment variable.

It tries to fetch the configuration for the application name `rare` and the default profile.
If no such configuration is found, it falls back to the local `application.yml` properties.

To avoid running the Spring Cloud config server every time you develop the application,
all the properties are still available in `application.yml`, even if they are also configured on the remote Spring Cloud server.

If you want to use the Spring Cloud config app locally, see <https://forgemia.inra.fr/urgi-is/data-discovery-config>.

The configuration is currently only read on startup, meaning the application has to be restarted if the configuration is changed on the Spring Cloud server. For a dynamic reload without restarting the application, see <http://cloud.spring.io/spring-cloud-static/Finchley.SR1/single/spring-cloud.html#refresh-scope> to check what has to be changed.

To test a configuration from the config server, you can use a dedicated branch of the `data-discovery-config` project and append the `--spring.cloud.config.label=<branch name to test>` parameter when starting the application's executable jar. More info on [how to pass a parameter to a Spring Boot app](https://docs.spring.io/spring-boot/docs/current/reference/html/boot-features-external-config.html#boot-features-external-config).

## Building other apps

By default, the built application is RARe. But this project actually allows building other
applications (WheatIS and DataDiscovery for the moment, but more could come).

To build a different app, specify an `app` property when building. For example, to assemble
the WheatIS app, run the following command:

```sh
./gradlew assemble -Papp=wheatis
```

You can also run the WheatIS backend application with:

```sh
./gradlew bootRun -Papp=wheatis
```

Adding this property has the following consequences:

- the generated jar file (in `backend/build/libs`) is named `wheatis.jar` instead of `rare.jar`;
- the Spring active profile in `bootstrap.yml` is `wheatis-app` instead of `rare-app`;
- the frontend application built and embedded inside the jar file is the WheatIS frontend application instead of the RARe frontend application, i.e. the frontend command `yarn build:wheatis` is executed instead of the command `yarn build:rare`.

Since the active Spring profile is different, all the properties specific to this profile
are applied. In particular:

- the context path of the application is `/wheatis-dev` instead of `/rare-dev`;
- the Elasticsearch prefix used for the index aliases is different.

See the `backend/src/main/resources/application.yml` file for details.

You can adapt the Elasticsearch index prefix with the following parameter:

```sh
java -jar backend/build/libs/data-discovery.jar --data-discovery.elasticsearch-prefix="data-discovery-staging-"
```