# Rare project - Data discovery

## Setup

### Backend

The project uses Spring (5.x) for the backend,
with Spring Boot.

You need to install:

- a recent enough JDK8

The application expects to connect to an Elasticsearch instance running on `127.0.0.1:9300`,
in a cluster named `es-rare`.
To have such an instance, simply run:

    docker-compose up

This starts Elasticsearch and a Kibana instance (which lets you explore the data at http://localhost:5601).
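
You may prefer to run the containers in the background with `docker-compose up -d`. In that case, the backend can be started before Elasticsearch accepts connections. The bash function below is a minimal sketch of one way to wait for it. `wait_for_port` is a hypothetical helper, not part of the project. It polls a TCP port (here the transport port 9300 mentioned above) through bash's `/dev/tcp` pseudo-device and gives up after a timeout.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the project): wait until host:port accepts
# TCP connections, checking once per second. Returns 0 as soon as the port is
# open, or 1 once `timeout` seconds (default 60) have elapsed.
wait_for_port() {
  local host="$1" port="$2" timeout="${3:-60}"
  local waited=0
  # Opening /dev/tcp/HOST/PORT in a subshell attempts a TCP connection:
  # the subshell fails if nothing listens, and the descriptor is closed
  # when the subshell exits.
  until (exec 3<>"/dev/tcp/${host}/${port}") 2>/dev/null; do
    if [ "$waited" -ge "$timeout" ]; then
      echo "Timed out waiting for ${host}:${port}" >&2
      return 1
    fi
    sleep 1
    waited=$((waited + 1))
  done
  return 0
}
```

For example:

    docker-compose up -d
    wait_for_port 127.0.0.1 9300 120 && ./gradlew bootRun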

Then at the root of the application, run `./gradlew build` to download the dependencies.
Then run `./gradlew bootRun` to start the app.

You can stop the Elasticsearch and Kibana instances by running:

    docker-compose stop

### Frontend

The project uses Angular (6.x) for the frontend,
with the Angular CLI.

You need to install:

- a recent enough NodeJS (8.11+)
- Yarn as a package manager (see [here to install](https://yarnpkg.com/en/docs/install))

Then, in the `frontend` directory, run `yarn` to download the dependencies.
Then run `yarn start` to start the app; it uses the proxy configuration to forward calls to `/api` to the backend.

The application will be available at http://localhost:4200.

## Build

To build the app, just run:

    ./gradlew assemble

This builds a standalone jar at `backend/build/libs/rare.jar`, which you can run with:

    java -jar backend/build/libs/rare.jar

The full application then runs on http://localhost:8080.


## CI

The `.gitlab-ci.yml` file describes how GitLab runs the CI jobs.

It uses a base docker image named `ninjasquad/docker-rare`
available on [DockerHub](https://hub.docker.com/r/ninjasquad/docker-rare/)
and [Github](https://github.com/Ninja-Squad/docker-rare).
The image is based on `openjdk:8` and adds a Chrome binary to let us run the frontend tests
(with a headless Chrome in `--no-sandbox` mode).

We install `node` and `yarn` in `/tmp` (this is not the case for local builds)
to avoid symbolic link issues with Docker.

You can approximate what runs on CI by executing:

    docker run --rm -v "$PWD":/home/rare -w /home/rare ninjasquad/docker-rare ./gradlew build

## Harvest

Harvesting (i.e. importing genetic resources stored in JSON files into Elasticsearch) consists of
placing the JSON files into a directory where the server can find them.

By default, this directory is `/tmp/rare/resources`, but it's externalized into the Spring Boot property
`rare.resource-dir`, so it can easily be changed by modifying the value of this property (using an
environment variable, for example).
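
Spring Boot 2 (the version that goes with Spring 5.x) binds environment variables to properties using documented relaxed binding rules. Dots become underscores, dashes are removed, and the name is upper-cased. The sketch below assumes the backend reads `rare.resource-dir` in a way that supports this binding. `property_to_env_var` is a hypothetical bash helper, not part of the project, that derives the variable name. A `--rare.resource-dir=...` command-line argument is a more direct alternative. The directory `/data/rare/resources` is only an example.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the project): derive the environment
# variable name that Spring Boot 2 binds to a property name
# (dots -> underscores, dashes removed, upper-cased).
property_to_env_var() {
  local name="$1"
  name="${name//./_}"   # rare.resource-dir -> rare_resource-dir
  name="${name//-/}"    # rare_resource-dir -> rare_resourcedir
  printf '%s\n' "$name" | tr '[:lower:]' '[:upper:]'
}

# Example usage (the directory is only an example):
#   RARE_RESOURCEDIR=/data/rare/resources java -jar backend/build/libs/rare.jar
# or, with a command-line argument:
#   java -jar backend/build/libs/rare.jar --rare.resource-dir=/data/rare/resources
```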

The files must have the extension `.json`, and must be stored in that directory (not in a sub-directory).
Once the files are ready and the server is started, the harvest is triggered by sending a POST request
to the endpoint `/api/harvests`, without any request body.
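
Before triggering a harvest, it can help to check which files the server should pick up. The bash function below sketches the rule above. `list_harvest_files` is a hypothetical helper, not part of the project. It lists the regular files with the `.json` extension that sit directly in the given directory (by default `/tmp/rare/resources`) and ignores sub-directories. It assumes the extension is matched case-sensitively, which may differ from what the server does.

```shell
#!/usr/bin/env bash

# Hypothetical helper (not part of the project): list the files a harvest is
# expected to import, i.e. regular *.json files directly inside the resource
# directory (sub-directories are not searched), sorted by name.
list_harvest_files() {
  local dir="${1:-/tmp/rare/resources}"
  find "$dir" -maxdepth 1 -type f -name '*.json' | sort
}
```

For example, `list_harvest_files` checks the default directory, and `list_harvest_files /some/other/dir` checks another one.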

This endpoint, like the actuator endpoints, is only accessible to an authenticated user. The user (`rare`) and its password (`f01a7031fc17`) are configured in the `application.yml` file, and can thus be overridden (using environment variables, for example).

Example with the `http` command ([HTTPie](https://httpie.org/)):

    http --auth rare:f01a7031fc17 POST http://localhost:8080/api/harvests
    
Example with the `curl` command:

    curl -i -X POST -u rare:f01a7031fc17 http://localhost:8080/api/harvests
    
The harvest job is executed asynchronously: a response is sent back immediately, and its `Location` header
contains the URL allowing you to get the result of the job. For example:

    HTTP/1.1 201 
    Content-Length: 0
    Date: Tue, 24 Jul 2018 12:58:04 GMT
    Location: http://localhost:8080/api/harvests/abb5784d-3006-48fb-b5db-d3ff9583e8b9
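
When scripting the harvest, you can read the job URL from that `Location` header. The sketch below assumes `curl` and the default credentials above. `harvest_location` and `start_harvest` are hypothetical helpers, not part of the project. Header names are matched case-insensitively, and the carriage returns that end HTTP header lines are stripped.

```shell
#!/usr/bin/env bash

# Hypothetical helper: read raw HTTP response headers on stdin (for example
# the output of `curl -i`) and print the value of the Location header.
harvest_location() {
  # HTTP header lines end with "\r\n", and header names are case-insensitive.
  tr -d '\r' | awk 'tolower($1) == "location:" { print $2; exit }'
}

# Hypothetical helper: trigger a harvest on the local server with the default
# credentials, and print the URL of the created job.
start_harvest() {
  curl -s -i -X POST -u rare:f01a7031fc17 http://localhost:8080/api/harvests \
    | harvest_location
}
```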
    
To get the result of the job, you can then send a GET request to the returned URL:

    http --auth rare:f01a7031fc17 GET http://localhost:8080/api/harvests/abb5784d-3006-48fb-b5db-d3ff9583e8b9

or

    curl -u rare:f01a7031fc17 http://localhost:8080/api/harvests/abb5784d-3006-48fb-b5db-d3ff9583e8b9
    
`http` has the advantage of nicely formatting the returned JSON.

The response is a detailed report containing the start instant and the list of files
that have been processed, each with the number of successfully imported resources and the errors
that occurred, if any.

The job is complete only when the `endInstant` property of the returned JSON is non-null. For example:

```json
{
    "endInstant": "2018-07-24T12:56:28.077Z",
    "files": [
        {
            "errorCount": 0,
            "errors": [],
            "fileName": "rare_pilier_microbial.json",
            "successCount": 10
        },
        {
            "errorCount": 2,
            "errors": [
                {
                    "column": 4,
                    "error": "Error while parsing object: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: fr.inra.urgi.rare.domain.GeneticResource[\"name\"])",
                    "index": 4790,
                    "line": 105594
                },
                {
                    "column": 4,
                    "error": "Error while parsing object: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: fr.inra.urgi.rare.domain.GeneticResource[\"countryOfCollect\"])",
                    "index": 5905,
                    "line": 130127
                }
            ],
            "fileName": "rare_pilier_plant.json",
            "successCount": 14522
        }
    ],
    "globalErrors": [],
    "id": "55e70557-79e8-4e40-a44b-2ef4b3df076a",
    "startInstant": "2018-07-24T12:56:27.322Z"
}
```
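
To wait for a harvest from a script, you can poll the report until `endInstant` is non-null. The sketch below assumes `curl`, the default credentials above, and a report shaped like the one shown. `is_harvest_complete` and `wait_for_harvest` are hypothetical helpers, not part of the project. To avoid depending on a JSON processor, `is_harvest_complete` only checks, with a regular expression, that `endInstant` has a string value. This is a rough heuristic rather than real JSON parsing.

```shell
#!/usr/bin/env bash

# Hypothetical helper: succeed if the harvest report read on stdin has a
# string (i.e. non-null) "endInstant", whatever the whitespace around the
# colon. A null or missing endInstant means the job is still running.
# This is a regex heuristic, not a JSON parser.
is_harvest_complete() {
  grep -Eq '"endInstant"[[:space:]]*:[[:space:]]*"'
}

# Hypothetical helper: poll the given harvest URL every `interval` seconds
# (default 2) until the job is complete, then print the final report.
# Returns 1 if a request fails (curl -f also fails on HTTP errors).
wait_for_harvest() {
  local url="$1" interval="${2:-2}" report
  while true; do
    report="$(curl -s -f -u rare:f01a7031fc17 "$url")" || return 1
    if printf '%s\n' "$report" | is_harvest_complete; then
      printf '%s\n' "$report"
      return 0
    fi
    sleep "$interval"
  done
}
```

For example:

    wait_for_harvest http://localhost:8080/api/harvests/abb5784d-3006-48fb-b5db-d3ff9583e8b9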

If you lost the response to the POST request and thus don't know the URL of the harvest,
you can list the harvests, in descending order of their start instant, by sending a GET request to
`/api/harvests`:

    http --auth rare:f01a7031fc17 GET http://localhost:8080/api/harvests

or

    curl -u rare:f01a7031fc17 http://localhost:8080/api/harvests