Commit 7e8486ce authored and committed by Raphaël Flores

Add CONTRIBUTING, update documentation.

parent 0f66e031
# How to contribute to DataDiscovery
## Git management
### Branches
- One stable & protected [master](/) branch
- Feature branches for development following the pattern `[dev_type]/[dev_name]` (e.g. `chore/explaining_how_to_merge`) where `[dev_type]` can be:
  - fix (bug fixes)
  - feat (new feature)
  - style (style modification)
  - refactor (code refactoring)
  - chore (base maintenance such as version bump)
  - test (for anything related to tests)
### Commits & branches merge
- Commit names should follow the pattern `[dev_type]: [brief description of the commit, fewer than 50 characters]. [Issue KEY]` (see the example after the push output below)
- All branches must be merged via a `merge request` (MR)
- Merge requests should be created at the time of the branch creation so that reviewers can comment on and follow the developments; add the `WIP` tag to the MR name (to go further: *[feature highlight WIP](https://about.gitlab.com/2016/01/08/feature-highlight-wip/)*).
Example:
```sh
git checkout -b chore/explaining_how_to_merge
git push --set-upstream origin chore/explaining_how_to_merge
```
The push returns a link for creating the merge request easily:
```sh
Total 0 (delta 0), reused 0 (delta 0)
remote:
remote: To create a merge request for chore/explaining_how_to_merge, visit:
remote:   https://forgemia.inra.fr/urgi-is/faidare/merge_requests/new?merge_request%5Bsource_branch%5D=chore/explaining_how_to_merge
remote:
To forgemia.inra.fr:urgi-is/faidare.git
* [new branch]      chore/explaining_how_to_merge -> chore/explaining_how_to_merge
Branch 'chore/explaining_how_to_merge' set up to track remote branch 'chore/explaining_how_to_merge' from 'origin'.
```
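Following the commit naming pattern above, a commit on that branch might look like this (the issue key `#42` is purely illustrative):

```sh
git commit -m "chore: explain how to merge branches. #42"
```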
- A `git rebase` is strongly recommended before merging an MR (see the sketch after this list)
  - [Git rebase official documentation](https://git-scm.com/book/en/v2/Git-Branching-Rebasing)
  - [How to keep a clean history](https://about.gitlab.com/2018/06/07/keeping-git-commit-history-clean/)
- Merge requests should be reviewed by at least 1 colleague
- Continuous Integration is launched automatically by GitLab on each commit push or merge request creation.
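As a minimal sketch (assuming the feature branch is checked out and `master` is the target), a rebase before merging could look like:

```sh
git fetch origin
git rebase origin/master
# resolve conflicts if any, then update the merge request
git push --force-with-lease
```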
## Data management
- The Git LFS feature (see [how to install](https://git-lfs.github.com/)) is enabled on this project in order to store JSON data (see the sketch after this list)
- Another dedicated Git LFS project (internal only) will be created to handle all private + public JSON files
- **TODO**: refer to the right Git LFS project in the CI settings for indexing the relevant data into the relevant Elasticsearch indices/instances
- The JSON file generation is handled by an external ET (extract/transform) tool.
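As a minimal sketch of the Git LFS setup (the tracking pattern is an assumption; check the project's `.gitattributes` for the actual one):

```sh
git lfs install                  # one-time setup per machine
git lfs track "data/**/*.json"   # hypothetical pattern
git add .gitattributes
git commit -m "chore: track JSON data with Git LFS."
```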
## Development environment
- Look at the [README.md](README.md) for installation and execution instructions.
- Recommended IDEs are [Visual Studio Code](https://code.visualstudio.com/) or [Intellij IDEA](https://www.jetbrains.com/idea/)
- Use linting to apply code standards within the team:
  - Use `ng lint` (for frontend code only)
  - Use [Checkstyle](https://checkstyle.org/) and [PMD](https://pmd.github.io/) (**TODO**: implement) for backend (+frontend?) code
- All runtime variables should be externalized from the code in order to facilitate the CI management (database host/port, application name, public URL, JSON location...) and the adoption by partners (see the sketch below).
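As a hedged example of such externalization, the `SPRING_CONFIG_URI` environment variable described in the README can repoint the backend to another config server without any code change:

```sh
# hypothetical local run overriding an externalized setting
SPRING_CONFIG_URI=http://localhost:8888 java -jar backend/build/libs/rare.jar
```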
# Rare project - Data discovery
- [Rare project - Data discovery](#rare-project---data-discovery)
- [Contribute](#contribute)
- [Data](#data)
- [Code](#code)
- [Setup](#setup)
- [Backend](#backend)
- [Frontend](#frontend)
- [Build](#build)
- [CI](#ci)
- [Documentation](#documentation)
- [Harvest](#harvest)
- [Indices and aliases](#indices-and-aliases)
- [Spring Cloud config](#spring-cloud-config)
- [Building other apps](#building-other-apps)
## Contribute
If you want to install the program on-premise, that's cool, just keep reading at the [Setup section and beyond](#setup).
### Data
You might want to know how to contribute to the federation of data. That's great! Have a look at the [WheatIS/Plant guide](./HOW-TO-JOIN-WHEATIS-AND-PLANT-FEDERATIONS.md) or the [RARe guide](./HOW-TO-JOIN-RARe-FEDERATION.md) to know how to proceed.
### Code

If you do want to contribute to code, that's great also, have a look at [CONTRIBUTING.md](./CONTRIBUTING.md).
## Setup
### Backend
The project uses Spring (5.x) for the backend, with Spring Boot.
You need to install:
And this will start Elasticsearch and a Kibana instance (allowing to explore the data).
Then at the root of the application, run `./gradlew build` to download the dependencies. Then run `./gradlew bootRun` to start the app.
You can stop the Elasticsearch and Kibana instances by running:

    docker-compose stop

or run:

    docker-compose down
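As a quick sanity check (assuming Elasticsearch listens on its default local port), you can verify the node answers before starting the app:

```sh
curl -s http://localhost:9200/
```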
### Frontend

The project uses Angular (7.x) for the frontend, with the Angular CLI.
You need to install:
- a recent enough NodeJS (v10+ is required for Angular 7)
- Yarn as a package manager (see [here to install](https://yarnpkg.com/en/docs/install))
Then in the `frontend` directory, run `yarn` to download the dependencies. Then run `yarn start` to start the app, using the proxy conf to reroute calls to `/api` to the backend.

The application will be available on:
- <http://localhost:4000/rare-dev> for RARe (runs with: `yarn start:rare` or simply `yarn start`)
- <http://localhost:4100/wheatis-dev> for WheatIS (runs with: `yarn start:wheatis`)
- <http://localhost:4200/data-discovery-dev> for DataDiscovery (runs with: `yarn start:data-discovery`)
See [./frontend/package.json (scripts section)](./frontend/package.json) for other yarn commands.
## Build

To build the app, just run:

    ./gradlew assemble

or

    ./gradlew assemble -Papp=wheatis
This will build a standalone jar at `backend/build/libs/`, that you can run with either:

    java -jar backend/build/libs/rare.jar
    java -jar backend/build/libs/wheatis.jar
    java -jar backend/build/libs/data-discovery.jar
And the full app will run on:

- <http://localhost:8080/rare-dev>
- <http://localhost:8180/wheatis-dev>
- <http://localhost:8280/data-discovery-dev>
## CI
Or also run a gitlab-runner as GitLab-CI would do (minus the environment variables).

## Documentation
An API documentation describing most of the webservices can be generated using the build task `asciidoctor`, which executes tests and generates documentation based on snippets generated by these tests. The documentation is generated at `backend/build/asciidoc/html5/index.html`:

    ./gradlew asciidoctor
## Harvest
Harvesting (i.e. importing documents stored in JSON files into Elasticsearch) consists in creating the necessary index, aliases and Elasticsearch templates, then indexing the data.

To create the index and its aliases, execute the script below for the local dev environment:

    ./scripts/index.sh -app rare|wheatis|data-discovery --local
The `-app` parameter will trigger a harvest of the resources stored in the Git LFS directories `data/rare`, `data/wheatis` and `data/data-discovery` respectively.
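For instance, to create the indices, aliases and templates and harvest the RARe data on a local development machine:

```sh
./scripts/index.sh -app rare --local
```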
## Indices and aliases
The application uses several physical indices:

- one to store physical resources, containing the main content
- one to store suggestions, used for the search type-ahead feature only
Both indices must be created explicitly before using the application. If not, requests to the web services will return errors.
Each index and alias below refers to the `rare` application in the `dev` environment; the equivalent shall be created for the `wheatis` and `data-discovery` applications, in `dev` as well as in the `beta`, `staging` or `prod` environments. For brevity, only `rare-dev` is explained here.
{: .alert .alert-info}
The application doesn't use the physical resources index directly. Instead, it uses two aliases, that must be created before using the application:

- `rare-dev-resource-alias` is the alias used by the application to store and search for documents
- `rare-dev-suggestions-alias` is the alias used by the application to store and search for suggestions, only used for the completion service.
In normal operations, these two aliases should refer to timestamped physical indices such as `rare-dev-tmstp1579877133-suggestions` and `rare-dev-tmstp1579877133-resource-index`. Those timestamps allow reindexing data without breaking another running application bound to another timestamp. Since the alias switch is done atomically, data always remains visible in the web interface.
Using two aliases is useful when deleting obsolete documents. This is actually done by removing everything and then harvesting the new JSON files again, to re-populate the index from scratch.
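As a sketch of the atomic alias switch described above, using the standard Elasticsearch `_aliases` API (both timestamped index names are hypothetical):

```sh
curl -X POST "localhost:9200/_aliases" -H 'Content-Type: application/json' -d'
{
  "actions": [
    { "remove": { "index": "rare-dev-tmstp1579877133-resource-index", "alias": "rare-dev-resource-alias" } },
    { "add":    { "index": "rare-dev-tmstp1579999999-resource-index", "alias": "rare-dev-resource-alias" } }
  ]
}'
```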
## Spring Cloud config
On bootstrap, the application will try to connect to a remote Spring Cloud config server to fetch its configuration. The details of this remote server are filled in the `bootstrap.yml` file. By default, it tries to connect to the local server on <http://localhost:8888> but it can of course be changed, or even configured via the `SPRING_CONFIG_URI` environment variable.
It will try to fetch the configuration for the application name `rare`, and the default profile. If such a configuration is not found, it will then fall back to the local `application.yml` properties.
To avoid running the Spring Cloud config server every time while developing the application, all the properties are still available in `application.yml` even if they are configured on the remote Spring Cloud server as well.
If you want to use the Spring Cloud config app locally, see <https://forgemia.inra.fr/urgi-is/data-discovery-config>
The configuration is currently only read on startup, meaning the application has to be restarted if the configuration is changed on the Spring Cloud server. For a dynamic reload without restarting the application, see <http://cloud.spring.io/spring-cloud-static/Finchley.SR1/single/spring-cloud.html#refresh-scope> to check what has to be changed.
In case of testing configuration from the config server, one may use a dedicated branch on the `data-discovery-config` project and append the `--spring.cloud.config.label=<branch name to test>` parameter when starting the application's executable jar. More info on [how to pass a parameter to a Spring Boot app](https://docs.spring.io/spring-boot/docs/current/reference/html/boot-features-external-config.html#boot-features-external-config).
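For example (the branch name is hypothetical):

```sh
java -jar backend/build/libs/rare.jar --spring.cloud.config.label=chore/test_my_config
```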
## Building other apps
You can also run the backend WheatIS application using the `-Papp=wheatis` property.
Adding this property has the following consequences:
- the generated jar file (in `backend/build/libs`) is named `wheatis.jar` instead of `rare.jar`;
- the Spring active profile in `bootstrap.yml` is `wheatis-app` instead of `rare-app`;
- the frontend application built and embedded inside the jar file is the WheatIS frontend application instead of the RARe frontend application, i.e. the frontend command `yarn build:wheatis` is executed instead of the command `yarn build:rare`.
Since the active Spring profile is different, all the properties specific to this profile are applied. In particular:
- the context path of the application is `/wheatis-dev` instead of `/rare-dev`;
- the Elasticsearch prefix used for the index aliases is different.

See the `backend/src/main/resources/application.yml` file for details.
You can adapt the Elasticsearch index prefix used with the following parameter:

    java -jar backend/build/libs/data-discovery.jar --data-discovery.elasticsearch-prefix="data-discovery-staging-"
```sh
DESCRIPTION:
    Wrapper script used to create index and aliases then index data for Data Discovery portals (RARe, WheatIS and DataDiscovery)
USAGE:
    $0 [-host <ES host> -port <ES port> -app <application name> -env <environment name>] [--local] [-h|--help]
PARAMS:
    -host          the hostname or IP of the Elasticsearch node (default: $ES_HOST), can contain several hosts (space separated, between quotes) if you want to spread the indexing load on several hosts
    -port          the port of the targeted Elasticsearch endpoint ($ES_PORT by default)
    -app           the name of the targeted application: rare, wheatis or data-discovery
    -env           the environment name of the targeted application (dev, beta, prod ...)
    --local        use the local environment for the rare application (by default) and ignore all other options, except -app if provided at an earlier position
    --clean        clean the previously existing indices and rollover alias
    -h or --help   print this help
```
```sh
while [ -n "$1" ]; do
    case $1 in
        # ... (earlier options elided in this excerpt)
        -port) ES_PORT=$2;shift 2;;
        -app) APP_NAME=$2;shift 2;;
        -env) APP_ENV=$2;shift 2;;
        --local) [ -z "$APP_NAME" ] && APP_NAME="rare"; APP_ENV="dev"; ES_HOSTS="localhost"; ES_HOST="localhost"; ES_PORT="9200"; shift; break;;
        --clean) CLEAN=1;shift 1;;
        --) shift;break;;
        -*) echo -e "${RED_BOLD}Unknown option: $1 ${NC}\n" && help && echo; exit 1;;
    esac
done
```