Commit b11527b6 (verified) authored by Raphaël Flores

Update harvesting scripts to be usable with new index.sh wrapper.

parent 18a4edbd
# How to contribute to DataDiscovery
- [How to contribute to DataDiscovery](#how-to-contribute-to-datadiscovery)
- [Git management](#git-management)
- [Branches](#branches)
- [Commits & branches merge](#commits--branches-merge)
- [Data management](#data-management)
- [Development environment](#development-environment)
## Git management
### Branches
......@@ -58,8 +65,7 @@
- Git LFS is used to store, alongside the application, the JSON data ready to be loaded into Elasticsearch. If you clone the repository without installing Git LFS, the data will not be cloned. To get the JSON data, you must install Git LFS (see [how to install](https://git-lfs.github.com/)).
- Another dedicated Git LFS project (internal only) will be created to handle all private + public JSON files
- **TODO**: refer to the correct Git LFS project in the CI settings for indexing the relevant data into the relevant Elasticsearch indices/instances
- the JSON files generation is handled by an external ET (extract/transform) tool.
- the JSON files generation is handled by an external ET (extract/transform) tool; only per-app suggestions are generated here, using `./scripts/createSuggestions.sh`.
## Development environment
......
......@@ -12,6 +12,7 @@
- [CI](#ci)
- [Documentation](#documentation)
- [Harvest](#harvest)
- [Portability](#portability)
- [Indices and aliases](#indices-and-aliases)
- [Spring Cloud config](#spring-cloud-config)
- [Building other apps](#building-other-apps)
......@@ -170,6 +171,12 @@ by these tests. The documentation is generated in the folder `backend/build/asci
## Harvest
First of all, if you've cloned the repository with Git LFS data skipped (see [Data handling](#data-handling) section), take care to fetch the relevant data before running any indexing script.
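One way to tell whether the data was skipped is to look for Git LFS pointer files, which are small text files whose first line names the LFS spec. The sketch below is illustrative only: the function name `is_lfs_pointer` and the example path are hypothetical and not part of the repository scripts.

```shell
#!/bin/bash
# Sketch: detect whether a file is still a Git LFS pointer rather than real JSON data.
# is_lfs_pointer is a hypothetical helper, not part of the repository.
is_lfs_pointer() {
    # Pointer files start with a "version" line naming the LFS spec URL.
    head -n 1 "$1" 2>/dev/null | grep -q '^version https://git-lfs.github.com/spec/v1'
}

# Usage (the path below is hypothetical):
# is_lfs_pointer data/example.json && git lfs pull
```

If the check succeeds for a data file, running `git lfs pull` after installing Git LFS fetches the actual content.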
### Portability
Feedback related to portability on macOS and other GNU/Linux distributions is very welcome. On macOS, take care to use the latest GNU Parallel and Bash v4 versions, not the versions provided by default via Brew.
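A pre-flight check along these lines could surface both requirements before any script runs. This is a minimal sketch, not part of the repository: the function names `require_bash_major` and `is_gnu_parallel` are hypothetical, and it assumes that the first line of GNU Parallel's `--version` output begins with `GNU parallel`.

```shell
#!/bin/bash
# Hypothetical pre-flight check; not part of the repository scripts.

# Succeeds when the major Bash version (or the value passed as $2) is >= $1.
require_bash_major() {
    local required="$1"
    local actual="${2:-${BASH_VERSION%%.*}}"
    [ "${actual:-0}" -ge "${required}" ]
}

# Succeeds when a `parallel` binary is on PATH and identifies itself as GNU Parallel.
is_gnu_parallel() {
    command -v parallel >/dev/null 2>&1 &&
        parallel --version 2>/dev/null | head -n 1 | grep -q '^GNU parallel'
}

require_bash_major 4 || echo "WARNING: Bash v4+ expected, found '${BASH_VERSION}'" >&2
is_gnu_parallel || echo "WARNING: GNU Parallel not found on PATH" >&2
```

The warnings go to stderr and do not abort, so the check can be sourced from other scripts without changing their exit status.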
Harvesting (i.e. importing JSON documents into Elasticsearch) consists of creating the necessary index, aliases, and Elasticsearch templates.
To create the index and its aliases, execute the script below for a local dev environment:
......
......@@ -4,10 +4,10 @@
BASEDIR=$(dirname "$0")
# RARe index/alias
sh $BASEDIR/createIndexAndAliases4CI.sh -host localhost -app rare -env dev
sh $BASEDIR/index.sh -host localhost -app rare -env dev --no-data
# WheatIS index/alias
sh $BASEDIR/createIndexAndAliases4CI.sh -host localhost -app wheatis -env dev
sh $BASEDIR/index.sh -host localhost -app wheatis -env dev --no-data
# DataDiscovery index/alias
sh $BASEDIR/createIndexAndAliases4CI.sh -host localhost -app data-discovery -env dev
sh $BASEDIR/index.sh -host localhost -app data-discovery -env dev --no-data
......@@ -87,8 +87,7 @@ check_acknowledgment() {
jq '.acknowledged? == true' ${TMP_FILE} | grep 'true' >/dev/null || {
((CODE++)) ;
echo -e "${RED_BOLD}ERROR: unexpected response from previous command:${NC}\n${ORANGE}$(cat ${TMP_FILE})${NC}";
}
> ${TMP_FILE}
} > ${TMP_FILE}
}
# the output of each curl command below is checked for an acknowledgment from Elasticsearch; otherwise an error is displayed with colorized output
......
......@@ -8,7 +8,7 @@ RED_BOLD="${RED}${BOLD}"
NC='\033[0m' # No format
ES_HOST=localhost
ES_HOSTS=${ES_HOST}
ES_HOSTS="${ES_HOST}"
ES_PORT=9200
TIMESTAMP=""
......
......@@ -3,4 +3,4 @@
# delegates to parameterized script
BASEDIR=$(dirname "$0")
sh $BASEDIR/harvestCI.sh -app data-discovery -env dev
sh $BASEDIR/index.sh -app data-discovery --local
......@@ -3,4 +3,4 @@
# delegates to parameterized script
BASEDIR=$(dirname "$0")
sh $BASEDIR/harvestCI.sh -app rare -env dev
sh $BASEDIR/index.sh -app rare --local
......@@ -3,4 +3,4 @@
# delegates to parameterized script
BASEDIR=$(dirname "$0")
sh $BASEDIR/harvestCI.sh -app wheatis -env dev
sh $BASEDIR/index.sh -app wheatis --local
#!/bin/bash
#set -x
RED='\033[0;31m'
GREEN='\033[0;32m'
ORANGE='\033[0;33m'
......@@ -37,6 +35,7 @@ APP_NAME=""
APP_ENV=""
TIMESTAMP=$(date +%s)
CLEAN=0
INDEX=1
# any params
[ -z "$1" ] && echo && help
......@@ -51,6 +50,7 @@ while [ -n "$1" ]; do
-app) APP_NAME=$2;shift 2;;
-env) APP_ENV=$2;shift 2;;
--local) [ -z "$APP_NAME" ] && APP_NAME="rare" ; APP_ENV="dev"; ES_HOSTS="localhost"; ES_HOST="localhost" ; ES_PORT="9200";shift; break;;
--no-data) INDEX=0;shift 1;;
--clean) CLEAN=1;shift 1;;
--) shift;break;;
-*) echo -e "${RED_BOLD}Unknown option: $1 ${NC}\n"&& help && echo;exit 1;;
......@@ -75,14 +75,15 @@ curl -s -m 5 ${ES_HOST}:${ES_PORT} > /dev/null
PREVIOUS_TIMESTAMP=$(curl -s "${ES_HOST}:${ES_PORT}/_cat/indices/${APP_NAME}*${APP_ENV}-tmstp*" | sed -r "s/.*-tmstp([0-9]+).*/\1/g" | sort -ru | head -1) # no index yet created with current timestamp. So using the latest as previous timestamp.
# Create index, aliases with their mapping
sh ${BASEDIR}/createIndexAndAliases4CI.sh -host "$ES_HOST" -port "$ES_PORT" -app "$APP_NAME" -env "$APP_ENV" -timestamp "$TIMESTAMP"
sh "${BASEDIR}"/createIndexAndAliases4CI.sh -host "$ES_HOST" -port "$ES_PORT" -app "$APP_NAME" -env "$APP_ENV" -timestamp "$TIMESTAMP"
CODE=$?
[ $CODE -gt 0 ] && { echo -e "${RED_BOLD}Error when creating index, see errors above. Exiting.${NC}" ; exit $CODE ; }
#exit 0
echo
# Index data into the created indices
sh ${BASEDIR}/harvestCI.sh -host "$ES_HOSTS" -port "$ES_PORT" -app "$APP_NAME" -env "$APP_ENV" -timestamp "$TIMESTAMP"
if [ "1" -eq "${INDEX}" ]; then
sh "${BASEDIR}"/harvestCI.sh -host "$ES_HOSTS" -port "$ES_PORT" -app "$APP_NAME" -env "$APP_ENV" -timestamp "$TIMESTAMP"
fi
CODE=$?
[ $CODE -gt 0 ] && { echo -e "${RED_BOLD}Error when indexing data, see errors above. Exiting.${NC}" ; exit $CODE ; }
......
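The control flow introduced by `--no-data` can be sketched in isolation as follows. This is an illustrative sketch rather than the wrapper itself: `run_index` is a hypothetical name, and `create_index` and `harvest_data` are hypothetical stand-ins for `createIndexAndAliases4CI.sh` and `harvestCI.sh`. Each step's exit status is checked right where the step runs, so skipping the harvest cannot leave a stale status behind.

```shell
#!/bin/bash
# Sketch only: create_index and harvest_data are hypothetical stand-ins for
# createIndexAndAliases4CI.sh and harvestCI.sh.
create_index() { echo "create index for $1 ($2)"; }
harvest_data() { echo "harvest data for $1 ($2)"; }

run_index() {
    local app="" env="" index=1
    while [ -n "$1" ]; do
        case "$1" in
            -app) app="$2"; shift 2 || return 1;;
            -env) env="$2"; shift 2 || return 1;;
            --no-data) index=0; shift 1;;
            *) echo "Unknown option: $1" >&2; return 1;;
        esac
    done
    # Always create the index and aliases first.
    create_index "$app" "$env" || return $?
    # Only index the data when --no-data was not given.
    if [ "$index" -eq 1 ]; then
        harvest_data "$app" "$env" || return $?
    fi
}

run_index -app rare -env dev --no-data
```

With `--no-data`, only the index-creation step runs; without it, the harvest step follows.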