Commit f77f9f12 in urgi-is/data-discovery, authored Aug 08, 2018 by Jean-Baptiste Nizet
feat: use aliases for the search and for the harvest, explain deletion and migration procedures
Parent: 933d6aa0

Changes: 11 files
README.md
...
@@ -74,9 +74,13 @@ You can approximate what runs on CI by executing:
## Harvest
Harvesting (i.e. importing genetic resources stored in JSON files into Elasticsearch) consists of
creating the necessary index and aliases, and then placing the JSON files into a directory where the
server can find them.

To create the index and its aliases, execute the script (it expects Elasticsearch to be reachable at `localhost:9200`):

./scripts/createIndexAndAliases.sh

The directory, by default, is `/tmp/rare/resources`. But it's externalized into the Spring Boot property
`rare.resource-dir`, so it can easily be changed by modifying the value of this property (using an
environment variable, for example).
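Which environment variable overrides `rare.resource-dir` depends on Spring Boot's relaxed binding. Assuming the project runs on Spring Boot 2.x, whose documented rule is to replace dots with underscores, remove dashes and upper-case the result, the variable would be `RARE_RESOURCEDIR`. The helper below is a hypothetical illustration of that rule, not part of the project:

```java
import java.util.Locale;

/**
 * Hypothetical helper (not part of the project) deriving the environment variable name that
 * Spring Boot 2.x binds to a given property: dots become underscores, dashes are removed,
 * and the result is upper-cased.
 */
final class SpringEnvVarNames {

    private SpringEnvVarNames() {
    }

    static String toEnvironmentVariable(String propertyName) {
        return propertyName
            .replace('.', '_')          // rare.resource-dir -> rare_resource-dir
            .replace("-", "")           // rare_resource-dir -> rare_resourcedir
            .toUpperCase(Locale.ROOT);  // rare_resourcedir -> RARE_RESOURCEDIR
    }
}
```

For example, exporting `RARE_RESOURCEDIR=/data/rare/resources` (a hypothetical path) before starting the application should make it look for the JSON files in that directory.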
...
@@ -87,4 +91,101 @@ build task `asciidoctor`, which executes tests and generates documentation based
by these tests. The documentation is generated in `backend/build/asciidoc/html5/index.html`.
./gradlew asciidoctor
## Indices and aliases
The application uses two physical indices:
* one to store the harvest results. This one is created automatically, if it doesn't exist yet, when the application starts.
  It doesn't contain important data, and can be deleted and recreated if really needed.
* one to store the genetic resources. This one must be created explicitly before using the application; otherwise,
  requests to the web services will return errors.

The application doesn't use the physical resource index directly. Instead, it uses two aliases, which must be created
before using the application:
* `resource-index` is the alias used by the application to search for genetic resources;
* `resource-harvest-index` is the alias used by the application to store genetic resources when a harvest is triggered.

In normal operation, these two aliases should refer to the same physical resource index. The script
`createIndexAndAliases.sh` creates a physical index (named `resource-physical-index`) and creates these two aliases
referring to this physical index.

Once the index and the aliases have been created, a harvest can be triggered. The first thing a harvest
does is to create or update (put) the mapping of the genetic resource entity in the index aliased by
`resource-harvest-index`. Then it parses the JSON files and stores them in this same index. Since the `resource-index`
alias normally refers to the same physical index, searches will find the resources stored by the harvester.
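The script sends a single `PUT` request that creates the physical index and declares both aliases in its body. As a sketch, assuming you would rather build that body from Java, the hypothetical `CreateIndexRequest` class below reproduces it; the alias list is a parameter, so an empty list gives the body for a bare physical index (such as `resource-new-physical-index` in the procedures below), whose aliases are then moved separately:

```java
import java.util.List;
import java.util.stream.Collectors;

/**
 * Hypothetical builder for the JSON body of an Elasticsearch "create index" request
 * (PUT /<index-name>) that also declares aliases pointing to the new index.
 */
final class CreateIndexRequest {

    private CreateIndexRequest() {
    }

    static String body(List<String> aliases) {
        // Each alias is declared with an empty object: no filter, no routing.
        // Alias names are assumed to be valid Elasticsearch names, which never need JSON escaping.
        String aliasEntries = aliases.stream()
            .map(alias -> "\"" + alias + "\":{}")
            .collect(Collectors.joining(","));
        return "{\"aliases\":{" + aliasEntries + "}}";
    }
}
```

`CreateIndexRequest.body(Arrays.asList("resource-index", "resource-harvest-index"))` returns `{"aliases":{"resource-index":{},"resource-harvest-index":{}}}`, which is the payload of `createIndexAndAliases.sh` without its whitespace.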
### Why two aliases
Using two aliases is useful when deleting obsolete documents. This is done by removing everything
and then harvesting the new JSON files again, to repopulate the index from scratch.
Two scenarios are possible:
#### Deleting with some downtime
The harvest duration depends on the performance of Elasticsearch, on the performance of the harvester and, of
course, on the number of documents to harvest. If you don't mind having a period of time during which the
documents are not available, you can simply:
- delete the physical index;
- re-create it with its aliases;
- trigger a new harvest.
Keep in mind that, with the currently known set of documents (17172), on a development machine where everything is
running concurrently, and once both the Elasticsearch server and the application are warmed up, a harvest only takes
12 seconds. So, even with 10 times that number of documents (about 170K), it should only take around 2 minutes of
downtime. With 100 times that number of documents (about 1.7M), it should take around 20 minutes, which is still not a
very long time. (Your mileage may vary: I assumed linear complexity here.)
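The figures above are a linear extrapolation from a single measurement. A minimal sketch of that arithmetic, assuming the same throughput as the reference run (17172 documents in 12 seconds); the class and method names are illustrative only:

```java
/**
 * Back-of-the-envelope harvest duration estimate, assuming the duration grows linearly
 * with the number of documents (same documents-per-second throughput as the reference run).
 */
final class HarvestDurationEstimate {

    // Reference measurement quoted in the README: 17172 documents harvested in 12 seconds.
    static final long REFERENCE_DOCUMENTS = 17_172;
    static final double REFERENCE_SECONDS = 12.0;

    private HarvestDurationEstimate() {
    }

    static double estimatedSeconds(long documentCount) {
        return REFERENCE_SECONDS * documentCount / REFERENCE_DOCUMENTS;
    }
}
```

`estimatedSeconds(171_720)` returns `120.0` (2 minutes) and `estimatedSeconds(1_717_200)` returns `1200.0` (20 minutes). Real harvests may scale worse than linearly, for example once the index no longer fits in memory.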
#### Deleting with no downtime
If you don't want any downtime, you can instead use the following procedure:
- create a new physical index (let's name it `resource-new-physical-index`);
- delete the `resource-harvest-index` alias, and recreate it so that it refers to `resource-new-physical-index`;
- trigger a harvest. During the harvest, the `resource-index` alias, used by the search, still refers to the old
  physical index, so searches keep working normally;
- once the harvest is finished, delete the `resource-index` alias, and recreate it so that it refers to
  `resource-new-physical-index`. All the search operations will now use the new index, containing up-to-date
  documents;
- delete the old physical index.
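Elasticsearch's `_aliases` endpoint applies all the actions of one request atomically, so each "delete the alias, and recreate it" step above can also be done as a single request, leaving no moment where the alias is missing. This is an alternative to the procedure as written, not something the project's scripts do; the hypothetical `AliasSwitch` class below only builds the request body:

```java
/**
 * Hypothetical builder for the body of a POST /_aliases request that moves one alias from an
 * old physical index to a new one. Elasticsearch applies both actions of the request atomically.
 */
final class AliasSwitch {

    private AliasSwitch() {
    }

    static String body(String alias, String oldIndex, String newIndex) {
        // Names are assumed to be valid Elasticsearch names, which never need JSON escaping.
        return "{\"actions\":["
            + "{\"remove\":{\"index\":\"" + oldIndex + "\",\"alias\":\"" + alias + "\"}},"
            + "{\"add\":{\"index\":\"" + newIndex + "\",\"alias\":\"" + alias + "\"}}"
            + "]}";
    }
}
```

For the last switch of the procedure, `AliasSwitch.body("resource-index", "resource-physical-index", "resource-new-physical-index")` produces a body that can be sent with `curl -X POST "localhost:9200/_aliases" -H 'Content-Type: application/json' -d '<body>'`.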
### Mapping migration
Another situation where you might need to reindex all the documents is when the mapping has changed and a new version
of the application must be redeployed.
#### Upgrading with some downtime
This is the easiest and safest procedure, and the one I would recommend:
- create a new physical index (let's name it `resource-new-physical-index`);
- delete the `resource-harvest-index` and `resource-index` aliases, and recreate them both so that they refer to
  `resource-new-physical-index`;
- stop the existing application, then deploy and start the new one;
- trigger a harvest;
- once everything is running fine, remove the old physical index.
In case anything goes wrong, the two aliases can always be recreated to refer to the old physical index, and the old
application can be restarted.
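Both the migration and the rollback move several aliases between the same two physical indices. As a sketch, assuming you use Elasticsearch's `_aliases` endpoint rather than separate delete and create calls, one request can move all of them atomically, and the rollback is the same request with the two indices swapped. `AliasMigration` is a hypothetical name:

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

/**
 * Hypothetical builder for a POST /_aliases body moving every given alias from one physical
 * index to another in a single atomic request.
 */
final class AliasMigration {

    private AliasMigration() {
    }

    static String moveAll(List<String> aliases, String fromIndex, String toIndex) {
        // For each alias: remove it from the source index, then add it to the target index.
        String actions = aliases.stream()
            .flatMap(alias -> Stream.of(
                action("remove", fromIndex, alias),
                action("add", toIndex, alias)))
            .collect(Collectors.joining(","));
        return "{\"actions\":[" + actions + "]}";
    }

    private static String action(String type, String index, String alias) {
        return "{\"" + type + "\":{\"index\":\"" + index + "\",\"alias\":\"" + alias + "\"}}";
    }
}
```

`moveAll(both, "resource-physical-index", "resource-new-physical-index")` performs the migration step, where `both` is the list of the two alias names; if the new application misbehaves, `moveAll(both, "resource-new-physical-index", "resource-physical-index")` points both aliases back to the old index before the old application is restarted.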
#### Upgrading with a very short downtime (or no downtime at all)
- create a new physical index (let's name it `resource-new-physical-index`);
- delete the `resource-harvest-index` alias, and recreate it so that it refers to `resource-new-physical-index`;
- start the new application on another machine, or on a different port, so that the new application code can be
  used to trigger a harvest with the new schema while the old application is still running and exposed to the users;
- trigger the harvest on the **new** application;
- once the harvest is finished, delete the `resource-index` alias, and recreate it so that it refers to
  `resource-new-physical-index`;
- expose the new application to the users instead of the old one;
- stop the old application.
How you execute these various steps depends on the production infrastructure, which is unknown to me. You could
use your own development server to start the new application and do the harvest, and then stop the production
application, deploy the new one and start it. Or you could have a reverse proxy in front of the application, and change
its configuration to route to the new application once the harvest is done, for example.
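The key property of this procedure is that searches never see a half-harvested index: the search alias keeps pointing at the old index until the harvest into the new one is complete. The toy model below (plain Java maps, not Elasticsearch; all names hypothetical) replays the alias steps of the procedure, assuming each "delete and recreate" of an alias behaves as a single reassignment, and records what a search returns during the harvest and after the switch:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Toy in-memory model (not Elasticsearch): physical indices hold documents, and each alias
 * refers to exactly one physical index.
 */
final class AliasUpgradeSimulation {

    private final Map<String, List<String>> indices = new HashMap<>();
    private final Map<String, String> aliases = new HashMap<>();

    void createIndex(String index) {
        indices.put(index, new ArrayList<>());
    }

    void pointAlias(String alias, String index) {
        aliases.put(alias, index);
    }

    void harvest(List<String> documents) {
        // The harvester always writes through the harvest alias.
        indices.get(aliases.get("resource-harvest-index")).addAll(documents);
    }

    List<String> search() {
        // Searches always read through the search alias.
        return indices.get(aliases.get("resource-index"));
    }

    /** Replays the upgrade and returns snapshots of the search results: during the harvest, then after the switch. */
    static List<List<String>> replayUpgrade() {
        AliasUpgradeSimulation cluster = new AliasUpgradeSimulation();
        // Initial state: both aliases on the old index, already harvested by the old application.
        cluster.createIndex("resource-physical-index");
        cluster.pointAlias("resource-index", "resource-physical-index");
        cluster.pointAlias("resource-harvest-index", "resource-physical-index");
        cluster.harvest(Arrays.asList("doc-old-schema"));

        // New physical index; only the harvest alias is moved to it.
        cluster.createIndex("resource-new-physical-index");
        cluster.pointAlias("resource-harvest-index", "resource-new-physical-index");

        // The new application harvests; searches still go to the old index.
        cluster.harvest(Arrays.asList("doc-new-schema"));
        List<String> seenDuringHarvest = new ArrayList<>(cluster.search());

        // Harvest finished: the search alias is moved to the new index as well.
        cluster.pointAlias("resource-index", "resource-new-physical-index");
        List<String> seenAfterSwitch = new ArrayList<>(cluster.search());
        return Arrays.asList(seenDuringHarvest, seenAfterSwitch);
    }
}
```

`replayUpgrade()` returns `[[doc-old-schema], [doc-new-schema]]`: searches keep serving the old documents throughout the harvest and switch to the new ones only when the search alias moves.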
backend/src/main/java/fr/inra/urgi/rare/dao/GeneticResourceDaoCustom.java
...
@@ -48,4 +48,9 @@ public interface GeneticResourceDaoCustom {
    void saveAll(Collection<IndexedGeneticResource> indexedGeneticResources);

    Terms findPillars();

    /**
     * Puts the mapping for the alias of the {@link IndexedGeneticResource} document
     */
    void putMapping();
}
backend/src/main/java/fr/inra/urgi/rare/dao/GeneticResourceDaoImpl.java
...
@@ -250,6 +250,11 @@ public class GeneticResourceDaoImpl implements GeneticResourceDaoCustom {
        return geneticResources.getAggregations().get(pillarAggregationName);
    }

    @Override
    public void putMapping() {
        elasticsearchTemplate.putMapping(IndexedGeneticResource.class);
    }

    private IndexQuery createIndexQuery(IndexedGeneticResource entity) {
        IndexQuery query = new IndexQuery();
        query.setObject(entity);
...
backend/src/main/java/fr/inra/urgi/rare/domain/GeneticResource.java
...
@@ -10,17 +10,24 @@ import com.fasterxml.jackson.annotation.JsonCreator;
import com.fasterxml.jackson.annotation.JsonProperty;
import org.springframework.data.annotation.Id;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Mapping;

/**
 * A genetic resource, as loaded from a JSON file, and stored in Elasticsearch.
 *
 * This document is used by all the search operations, but not by the harvesting process, which instead uses
 * {@link IndexedGeneticResource}. Its index is in fact an alias which typically refers to the same physical index as
 * the alias used by {@link IndexedGeneticResource}, except when we want to harvest to a new index
 * (in order to delete obsolete documents, or to accommodate incompatible schema changes). In that case, once the
 * harvest process is finished, the alias of {@link GeneticResource} can be modified to refer to the new physical
 * index, in order to start searching in the newly harvested documents.
 *
 * @author JB Nizet
 */
@Document(
    indexName = "#{@rareProperties.getElasticsearchPrefix()}resource-index",
    type = "#{@rareProperties.getElasticsearchPrefix()}resource",
    createIndex = false
)
@Mapping(mappingPath = "fr/inra/urgi/rare/domain/GeneticResource.mapping.json")
public class GeneticResource {
    /**
...
backend/src/main/java/fr/inra/urgi/rare/domain/IndexedGeneticResource.java
...
@@ -16,17 +16,26 @@ import org.apache.lucene.analysis.en.EnglishAnalyzer;
import org.apache.lucene.analysis.standard.StandardTokenizer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.springframework.data.elasticsearch.annotations.Document;
import org.springframework.data.elasticsearch.annotations.Mapping;

/**
 * A class containing all the fields of a GeneticResource, and additional fields used only for indexing,
 * which make it possible or easier to implement completion suggestions.
 *
 * This document is used by the harvesting process. Its index is in fact an alias which typically refers to the same
 * physical index as the alias used by {@link GeneticResource}, except when we want to harvest to a new index
 * (in order to delete obsolete documents, or to accommodate incompatible schema changes). In that case, once the
 * harvest process is finished, the alias of {@link GeneticResource} can be modified to refer to the new physical
 * index, in order to start searching in the newly harvested documents.
 *
 * @author JB Nizet
 */
@Document(
    indexName = "#{@rareProperties.getElasticsearchPrefix()}resource-harvest-index",
    type = "#{@rareProperties.getElasticsearchPrefix()}resource",
    createIndex = false
)
@Mapping(mappingPath = "fr/inra/urgi/rare/domain/GeneticResource.mapping.json")
public final class IndexedGeneticResource {
    @JsonUnwrapped
    private final GeneticResource geneticResource;
...
backend/src/main/java/fr/inra/urgi/rare/harvest/HarvesterController.java
...
@@ -3,6 +3,7 @@ package fr.inra.urgi.rare.harvest;
import java.net.URI;
import java.util.Optional;

import fr.inra.urgi.rare.dao.GeneticResourceDao;
import fr.inra.urgi.rare.dao.HarvestResultDao;
import fr.inra.urgi.rare.dto.PageDTO;
import fr.inra.urgi.rare.exception.NotFoundException;
...
@@ -31,15 +32,20 @@ public class HarvesterController {
    private final AsyncHarvester asyncHarvester;
    private final HarvestResultDao harvestResultDao;
    private final GeneticResourceDao geneticResourceDao;

    public HarvesterController(AsyncHarvester asyncHarvester,
                               HarvestResultDao harvestResultDao,
                               GeneticResourceDao geneticResourceDao) {
        this.asyncHarvester = asyncHarvester;
        this.harvestResultDao = harvestResultDao;
        this.geneticResourceDao = geneticResourceDao;
    }

    @PostMapping
    public ResponseEntity<?> harvest() {
        this.geneticResourceDao.putMapping();
        HarvestResultBuilder resultBuilder = HarvestResult.builder();
        HarvestResult temporaryHarvestResult = resultBuilder.build();
...
backend/src/test/java/fr/inra/urgi/rare/dao/GeneticResourceDaoTest.java
...
@@ -14,16 +14,21 @@ import fr.inra.urgi.rare.domain.GeneticResource;
import fr.inra.urgi.rare.domain.IndexedGeneticResource;
import org.elasticsearch.search.aggregations.bucket.terms.Terms;
import org.elasticsearch.search.aggregations.bucket.terms.Terms.Bucket;
import org.junit.jupiter.api.BeforeAll;
import org.junit.jupiter.api.BeforeEach;
import org.junit.jupiter.api.Nested;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.TestInstance;
import org.junit.jupiter.api.extension.ExtendWith;
import org.springframework.beans.factory.annotation.Autowired;
import org.springframework.boot.test.autoconfigure.json.JsonTest;
import org.springframework.context.annotation.Import;
import org.springframework.data.domain.PageRequest;
import org.springframework.data.domain.Pageable;
import org.springframework.data.elasticsearch.core.ElasticsearchTemplate;
import org.springframework.data.elasticsearch.core.aggregation.AggregatedPage;
import org.springframework.data.elasticsearch.core.mapping.ElasticsearchPersistentEntity;
import org.springframework.data.elasticsearch.core.query.AliasBuilder;
import org.springframework.test.context.TestPropertySource;
import org.springframework.test.context.junit.jupiter.SpringExtension;
...
@@ -31,13 +36,40 @@ import org.springframework.test.context.junit.jupiter.SpringExtension;
@TestPropertySource("/test.properties")
@Import(ElasticSearchConfig.class)
@JsonTest
@TestInstance(TestInstance.Lifecycle.PER_CLASS)
class GeneticResourceDaoTest {

    private static final String PHYSICAL_INDEX = "test-resource-physical-index";

    @Autowired
    private GeneticResourceDao geneticResourceDao;

    @Autowired
    private ElasticsearchTemplate elasticsearchTemplate;

    private Pageable firstPage = PageRequest.of(0, 10);

    @BeforeAll
    void prepareIndex() {
        ElasticsearchPersistentEntity indexedGeneticResourceEntity =
            elasticsearchTemplate.getPersistentEntityFor(IndexedGeneticResource.class);
        ElasticsearchPersistentEntity geneticResourceEntity =
            elasticsearchTemplate.getPersistentEntityFor(GeneticResource.class);

        elasticsearchTemplate.deleteIndex(PHYSICAL_INDEX);
        elasticsearchTemplate.createIndex(PHYSICAL_INDEX);
        elasticsearchTemplate.addAlias(
            new AliasBuilder().withAliasName(indexedGeneticResourceEntity.getIndexName())
                              .withIndexName(PHYSICAL_INDEX)
                              .build());
        elasticsearchTemplate.addAlias(
            new AliasBuilder().withAliasName(geneticResourceEntity.getIndexName())
                              .withIndexName(PHYSICAL_INDEX)
                              .build());
        elasticsearchTemplate.putMapping(IndexedGeneticResource.class);
    }

    @BeforeEach
    void prepare() {
        geneticResourceDao.deleteAll();
...
backend/src/test/java/fr/inra/urgi/rare/harvest/HarvesterControllerDocTest.java
...
@@ -18,6 +18,7 @@ import java.util.Arrays;
import java.util.Base64;
import java.util.Optional;

import fr.inra.urgi.rare.dao.GeneticResourceDao;
import fr.inra.urgi.rare.dao.HarvestResultDao;
import fr.inra.urgi.rare.doc.DocumentationConfig;
import org.hamcrest.CoreMatchers;
...
@@ -54,6 +55,9 @@ class HarvesterControllerDocTest {
    @MockBean
    private HarvestResultDao mockHarvestResultDao;

    @MockBean
    private GeneticResourceDao mockGeneticResourceDao;

    @Autowired
    private MockMvc mockMvc;
...
backend/src/test/java/fr/inra/urgi/rare/harvest/HarvesterControllerSecurityTest.java
...
@@ -5,6 +5,7 @@ import static org.springframework.test.web.servlet.request.MockMvcRequestBuilder
import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.status;

import fr.inra.urgi.rare.config.SecurityConfig;
import fr.inra.urgi.rare.dao.GeneticResourceDao;
import fr.inra.urgi.rare.dao.HarvestResultDao;
import org.junit.jupiter.api.Test;
import org.junit.jupiter.api.extension.ExtendWith;
...
@@ -30,6 +31,9 @@ class HarvesterControllerSecurityTest {
    @MockBean
    private HarvestResultDao mockHarvestResultDao;

    @MockBean
    private GeneticResourceDao mockGeneticResourceDao;

    @Autowired
    private MockMvc mockMvc;
...
backend/src/test/java/fr/inra/urgi/rare/harvest/HarvesterControllerTest.java
package fr.inra.urgi.rare.harvest;

import static org.mockito.Mockito.verify;
import static org.mockito.Mockito.when;
import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.get;
import static org.springframework.test.web.servlet.request.MockMvcRequestBuilders.post;
...
@@ -8,6 +9,7 @@ import static org.springframework.test.web.servlet.result.MockMvcResultMatchers.
import java.util.Arrays;
import java.util.Optional;

import fr.inra.urgi.rare.dao.GeneticResourceDao;
import fr.inra.urgi.rare.dao.HarvestResultDao;
import org.hamcrest.CustomTypeSafeMatcher;
import org.hamcrest.Matcher;
...
@@ -35,6 +37,9 @@ class HarvesterControllerTest {
    @MockBean
    private HarvestResultDao mockHarvestResultDao;

    @MockBean
    private GeneticResourceDao mockGeneticResourceDao;

    @Autowired
    private MockMvc mockMvc;
...
@@ -60,6 +65,8 @@ class HarvesterControllerTest {
        mockMvc.perform(post("/api/harvests"))
               .andExpect(status().isCreated())
               .andExpect(header().string(HttpHeaders.LOCATION, matches("^(.*)/api/harvests/(.+)$")));

        verify(mockGeneticResourceDao).putMapping();
    }

    @Test
...
scripts/createIndexAndAliases.sh
New file, mode 100755
#!/bin/bash
BASEDIR=$(dirname "$0")

curl -X PUT "localhost:9200/resource-physical-index" -H 'Content-Type: application/json' -d '
{
    "aliases" : {
        "resource-index" : {},
        "resource-harvest-index" : {}
    }
}
'