Verified Commit d048d5bd authored by Bilal.El-Houdaigui, committed by Raphaël Flores

Add GO Annotation facet. Fix GNP-5446.

Add annotation_id, annotation_name and ancestors to mapping and use annotation_name for the GO facet. GNP-5446

Add search on annotation_id and ancestors to GO Annotation. GNP-5446

Switched GO extraction from annotation_name to the frontend and fixed facet display (to fix: badge shows GO id only). GNP-5446

Clean code and facet working with annotation_name (ancestors to add next). GNP-5446

Add boolQuery in order to include ancestors in refinementQuery. GNP-5446

Add descendants-checkbox component (still WIP). GNP-5446

Finalize checkbox button. Code cleaning (next: fix tests). GNP-5446

Fix broken backend tests. GNP-5446

Fix first part of frontend tests. GNP-5717

Fix part of frontend tests. GNP-5717

Remove LOCAL_ID that is not present in master

Fix compilation error

Remove unused variable spotted by lint

Improved doc for macOS and indexing

Moved AnnotationId to camel case. WIP.

Fix broken facets/aggregations

Slight code generalisation and consistency

Improved and generalized descendant handling with search on ID rather than full name

Fix test

Remove a RARe test from WheatISDocumentDaoTest

Add test for annotation backend

Fix front tests & code cleaning

Add elasticsearch debug options

Remove OpenMinTeD docs from WheatIS data files. GNP-5446.

Remove OpenMinTeD docs from data-discovery data files. GNP-5446.

Add Biblio data.

Updated URGI node to INRAE-URGI

Avoid 'Argument list too long' error. Fix GNP-5892

Add RARe and WheatIS suggestions following Biblio data inclusion. Fix GNP-5446

Add datadiscovery suggestions following Biblio data inclusion. Fix GNP-5446

Correct help icon behavior. Fix GNP-5892

UI Adjustments

Generate datadiscovery suggestions. Fix GNP-5446

Generate rare and wheatis suggestions. Fix GNP-5446

Update documentation.
parent faeaa06b
......@@ -62,6 +62,12 @@ $ git lfs pull -I data/rare/
Downloading LFS objects: 100% (16/16), 8.8 MB | 0 B/s
```
Git might ask you to enable additional configuration parameters, which is expected:
```sh
git config lfs.https://forgemia.inra.fr/urgi-is/data-discovery.git/info/lfs.locksverify true
```
### Backend
The project uses Spring (5.x) for the backend, with Spring Boot.
......@@ -91,14 +97,12 @@ You can stop the Elasticsearch and Kibana instances by running:
docker-compose stop
```
or run:
or run the following command to also remove the stopped containers as well as any networks that were created:
```sh
docker-compose down
```
to also remove the stopped containers as well as any networks that were created.
### Frontend
The project uses Angular (8.x) for the frontend, with the Angular CLI.
......@@ -200,6 +204,8 @@ Before all, if you have cloned the repository without fetching the data (see [Da
Feedback related to portability on MacOS and other GNU/Linux distributions is welcome.
For MacOS, take care to use recent GNU Parallel and Bash v4 versions (e.g. installed via Brew), not the versions provided by default.
Don't use zsh!
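A quick sanity check (a minimal sketch, assuming both tools are on your `PATH`):
```sh
# Bash should report version 4 or later, GNU Parallel a recent release
bash --version | head -n 1
parallel --version | head -n 1
```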
Install the following packages to be able to run the scripts:
```sh
......@@ -211,7 +217,7 @@ Harvesting (i.e. importing JSON documents into Elasticsearch) consists in creati
To create the index and its aliases, execute the script below for a local dev environment:
```sh
./scripts/index.sh -app rare|wheat|data-discovery --local
./scripts/index.sh -app rare|wheatis|data-discovery --local
```
The `-app` parameter triggers a harvest of the resources stored in the Git LFS directories `data/rare`, `data/wheatis` and `data/data-discovery`, respectively.
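For instance, to harvest the WheatIS data into your local Elasticsearch:
```sh
./scripts/index.sh -app wheatis --local
```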
......@@ -303,6 +309,12 @@ You can adapt the elasticsearch index used with the following parameter
java -jar backend/build/libs/data-discovery.jar --data-discovery.elasticsearch-prefix="data-discovery-staging-"
```
For debugging:
```sh
java -jar backend/build/libs/data-discovery.jar --debug
```
## Configuration
The RARe and RARe with basket applications can be configured to apply an implicit filtering on the searches,
......
......@@ -16,100 +16,6 @@ To use the actual deployed web services, you must of course use the actual proto
Some headers are also removed from the snippets here, to improve readability.
== Harvests
=== Trigger a harvest
Harvesting, i.e. importing resources from JSON files into Elasticsearch, is simply done by copying the JSON
files to the configured `data-discovery.resource-dir` directory, and then sending a POST request, without any body,
to trigger the harvest.
Harvested resources which already exist are updated (they are identified by their `identifier` property).
Note that, to avoid letting anyone trigger a harvest, the endpoints are secured using basic authentication. The
user and the password are configured in the Spring configuration. The following snippets assume the user is
`rare` and the password is `f01a7031fc17`. If you don't provide the authentication information, or if it's incorrect,
you'll get back a response with the status `401 Unauthorized`.
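Concretely, triggering a harvest could look like the following sketch (the resource directory and the `/api/harvests` URL are illustrative; the snippets below show the exact request):

[source,sh]
----
# copy the JSON files into the configured data-discovery.resource-dir directory
cp my-export/*.json /opt/data-discovery/resources/
# then trigger the harvest with an empty-body POST, using basic authentication
curl -u rare:f01a7031fc17 -X POST http://localhost:8080/api/harvests
----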
.Request
include::{snippets}/harvests/post/http-request.adoc[]
.Curl
include::{snippets}/harvests/post/curl-request.adoc[]
.HTTPie
include::{snippets}/harvests/post/httpie-request.adoc[]
The response is immediate: the harvesting job is executed asynchronously. It only contains a Location header
containing the URL of the endpoint that you can query to know the status of the harvest. Each triggered harvest
has a unique auto-generated ID, found at the end of the URL.
.Response
include::{snippets}/harvests/post/http-response.adoc[]
=== Get a harvest
You can get a harvest to know whether it is finished, which files have already been harvested,
and which errors occurred during the harvest. The report is pretty detailed, and tries its best to provide indices,
line and column numbers, as well as error messages, to help identify what and where the errors are.
Note that files are processed sequentially, and that resources are parsed one by one, using the Jackson streaming
parser. This allows harvesting enormous files if needed without running into memory problems. It also allows
parsing a file even if one of its resources has an error and thus can't be parsed correctly.
.Request
include::{snippets}/harvests/get/http-request.adoc[]
.Path parameters
include::{snippets}/harvests/get/path-parameters.adoc[]
.Curl
include::{snippets}/harvests/get/curl-request.adoc[]
.HTTPie
include::{snippets}/harvests/get/httpie-request.adoc[]
.Response
include::{snippets}/harvests/get/http-response.adoc[]
.Response fields
include::{snippets}/harvests/get/response-fields.adoc[]
=== List harvests
If you lost or forgot the URL of the harvest you have triggered, and want to see how it went, don't panic. You can
send a GET request to list the last 10 harvests that have been triggered.
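For example (the URL is illustrative; the snippets below show the exact request):

[source,sh]
----
# list the 10 most recent harvests
curl -u rare:f01a7031fc17 http://localhost:8080/api/harvests
----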
.Request
include::{snippets}/harvests/list/http-request.adoc[]
.Curl
include::{snippets}/harvests/list/curl-request.adoc[]
.HTTPie
include::{snippets}/harvests/list/httpie-request.adoc[]
.Response
include::{snippets}/harvests/list/http-response.adoc[]
.Response fields
include::{snippets}/harvests/list/response-fields.adoc[]
As you can see, you actually get back a page of results. In the unlikely case you want to know about old harvests,
you can request pages other than the latest one by passing the page as a request parameter:
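For instance (the `page` parameter name and zero-based numbering are assumptions here; see the request parameters snippet below):

[source,sh]
----
# fetch the second page of harvest reports
curl -u rare:f01a7031fc17 "http://localhost:8080/api/harvests?page=1"
----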
.Request
include::{snippets}/harvests/list2/http-request.adoc[]
.Request parameters
include::{snippets}/harvests/list2/request-parameters.adoc[]
.Curl
include::{snippets}/harvests/list2/curl-request.adoc[]
.HTTPie
include::{snippets}/harvests/list2/httpie-request.adoc[]
== Documents (aka Search)
=== Full text search
......
......@@ -5,12 +5,9 @@ import static fr.inra.urgi.datadiscovery.dao.DocumentDao.PORTAL_URL_AGGREGATION_
import static org.elasticsearch.index.query.QueryBuilders.*;
import java.io.IOException;
import java.util.Collection;
import java.util.Collections;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;
import fr.inra.urgi.datadiscovery.domain.IndexedDocument;
......@@ -20,10 +17,7 @@ import org.elasticsearch.action.search.SearchRequest;
import org.elasticsearch.action.search.SearchResponse;
import org.elasticsearch.client.RequestOptions;
import org.elasticsearch.client.RestHighLevelClient;
import org.elasticsearch.index.query.BoolQueryBuilder;
import org.elasticsearch.index.query.MultiMatchQueryBuilder;
import org.elasticsearch.index.query.QueryBuilder;
import org.elasticsearch.index.query.TermsQueryBuilder;
import org.elasticsearch.index.query.*;
import org.elasticsearch.search.aggregations.AggregationBuilders;
import org.elasticsearch.search.aggregations.Aggregations;
import org.elasticsearch.search.aggregations.bucket.filter.Filter;
......@@ -44,11 +38,7 @@ import org.springframework.data.elasticsearch.core.ElasticsearchRestTemplate;
import org.springframework.data.elasticsearch.core.aggregation.AggregatedPage;
import org.springframework.data.elasticsearch.core.aggregation.impl.AggregatedPageImpl;
import org.springframework.data.elasticsearch.core.mapping.ElasticsearchPersistentEntity;
import org.springframework.data.elasticsearch.core.query.DeleteQuery;
import org.springframework.data.elasticsearch.core.query.FetchSourceFilterBuilder;
import org.springframework.data.elasticsearch.core.query.IndexQuery;
import org.springframework.data.elasticsearch.core.query.NativeSearchQueryBuilder;
import org.springframework.data.elasticsearch.core.query.SourceFilter;
import org.springframework.data.elasticsearch.core.query.*;
/**
* Base class for implementations of {@link DocumentDao}
......@@ -86,9 +76,10 @@ public abstract class AbstractDocumentDaoImpl<D extends SearchDocument, I extend
@Override
public AggregatedPage<D> search(String query,
boolean highlight,
boolean descendants,
SearchRefinements refinements,
Pageable page) {
NativeSearchQueryBuilder builder = getQueryBuilder(query, refinements, page);
NativeSearchQueryBuilder builder = getQueryBuilder(query, refinements, page, descendants);
if (highlight) {
builder.withHighlightFields(
......@@ -97,18 +88,19 @@ public abstract class AbstractDocumentDaoImpl<D extends SearchDocument, I extend
).withHighlightBuilder(new HighlightBuilder().encoder("html"));
}
logger.debug(builder.build().toString());
return elasticsearchTemplate.queryForPage(builder.build(), getDocumentClass(),
documentHighlightMapper);
}
@Override
public AggregatedPage<D> aggregate(String query, SearchRefinements refinements) {
public AggregatedPage<D> aggregate(String query, SearchRefinements refinements, boolean descendants) {
NativeSearchQueryBuilder builder = getQueryBuilder(query, refinements, PageRequest.of(0,1));
NativeSearchQueryBuilder builder = getQueryBuilder(query, refinements, PageRequest.of(0,1), descendants);
getAppAggregations().forEach(appAggregation -> {
FilterAggregationBuilder filterAggregation = createFilterAggregation(appAggregation, refinements);
FilterAggregationBuilder filterAggregation = createFilterAggregation(appAggregation, refinements, descendants);
builder.addAggregation(filterAggregation);
});
......@@ -121,7 +113,24 @@ public abstract class AbstractDocumentDaoImpl<D extends SearchDocument, I extend
return result;
}
private NativeSearchQueryBuilder getQueryBuilder(String query, SearchRefinements refinements, Pageable page) {
protected List<String> getAnnotationsIds(SearchRefinements refinements) {
List<String> annotIds = new ArrayList<>();
// matches a trailing ontology ID such as "(GO:0005575)" at the end of a refinement label
Pattern pattern = Pattern.compile("\\(\\w{2,6}:\\d{7}\\)$");
for (AppAggregation term : refinements.getTerms()) {
if (term.getName().equals("annot")) {
Set<String> annotRefinements = refinements.getRefinementsForTerm(term);
for (String annotRefinement : annotRefinements) {
Matcher matcher = pattern.matcher(annotRefinement);
if (matcher.find()) {
// strip the parentheses to keep only the bare ID, e.g. "GO:0005575"
annotIds.add(matcher.group(0).replace("(", "").replace(")", ""));
}
}
}
}
return annotIds;
}
private NativeSearchQueryBuilder getQueryBuilder(String query, SearchRefinements refinements, Pageable page, boolean descendants) {
// this full text query is executed, and its results are used to compute aggregations
MultiMatchQueryBuilder fullTextQuery = multiMatchQuery(query, getSearchableFields().toArray(new String[0]));
......@@ -130,12 +139,12 @@ public abstract class AbstractDocumentDaoImpl<D extends SearchDocument, I extend
// See https://www.elastic.co/guide/en/elasticsearch/reference/current/search-request-post-filter.html
BoolQueryBuilder refinementQuery = boolQuery();
for (AppAggregation term : refinements.getTerms()) {
refinementQuery.must(createRefinementQuery(refinements, term));
refinementQuery.must(createRefinementQuery(refinements, term, descendants));
}
SearchRefinements implicitRefinements = getImplicitSearchRefinements();
for (AppAggregation term : implicitRefinements.getTerms()) {
refinementQuery.must(createRefinementQuery(implicitRefinements, term));
refinementQuery.must(createRefinementQuery(implicitRefinements, term, descendants));
}
// this avoids getting back the suggestions field in the found documents, since we don't care
......@@ -151,10 +160,10 @@ public abstract class AbstractDocumentDaoImpl<D extends SearchDocument, I extend
private FilterAggregationBuilder createFilterAggregation(AppAggregation appAggregation,
SearchRefinements searchRefinements) {
SearchRefinements searchRefinements, boolean descendants) {
return AggregationBuilders.filter(
appAggregation.getName(),
createQueryForAllRefinementsExcept(searchRefinements, appAggregation)
createQueryForAllRefinementsExcept(searchRefinements, appAggregation, descendants)
).subAggregation(
AggregationBuilders.terms(appAggregation.getName())
.field(appAggregation.getField())
......@@ -163,16 +172,16 @@ public abstract class AbstractDocumentDaoImpl<D extends SearchDocument, I extend
}
private QueryBuilder createQueryForAllRefinementsExcept(SearchRefinements refinements,
AppAggregation appAggregation) {
AppAggregation appAggregation, boolean descendants) {
BoolQueryBuilder refinementQuery = boolQuery();
for (AppAggregation term : refinements.getTerms()) {
if (!term.equals(appAggregation)) {
refinementQuery.must(createRefinementQuery(refinements, term));
refinementQuery.must(createRefinementQuery(refinements, term, descendants));
}
}
SearchRefinements implicitRefinements = getImplicitSearchRefinements();
for (AppAggregation term : implicitRefinements.getTerms()) {
refinementQuery.must(createRefinementQuery(implicitRefinements, term));
refinementQuery.must(createRefinementQuery(implicitRefinements, term, descendants));
}
return refinementQuery;
}
......@@ -208,7 +217,7 @@ public abstract class AbstractDocumentDaoImpl<D extends SearchDocument, I extend
* </li>
* <li>
* The field is an array, and can be an empty array. In that case, it's considered by ElasticSearch as
* missing, but the aggregation created in {@link DocumentDaoCustom#search(String, boolean, SearchRefinements, Pageable)}
* missing, but the aggregation created in {@link DocumentDaoCustom#search(String, boolean, boolean, SearchRefinements, Pageable)}
* puts missing values in the bucket with the key {@link SearchDocument#NULL_VALUE}. So, the aggregation
* considers null values and missing values as the same value: {@link SearchDocument#NULL_VALUE}.
* It's still considered, when searching, as a missing value though. So, if {@link SearchDocument#NULL_VALUE}
......@@ -220,16 +229,29 @@ public abstract class AbstractDocumentDaoImpl<D extends SearchDocument, I extend
* </li>
* </ul>
*/
private QueryBuilder createRefinementQuery(SearchRefinements refinements, AppAggregation term) {
private QueryBuilder createRefinementQuery(SearchRefinements refinements, AppAggregation term, boolean descendants) {
Set<String> acceptedValues = refinements.getRefinementsForTerm(term);
TermsQueryBuilder termsQuery = termsQuery(term.getField(), acceptedValues);
BoolQueryBuilder boolQuery = QueryBuilders.boolQuery();
if (term.getName().equals("annot")) {
// match on the annotation IDs extracted from the selected refinement labels
List<String> goIds = getAnnotationsIds(refinements);
boolQuery.should(termsQuery("annotationId.keyword", goIds));
if (descendants) {
// also match documents whose ancestors entries start with one of the selected IDs
for (String goId : goIds) {
boolQuery.should(prefixQuery("ancestors.keyword", goId));
}
}
} else {
boolQuery.should(termsQuery);
}
if (acceptedValues.contains(SearchDocument.NULL_VALUE)) {
return boolQuery()
.should(termsQuery)
.should(boolQuery)
.should(boolQuery().mustNot(existsQuery(term.getField())));
}
else {
return termsQuery;
return boolQuery;
}
}
......@@ -338,7 +360,7 @@ public abstract class AbstractDocumentDaoImpl<D extends SearchDocument, I extend
BoolQueryBuilder refinementQuery = boolQuery();
SearchRefinements implicitRefinements = getImplicitSearchRefinements();
for (AppAggregation term : implicitRefinements.getTerms()) {
refinementQuery.must(createRefinementQuery(implicitRefinements, term));
refinementQuery.must(createRefinementQuery(implicitRefinements, term, false));
}
NativeSearchQueryBuilder builder = new NativeSearchQueryBuilder()
.withQuery(refinementQuery)
......
......@@ -21,6 +21,7 @@ public interface DocumentDaoCustom<D extends SearchDocument, I extends IndexedDo
*/
AggregatedPage<D> search(String query,
boolean highlight,
boolean descendants,
SearchRefinements refinements,
Pageable page);
......@@ -29,7 +30,7 @@ public interface DocumentDaoCustom<D extends SearchDocument, I extends IndexedDo
* Separated from the search method for performance reasons
*/
AggregatedPage<D> aggregate(String query,
SearchRefinements refinements);
SearchRefinements refinements, boolean descendants);
/**
* Suggests completions for the given term. It typically autocompletes all the fields except the identifier, the URL and
......
......@@ -18,9 +18,9 @@ import static fr.inra.urgi.datadiscovery.dao.AppAggregation.Type.SMALL;
public enum WheatisAggregation implements AppAggregation {
SPECIES("species", "species.keyword", LARGE),
ENTRY_TYPE("entry", "entryType.keyword", LARGE),
GO_ANNOTATION("annot", "annotationName.keyword", LARGE),
DATABASE_NAME("db", "databaseName.keyword", LARGE),
NODE("node", "node.keyword", SMALL),
GO_ANNOTATION("GO Annotation", "annotation.keyword", LARGE);
NODE("node", "node.keyword", SMALL);
private final String name;
private final String field;
......
......@@ -32,10 +32,12 @@ public final class WheatisDocument implements SearchDocument {
private final String entryType;
private final String databaseName;
private final String url;
private final List<String> species;
private final String node;
private final String description;
private final List<String> annotationId;
private final List<String> annotationName;
private final List<String> ancestors;
@JsonCreator
public WheatisDocument(String id,
......@@ -45,7 +47,10 @@ public final class WheatisDocument implements SearchDocument {
String url,
List<String> species,
String node,
String description) {
String description,
List<String> annotationId,
List<String> annotationName,
List<String> ancestors) {
this.id = id;
this.name = name;
this.entryType = entryType;
......@@ -54,6 +59,9 @@ public final class WheatisDocument implements SearchDocument {
this.species = nullSafeUnmodifiableCopy(species);
this.node = node;
this.description = description;
this.annotationId = nullSafeUnmodifiableCopy(annotationId);
this.annotationName = nullSafeUnmodifiableCopy(annotationName);
this.ancestors = nullSafeUnmodifiableCopy(ancestors);
}
public WheatisDocument(Builder builder) {
......@@ -64,7 +72,10 @@ public final class WheatisDocument implements SearchDocument {
builder.url,
builder.species,
builder.node,
builder.description);
builder.description,
builder.annotationId,
builder.annotationName,
builder.ancestors);
}
@Override
......@@ -101,6 +112,12 @@ public final class WheatisDocument implements SearchDocument {
return description;
}
public List<String> getAnnotationId() { return annotationId; }
public List<String> getAnnotationName() { return annotationName; }
public List<String> getAncestors() { return ancestors; }
@Override
public boolean equals(Object o) {
if (this == o) {
......@@ -157,6 +174,10 @@ public final class WheatisDocument implements SearchDocument {
private List<String> species = Collections.emptyList();
private String node;
private String description;
private List<String> annotationId;
private List<String> annotationName;
private List<String> ancestors;
private Builder() {
}
......@@ -169,6 +190,9 @@ public final class WheatisDocument implements SearchDocument {
this.species = document.getSpecies();
this.node = document.getNode();
this.description = document.getDescription();
this.annotationId = document.getAnnotationId();
this.annotationName = document.getAnnotationName();
this.ancestors = document.getAncestors();
}
public Builder withId(String id) {
......@@ -211,6 +235,21 @@ public final class WheatisDocument implements SearchDocument {
return this;
}
public Builder withAnnotationName(List<String> annotationName) {
this.annotationName = annotationName;
return this;
}
public Builder withAnnotationId(List<String> annotationId) {
this.annotationId = annotationId;
return this;
}
public Builder withAncestors(List<String> ancestors) {
this.ancestors = ancestors;
return this;
}
public WheatisDocument build() {
return new WheatisDocument(this);
}
......
......@@ -54,14 +54,16 @@ public class SearchController {
*/
@GetMapping("/api/search")
public AggregatedPageDTO<? extends SearchDocument> search(@RequestParam("query") String query,
@RequestParam("aggregate") Optional<Boolean> aggregate,
@RequestParam("highlight") Optional<Boolean> highlight,
@RequestParam("page") Optional<Integer> page,
@RequestParam MultiValueMap<String, String> parameters) {
@RequestParam("aggregate") Optional<Boolean> aggregate,
@RequestParam("highlight") Optional<Boolean> highlight,
@RequestParam("descendants") boolean descendants,
@RequestParam("page") Optional<Integer> page,
@RequestParam MultiValueMap<String, String> parameters) {
int requestedPage = page.orElse(0);
validatePage(requestedPage);
return AggregatedPageDTO.fromPage(documentDao.search(query,
highlight.orElse(false),
descendants,
createRefinementsFromParameters(parameters),
PageRequest.of(page.orElse(0), PAGE_SIZE)),
aggregationAnalyzer);
......@@ -69,10 +71,10 @@ public class SearchController {
}
@GetMapping("/api/aggregate")
public AggregatedPageDTO<? extends SearchDocument> aggregate(@RequestParam("query") String query,
@RequestParam MultiValueMap<String, String> parameters) {
@RequestParam MultiValueMap<String, String> parameters, @RequestParam("descendants") boolean descendants) {
return AggregatedPageDTO.fromPage(documentDao.aggregate(query,
createRefinementsFromParameters(parameters)),
createRefinementsFromParameters(parameters), descendants),
aggregationAnalyzer);
}
......
......@@ -33,6 +33,7 @@ logging.level:
# tracer: TRACE
# Above allows for logging curl queries and responses to/from ES.
# Need to add this header to curl: `-H "Content-Type: application/json"`
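# e.g. (illustrative): curl -H "Content-Type: application/json" http://localhost:9200/_search -d '{"query":{"match_all":{}}}'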
# org.elasticsearch.client: TRACE # to log Elasticsearch queries
org.springframework:
boot.web.embedded.tomcat.TomcatWebServer: INFO
web.client.RestTemplate: DEBUG
......
......@@ -59,7 +59,27 @@
}
}
},
"annotation": {
"annotationId": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256,
"null_value": "NULL"
}
}
},
"annotationName": {
"type": "text",
"fields": {
"keyword": {
"type": "keyword",
"ignore_above": 256,
"null_value": "NULL"
}
}
},
"ancestors": {
"type": "text",
"fields": {
"keyword": {
......
......@@ -188,8 +188,9 @@ class RareDocumentDaoTest extends DocumentDaoTest {
assertThat(documentDao.search(document.getId(),
false,
SearchRefinements.EMPTY,
firstPage).getContent()).isEmpty();
false,
SearchRefinements.EMPTY,
firstPage).getContent()).isEmpty();
}
@Test
......@@ -200,8 +201,9 @@ class RareDocumentDaoTest extends DocumentDaoTest {
assertThat(documentDao.search("bar",
false,
SearchRefinements.EMPTY,
firstPage).getContent()).isEmpty();
false,
SearchRefinements.EMPTY,
firstPage).getContent()).isEmpty();