Data indexing process throws several errors
When harvesting into Elasticsearch using the first version of the files, the following errors are thrown:
{
"id": "fe05116c-0e96-4df2-8015-cf43f5aaa82c",
"startInstant": "2018-08-10T12:59:17.999Z",
"endInstant": "2018-08-10T12:59:37.209Z",
"globalErrors": [],
"files": [
{
"fileName": "rare_pilier_animal.json",
"successCount": 399,
"errorCount": 1,
"errors": [
{
"index": 330,
"error": "Error while parsing object: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: fr.inra.urgi.rare.domain.GeneticResource[\"identifier\"])",
"line": 7286,
"column": 4
}
]
},
{
"fileName": "rare_pilier_foret.json",
"successCount": 2235,
"errorCount": 0,
"errors": []
},
{
"fileName": "rare_pilier_microbial.json",
"successCount": 15,
"errorCount": 0,
"errors": []
},
{
"fileName": "rare_pilier_plant_2.json",
"successCount": 217,
"errorCount": 0,
"errors": []
},
{
"fileName": "rare_pilier_plant.json",
"successCount": 14522,
"errorCount": 10,
"errors": [
{
"index": 4790,
"error": "Error while parsing object: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: fr.inra.urgi.rare.domain.GeneticResource[\"name\"])",
"line": 105594,
"column": 4
},
{
"index": 5905,
"error": "Error while parsing object: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: fr.inra.urgi.rare.domain.GeneticResource[\"countryOfCollect\"])",
"line": 130127,
"column": 4
},
{
"index": 7216,
"error": "Error while parsing object: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: fr.inra.urgi.rare.domain.GeneticResource[\"name\"])",
"line": 158972,
"column": 4
},
{
"index": 11393,
"error": "Error while parsing object: com.fasterxml.jackson.databind.exc.InvalidFormatException: Cannot deserialize value of type `java.lang.Double` from String \"France\": not a valid Double value\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: fr.inra.urgi.rare.domain.GeneticResource[\"originLatitude\"])",
"line": 250898,
"column": 4
},
{
"index": 14238,
"error": "Error while parsing object: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: fr.inra.urgi.rare.domain.GeneticResource[\"name\"])",
"line": 313519,
"column": 4
},
{
"index": 14245,
"error": "Error while parsing object: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: fr.inra.urgi.rare.domain.GeneticResource[\"name\"])",
"line": 313676,
"column": 4
},
{
"index": 14263,
"error": "Error while parsing object: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: fr.inra.urgi.rare.domain.GeneticResource[\"countryOfCollect\"])",
"line": 314075,
"column": 4
},
{
"index": 14264,
"error": "Error while parsing object: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: fr.inra.urgi.rare.domain.GeneticResource[\"countryOfCollect\"])",
"line": 314100,
"column": 4
},
{
"index": 14265,
"error": "Error while parsing object: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: fr.inra.urgi.rare.domain.GeneticResource[\"countryOfCollect\"])",
"line": 314125,
"column": 4
},
{
"index": 14266,
"error": "Error while parsing object: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: fr.inra.urgi.rare.domain.GeneticResource[\"countryOfCollect\"])",
"line": 314150,
"column": 4
}
]
}
]
}
The main cause is that some mono-valued fields contain a comma, which is also the character used to split multi-valued fields. Their values are therefore turned into arrays, which Jackson cannot deserialize into a `java.lang.String`.
Such mono-valued fields must not be split on commas, as is already done for the description field.
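A minimal sketch of the intended behaviour, assuming the split happens in a preprocessing step that receives each record as a flat mapping of field names to raw strings. The function name `convert_record`, the `MONO_VALUED_FIELDS` set, and the `taxon` field used as a multi-valued example are hypothetical; only `identifier`, `name`, `countryOfCollect` and `description` come from this report.

```python
# Fields that must be kept as a single string even when they contain a comma.
# Assumption: this list is illustrative and would have to match the real
# GeneticResource mapping.
MONO_VALUED_FIELDS = {"identifier", "name", "description", "countryOfCollect"}


def convert_record(raw):
    """Split comma-separated values, but only for multi-valued fields."""
    converted = {}
    for field, value in raw.items():
        if field in MONO_VALUED_FIELDS:
            # Mono-valued: keep the raw string untouched, commas included.
            converted[field] = value
        else:
            # Multi-valued: split on commas and drop surrounding whitespace.
            converted[field] = [part.strip() for part in value.split(",")]
    return converted


# Hypothetical record: "name" contains a comma but must stay a string.
record = {"name": "Bread wheat, spring type", "taxon": "Triticum, Aegilops"}
result = convert_record(record)
```

With this approach, `result["name"]` stays a single string while `result["taxon"]` becomes a list of two values, so the `START_ARRAY` errors on mono-valued fields should no longer occur. The `originLatitude` error may have a different cause and is not addressed by this sketch.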