urgi-is / data-discovery / Issues / #15

Closed
Created Aug 10, 2018 by Raphaël Flores (@raphael.flores, Owner)

Data indexing process throws several errors

Harvesting the first version of the data files into Elasticsearch reports the following errors:

{
  "id": "fe05116c-0e96-4df2-8015-cf43f5aaa82c",
  "startInstant": "2018-08-10T12:59:17.999Z",
  "endInstant": "2018-08-10T12:59:37.209Z",
  "globalErrors": [],
  "files": [
    {
      "fileName": "rare_pilier_animal.json",
      "successCount": 399,
      "errorCount": 1,
      "errors": [
        {
          "index": 330,
          "error": "Error while parsing object: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: fr.inra.urgi.rare.domain.GeneticResource[\"identifier\"])",
          "line": 7286,
          "column": 4
        }
      ]
    },
    {
      "fileName": "rare_pilier_foret.json",
      "successCount": 2235,
      "errorCount": 0,
      "errors": []
    },
    {
      "fileName": "rare_pilier_microbial.json",
      "successCount": 15,
      "errorCount": 0,
      "errors": []
    },
    {
      "fileName": "rare_pilier_plant_2.json",
      "successCount": 217,
      "errorCount": 0,
      "errors": []
    },
    {
      "fileName": "rare_pilier_plant.json",
      "successCount": 14522,
      "errorCount": 10,
      "errors": [
        {
          "index": 4790,
          "error": "Error while parsing object: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: fr.inra.urgi.rare.domain.GeneticResource[\"name\"])",
          "line": 105594,
          "column": 4
        },
        {
          "index": 5905,
          "error": "Error while parsing object: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: fr.inra.urgi.rare.domain.GeneticResource[\"countryOfCollect\"])",
          "line": 130127,
          "column": 4
        },
        {
          "index": 7216,
          "error": "Error while parsing object: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: fr.inra.urgi.rare.domain.GeneticResource[\"name\"])",
          "line": 158972,
          "column": 4
        },
        {
          "index": 11393,
          "error": "Error while parsing object: com.fasterxml.jackson.databind.exc.InvalidFormatException: Cannot deserialize value of type `java.lang.Double` from String \"France\": not a valid Double value\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: fr.inra.urgi.rare.domain.GeneticResource[\"originLatitude\"])",
          "line": 250898,
          "column": 4
        },
        {
          "index": 14238,
          "error": "Error while parsing object: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: fr.inra.urgi.rare.domain.GeneticResource[\"name\"])",
          "line": 313519,
          "column": 4
        },
        {
          "index": 14245,
          "error": "Error while parsing object: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: fr.inra.urgi.rare.domain.GeneticResource[\"name\"])",
          "line": 313676,
          "column": 4
        },
        {
          "index": 14263,
          "error": "Error while parsing object: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: fr.inra.urgi.rare.domain.GeneticResource[\"countryOfCollect\"])",
          "line": 314075,
          "column": 4
        },
        {
          "index": 14264,
          "error": "Error while parsing object: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: fr.inra.urgi.rare.domain.GeneticResource[\"countryOfCollect\"])",
          "line": 314100,
          "column": 4
        },
        {
          "index": 14265,
          "error": "Error while parsing object: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: fr.inra.urgi.rare.domain.GeneticResource[\"countryOfCollect\"])",
          "line": 314125,
          "column": 4
        },
        {
          "index": 14266,
          "error": "Error while parsing object: com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize instance of `java.lang.String` out of START_ARRAY token\n at [Source: UNKNOWN; line: -1, column: -1] (through reference chain: fr.inra.urgi.rare.domain.GeneticResource[\"countryOfCollect\"])",
          "line": 314150,
          "column": 4
        }
      ]
    }
  ]
}
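
To see which fields are affected, the errors in such a report can be tallied per field. The following is a minimal sketch, not part of the project: the function name `count_errors_by_field` is hypothetical, and it assumes every error message ends with Jackson's "through reference chain" suffix, as in the report above. The sample contains two entries from that report with their messages shortened.

```python
import re
from collections import Counter

# Captures the field named at the end of Jackson's
# '(through reference chain: ...GeneticResource["field"])' suffix.
FIELD_PATTERN = re.compile(r'\["([^"]+)"\]\)\s*$')


def count_errors_by_field(report: dict) -> Counter:
    """Count the errors of a harvest report, grouped by offending field."""
    counts = Counter()
    for file_report in report.get("files", []):
        for error in file_report.get("errors", []):
            match = FIELD_PATTERN.search(error.get("error", ""))
            # Messages without a reference chain are grouped under "<unknown>".
            counts[match.group(1) if match else "<unknown>"] += 1
    return counts


# Two entries taken from the report above; messages shortened with "...".
sample_report = {
    "files": [
        {
            "fileName": "rare_pilier_animal.json",
            "errors": [
                {
                    "index": 330,
                    "error": "Error while parsing object: ... (through reference chain: "
                             'fr.inra.urgi.rare.domain.GeneticResource["identifier"])',
                },
            ],
        },
        {
            "fileName": "rare_pilier_plant.json",
            "errors": [
                {
                    "index": 4790,
                    "error": "Error while parsing object: ... (through reference chain: "
                             'fr.inra.urgi.rare.domain.GeneticResource["name"])',
                },
            ],
        },
    ]
}

field_counts = count_errors_by_field(sample_report)
print(dict(field_counts))
```

Counted by hand, the full report above has 5 errors on countryOfCollect, 4 on name, 1 on identifier and 1 on originLatitude.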

The main cause is that some mono-valued fields contain a comma, which is the character used to split multi-valued fields. This would explain the START_ARRAY errors on String fields such as name, identifier and countryOfCollect.

Such mono-valued fields must not be split on commas, as is already done for the description field.
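
The intended behaviour can be sketched as follows. This is only an illustration under stated assumptions: the issue does not show the actual transformation code, `split_multi_valued` is a hypothetical helper, and the multi-valued field name `synonyms` and the sample values are invented for the example. Only fields explicitly listed as multi-valued are split; everything else keeps its commas.

```python
# Hypothetical set of multi-valued fields; the real list is not given in the issue.
MULTI_VALUED_FIELDS = frozenset({"synonyms"})


def split_multi_valued(record: dict, multi_valued=MULTI_VALUED_FIELDS, sep: str = ",") -> dict:
    """Split only the fields declared multi-valued; leave all others untouched."""
    result = {}
    for key, value in record.items():
        if key in multi_valued and isinstance(value, str):
            # Multi-valued field: split on the separator and drop empty parts.
            result[key] = [part.strip() for part in value.split(sep) if part.strip()]
        else:
            # Mono-valued field (name, identifier, countryOfCollect, description...):
            # a comma is part of the value, so it is kept as a single string.
            result[key] = value
    return result


# Invented sample record, for illustration only.
sample_record = {
    "name": "Sample accession, variant A",
    "synonyms": "alias one, alias two",
}
converted = split_multi_valued(sample_record)
print(converted)
```

An explicit allow-list of multi-valued fields means a newly added mono-valued field is never split by accident.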

Edited Aug 13, 2018 by Raphaël Flores