Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reindex can ignore op_type create when using external version #37855

Open
ismael-hasan opened this issue Jan 25, 2019 · 8 comments
Open

Reindex can ignore op_type create when using external version #37855

ismael-hasan opened this issue Jan 25, 2019 · 8 comments
Labels
>bug :Distributed Indexing/Reindex Issues relating to reindex that are not caused by issues further down good first issue low hanging fruit Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.

Comments

@ismael-hasan
Copy link
Contributor

ismael-hasan commented Jan 25, 2019

Elasticsearch version (bin/elasticsearch --version): 6.5.2

Plugins installed: []

JVM version (java -version): Java 10.0.2

OS version (uname -a if on a Unix-like system): Windows 10

Description of the problem including expected versus actual behavior:
In a reindex, when op_type is set as create and we use external versioning we can update documents already existing in the destination index. If op_type is set as create, it is not expected that we can update existing documents; quoting the reindex documentation:

Settings op_type to create will cause _reindex to only create missing documents in the target index. All existing documents will cause a version conflict:

Steps to reproduce:
Create a test index with 1 document:

POST test/doc
{
  "test" : "test"
}

Reindex to another test2 index with op_type create (it will work):

POST _reindex
{
  "source": {
    "index": "test"
  },
  "dest": {
    "index": "test2",
    "op_type": "create",
    "version_type": "external"
  }
}
{
  "took" : 1500,
  "timed_out" : false,
  "total" : 1,
  "updated" : 0,
  "created" : 1,
  "deleted" : 0,
  "batches" : 1,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}

Reindex again to the same index; in this case it will fail due to versioning (it is expected that it fails for op_type create, but not sure if it fails incidentally due to versioning and it is already ignoring the op_type). The request is the same as in the previous step, the response is:

{
  "took": 2,
  "timed_out": false,
  "total": 1,
  "updated": 0,
  "created": 0,
  "deleted": 0,
  "batches": 1,
  "version_conflicts": 1,
  "noops": 0,
  "retries": {
    "bulk": 0,
    "search": 0
  },
  "throttled_millis": 0,
  "requests_per_second": -1,
  "throttled_until_millis": 0,
  "failures": [
    {
      "index": "test2",
      "type": "doc",
      "id": "YjULhGgBIniEFhEZQcaI",
      "cause": {
        "type": "version_conflict_engine_exception",
        "reason": "[doc][YjULhGgBIniEFhEZQcaI]: version conflict, current version [1] is higher or equal to the one provided [1]",
        "index_uuid": "VNd0kK6aSxWCU6v-NO5m8g",
        "shard": "3",
        "index": "test2"
      },
      "status": 409
    }
  ]
}

Reindex again to the same index, but overwriting the version with scripting. In this case, regardless of the document already existing in the destination and op_type being create, it will update that document:

POST _reindex
{
  "source": {
    "index": "test"
  },
  "dest": {
    "index": "test2",
    "op_type": "create",
    "version_type": "external"
  },
    "script": {
    "source": "ctx._version = 6",
    "lang": "painless"
  }
}
{
  "took" : 65,
  "timed_out" : false,
  "total" : 1,
  "updated" : 1,
  "created" : 0,
  "deleted" : 0,
  "batches" : 1,
  "version_conflicts" : 0,
  "noops" : 0,
  "retries" : {
    "bulk" : 0,
    "search" : 0
  },
  "throttled_millis" : 0,
  "requests_per_second" : -1.0,
  "throttled_until_millis" : 0,
  "failures" : [ ]
}
@colings86 colings86 added the :Data Management/Indices APIs APIs to create and manage indices and templates label Jan 25, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-features

@jasontedor jasontedor added :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. and removed :Data Management/Indices APIs APIs to create and manage indices and templates labels Jan 25, 2019
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-distributed

@ywelsch ywelsch added >bug good first issue low hanging fruit labels Jan 25, 2019
@kwojcicki
Copy link

Hey @ismael-hasan what would be a good starting point to start fixing this issue 😄

@stjasink
Copy link

stjasink commented Jan 27, 2019

Hi @kwojcicki , I was the one who encountered this. @ismael-hasan opened this issue after he helped me diagnose it in a support call.

For me, it was confusing that the op_type of create was being ignored and the reindex task was overwriting documents in the destination index that I did not want it to overwrite. This happened because I had also specified a version_type; I don't know what the reasons were for us using version_type initially, but removing it solved the problem. It took me two days of debugging to figure out exactly what was happening as my tests were quite long turn-around, simulating a real-world situation of concurrent processes.

The way I see it, the op_type is more important than version_type. If op_type is set to create then it should not need to consider versions at all as it should only consider creates and conflicts and no updates.

Perhaps an easy "fix" would be to return an error if version_type is specified when op_type is set to create? Or ignore version_type in that situation, but ignoring that might be confusing for other users.

@ivange94
Copy link

but not sure if it fails incidentally due to versioning and it is already ignoring the op_type).

@ismael-hasan you are right about that "op_type": "create" was already been ignored in the first request. The failure was only due to version conflict. If you create a new version of the doc in index test and reindex on test2 which already has that same doc but an older version, it doesn't fail.

Working on the issue.

@ivange94
Copy link

ivange94 commented Jan 27, 2019

@stjasink I'm still tracking down the issue but it sees from this code

https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/action/index/IndexRequest.java#L167-L172

op_type create should not be used with version_type external?

PS: But that's not where the error comes from though. I'm just trying to understand the expected behavior

@stjasink
Copy link

@ivange94 That check sounds sensible. Would it make sense to do the same check in https://github.com/elastic/elasticsearch/blob/master/server/src/main/java/org/elasticsearch/index/reindex/ReindexRequest.java to avoid the situation I encountered?

ivange94 added a commit to ivange94/elasticsearch that referenced this issue Feb 8, 2019
op_type create should only be used with version_type
external

Closes elastic#37855
@henningandersen henningandersen added :Distributed Indexing/Reindex Issues relating to reindex that are not caused by issues further down and removed :Distributed Indexing/CRUD A catch all label for issues around indexing, updating and getting a doc by id. Not search. labels Apr 12, 2019
@rjernst rjernst added the Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination. label May 4, 2020
@timothy-choi
Copy link

Is issue is still open? If so, what requirements are there, if any, for a PR?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
>bug :Distributed Indexing/Reindex Issues relating to reindex that are not caused by issues further down good first issue low hanging fruit Team:Distributed (Obsolete) Meta label for distributed team (obsolete). Replaced by Distributed Indexing/Coordination.
Projects
None yet
Development

Successfully merging a pull request may close this issue.