
OrientDB loses data when we restart cluster after some nodes crashed #6999

Closed
pavlop-asg opened this issue Dec 13, 2016 · 39 comments

pavlop-asg commented Dec 13, 2016

OrientDB Version, operating system, or hardware.

  • v2.2.13

Operating System

  • Windows

Expected behavior and actual behavior

Here is an example. We have nodes A and B, and we write value "1" to a document while both nodes are up. We shut down node B and write value "2", then we shut down node A as well. When we start node B first and node A after it, the cluster ends up with the old value "1" instead of "2"; it looks like there is a bug in how OrientDB merges the records. NOTE: when we start node A first and node B after it, everything works as expected.
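
A minimal sketch of the client-side sequence, using the 2.2 document API (the class name "Doc", the field "value", the database name "demo", the node addresses, and the credentials are all placeholders, not our real schema):

import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
import com.orientechnologies.orient.core.record.impl.ODocument;
import com.orientechnologies.orient.core.sql.query.OSQLSynchQuery;
import java.util.List;

public class RestartOrderRepro {
  public static void main(String[] args) {
    // Step 1: nodes A and B are both up; write "1" through node A.
    ODatabaseDocumentTx db = new ODatabaseDocumentTx("remote:nodeA/demo").open("admin", "admin");
    new ODocument("Doc").field("value", "1").save();
    db.close();

    // Step 2: node B has been shut down externally; write "2" through node A
    // (accepted locally because writeQuorum=1; see the configuration below).
    db = new ODatabaseDocumentTx("remote:nodeA/demo").open("admin", "admin");
    List<ODocument> docs = db.query(new OSQLSynchQuery<ODocument>("select from Doc"));
    docs.get(0).field("value", "2").save();
    db.close();

    // Step 3: node A has been shut down too. Start B first, then A,
    // and read the value back.
    db = new ODatabaseDocumentTx("remote:nodeB/demo").open("admin", "admin");
    docs = db.query(new OSQLSynchQuery<ODocument>("select from Doc"));
    System.out.println(docs.get(0).field("value")); // observed: "1", expected: "2"
    db.close();
  }
}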

Steps to reproduce the problem

Start two nodes with the following distributed configuration:

{
  "autoDeploy": true,
  "readQuorum": 1,
  "writeQuorum": 1,
  "executionMode": "undefined",
  "readYourWrites": true,
  "newNodeStrategy": "static",
  "servers": {
    "*": "master"
  },
  "clusters": {
    "internal": {
    },
    "*": {
      "servers": ["<NEW_NODE>"]
    }
  }
}

This issue applies to other configurations as well, for example:

  • When we have 3 nodes and use write quorum "majority" and read quorum 1.
  • When we have 3 nodes and use write and read quorum as "majority".
  • When we have read quorum 1 and write quorum "all" plus "newNodeStrategy": "dynamic" and distributed.autoRemoveOfflineServers=0.
lvca (Member) commented Dec 13, 2016

I suggest you upgrade to the latest 2.2.13, where many distributed bugs have already been fixed.

lvca closed this as completed Dec 13, 2016
lvca added the invalid label Dec 13, 2016
lvca self-assigned this Dec 13, 2016
pavlop-asg (Author) commented:

Sorry, that was a typo; we use v2.2.13.

lvca reopened this Dec 13, 2016
lvca (Member) commented Dec 13, 2016

We improved the merge strategy in the 2.2.x branch. There are still some corner cases where the merge is not done, but we will finalize it in the coming days. If you could try 2.2.x, your feedback would be valuable before we release the next hotfix.

lvca added this to the 2.2.x (next hotfix) milestone Dec 13, 2016
pavlop-asg (Author) commented:

I have tested with the latest v2.2.13. May I clarify: is this an OrientDB issue, or can I change the configuration somehow to make it work?

pavlop-asg (Author) commented:

Is this issue going to be resolved in the next hotfix? I just want to set expectations, because we have a release very soon and this is critical for us.

lvca (Member) commented Dec 21, 2016

This issue has been fixed in the latest 2.2.x. Any chance you could test it?

pavlop-asg (Author) commented:

Which version should I test, the latest snapshot?

lvca (Member) commented Dec 21, 2016

pavlop-asg (Author) commented:

I have tested version 2.2.14 and the bug is still there.

lvca (Member) commented Dec 27, 2016

If you tried the same with 2.2.14, could you please post here the contents of the two distributed-sync.json files?

pavlop-asg (Author) commented Dec 27, 2016

{
    "@type": "d",
    "@version": 0,
    "version": 30,
    "autoDeploy": true,
    "readQuorum": 1,
    "writeQuorum": 1,
    "executionMode": "undefined",
    "readYourWrites": true,
    "newNodeStrategy": "dynamic",
    "servers": {
        "@type": "d",
        "@version": 0,
        "*": "master"
    },
    "clusters": {
        "@type": "d",
        "@version": 0,
        "internal": {
            "@type": "d",
            "@version": 0
        },
        "*": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-1",
                "node-2",
                "<NEW_NODE>"
            ]
        },
        "odocumentwrapper_2": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "odocumentwrapper": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "aclentity": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "aclentity_1": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "ofunction_0": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "configurationentry_3": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "configurationentry": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "v": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "v_3": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "principalentity": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "principalentity_1": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "roleentity": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "roleentity_1": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "rulemodelentity_1": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "rulemodelentity_3": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "favoriteentity_3": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "favoriteentity": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "extractfilterentity_1": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "extractfilterentity_2": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "e": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "e_1": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "osequence_0": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "orole_0": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "savedsearchentity_3": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "savedsearchentity": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "ouser_0": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        },
        "oschedule_0": {
            "@type": "d",
            "@version": 0,
            "servers": [
                "node-2",
                "node-1",
                "<NEW_NODE>"
            ]
        }
    }
}

lvca (Member) commented Dec 27, 2016

Not the distributed-config.json, but the distributed-sync.json.

pavlop-asg (Author) commented:

{
    "@type": "d",
    "@version": 0,
    "version": 57,
    "node-1": {
        "@type": "d",
        "@version": 0,
        "segment": 0,
        "position": 8726,
        "@fieldTypes": "segment=l,position=l"
    },
    "lastOperationTimeStamp": 1482505984716,
    "node-2": {
        "@type": "d",
        "@version": 0,
        "segment": 0,
        "position": 3552,
        "@fieldTypes": "segment=l,position=l"
    },
    "@fieldTypes": "lastOperationTimeStamp=l"
}
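
For what it's worth, the per-node segment/position pairs above match the LSN format the server logs print (e.g. LSN{segment=0, position=...}), and lastOperationTimeStamp appears to be epoch milliseconds, which is apparently what the nodes compare (logged as "LastOperationOn") when deciding which copy is newer. A small illustrative snippet (our own helper, not part of the OrientDB API):

import java.time.Instant;

// Illustrative only: decode the lastOperationTimeStamp field from a
// distributed-sync.json file; the value appears to be epoch milliseconds.
public class SyncFileTimestamps {
  public static void main(String[] args) {
    long node1LastOp = 1482505984716L; // from the file above
    System.out.println(Instant.ofEpochMilli(node1LastOp)); // 2016-12-23T15:13:04.716Z
    // Comparing this value across the nodes' files indicates which copy
    // should win the merge after a full-cluster restart.
  }
}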

lvca (Member) commented Dec 28, 2016

Could you please provide the files from the other servers too, so I can compare them and see what's wrong?

lvca (Member) commented Jan 5, 2017

Do you have the full log? What happened next? Was it able to download the database?

pavlop-asg (Author) commented:

One more thing: with version 2.2.15-SNAPSHOT I sometimes get the following error when I try to work in a cluster:

Caused by: com.orientechnologies.orient.core.exception.ODatabaseException: Database 'plocal:C:/Users/ppoh/asg/mobius/data/orientdb/databases/mobius' is closed
	DB name="mobius"
	DB name="mobius"
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_60]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_60]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_60]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:422) ~[?:1.8.0_60]
	at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.throwSerializedException(OChannelBinaryAsynchClient.java:442) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.handleStatus(OChannelBinaryAsynchClient.java:393) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:275) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:167) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.client.remote.OStorageRemote.beginResponse(OStorageRemote.java:2170) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.client.remote.OStorageRemote$27.execute(OStorageRemote.java:1162) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.client.remote.OStorageRemote$2.execute(OStorageRemote.java:196) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.client.remote.OStorageRemote.baseNetworkOperation(OStorageRemote.java:237) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.client.remote.OStorageRemote.networkOperationRetry(OStorageRemote.java:193) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.client.remote.OStorageRemote.networkOperation(OStorageRemote.java:204) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.client.remote.OStorageRemote.command(OStorageRemote.java:1138) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.core.sql.query.OSQLQuery.run(OSQLQuery.java:78) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.core.sql.query.OSQLAsynchQuery.run(OSQLAsynchQuery.java:74) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.core.sql.query.OSQLSynchQuery.run(OSQLSynchQuery.java:85) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.core.query.OQueryAbstract.execute(OQueryAbstract.java:33) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.query(ODatabaseDocumentTx.java:754) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.object.db.ODatabasePojoAbstract.query(ODatabasePojoAbstract.java:244) ~[orientdb-object-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.asg.mobius.config.storage.orientdb.OrientDbConfigurationStorage.lambda$load$2(OrientDbConfigurationStorage.java:96) ~[config-orientdb-storage-7.0.0-SNAPSHOT.jar:?]
	at com.asg.mobius.orientdb.OrientDbObjectStorageBase.executeAndReturn(OrientDbObjectStorageBase.java:88) ~[orientdb-support-7.0.0-SNAPSHOT.jar:?]
	at com.asg.mobius.config.storage.orientdb.OrientDbConfigurationStorage.load(OrientDbConfigurationStorage.java:95) ~[config-orientdb-storage-7.0.0-SNAPSHOT.jar:?]
	at com.asg.mobius.config.storage.orientdb.OrientDbConfigurationStorage.lambda$loadAll$3(OrientDbConfigurationStorage.java:111) ~[config-orientdb-storage-7.0.0-SNAPSHOT.jar:?]
	at java.util.ArrayList.forEach(ArrayList.java:1249) ~[?:1.8.0_60]
	at com.asg.mobius.config.storage.orientdb.OrientDbConfigurationStorage.loadAll(OrientDbConfigurationStorage.java:110) ~[config-orientdb-storage-7.0.0-SNAPSHOT.jar:?]
	at com.hazelcast.map.impl.MapStoreWrapper.loadAll(MapStoreWrapper.java:143) ~[hazelcast-3.7.3.jar:3.7.3]
	at com.hazelcast.map.impl.mapstore.AbstractMapDataStore.loadAll(AbstractMapDataStore.java:55) ~[hazelcast-3.7.3.jar:3.7.3]
	at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader.loadAndGet(BasicRecordStoreLoader.java:161) ~[hazelcast-3.7.3.jar:3.7.3]
	at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader.doBatchLoad(BasicRecordStoreLoader.java:134) ~[hazelcast-3.7.3.jar:3.7.3]
	at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader.loadValuesInternal(BasicRecordStoreLoader.java:120) ~[hazelcast-3.7.3.jar:3.7.3]
	at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader.access$100(BasicRecordStoreLoader.java:54) ~[hazelcast-3.7.3.jar:3.7.3]
	at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader$GivenKeysLoaderTask.call(BasicRecordStoreLoader.java:107) ~[hazelcast-3.7.3.jar:3.7.3]
	at com.hazelcast.util.executor.CompletableFutureTask.run(CompletableFutureTask.java:67) ~[hazelcast-3.7.3.jar:3.7.3]
	... 6 more
2017-01-05 18:32:52.024 [ERROR]    [hz._hzInstance_2_dev.partition-operation.thread-2] com.hazelcast.map.impl.operation.GetOperation:log - [10.129.235.254]:5701 [dev] [3.7.3] Database 'plocal:C:/Users/ppoh/asg/mobius/data/orientdb/databases/mobius' is closed
	DB name="mobius"
	DB name="mobius"
com.orientechnologies.orient.core.exception.ODatabaseException: Database 'plocal:C:/Users/ppoh/asg/mobius/data/orientdb/databases/mobius' is closed
	DB name="mobius"
	DB name="mobius"
	at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method) ~[?:1.8.0_60]
	at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62) ~[?:1.8.0_60]
	at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45) ~[?:1.8.0_60]
	at java.lang.reflect.Constructor.newInstance(Constructor.java:422) ~[?:1.8.0_60]
	at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.throwSerializedException(OChannelBinaryAsynchClient.java:442) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.handleStatus(OChannelBinaryAsynchClient.java:393) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:275) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.client.binary.OChannelBinaryAsynchClient.beginResponse(OChannelBinaryAsynchClient.java:167) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.client.remote.OStorageRemote.beginResponse(OStorageRemote.java:2170) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.client.remote.OStorageRemote$27.execute(OStorageRemote.java:1162) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.client.remote.OStorageRemote$2.execute(OStorageRemote.java:196) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.client.remote.OStorageRemote.baseNetworkOperation(OStorageRemote.java:237) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.client.remote.OStorageRemote.networkOperationRetry(OStorageRemote.java:193) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.client.remote.OStorageRemote.networkOperation(OStorageRemote.java:204) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.client.remote.OStorageRemote.command(OStorageRemote.java:1138) ~[orientdb-client-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.core.sql.query.OSQLQuery.run(OSQLQuery.java:78) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.core.sql.query.OSQLAsynchQuery.run(OSQLAsynchQuery.java:74) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.core.sql.query.OSQLSynchQuery.run(OSQLSynchQuery.java:85) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.core.query.OQueryAbstract.execute(OQueryAbstract.java:33) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.query(ODatabaseDocumentTx.java:754) ~[orientdb-core-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.orientechnologies.orient.object.db.ODatabasePojoAbstract.query(ODatabasePojoAbstract.java:244) ~[orientdb-object-2.2.15-SNAPSHOT.jar:2.2.15-SNAPSHOT]
	at com.asg.mobius.config.storage.orientdb.OrientDbConfigurationStorage.lambda$load$2(OrientDbConfigurationStorage.java:96) ~[config-orientdb-storage-7.0.0-SNAPSHOT.jar:?]
	at com.asg.mobius.orientdb.OrientDbObjectStorageBase.executeAndReturn(OrientDbObjectStorageBase.java:88) ~[orientdb-support-7.0.0-SNAPSHOT.jar:?]
	at com.asg.mobius.config.storage.orientdb.OrientDbConfigurationStorage.load(OrientDbConfigurationStorage.java:95) ~[config-orientdb-storage-7.0.0-SNAPSHOT.jar:?]
	at com.asg.mobius.config.storage.orientdb.OrientDbConfigurationStorage.lambda$loadAll$3(OrientDbConfigurationStorage.java:111) ~[config-orientdb-storage-7.0.0-SNAPSHOT.jar:?]
	at java.util.ArrayList.forEach(ArrayList.java:1249) ~[?:1.8.0_60]
	at com.asg.mobius.config.storage.orientdb.OrientDbConfigurationStorage.loadAll(OrientDbConfigurationStorage.java:110) ~[config-orientdb-storage-7.0.0-SNAPSHOT.jar:?]
	at com.hazelcast.map.impl.MapStoreWrapper.loadAll(MapStoreWrapper.java:143) ~[hazelcast-3.7.3.jar:3.7.3]
	at com.hazelcast.map.impl.mapstore.AbstractMapDataStore.loadAll(AbstractMapDataStore.java:55) ~[hazelcast-3.7.3.jar:3.7.3]
	at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader.loadAndGet(BasicRecordStoreLoader.java:161) ~[hazelcast-3.7.3.jar:3.7.3]
	at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader.doBatchLoad(BasicRecordStoreLoader.java:134) ~[hazelcast-3.7.3.jar:3.7.3]
	at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader.loadValuesInternal(BasicRecordStoreLoader.java:120) ~[hazelcast-3.7.3.jar:3.7.3]
	at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader.access$100(BasicRecordStoreLoader.java:54) ~[hazelcast-3.7.3.jar:3.7.3]
	at com.hazelcast.map.impl.recordstore.BasicRecordStoreLoader$GivenKeysLoaderTask.call(BasicRecordStoreLoader.java:107) ~[hazelcast-3.7.3.jar:3.7.3]
	at com.hazelcast.util.executor.CompletableFutureTask.run(CompletableFutureTask.java:67) ~[hazelcast-3.7.3.jar:3.7.3]
	at com.hazelcast.util.executor.CachedExecutorServiceDelegate$Worker.run(CachedExecutorServiceDelegate.java:212) ~[hazelcast-3.7.3.jar:3.7.3]
	at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_60]
	at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_60]
	at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_60]
	at com.hazelcast.util.executor.HazelcastManagedThread.executeRun(HazelcastManagedThread.java:76) ~[hazelcast-3.7.3.jar:3.7.3]
	at com.hazelcast.util.executor.HazelcastManagedThread.run(HazelcastManagedThread.java:92) ~[hazelcast-3.7.3.jar:3.7.3]

pavlop-asg (Author) commented:

Sorry, I don't have the full log right now; I was trying to repeat the test and hit the error above.

pavlop-asg (Author) commented Jan 5, 2017

The reason for that exception is the following: after node-2 joined the cluster, this happened:

2017-01-05 18:28:58:293 INFO  [10.129.235.254]:2434 [orientdb] [3.7.3] Accepting socket connection from /10.129.235.254:61063 [SocketAcceptorThread]
2017-01-05 18:28:58:306 INFO  [10.129.235.254]:2434 [orientdb] [3.7.3] Established socket connection between /10.129.235.254:2434 and /10.129.235.254:61063 [TcpIpConnectionManager]
2017-01-05 18:29:05:285 INFO  [10.129.235.254]:2434 [orientdb] [3.7.3] 

Members [2] {
	Member [10.129.235.254]:2434 - be3473c4-1407-48c7-a772-5ec20ff8946a this
	Member [10.129.235.254]:2435 - 9f08c22b-351d-4884-a616-ed61a69f0b8e
}
 [ClusterService]
2017-01-05 18:29:05:285 WARNI [node-1] Added new node id=Member [10.129.235.254]:2435 - 9f08c22b-351d-4884-a616-ed61a69f0b8e name=ext:9f08c22b-351d-4884-a616-ed61a69f0b8e [OHazelcastPlugin]
2017-01-05 18:29:05:603 INFO  [10.129.235.254]:2434 [orientdb] [3.7.3] Re-partitioning cluster data... Migration queue size: 271 [MigrationManager]
2017-01-05 18:29:07:311 INFO  [10.129.235.254]:2434 [orientdb] [3.7.3] All migration tasks have been completed, queues are empty. [MigrationThread]
2017-01-05 18:29:07:458 INFO  [node-1]<-[ext:9f08c22b-351d-4884-a616-ed61a69f0b8e] Received updated about registered nodes [OHazelcastPlugin]
2017-01-05 18:29:07:845 INFO  [node-1]<-[node-2] Added node configuration id=Member [10.129.235.254]:2435 - 9f08c22b-351d-4884-a616-ed61a69f0b8e name=node-2, now 2 nodes are configured [OHazelcastPlugin]
2017-01-05 18:29:07:846 INFO  [node-1] Distributed servers status:

+-------+--------+----------------------+-----+---------+----------------+----------------+------------------------+
|Name   |Status  |Databases             |Conns|StartedOn|Binary          |HTTP            |UsedMemory              |
+-------+--------+----------------------+-----+---------+----------------+----------------+------------------------+
|node-1*|ONLINE  |mobius=ONLINE (MASTER)|4    |18:27:27 |10.33.77.53:2424|10.33.77.53:2480|768.49MB/3.53GB (21.26%)|
|node-2 |STARTING|                      |0    |18:28:54 |10.33.77.53:2425|10.33.77.53:2481|333.52MB/3.53GB (9.23%) |
+-------+--------+----------------------+-----+---------+----------------+----------------+------------------------+
 [OHazelcastPlugin]
2017-01-05 18:29:08:181 INFO  [node-1] Setting new distributed configuration for database: OSystem (version=1)

CLUSTER CONFIGURATION (LEGEND: X = Owner, o = Copy)
+--------+-----------+----------+
|CLUSTER |writeQuorum|readQuorum|
+--------+-----------+----------+
|*       |     1     |    1     |
|internal|     1     |    1     |
+--------+-----------+----------+

 [ODistributedStorage]
2017-01-05 18:29:08:229 SEVER [node-1]--[node-2] Skip deploying delta database 'mobius' because the requesting server has a most recent database (requester LastOperationOn=2017-01-05 18:09:37.209 current LastOperationOn=2017-01-05 18:09:32.573) [OSyncDatabaseDeltaTask]
2017-01-05 18:29:08:238 INFO  [node-1]<-[node-2] Received new status node-2.mobius=ONLINE [OHazelcastPlugin]
2017-01-05 18:29:08:240 INFO  [node-1]<-[node-2] Received updated status node-1.mobius=NOT_AVAILABLE [OHazelcastPlugin]
2017-01-05 18:29:08:241 INFO  [node-1] Distributed servers status:

+-------+--------+-----------------------------+-----+---------+----------------+----------------+------------------------+
|Name   |Status  |Databases                    |Conns|StartedOn|Binary          |HTTP            |UsedMemory              |
+-------+--------+-----------------------------+-----+---------+----------------+----------------+------------------------+
|node-1*|ONLINE  |mobius=NOT_AVAILABLE (MASTER)|4    |18:27:27 |10.33.77.53:2424|10.33.77.53:2480|768.49MB/3.53GB (21.26%)|
|node-2 |STARTING|                             |0    |18:28:54 |10.33.77.53:2425|10.33.77.53:2481|340.27MB/3.53GB (9.41%) |
+-------+--------+-----------------------------+-----+---------+----------------+----------------+------------------------+
 [OHazelcastPlugin]
2017-01-05 18:29:13:609 INFO  [node-1] Distributed servers status:

+-------+------+-----------------------------+-----+---------+----------------+----------------+------------------------+
|Name   |Status|Databases                    |Conns|StartedOn|Binary          |HTTP            |UsedMemory              |
+-------+------+-----------------------------+-----+---------+----------------+----------------+------------------------+
|node-1*|ONLINE|mobius=NOT_AVAILABLE (MASTER)|6    |18:27:27 |10.33.77.53:2424|10.33.77.53:2480|818.16MB/3.53GB (22.64%)|
|node-2 |ONLINE|mobius=ONLINE (MASTER)       |2    |18:28:54 |10.33.77.53:2425|10.33.77.53:2481|469.05MB/3.53GB (12.98%)|
+-------+------+-----------------------------+-----+---------+----------------+----------------+------------------------+
 [OHazelcastPlugin]
2017-01-05 18:29:13:886 INFO  [node-1] Trying to recover current server for database 'mobius'... [OClusterHealthChecker]
2017-01-05 18:29:13:889 INFO  [node-1] Current node is a MASTER for database 'mobius' [ODistributedAbstractPlugin$3]
2017-01-05 18:29:13:890 WARNI [node-1]->[[node-2]] requesting delta database sync for 'mobius' on local server... [OHazelcastPlugin]
2017-01-05 18:29:13:891 INFO  [node-1]->[node-2] Requesting database delta sync for 'mobius' LSN=LSN{segment=0, position=168631}... [OHazelcastPlugin]
2017-01-05 18:29:14:302 INFO  [node-1]<-[node-2] Received updated status node-1.mobius=SYNCHRONIZING [OHazelcastPlugin]
2017-01-05 18:29:14:303 INFO  [node-1] Distributed servers status:

+-------+------+-----------------------------+-----+---------+----------------+----------------+------------------------+
|Name   |Status|Databases                    |Conns|StartedOn|Binary          |HTTP            |UsedMemory              |
+-------+------+-----------------------------+-----+---------+----------------+----------------+------------------------+
|node-1*|ONLINE|mobius=SYNCHRONIZING (MASTER)|6    |18:27:27 |10.33.77.53:2424|10.33.77.53:2480|818.16MB/3.53GB (22.64%)|
|node-2 |ONLINE|mobius=ONLINE (MASTER)       |2    |18:28:54 |10.33.77.53:2425|10.33.77.53:2481|469.05MB/3.53GB (12.98%)|
+-------+------+-----------------------------+-----+---------+----------------+----------------+------------------------+
 [OHazelcastPlugin]
2017-01-05 18:29:14:314 WARNI [node-1]<-[node-2] Error on installing database delta for 'mobius' (err=Requested database delta sync with LSN=LSN{segment=0, position=168631} but not found in database) [OHazelcastPlugin]
2017-01-05 18:29:14:314 WARNI [node-1]<-[node-2] Requesting full database 'mobius' sync... [OHazelcastPlugin]
2017-01-05 18:29:14:315 SEVER [node-1]->[node-2] Error on asking delta backup of database 'mobius' (err=Requested database delta sync with LSN=LSN{segment=0, position=168631} but not found in database) [OHazelcastPlugin]
2017-01-05 18:29:14:509 WARNI [node-1]->[[node-2]] Requesting deploy of database 'mobius' on local server... [OHazelcastPlugin]
2017-01-05 18:29:14:517 INFO  [node-1] Downloaded configuration for database 'mobius' from the cluster [ODistributedStorage]
2017-01-05 18:29:14:518 INFO  [node-1] Setting new distributed configuration for database: mobius (version=1)

CLUSTER CONFIGURATION (LEGEND: X = Owner, o = Copy)
+--------+-----------+----------+-------------+-------+
|        |           |          |   MASTER    |MASTER |
|        |           |          |SYNCHRONIZING|ONLINE |
|        |           |          |   dynamic   |dynamic|
+--------+-----------+----------+-------------+-------+
|CLUSTER |writeQuorum|readQuorum|   node-1    |node-2 |
+--------+-----------+----------+-------------+-------+
|*       |     1     |    1     |      X      |   o   |
|internal|     1     |    1     |             |       |
+--------+-----------+----------+-------------+-------+

 [ODistributedStorage]
2017-01-05 18:29:14:537 INFO  [node-1]<-[node-2] Received updated status node-2.mobius=BACKUP [OHazelcastPlugin]
2017-01-05 18:29:14:539 INFO  [node-1] Distributed servers status:

+-------+------+-----------------------------+-----+---------+----------------+----------------+------------------------+
|Name   |Status|Databases                    |Conns|StartedOn|Binary          |HTTP            |UsedMemory              |
+-------+------+-----------------------------+-----+---------+----------------+----------------+------------------------+
|node-1*|ONLINE|mobius=SYNCHRONIZING (MASTER)|6    |18:27:27 |10.33.77.53:2424|10.33.77.53:2480|818.16MB/3.53GB (22.64%)|
|node-2 |ONLINE|mobius=BACKUP (MASTER)       |2    |18:28:54 |10.33.77.53:2425|10.33.77.53:2481|469.05MB/3.53GB (12.98%)|
+-------+------+-----------------------------+-----+---------+----------------+----------------+------------------------+
 [OHazelcastPlugin]$ANSI{green {db=mobius}} Error during storage close
com.orientechnologies.orient.core.exception.OStorageException: Cannot open the storage 'mobius' because it does not exist in path: C:/Users/ppoh/asg/mobius/data/orientdb/databases/mobius
	DB name="mobius"
	at com.orientechnologies.orient.core.storage.impl.local.OAbstractPaginatedStorage.open(OAbstractPaginatedStorage.java:285)
	at com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx.open(ODatabaseDocumentTx.java:261)
	at com.orientechnologies.orient.server.OServer.openDatabaseBypassingSecurity(OServer.java:980)
	at com.orientechnologies.orient.server.OServer.openDatabase(OServer.java:959)
	at com.orientechnologies.orient.server.OServer.openDatabase(OServer.java:932)
	at com.orientechnologies.orient.server.OServer.openDatabase(OServer.java:923)
	at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.openDatabase(ONetworkProtocolBinary.java:847)
	at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.handshakeRequest(ONetworkProtocolBinary.java:231)
	at com.orientechnologies.orient.server.network.protocol.binary.ONetworkProtocolBinary.execute(ONetworkProtocolBinary.java:196)
	at com.orientechnologies.common.thread.OSoftThread.run(OSoftThread.java:77)
$ANSI{green {db=mobius}} Error during storage close

pavlop-asg (Author) commented:

Any comments on that? And could you please tell me the approximate release date of 2.2.15?

pavlop-asg (Author) commented:

Looks like this issue is fixed, but I found one more minor issue related to it: #7073

robfrank modified the milestones: 2.2.x (next hotfix), 2.2.15 Jan 16, 2017
pavlop-asg (Author) commented:

I have tested the released 2.2.15 version and the fix seems to have been removed. Was it left out of the release because of bug #7073?

santo-it commented:

Hi @pavlop-asg,

Thanks for your comment; I'm sorry to hear about your problems. This difference between 2.2.15-SNAPSHOT and 2.2.15 seems strange to me, since as far as I can see the fix has been included in 2.2.15.

Are you sure you ran the very same test on both 2.2.15-SNAPSHOT and 2.2.15? I am reopening the issue for now.

Thanks,

santo-it reopened this Jan 19, 2017
pavlop-asg (Author) commented Jan 20, 2017

I have repeated the test with 2.2.15 and it looks like the issue is only partly fixed. Use cases:

  • I start node-1 and then node-2, and write value "1" to a document. I shut down node-1 and write value "2", then shut down node-2 as well. When I start node-1 and then node-2, nothing gets synchronized and I see the old value "1", so the issue remains in this case.

  • I start node-1 and then node-2, and write value "1" to a document. I shut down node-2 and write value "2", then shut down node-1 as well. When I start node-2 and then node-1, the data gets synchronized, but we hit the issue "Synchronization in a cluster fails" #7073.

lvca (Member) commented Jan 25, 2017

I've just tried the 2nd case and everything works: node1 was restarted and recognized that its database was newer. This is the log:

2017-01-25 22:38:24:458 INFO  [node1]->[[node2]] Current copy of database 'demo' is newer then the copy present in the cluster. Use the local copy and force other nodes to download this [ODistributedAbstractPlugin$3]
2017-01-25 22:38:24:468 WARNI [node1] Updated node status to 'ONLINE' [OHazelcastPlugin]
2017-01-25 22:38:24:470 INFO  [node1] Distributed servers status:

+------+--------+---------------------------+-----+---------+---------------+---------------+----------------------+
|Name  |Status  |Databases                  |Conns|StartedOn|Binary         |HTTP           |UsedMemory            |
+------+--------+---------------------------+-----+---------+---------------+---------------+----------------------+
|node2 |ONLINE  |demo=NOT_AVAILABLE (MASTER)|1    |22:38:04 |10.1.1.100:2424|10.1.1.100:2480|35.50MB/3.56GB (0.97%)|
|node1*|STARTING|                           |3    |22:38:13 |10.1.1.100:2425|10.1.1.100:2481|35.36MB/8.00GB (0.43%)|
+------+--------+---------------------------+-----+---------+---------------+---------------+----------------------+
 [OHazelcastPlugin]
2017-01-25 22:38:24:472 WARNI Authenticated clients can execute any kind of code into the server by using the following allowed languages: [sql] [OServerSideScriptInterpreter]
2017-01-25 22:38:24:472 INFO   [OEnterpriseAgent]
2017-01-25 22:38:24:472 INFO  ***************************************************************************** [OEnterpriseAgent]
2017-01-25 22:38:24:472 INFO  *                     ORIENTDB  -  ENTERPRISE EDITION                       * [OEnterpriseAgent]
2017-01-25 22:38:24:473 INFO  ***************************************************************************** [OEnterpriseAgent]
2017-01-25 22:38:24:473 INFO  * If you are in Production or Test, you must purchase a commercial license. * [OEnterpriseAgent]
2017-01-25 22:38:24:473 INFO  * For more information look at: http://orientdb.com/orientdb-enterprise/    * [OEnterpriseAgent]
2017-01-25 22:38:24:473 INFO  ***************************************************************************** [OEnterpriseAgent]
2017-01-25 22:38:24:473 INFO   [OEnterpriseAgent]
2017-01-25 22:38:24:487 INFO  Profiler is recording metrics with configuration: 60,24 [OEnterpriseProfiler]
2017-01-25 22:38:24:677 INFO  Scheduled [FULL_BACKUP] task : fa461098-a6ea-410a-8909-6c974f12e58e. Next execution will be Sun Jan 15 08:10:00 EST 2017 [OBackupTask]
2017-01-25 22:38:24:719 INFO  OrientDB Studio available at http://10.1.1.100:2481/studio/index.html [OServer]
2017-01-25 22:38:24:719 INFO  OrientDB Server is active v2.2.16-SNAPSHOT. [OServer]
2017-01-25 22:38:28:285 INFO  [node1] Received updated status node2.demo=SYNCHRONIZING [OHazelcastPlugin]
2017-01-25 22:38:28:285 INFO  [node1]->[node2] Deploying database 'demo' with delta of changes... [OSyncDatabaseDeltaTask]
2017-01-25 22:38:28:285 INFO  [node1]->[node2] Creating delta backup of database 'demo' (startLSN=LSN{segment=0, position=484849}) in directory: /var/folders/zc/y34429014c3bt_x1587qblth0000gn/T/orientdb/backup_node2_demo.zip... [OSyncDatabaseDeltaTask]
2017-01-25 22:38:28:286 INFO  [node1] Distributed servers status:

+------+--------+---------------------------+-----+---------+---------------+---------------+----------------------+
|Name  |Status  |Databases                  |Conns|StartedOn|Binary         |HTTP           |UsedMemory            |
+------+--------+---------------------------+-----+---------+---------------+---------------+----------------------+
|node2 |ONLINE  |demo=SYNCHRONIZING (MASTER)|1    |22:38:04 |10.1.1.100:2424|10.1.1.100:2480|35.50MB/3.56GB (0.97%)|
|node1*|STARTING|                           |3    |22:38:13 |10.1.1.100:2425|10.1.1.100:2481|35.36MB/8.00GB (0.43%)|
+------+--------+---------------------------+-----+---------+---------------+---------------+----------------------+
 [OHazelcastPlugin]
2017-01-25 22:38:28:290 INFO  Exporting records after LSN=LSN{segment=0, position=484849}. Found 1 records [OEnterpriseLocalPaginatedStorage]
2017-01-25 22:38:28:291 INFO  [node1]->[node2] Delta backup of database 'demo' completed. range=LSN{segment=0, position=484849}-LSN{segment=0, position=485604} [OSyncDatabaseDeltaTask]
2017-01-25 22:38:28:291 INFO  [node1]->[node2] Deploy delta database task completed [OSyncDatabaseDeltaTask]
2017-01-25 22:38:28:293 INFO  [node1]->[node2] - transferring chunk #1 offset=0 size=44... [OSyncDatabaseDeltaTask]
2017-01-25 22:38:28:349 INFO  [node1]<-[node2] Received updated status node2.demo=ONLINE [OHazelcastPlugin]
2017-01-25 22:38:29:480 INFO  [node1] Distributed servers status:

+------+------+--------------------+-----+---------+---------------+---------------+----------------------+
|Name  |Status|Databases           |Conns|StartedOn|Binary         |HTTP           |UsedMemory            |
+------+------+--------------------+-----+---------+---------------+---------------+----------------------+
|node2 |ONLINE|demo=ONLINE (MASTER)|1    |22:38:04 |10.1.1.100:2424|10.1.1.100:2480|35.50MB/3.56GB (0.97%)|
|node1*|ONLINE|demo=ONLINE (MASTER)|3    |22:38:13 |10.1.1.100:2425|10.1.1.100:2481|67.84MB/8.00GB (0.83%)|
+------+------+--------------------+-----+---------+---------------+---------------+----------------------+
 [OHazelcastPlugin]

lvca (Member) commented Jan 25, 2017

Actually, this works after a fix that I just pushed to 2.2.x. Could you please test both cases? They both work for me.

pavlop-asg (Author) commented:

Thank you, I will try.

pavlop-asg (Author) commented:

It is working fine for me after your fix. Thank you very much!

pavlop-asg (Author) commented:

Could you tell me how soon 2.2.16 will be released?

santo-it modified the milestones: 2.2.16, 2.2.15 Jan 25, 2017
santo-it commented:

Hi @pavlop-asg,

Glad to hear the issue has been fixed. Thanks for your patience and for your tests.

I'll discuss internally and try to find out when 2.2.16 will be released.

Thanks,

santo-it commented:

Fixed by ce26174

Release notes updated

No changes in the documentation required

Can be closed

lvca closed this as completed Jan 26, 2017