Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Assertion errors from Java #27

Open
itamararjuan opened this issue Nov 7, 2018 · 10 comments
Open

Assertion errors from Java #27

itamararjuan opened this issue Nov 7, 2018 · 10 comments

Comments

@itamararjuan
Copy link

Hi @aphyr - Thank you for the amazing framework you've supplied us with to test out our databases
I have a working repo with Jepsen testing out our Blockchain Go Implementation at Orbs
and have encountered something weird with Knossos verifying history and firing many Java Assertion Errors like the following attached here

The weird thing is, the history checks appears to be valid and is quite confusing for me to know whether the test has passed or not due to these errors.
jepsen.log.zip

Would love your feedback. I am attaching here a small fragment of what comes out from stdout:

WARN [2018-11-06 13:46:47,969] clojure-agent-send-pool-2 - jepsen.checker Error while checking history: java.util.concurrent.ExecutionException: java.lang.AssertionError: Assert failed: Process completed an operation without a prior invocation: {:type :ok, :f :read, :value nil, :process 9, :time 682238600, :index 10} i at java.util.concurrent.FutureTask.report(FutureTask.java:122) [na:1.8.0_181] at java.util.concurrent.FutureTask.get(FutureTask.java:192) [na:1.8.0_181] at clojure.core$deref_future.invokeStatic(core.clj:2208) ~[clojure-1.8.0.jar:na] at clojure.core$future_call$reify__6962.deref(core.clj:6688) ~[clojure-1.8.0.jar:na] at clojure.core$deref.invokeStatic(core.clj:2228) ~[clojure-1.8.0.jar:na] at clojure.core$deref.invoke(core.clj:2214) ~[clojure-1.8.0.jar:na] at clojure.core$map$fn__4785.invoke(core.clj:2646) ~[clojure-1.8.0.jar:na] at clojure.lang.LazySeq.sval(LazySeq.java:40) ~[clojure-1.8.0.jar:na] at clojure.lang.LazySeq.seq(LazySeq.java:56) ~[clojure-1.8.0.jar:na] at clojure.lang.RT.seq(RT.java:521) ~[clojure-1.8.0.jar:na] at clojure.core$seq__4357.invokeStatic(core.clj:137) ~[clojure-1.8.0.jar:na] at clojure.core.protocols$seq_reduce.invokeStatic(protocols.clj:24) ~[clojure-1.8.0.jar:na] at clojure.core.protocols$fn__6738.invokeStatic(protocols.clj:75) ~[clojure-1.8.0.jar:na] at clojure.core.protocols$fn__6738.invoke(protocols.clj:75) ~[clojure-1.8.0.jar:na] at clojure.core.protocols$fn__6684$G__6679__6697.invoke(protocols.clj:13) ~[clojure-1.8.0.jar:na] at clojure.core$reduce.invokeStatic(core.clj:6545) ~[clojure-1.8.0.jar:na] at clojure.core$into.invokeStatic(core.clj:6610) ~[clojure-1.8.0.jar:na] at clojure.core$into.invoke(core.clj:6604) ~[clojure-1.8.0.jar:na] at jepsen.checker$compose$reify__3952.check(checker.clj:88) ~[jepsen-0.1.9-SNAPSHOT.jar:na] at jepsen.checker$check_safe.invokeStatic(checker.clj:71) [jepsen-0.1.9-SNAPSHOT.jar:na] at jepsen.checker$check_safe.invoke(checker.clj:64) [jepsen-0.1.9-SNAPSHOT.jar:na] at jepsen.independent$checker$reify__2302$fn__2303.invoke(independent.clj:272) [na:na] at dom_top.core$bounded_pmap$launcher__218$fn__219.invoke(core.clj:129) [jepsen-0.1.9-SNAPSHOT.jar:na] at clojure.core$binding_conveyor_fn$fn__4676.invoke(core.clj:1938) [clojure-1.8.0.jar:na] at clojure.lang.AFn.call(AFn.java:18) [clojure-1.8.0.jar:na] at java.util.concurrent.FutureTask.run(FutureTask.java:266) [na:1.8.0_181] at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_181] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_181] at java.lang.Thread.run(Thread.java:748) [na:1.8.0_181] Caused by: java.lang.AssertionError: Assert failed: Process completed an operation without a prior invocation: {:type :ok, :f :read, :value nil, :process 9, :time 682238600, :index 10} i at knossos.history$complete_fold_op.invokeStatic(history.clj:187) ~[knossos-0.3.3.jar:na] at knossos.history$complete_fold_op.invoke(history.clj:160) ~[knossos-0.3.3.jar:na] at clojure.lang.ArrayChunk.reduce(ArrayChunk.java:58) ~[clojure-1.8.0.jar:na] at clojure.core.protocols$fn__6750.invokeStatic(protocols.clj:136) ~[clojure-1.8.0.jar:na] at clojure.core.protocols$fn__6750.invoke(protocols.clj:124) ~[clojure-1.8.0.jar:na] at clojure.core.protocols$fn__6710$G__6705__6719.invoke(protocols.clj:19) ~[clojure-1.8.0.jar:na] at clojure.core.protocols$seq_reduce.invokeStatic(protocols.clj:31) ~[clojure-1.8.0.jar:na] at clojure.core.protocols$fn__6738.invokeStatic(protocols.clj:75) ~[clojure-1.8.0.jar:na] at clojure.core.protocols$fn__6738.invoke(protocols.clj:75) ~[clojure-1.8.0.jar:na] at clojure.core.protocols$fn__6684$G__6679__6697.invoke(protocols.clj:13) ~[clojure-1.8.0.jar:na] at clojure.core$reduce.invokeStatic(core.clj:6545) ~[clojure-1.8.0.jar:na] at clojure.core$reduce.invoke(core.clj:6527) ~[clojure-1.8.0.jar:na] at knossos.history$complete.invokeStatic(history.clj:241) ~[knossos-0.3.3.jar:na] at knossos.history$complete.invoke(history.clj:221) ~[knossos-0.3.3.jar:na] at jepsen.checker.timeline$html$reify__2386.check(timeline.clj:163) ~[na:na] at jepsen.checker$check_safe.invokeStatic(checker.clj:71) [jepsen-0.1.9-SNAPSHOT.jar:na] at jepsen.checker$check_safe.invoke(checker.clj:64) [jepsen-0.1.9-SNAPSHOT.jar:na] at jepsen.checker$compose$reify__3952$fn__3954.invoke(checker.clj:87) ~[jepsen-0.1.9-SNAPSHOT.jar:na] at clojure.core$pmap$fn__6970$fn__6971.invoke(core.clj:6736) ~[clojure-1.8.0.jar:na] ... 6 common frames omitted

I also am attaching the full log of the running

Thanks and hope to hear from you soon!
-- Itamar Arjuan

@aphyr
Copy link
Collaborator

aphyr commented Nov 7, 2018 via email

@itamararjuan
Copy link
Author

Hi @aphyr , I recorded a smaller running, and staggered the requests a bit less dense
and wound up with 2 such history assertion fails. but here's the thing,
the process that knossos claimed was providing :ok op without an :invoke was perfect when following the log line by line.

Each :invoke had a corresponding :ok or :fail and still it said the same thing.
Also something here is quite bizarre in my eyes since the :invoke op is not something that my code adds into knossos , it's added by your code and is implicit to me.

The only part which is up to the code that is written into the test is the reify implementation of the client 's invoke! method which is responsible for the response part of the history record.

So I can't really think of a way that my code would of generated such a scenario where the implicit insertion to history of the :invoke part would be missing from the process.

Regardless of this: following the claimed process's history came up with it being perfectly symetric in terms of the ratio between invokes and ok(s)/fail(s)

Is there any reason in your eyes as to why knossoswould flag the history analysis as valid after these violations? which is also a weird part of this whole situation.

Thanks for feedback!
-- Itamar

@itamararjuan
Copy link
Author

For example, this is what the history.edn file includes at the complaint-about index:

{:type :invoke, :f :read, :value [1 nil], :process 1, :time 47689039400, :index 622} {:type :ok, :f :read, :value 0, :process 1, :time 47716834600, :index 623}

while the error that the checker throw is the following one:

INFO [2018-11-07 12:23:54,831] main - jepsen.core {:perf {:latency-graph {:valid? true}, :rate-graph {:valid? true}, :valid? true}, :indep {:valid? :unknown, :results {0 {:valid? :unknown, :error "java.util.concurrent.ExecutionException: java.lang.AssertionError: Assert failed: Process completed an operation without a prior invocation: {:type :ok, :f :read, :value 0, :process 1, :time 47716834600, :index 623}\ni\n at java.util.concurrent.FutureTask.report (FutureTask.java:122)\n java.util.concurrent.FutureTask.get (FutureTask.java:192)\n clojure.core$deref_future.invokeStatic (core.clj:2208)\n clojure.core$future_call$reify__6962.deref (core.clj:6688)\n clojure.core$deref.invokeStatic (core.clj:2228)\n clojure.core$deref.invoke (core.clj:2214)\n clojure.core$map$fn__4785.invoke (core.clj:2646)\n clojure.lang.LazySeq.sval (LazySeq.java:40)\n clojure.lang.LazySeq.seq (LazySeq.java:56)\n clojure.lang.RT.seq (RT.java:521)\n clojure.core$seq__4357.invokeStatic (core.clj:137)\n clojure.core.protocols$seq_reduce.invokeStatic (protocols.clj:24)\n clojure.core.protocols$fn__6738.invokeStatic (protocols.clj:75)\n clojure.core.protocols/fn (protocols.clj:75)\n clojure.core.protocols$fn__6684$G__6679__6697.invoke (protocols.clj:13)\n clojure.core$reduce.invokeStatic (core.clj:6545)\n clojure.core$into.invokeStatic (core.clj:6610)\n clojure.core$into.invoke (core.clj:6604)\n jepsen.checker$compose$reify__3952.check (checker.clj:88)\n jepsen.checker$check_safe.invokeStatic (checker.clj:71)\n jepsen.checker$check_safe.invoke (checker.clj:64)\n jepsen.independent$checker$reify__2302$fn__2303.invoke (independent.clj:272)\n dom_top.core$bounded_pmap$launcher__218$fn__219.invoke (core.clj:129)\n clojure.core$binding_conveyor_fn$fn__4676.invoke (core.clj:1938)\n clojure.lang.AFn.call (AFn.java:18)\n java.util.concurrent.FutureTask.run (FutureTask.java:266)\n java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1149)\n java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624)\n java.lang.Thread.run (Thread.java:748)\nCaused by: java.lang.AssertionError: Assert failed: Process completed an operation without a prior invocation: {:type :ok, :f :read, :value 0, :process 1, :time 47716834600, :index 623}\ni\n at knossos.history$complete_fold_op.invokeStatic (history.clj:187)\n knossos.history$complete_fold_op.invoke (history.clj:160)\n clojure.lang.ArrayChunk.reduce (ArrayChunk.java:63)\n clojure.core.protocols$fn__6750.invokeStatic (protocols.clj:136)\n clojure.core.protocols/fn (protocols.clj:124)\n clojure.core.protocols$fn__6710$G__6705__6719.invoke (protocols.clj:19)\n clojure.core.protocols$seq_reduce.invokeStatic (protocols.clj:31)\n clojure.core.protocols$fn__6738.invokeStatic (protocols.clj:75)\n clojure.core.protocols/fn (protocols.clj:75)\n clojure.core.protocols$fn__6684$G__6679__6697.invoke (protocols.clj:13)\n clojure.core$reduce.invokeStatic (core.clj:6545)\n clojure.core$reduce.invoke (core.clj:6527)\n knossos.history$complete.invokeStatic (history.clj:241)\n knossos.history$complete.invoke (history.clj:221)\n jepsen.checker.timeline$html$reify__2386.check (timeline.clj:163)\n jepsen.checker$check_safe.invokeStatic (checker.clj:71)\n jepsen.checker$check_safe.invoke (checker.clj:64)\n jepsen.checker$compose$reify__3952$fn__3954.invoke (checker.clj:87)\n clojure.core$pmap$fn__6970$fn__6971.invoke (core.clj:6736)\n clojure.core$binding_conveyor_fn$fn__4676.invoke (core.clj:1938)\n clojure.lang.AFn.call (AFn.java:18)\n java.util.concurrent.FutureTask.run (FutureTask.java:266)\n java.util.concurrent.ThreadPoolExecutor.runWorker (ThreadPoolExecutor.java:1149)\n java.util.concurrent.ThreadPoolExecutor$Worker.run (ThreadPoolExecutor.java:624)\n java.lang.Thread.run (Thread.java:748)\n"},

@itamararjuan
Copy link
Author

it appears that the expected history value is 1 while 0 was returned and that is what bothers knossos rather than the :invoke missing... or am I missing something ? :-)

-- Itamar

@aphyr
Copy link
Collaborator

aphyr commented Nov 7, 2018 via email

@aphyr
Copy link
Collaborator

aphyr commented Nov 7, 2018

Hmmm. This still feels like a bug in knossos, but now that I'm not on my phone and can read properly, I notice that your invoke and complete ops have different shapes--the value for the invoke is [1 nil], but the returned value is 0. That feels like you might have a client that isn't constructing appropriate responses--like maybe it's taking [key value] pairs and forgetting to construct [key value] completions, instead just returning values. I don't thiiink that should break this particular knossos codepath, though.

@itamararjuan
Copy link
Author

itamararjuan commented Nov 8, 2018

Indeed my client invoke! method returns the register's value as is after casting to int of course (I based my test on your etcd example from scratch)

I wasn't aware of the fact I need to return an array or anything else
So I guess I'll have a look at what your etcd Clojure client code returns in the original test to see I'm aligned with that.

My client code looks like so

`(defn client
"A client for a single compare-and-set register"
[conn]
(reify client/Client
(open! [_ test node]
(client node))

(invoke! [this test op]
  (case (:f op)
    :read (assoc op :type :ok, :value (gamma-cli-read-singular conn))
    :write (do (gamma-cli-write-singular-value (:value op) conn (get op :process))
               (assoc op :type, :ok))

    :cas (try+
          (let [[old new] (:value op)]
            (assoc op :type (if (gamma-cli-singular-cas old new conn (get op :process))
                              :ok
                              :fail)))
          (catch [:errorCode 100] ex
            (assoc op :type :fail, :error :not-found)))))

; If our connection were stateful, we'd close it here.
(close! [_ _])

(setup! [_ _])
(teardown! [_ _])))`

@craigpastro
Copy link

craigpastro commented Jun 5, 2019

Hi @aphyr and @itamararjuan. I am having the exact same problem and was wondering if what I have helps. I get the following exception

java.util.concurrent.ExecutionException: java.lang.AssertionError: Assert failed: Process completed an operation without a prior invocation: #knossos.op.Op{:process 6, :type :ok, :f :read, :value nil, :index 2, :time 3420418100}

corresponding to this clipped history

:type :invoke, :f :read, :value [6 nil], :process 6, :time 3159622400, :index 0}
{:type :invoke, :f :read, :value [3 nil], :process 3, :time 3168123700, :index 1}
{:type :ok, :f :read, :value nil, :process 6, :time 3420418100, :index 2}
{:type :ok, :f :read, :value nil, :process 3, :time 3420418100, :index 3}
...

My read function is

(defn r [_ _] {:type :invoke, :f :read, :value nil})

and in my client if I try to read a key which doesn't exist I do a (assoc op :type :ok, :value nil) which seems to be the cause of the exception. If I turn them into (assoc op :type :fail, :error :nil) or something, then everything seems to be okay.

I felt that (assoc op :type :ok, :value nil) should be okay based on the Jepsen tutorial, but should I be turning these into fail type?

Thanks for any help!

Oh, by the way, I am using the following checker:

:checker    (checker/linearizable {:model     (model/cas-register)
                                   :algorithm :linear})

@aphyr
Copy link
Collaborator

aphyr commented Jun 5, 2019

Hi @siyopao. This error message means that process 6's completion didn't have a matching prior invocation. If you look at process 6's operations in that history, they are:

{:type :invoke, :f :read, :value [6 nil], :process 6, :time 3159622400, :index 0}
{:type :ok, :f :read, :value nil, :process 6, :time 3420418100, :index 2}

In this history, process 6 invoked a read with value [6 nil], and completed that read with value nil. Those values don't match, which is why Knossos is confused. You haven't included your generator or client here, so I can't say for sure, but I suspect you may be using an independent generator, which emits operations of the form [key value], but you haven't updated your client or checker to return and analyze [key value] tuples yet.

@craigpastro
Copy link

craigpastro commented Jun 5, 2019

Hello @aphyr! Thank you for the super fast reply. Well now I feel silly. That is exactly what I missed doing in my client. Thank you for your help!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants