java process terminated with assertion v != nullptr #53

Open
gknuppe opened this issue Oct 25, 2017 · 8 comments

Comments

@gknuppe

gknuppe commented Oct 25, 2017

While running Tomcat with MDS, the java process terminated with the following message:

java: mds-core/include/core/core_context.h:556: mpgc::gc_ptr<mds::core::view> mds::core::iso_context::shadow(const mpgc::gc_ptr<mds::core::view>&): Assertion `v != nullptr' failed.

It was reproduced only once, and I could not get a stack trace yet.

@gknuppe
Author

gknuppe commented Oct 26, 2017

This now reproduces very often, and I was able to get the backtrace:

(gdb) t 105
[Switching to thread 105 (Thread 0x7fe0a99cb700 (LWP 22947))]
#0 0x00007fea63b300c1 in __pthread_kill (threadid=, signo=19) at ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:61
61 ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c: No such file or directory.
(gdb) bt
#0 0x00007fea63b300c1 in __pthread_kill (threadid=, signo=19) at ../nptl/sysdeps/unix/sysv/linux/pthread_kill.c:61
#1 0x00007fe0ccaf867e in mpgc::gc_handshake::hdl_abrt (sig=, siginfo=, context=) at /home/knuppe/github/labs/mds-gc/src/gc_handshake.cpp:203
#2
#3 0x00007fea6317f067 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#4 0x00007fea63180448 in __GI_abort () at abort.c:89
#5 0x00007fea63178266 in __assert_fail_base (fmt=0x7fea632b0f18 "%s%s%s:%u: %s%sAssertion `%s' failed.\n%n", assertion=assertion@entry=0x7fe0ccb124ae "v != nullptr",
file=file@entry=0x7fe0ccb12aa0 "/home/knuppe/github/labs/mds-core/include/core/core_context.h", line=line@entry=556,
function=function@entry=0x7fe0ccb18600 <mds::core::iso_context::shadow(mpgc::gc_ptr<mds::core::view> const&)::__PRETTY_FUNCTION__> "mpgc::gc_ptr<mds::core::view> mds::core::iso_context::shadow(const mpgc::gc_ptr<mds::core::view>&)")
at assert.c:92
#6 0x00007fea63178312 in __GI___assert_fail (assertion=0x7fe0ccb124ae "v != nullptr", file=0x7fe0ccb12aa0 "/home/knuppe/github/labs/mds-core/include/core/core_context.h", line=556,
function=0x7fe0ccb18600 <mds::core::iso_context::shadow(mpgc::gc_ptr<mds::core::view> const&)::__PRETTY_FUNCTION__> "mpgc::gc_ptr<mds::core::view> mds::core::iso_context::shadow(const mpgc::gc_ptr<mds::core::view>&)")
at assert.c:101
#7 0x00007fe0cc8e934b in mds::core::iso_context::shadow (v=..., this=0x7fd97c660800) at /home/knuppe/github/labs/mds-core/include/core/core_context.h:556
#8 mds::core::iso_context::shadowed (v=...) at /home/knuppe/github/labs/mds-core/include/core/core_context.h:564
#9 0x00007fe0cc90301c in mds::core::name_space::read<(mds::core::kind)12> (v=..., name=..., this=0x0) at /home/knuppe/github/labs/mds-core/include/core/core_naming.h:276
#10 mds::api::namespace_handle_cp::_lookup<(mds::core::kind)12, false> (v=..., name=..., this=0x7fe0a99c9c58) at /home/knuppe/github/labs/mds-core/include/mds_core_api.h:1939
#11 mds::api::namespace_handle_cp::lookup<false, false> (rt=..., name=..., this=0x7fe0a99c9c58) at /home/knuppe/github/labs/mds-core/include/mds_core_api.h:1951
#12 <lambda()>::operator() (__closure=0x7fe0a99c9cb0) at /home/knuppe/github/labs/mds-java-api/jni/src/RecordTypeProxy.cpp:180
#13 mds::jni::exception_handler_wr<Java_com_hpl_mds_impl_RecordTypeProxy_lookupHandle(JNIEnv*, jobject, jlong, jlong, jlong)::<lambda()> > (
func=<unknown type in /data/tomcat/mds-demo-2017/tomcat/apache-tomcat-8.0.33/shared/lib/libmds-jni.so, CU 0x417211, DIE 0x634837>, jEnv=0x7fe0c001c1e0) at /home/knuppe/github/labs/mds-java-api/jni/src/mds_jni.h:97
#14 Java_com_hpl_mds_impl_RecordTypeProxy_lookupHandle (jEnv=0x7fe0c001c1e0, hIndex=5, nsHIndex=0, nameHIndex=55) at /home/knuppe/github/labs/mds-java-api/jni/src/RecordTypeProxy.cpp:183

@EvanKirshenbaum
Collaborator

The clue here, I believe, is the nsHIndex=0. This means that we think that the namespace we're looking the name up in is null. But the Java RecordTypeProxy.lookupName() function (which, unless I'm mistaken, is the only thing that calls lookupHandle()) gets that zero by calling nsp.handleIndex(), which means that there's an actual Java object that holds the null. That shouldn't happen.

What I'd like you to do is to put an

assert(handle != 0)

in NamespaceProxy.NamespaceProxy(handle, path) in java_api/src/com/hpl/mds/impl/NamespaceProxy.java, at line 46. There's no path I can see that should trigger this, but it must be happening.

The only places that call the constructor are the construction of the root namespace, at line 35, and the construction of a child namespace at line 68. Neither of these should be able to return a null.
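
A minimal sketch of that guard, assuming a simplified stand-in for NamespaceProxy: the class name GuardedNamespaceProxy, its fields, and the use of a long for the handle are all hypothetical, and the real constructor in java_api does more than this. One thing worth keeping in mind is that a Java `assert` statement is skipped unless the JVM is started with `-ea`, so the sketch uses an explicit check that fires regardless of JVM flags:

```java
// Hypothetical stand-in for NamespaceProxy; only the guard is the point here.
final class GuardedNamespaceProxy {
    private final long handleIndex;
    private final String path;

    GuardedNamespaceProxy(long handleIndex, String path) {
        // A plain `assert handleIndex != 0;` is a no-op unless the JVM runs
        // with -ea, so fail unconditionally instead.
        if (handleIndex == 0) {
            throw new IllegalStateException("null namespace handle for path: " + path);
        }
        this.handleIndex = handleIndex;
        this.path = path;
    }

    long handleIndex() {
        return handleIndex;
    }

    String path() {
        return path;
    }
}
```

If the plain `assert` form is used instead, the Tomcat JVM would need `-ea` among its startup options for the check to run at all.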

@gknuppe
Author

gknuppe commented Oct 31, 2017

I added that assert, and so far I'm not seeing the crash anymore. I'll keep running it to see whether I can reproduce the problem with the assert in place.
A question here: do you think this could be a side effect of the workaround we implemented for #51?

@EvanKirshenbaum
Collaborator

I don't see how they could be related, but I don't know where you're getting your namespace handle from.

@gknuppe
Author

gknuppe commented Nov 1, 2017

@EvanKirshenbaum Assertion reached.
See the java stacktrace:
java stacktrace.txt

This occurred when looking up the sensorSchema of a given historySchema inside an isolated(Options.detachedSnapshot(), () -> {}) call.

Another important point: we have a HistorySchema that represents a given datetime. It has a property named sensors that is in fact a SensorsListSchema and holds a dynamic namespace whose name includes the timestamp. This namespace is used to look up a sensor ID in this history and not in the others. Note that this works fine without the detachedSnapshot() option.

@gknuppe
Author

gknuppe commented Nov 1, 2017

Trying to give you more context, here is a subset of the code:

@RecordSchema
public interface SensorSchema {
	ManagedString id();
	double value();

	Sensor next();
	Sensor prev();
}

@RecordSchema
public interface SensorsHistorySchema {
	SensorsList sensors();
}

@RecordSchema
public interface SensorsListSchema {
	@Private
	ManagedString SENSOR_NAMESPACE();

	Sensor first(); 
	Sensor last();

	static void SensorsList(SensorsList.Private self, long timestamp) {
		self.setSENSOR_NAMESPACE("/MDSDemo/SensorNamespace/" + timestamp);
		self.setFirst(null);
		self.setLast(null);
	}

	static void add(SensorsList.Private self, Sensor sensor) {
		try {
			isolated(() -> {
				if (self.getLast() == null) {
					self.setLast(sensor);
				}

				Sensor current = self.getFirst();

				sensor.setNext(current);
				sensor.setPrev(null);

				if (current != null) {
					current.setPrev(sensor);
				}

				self.setFirst(sensor);
				sensor.bindName(self.getNamespace(), sensor.getId());
			});
		} catch (FailedTransactionException e) {
			// ignore
		}
	}

	static Sensor getSensor(SensorsList.Private self, String sensorId) {
		return self.getNamespace().lookup(sensorId, Sensor.TYPE); // This is the lookup where the problem was found
	}

	@Private
	static Namespace getNamespace(SensorsList.Private self) {
		return Namespace.fromPath(self.getSENSOR_NAMESPACE().toString());
	}
}

After replacing the "lookup" with a "while loop" over the list, the code lost a lot of performance. The analytics took about 40 s with the "lookup" code, but about 2 minutes with the "while loop" code.
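
To illustrate why the two variants can differ so much, here is a rough sketch using plain Java collections rather than MDS types. SensorLookupSketch, Node, findByWalk and findByName are hypothetical names, and the HashMap only stands in for the namespace on the assumption that a namespace lookup behaves like a keyed lookup, which nothing in this thread confirms. The walk has to visit nodes one by one, while the keyed lookup does not depend on the node's position in the list:

```java
import java.util.HashMap;
import java.util.Map;

// Plain-Java illustration only; none of these types come from MDS.
final class SensorLookupSketch {
    static final class Node {
        final String id;
        final double value;
        Node next;

        Node(String id, double value) {
            this.id = id;
            this.value = value;
        }
    }

    private Node first;
    // Stand-in for the per-history namespace (assumed to act like a keyed map).
    private final Map<String, Node> byId = new HashMap<>();

    // Prepends the sensor to the list and binds it under its id.
    void add(String id, double value) {
        Node node = new Node(id, value);
        node.next = first;
        first = node;
        byId.put(id, node);
    }

    // "while loop" variant: walks the list until the id matches.
    Node findByWalk(String id) {
        Node current = first;
        while (current != null) {
            if (current.id.equals(id)) {
                return current;
            }
            current = current.next;
        }
        return null;
    }

    // "lookup" variant: a single keyed lookup.
    Node findByName(String id) {
        return byId.get(id);
    }
}
```

Both methods return the same node for a known id and null for an unknown one; only the amount of work per call differs.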

@EvanKirshenbaum
Collaborator

Okay, I (sort of) think I see what's happening. Looking at mds/java-api/jni/src/mds_jni.cpp, line 201, I see

catch (...)
{
  // throwUnknownEx (jEnv);
}

I'm not sure who commented out that line or why, and I won't be able to tell from just the external repository, but it looks as though an exception is being thrown that isn't one that's explicitly caught and translated into Java, so the exception is being ignored and the default value for the handle (in this case, zero) is being returned. I'd certainly recommend either reinstating the JNI throw by uncommenting the line or putting in a C++ assert.

Now, which exception is it? Namespaces throw two exceptions:

  • incompatible_type_ex is thrown when you're reading a value and the type you pass in isn't compatible or if you ask for a child namespace with a particular name, and there's a binding there to something other than a namespace, and
  • unbound_name_ex is thrown when you try to read the value of a name and it's unbound.

Unfortunately, both of those exceptions are caught in the try block in question.

Perhaps the best thing to do would be to comment out the entire catch (...) clause at line 199. That way, you'll get an uncaught exception, which if you're lucky will cause a core dump that will let you see where it was thrown from.
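
The effect of the empty catch-all can be modelled in a few lines. This is a Java analogue written for illustration, not the actual C++ wrapper in the JNI layer: HandleWrapperSketch, callSwallowing and callPropagating are hypothetical names, and RuntimeException stands in for whatever C++ exception is really being thrown.

```java
import java.util.function.LongSupplier;

// Java model of the behaviour described above; the real wrapper is C++.
final class HandleWrapperSketch {
    // Mirrors a wrapper whose catch-all is empty: the exception vanishes and
    // the caller receives the default handle value, 0.
    static long callSwallowing(LongSupplier body) {
        long handle = 0;
        try {
            handle = body.getAsLong();
        } catch (RuntimeException e) {
            // intentionally empty, like the commented-out throwUnknownEx(jEnv)
        }
        return handle;
    }

    // Mirrors the suggested fix: surface the unknown failure instead of hiding it.
    static long callPropagating(LongSupplier body) {
        try {
            return body.getAsLong();
        } catch (RuntimeException e) {
            throw new IllegalStateException("unknown failure in native lookup", e);
        }
    }

    public static void main(String[] args) {
        LongSupplier failing = () -> {
            throw new UnsupportedOperationException("stand-in for an uncaught core exception");
        };
        // The failure is invisible here: the caller just sees a null handle.
        System.out.println(callSwallowing(failing));
        try {
            callPropagating(failing);
        } catch (IllegalStateException e) {
            // With propagation, the original cause is still attached.
            System.out.println(e.getCause());
        }
    }
}
```

With the swallowing variant the caller cannot tell a failed lookup from a genuinely null handle, which matches the nsHIndex=0 seen in the backtrace.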

@EvanKirshenbaum
Collaborator

Looking at the core code, the following exceptions are caught in this try block (in exception_converter()):

  • unbound_name_ex
  • incompatible_type_ex
  • incompatible_record_type_ex
  • incompatible_superclass_ex
  • unmodifiable_recored_type_ex
  • read_only_context_ex
  • unpublishable_context_ex
  • unimplemented
  • thread_base_task_unset_ex

The following exceptions are thrown in the core but not caught there:

  • already_published
  • div_by_zero
  • guard_failure_ex
  • illegal_path_ex
  • incompatible_modify_op
  • no_prior_task_ex
  • uncertain_value_ex
  • unknown_modification_ex

The most likely is probably uncertain_value_ex, but that's going to be a bear to track down.


3 participants