SchemeEval run smoothly on x64 and i386 but it crashes on armv7-a #2944
Comments
Wow! Cool! I think I see what the problem is, and how to fix it, but it will require a bit of patience from you. About 6 months ago (circa November 2021) the AtomSpace was switched to use a C++ smart-pointer to hold references. This means that you can no longer use More details in the next comment. |
A C++ smart pointer is three words: one word is a pointer to the "actual object", plus an atomic, plus some other stuff - messy low-level CPU stuff. To use this in your jni code, you have to treat it exactly the same way that you treat So, in your
and then, in the destructor/finalizer, you would
I hope this makes sense and is clear; if not, let me know. If you haven't already written code for Handles, you will need to do something similar to the above. If you have working code for handles, then do exactly that. |
More pseudocode. To actually use the AtomSpace, your c++ code would look like this:
You don't have to make a copy of the smart pointer, but it will be easier to read the code if you do. |
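The pseudocode from the comments above was not preserved here, so the following is only a minimal sketch of one common way to carry a C++ smart pointer across a JNI boundary: heap-allocate a copy of the `std::shared_ptr`, hand its address to Java as a 64-bit integer, copy the smart pointer back out when it is needed, and delete the boxed copy in the finalizer. `FakeAtomSpace`, `Handle64`, `box`, `unbox` and `release` are hypothetical names used for illustration; they are not part of the AtomSpace or JNI APIs.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>
#include <string>

// Hypothetical stand-in for the real AtomSpace class.
struct FakeAtomSpace {
    std::string name;
};

using SpacePtr = std::shared_ptr<FakeAtomSpace>;

// Stand-in for JNI's jlong, which is a signed 64-bit integer.
using Handle64 = std::int64_t;

// Constructor side: box a heap-allocated copy of the smart pointer.
// The copy keeps the reference count above zero while Java holds the handle.
Handle64 box(const SpacePtr& sp) {
    SpacePtr* boxed = new SpacePtr(sp);
    return static_cast<Handle64>(reinterpret_cast<std::intptr_t>(boxed));
}

// Use side: copy the smart pointer out of the box.
// Working on a copy is optional, but it makes the calling code easier to read.
SpacePtr unbox(Handle64 h) {
    assert(h != 0);
    return *reinterpret_cast<SpacePtr*>(static_cast<std::intptr_t>(h));
}

// Destructor/finalizer side: drop the boxed copy, releasing one reference.
void release(Handle64 h) {
    delete reinterpret_cast<SpacePtr*>(static_cast<std::intptr_t>(h));
}
```

In a real JNI wrapper, `box` would be called from the native constructor, `unbox` at the start of each native method, and `release` from the native method backing the Java finalizer.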
I don't think that is the problem, because I used the same code as createAtomSpace() to create the AtomSpace. Please see the following excerpts. opencog/atomspace/AtomSpace.h
opencog/java/SPW.h
opencog/java/com_cogroid_atomspace_AtomSpace.cc
|
Shouldn't this:
be this:
That's because |
However, it does not seem that you actually call the above methods in your code, so that is not the source of the problem. So, some more comments follow, as I review things. Looking at your tombstone file, I see |
So what's in
The reason for this is that you are sending the entire file contents to scheme, in one big lump. If there are multiple scheme blocks in that file, a lot of output might be generated. You should wait for all of it to be processed, before moving to the next file.

There's another design issue, that might cause problems with guile-2.2 -- it does not like long strings. You are taking the contents of each file, and converting it to one big string. That string then is copied into C++, and then the C++ string is copied into a If you are trying to load genetic or protein data from agi-bio, these files can be 50 MBytes or 100 MBytes long, containing millions of scheme statements in them. I've noted in the past that if you send strings longer than a few MBytes to the guile-2.2 evaluator, it will run for a while, and then crash in weird ugly ways. I don't know if this is still a problem in guile-3.0 or not.

There are several solutions to this large-file problem, but first, I want to know how far things have gotten... are you able to run any scheme statements at all? Or does it work fine, and only crash much later, after you've already done lots of scheme evals? |
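One possible way to avoid handing the evaluator a single huge string is to cut the file contents into top-level forms and evaluate them one at a time. The sketch below is an assumption about how that could look, not one of the solutions the comment above has in mind. `split_top_level_forms` is a hypothetical helper; it balances parentheses, skips `;` line comments and string literals, and deliberately ignores Scheme character literals such as `#\(` and `#| ... |#` block comments.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Split Scheme source text into its top-level parenthesized forms.
// Limitations (by design, to keep the sketch short): character literals
// like #\( and block comments #| ... |# are not recognized.
std::vector<std::string> split_top_level_forms(const std::string& src) {
    std::vector<std::string> forms;
    int depth = 0;            // current parenthesis nesting depth
    bool in_string = false;   // inside a "..." literal?
    std::size_t start = 0;    // index of the '(' that opened the current form

    for (std::size_t i = 0; i < src.size(); ++i) {
        const char c = src[i];
        if (in_string) {
            if (c == '\\') ++i;                 // skip the escaped character
            else if (c == '"') in_string = false;
        } else if (c == '"') {
            in_string = true;
        } else if (c == ';') {
            // Line comment: skip ahead to the end of the line.
            while (i < src.size() && src[i] != '\n') ++i;
        } else if (c == '(') {
            if (depth == 0) start = i;
            ++depth;
        } else if (c == ')' && depth > 0) {
            --depth;
            if (depth == 0) forms.push_back(src.substr(start, i - start + 1));
        }
    }
    return forms;
}
```

Each returned string could then be passed to the evaluator separately, waiting for its output before sending the next one.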
It failed at the following statement:
|
The tombstone file shows that guile is crashing somewhere inside of gc, and that it has crashed because it tried to touch something about 400 bytes past the end of a 128KByte sized block (at For your test files in
You could try that in one file, or even better: four files, one line each. Does that work? |
whoops we posted at the same time. One moment, let me review... |
Are you sure? That does not match the tombstone file. (I should have looked at the tombstone file more carefully; I would have been able to answer my own questions.) The tombstone file shows a crash in this code:
which means that it already crashed in your java wrapper So the questions are:
|
I need to eat dinner now. What timezone are you in? Perhaps a chat on discord might be faster? |
I created a version with debug printfs that surround the area that crashes in the tombstone. You can get it from here: https://github.com/linas/atomspace/tree/android-dbg
Then capture the prints to stdout and post those. BTW, there is also a completely different possibility, too. Can you get a guile shell? If so, then the following should work:
|
I don't know how to capture stdout and stderr in an Android app. Can you write the logs to the /storage/emulated/0/Download/datomspace-test.txt file? |
I am thinking that, in case this issue cannot be fixed, I can write a simulated SchemeEval class which converts Scheme code to JavaScript using scheme2js, then runs the JavaScript code in Atomize - JavaScript Sandbox for AtomSpace. |
Atomize - JavaScript Sandbox for AtomSpace can now convert Scheme code to JavaScript using scheme2js and run it. If this issue cannot be fixed, I will write a SchemeEval class which contains a flag that selects simulation. If simulating is true, it will convert the Scheme code to JavaScript and then run it. If simulating is false, it will call the SchemeEval class of AtomSpace.
|
I'm fairly certain the issue is in some code that we control, and thus can be fixed.
This won't work: there is some heavy/complicated interfacing to call c++ code from scheme (and vice-versa).
How do I install?
I'll do that in just a few minutes. |
Done. Edit |
|
I think it will work. The process is javascript <--> java <--> jni C code <--> AtomSpace classes. (ConceptNode "dream") will be converted to ConceptNode("dream"). So I have to add the function definition "function ConceptNode(name) { }". I have to test and add all the function definitions which are available in the scheme files of AtomSpace. However, this process will be very slow. So it is better if this issue is fixed. |
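To make the conversion mentioned above concrete, here is a toy sketch of rewriting a Scheme call such as `(ConceptNode "dream")` into call syntax `ConceptNode("dream")`. It is not how scheme2js works, and `sexpr_to_call` and `skip_ws` are hypothetical names. It handles only nested calls, string literals and plain atoms; Scheme identifiers that are not valid JavaScript names (for example ones containing `-` or `!`) are copied through unchanged and would need separate handling.

```cpp
#include <cctype>
#include <cstddef>
#include <string>

// Advance pos past any whitespace.
static void skip_ws(const std::string& s, std::size_t& pos) {
    while (pos < s.size() && std::isspace(static_cast<unsigned char>(s[pos]))) ++pos;
}

// Convert the s-expression starting at s[pos] into call syntax,
// advancing pos past what was consumed.
std::string sexpr_to_call(const std::string& s, std::size_t& pos) {
    skip_ws(s, pos);
    if (pos >= s.size()) return "";

    if (s[pos] == '(') {
        ++pos;                                      // consume '('
        std::string head = sexpr_to_call(s, pos);   // the function position
        std::string args;
        for (;;) {
            skip_ws(s, pos);
            if (pos >= s.size()) break;             // unbalanced input: stop
            if (s[pos] == ')') { ++pos; break; }    // consume ')'
            if (!args.empty()) args += ", ";
            args += sexpr_to_call(s, pos);
        }
        return head + "(" + args + ")";
    }

    const std::size_t start = pos;
    if (s[pos] == '"') {
        // String literal: copy it verbatim, honoring backslash escapes.
        ++pos;
        while (pos < s.size() && s[pos] != '"') {
            if (s[pos] == '\\') ++pos;
            ++pos;
        }
        if (pos < s.size()) ++pos;                  // closing quote
        return s.substr(start, pos - start);
    }

    // Plain atom: read up to whitespace or a parenthesis.
    while (pos < s.size() && !std::isspace(static_cast<unsigned char>(s[pos])) &&
           s[pos] != '(' && s[pos] != ')') {
        ++pos;
    }
    return s.substr(start, pos - start);
}
```

The sketch covers only the surface syntax; the bindings between the generated calls and the AtomSpace classes are a separate matter.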
Naively, that is correct, but you are vastly underestimating the actual complexity. The code in SchemeEval.cc is big and bloated not because it's guile, but because there's dozens of needed functions in there. Plus also
OK I will try that, but I would like it much better if you did most of the work. Don't make me do things that give me headaches ... |
OK, I've installed. I now see a plain panel. At the top it says "dAtomSpace Tester" and then
Note that my logfile is in a different location from yours. |
Open the 'Download' folder, then view the datomspace-test.txt file for the results. |
After I compiled AtomSpace with your SchemeEval.cc and ran it, I got the following datomspace-test.txt file:
|
This is the tombstone file. |
And then what ... I guess it hangs after the
When I do it, and look at the text file, I don't see any of what you have above. Instead, I get a java stack trace that looks like it's due to my clicking on the "INSTALL & RUN" button, with the bad URL. It says something like |
That happens because the app does not have permission to write to the sdcard. You have to grant the permission to the app and run it again. |
There's no tombstone file in the
I tried other variants, but none worked. |
I looked at your tombstone file .. it looks a lot like your earlier one. The stack trace is more or less the same as before ... it's crashing in the GC. The printf's affirm the stack trace in the tombstone file: As before, the GC crashed trying to access some invalid address. This time, the address is different than before; this time, it's some unlabelled 4K page between some shared libs, not too far away from a libc_malloc pool. Close enough that it's plausible that the GC wanted to search the malloc pool, got confused, and wandered away from the pool to a nearby area.

This suggests that there's a bug with the GC. If this is happening only on arm, and you've tested on non-arm android, then ... (I'm confused .. did you test on non-arm android? or some non-arm java system? ) ... The easiest thing to try is to find some other version of the GC, ideally a newer version, and see if that works. Trying older versions is worthwhile, too. A little bit harder is to download the code for the gc, and compile it yourself (it's not hard) and see if that works. It's bdw-gc.

Can you explain what is happening on my phone? Did the atomspace initialize, or not? Did guile initialize, or not, on my phone? |
But it wrote to the txt file just fine! And the txt file is on the sdcard, in the Download directory. |
Please try the following URL: https://github.com/cogroid/d-atomize-bin/raw/main/samples/Tests.js. The URL points to a JavaScript file in JavaScript Sandbox format, which contains the following function:
|
I tested on a non-arm java system (64-bit and 32-bit java for linux). On those systems, it worked well. |
Without a log file, I don't know what happens on your phone. Can you do the following?
With the dAtomSpace Tester which you downloaded here, atomspace can be initialized but guile is not initialized because it is not called (if it is called, it will crash). |
I am building gc-8.0.6.tar.gz. When it completes, I will update the log file and the tombstone file. |
I built gc-8.0.6.tar.gz and the results were:
This is the tombstone file. |
I removed *.go from the cache folder of the guile package and it did not crash. But it runs for a very long time; maybe it is compiling the *.scm files. |
Ah! This is interesting! But now I am very confused -- I thought that only guile-3.0 used *.go files, and that you were using only guile-2.2 !?? Yes, the *.go files contain guile bytecode; yes, they are the result of compiling *.scm files. And yes, they take a very long time to compile - painfully long -- several minutes on my main desktop. However, the compile is done only once; the second and later startups should be fast.

Removing the *.go files will cause a recompile. Sometimes (but not always) changing the *.scm files will cause a recompile. On rare occasions, one can have stale *.go files that are broken, and exhibit crazy nonsense errors. This happens once or twice a year for me...

I have no idea if the *.go files are architecture-dependent. They might contain amd64 or i386 assembly in them. There might be differences in stack-growth direction, memory mapping, endian-ness. Differences in how scheme objects are laid out in memory. The safe solution is to remove the *.go files, compile them once, on arm, and then ship the new *.go files as a part of the apk. The compilation should happen automatically, but there is also a way to trigger it manually. I've never done a manual compile for the atomspace ... |
Huh apparently, guile-2.2 also creates *.go files. OK, I'd forgotten about that. I did find some go files in your |
Here's the full contents of the file (as of last night) Note that sometimes I get guile-snarf errors, and sometimes not. The first thing I did was to try to open an empty URL:
|
And then this with the URL you suggested:
|
And one more try:
|
OK, so on my phone:
I can see that inside of I have a terminal emulator installed on my phone, but the |
Your new tombstone is interesting: the stack trace is:
The This line looks suspicious or wrong:
So The good news is that guile itself looks OK. The bad news is that either |
Calling the SchemeEval class does not crash, but it runs forever without any output, although I have redirected stdout & stderr to files.
|
In this datomspace-tester.apk, I have removed the hard-coding and fixed the error with 'https' URLs. |
If you removed the *.go files from Starting the atomspace for the first time will cause the atomspace *.scm files to be compiled into *.go files. This might take 2 or 5 or 10 minutes, depending on your phone. It would be best if the apk included those go files (It would be best if the atomspace cmake did this automatically; I will look into it. #2945) |
Could you change
to
(in two places). This will write a file into "the current directory", whatever that is. I think it will then be the right directory for both you and I. (This is not urgent, however. I can live without it, I think...) |
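The exact lines being changed were not preserved here. As a general sketch of the idea, a stdio stream can be re-pointed at a file named by a relative path, which the C library resolves against the process's current directory. `redirect_stream` is a hypothetical helper name, and the file name passed to it is up to the caller.

```cpp
#include <cstdio>

// Re-point an already-open stdio stream (for example stdout or stderr)
// at a file. A relative path is resolved against the current directory.
// Returns false if the file could not be opened.
bool redirect_stream(std::FILE* stream, const char* relative_path) {
    // "a" appends, so earlier log output in the same file is kept.
    if (std::freopen(relative_path, "a", stream) == nullptr) return false;
    // Unbuffered, so that output already written is not lost if the
    // process crashes shortly afterwards.
    std::setvbuf(stream, nullptr, _IONBF, 0);
    return true;
}
```

For example, `redirect_stream(stdout, "atomspace-stdout.txt")` would send later `printf` output to a file of that (made-up) name in the current directory, provided that directory is writable.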
Can I change it as follows?
|
I installed this, and clicked on the new "test SchemeEval" button, and nothing else. I get this:
|
Yes, that will work. |
According to this page: https://www.gnu.org/software/guile/manual/html_node/Compilation.html the *.go files contain CPU-architecture-dependent code. There is a specific If/when I fix #2945 this might also present cross-compilation challenges. Not sure what to do about this ... |
I'm also thinking that the The above is just an educated guess, though. I could be wrong. |
I just launched a-jsb.com for running JavaScript in a sandbox with AtomSpace. |
Wow. Well, that is unexpected! It looks like the |
I compiled datomspace-tester.apk with more logs. The files are as follows:
I am stuck at the following error: At libguile/init.c, at scm_load_startup_files ()
At libguile/vm.c
It repeats "scm_call_n #26", then "scm_call_n #28", then "scm_call_n #26" again several times. After that, it stopped. |
Yeah, that's going to be a hard way to debug. Poking through that stuff is like .. debugging assembly code. And anyway, I doubt that is where the bug is. Based on several of your tombstone files, the garbage collector was accessing bad memory, and so the question is "why is it doing that?" So, some background:
When the GC runs, it searches for pointers in all of the stacks and in any malloced RAM it knows about. It is not supposed to search outside of these boundaries. Yet, clearly, this is happening: in the first tombstone, it accessed memory about 300 bytes away from valid RAM, and in the second tombstone, only about 8K away. These offsets are tiny: both are less than 16-bits away from a valid address. I mean, out of a giant 4GB address space, it didn't access some "random" address, it accessed something really close by. This less-than-16-bit mistake suggests to me that guile is using a 16-bit short for some offset. I am guessing that, due to architecture confusion, this offset is being added instead of subtracted. How could this happen? Here are my guesses:
The I asked the guile gurus about arm7 on IRC chat. They said it works fine on Android. They said "just install guix, you'll see" (guix is a guile linux distro.) So, here's how we can check this: A. Install a terminal emulator on the phone
The I could not do this myself, because running If the above does work, then try
If that works, then ??? |
I wrote the above before reading through your files. I'll read your files shortly. |
Did it hang, or did it crash? If it hangs, did you look at the cpu usage? Is the CPU usage 100% or 0% -- If it's 100%, then it is probably trying to compile If it's hung, but there is no CPU usage, then .. ugh. We'd have to use gdb. But first, please check everything I mentioned earlier. |
SchemeEval runs smoothly on x64 and i386, but it crashes on armv7-a.
Steps to reproduce bug
Download datomspace-tester.apk
Install datomspace-tester.apk (Do not run!)
Go to Settings -> Apps -> dAtomSpace Tester. Set Storage permission.
Run dAtomSpace Tester
The app runs for about 30 seconds, then it crashes. There is a tombstone file.
View the results in the datomspace-test.txt file in the Download folder
Source code
dAtomSpace Tester
dAtomSpace
SchemeEval.java
com_cogroid_atomspace_SchemeEval.h
com_cogroid_atomspace_SchemeEval.cc
Tester.java