Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Terminate with signal in case of an assert #134

Merged
merged 1 commit into from
Jun 3, 2015
Merged

Terminate with signal in case of an assert #134

merged 1 commit into from
Jun 3, 2015

Conversation

akosthekiss
Copy link
Member

Automated debugging is easier if the process terminates with a
signal instead of a regular exit (code); call, since in this
latter case the cause of error cannot be automatically backtraced.

Care has been taken to ensure that the exit code after the signal
is the same as what we would get with exit.

JerryScript-DCO-1.0-Signed-off-by: Akos Kiss [email protected]

@akosthekiss akosthekiss added the enhancement An improvement label May 29, 2015
@egavrin egavrin added this to the Core ECMA features milestone May 29, 2015
@egavrin egavrin self-assigned this May 29, 2015
@zherczeg
Copy link
Member

zherczeg commented Jun 1, 2015

I had this problem with Jerry before. I suspect the reason of not using abort() is avoiding compiling issues on dev boards. Perhaps we could add an abort() to jerry libc, and call the normal abort on Linux.

@akosthekiss
Copy link
Member Author

Going for abort (); is another, almost identical option. __builtin_trap (); was another -- quicker -- way.

From the proposed patch's point of view, the abort-approach would need the change of 2 lines only: replacing __builtin_trap with abort and re-defining ERR_FAILED_INTERNAL_ASSERTION to 134 (to keep internal error codes and externally visible exit codes aligned -- at least on x86-64/linux). However, this would require some more work on jerry-libc side.

We could also have a 2-step approach:

  • go with the current PR first,
  • add abort () (with all implementations) to jerry-libc and adapt jerry-core (as described above) as second.
    IMHO

@ruben-ayrapetyan
Copy link
Contributor

@akiss77, @zherczeg, there is also a third option that seems to be more convenient for automatic testing systems, and maybe, for manual debugging too.


As far as I understand, by SIGILL we are causing core dump to extract backtrace later.

While the way with SIGILL is working and seems to be simple, there are some disadvantages, as using SIGILL:

  • can confuse, because SIGILL means 'execution of invalid instruction', while real termination cause is quite another;
  • doesn't provide a way to introduce and distinguish various abnormal termination cases (with different exit codes);
  • performs core dump only for the fixed termination cause 'failed assertion' and cannot be used to backtrace, for example, out of memory (in case, exit code, used for the cause, is another).

There is another way to perform core dump. Under Linux we could use gdb's gcore operation.
For example, upon reaching exit function with exit code, indicating an error, gdb is invoked using execve or something like this, with command line instructing it to perform gcore operation for the engine's process.

When using this way, we:

  • don't confuse termination cause with SIGILL;
  • can use any exit codes (and even, can perform core dumps without actually terminating the engine's process);
  • can enable or disable core dumps for any situations (probably, using an option like --enable-core-dump;
  • can replace core dumping with just attaching gdb to the engine's process without terminating (--attach-gdb-on-fail ?).

Furthermore, this way we can get backtrace without allocating much storage space for core dumps during automatic testing.

@zherczeg
Copy link
Member

zherczeg commented Jun 1, 2015

I feel we would just overcomplicate things. We need to capture abnormal program termination in testing environments, regardless how we fix this. So there is no simplification here. When we are debugging, gdb has been started, and creating another one would be overkill. And abort() throws SIGABRT, not SIGILL. That is for invalid instructions.

@ruben-ayrapetyan
Copy link
Contributor

@zherczeg, using abort upon failure seems to be much better option than raising SIGILL.

@ruben-ayrapetyan
Copy link
Contributor

When we are debugging, gdb has been started, and creating another one would be overkill.

This is just a secondary use case, while the main proposed use case was to self-attach gdb upon failure during automatic testing for dumping backtrace, core, or whatever we would add to the dumping script.

@akosthekiss
Copy link
Member Author

@ruben-ayrapetyan, tying gdb so tightly to jerry does not feel right to me. That would add a very strong dependency to the project. Also, usually, an automatic testing framework does start whatever debugger it wants from outside. Starting it up from inside of the tested app is normally unexpected.

Furthermore, the question/point (for me, at least) is not really which signal to raise just to raise one and not exit in a "normal" way. (As seen in all projects using assertions I had a chance to stumble upon.) SIGILL comes into play only because __builtin_trap is implemented that way in gcc 4.8.2 for x86-64/linux.

From the docs of gcc: "This function causes the program to exit abnormally. GCC implements this function by using a target-dependent mechanism (such as intentionally executing an illegal instruction) or by calling abort."

So, eventually, it may be the same as calling abort elsewhere.

In my view, the advantage of __builtin_trap is only that it's already there (being a builtin), while abort is still to be implemented in jerry-libc (AFAIK).

@seanshpark
Copy link
Contributor

Hope this fix has little or no problem with embed RTOS.

@akosthekiss
Copy link
Member Author

@seanshpark, I'm not too familiar with embedded RTOSes so I have to rely on existing examples: iotjs also uses assert (), from assert.h, which eventually translates to an abort () call (on my box). The above discussed approaches (calling __builtin_trap () or abort () directly) are not exactly the same but may be close enough - unless the definition of assert () is significantly different on the RTOS.

@seanshpark
Copy link
Contributor

@akiss77 , thank you for your kind explanation. :)

@akosthekiss
Copy link
Member Author

Updated patch to use abort instead of __builtin_trap. Works as soon as #141 gets landed.

@@ -50,7 +50,7 @@ typedef enum
ERR_SYSCALL = 11,
ERR_PARSER = 12,
ERR_UNIMPLEMENTED_CASE = 118,
ERR_FAILED_INTERNAL_ASSERTION = 120
ERR_FAILED_INTERNAL_ASSERTION = 134
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In case of abort usage we don't need this change anymore.

@egavrin
Copy link
Contributor

egavrin commented Jun 2, 2015

Blocked by #141.

@akosthekiss
Copy link
Member Author

ERR_FAILED_INTERNAL_ASSERTION can be left untouched, of course. This will cause to have an internal error code of 120 and an externally visible exit code of 134, however. (On x86-64/linux, at least.) Is that OK?

@ruben-ayrapetyan
Copy link
Contributor

Using ERR_FAILED_INTERNAL_ASSERTION as an exceptional case, connected to SIGABRT, could lead to some difficulties in future. For example, if we would need core dumps for ERR_OUT_OF_MEMORY cases, it would be necessary to change its internal exit code to 134 too, but maybe some testing scripts would already use current value by that time, so they would need to be updated too, and also we wouldn't have a way to distinguish between the two failure types.

We could add a debug option like --abort-on-failthat would cause engine to perform abort upon exit with non-zero exit code (maybe, except ERR_SYSCALL?), and update condition

if (code == ERR_FAILED_INTERNAL_ASSERTION)

with the following:

if (code != 0
    && jrt_is_abort_on_failure ())
{
  abort ();
}
else
{
  exit (code);
}

In the case, we would expect exit code, corresponding to SIGABRT, if the option is passed in debug version (maybe, in release too?), and expect our internal exit codes otherwise.

We can leave abort for assertion failures in the pull request, and update the implementation in an upcoming enhancement.

@akosthekiss
Copy link
Member Author

Updated the patch according to the latest comments from @egavrin and @ruben-ayrapetyan . The introduction of an --abort-on-fail option seems to be a sensible next step. Thanks for the review.

@egavrin
Copy link
Contributor

egavrin commented Jun 3, 2015

@akiss77 Great! make push

@akosthekiss
Copy link
Member Author

Got make push rejected twice in a row because of parallel works on the repo. Now waiting a bit for the dust to settle. Will try again afterwards.

Automated debugging is easier if the process terminates with a
signal instead of a regular `exit (code);` call, since in this
latter case the cause of error cannot be automatically backtraced.

JerryScript-DCO-1.0-Signed-off-by: Akos Kiss [email protected]
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement An improvement
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants