TM: recorder doesn't handle errors that also cause an abort

RESOLVED FIXED

Status

()

RESOLVED FIXED
9 years ago
9 years ago

People

(Reporter: gal, Assigned: luke)

Tracking

Trunk
x86
Mac OS X
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(2 attachments, 3 obsolete attachments)

(Reporter)

Description

9 years ago
Luke's type system seems to not handle the case where an abort happens and within the same deep bail an error occurs. If we report the abort back, the error is swallowed. If we report the error, various recorder pieces are touched and cause badness.

Note: someone please steal this bug. Luke?
(Reporter)

Comment 1

9 years ago
Proposed patch:

--- a/js/src/jstracer.cpp
+++ b/js/src/jstracer.cpp
@@ -7120,23 +7120,25 @@ TraceRecorder::monitorRecording(JSOp op)
     }
     if (status == ARECORD_ABORTED) {
         JS_ASSERT(!localtm.recorder);
         return ARECORD_ABORTED;
     }
 
   stop_recording:
     /* Handle lazy abort / OOM. */
-    if (outOfMemory() || OverfullJITCache(&localtm)) {
+    if ((localtm.recorder && outOfMemory()) ||
+        OverfullJITCache(&localtm)) {
         ResetJIT(cx, FR_OOM);
         return ARECORD_ABORTED;
     }
     if (StatusAbortsRecording(status)) {
-        AbortRecording(cx, js_CodeName[op]);
-        return ARECORD_ABORTED;
+        if (localtm.recorder)
+            AbortRecording(cx, js_CodeName[op]);
+        return (status == ARECORD_ERROR) ? ARECORD_ERROR : ARECORD_ABORTED;
     }
 
     return status;
 }
 
 JS_REQUIRES_STACK void
 AbortRecording(JSContext* cx, const char* reason)
 {
(Reporter)

Comment 2

9 years ago
Here is a slightly improved version:

diff --git a/js/src/jstracer.cpp b/js/src/jstracer.cpp
--- a/js/src/jstracer.cpp
+++ b/js/src/jstracer.cpp
@@ -7120,23 +7120,25 @@ TraceRecorder::monitorRecording(JSOp op)
     }
     if (status == ARECORD_ABORTED) {
         JS_ASSERT(!localtm.recorder);
         return ARECORD_ABORTED;
     }
 
   stop_recording:
     /* Handle lazy abort / OOM. */
-    if (outOfMemory() || OverfullJITCache(&localtm)) {
+    if ((localtm.recorder && localtm.recorder->outOfMemory()) ||
+        OverfullJITCache(&localtm)) {
         ResetJIT(cx, FR_OOM);
         return ARECORD_ABORTED;
     }
     if (StatusAbortsRecording(status)) {
-        AbortRecording(cx, js_CodeName[op]);
-        return ARECORD_ABORTED;
+        if (localtm.recorder)
+            AbortRecording(localcx, js_CodeName[op]);
+        return (status == ARECORD_ERROR) ? ARECORD_ERROR : ARECORD_ABORTED;
     }
 
     return status;
 }
 
 JS_REQUIRES_STACK void
 AbortRecording(JSContext* cx, const char* reason)
 {
(Reporter)

Updated

9 years ago
Blocks: 558754
(Reporter)

Comment 3

9 years ago
I manually checked all code paths and I think all current paths are safe, but this is pretty severe and should be fixed right away (it crashes my iterator patch that does add a path that does abort + report and error at the same time).
(Assignee)

Comment 4

9 years ago
Eech, good find.

IIUC, in addition to changes you suggest above, we'd also need to soup up the ExecuteTree call in attemptTreeCall to possibly return ARECORD_ERROR instead of always returning ARECORD_ABORTED.  Since [A]RECORD_ERROR means "there was an error, but the recorder was not aborted", I'm inclined to add ARECORD_ABORTED_WITH_ERROR.  Does that sound right?
Assignee: general → lw
(Assignee)

Comment 5

9 years ago
So in investigating our natives-returning-error-handling, I found another, highly unlikely bug: although we check the return value of natives to see whether to bail or not, once we return to the interpreter, we only check cx->throwing.  Thus, if a native returns false without setting cx->throwing, the false is lost.  But to return 0 without setting cx->throwing is to quit, and so this would have to be a native that quits, but only some of the time.  Do such natives exist?  It seems like no.
The only case I can think of is the old branch callback, which would trigger an uncatchable failure.  Otherwise, false-without-error-report is a bug in the native, but we should terminate the script rather than continuing, I think!
(Assignee)

Comment 7

9 years ago
Created attachment 440620 [details] [diff] [review]
patch

If I've understood all the cases correctly, this patch catches the abort/error case and also cleans up the monitorRecording logic so its easier to understand.  
I didn't end up needing any extra ARECORD_ flag since ARECORD_ERROR, returned from monitorRecording, already means "the recorder was aborted (somehow), goto error".  The patch passes trace and ref tests.  Andreas: does it fix the crash you were hitting?
(Reporter)

Comment 8

9 years ago
Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x0000000000000000
0x0000000100146d57 in JS_Assert (s=0x1001ec188 "status == ARECORD_COMPLETED || status == ARECORD_ABORTED", file=0x1001e59d5 "../jstracer.cpp", ln=7138) at ../jsutil.cpp:76
76	    *((int *) NULL) = 0;  /* To continue from here in GDB: "return" then "continue". */
(gdb) bt
#0  0x0000000100146d57 in JS_Assert (s=0x1001ec188 "status == ARECORD_COMPLETED || status == ARECORD_ABORTED", file=0x1001e59d5 "../jstracer.cpp", ln=7138) at ../jsutil.cpp:76
#1  0x0000000100199110 in js::TraceRecorder::monitorRecording (this=0x100415070, op=JSOP_NEXTITER) at ../jstracer.cpp:7138
#2  0x000000010007eaa5 in js_Interpret (cx=0x10088b400) at jsops.cpp:78
#3  0x00000001000a7f8e in js_Execute () at jsinterp.cpp:1103
#4  0x0000000100011e01 in JS_ExecuteScript (cx=0x10088b400, obj=0x101002000, script=0x100414ce0, rval=0x0) at ../jsapi.cpp:4804
#5  0x0000000100009f9c in Process (cx=0x10088b400, obj=0x101002000, filename=0x7fff5fbffa68 "x.js", forceTTY=0) at ../../shell/js.cpp:449
#6  0x000000010000abdc in ProcessArgs (cx=0x10088b400, obj=0x101002000, argv=0x7fff5fbff910, argc=2) at ../../shell/js.cpp:863
#7  0x000000010000ad57 in main (argc=2, argv=0x7fff5fbff910, envp=0x7fff5fbff928) at ../../shell/js.cpp:5045
(gdb)
(Assignee)

Comment 9

9 years ago
Created attachment 440672 [details] [diff] [review]
bigger

Brendan explained that, on OOM, functions fail without setting cx->throwing, so there is a common case where testing cx->throwing isn't sufficient.  Second, as Andreas's new patch demonstrates, many of these RETURN_ERROR situations can reenter and thus delete the current TraceRecorder.  This means:
 1) catch errors using cx->builtinStatus in ExecuteTree, and propagate this up to the interpreter
 2) turn all RECORD_ERRORs into ARECORD_ERRORs (and all the functions that return those from RecordingStatus to AbortableRecordingStatus)

(2) is a lot of boring syntax.  (1) requires more careful change since errors can propagate up from ExecuteTree through several paths.  The patch passes trace/ref tests, not quite ready though.
Attachment #440620 - Attachment is obsolete: true
(Assignee)

Comment 10

9 years ago
Created attachment 440826 [details] [diff] [review]
better

The last patch went a bit overboard converting RecordingStatus to AbortableRecording status; this one undoes the unnecessary parts.  I also came across a few cases where we weren't checking for deleted TraceRecorder and few others where we were returning ABORTED instead of ERROR.

Going to build a browser and run some tests first.  Andreas: does this patch work with yours?
Attachment #440672 - Attachment is obsolete: true
(Reporter)

Comment 11

9 years ago
This patch utterly conflicts with mine. One of us is going to wade through knee deep rebasing hell.
(Assignee)

Comment 12

9 years ago
Created attachment 440912 [details] [diff] [review]
minus silly assertion

xpcshell tests found a silly assertion I added.  Surprisingly we don't have any trace-tests that deep abort while recording, so I added one.

(In reply to comment #11)
> This patch utterly conflicts with mine. One of us is going to wade through knee
> deep rebasing hell.

Psh, just a flesh wound.  I'm happy to rebase my patch on top of yours.
Attachment #440826 - Attachment is obsolete: true
Attachment #440912 - Flags: review?(gal)
(Reporter)

Comment 13

9 years ago
I don't have review yet, and I need your patch to not crash since my patch tickles this bug, so its going to be my pound o flesh.
(Reporter)

Comment 14

9 years ago
Comment on attachment 440912 [details] [diff] [review]
minus silly assertion

Looks good. I suggest some fuzzing.
Attachment #440912 - Flags: review?(gal) → review+
(In reply to comment #14)
> (From update of attachment 440912 [details] [diff] [review])
> Looks good. I suggest some fuzzing.

Please cc me if fuzzing is needed. :)

Could this patch please be rebased to TM tip?
(Reporter)

Comment 16

9 years ago
Created attachment 441334 [details] [diff] [review]
refreshed
(Assignee)

Comment 17

9 years ago
http://hg.mozilla.org/tracemonkey/rev/04b98e71bc55
Whiteboard: fixed-in-tracemonkey
Depends on: 563167

Comment 18

9 years ago
http://hg.mozilla.org/mozilla-central/rev/04b98e71bc55
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.