Closed Bug 487134 Opened 15 years ago Closed 15 years ago

TM: Record all calls to native functions

Categories

(Core :: JavaScript Engine, defect)

Other Branch
defect
Not set
normal

Tracking

()

VERIFIED FIXED

People

(Reporter: jorendorff, Assigned: gal)

References

Details

(Keywords: verified1.9.1, Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 19 obsolete files)

72.19 KB, patch
Details | Diff | Splinter Review
Currently we bail out on constructors (except those with constructor-specific trcinfo) and slow natives.
Attached patch v1 (obsolete) — Splinter Review
Passes trace-tests. This probably needs a bit more testing and some mild tweaking for your patch since we touch the same parts. Who should go first?
Attachment #371415 - Flags: review?(jorendorff)
(feel free to steal and play with it since you are a few time zones ahead, otherwise I will do more testing tomorrow)
Flags: blocking1.9.1?
Attached patch v2 (obsolete) — Splinter Review
Array constructors were wrong
Attachment #371415 - Attachment is obsolete: true
Attachment #371415 - Flags: review?(jorendorff)
Halfway through.  Just so I don't forget, in this code:

 bool
-TraceRecorder::getClassPrototype(JSObject* ctor, LIns*& proto_ins)
+TraceRecorder::getClassPrototype(JSObject* ctor, JSObject** proto)
 {
     jsval pval;
 
     if (!OBJ_GET_PROPERTY(cx, ctor,
-                          ATOM_TO_JSID(cx->runtime->atomState
-                                       .classPrototypeAtom),
+                          ATOM_TO_JSID(cx->runtime->atomState.classPrototypeAtom),
                           &pval)) {
         ABORT_TRACE("error getting prototype from constructor");
     }
     if (JSVAL_TAG(pval) != JSVAL_OBJECT)
         ABORT_TRACE("got primitive prototype from constructor");
 #ifdef DEBUG
     JSBool ok, found;
     uintN attrs;
     ok = JS_GetPropertyAttributes(cx, ctor, js_class_prototype_str, &attrs, &found);
     JS_ASSERT(ok);
     JS_ASSERT(found);
     JS_ASSERT((~attrs & (JSPROP_READONLY | JSPROP_PERMANENT)) == 0);
 #endif
-    proto_ins = INS_CONSTPTR(JSVAL_TO_OBJECT(pval));
+    *proto = JSVAL_TO_OBJECT(pval);
     return true;
 }

While you're in there, maybe those aborts should be strengthened to asserts?  I think both should never ever happen.  If it's possible that ctor's .prototype property has been tampered with, it's just as likely that it will be tampered with *after* recording, in which case this code is invalid anyway...
Attached patch v3 (obsolete) — Splinter Review
Attachment #371525 - Attachment is obsolete: true
Attached patch v4 (obsolete) — Splinter Review
Attachment #371555 - Attachment is obsolete: true
Attached patch v5 (obsolete) — Splinter Review
Still crashes on make check when we trace across a non-constructor call of a slow native. Need help debugging. Putting this back in the pool for a bit. I want to pick up one of the crashers.
Attachment #371557 - Attachment is obsolete: true
Assignee: gal → general
I can take!
Assignee: general → jorendorff
Making good progress here.  Calling a JSNative has a bunch of special cases and fiddly bits that weren't being emulated.

I've broken testNewDate somehow.  Will get to it tomorrow morning.

Fixing bug 487240 along the way.
Assignee: jorendorff → general
The testNewDate thing was a trivial bug.  Now I'm passing trace-test and js/tests, but _tests/xpcshell/xpcom/unit/test_nsIProcess.js deadlocks. We are tracing a loop that we weren't tracing before:

  while (!_quit)
    thr.processNextEvent(true);

and some bug is causing us to enter this loop even though _quit has been set to `true`.  Investigating.
Attached patch interdiff v5 to v6 (obsolete) — Splinter Review
Assignee: general → jorendorff
Attached patch v6 (obsolete) — Splinter Review
Attachment #371577 - Attachment is obsolete: true
Here is the reduced testcase; comments to follow.


var _quit = false;

function other() {
    var bits = [""];
    for (var i = 0; i < bits.length; i++)
        ;
}

other();
other();
var file = Components.classes["@mozilla.org/file/directory_service;1"]
    .getService(Components.interfaces.nsIProperties)
    .get("CurWorkD", Components.interfaces.nsILocalFile);
file.append("TestQuickReturn");
var process = Components.classes["@mozilla.org/process/util;1"]
    .createInstance(Components.interfaces.nsIProcess2);
process.init(file);
process.runAsync([], 0, {
    observe: function(subject, topic, data) {
            other();
            _quit = true;
            dump("    DONE - _quit is: " + _quit + "\n");
        }
    });
var thr = Components.classes["@mozilla.org/thread-manager;1"]
    .getService().currentThread;
while (!_quit) {
    dump("    _quit is: " + _quit + "\n");
    thr.processNextEvent(true);
}
All three calls to other() are significant, so it must have something to do with deep-bailing and then entering an existing trace before the first _FAIL builtin has returned.

I think I see the bug:

  Enter trace 1
  Call native 1
      Reenter the interpreter, so deep-bail
      Enter trace 2
      Call native 2
          Deep-bail for whatever reason
      Return from native; leave trace 2 with STATUS_EXIT;
       erroneously set cx->builtinStatus to 0
  Return from native; should leave trace 1 but we don't,
   because cx->builtinStatus is 0!
Attached patch v7 (obsolete) — Splinter Review
This is not much different from v6, but rebased on top of the fix in bug 487676.
Attachment #371893 - Attachment is obsolete: true
Attachment #371894 - Attachment is obsolete: true
TEST-UNEXPECTED-FAIL | /Users/jason/dev/moz/tracemonkey/obj-ff-release/_tests/xpcshell/test_update/unit/test_0030_general.js | test failed, see following log:

Log is useless; it's a crash.

(gdb) bt
#0  0x004b0c8e in ?? ()
#1  0x00000001 in ?? ()
Previous frame inner to this frame (gdb could not unwind past this frame)
(gdb) info threads
  6 thread 0x3403  0x95578226 in semaphore_timedwait_signal_trap ()
  5 thread 0x3103  0x95578226 in semaphore_timedwait_signal_trap ()
  4 thread 0x2f03  0x95578226 in semaphore_timedwait_signal_trap ()
  3 thread 0x2b03  0x955af30a in select$DARWIN_EXTSN$NOCANCEL ()
  2 thread 0x180b  0x95578226 in semaphore_timedwait_signal_trap ()
* 1 thread 0x10b  0x004b0c8e in ?? ()
Depends on: 487707
Depends on: 487845
Attached patch v8 (obsolete) — Splinter Review
Rebased on top of the fix for bug 487845.
Attachment #371963 - Attachment is obsolete: true
Comment on attachment 372463 [details] [diff] [review]
v8

r?brendan because I think he knows the most about the code that this new code should be emulating.

This can wait a bit.  The bug is not a blocker.
Attachment #372463 - Flags: review?(brendan)
-'ing based on comment #19.
Flags: blocking1.9.1? → blocking1.9.1-
Flags: wanted1.9.1?
Attached patch v9 (really just v8 refreshed) (obsolete) — Splinter Review
No significant changes.
Attachment #372463 - Attachment is obsolete: true
Attachment #374369 - Flags: review?(brendan)
Attachment #372463 - Flags: review?(brendan)
Attachment #374369 - Flags: review?(brendan) → review+
Comment on attachment 374369 [details] [diff] [review]
v9 (really just v8 refreshed)

Want INS_CONSTWORD to hide (void*) cast of jsval/jsword/intptr_t actual macro param

Isn't putting the INS_CONSTPTR(proto) in a common helper such as getClassPrototype a codesize win?

If not, why not return the JSObject *proto directly avoid the bool return with JSObject** out param?

Second getClassPrototype should just be one-liner: return js_GetClassPrototype(...); and not if (!js_Get...()) return false; return true;

There's an else after return in functionCall, near the bottom

r=me with these addressed. Great patch, want to bake it and maybe consider for 3.5 if it looks good. (I know, I know -- stop moanin' you skeptics :-P.)

/be
v9 has a bug: I'm calling getClassPrototype(ctor, blah) with constructors that don't necessarily have a READONLY PERMANENT .prototype property.

So this, for example, asserts:

  for (i = 0; i < 4; i++)
      new isNaN(3);

For this case I have to imitate a property lookup (isNaN.prototype) or punt.  Looking into it.
jorendorff reminded me on IRC of jsobj.cpp's js_NewInstance, which is used for interpreted constructor functions. Natives non-class-constructor functions called via new could do likewise.

/be
Attached patch fixes (obsolete) — Splinter Review
Attachment #374562 - Flags: review?(brendan)
Attached patch nits (obsolete) — Splinter Review
Reverted a bunch of getClassPrototype changes.
Attached patch v10 = v9 + fixes + nits (obsolete) — Splinter Review
Attachment #374369 - Attachment is obsolete: true
Attachment #374564 - Attachment is obsolete: true
Attachment #374565 - Flags: review+
Comment on attachment 374562 [details] [diff] [review]
fixes

I leave it to jorendorff to obsolete this patch.

/be
Attachment #374562 - Flags: review?(brendan)
Your wish is my command. Still got a bug -- assertion in dom/tests/mochitest/ajax/jquery/test_jQuery.html, in this code:

	for ( var value = object[0];
		i < length && callback.call( value, i, value ) !== false; value = object[++i] ){}

that's a JSOP_APPLY right there, which we're newly tracing.  The assertion is JSVAL_IS_OBJECT(vp[1]); in the interpreter, line 5003.  I suppose I am reconstructing the stack incorrectly --but it also bothers me that js_Disassemble shows this:

...
00220: 745  and 243 (23)
00223: 745  getarg 1
00226: 745  callprop "true"
00229: 745  getlocal 3
00232: 745  getlocal 1
00235: 745  getlocal 3
00238: 745  apply 3
...

"true"?  Recent disassembler breakage maybe?
function fn(i, v) { print(i); }
for (var i = 0; i < 9; ++i)
    fn.call("q", i);

Assertion failure: JSVAL_IS_OBJECT(vp[1]), at ../jsinterp.cpp:5003
The "true" in comment 30 reminds me of bugs where the imacro's atoms pointer (which points to the runtime's common atoms, including 'false', 'true', etc.) is left stuck as the imacro-calling script's atoms pointer. Can you check in gdb?

/be
Oh - I was calling js_Disassemble directly from gdb, so it makes sense that that would happen.

Found the bug.  Very simple, v10 deleted a check (for primitive 'this' arguments to .apply/.call) that is still needed.

Fixing that I pushed to the try server, and it passed mochitest-plain, but:

http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1240874308.1240884417.5473.gz

Finished tests from alltabslistener.js
*** can move shortcut id?
*** can move shortcut node?
*** can move shortcut id?
*** can move shortcut node?
*** can move shortcut id?
*** can move shortcut node?
*** can move shortcut id?
*** can move shortcut node?
invalid new default button: , assuming: none
pure virtual method called
terminate called without an active exception
TEST-UNEXPECTED-FAIL | (automation.py) | Exited with code 1 during test run
INFO | (automation.py) | Application ran for: 0:00:35.965375
TEST-UNEXPECTED-FAIL | (automation.py) | Browser crashed (minidump found)
TEST-UNEXPECTED-FAIL | runtests-leaks | missing output line for total leaks!
make: *** [mochitest-browser-chrome] Error 255
program finished with exit code 2
TinderboxPrint: mochitest-browser-chrome<br/><em class="testfail">T-FAIL</em>&nbsp;<em class="testfail">CRASH</em>&nbsp;<em class="testfail">L-FAIL</em>
Unknown Error: command finished with exit code: 2

The line "pure virtual method called" makes me think this isn't us.  The crash is Linux-only and it seems likely it's random.  I'd like to push to tracemonkey; if it happens again at least we get a stack trace.
Attached patch v12 (obsolete) — Splinter Review
Attachment #374562 - Attachment is obsolete: true
Attachment #374563 - Attachment is obsolete: true
Attachment #374565 - Attachment is obsolete: true
Attachment #374843 - Flags: review?(gal)
We just talked about the need to properly handle deep aborts and reconstruct the stack frame of the slow native constructor.
Attached patch interdiff v12 to v13 (obsolete) — Splinter Review
Attached patch interdiff v13 to v14 (obsolete) — Splinter Review
Remove some assertions.  Release the slow native frame when popping it.
Attached patch v14 (obsolete) — Splinter Review
Attachment #375356 - Flags: review?(gal)
Comment on attachment 375356 [details] [diff] [review]
v14

Working through tinderbox now, so requesting review.
Attachment #374843 - Attachment is obsolete: true
Attachment #374843 - Flags: review?(gal)
Attachment #375355 - Attachment is obsolete: true
nativeCalleeWord may seem ugly but the alternative is to

a) support slow native FrameInfo, which affects basically every function that uses FrameInfo, a huge pile of special cases; and

b) push FrameInfo onto the rp stack each time we call a slow native; and

c) make sure callDepth is correct, which requires more hacking.

I tried it and this is a lot cleaner.  However this is safe only because we can't reenter, and therefore on trace we can only have one native stack frame and it's always the top stack frame.
Comment on attachment 375356 [details] [diff] [review]
v14

Follow-up bug: apply(null, args) should check globalObj. If globalObj has been wrappen already, just proceed and bake in globalObj, otherwise abort (see JSOP_THIS).
Attachment #375356 - Flags: review?(gal) → review+
Needs rebasing (I should have pushed Friday, sorry).
Attached patch v14, refreshed (obsolete) — Splinter Review
Assignee: jorendorff → gal
Attachment #375263 - Attachment is obsolete: true
Attachment #375356 - Attachment is obsolete: true
Attachment #375748 - Attachment is obsolete: true
Whiteboard: fixed-in-tracemonkey
Depends on: 491837
Depends on: 491965
http://hg.mozilla.org/mozilla-central/rev/b8cf788763a0
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Depends on: 492028
Depends on: 491989
Depends on: 492693
Depends on: 493345
Blocks: 487240
upping this to a blocker, since it fixes so many bugs
Flags: wanted1.9.1?
Flags: blocking1.9.1-
Flags: blocking1.9.1+
js1_8_1/trace/trace-test.js
http://hg.mozilla.org/tracemonkey/rev/61892f57b46a
Flags: in-testsuite+
v 1.9.1, 1.9.2
Status: RESOLVED → VERIFIED
cvsroot/mozilla/js/tests/js1_8_1/trace/trace-test.js,v  <--  trace-test.js
new revision: 1.14; previous revision: 1.13

/cvsroot/mozilla/js/tests/shell.js,v  <--  shell.js
You need to log in before you can comment on or make changes to this bug.