Closed Bug 574313 Opened 14 years ago Closed 14 years ago

turning autorooters into conservative stack scan verifiers

Categories

(Core :: JavaScript Engine, defect)

Other Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: igor, Assigned: igor)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 2 obsolete files)

+++ This bug was initially created as a clone of Bug #573709 +++ To increase test coverage for the conservative stack scanner I suggest run the stack scan as the first thing during the GC marking phase and then change autorooters to insist that the stack-based GC things are marked.
Attached patch v1 (obsolete) — Splinter Review
The patch moves the conservative scanner to the beginning of js_TraceRuntime so the following autoroot loop can verify that nothing is missed. Note that the verifier part calls abort() even in optimized builds for max test coverage. The plan is to keep it like that for couple of weeks and then remove the no longer needed autoroots.
With the patch I am getting try server failures on file:///builds/slave/tryserver-macosx-debug-unittest-jsreftest/build/jsreftest/tests/jsreftest.html?test=js1_8_5/extensions/array-length-protochange.js
Very suspicious test name. Might be a nice find.
Aparently there some places in FF where the autoroots are used outside a request. This is bad. I will try to fix this but this may not be straightforward as it is not clear at the call sites if using AutoRequest is safe as other cx could be active on the stack. An alternative is to always do the stack scan for a thread that calls the GC even if the GC was called outside a request and tolerate extra leaks. This will assume that web workers and extensions still properly respect the request model.
(In reply to comment #4) > An alternative is to always do > the stack scan for a thread that calls the GC even if the GC was called outside > a request and tolerate extra leaks. Note that an autoroots outside a request is bad in general. The autoroot link manipulations are not locked and rely on the request model to serialize with the GC. It only works in FF because thete the GC can only run on the main thread.
Lets fix the FF code. This is a clear API violation. And we have now a great way to find them.
Blocks: 572057
Any ETA on this?
If I base the patch on the bug 552266, then this could be ready before Friday. Without that fixing request model violations requres a lot of extra testing.
Depends on: 552266
I am getting with a path tryserver failures like: REFTEST TEST-START | file:///Users/cltbld/talos-slave/tryserver-leopard-debug-u-jsreftest/build/jsreftest/tests/jsreftest.html?test=js1_8/regress/regress-464418.js ++DOMWINDOW == 10 (0xd1264f8) [serial = 5191] [outer = 0xcd4a8f0] begin test: js1_8/regress/regress-464418.js BUGNUMBER: 464418 STATUS: Do not assert: fp->slots + fp->script->nfixed + js_ReconstructStackDepth(cx, fp->script, fp->regs->pc) == fp->regs->sp Conservative stack GC scanner has missed a root on the stack. Aborting. investigating...
Attached patch v2 (obsolete) — Splinter Review
With the bug 580578 fixed I no longer see any aborts on the tryserver due to missed roots.
Attachment #453855 - Attachment is obsolete: true
Attachment #460203 - Flags: review?(anygregor)
Attached patch v3Splinter Review
patch with more informative abort message
Attachment #460203 - Attachment is obsolete: true
Attachment #460206 - Flags: review?(anygregor)
Attachment #460203 - Flags: review?(anygregor)
The plan for this bug is to have the abort code at least in one beta before we can remove it together with autorooters.
Comment on attachment 460206 [details] [diff] [review] v3 >- * The conservative stack scanner runs before we mark extra roots, >- * which may use additional colors to implement cycle collection. >+ * We mark extra roots at the last thing so it can use use additional >+ * colors to implement cycle collection. One "use" is enough :)
Attachment #460206 - Flags: review?(anygregor) → review+
http://hg.mozilla.org/tracemonkey/rev/0b6cbd5916c6 - compared with v3 I have added missing null check to skip AutoObjectRooter if it contains a null pointer.
Whiteboard: fixed-in-tracemonkey
Depends on: 583763
Landing of this bug lead to sporadic of mochi test failures on Windows. Unfortunately I do not have time to work on this today, so, if necessary, this should be backed out from relevant branches.
Depends on: 583598
Since this landed on the trunk load, I'm getting the "Conservative GC scanner has missed the root...." assert quite often when a background thread calls a function that returns a large amount of data. For example in my code I have a function that reads a file and returns the data. I call this function in a loop to read all the files in a folder as such: files.forEach(function(aFile) { let data = this.readFile(file); // do stuff }, module_object); function readFile(file, aHeaderOnly) { let data = doActualFileRead(file, aHeaderOnly); // do processing on data return data; } The readFile function is called in a background thread two instances in my add-on. In the first case, the aHeaderOnly flag is set which causes only 1020 bytes to be read from the file. That works fine. The other instance doesn't set that flag (the code above readFile). That has a high tendency of crashing when executed in a background thread. At first I thought it was the file reads since it was happening when large files were read, but if I replace the "return data;" in readFile function with "return null;" or even 'return "hello";' the crashes no longer occur. Since the files are still being read, it can't be the file reads that are causing the assert. I then thought that maybe it was the processing being done on the data being returned (the "do stuff" code) that caused the assert so I commented it out and the asserts still happened. The largest file reads in 4,170,162 bytes and returning that data crashes pretty much every time. The 2nd largest file is 295050 and that is returned fine. I replaced the readFile function with the following: function readFile(file, aHeaderOnly) { let a = ""; for (var i=0; i < 1000000; i++) a += "1"; return a; } and that crashed as well despite doing no file reads at all. I then started changing the "1000000" to see how low it would need to be and still work. When I changed it to "500000" Minefield aborted after the 2nd time the function was called. When I changed it to "250000" the function was called about 15 times before Minefield aborted. This seemed to indicate that the less data returned, the less likely it is to abort. I also found that if my machine was busy (drive thrashing), it has less of a chance of crashing which indicates a timing issue. I tried removing the module_object "this" parameter from the foreach function and it still aborted. I tried changing the foreach to a normal for and even a while loop and in both cases it still aborted. I then removed the foreach function altogether and just made a bunch of direct calls to module_object.readFile. The aborts no longer occurred. I went back to my original code and changed it so instead of looping, I created a function call that does exactly what was inside the loop and added one call to the function for every file in the folder (52 lines, one for each file). That still aborted on occasion, but the vast majority of the time it worked as opposed to aborting the vast majority of the time. So it seems the simple act of returning data from a function call has a chance of causing data to get lost resulting in the assert when doing a GC. The larger the amount of data returned, the better than chance of aborting. If the call to the function is made from inside a loop, the chance of asserting increases exponentially such that for large amounts of data it's almost a certainty.
(In reply to comment #16) > Since this landed on the trunk load, I'm getting the "Conservative GC scanner > has missed the root...." assert quite often when a background thread calls a > function that returns a large amount of data. Is this with optimized build of FF? If yes, could you try a debug one and see if you also hits other asserts?
(In reply to comment #17) > Is this with optimized build of FF? If yes, could you try a debug one and see > if you also hits other asserts? Yes it's with the optimized nightly win32 FF build from http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/latest-trunk/. Is there some place I can get a win32 debug build? I tried installing a build from the following location, but I keep getting a message stating that "This application has failed to start because the application configuration is incorrect. Reinstalling the application may fix this problem." http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win32-debug/
I managed to get the tinderbox-builds debug builds to run (needed to install VC 2005), but unfortunately the debug builds I've tried, crash with an assertion on startup with a new profile in safe-mode, so I can't even get to the point where I can test my add-on. I tried both the latest mozilla-central-win32-debug and tracemonkey-win32-debug builds so there must be something about those builds that my system doesn't like. If you're curious the assertion is: ###!!! ASSERTION: Can't create document accessible!: 'docAcc', file e:/builds/moz2_slave/mozilla-central-win32-debug/build/accessible/src/base/nsAccDocManager.cpp, line 354 (or 351 for 1281091389 tracemonkey build)
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: