Closed Bug 778460 Opened 13 years ago Closed 13 years ago

Tinderbox spidermonkey-rootanalysis builds broken since 2012-07-17

Categories

(Core :: JavaScript Engine, defect)

x86_64
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla18

People

(Reporter: emorley, Assigned: sfink)

References

Details

(Whiteboard: [js:p2])

Attachments

(1 file)

Tinderbox mozilla-inbound_linux64-debug_spidermonkey-rootanalysis builds on inbound tip are segfaulting: https://tbpl.mozilla.org/php/getParsedLog.php?id=13941541&tree=Mozilla-Inbound https://tbpl.mozilla.org/?tree=Mozilla-Inbound&noignore=1&jobname=spidermonkey&rev=29bff59d3bbe Tracing this back, the seg faults started in this range: http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=56a19b273c2b&tochange=b6291bb72294 Prior to that, they were still busted, but with a variety of jit_test.py failures: https://tbpl.mozilla.org/php/getParsedLog.php?id=13846927&tree=Mozilla-Inbound The range for these failures starting is: http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=4a25a6c994fd&tochange=2e8d41e9f8ef Given that these builds have been failing for 11 days without anyone noticing (and our current infra-load contained state of late), do we even need these builds?
> segfaulting: > https://tbpl.mozilla.org/php/getParsedLog.php?id=13941541&tree=Mozilla- { make -C jsapi-tests check make[1]: Entering directory `/builds/slave/m-in-lnx64-dbg-spidermonkey-rootanalysis/objdir/js/jsapi-tests' /builds/slave/m-in-lnx64-dbg-spidermonkey-rootanalysis/objdir/dist/bin/run-mozilla.sh /builds/slave/m-in-lnx64-dbg-spidermonkey-rootanalysis/objdir/dist/bin/jsapi-tests make[1]: Leaving directory `/builds/slave/m-in-lnx64-dbg-spidermonkey-rootanalysis/objdir/js/jsapi-tests' make[1]: *** [check] Segmentation fault } > Prior to that, they were still busted, but with a variety of jit_test.py > failures: > https://tbpl.mozilla.org/php/getParsedLog.php?id=13846927&tree=Mozilla- > Inbound (Too many to paste here)
The rooting analysis builds work by causing jit-tests to crash or assert when someone introduces new rooting hazards (which aren't a problem for the product we are currently shipping, but will become a big problem when we start doing generational/compacting GC). A couple weeks ago I got them green, then noticed a week or so ago they were orange again. I didn't do anything, because keeping the builds green is the job of the person doing the pushes, and I do not have time to watch for / track down bustage. Turning these builds orange is not currently cause for backout / hotfix, as the builds are not shown by default and are not run on try server. My understanding is the latter is a prerequisite for the former. Steve has a patch in bug 775355 to fix this, but it's been sitting for a week with no activity.
Thank you for clarification as to what these builds are for - I had no idea up until now :-) > Turning these builds orange is not currently cause for backout / hotfix, as > the builds are not shown by default and are not run on try server. My > understanding is the latter is a prerequisite for the former. Steve has a > patch in bug 775355 to fix this, but it's been sitting for a week with no > activity. It's ok, I wasn't proposing backing anything out :-) I know the SM hidden builds often get busted for a few days, so don't tend to give them a second thought. I was just surprised to keep on pressing the down arrow on TBPL and seeing some of them busted for 2-4 weeks (latter being the other bugs filed at the same time as this one) - so thought it would be worth filing so it was on someone's radar.
Blocks: 773059
Assignee: general → sphink
Summary: Tinderbox spidermonkey-rootanalysis builds busted for 11 days → Tinderbox spidermonkey-rootanalysis builds broken since 2012-07-17
I think I've finally tracked down part of what's going on. I could not reproduce locally. Still not sure why. I got access to a slave and could reproduce there, although the tree I built crashed in a different place. The gdb installed on the slaves is old and gives internal errors all over the place. Upgrading to a newer gdb fixes that, but only function names are given. No locals or line numbers are available. I created something similar to an --enable-shared-js build, and was able to get full debug info, but then the bug did not reproduce. I ended up debugging without debug info. I believe the root problem is that we have not added the --enable-rooting-analysis flag to the toplevel configure.in, so gc/Root.h is getting compiled with different flags. Without JSGC_ROOT_ANALYSIS, the Rooted<T> structure is different (it contains no 'stack' or 'prev' fields.) An inlined operator= will change the 'ptr' field, which is at offset zero without JSGC_ROOT_ANALYSIS. With it, 'stack' sits at that offset, and the debug check will dereference *stack. For pointer types, that's not a big deal, and should lead to an assert failure. For Rooted<Value>, this will dereference a Value interpreted as a pointer, and will probably crash. That's clear, but what I'm confused about is that my non-JSGC_ROOT_ANALYSIS operator= is getting called from within js::Interpret, which should see JSGC_ROOT_ANALYSIS and therefore have the 'stack' field. I'm wondering if that operator= was emitted into libxul.so and the linker is choosing it or something? Very confusing. (gdb only sees the |stack|-less one for Rooted<Value>, while it sees the |stack|-ful one for Rooted<JSObject*>.) It is possible that differing amounts of inlining explains why I can't reproduce it locally. Or different effective weakness of various symbols. Or something. Anyway, this will hopefully fix it all so I don't have to unravel everything that's going on.
Unless I've fooled myself again by doing a build that isn't really doing rooting analysis, which is not only possible but likely, http://tbpl.mozilla.org/?tree=Try&rev=1748cefd2cf8 says this change makes it green again.
Attachment #658910 - Flags: review?(terrence)
Attachment #658910 - Flags: review?(terrence) → review?(ted.mielczarek)
Comment on attachment 658910 [details] [diff] [review] Copy the various GC-related --enable-* configure options into toplevel configure.in to get correct defines Review of attachment 658910 [details] [diff] [review]: ----------------------------------------------------------------- Bleh, I hate the explosion of configure options here. Are we going to get rid of these in the near future?
Attachment #658910 - Flags: review?(ted.mielczarek) → review+
(In reply to Ted Mielczarek [:ted] from comment #6) > Bleh, I hate the explosion of configure options here. Are we going to get > rid of these in the near future? "Near future"? Hm, not unless your definition is pretty loose. But yes, exact rooting will become the default once it works 'n stuff. Dunno if we'll switch it to --enable-conservative-rooting then, but even if so, we'll probably break it before too long and have to remove it anyway. --enable-rooting-analysis is probably removable once exact rooting works. I would think that both incremental and conservative GC would eventually be compiled in all the time and controllable via preferences. I'll file a bug now for incremental, since I don't know what point there is in turning it off now: bug 789577.
Okay, just in my experience every single configure option is a sweet sweet honeypot for random people building Firefox to fiddle with, which inevitably results in broken builds. I much prefer that we don't add more knobs to twiddle unless we absolutely need them.
Whiteboard: [js:p2]
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla18
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: