Closed
Bug 1338699
Opened 7 years ago
Closed 7 years ago
Intermittent LulIntegration.unwind_consistency | Value of: nTestsPassed == nTests
Categories
(Core :: Gecko Profiler, defect)
Tracking
RESOLVED
FIXED
| Version | Tracking | Status |
|---|---|---|
| firefox52 | --- | wontfix |
| firefox-esr52 | --- | fixed |
| firefox53 | --- | unaffected |
| firefox54 | --- | unaffected |
| firefox55 | --- | unaffected |
People
(Reporter: intermittent-bug-filer, Assigned: RyanVM)
Details
(Keywords: intermittent-failure)
Filed by: rvandermeulen [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=76059499&repo=mozilla-beta
https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-beta-linux-pgo/1486662080/mozilla-beta_ubuntu32_vm_test_pgo-gtest-bm05-tests1-linux32-build9.txt.gz

Seems to only be happening on Linux32 GTests on Beta since https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?changeset=dbd0fa47f7a2 landed. Failing roughly 30-40% of the time.
Comment 1•7 years ago (Assignee)
Maybe bug 995069? Otherwise bug 1334933?
status-firefox52:
--- → affected
Flags: needinfo?(jdemooij)
Comment 2•7 years ago
(In reply to Ryan VanderMeulen [:RyanVM] from comment #1)
> Maybe bug 995069? Otherwise bug 1334933?

Hm. The former is unlikely: it adds a different mechanism to generate a random seed on Linux, but we (should have!) returned random numbers there before, so I doubt it changed anything. If it's bug 1334933, it's likely exposing a pre-existing bug in the stack unwinder or its tests. Maybe we can do Try pushes to bisect?
Comment 3•7 years ago
jseward, what is LulIntegration.unwind_consistency testing exactly? The main change in bug 1334933 is that we now reserve a 128 MB chunk of memory at a random address (within a particular range). Could that be affecting these tests somehow? It's weird it doesn't fail on Aurora/Nightly.
Flags: needinfo?(jdemooij) → needinfo?(jseward)
Comment 5•7 years ago
The comments above do paint a coherent story (unfortunately). The LUL test reads debuginfo from libxul.so so as to do some test unwinds with it. To read the debuginfo, LUL mmaps .so files temporarily, pulls what it needs out of them and unmaps them. So far, so good. Except that libxul.so with debuginfo is at least several hundred megabytes, and finding several hundred megabytes of consecutive free space in a 32 bit process is pushing one's luck, so to speak. Until now it's been OK. But with 128MB chunks of address space disappearing at random addresses, it's very likely that LUL's mmapping is now failing randomly, depending on where the 128MB chunk ends up. This also explains why it only fails on 32 bit targets -- 64 bit targets have more than enough address space to mmap libxul.so many times over.
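The address-space argument above can be illustrated with a small sketch. The layout below is entirely hypothetical: the 3 GB of user address space, the occupied ranges, the 700 MB free gap and the 500 MB file size are made-up numbers, not measurements from the failing jobs. It only shows how a 128 MB reservation that lands inside the largest free gap can make one large contiguous mapping impossible, while the same reservation at the edge of the gap is harmless.

```python
MB = 1024 * 1024


def largest_free_gap(space_size, used_ranges):
    """Return the size of the largest contiguous free range.

    used_ranges is a list of half-open (start, end) byte intervals.
    """
    largest = 0
    cursor = 0  # end of the highest occupied address seen so far
    for start, end in sorted(used_ranges):
        if start > cursor:
            largest = max(largest, start - cursor)
        cursor = max(cursor, end)
    return max(largest, space_size - cursor)


# Hypothetical 32-bit process layout (assumption, for illustration only).
SPACE = 3072 * MB
BASE_LAYOUT = [(0, 1200 * MB), (1900 * MB, 3072 * MB)]  # one 700 MB free gap
FILE_SIZE = 500 * MB  # stand-in for a libxul.so with debuginfo


def can_map(layout):
    """True if a single contiguous mapping of FILE_SIZE bytes would fit."""
    return largest_free_gap(SPACE, layout) >= FILE_SIZE


# The same 128 MB reservation at two different (hypothetical) addresses.
at_edge_of_gap = BASE_LAYOUT + [(1200 * MB, 1328 * MB)]
in_middle_of_gap = BASE_LAYOUT + [(1500 * MB, 1628 * MB)]

print(can_map(BASE_LAYOUT), can_map(at_edge_of_gap), can_map(in_middle_of_gap))
```

Whether the mapping fits depends only on where the reservation lands, which is consistent with an intermittent rather than a permanent failure.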
Flags: needinfo?(jseward)
Comment 6•7 years ago
I should say, this is not a new problem. The same thing happened for Valgrind running Firefox for Android on 32 bit targets, and is now happening for the Gecko Profiler Plugin's symbolisation on Linux (even 64 bit, because it is done in a 32-bit emscripten-world). The fix for Valgrind on 32 bit was to insert an abstraction layer into the debuginfo reader, so that all accesses to the underlying file go through the abstraction layer, and have that abstraction layer be implemented differently -- for example, as a fixed size cache that keeps a selection of most-recently-read slices of the file. That worked well for Valgrind. I'll see if something similar is possible here.
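As a rough illustration of the kind of abstraction layer described above, here is a minimal sketch of a reader that serves byte ranges from a bounded cache of recently used file slices instead of mapping the whole file. This is not Valgrind's or LUL's actual code; the class name `SliceCachedReader`, its parameters and the slice sizes are assumptions made up for the example.

```python
import io
from collections import OrderedDict


class SliceCachedReader:
    """Read byte ranges from a seekable binary file through a small LRU
    cache of fixed-size slices, so memory use stays bounded regardless of
    the file size. (Hypothetical sketch, not the real debuginfo reader.)
    """

    def __init__(self, fileobj, slice_size=64 * 1024, max_slices=16):
        self._file = fileobj
        self._slice_size = slice_size
        self._max_slices = max_slices
        self._cache = OrderedDict()  # slice index -> bytes, oldest first
        self.file_reads = 0  # number of times the underlying file was read

    def _get_slice(self, index):
        if index in self._cache:
            self._cache.move_to_end(index)  # mark as most recently used
            return self._cache[index]
        self._file.seek(index * self._slice_size)
        data = self._file.read(self._slice_size)
        self.file_reads += 1
        self._cache[index] = data
        if len(self._cache) > self._max_slices:
            self._cache.popitem(last=False)  # evict least recently used
        return data

    def read_at(self, offset, length):
        """Return up to `length` bytes starting at `offset`."""
        out = bytearray()
        end = offset + length
        while offset < end:
            index, within = divmod(offset, self._slice_size)
            chunk = self._get_slice(index)[within:within + (end - offset)]
            if not chunk:
                break  # reached end of file
            out += chunk
            offset += len(chunk)
        return bytes(out)


# Usage with an in-memory stand-in for a large debuginfo file.
data = bytes(range(256)) * 4
reader = SliceCachedReader(io.BytesIO(data), slice_size=100, max_slices=2)
header = reader.read_at(90, 20)  # spans two slices, so two file reads
```

The trade-off is that every access goes through `read_at` rather than a raw pointer into a mapping, which is why retrofitting such a layer touches all of the reader's file accesses.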
Comment 9•7 years ago (Assignee)
Any updates on this? We're officially failing on ESR52 now too :(
Comment 10•7 years ago
I discussed this on IRC with Ryan. To summarise:

* We don't have any hard data that this affects anything other than ESR52.
* Fixing this properly would require inserting an abstraction layer low in tools/profiler/lul/Lul{Dwarf,Elf}.cpp and possibly other files. This is on the order of 2-5 days' work and will generate a patch which is too large (risky) for backport to ESR52.
* We probably have few linux32 users running the profiler.
* Ryan says "I'd vote wontfix and skipping the test on ESR52 (with as narrowly-scoped of a skipping as we can do)" and I am inclined to agree.

Ryan, does that sync with your understanding?
Flags: needinfo?(ryanvm)
Comment 16•7 years ago (Assignee)
I ended up just skipping LulTest.cpp on x86 at the moz.build level since LulIntegration.unwind_consistency was the only test contained within it anyway. I've verified post-push that the test is still running on Linux64 and no longer on Linux32.

https://hg.mozilla.org/releases/mozilla-esr52/rev/94ce63191069
Assignee: nobody → ryanvm
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox53:
--- → unaffected
status-firefox54:
--- → unaffected
status-firefox55:
--- → unaffected
status-firefox-esr52:
--- → affected
Resolution: --- → FIXED