Closed Bug 1338699 Opened 7 years ago Closed 7 years ago

Intermittent LulIntegration.unwind_consistency | Value of: nTestsPassed == nTests

Categories

(Core :: Gecko Profiler, defect)

Type: defect
Priority: Not set
Severity: normal

RESOLVED FIXED
Tracking Status
firefox52 --- wontfix
firefox-esr52 --- fixed
firefox53 --- unaffected
firefox54 --- unaffected
firefox55 --- unaffected

People

(Reporter: intermittent-bug-filer, Assigned: RyanVM)

Details

(Keywords: intermittent-failure)

Maybe bug 995069? Otherwise bug 1334933?
Flags: needinfo?(jdemooij)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #1)
> Maybe bug 995069? Otherwise bug 1334933?

Hm. The former is unlikely: it adds a different mechanism to generate a random seed on Linux, but we (should have!) returned random numbers there before, so I doubt it changed anything.

If it's bug 1334933 it's likely exposing a pre-existing bug with the stack unwinder or its tests. Maybe we can do Try pushes to bisect?
jseward, what is LulIntegration.unwind_consistency testing exactly? The main change in bug 1334933 is that we now reserve a 128 MB chunk of memory at a random address (within a particular range). Could that be affecting these tests somehow? It's weird it doesn't fail on Aurora/Nightly.
Flags: needinfo?(jdemooij) → needinfo?(jseward)
The comments above do paint a coherent story (unfortunately).

The LUL test reads debuginfo from libxul.so so as to do some
test unwinds with it.  To read the debuginfo, LUL mmaps .so
files temporarily, pulls what it needs out of them and unmaps
them.

So far, so good.  Except that libxul.so with debuginfo is at
least several hundred megabytes, and finding several hundred
megabytes of consecutive free space in a 32 bit process is
pushing one's luck, so to speak.

Until now it's been OK.  But with 128MB chunks of address space
disappearing at random addresses, it's very likely that LUL's
mmapping is now failing randomly, depending on where the 128MB
chunk ends up.  This also explains why it only fails on 32 bit
targets -- 64 bit targets have more than enough address space
to mmap libxul.so many times over.
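
To make the fragmentation argument above concrete, here is a minimal
Python sketch.  Everything in it is invented for illustration: the
single 700 MB free hole, the 500 MB figure standing in for libxul.so
with debuginfo, and the helper name largest_free_gap are assumptions,
not measurements from a real process.

```python
MB = 1 << 20


def largest_free_gap(space_size, reserved):
    """Return the largest contiguous unreserved run, in bytes.

    reserved is a list of (start, length) byte ranges inside
    an address range of space_size bytes.
    """
    largest = 0
    cursor = 0
    for start, length in sorted(reserved):
        largest = max(largest, start - cursor)
        cursor = max(cursor, start + length)
    return max(largest, space_size - cursor)


# Hypothetical numbers, chosen only to show the effect.
hole = 700 * MB    # assumed free region in a 32-bit process
need = 500 * MB    # assumed size of the libxul.so mapping
chunk = 128 * MB   # the reservation described in bug 1334933

no_chunk = largest_free_gap(hole, [])
chunk_at_edge = largest_free_gap(hole, [(0, chunk)])
chunk_in_middle = largest_free_gap(hole, [(250 * MB, chunk)])

for label, gap in [("no chunk", no_chunk),
                   ("chunk at edge", chunk_at_edge),
                   ("chunk in middle", chunk_in_middle)]:
    print(label, gap // MB, "MB free,", "fits" if gap >= need else "fails")
```

With these made-up numbers the mapping still fits when the chunk lands
at the edge of the hole, and fails when it lands in the middle, which
would look like an intermittent failure from the outside.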
Flags: needinfo?(jseward)
I should say, this is not a new problem.  The same thing happened
for Valgrind running Firefox for Android on 32-bit targets, and
is now happening for the Gecko Profiler Plugin's symbolisation
on Linux (even on 64-bit, because it is done in a 32-bit
emscripten-world).

The fix for Valgrind on 32 bit was to insert an abstraction layer
into the debuginfo reader, so that all accesses to the underlying
file go through the abstraction layer, and have that abstraction
layer be implemented differently -- for example, as a fixed size
cache that keeps a selection of most-recently-read slices of the
file.  That worked well for Valgrind.  I'll see if something 
similar is possible here.
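
As a rough illustration of that idea (not Valgrind's or LUL's actual
code), here is a minimal Python sketch of a fixed-size cache of
most-recently-read file slices.  The class name SliceCache, its
parameters and the slice sizes are all hypothetical; the point is only
that callers ask for (offset, length) and the whole file is never
mapped at once.

```python
import os
import tempfile
from collections import OrderedDict


class SliceCache:
    """Read-only file view that keeps at most max_slices slices in memory."""

    def __init__(self, path, slice_size=64 * 1024, max_slices=16):
        self._file = open(path, "rb")
        self._size = os.path.getsize(path)
        self._slice_size = slice_size
        self._max_slices = max_slices
        self._slices = OrderedDict()  # slice index -> bytes, oldest first

    def _slice(self, index):
        if index in self._slices:
            self._slices.move_to_end(index)  # mark as most recently used
            return self._slices[index]
        self._file.seek(index * self._slice_size)
        data = self._file.read(self._slice_size)
        self._slices[index] = data
        if len(self._slices) > self._max_slices:
            self._slices.popitem(last=False)  # evict least recently used
        return data

    def read(self, offset, length):
        if offset < 0 or length < 0 or offset + length > self._size:
            raise ValueError("read outside file")
        out = bytearray()
        while length > 0:
            index, within = divmod(offset, self._slice_size)
            piece = self._slice(index)[within:within + length]
            out += piece
            offset += len(piece)
            length -= len(piece)
        return bytes(out)

    def close(self):
        self._file.close()


# Small demonstration on a temporary 1024-byte file.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    data = bytes(range(256)) * 4
    tmp.write(data)

cache = SliceCache(tmp.name, slice_size=100, max_slices=2)
spanning = cache.read(95, 10)   # crosses the boundary between two slices
far = cache.read(900, 50)       # forces an eviction
cache.close()
os.unlink(tmp.name)
```

Memory use is bounded by slice_size * max_slices regardless of file
size, at the cost of extra reads when access patterns jump around.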
Any updates on this? We're officially failing on ESR52 now too :(
I discussed this on IRC with Ryan.  To summarise:

* We don't have any hard data that this affects anything other than ESR52.

* Fixing this properly would require inserting an abstraction layer low in
  tools/profiler/lul/Lul{Dwarf,Elf}.cpp and possibly other files.  This is on
  the order of 2-5 days' work and would generate a patch that is too large
  (and therefore too risky) to backport to ESR52.

* We probably have few linux32 users running the profiler.

* Ryan says "I'd vote wontfix and skipping the test on ESR52 (with as
  narrowly-scoped of a skipping as we can do)" and I am inclined to agree.

Ryan, does that match your understanding?
Flags: needinfo?(ryanvm)
Yes, I think that sums it up well.
Flags: needinfo?(ryanvm)
I ended up just skipping LulTest.cpp on x86 at the moz.build level since LulIntegration.unwind_consistency was the only test contained within it anyway. I've verified post-push that the test is still running on Linux64 and no longer on Linux32.

https://hg.mozilla.org/releases/mozilla-esr52/rev/94ce63191069
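
For readers unfamiliar with moz.build, a skip of this kind might look roughly like the sketch below. This is a guess at the shape of such a guard, not a copy of the changeset above; the first two assignments are stand-ins so the fragment runs outside the build system, where CONFIG and UNIFIED_SOURCES are normally provided by mozbuild.

```python
# Stand-ins for names that mozbuild supplies inside a real moz.build file.
CONFIG = {'CPU_ARCH': 'x86'}   # assumed value for a Linux32 build
UNIFIED_SOURCES = []

# Hypothetical guard: build the LUL gtest everywhere except 32-bit x86.
if CONFIG['CPU_ARCH'] != 'x86':
    UNIFIED_SOURCES += [
        'LulTest.cpp',
    ]
```

With CPU_ARCH set to 'x86' the source list stays empty, so the test is never compiled or run; on x86_64 the file would be added as before.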
Assignee: nobody → ryanvm
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED