Closed
Bug 1016538
Opened 10 years ago
Closed 10 years ago
Intermittent B2G test_bug346659.html,test_bug365932.html | application crashed [@ mozilla::layers::ISurfaceAllocator::~ISurfaceAllocator] (due to: "Assertion failure: mUsedShmems.empty(), at ../../../gecko/gfx/layers/ipc/ISurfaceAllocator.cpp:53")
Categories
(Core :: Graphics: Layers, defect)
Tracking
()
RESOLVED
FIXED
mozilla34
Tracking | Status | |
---|---|---|
firefox32 | --- | wontfix |
firefox33 | --- | fixed |
firefox34 | --- | fixed |
firefox-esr24 | --- | unaffected |
firefox-esr31 | --- | unaffected |
b2g-v1.4 | --- | unaffected |
b2g-v2.0 | --- | fixed |
b2g-v2.0M | --- | fixed |
b2g-v2.1 | --- | fixed |
People
(Reporter: ahal, Assigned: nical)
References
Details
(Keywords: assertion, intermittent-failure)
Attachments
(1 file)
1.59 KB,
patch
|
bas.schouten
:
review+
Sylvestre
:
approval-mozilla-aurora+
Sylvestre
:
approval-mozilla-beta-
bajaj
:
approval-mozilla-b2g32+
|
Details | Diff | Splinter Review |
https://tbpl.mozilla.org/php/getParsedLog.php?id=40473271&tree=Try
Full crash stack:
https://pastebin.mozilla.org/5262079
This happens intermittently when setting MOZ_CRASHREPORTER_SHUTDOWN to true (which aborts Gecko when a child process crashes) and calling appStartup.quit()
Updated•10 years ago
|
Keywords: intermittent-failure
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Updated•10 years ago
|
Severity: normal → critical
Keywords: assertion
Summary: Intermittent emulator mochitest shutdown crash | application crashed [@ mozilla::layers::ISurfaceAllocator::~ISurfaceAllocator] | before MOZ_Assert: Assertion failure: mUsedShmems.empty(), at ../../../gecko/gfx/layers/ipc/ISurfaceAllocator.cpp:53 → Intermittent B2G test_bug365932.html | application crashed [@ mozilla::layers::ISurfaceAllocator::~ISurfaceAllocator] (due to: "Assertion failure: mUsedShmems.empty(), at ../../../gecko/gfx/layers/ipc/ISurfaceAllocator.cpp:53")
Comment 8•10 years ago
|
||
Comment 9•10 years ago
|
||
Note a spike in this will just be because we're now catching cases that previously went unnoticed, prior to bug 969146.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 34•10 years ago
|
||
Turns out this is extraordinarily frequent. Nicolas, can you take a look at this by chance or suggest someone else who could?
Flags: needinfo?(nical.bugzilla)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 81•10 years ago
|
||
Summary: Intermittent B2G test_bug365932.html | application crashed [@ mozilla::layers::ISurfaceAllocator::~ISurfaceAllocator] (due to: "Assertion failure: mUsedShmems.empty(), at ../../../gecko/gfx/layers/ipc/ISurfaceAllocator.cpp:53") → Intermittent B2G test_bug346659.html,test_bug365932.html | application crashed [@ mozilla::layers::ISurfaceAllocator::~ISurfaceAllocator] (due to: "Assertion failure: mUsedShmems.empty(), at ../../../gecko/gfx/layers/ipc/ISurfaceAllocator.cpp:53")
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 84•10 years ago
|
||
Jeff, as module owner, could you find someone to look at this top orange on B2G? (escalating per https://wiki.mozilla.org/Sheriffing/Test_Disabling_Policy). (~70 failures on trunk in the last week alone). Thanks :-)
Flags: needinfo?(jmuizelaar)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 89•10 years ago
|
||
Sorry for the wait, I was away for 2 weeks.
What's happening here:
ISurfaceAllocator has a basic allocator which only purpose at the moment is to store the atomic copy-on-write counters of tiles (see gfxShmSharedReadLock). The way this allocator is written, counters can be deallocated on either the client or host side, be the shared memory itself underneath can only be deallocated on the client side. This means that all counters must be deallocated on both sides before the child side of ISurfaceAllocatro shuts down, otherwise there will be nobody to deallocate the shared memory. This assertion means "I am shutting down the client side, but there are still some counters alive".
I think that this means that the host is still asynchronously destroying tiles and some of the counters are still alive.
It's not entirely clear to me whether the best solution is to make sure that all of the tiles are destroyed on both sides before ISurfaceAllocator's dtor (imposing this kind of constraint on shutdown has proven to be tedious many times), or if we need to make it possible for the host side to destroy the shmems as well (maybe not easy). We are possibly late enough in the shutdown that we don't really have the luxury to send more messages to let the host side deallocate the shmems.
Flags: needinfo?(nical.bugzilla)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 94•10 years ago
|
||
Actually, I can reproduce this bug even when all of the gfxShmSharedReadLocks are destroyed, so it's also possible that we are just messing up the Lock/Unlock logic. Still digging...
Assignee | ||
Comment 95•10 years ago
|
||
So what was actually happening was that when the copy-on-write path triggers in GetBackBuffer, we forget to ReadUnlock the tile, and replace the gfxSharedReadLock, loosing the possibility to balance the read count. in the same-process case it's not a big deal since the thing is wrapped in a refcounted object used by both sides, but in the cross-process case we effectively leak shmem sections.
This patch also adds an assertion in the same-process path so that we can catch this problem more easily if it happens again.
Assignee: nobody → nical.bugzilla
Attachment #8479091 -
Flags: review?(bas)
Assignee | ||
Comment 96•10 years ago
|
||
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 101•10 years ago
|
||
Oops, the last try push was build-only: https://tbpl.mozilla.org/?tree=Try&rev=bf2821d7bd31
Comment 102•10 years ago
|
||
FYI, you can trigger tests on those pushes now.
https://wiki.mozilla.org/ReleaseEngineering/How_To/Trigger_arbitrary_jobs
Assignee | ||
Comment 103•10 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #102)
> FYI, you can trigger tests on those pushes now.
> https://wiki.mozilla.org/ReleaseEngineering/How_To/Trigger_arbitrary_jobs
Thanks!
Assignee | ||
Updated•10 years ago
|
Flags: needinfo?(jmuizelaar)
Updated•10 years ago
|
Attachment #8479091 -
Flags: review?(bas) → review+
Assignee | ||
Comment 104•10 years ago
|
||
Comment 105•10 years ago
|
||
This bug was first reported when Gecko 32 was on m-c, but the next star wasn't until weeks later when we were up to 33, so I'm not sure if v2.0 is affected by this or not.
status-b2g-v1.4:
--- → unaffected
status-b2g-v2.0:
--- → ?
status-b2g-v2.1:
--- → affected
status-firefox32:
--- → wontfix
status-firefox33:
--- → affected
status-firefox34:
--- → affected
status-firefox-esr24:
--- → unaffected
status-firefox-esr31:
--- → unaffected
Comment hidden (Legacy TBPL/Treeherder Robot) |
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 109•10 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #105)
> This bug was first reported when Gecko 32 was on m-c, but the next star
> wasn't until weeks later when we were up to 33, so I'm not sure if v2.0 is
> affected by this or not.
Flags: needinfo?(nical.bugzilla)
Assignee | ||
Comment 110•10 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #105)
> This bug was first reported when Gecko 32 was on m-c, but the next star
> wasn't until weeks later when we were up to 33, so I'm not sure if v2.0 is
> affected by this or not.
Yes, I checked the code and the issue is in 2.0 as well.
Flags: needinfo?(nical.bugzilla)
Assignee | ||
Comment 111•10 years ago
|
||
Comment on attachment 8479091 [details] [diff] [review]
Fix the locking logic in tiling
Approval Request Comment
[Feature/regressing bug #]:
[User impact if declined]: memory leak in b2g and e10s (with tiling enabled)
[Describe test coverage new/current, TBPL]:
[Risks and why]: not too risky, easy to back out (small change)
[String/UUID change made/needed]:
Attachment #8479091 -
Flags: approval-mozilla-beta?
Attachment #8479091 -
Flags: approval-mozilla-b2g32?
Attachment #8479091 -
Flags: approval-mozilla-aurora?
Updated•10 years ago
|
Comment 112•10 years ago
|
||
Comment on attachment 8479091 [details] [diff] [review]
Fix the locking logic in tiling
Too late for 32. Sorry.
Attachment #8479091 -
Flags: approval-mozilla-beta?
Attachment #8479091 -
Flags: approval-mozilla-beta-
Attachment #8479091 -
Flags: approval-mozilla-aurora?
Attachment #8479091 -
Flags: approval-mozilla-aurora+
Comment 113•10 years ago
|
||
Comment 114•10 years ago
|
||
Nervous to take any non-blocking changes on 2.0 especially in areas of gfx or areas that are high risk, given our stability situation. :milan, can you give your feedback here on landing this on 2.0 given b2g32 is not taking it ?
Flags: needinfo?(milan)
Comment 115•10 years ago
|
||
We can get away with not taking this for desktop - it only effects E10S, and you can memory leak more on desktop before you notice it. Same thing on Android, because we're all in the same process, I don't think we get the memory leak at all. On B2G we do, and we're tight for memory, so if we don't take this for 2.0, we're leaving an intermittent memory leak in place. It may even show up in the MTBF numbers. The change is as safe as a change this late can be.
Flags: needinfo?(milan)
Comment 116•10 years ago
|
||
Comment on attachment 8479091 [details] [diff] [review]
Fix the locking logic in tiling
Approving based on :milan's feedback.
Attachment #8479091 -
Flags: approval-mozilla-b2g32? → approval-mozilla-b2g32+
Comment 117•10 years ago
|
||
status-b2g-v2.0M:
--- → affected
Comment 118•10 years ago
|
||
Comment 119•10 years ago
|
||
Unable to verify since it's a back end issue.
QA Whiteboard: [QAnalyst-Triage?][QAnalyst-verify-]
Flags: needinfo?(ktucker)
Updated•10 years ago
|
QA Whiteboard: [QAnalyst-Triage?][QAnalyst-verify-] → [QAnalyst-Triage+][QAnalyst-verify-]
Flags: needinfo?(ktucker)
You need to log in
before you can comment on or make changes to this bug.
Description
•