Closed
Bug 982128
Opened 11 years ago
Closed 11 years ago
crash in mozilla::layers::ClientTiledLayerBuffer::ReadLock()
Categories
(Core :: Graphics: Layers, defect)
Tracking
()
Tracking | Status | |
---|---|---|
b2g-v1.4 | --- | fixed |
People
(Reporter: RobertC, Assigned: bas.schouten)
References
Details
(4 keywords, Whiteboard: [fromAutomation][b2g-crash])
Crash Data
Attachments
(1 file)
6.12 KB,
patch
|
cwiiis
:
review+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is
report bp-6b3bf01b-2895-4c46-917a-c0e172140311.
=============================================================
This crash occurs intermittently on local device, but seems to only be triggered by the automation tests.
It is caused when setting a new alarm in the clock app and the test tries to tap on the repeat element.
Build info:
Gaia a351fe62c11737c722ad33aaff438f6ccd00bd4a
Gecko https://hg.mozilla.org/mozilla-central/rev/41d962d23e81
BuildID 20140311040203
Version 30.0a1
ro.build.version.incremental=eng.tclxa.20131223.163538
ro.build.date=Mon Dec 23 16:36:04 CST 2013
The test that causes the failure is found at:
gaiatest/tests/functional/clock/test_clock_set_alarm.py
Reporter | ||
Comment 1•11 years ago
|
||
The test ran on hamachi.
Comment 2•11 years ago
|
||
I've debugged this a bit more and it seems related to automation using the select menu so I've changed the component to Gaia. I can also replicate this crash in other tests that use the select menu picker.
Basically all this test does is:
1. Open Clock app, creates a new alarm
2. Iterates through the repeat option menu turning on and off then click 'OK' to confirm the selection.
It is much more likely to crash when the automation needs to scroll through options outisde the viewport, then select the item. The automation does it much faster than you can manually so it seems to be a race condition perhaps.
It is replicable about 9/10 times.
Here's another report:
https://crash-stats.mozilla.com/report/index/b809d171-fd36-4415-839a-9bc962140311
Comment 3•11 years ago
|
||
Easy to repro in automation, harder (but still possible) to hit on device.
blocking-b2g: --- → 1.4?
Component: Gaia::Clock → Graphics: Layers
Keywords: regression
Product: Firefox OS → Core
Whiteboard: [fromAutomation] → [fromAutomation][b2g-crash]
Version: unspecified → Trunk
Comment 4•11 years ago
|
||
Discussed with vlad & bas in IRC - this is a regression from enabling of tiling.
Blocks: b2g-tiling
Updated•11 years ago
|
Assignee: nobody → nical.bugzilla
Reporter | ||
Comment 5•11 years ago
|
||
I got the same crash when navigating in the browser. It's reproduced intermittently with manual testing, but 100% from automation.
STR:
1. go to http://mozqa.com/data/firefox/layout/mozilla.html
2. tap on the "more" link from the community section
Expected result:
Page should load all elements
Actual result:
Browser crashes
Here is another report:
https://crash-stats.mozilla.com/report/index/5f93b84a-2781-44d2-b454-fa4782140312
Tested on hamachi with build:
Gaia 3005269d4dcabcc7d27eaf72bda44a969873af8c
Gecko https://hg.mozilla.org/mozilla-central/rev/23005b395ae8
BuildID 20140312040203
Version 30.0a1
ro.build.version.incremental=eng.tclxa.20131223.163538
ro.build.date=Mon Dec 23 16:36:04 CST 2013
Comment 6•11 years ago
|
||
We crash when trying to increment the lock count on a tile before sending it. This tells us that most likely we either don't have a lock or the lock's share memory wasn't allocated properly. Looking at the code, the second possibility is the most likely because we don't check the result of the shmem allocation in gfxShmSharedLemoryLock's constructor and (which we definitely should), and I couldn't find a code path where we set the tile's buffer without setting the lock.
This patch adds checks and assertions that will make it much easier to validate the above theory.
It doesn't fix the problem, though.
If we failed to allocate the shmem it's most likely because we hit the limit of open file descriptors (yeah, 1 tile = lock = 1FD...). In that case the best thing we can do is to stop using 1 shmem per tile, and allocate a shmem per TiledContentClient or per ClientTiledLayerBuffer. It's probably not a patch I can write by tomorrow and I'll be absent tomorrow the entire day.
Let's make sure the problem actually is the FD limit and decide where to go from there.
Attachment #8389856 -
Flags: review?(chrislord.net)
Comment 7•11 years ago
|
||
Comment on attachment 8389856 [details] [diff] [review]
Add assertions and checks that the lock exists and was properly allocated
Review of attachment 8389856 [details] [diff] [review]:
-----------------------------------------------------------------
::: gfx/layers/client/TiledContentClient.cpp
@@ +523,5 @@
> } else {
> mBackLock = new gfxShmSharedReadLock(mManager->AsShadowForwarder());
> }
> +
> + MOZ_ASSERT(mBackLock->IsValid());
I wonder if we could just return a placeholder tile when this fails, rather than asserting? I imagine this will be gone as soon as Bas finishes up the multiple-locks-in-one-shmem-segment bug though, so it's no big deal.
Attachment #8389856 -
Flags: review?(chrislord.net) → review+
Comment 8•11 years ago
|
||
(In reply to Chris Lord [:cwiiis] from comment #7)
> I wonder if we could just return a placeholder tile when this fails, rather
> than asserting? I imagine this will be gone as soon as Bas finishes up the
> multiple-locks-in-one-shmem-segment bug though, so it's no big deal.
I thought about doing this but we'd replace a crash by a correctness issue which is harder to find and debug, so I'd rather resolve to this at the last moment if all else fails.
Comment 9•11 years ago
|
||
Whiteboard: [fromAutomation][b2g-crash] → [fromAutomation][b2g-crash][leave-open]
Comment 10•11 years ago
|
||
We hit this crash consistently in Browser (repro ~75%) and intermittently in other apps (Gallery)
STR (same as comment 5)
1. Open Browser
2. Navigate to any URL
3. Select any links
==> crash occurs
It fails our manual smoke test browser cases.
BuildID: 20140312040203
Gaia: 3005269d4dcabcc7d27eaf72bda44a969873af8c
Gecko: 23005b395ae8
Version: 30.0a1
Keywords: smoketest
Comment 11•11 years ago
|
||
Yeah, no placeholder tiles please. This will need a complex fix. Even per thebes buffer might not be enough.
Updated•11 years ago
|
blocking-b2g: 1.4? → 1.4+
Updated•11 years ago
|
Keywords: reproducible
Comment 12•11 years ago
|
||
I won't be available tomorrow, so I'll let Bas track this (or reassign). I think that this bug is the same as bug 981315, although i am not 100% sure.
Assignee: nical.bugzilla → bas
Comment 14•11 years ago
|
||
Able to reproduce this somewhat consistently within the Clock App while using the Timer. Please see Bug 982789 for repro steps that were used to experience the crash four times this morning.
Assignee | ||
Comment 17•11 years ago
|
||
My work in bug 981315 should make this -a lot- less likely regardless of what else is going on if this is related to the allocation failing.
Assignee | ||
Comment 18•11 years ago
|
||
Assignee | ||
Comment 19•11 years ago
|
||
(In reply to Bas Schouten (:bas.schouten) from comment #18)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/193df828b5a2
This landed for bug 981315. When it's on central, let's see if it fixes this too.
Comment 22•11 years ago
|
||
This crash no longer reproduces during manual testing using the latest master
BuildID: 20140313131233
Gaia: ea9e23abea5933656555d849b922c8da7530c90b
Gecko: fe40387eba1a
Version: 30.0a1
v1.2-device.cfg
Comment 23•11 years ago
|
||
Sounds like the pieces landed here have fixed the problem, so closing this.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [fromAutomation][b2g-crash][leave-open] → [fromAutomation][b2g-crash]
Updated•11 years ago
|
status-b2g-v1.4:
--- → fixed
Target Milestone: --- → mozilla30
You need to log in
before you can comment on or make changes to this bug.
Description
•