Closed Bug 982128 Opened 11 years ago Closed 11 years ago

crash in mozilla::layers::ClientTiledLayerBuffer::ReadLock()

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla30

Project Flags:

blocking-b2g

1.4+

Tracking Flags:

Tracking

Status

b2g-v1.4

---

fixed

People

(Reporter: RobertC, Assigned: bas.schouten)

References

Details

(4 keywords, Whiteboard: [fromAutomation][b2g-crash])

Crash Data

Attachments

(1 file)

Add assertions and checks that the lock exists and was properly allocated 11 years ago Nicolas Silva [:nical] 6.12 KB, patch	cwiiis : review+	Details \| Diff \| Splinter Review

Robert Chira [:RobertC]

Reporter

Description

•

11 years ago

This bug was filed from the Socorro interface and is report bp-6b3bf01b-2895-4c46-917a-c0e172140311. ============================================================= This crash occurs intermittently on local device, but seems to only be triggered by the automation tests. It is caused when setting a new alarm in the clock app and the test tries to tap on the repeat element. Build info: Gaia a351fe62c11737c722ad33aaff438f6ccd00bd4a Gecko https://hg.mozilla.org/mozilla-central/rev/41d962d23e81 BuildID 20140311040203 Version 30.0a1 ro.build.version.incremental=eng.tclxa.20131223.163538 ro.build.date=Mon Dec 23 16:36:04 CST 2013 The test that causes the failure is found at: gaiatest/tests/functional/clock/test_clock_set_alarm.py

Robert Chira [:RobertC]

Reporter

Comment 1

•

11 years ago

The test ran on hamachi.

Zac C (:zac)

Comment 2

•

11 years ago

I've debugged this a bit more and it seems related to automation using the select menu so I've changed the component to Gaia. I can also replicate this crash in other tests that use the select menu picker. Basically all this test does is: 1. Open Clock app, creates a new alarm 2. Iterates through the repeat option menu turning on and off then click 'OK' to confirm the selection. It is much more likely to crash when the automation needs to scroll through options outisde the viewport, then select the item. The automation does it much faster than you can manually so it seems to be a race condition perhaps. It is replicable about 9/10 times. Here's another report: https://crash-stats.mozilla.com/report/index/b809d171-fd36-4415-839a-9bc962140311

Jason Smith [:jsmith]

Comment 3

•

11 years ago

Easy to repro in automation, harder (but still possible) to hit on device.

blocking-b2g: --- → 1.4?

Component: Gaia::Clock → Graphics: Layers

Keywords: regression

Product: Firefox OS → Core

Whiteboard: [fromAutomation] → [fromAutomation][b2g-crash]

Version: unspecified → Trunk

Jason Smith [:jsmith]

Comment 4

•

11 years ago

Discussed with vlad & bas in IRC - this is a regression from enabling of tiling.

Blocks: b2g-tiling

Nicolas Silva [:nical]

Updated

•

11 years ago

Assignee: nobody → nical.bugzilla

Robert Chira [:RobertC]

Reporter

Comment 5

•

11 years ago

I got the same crash when navigating in the browser. It's reproduced intermittently with manual testing, but 100% from automation. STR: 1. go to http://mozqa.com/data/firefox/layout/mozilla.html 2. tap on the "more" link from the community section Expected result: Page should load all elements Actual result: Browser crashes Here is another report: https://crash-stats.mozilla.com/report/index/5f93b84a-2781-44d2-b454-fa4782140312 Tested on hamachi with build: Gaia 3005269d4dcabcc7d27eaf72bda44a969873af8c Gecko https://hg.mozilla.org/mozilla-central/rev/23005b395ae8 BuildID 20140312040203 Version 30.0a1 ro.build.version.incremental=eng.tclxa.20131223.163538 ro.build.date=Mon Dec 23 16:36:04 CST 2013

Nicolas Silva [:nical]

Comment 6

•

11 years ago

Attached patch Add assertions and checks that the lock exists and was properly allocated — Details — Splinter Review

We crash when trying to increment the lock count on a tile before sending it. This tells us that most likely we either don't have a lock or the lock's share memory wasn't allocated properly. Looking at the code, the second possibility is the most likely because we don't check the result of the shmem allocation in gfxShmSharedLemoryLock's constructor and (which we definitely should), and I couldn't find a code path where we set the tile's buffer without setting the lock. This patch adds checks and assertions that will make it much easier to validate the above theory. It doesn't fix the problem, though. If we failed to allocate the shmem it's most likely because we hit the limit of open file descriptors (yeah, 1 tile = lock = 1FD...). In that case the best thing we can do is to stop using 1 shmem per tile, and allocate a shmem per TiledContentClient or per ClientTiledLayerBuffer. It's probably not a patch I can write by tomorrow and I'll be absent tomorrow the entire day. Let's make sure the problem actually is the FD limit and decide where to go from there.

Attachment #8389856 - Flags: review?(chrislord.net)

Chris Lord [:cwiiis]

Comment 7

•

11 years ago

Comment on attachment 8389856 [details] [diff] [review] Add assertions and checks that the lock exists and was properly allocated Review of attachment 8389856 [details] [diff] [review]: ----------------------------------------------------------------- ::: gfx/layers/client/TiledContentClient.cpp @@ +523,5 @@ > } else { > mBackLock = new gfxShmSharedReadLock(mManager->AsShadowForwarder()); > } > + > + MOZ_ASSERT(mBackLock->IsValid()); I wonder if we could just return a placeholder tile when this fails, rather than asserting? I imagine this will be gone as soon as Bas finishes up the multiple-locks-in-one-shmem-segment bug though, so it's no big deal.

Attachment #8389856 - Flags: review?(chrislord.net) → review+

Nicolas Silva [:nical]

•

11 years ago

Yeah, no placeholder tiles please. This will need a complex fix. Even per thebes buffer might not be enough.

Jason Smith [:jsmith]

Updated

•

11 years ago

blocking-b2g: 1.4? → 1.4+

Jason Smith [:jsmith]

Updated

•

11 years ago

Keywords: reproducible

Nicolas Silva [:nical]

Comment 12

•

11 years ago

I won't be available tomorrow, so I'll let Bas track this (or reassign). I think that this bug is the same as bug 981315, although i am not 100% sure.

Assignee: nical.bugzilla → bas

Montoya Clemmons (MClemmons)

•

11 years ago

(In reply to Bas Schouten (:bas.schouten) from comment #18) > https://hg.mozilla.org/integration/mozilla-inbound/rev/193df828b5a2 This landed for bug 981315. When it's on central, let's see if it fixes this too.

Natalya Kot [:nkot]

Comment 22

•

11 years ago

This crash no longer reproduces during manual testing using the latest master BuildID: 20140313131233 Gaia: ea9e23abea5933656555d849b922c8da7530c90b Gecko: fe40387eba1a Version: 30.0a1 v1.2-device.cfg

Jason Smith [:jsmith]

Comment 23

•

11 years ago

Sounds like the pieces landed here have fixed the problem, so closing this.

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Whiteboard: [fromAutomation][b2g-crash][leave-open] → [fromAutomation][b2g-crash]

Ryan VanderMeulen [:RyanVM]

Updated

•

11 years ago

status-b2g-v1.4: --- → fixed

Target Milestone: --- → mozilla30

You need to log in before you can comment on or make changes to this bug.