Closed Bug 1042771 Opened 8 years ago Closed 8 years ago

Crash in mozilla:layers while stability testing

Categories

(Core :: Graphics: Layers, defect)

ARM
Gonk (Firefox OS)
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla34
blocking-b2g 2.0+
Tracking Status
firefox32 --- wontfix
firefox33 --- wontfix
firefox34 --- fixed
b2g-v2.0 --- fixed
b2g-v2.1 --- fixed

People

(Reporter: ggrisco, Assigned: nical)

References

Details

(Keywords: crash, Whiteboard: [caf-crash 309][caf priority: p1][CR 698516][b2g-crash])

Crash Data

Attachments

(3 files, 1 obsolete file)

Attached file decoded minidump
[Blocking Requested - why for this release]:
This happened during stability test.  I don't have any STR right now, will share if it becomes available.  But this crash happened 5 times on latest build, so wanted to file this bugzilla right away.
Bug 1041744 might be related to this bug.
Component: General → Gaia::System::Window Mgmt
Bug 1037173 seems same bug.
Component: Gaia::System::Window Mgmt → Graphics: Layers
Product: Firefox OS → Core
From the crash, it seems to be related to gfx layers.
Keywords: crash
Whiteboard: [b2g-crash]
TextureClientPool::ReturnDeferredClients() code has a possibility that call pop() even when mTextureClientsDeferred is empty.
(In reply to Sotaro Ikeda [:sotaro PTO July/25 - Aug/3] from comment #6)
> TextureClientPool::ReturnDeferredClients() code has a possibility that call
> pop() even when mTextureClientsDeferred is empty.

I confirmed that to call pop() to empty std::stack<RefPtr<TextureClient> > causes the crash.
Assignee: nobody → sotaro.ikeda.g
Attachment #8461242 - Flags: review?(nical.bugzilla)
Whiteboard: [b2g-crash] → [CR 698516][b2g-crash]
Whiteboard: [CR 698516][b2g-crash] → [caf priority: p1][CR 698516][b2g-crash]
Observed on: 

Device: msm8226
Gonk Version: AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.040
Moz BuildID: 20140716000201
B2G Version: 2.0
Gecko Version: 32.0a2
Gaia:  http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=5f8b1b8a2da9e3b531eee817a669f57fa4d9b9c6
Gecko: http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=e00f7e464333689fcf54edb4945ece94f97f930b
Attachment #8461242 - Flags: review?(nical.bugzilla) → review+
blocking-b2g: 2.0? → 2.0+
Whiteboard: [caf priority: p1][CR 698516][b2g-crash] → [caf-crash 309][caf priority: p1][CR 698516][b2g-crash]
(In reply to Wes Kocher (:KWierso) from comment #12)
> I had to back this out in
> https://hg.mozilla.org/integration/mozilla-inbound/rev/978b8d1fd58d for
> making b2g reftest-18 frequently fail with
> https://tbpl.mozilla.org/php/getParsedLog.php?id=44519068&tree=Mozilla-
> Inbound

Hmm, it is wired that the patch affected to the test failure.
Flags: needinfo?(sotaro.ikeda.g)
nical, can you take this bug? I am in PTO for a week. Thanks!
Flags: needinfo?(nical.bugzilla)
Assignee: sotaro.ikeda.g → nical.bugzilla
Flags: needinfo?(nical.bugzilla)
I found one issues with the: it inlines some of ReturnTextureClient in ReturnDeferredClients so that ShrinkToMaximumSize doesn't happen inside the loop, but forgets to update the number of outstanding textures (mOutstandingClients), which causes the number to become big and causes ShrinkToMaximumSize to shrink too much.

My bad for not catching that in the review.

try push with the fix: https://tbpl.mozilla.org/?tree=Try&rev=2b4c12408d37

It is not obvious to me how it would explain the failing test, though, Even if shrinking the pool goes out of sync, I'd have expected the pool to just discard too much and reallocate often, rather than getting this particular reftest failure.
(In reply to Nicolas Silva [:nical] from comment #15)
> 
> try push with the fix: https://tbpl.mozilla.org/?tree=Try&rev=2b4c12408d37
> 

Seems to fix the issue.
We have built the patch into AU_LINUX_GECKO_B2G_KK_3.6.01.04.00.000.043 and later.
Attachment #8463301 - Flags: review?(jmuizelaar) → review+
We haven't seen this reproduce on AU43, so probably ok to land this.
Flags: needinfo?(nical.bugzilla)
:nical can you please help check this in given comment #19 ?
Comment on attachment 8463301 [details] [diff] [review]
patch for b2g-v2.0 - Same patch with the outstanding texture count fixed


[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
User impact if declined: 2.0 blocker
Testing completed: Try servers and a separate branch (see comments from Greg Grisco)
Risk to taking this patch (and alternatives if risky): low
String or UUID changes made by this patch:
Attachment #8463301 - Flags: approval-mozilla-b2g30?
Flags: needinfo?(nical.bugzilla)
Landed on inbound, sorry I confused this another blocker that had already landed and that we had failed to uplift. The uplift request still holds I believe.

https://hg.mozilla.org/integration/mozilla-inbound/rev/1c94ca18780c
Something landed between the try push and now, and there is new build error:

> TextureClientPool.cpp:150:56: error: 'sShrinkTimeout' was not declared in this scope

Backed out://hg.mozilla.org/integration/mozilla-inbound/rev/2e02ebf33a3a

I'll fix this and reland shortly.
sShrinkTimeout that got renamed into mShrinkTimeoutMsec and moved from global to member of the class. Built the fix locally and relanded:

https://hg.mozilla.org/integration/mozilla-inbound/rev/db513704b2c3
Comment on attachment 8463301 [details] [diff] [review]
patch for b2g-v2.0 - Same patch with the outstanding texture count fixed

Given this is a 2.0+ it will get auto uplifted once landing on central and it does not need an approval request
Attachment #8463301 - Flags: approval-mozilla-b2g30?
https://hg.mozilla.org/mozilla-central/rev/db513704b2c3
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla34
Attachment #8461242 - Attachment is obsolete: true
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #29)
> Backed out for bustage.
> https://hg.mozilla.org/releases/mozilla-b2g32_v2_0/rev/9de46ab29836
> 
> https://tbpl.mozilla.org/php/getParsedLog.php?id=44996846&tree=Mozilla-B2g32-
> v2.0



Error was caused by 'mShrinkTimeoutMsec'. The committed patch is affected by Bug 1043603.

attachment 8463301 [details] [diff] [review] does not have 'mShrinkTimeoutMsec'. attachment 8463301 [details] [diff] [review] should work for b2g-v2.0.
Attachment #8463301 - Attachment description: Same patch with the outstanding texture count fixed → patch for b2g-v2.0 - Same patch with the outstanding texture count fixed
Not enough information with current STR to write test case to address bug.
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Flags: in-moztrap?(rmead)
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
Flags: in-moztrap?(rmead)
Flags: in-moztrap-
Comment on attachment 8463301 [details] [diff] [review]
patch for b2g-v2.0 - Same patch with the outstanding texture count fixed

Review of attachment 8463301 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/client/TextureClientPool.cpp
@@ +141,5 @@
> +  }
> +  ShrinkToMinimumSize();
> +  // Kick off the pool shrinking timer if there are still more unused texture
> +  // clients than our desired minimum cache size.
> +  if (mTextureClients.size() > sMinCacheSize) {

ShrinkToMinimumSize has a loop that terminates when mTextureClients.size() <= sMinCacheSize; when do we expect if (mTextureClients.size() > sMinCacheSize) to fire right after that call?
I imagine we either want to shrink it right away (calling ShrinkToMinimumSize()) or leave it for later (the if that follows and the callback inside of it), but it doesn't seem to make sense to have both?
:nical, can you take a look at comment 33 question?
Flags: needinfo?(nical.bugzilla)
Come to think of it, you probably wanted ShrinkToMaximumSize() followed by the callback for shrinking to minimum size instead?
(In reply to Milan Sreckovic [:milan] from comment #35)
> Come to think of it, you probably wanted ShrinkToMaximumSize() followed by
> the callback for shrinking to minimum size instead?

I don't remember exactly, but I think you are right, we want ShrinkToMaximumSize here instead of ShrinkToMinimum size. Sotaro's initial patch did use ShrinkToMinimumSize, though (my addition to the patch only fixed the management of the mOutstandingClients counter). So Sotaro will probably be able to give you a more definite answer about the original intent.
But yeah, from the look of it I think that you are right, we probably confused the two shrink methods here.
Flags: needinfo?(nical.bugzilla) → needinfo?(sotaro.ikeda.g)
Thanks.  I have to patch on bug 1048984 for this and other issues with outstanding client tracking...
Flags: needinfo?(sotaro.ikeda.g)
Unable to verify, there are no steps to reproduce.
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-verify-][QAnalyst-Triage?]
Flags: needinfo?(ktucker)
QA Whiteboard: [QAnalyst-verify-][QAnalyst-Triage?] → [QAnalyst-verify-][QAnalyst-Triage+]
Flags: needinfo?(ktucker)
You need to log in before you can comment on or make changes to this bug.