Closed Bug 822398 Opened 12 years ago Closed 12 years ago

Crash when trying to add contact photo during MW2

Tracking

(blocking-basecamp:+, firefox19 fixed, firefox20 fixed, b2g18 fixed)

Status:

RESOLVED FIXED

Milestone:

B2G C3 (12dec-1jan)

Project Flags:

blocking-basecamp

Tracking Flags:

Tracking

Status

firefox19

---

fixed

firefox20

---

fixed

b2g18

---

fixed

People

(Reporter: cjones, Assigned: bechen)

References

Details

(Keywords: regression)

Attachments

(3 files, 1 obsolete file)

gdb stack 12 years ago Benjamin Chen [:bechen] 6.54 KB, text/plain		Details
double free 12 years ago Benjamin Chen [:bechen] 12.47 KB, text/plain		Details
Fix 12 years ago Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 1.13 KB, patch		Details \| Diff \| Splinter Review
Fix v2 12 years ago Chris Jones [:cjones] inactive; ni?/f?/r? if you need me 2.03 KB, patch	kanru : review+	Details \| Diff \| Splinter Review

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Description

•

12 years ago

STR (1) Follow the steps in https://wiki.mozilla.org/B2G/Memory_acceptance_criteria#MW2:_Active_call_stays_active During step 19, the b2g process crashes. Can we please run these tests as part of our smoketests?

Jason Smith [:jsmith]

Comment 1

•

12 years ago

Over to tchung to determine if we should add this as a smoketest.

Flags: needinfo?(tchung)

JP Rosevear [:jpr]

Updated

•

12 years ago

blocking-basecamp: ? → +

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 2

•

12 years ago

Andrea: Can you have a look at this? I think this is crashing in webactivities code.

Assignee: nobody → amarchesini

Andrew Overholt [:overholt]

Updated

•

12 years ago

Target Milestone: --- → B2G C3 (12dec-1jan)

Andrea Marchesini [:baku]

Comment 3

•

12 years ago

I cannot reproduce this bug. Can you tell me which release you are using?

JP Rosevear [:jpr]

Updated

•

12 years ago

Flags: needinfo?(jones.chris.g)

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 4

•

12 years ago

https://wiki.mozilla.org/index.php?title=B2G/Memory_acceptance_criteria&oldid=494353 http://git.mozilla.org/?p=releases/gecko.git;a=commit;h=7b93aab393cd0ea45f2b1894c969f6c5b55ef083 http://git.mozilla.org/?p=releases/gaia.git;a=commit;h=df14f48088e107a180811ff73357e76e95acf776

Flags: needinfo?(jones.chris.g)

Andrew Overholt [:overholt]

Updated

•

12 years ago

Assignee: amarchesini → nobody

Kai-Chih Hu [:khu]

Comment 5

•

12 years ago

Benjamin, could you help this case? Thanks!

Assignee: nobody → bechen

Benjamin Chen [:bechen]

Assignee

Comment 6

•

12 years ago

Attached file gdb stack — Details

Looks like a duplication to Bug 823951

Benjamin Chen [:bechen]

Assignee

Comment 7

•

12 years ago

I have got this twice by gdb and both the strings in frame6 are "screen". #6 0x411cef8a in nsStringBuffer::Release (this=0x49acfb60) at /home/benjamin/Documents/git/unagi/B2G/gecko/xpcom/string/src/nsSubstring.cpp:161 0000000: 00 00 00 00 52 00 61 00 64 00 69 00 6f 00 20 00 ....R.a.d.i.o. . 0000010: 46 00 4d 00 20 00 47 00 61 00 69 00 61 00 00 00 F.M. .G.a.i.a... 0000020: 04 00 ff ff 00 00 00 00 0a 00 00 00 69 6d 61 67 ............imag 0000030: 65 2f 67 69 66 00 00 20 63 6c 6f 73 65 00 00 00 e/gif.. close... 0000040: 00 00 9d 47 00 00 00 00 0b 00 00 00 6b 65 65 70 ...G........keep 0000050: 2d 61 6c 69 76 65 00 00 e0 98 30 40 d0 81 b7 43 -alive....0@...C 0000060: c7 75 23 40 00 00 00 00 0e 00 00 00 73 00 63 00 .u#@........s.c. 0000070: 72 00 65 00 65 00 6e 00 0a r.e.e.n

Ting-Yuan Huang

Comment 8

•

12 years ago

In frame 1, this assertion failed: RELEASE_ASSERT((run->regs_mask[elm] & (1U << bit)) == 0); It looks like a double free. In frame 6: int32_t count = PR_ATOMIC_DECREMENT(&mRefCount); NS_LOG_RELEASE(this, count, "nsStringBuffer"); if (count == 0) { STRING_STAT_INCREMENT(Free); free(this); // we were allocated with |malloc| } It is possible that more than one threads free the same thing simultaneously. The "if (count ==0)" check should be combined with the decrement.

Ting-Yuan Huang

Comment 9

•

12 years ago

Sorry! The protection should be fine! So it may be that it is freed in other sites.

Benjamin Chen [:bechen]

Assignee

Comment 10

•

12 years ago

Attached file double free — Details

It's double free.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 11

•

12 years ago

Nice! :) Are you able to reproduce this in a DEBUG build? It looks like bug 823474 just maybe could cause this, but I'm not 100% sure. If it's bug 823474, then assertions will fail in DEBUG builds.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 12

•

12 years ago

What the backtraces seem to show is that - we're actually destroying a string here [1]; maybe that string is |aKey|? o_O - we hand out that string reference to an hal:: notification [2], which is probably calling into JS - then we later die trying to free the same string JSExternalString::finalize This looks like it might maybe be violating some pldhash invariant. Also looks like it might be the Great White Whale of random crashes :D. [1] http://mxr.mozilla.org/mozilla-central/source/hal/HalWakeLock.cpp#81 [2]http://mxr.mozilla.org/mozilla-central/source/hal/HalWakeLock.cpp#85

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 13

•

12 years ago

(In reply to Chris Jones [:cjones] [:warhammer] from comment #12) > - we hand out that string reference to an hal:: notification [2], which is > probably calling into JS Confirmed that this will result in JS callbacks being invoked.

Kan-Ru Chen [:kanru] (UTC+9)

Comment 14

•

12 years ago

Hmm, I didn't notice that IPDL type doesn't copy values. We should copy the key. Nice catch Benjamin.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 15

•

12 years ago

Looking back over the memory-test history, things started getting really crash between 12/6 and 12/12. With I had more resolution there, but bug 811740 is in that history. That seems like the kind of change that might exacerbate a pre-existing problem.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 16

•

12 years ago

Bug 814822 merged on 11/27, and bug 817946 was filed on 12/3. Doesn't say a whole lot, but things are in range.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 17

•

12 years ago

(In reply to Kan-Ru Chen [:kanru] from comment #14) > Hmm, I didn't notice that IPDL type doesn't copy values. We should copy the > key. Nice catch Benjamin. The IPC layer copies the incoming key. I think the problem is sLockTable->Remove(aKey); while we're enumerating |aKey|. I think we need to return PL_DHASH_REMOVE and not remove the key there, but I need to double check the dhash API. (It's not very friendly :/.)

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 18

•

12 years ago

Attached patch Fix (obsolete) — Details — Splinter Review

Kan-Ru Chen [:kanru] (UTC+9)

Comment 19

•

12 years ago

Use Enumerate instead EnumerateRead?

Kan-Ru Chen [:kanru] (UTC+9)

Comment 20

•

12 years ago

Comment on attachment 695954 [details] [diff] [review] Fix Review of attachment 695954 [details] [diff] [review]: ----------------------------------------------------------------- ::: hal/HalWakeLock.cpp @@ +78,5 @@ > LockCount totalCount; > WakeLockInformation info; > aTable->EnumerateRead(CountWakeLocks, &totalCount); > if (!totalCount.numLocks) { > + op = PL_DHASH_REMOVE; op |= PL_DHASH_REMOVE; We need to search the whole table. I'm not sure if EnumerateRead() is allowed to return PL_DHASH_REMOVE. If not we should change the enumerator to Enumerate().

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 21

•

12 years ago

Yep you're right. Now trying to sort through the also not-very-friendly nsBaseHashtable API :/.

Kan-Ru Chen [:kanru] (UTC+9)

Comment 22

•

12 years ago

https://mxr.mozilla.org/mozilla-central/source/xpcom/glue/nsBaseHashtable.h#402

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 23

•

12 years ago

Attached patch Fix v2 — Details — Splinter Review

I think this is right. The PLDHashOperator isn't really a bitfield when enumerating.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 24

•

12 years ago

Comment on attachment 695957 [details] [diff] [review] Fix v2 I was able to complete the memory acceptance tests without any crashes! Looks promising. That's only happened once in the last 14 days. Kan-Ru, do you have tests that would check that we're removing the wake-lock tables properly?

Attachment #695957 - Flags: review?(kchen)

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Updated

•

12 years ago

Attachment #695954 - Attachment is obsolete: true

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 25

•

12 years ago

(In reply to Chris Jones [:cjones] [:warhammer] from comment #23) > Created attachment 695957 [details] [diff] [review] > Fix v2 > > I think this is right. The PLDHashOperator isn't really a bitfield when > enumerating. Specifically, http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/pldhash.cpp#717

Kan-Ru Chen [:kanru] (UTC+9)

Comment 26

•

12 years ago

Comment on attachment 695957 [details] [diff] [review] Fix v2 Review of attachment 695957 [details] [diff] [review]: ----------------------------------------------------------------- The existing test doesn't cover multiple wake lock with different names.

Attachment #695957 - Flags: review?(kchen) → review+

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 27

•

12 years ago

I got a second consecutive non-crash memory acceptance run, so this is promising indeed.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 28

•

12 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/de53cfa4b491 https://hg.mozilla.org/releases/mozilla-aurora/rev/e28578533fae https://hg.mozilla.org/releases/mozilla-b2g18/rev/6d8cdf8cb5c3

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 33

•

12 years ago

(In reply to Chris Jones [:cjones] [:warhammer] from comment #27) > I got a second consecutive non-crash memory acceptance run, so this is > promising indeed. Three in a row. Unheard of!

Benjamin Chen [:bechen]

Assignee

Comment 34

•

12 years ago

(In reply to Chris Jones [:cjones] [:warhammer] from comment #12) > What the backtraces seem to show is that > - we're actually destroying a string here [1]; maybe that string is |aKey|? > o_O > - we hand out that string reference to an hal:: notification [2], which is > probably calling into JS > - then we later die trying to free the same string > JSExternalString::finalize > > This looks like it might maybe be violating some pldhash invariant. > > Also looks like it might be the Great White Whale of random crashes :D. > > [1] http://mxr.mozilla.org/mozilla-central/source/hal/HalWakeLock.cpp#81 > [2]http://mxr.mozilla.org/mozilla-central/source/hal/HalWakeLock.cpp#85 I have a question about the nsStringBuffer's ref count. From my attachment 695937 [details], the first and the second free come from nsStringBuffer::Release. There is a mRefCount for the nsStringBuffer, only free when mRefCount == 0 That means someone keep the reference and add the mRefCount after the first free?

Robert Kaiser

Comment 35

•

12 years ago

As bugs with verified seen-in-the-wild signatures were blindly duped here, can people who could reproduce the exact crash scenario fixed here please give us some crash IDs? Otherwise I think I'll probably need to reopen the duped bugs unless we can verify otherwise that those top crash signatures from the other bugs were indeed this issue.

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Updated

•

12 years ago

Depends on: 825038

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 37

•

12 years ago

(In reply to Benjamin Chen from comment #34) > I have a question about the nsStringBuffer's ref count. > From my attachment 695937 [details], the first and the second free come from > nsStringBuffer::Release. > There is a mRefCount for the nsStringBuffer, only free when mRefCount == 0 > > That means someone keep the reference and add the mRefCount after the first > free? The problem is that the enumerator is passed a const reference to the string element in the hash set, and then frees the item in the hash set that the reference points to. We would have caught this with nasty assertion failures if we had a test for this running in debug builds (v. bug 824526).

Ryan VanderMeulen [:RyanVM]

Comment 38

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/de53cfa4b491

Ryan VanderMeulen [:RyanVM]

Updated

•

12 years ago

status-b2g18: --- → fixed

status-firefox19: --- → fixed

status-firefox20: --- → fixed

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Reporter

Comment 39

•

12 years ago

This is a huge improvement in stability :). bechen++ for the analysis!

Jason Smith [:jsmith]

Updated

•

12 years ago

Flags: needinfo?(tchung)

You need to log in before you can comment on or make changes to this bug.