Closed
Bug 1318247
Opened 8 years ago
Closed 7 years ago
Intermittent remoteautomation.py | application crashed [@ ZipCollection::GetZip]
Categories
(Firefox for Android Graveyard :: Testing, defect, P3)
Firefox for Android Graveyard
Testing
Tracking
(firefox-esr52 unaffected, firefox54 wontfix, firefox55 wontfix, firefox56 fixed)
RESOLVED
FIXED
Firefox 56
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox54 | --- | wontfix |
firefox55 | --- | wontfix |
firefox56 | --- | fixed |
People
(Reporter: intermittent-bug-filer, Assigned: esawin)
References
Details
(Keywords: crash, intermittent-failure)
Attachments
(1 file, 1 obsolete file)
3.33 KB,
patch
|
esawin
:
review+
|
Details | Diff | Splinter Review |
Filed by: philringnalda [at] gmail.com https://treeherder.mozilla.org/logviewer.html#?job_id=6770084&repo=autoland https://queue.taskcluster.net/v1/task/ClySYIuvSwuOR_Nb7ugo6w/runs/0/artifacts/public/logs/live_backing.log https://hg.mozilla.org/mozilla-central/raw-file/tip/layout/tools/reftest/reftest-analyzer.xhtml#logurl=https://queue.taskcluster.net/v1/task/ClySYIuvSwuOR_Nb7ugo6w/runs/0/artifacts/public/logs/live_backing.log&only_show_unexpected=1
Comment 1•8 years ago
|
||
[task 2016-11-17T06:16:23.634614Z] 06:16:23 INFO - Crash reason: SIGSEGV [task 2016-11-17T06:16:23.634794Z] 06:16:23 INFO - Crash address: 0x0 [task 2016-11-17T06:16:23.634980Z] 06:16:23 INFO - Process uptime: not available [task 2016-11-17T06:16:23.635155Z] 06:16:23 INFO - Thread 10 (crashed) [task 2016-11-17T06:16:23.635369Z] 06:16:23 INFO - 0 libmozglue.so!ZipCollection::GetZip [RefCounted.h:8cfe56f5b5d4 : 101 + 0x0] [task 2016-11-17T06:16:23.635564Z] 06:16:23 INFO - r0 = 0x0000008d r1 = 0x21d0b45c r2 = 0x528fe381 r3 = 0x5291610c [task 2016-11-17T06:16:23.635757Z] 06:16:23 INFO - r4 = 0x52d0e080 r5 = 0x52758c2c r6 = 0x00000000 r7 = 0x52758c04 [task 2016-11-17T06:16:23.635955Z] 06:16:23 INFO - r8 = 0x52d080f4 r9 = 0x2a357798 r10 = 0x52d080f8 r12 = 0x00000003 [task 2016-11-17T06:16:23.636150Z] 06:16:23 INFO - fp = 0x52758c64 sp = 0x52758c00 lr = 0x52880beb pc = 0x52881504 [task 2016-11-17T06:16:23.636336Z] 06:16:23 INFO - Found by: given as instruction pointer in context [task 2016-11-17T06:16:23.636563Z] 06:16:23 INFO - 1 libmozglue.so!Java_org_mozilla_gecko_mozglue_NativeZip_getZip [nsGeckoUtils.cpp:8cfe56f5b5d4 : 65 + 0x5] [task 2016-11-17T06:16:23.636753Z] 06:16:23 INFO - r4 = 0x4c847180 r5 = 0x2a1e4778 r6 = 0x2a357798 r7 = 0x52758c2c [task 2016-11-17T06:16:23.636946Z] 06:16:23 INFO - r8 = 0x4af00005 r9 = 0x00000000 r10 = 0x2a1e4b58 fp = 0x52758c64 [task 2016-11-17T06:16:23.637123Z] 06:16:23 INFO - sp = 0x52758c28 pc = 0x528864a1 [task 2016-11-17T06:16:23.637314Z] 06:16:23 INFO - Found by: call frame info [task 2016-11-17T06:16:23.637488Z] 06:16:23 INFO - 2 libdvm.so + 0x1dc4e [task 2016-11-17T06:16:23.637694Z] 06:16:23 INFO - r4 = 0x4c847180 r5 = 0x2a1e4b48 r6 = 0x00000003 r7 = 0x52658dcc [task 2016-11-17T06:16:23.637915Z] 06:16:23 INFO - r8 = 0x52758c50 r9 = 0x52658dc4 r10 = 0x2a1e4b58 fp = 0x52758c64 [task 2016-11-17T06:16:23.638087Z] 06:16:23 INFO - sp = 0x52758c50 pc = 0x4085bc50 [task 2016-11-17T06:16:23.638262Z] 06:16:23 INFO - Found by: call frame info [task 2016-11-17T06:16:23.638459Z] 06:16:23 INFO - 3 data@app@org.mozilla.fennec-1.apk@classes.dex + 0x4b9a84 [task 2016-11-17T06:16:23.638636Z] 06:16:23 INFO - sp = 0x52758c5c pc = 0x523a1a86 [task 2016-11-17T06:16:23.638817Z] 06:16:23 INFO - Found by: stack scanning [task 2016-11-17T06:16:23.639000Z] 06:16:23 INFO - 4 dalvik-heap (deleted) + 0x5ef996 [task 2016-11-17T06:16:23.639182Z] 06:16:23 INFO - sp = 0x52758c60 pc = 0x41aa4998 [task 2016-11-17T06:16:23.639357Z] 06:16:23 INFO - Found by: stack scanning [task 2016-11-17T06:16:23.639534Z] 06:16:23 INFO - 5 libdvm.so + 0x4dcad [task 2016-11-17T06:16:23.639717Z] 06:16:23 INFO - sp = 0x52758c68 pc = 0x4088bcaf [task 2016-11-17T06:16:23.639891Z] 06:16:23 INFO - Found by: stack scanning [task 2016-11-17T06:16:23.640086Z] 06:16:23 INFO - 6 data@app@org.mozilla.fennec-1.apk@classes.dex + 0x4b9a82 [task 2016-11-17T06:16:23.640264Z] 06:16:23 INFO - sp = 0x52758c70 pc = 0x523a1a84 [task 2016-11-17T06:16:23.640438Z] 06:16:23 INFO - Found by: stack scanning [task 2016-11-17T06:16:23.640679Z] 06:16:23 INFO - 7 libmozglue.so!Java_org_mozilla_gecko_mozglue_NativeZip_getZipFromByteBuffer [nsGeckoUtils.cpp:8cfe56f5b5d4 : 87 + 0xf] [task 2016-11-17T06:16:23.640843Z] 06:16:23 INFO - sp = 0x52758c74 pc = 0x52886461 [task 2016-11-17T06:16:23.641016Z] 06:16:23 INFO - Found by: stack scanning I thought you sorted these out in bug 1302516?
Flags: needinfo?(esawin)
Priority: -- → P3
Assignee | ||
Updated•8 years ago
|
Assignee: nobody → esawin
Flags: needinfo?(esawin)
Assignee | ||
Comment 2•8 years ago
|
||
The crash is caused by the assertion in [1] because mRefCnt < 0 holds during a call to AddRef. This can only be the case when we're attempting to call AddRef on a reference which is in the process of being destructed [2], but hasn't been removed from the zip collection yet. The following timeline would trigger this event: 1. Thread 1 acquires the lock in GetZip [3]. 2. Thread 2 calls Release on reference A [4]. 3. Thread 2 invokes destructor of A [5]. 4. Thread 2 is waiting for the lock in Forget [6]. 5. Thread 1 finds zip A and calls AddRef on it [7]. [1] https://dxr.mozilla.org/mozilla-central/source/mfbt/RefCounted.h#101 [2] https://dxr.mozilla.org/mozilla-central/source/mfbt/RefCounted.h#135 [3] https://dxr.mozilla.org/mozilla-central/source/mozglue/linker/Zip.cpp#196 [4] https://dxr.mozilla.org/mozilla-central/source/mozglue/android/nsGeckoUtils.cpp#95 [5] https://dxr.mozilla.org/mozilla-central/source/mozglue/linker/Zip.cpp#77 [6] https://dxr.mozilla.org/mozilla-central/source/mozglue/linker/Zip.cpp#220 [7] https://dxr.mozilla.org/mozilla-central/source/mozglue/linker/Zip.cpp#201
Assignee | ||
Comment 3•8 years ago
|
||
To avoid the race, we make ZipCollection increase the ref count so that we can control destruction and removal of a zip in a locked environment. Alternatively, we can remove ZipCollection at the cost of increased address space usage (otherwise, there is no measurable performance loss).
Attachment #8814157 -
Flags: review?(mh+mozilla)
Comment 4•8 years ago
|
||
Comment on attachment 8814157 [details] [diff] [review] 0001-Bug-1318247-1.0-Add-custom-refcounting-to-zips-in-Zi.patch Review of attachment 8814157 [details] [diff] [review]: ----------------------------------------------------------------- There is similar code for LibHandle refcounting, but slightly different. It would be preferable to use the same strategy for both types, preferably by sharing the Release() code.
Attachment #8814157 -
Flags: review?(mh+mozilla)
Assignee | ||
Comment 5•8 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #4) > There is similar code for LibHandle refcounting, but slightly different. It > would be preferable to use the same strategy for both types, preferably by > sharing the Release() code. We don't access the lib handles directly from JNI, so I think LibHandle/ElfLoader don't need to be thread-safe. The Release specialization for Zip is only effective with the locking mechanics we've introduced to ZipCollection before. We could change the debug mRefCnt value from 0x7fffdead to detail::DEAD (0xffffdead) in LibHandle::Release, otherwise we wouldn't trigger the assertion in AddRef in such a case, but that's unrelated to this bug.
Assignee | ||
Comment 6•7 years ago
|
||
(In reply to Eugen Sawin [:esawin] from comment #5) > (In reply to Mike Hommey [:glandium] from comment #4) > > There is similar code for LibHandle refcounting, but slightly different. It > > would be preferable to use the same strategy for both types, preferably by > > sharing the Release() code. > > We don't access the lib handles directly from JNI, so I think > LibHandle/ElfLoader don't need to be thread-safe. > > The Release specialization for Zip is only effective with the locking > mechanics we've introduced to ZipCollection before. > > We could change the debug mRefCnt value from 0x7fffdead to detail::DEAD > (0xffffdead) in LibHandle::Release, otherwise we wouldn't trigger the > assertion in AddRef in such a case, but that's unrelated to this bug. I've tried to picture what shared Release code would look like and I think it would require a friend entry in RefCounted to make it work since Release could not be a specialization of either Release template and would need to access mRefCnt. Alternatively, I can modify LibHandleList to hold RefPtr and use the same forget-on-1 mechanics (essentially copy the Release specialization), however this wouldn't solve any existing issues afaik and should probably go into a refactoring bug instead.
Flags: needinfo?(mh+mozilla)
Comment 7•7 years ago
|
||
You can share by declaring something like: template <typename T> inline void DoRelease(const T* obj, Atomic<MozRefCountType> &aRefCnt) And making the RefCounted<*>::Release() implementations call that with (this, mRefCnt)
Flags: needinfo?(mh+mozilla)
Assignee | ||
Comment 8•7 years ago
|
||
I'm assuming that DoRelease would implement both, the forget-on-1 and release mechanics. In this case, we can't easily share it because lib handles don't have a single authoritative collection like zips have [1]. Zips are registered on creation and, with the new mechanics, released once the ref count drops to 1. Lib handles have multiple creation and registration paths and have two separate collections [2][3], so that the release-on-1 mechanics would not be applicable without some modifications. Unless I'm misunderstanding which part of the release mechanics we want to share, I think it would be safer to fix this issue in isolation to have a relatively simple and upliftable fix. [1] https://dxr.mozilla.org/mozilla-central/source/mozglue/linker/Zip.h#497 [2] https://dxr.mozilla.org/mozilla-central/source/mozglue/linker/ElfLoader.h#484 [3] https://dxr.mozilla.org/mozilla-central/source/mozglue/linker/CustomElf.h#138
Assignee | ||
Updated•7 years ago
|
Flags: needinfo?(mh+mozilla)
Assignee | ||
Comment 10•7 years ago
|
||
Comment on attachment 8814157 [details] [diff] [review] 0001-Bug-1318247-1.0-Add-custom-refcounting-to-zips-in-Zi.patch Can you please revisit the patch given the locking mechanics introduced in bug 1358241 for the lib handles case? I'm uncomfortable changing the release mechanics for the lib handles (see comment 8), where the zip collection case is simpler and we still see crashes there (bug 1364310).
Attachment #8814157 -
Flags: review?(mh+mozilla)
Comment 11•7 years ago
|
||
Comment on attachment 8814157 [details] [diff] [review] 0001-Bug-1318247-1.0-Add-custom-refcounting-to-zips-in-Zi.patch Review of attachment 8814157 [details] [diff] [review]: ----------------------------------------------------------------- ::: mozglue/linker/Zip.h @@ +508,5 @@ > +{ > + MOZ_ASSERT(static_cast<int32_t>(mRefCnt) > 0); > + const auto count = --mRefCnt; > + if (count == 1) { > + // No external references are left, attempt to remove it from the collection. Add a comment that this is expected to loop back in this function from Forget calling vector::erase.
Attachment #8814157 -
Flags: review?(mh+mozilla) → review+
Assignee | ||
Updated•7 years ago
|
Flags: needinfo?(mh+mozilla)
Comment 12•7 years ago
|
||
Pushed by esawin@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/8c0a3de985d4 [1.0] Add custom refcounting of zips in ZipCollection to allow for thread-safe reuse of zips. r=glandium
Assignee | ||
Comment 13•7 years ago
|
||
Added comment.
Attachment #8814157 -
Attachment is obsolete: true
Attachment #8882668 -
Flags: review+
Comment 14•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/8c0a3de985d4
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox56:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → Firefox 56
Comment 15•7 years ago
|
||
Should we consider this for backport to 55 or let it ride the 56 train?
status-firefox54:
--- → wontfix
status-firefox55:
--- → affected
status-firefox-esr52:
--- → unaffected
Flags: needinfo?(esawin)
Keywords: crash
Assignee | ||
Comment 16•7 years ago
|
||
It's a low-volume crash in automation and I don't expect users to hit this a lot in release. Although the patch is simple, it changes our custom ref-counting behavior. I would prefer to let this ride the train.
Flags: needinfo?(esawin)
Updated•7 years ago
|
Updated•3 years ago
|
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•