Closed Bug 1346042 Opened 3 years ago Closed 3 years ago

Intermittent dom/media/test/test_invalid_reject_play.html | application crashed [@ 0xe7574c24][@ LibHandle::ReleaseDirectRef]

Categories

(Core :: mozglue, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
fennec + ---
firefox53 --- wontfix
firefox54 --- wontfix
firefox55 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: esawin)

References

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

https://treeherder.mozilla.org/logviewer.html#?job_id=82772935&repo=autoland&lineNumber=3579
Assertion failure: directRefCnt <= mozilla::external::AtomicRefCounted<LibHandle>::refCount(), at /home/worker/workspace/build/src/mozglue/linker/ElfLoader.h:167
Component: Audio/Video → mozglue
Summary: Intermittent dom/media/test/test_invalid_reject_play.html | application crashed [@ 0xe7574c24] → Intermittent dom/media/test/test_invalid_reject_play.html | application crashed [@ 0xe7574c24][@ LibHandle::ReleaseDirectRef]
Assignee: nobody → esawin
tracking-fennec: --- → ?
tracking-fennec: ? → +
I can't reproduce this locally with the test page, but the assertion should only fail when we're calling AddDirectRef and ReleaseDirectRef simultaneously on different threads.

Given that it's a low-frequency failure (1:100), I'm not sure how insightful this test run is, but it looks good: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6ef201ca0182c90ee5e2f53323c103cbe6ff6d3c
Attachment #8855943 - Flags: review?(mh+mozilla)
Comment on attachment 8855943 [details] [diff] [review]
0001-Bug-1346042-1.0-Mutex-lock-LibHandle-direct-referenc.patch

Review of attachment 8855943 [details] [diff] [review]:
-----------------------------------------------------------------

Haven't thought much about it, but shouldn't making that refcounter atomic be enough?
Attachment #8855943 - Flags: review?(mh+mozilla)
(In reply to Mike Hommey [:glandium] from comment #7)
> Haven't thought much about it, but shouldn't making that refcounter atomic
> be enough?

We can check the assertion in ReleaseDirectRef() on a different thread after the directRefCnt increment but before calling AddRef() in AddDirectRef(). I think we need to lock this down to ensure correctness.

The chances for hitting this window are so slim that I expect that something else might be going wrong, but nonetheless, we should make the custom ref-counting thread-safe.
Flags: needinfo?(mh+mozilla)
Flags: needinfo?(mh+mozilla)
Pushed by esawin@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7b3035c59fe9
[1.0] Mutex-lock LibHandle direct reference management. r=glandium
https://hg.mozilla.org/mozilla-central/rev/7b3035c59fe9
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Doesn't seem high enough volume to worry about backporting to 54, but feel free to set the flag back to affected and request uplift if you feel otherwise.
Comment on attachment 8855943 [details] [diff] [review]
0001-Bug-1346042-1.0-Mutex-lock-LibHandle-direct-referenc.patch

Approval Request Comment
[Feature/Bug causing the regression]: N/A.
[User impact if declined]: Crash on startup.
[Is this code covered by automated tests?]: Only indirectly through app startup.
[Has the fix been verified in Nightly?]: Yes, no crashes with this signature in automation.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: 1358241, 1361903.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: Affects only locking mechanics during startup.
[String changes made/needed]: None.
Attachment #8855943 - Flags: approval-mozilla-beta?
(In reply to Eugen Sawin [:esawin] from comment #12)
> Comment on attachment 8855943 [details] [diff] [review]
> 0001-Bug-1346042-1.0-Mutex-lock-LibHandle-direct-referenc.patch
> 
> Approval Request Comment
> [Feature/Bug causing the regression]: N/A.
> [User impact if declined]: Crash on startup.
> [Is this code covered by automated tests?]: Only indirectly through app
> startup.
> [Has the fix been verified in Nightly?]: Yes, no crashes with this signature
> in automation.
> [Needs manual test from QE? If yes, steps to reproduce]: No.
> [List of other uplifts needed for the feature/fix]: 1358241, 1361903.
> [Is the change risky?]: No.
> [Why is the change risky/not risky?]: Affects only locking mechanics during
> startup.
> [String changes made/needed]: None.

There's no uplift request on the bugs you list as prerequisites, is that an oversight?  Is there any evidence of this being a high-volume issue for users?  We're a week away from building release candidates for 54 so unless that's the case I think it's too late to take this change.
Flags: needinfo?(esawin)
(In reply to Julien Cristau [:jcristau] from comment #13)
> There's no uplift request on the bugs you list as prerequisites, is that an
> oversight?

I assumed listing the blocking bugs in the request takes care of it.
I can request for uplift individually if that's not the case.

> Is there any evidence of this being a high-volume issue for
> users?  We're a week away from building release candidates for 54 so unless
> that's the case I think it's too late to take this change.

This does not seem to be a high-volume issue in production, we only see this in test automation due to the assertions.

The reason for the uplift request is to remove any known linker-related issues to improve the results of LG's MTBF testing (e.g., bug 1349099) and to help isolate currently unknown issues.
Flags: needinfo?(esawin)
Comment on attachment 8855943 [details] [diff] [review]
0001-Bug-1346042-1.0-Mutex-lock-LibHandle-direct-referenc.patch

I agree with Julien's assessment. Snorp, is this something that is a must for 54 in terms of a partner ask? I am not sure this uplift right before we enter RC week is justified.
Flags: needinfo?(snorp)
Attachment #8855943 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
It could help with the partner, but no reason to rush it out, and I don't think we have high confidence that it will fix their issue.
Flags: needinfo?(snorp)
Thanks Snorp, we will let this fix ride the 55 train then.
You need to log in before you can comment on or make changes to this bug.