Closed Bug 1556846 Opened 5 years ago Closed 5 years ago

[10.15] Crash in [@ mozilla::plugins::PluginUtilsOSX::SetProcessName]

Categories

(Core Graveyard :: Plug-ins, defect, P1)

Unspecified
macOS
defect

Tracking

(firefox-esr60 unaffected, firefox-esr68 68+ verified, firefox68 + verified, firefox69 blocking verified, firefox70 + verified)

VERIFIED FIXED
mozilla70
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 68+ verified
firefox68 + verified
firefox69 blocking verified
firefox70 + verified

People

(Reporter: marcia, Assigned: haik)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression, Whiteboard: [rca - External API Failure])

Crash Data

Attachments

(2 files)

This bug is for crash report bp-d40ce24a-7d3a-4a14-ab28-63b350190604.

Seen while looking at nightly crash stats. There are a few different signatures that appear to have a similar stack. Most of them are single install crashes.

We had a somewhat similar signature back in the 58 timeframe: Bug 1419004

Top 10 frames of crashing thread:

0  @0x7fff6c9b47a2 
1  @0x7fff6c99474d 
2  @0x7fff6c9958c6 
3  @0x7fff6c9b3464 
4  @0x7fff6cc07c64 
5  @0x7fff6c99474d 
6  @0x7fff6c9958c6 
7  @0x7fff6cc07bba 
8  @0x7fff36104d5a 
9  @0x7fff6c99474d 

Crash Signature: [@ mozilla::plugins::PluginUtilsOSX::SetProcessName] [@ WebCore@0x3c1d5a] [@ WebCore@0x1d547e6] → [@ mozilla::plugins::PluginUtilsOSX::SetProcessName] [@ WebCore@0x3c1d5a] [@ WebCore@0x1d547e6] [@ IOKit@0x2dd5a]
Crash Signature: [@ mozilla::plugins::PluginUtilsOSX::SetProcessName] [@ WebCore@0x3c1d5a] [@ WebCore@0x1d547e6] [@ IOKit@0x2dd5a] → [@ mozilla::plugins::PluginUtilsOSX::SetProcessName] [@ WebCore@0x3c1d5a] [@ WebCore@0x1d547e6] [@ [@ WebCore@0x8b8d5a ][@ IOKit@0x2dd5a]
Crash Signature: [@ mozilla::plugins::PluginUtilsOSX::SetProcessName] [@ WebCore@0x3c1d5a] [@ WebCore@0x1d547e6] [@ [@ WebCore@0x8b8d5a ][@ IOKit@0x2dd5a] → [@ mozilla::plugins::PluginUtilsOSX::SetProcessName] [@ WebCore@0x3c1d5a] [@ WebCore@0x1d547e6] [@ WebCore@0x8b8d5a ][@ IOKit@0x2dd5a]
Crash Signature: [@ mozilla::plugins::PluginUtilsOSX::SetProcessName] [@ WebCore@0x3c1d5a] [@ WebCore@0x1d547e6] [@ WebCore@0x8b8d5a ][@ IOKit@0x2dd5a] → [@ mozilla::plugins::PluginUtilsOSX::SetProcessName] [@ WebCore@0x3c1d5a] [@ WebCore@0x1d547e6] [@ WebCore@0x8b8d5a ][@ IOKit@0x2dd5a] [@ Metal@0x41d5a ]
Crash Signature: [@ mozilla::plugins::PluginUtilsOSX::SetProcessName] [@ WebCore@0x3c1d5a] [@ WebCore@0x1d547e6] [@ WebCore@0x8b8d5a ][@ IOKit@0x2dd5a] [@ Metal@0x41d5a ] → [@ mozilla::plugins::PluginUtilsOSX::SetProcessName] [@ WebCore@0x3c1d5a] [@ WebCore@0x1d547e6] [@ WebCore@0x8b8d5a ][@ IOKit@0x2dd5a] [@ Metal@0x41d5a ] [@ libwebrtc.dylib@0x48ed5a ] [@ ImageKit@0x2447e6 ] [@ MediaToolbox@0xc5639 ]

Stephen, it looks like you fixed a similar crash a few years ago (bug 1419004). Any chance you could take a look at this? Thanks.

Flags: needinfo?(spohl.mozilla.bugs)
See Also: → 1419004
Blocks: catalina

This is the #2 top crash in the June 12 Nightlies, behind only bug 1558836, which was a pretty bad crash regression.

Adding some more related signatures. These two signatures each had a dozen or so crash reports, but only from a single installation each.

Crash Signature: [@ mozilla::plugins::PluginUtilsOSX::SetProcessName] [@ WebCore@0x3c1d5a] [@ WebCore@0x1d547e6] [@ WebCore@0x8b8d5a ][@ IOKit@0x2dd5a] [@ Metal@0x41d5a ] [@ libwebrtc.dylib@0x48ed5a ] [@ ImageKit@0x2447e6 ] [@ MediaToolbox@0xc5639 ] → [@ mozilla::plugins::PluginUtilsOSX::SetProcessName] [@ WebCore@0x3c1d5a] [@ WebCore@0x1d547e6] [@ WebCore@0x8b8d5a ][@ IOKit@0x2dd5a] [@ Metal@0x41d5a ] [@ libwebrtc.dylib@0x48ed5a ] [@ ImageKit@0x2447e6 ] [@ MediaToolbox@0xc5639 ] [@ MPSCore@0x15d5a…

Some more signatures.

Crash Signature: [@ mozilla::plugins::PluginUtilsOSX::SetProcessName] [@ WebCore@0x3c1d5a] [@ WebCore@0x1d547e6] [@ WebCore@0x8b8d5a ][@ IOKit@0x2dd5a] [@ Metal@0x41d5a ] [@ libwebrtc.dylib@0x48ed5a ] [@ ImageKit@0x2447e6 ] [@ MediaToolbox@0xc5639 ] [@ MPSCore@0x15d5a… → [@ mozilla::plugins::PluginUtilsOSX::SetProcessName] [@ WebCore@0x3c1d5a] [@ WebCore@0x1d547e6] [@ WebCore@0x8b8d5a ] [@ WebCore@0x12217e6 ] [@ IOKit@0x2dd5a] [@ Metal@0x41d5a ] [@ libwebrtc.dylib@0x48ed5a ] [@ ImageKit@0x2447e6 ] [@ MediaToolbox@0xc5…

Any idea why this would only be affecting Nightlies? Something sandbox-related maybe?

Maybe this should move to Core | AV? The process type is rdd (web).

I just installed 10.15 and should be able to take a look this week. Leaving n-i set.

This is currently the #7 overall top crash on nightly. spohl: have you had a chance to look at it yet? I can confirm there are crashes in both the first and second betas. Thanks.

No progress to report, but still looking into it. Keeping n-i set.

Crash Signature: MediaToolbox@0xc5639 ] [@ MPSCore@0x15d5a ] [@ AudioToolboxCore@0x8d7e6 ] [@ Intents@0x24dd5a ] [@ MPSMatrix@0xed5a ] [@ CarbonCore@0x2baed7 ] [@ AuthKit@0xdc7e6 ] → MediaToolbox@0xc5639 ] [@ MPSCore@0x15d5a ] [@ AudioToolboxCore@0x8d7e6 ] [@ Intents@0x24dd5a ] [@ MPSMatrix@0xed5a ] [@ CarbonCore@0x2baed7 ] [@ AuthKit@0xdc7e6 ] [@ OSServices@0xd13a ]

There are still tons of these crashes on the OSX Nightly, with various other signatures.

Investigations here are largely blocked by bug 1562684.

Depends on: 1562684

With the macOS 10.15 Beta 3 update, I'm not even able to build Firefox anymore because Python 2 is crashing. This will slow down investigations significantly.

I took a look at some of the correlations on nightly for the first signature (https://crash-stats.mozilla.org/signature/?signature=mozilla%3A%3Aplugins%3A%3APluginUtilsOSX%3A%3ASetProcessName).

(92.17% in signature vs 58.44% overall) Addon "amazondotcom@search.mozilla.org" = true
(31.18% in signature vs 00.83% overall) Addon "support@todoist.com" = true
(100.0% in signature vs 10.12% overall) Module "Backup" = true [203.33% vs 10.61% if startup_crash = null]
(100.0% in signature vs 07.48% overall) platform_pretty_version = OS X 10.15 [100.0% vs 73.90% if platform = Mac OS X]
(55.91% in signature vs 28.37% overall) Addon "uBlock0@raymondhill.net" Version = 1.20.0 [55.91% vs 35.37% if process_type = rdd]

Interesting that there is a fairly high correlation to the amazon search engine.
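
One way to compare the correlation lines above is to divide each item's share within the signature by its share overall. The sketch below does that with the figures copied from this comment; the `lift` helper and the ranking are illustrative assumptions, not something crash-stats itself reports.

```python
# Figures copied from the correlation lines above:
# (label, % in signature, % overall).
correlations = [
    ('Addon "amazondotcom@search.mozilla.org"', 92.17, 58.44),
    ('Addon "support@todoist.com"', 31.18, 0.83),
    ('Module "Backup"', 100.0, 10.12),
    ("platform_pretty_version = OS X 10.15", 100.0, 7.48),
    ('Addon "uBlock0@raymondhill.net" 1.20.0', 55.91, 28.37),
]


def lift(in_signature_pct, overall_pct):
    """Hypothetical helper: how over-represented an item is in the signature."""
    return in_signature_pct / overall_pct


# Sort from most to least over-represented.
ranked = sorted(correlations, key=lambda row: lift(row[1], row[2]), reverse=True)

for label, in_sig, overall in ranked:
    print(f"{lift(in_sig, overall):6.2f}x  {label}")
```

By this measure the Amazon search add-on ranks last of the five (roughly 1.6x), since it is present in more than half of all reports anyway, while the OS X 10.15 line is over-represented by about 13x.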

I think this is a sandboxing issue. I'm working on confirming that and will update the bug after I do some more testing.

While debugging why AV1 decoding is not working on 10.15, I found that the RDD process is crashing here in a SetProcessName() stack.

The message "BUG IN LIBDISPATCH: Unable to get the unique pid (size)" indicates that the proc_pidinfo() syscall is failing, most likely due to sandbox restrictions. This is one of the problems we hit with Widevine decoding on 10.15 in bug 1558924, and I didn't realize the SetProcessName() crash was similar.

https://hg.mozilla.org/mozilla-central/rev/83204e889b721c109c7296abd86b592553f825c5

(lldb) c
Process 50468 resuming
Process 50468 stopped
* thread #1, name = 'MainThread', queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
    frame #0: 0x00007fff6b66b762 libdispatch.dylib`_firehose_task_buffer_init + 266
libdispatch.dylib`_firehose_task_buffer_init:
->  0x7fff6b66b762 <+266>: ud2    
    0x7fff6b66b764 <+268>: cltq   
    0x7fff6b66b766 <+270>: leaq   0x15010(%rip), %rcx       ; "BUG IN LIBDISPATCH: Unable to get the unique pid (size)"
    0x7fff6b66b76d <+277>: movq   %rcx, 0x2ca484dc(%rip)    ; gCRAnnotations + 8
Target 0: (plugin-container) stopped.
(lldb) bt
* thread #1, name = 'MainThread', queue = 'com.apple.main-thread', stop reason = EXC_BAD_INSTRUCTION (code=EXC_I386_INVOP, subcode=0x0)
  * frame #0: 0x00007fff6b66b762 libdispatch.dylib`_firehose_task_buffer_init + 266
    frame #1: 0x00007fff6b64b60e libdispatch.dylib`_dispatch_client_callout + 8
    ...
    stack frames omitted to save space
    ...
    frame #30: 0x00000001185138d2 XUL`mozilla::plugins::PluginUtilsOSX::SetProcessName(char const*) + 130
    frame #31: 0x000000011814cf16 XUL`mozilla::RDDParent::Init(int, char const*, MessageLoop*, IPC::Channel*) + 214
    frame #32: 0x000000011814ec2d XUL`mozilla::RDDProcessImpl::Init(int, char**) + 445
    frame #33: 0x000000011a06b669 XUL`XRE_InitChildProcess(int, char**, XREChildData const*) + 4297
    frame #34: 0x000000010aac5f07 plugin-container`main + 103
    frame #35: 0x00007fff6b69bc49 libdyld.dylib`start + 1
Assignee: nobody → haftandilian
Priority: -- → P1

Disassembly of libdispatch.dylib`_firehose_task_buffer_init confirms that the message "BUG IN LIBDISPATCH: Unable to get the unique pid (size)" is used after a call to the proc_pidinfo() syscall fails.

Adding (allow process-info-pidinfo (target self)) to the utility sandbox policy (used by the RDD process) avoids the crash and allows AV1 decoding to work in my limited testing so far.
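
To illustrate why the rule is scoped with (target self), here is a toy model of a deny-by-default policy check. It is only a sketch: the `evaluate` function, the tuple-based rule format, and the empty "before" policy are assumptions made for illustration and do not reflect how the macOS sandbox or the real utility policy is implemented.

```python
# Toy model only: NOT how the macOS sandbox is implemented.
# A policy is a list of (operation, target) rules; anything not
# explicitly allowed is denied.

DENY, ALLOW = "deny", "allow"


def evaluate(policy, operation, caller_pid, target_pid):
    """Return ALLOW if some rule permits the request, else DENY."""
    for rule_operation, rule_target in policy:
        if rule_operation != operation:
            continue
        # Only the "self" target is modeled: the caller may query itself.
        if rule_target == "self" and caller_pid == target_pid:
            return ALLOW
    return DENY


# Hypothetical stand-ins for the utility policy before and after the fix;
# only the rule relevant to this bug is modeled.
policy_before = []
policy_after = [("process-info-pidinfo", "self")]

rdd_pid, other_pid = 50468, 1  # example pids, chosen arbitrarily

print(evaluate(policy_before, "process-info-pidinfo", rdd_pid, rdd_pid))
print(evaluate(policy_after, "process-info-pidinfo", rdd_pid, rdd_pid))
print(evaluate(policy_after, "process-info-pidinfo", rdd_pid, other_pid))
```

In this model the process can query information about itself once the rule is present, while queries about any other pid are still denied, which is the sense in which the access is "limited".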

Allow limited access to the proc_pidinfo() syscall from the Mac utility process sandbox.

Pushed by haftandilian@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c3b98f220f4f
[10.15] Crash in [@ mozilla::plugins::PluginUtilsOSX::SetProcessName] r=spohl
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla70

I'll request uplift once this has been on Nightly for a bit longer. We will want the fix on 68 and 69.

Flags: needinfo?(spohl.mozilla.bugs) → needinfo?(haftandilian)

This is looking good on nightly so far. The main signature in this bug has not been seen since the fix landed, and I am not seeing any of the single installation Mac crashes in my nightly crash triage in the last few days :)

Comment on attachment 9077025 [details]
Bug 1556846 - [10.15] Crash in [@ mozilla::plugins::PluginUtilsOSX::SetProcessName] r?spohl

Beta/Release Uplift Approval Request

  • User impact if declined: Some AV1 media decoding will fail on macOS 10.15 due to the Remote Data Decoder (RDD) process crashing every time it launches.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: On macOS 10.15 Beta, visit https://bitmovin.com/demos/av1 and ensure the AV1 embedded video plays.
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The fix is small, limited to macOS, and just adds a new process sandbox rule needed on 10.15 that is unlikely to cause problems with earlier macOS versions.
  • String changes made/needed: None
Flags: needinfo?(haftandilian)
Attachment #9077025 - Flags: approval-mozilla-release?
Attachment #9077025 - Flags: approval-mozilla-beta?
Flags: qe-verify+

(In reply to Marcia Knous [:marcia - needinfo? me] from comment #20)

> This is looking good on nightly so far. The main signature in this bug has not been seen since the fix landed, and I am not seeing any of the single installation Mac crashes in my nightly crash triage in the last few days :)

Thanks for verifying that, Marcia.

Whiteboard: [qa-triaged]

I tested this on an iMac and a MacBook Pro running OS X 10.15 Beta (19A501i) with the latest FF Nightly 70.0a1 (2019-07-16). I can't reproduce any crash, but there is a problem: the video doesn't start and an error is displayed. Please see the attached screenshot.
Haik, can you please take a look at this? Thanks

Flags: needinfo?(haftandilian)

I just tested on my 10.15 machine using https://bitmovin.com/demos/av1 and the latest nightly, and the video does play fine for me. I am running (19A501i) as well.

(In reply to ovidiu boca[:Ovidiu] from comment #23)

> Created attachment 9078380 [details]
> Screenshot 2019-07-16 at 14.35.34.png

> I tested this on an iMac and a MacBook Pro running OS X 10.15 Beta (19A501i) with the latest FF Nightly 70.0a1 (2019-07-16). I can't reproduce any crash, but there is a problem: the video doesn't start and an error is displayed. Please see the attached screenshot.
> Haik, can you please take a look at this? Thanks

That's the error you should see without the fix. The RDD process is crashing, but it doesn't cause a crashed tab page like a content process crash would.

Once you have the fix, you should not see that error and instead should see the video playing.

Flags: needinfo?(haftandilian)

Comment on attachment 9077025 [details]
Bug 1556846 - [10.15] Crash in [@ mozilla::plugins::PluginUtilsOSX::SetProcessName] r?spohl

Fixes an RDD crash on OSX 10.15. Approved for 69.0b6.

Attachment #9077025 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9077025 [details]
Bug 1556846 - [10.15] Crash in [@ mozilla::plugins::PluginUtilsOSX::SetProcessName] r?spohl

Crash fix for macOS 10.15; approved for 68.0.1 and 68.1esr.

Attachment #9077025 - Flags: approval-mozilla-release? → approval-mozilla-release+
Attachment #9077025 - Flags: approval-mozilla-esr68+

Tested again on MacBook OS X 10.15 Beta (19A512f) with FF Nightly 70.0a1 (2019-07-17) and the error is still there.

(In reply to ovidiu boca[:Ovidiu] from comment #29)

> Tested again on MacBook OS X 10.15 Beta (19A512f) with FF Nightly 70.0a1 (2019-07-17) and the error is still there.

Is there some kind of macOS version and Firefox version combination that plays this successfully for you?

Flags: needinfo?(ovidiu.boca)

For example, Mac 10.14 with FF Nightly 70.0a1(2019-07-17) is working for me.

Flags: needinfo?(ovidiu.boca)

Per discussion with jcristau, we're uplifting this to 68.0.1esr also to maintain parity with the non-ESR 68.0.1 release and hopefully avoid some confusion.

(In reply to Ryan VanderMeulen [:RyanVM] from comment #33)

> Per discussion with jcristau, we're uplifting this to 68.0.1esr also to maintain parity with the non-ESR 68.0.1 release and hopefully avoid some confusion.

Thanks for catching that. ESR 68 should have been in my uplift request. This fix is needed in ESR 68 because the bug affects all Firefox versions with the RDD process enabled (65+) run on macOS 10.15 and we'll be supporting ESR 68 on 10.15.

Hi everyone, after further testing I can safely say that this issue no longer occurs on ESR 68.0.1 or Release candidate 68.0.1; the video starts playing without issues.

It DOES however still occur on Beta 69.0b5 as well as our latest Nightly 70.0a1 (2019-07-18).

(In reply to Rares Doghi from comment #36)

> It DOES however still occur on Beta 69.0b5 as well as our latest Nightly 70.0a1 (2019-07-18).

I'll look into this. I upgraded to the newest macOS Beta yesterday (19A512f) and am seeing this problem too with Beta and Nightly.

Flags: needinfo?(haftandilian)

(In reply to Haik Aftandilian [:haik] from comment #37)

> (In reply to Rares Doghi from comment #36)

> > It DOES however still occur on Beta 69.0b5 as well as our latest Nightly 70.0a1 (2019-07-18).

> I'll look into this. I upgraded to the newest macOS Beta yesterday (19A512f) and am seeing this problem too with Beta and Nightly.

Same here - using (19A512f) the video is now giving me an error on the latest nightly :(.

Mozregression run on macOS 10.15 Beta 4 (19A512f)

$ mozregression --good 2019-07-10 --bad 2019-07-18

led me to https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=5671bf0f8a214d690b4754aab8e6304102c1b791&tochange=368e9ae19115fce103b410d423d78076d59db7e7 indicating bug 1560368 caused this regression.

But bug 1560368 is only on 70 which wouldn't explain the issue on Beta.

I'll file a new bug and post details there.
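
For readers unfamiliar with the approach, the date-range search behind a run like the one above can be sketched as a plain bisection over nightly dates. This is a minimal illustration, not mozregression's code: `bisect_nightlies`, `is_bad`, and `ASSUMED_FIRST_BAD` are hypothetical names, and the regression date is arbitrary, chosen only so the example runs.

```python
from datetime import date, timedelta


def bisect_nightlies(good, bad, is_bad):
    """Narrow (good, bad) to adjacent dates, assuming a single regression point."""
    while (bad - good).days > 1:
        middle = good + timedelta(days=(bad - good).days // 2)
        if is_bad(middle):
            bad = middle
        else:
            good = middle
    return good, bad


# Arbitrary date used only to drive the example; not taken from this bug.
ASSUMED_FIRST_BAD = date(2019, 7, 16)


def is_bad(build_date):
    # Stand-in for "run this nightly and check whether the AV1 demo plays".
    return build_date >= ASSUMED_FIRST_BAD


last_good, first_bad = bisect_nightlies(date(2019, 7, 10), date(2019, 7, 18), is_bad)
print(last_good, first_bad)
```

Each step halves the remaining range, so the eight-day window here needs only three manual checks before the search moves on to individual pushes such as the autoland range linked above.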

Flags: needinfo?(haftandilian) → needinfo?(mfroman)

There's already a bug on the regression from bug 1560368. I'll see if I can find it...

Bug 1566540 is what I was thinking of.

See Also: → 1566540

(In reply to Haik Aftandilian [:haik] from comment #39)

> Mozregression run on macOS 10.15 Beta 4 (19A512f)

> $ mozregression --good 2019-07-10 --bad 2019-07-18

> led me to https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=5671bf0f8a214d690b4754aab8e6304102c1b791&tochange=368e9ae19115fce103b410d423d78076d59db7e7 indicating bug 1560368 caused this regression.

> But bug 1560368 is only on 70 which wouldn't explain the issue on Beta.

> I'll file a new bug and post details there.

Haik, I'm not exactly sure what the question here is for me. What I've gathered from reading through the 2 bugs (this one and Bug 1566540) is that there is possibly a sandbox issue related only to RDD and macOS 10.15? Please let me know what I can do to help. I don't have a 10.15 machine here, but if necessary I can probably track down a spare drive to install the 10.15 beta on.

Flags: needinfo?(mfroman)

(In reply to Michael Froman [:mjf] from comment #42)

> Haik, I'm not exactly sure what the question here is for me. What I've gathered from reading through the 2 bugs (this one and Bug 1566540) is that there is possibly a sandbox issue related only to RDD and macOS 10.15? Please let me know what I can do to help. I don't have a 10.15 machine here, but if necessary I can probably track down a spare drive to install the 10.15 beta on.

Sorry for not being clear. Originally, due to the mozregression result, the crash looked like a possible regression caused by bug 1560368. Now, after looking at the crash some more, I think it could be a new 10.15 sandboxing issue exposed by the RDD changes in bug 1560368. I'll work on bug 1566540 assuming it is a sandboxing issue.

(In reply to Rares Doghi from comment #36)

> Hi everyone, after further testing I can safely say that this issue no longer occurs on ESR 68.0.1 or Release candidate 68.0.1; the video starts playing without issues.

> It DOES however still occur on Beta 69.0b5 as well as our latest Nightly 70.0a1 (2019-07-18).

Are we able to call this verified for 69/70 now?

Flags: needinfo?(rares.doghi)

Hi Ryan, we tested this issue on Mac OS X 10.15 Beta 5 but were blocked by bug 1570451. However, we managed to test it on an older 10.15 Beta version, and with the latest Nightly 70.0a1 (2019-08-15) and Beta 69.0b14 the issue is no longer reproducible.
So to answer your question, yes, we can call this verified for 69/70.

Status: RESOLVED → VERIFIED
Flags: qe-verify+
Whiteboard: [qa-triaged]
Flags: needinfo?(rares.doghi)
No longer depends on: 1562684

This bug has been identified as part of a pilot on determining root causes of blocking and dot release drivers.

It needs a root-cause set for it. Please see the list at https://docs.google.com/document/d/1FFEGsmoU8T0N8R9kk-MXWptOPtXXXRRIe4vQo3_HgMw/.

Add the root cause as a whiteboard tag in the form [rca - <cause> ] and remove the rca-needed keyword.

If you have questions, please contact :tmaity.

Keywords: rca-needed
Keywords: rca-needed
Whiteboard: [rca - External API Failure]
See Also: → 1689626
Product: Core → Core Graveyard