Closed Bug 1604564 Opened 4 years ago Closed 4 years ago

Crash in [@ mozilla::ipc::ProcessLink::SendMessage | IPC_Message_Name=PWindowGlobal::Msg_RawMessage] increasing in 72, possibly related to media upload

Categories

(Core :: DOM: Content Processes, defect)

72 Branch
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla73
Tracking Status
firefox-esr68 --- unaffected
firefox71 --- unaffected
firefox72 + disabled
firefox73 + fixed

People

(Reporter: philipp, Unassigned)

References

(Depends on 1 open bug)

Details

(Keywords: crash, regression)

Crash Data

This bug is for crash report bp-4b09aaf2-8d28-47b0-8131-799660191217.

Top 10 frames of crashing thread:

0 xul.dll void mozilla::ipc::ProcessLink::SendMessage ipc/glue/MessageLink.cpp:151
1 xul.dll mozilla::ipc::MessageChannel::Send ipc/glue/MessageChannel.cpp:1020
2 xul.dll mozilla::ipc::IProtocol::ChannelSend ipc/glue/ProtocolUtils.cpp:477
3 xul.dll mozilla::dom::PWindowGlobalChild::SendRawMessage ipc/ipdl/PWindowGlobalChild.cpp:71
4 xul.dll mozilla::dom::JSWindowActorParent::SendRawMessage dom/ipc/JSWindowActorParent.cpp:88
5 xul.dll mozilla::dom::JSWindowActor::SendQuery dom/ipc/JSWindowActor.cpp:160
6 xul.dll static bool mozilla::dom::JSWindowActorParent_Binding::sendQuery_promiseWrapper dom/bindings/JSWindowActorBinding.cpp:1617
7 xul.dll mozilla::dom::binding_detail::GenericMethod<mozilla::dom::binding_detail::NormalThisPolicy, mozilla::dom::binding_detail::ConvertExceptionsToPromises> dom/bindings/BindingUtils.cpp:3153
8  @0x2edf0ec6bd9 
9 mozglue.dll arena_t::MallocSmall memory/build/mozjemalloc.cpp:2849

according to https://bugzilla.mozilla.org/show_bug.cgi?id=1576565#c6 this signature is capturing issues with multiple different issues.

i'm filing this report for an increase we see with this signature in the firefox 72 cycle - it's happening cross-platform and with MOZ_CRASH(IPC message size is too large). the first affected nightly build apparently was 72.0a1 20191120234543.

some of the attached urls and comments in crash reports indicate that among other scenarios this crash might happen on sites offering uploading functionality for photos or videos, like https://www.inaturalist.org/observations/upload, https://www.eyeem.com/upload or https://studio.youtube.com/channel/*/videos/upload

Matt, do you have thoughts on what could be going on here?

Flags: needinfo?(matt.woodrow)

The crashing message in this case is "RunListener", so looks like an issue with the new extension fission actors.

I very much agree with bug 1576565 comment 6, having explicit crashes on too-large window actor messages (with the message name written to the dump) would be super helpful.

Flags: needinfo?(matt.woodrow) → needinfo?(tomica)
Depends on: 1604609

Can we do something in the short term for 72? Waiting on telemetry doesn't sound great.

Flags: needinfo?(matt.woodrow)

(In reply to Julien Cristau [:jcristau] from comment #3)

Can we do something in the short term for 72? Waiting on telemetry doesn't sound great.

My patch in bug 1604609 will fix these crashes by just dropping the messages instead of crashing.

I think the basic issue is that WebExtension messages were swapped over to this new system, and we were implicitly depending on the old message manager filtering to avoid crashes. The risk of bug 1604609 is that it will change the behavior of more than WebExtension messages. On the other hand, in any case where it changes the behavior we were almost certainly going to just crash, so it seems like not crashing is probably better. But maybe less visible to us, which might be concerning. I'm not sure how easy it would be to drop the WebExtension messages specifically. You could probably hack it in by doing a string comparison on the actor name.

(In reply to Matt Woodrow (:mattwoodrow) from comment #2)

The crashing message in this case is "RunListener", so looks like an issue with the new extension fission actors.

In many cases, we're just sending opaque data from extensions over actor IPC, and we can't even know the length because we're using a StructuredCloneHolder to avoid multiple cross-comparment clonings.

(In reply to Andrew McCreight [:mccr8] from comment #5)

I'm not sure how easy it would be to drop the WebExtension messages specifically. You could probably hack it in by doing a string comparison on the actor name.

I'm not sure why extension messages would be special, I think it would be fine to drop all of JSWindowActor messages (with telemetry so we can fix them), but crash on any large messages from C++ actors.

In any case, weather we go with crashing and annotating, or dropping and logging the message name in telemetry, I'm going to improve the specificity of those message names, since RunListener is used for all extension event APIs, and we can definitely narrow that down.

Flags: needinfo?(tomica)

(In reply to :Tomislav Jovanovic :zombie from comment #6)

I'm not sure why extension messages would be special, I think it would be fine to drop all of JSWindowActor messages (with telemetry so we can fix them), but crash on any large messages from C++ actors.

Note that that's what the MessageManager implementation did, and why this is now an issue just from switching to actor messages instead.

Depends on: 1605098
Flags: needinfo?(matt.woodrow)

Bug 1604609 turned this into not-a-crash in 72.0b10.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla73

Hi Andrew, should we mark this bug qe+? Is there a clear way to reproduce the crash?

Flags: qe-verify?
Flags: needinfo?(continuation)

(In reply to Ciprian Georgiu [:ciprian_georgiu], Release Desktop QA from comment #9)

Hi Andrew, should we mark this bug qe+? Is there a clear way to reproduce the crash?

The cause was clear enough that we never tried to come up with a way to actually reproduce the crash. Our only evidence that it worked is that the crash disappeared from beta when it landed.

Flags: needinfo?(continuation)

(In reply to Andrew McCreight [:mccr8] from comment #10)

The cause was clear enough that we never tried to come up with a way to actually reproduce the crash. Our only evidence that it worked is that the crash disappeared from beta when it landed.

Thanks for the explanations, Andrew.

Flags: qe-verify? → qe-verify-
You need to log in before you can comment on or make changes to this bug.