Open Bug 1563825 Opened 5 years ago Updated 3 days ago

Crash in [@ mozilla::dom::JSWindowActor::ReceiveRawMessage]

Categories

(Core :: DOM: Navigation, defect, P3)

x86_64
All
defect

Tracking


REOPENED
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- unaffected
firefox-esr78 --- wontfix
firefox67 --- unaffected
firefox67.0.1 --- unaffected
firefox68 --- unaffected
firefox69 --- disabled
firefox70 --- disabled
firefox71 --- disabled
firefox72 --- wontfix
firefox73 --- wontfix
firefox74 --- wontfix
firefox75 --- wontfix
firefox76 --- wontfix
firefox77 --- wontfix
firefox78 --- wontfix
firefox79 --- wontfix
firefox80 --- wontfix
firefox84 --- wontfix
firefox85 --- wontfix
firefox86 --- wontfix
firefox87 --- wontfix
firefox88 --- wontfix

People

(Reporter: calixte, Unassigned)

References

(Depends on 2 open bugs, Blocks 1 open bug, Regression)

Details

(Keywords: crash, regression, Whiteboard: [not-a-fission-bug])

Crash Data

Attachments

(2 files)

This bug is for crash report bp-1ec19da7-8bfc-4eea-9b7e-c7a030190705.

Top 10 frames of crashing thread:

0 libxul.so mozilla::dom::JSWindowActor::ReceiveRawMessage dom/ipc/JSWindowActor.cpp:151
1 libxul.so mozilla::dom::WindowGlobalChild::ReceiveRawMessage dom/ipc/WindowGlobalChild.cpp:304
2 libxul.so mozilla::dom::WindowGlobalChild::RecvRawMessage dom/ipc/WindowGlobalChild.cpp:295
3 libxul.so mozilla::dom::PWindowGlobalChild::OnMessageReceived ipc/ipdl/PWindowGlobalChild.cpp:435
4 libxul.so mozilla::dom::PContentChild::OnMessageReceived ipc/ipdl/PContentChild.cpp:7197
5 libxul.so mozilla::ipc::MessageChannel::DispatchMessage ipc/glue/MessageChannel.cpp:2158
6 libxul.so mozilla::ipc::MessageChannel::RunMessage ipc/glue/MessageChannel.cpp:1939
7 libxul.so mozilla::SchedulerGroup::Runnable::Run xpcom/threads/SchedulerGroup.cpp:295
8 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1225
9 libxul.so <name omitted> xpcom/threads/nsThreadUtils.cpp:486

There is 1 crash in nightly 69 with buildid 20190705064618. Based on the backtrace, the regression may have been introduced by patch [1] to fix bug 1541557.

[1] https://hg.mozilla.org/mozilla-central/rev?node=6680278c231b

Flags: needinfo?(kmaglione+bmo)
Fission Milestone: --- → M4
Priority: -- → P2
Component: Mochitest → DOM: Content Processes
Product: Testing → Core
Version: Version 3 → unspecified
Flags: needinfo?(kmaglione+bmo)

Neha, this is affecting Beta69 in the wild too. Any chance we can re-prioritize investigation?

Flags: needinfo?(nkochar)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #1)

Neha, this is affecting Beta69 in the wild too. Any chance we can re-prioritize investigation?

This is a diagnostic assert so it won't impact release or beta. The crashes are probably on devedition (or wherever we actually run MOZ_DIAGNOSTIC_ASSERTs)
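
For context, a minimal C++ sketch (an illustration, not the actual JSWindowActor code) of why this signature only shows up on channels that build with diagnostic asserts enabled:

    // Illustration only: MOZ_DIAGNOSTIC_ASSERT is fatal when
    // MOZ_DIAGNOSTIC_ASSERT_ENABLED is defined (Nightly/DevEdition builds)
    // and is compiled out of optimized Beta/Release builds, so users on
    // those channels never produce this crash signature.
    #include "mozilla/Assertions.h"

    void ReceiveRawMessageSketch(bool aDecodedOk) {
      // Same assertion message as in the crash reports.
      MOZ_DIAGNOSTIC_ASSERT(aDecodedOk, "Should not receive non-decodable data");
    }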

Flags: needinfo?(nkochar)

We'll add more logging in 70 for this assert. But this shouldn't block 69.

Ah indeed, the 69 reports are all from DevEdition. Thanks!

John, could you look into adding more logging to debug this further?

Flags: needinfo?(jdai)

(In reply to Neha Kochar [:neha] from comment #5)

John, could you look into adding more logging to debug this further?

Sure. I'll take a look.

Flags: needinfo?(jdai)
Assignee: nobody → jdai
Status: NEW → ASSIGNED

Roll some unfixed bugs from Fission Milestone M4 to M5

0ee3c76a-bc79-4eb2-8d12-05dc0b68e732

Fission Milestone: M4 → M5

John, do you have any updates on this crash? We're still seeing about 10-20 crash reports per day.

kmag thinks we might be trying to send an invalid BrowsingContext. We should add more logging to help diagnose.

Curiously, 99.9% of the reports for this crash signature over the last six months (1419 out of 1420) are x86-64, compared to 83% x86-64 for all other Firefox Nightly crashes.

Flags: needinfo?(jdai)
OS: Linux → All
Hardware: Unspecified → x86_64

Hi Chris,
I am going to add more logging to help diagnose. Thank you.

Flags: needinfo?(jdai)

We have shipped our last beta for 71, but the crash volume is low to medium, so I am marking this as fix-optional in case a safe uplift is possible in a dot release as a ride-along.

This only crashes on Nightly and DevEdition; diagnostic asserts are disabled on Beta and Release, so updating status.

(In reply to Chris Peterson [:cpeterson] from comment #8)

John, do you have any updates on this crash? We're still seeing about 10-20 crash reports per day.

kmag thinks we might be trying to send an invalid BrowsingContext. We should add more logging to help diagnose.

Curiously, 99.9% of the reports for this crash signature over the last six months (1419 out of 1420) are x86-64, compared to 83% x86-64 for all other Firefox Nightly crashes.

Bug 1580176 is for adding MOZ_LOG support to JSWindowActor; we can use bug 1580176 to track all of the JSWindowActor logging work. Once bug 1580176 is fixed, I can help diagnose this crash.

Depends on: 1623981
Depends on: 1623989

Nika and kmag say this crash is likely caused by sending a discarded BrowsingContext. Deferring to Fission Nightly (M6) because this crash is low volume.

Some new bugs to help diagnose IPC message crashes like this:

  • bug 1623981 to replace MOZ_DIAGNOSTIC_ASSERT with a MozCrashPrintf that reports the name of the crashing message
  • bug 1623989 to add a MaybeDiscarded-like wrapper for sending BrowsingContext from JS

Unlinking JSWindowActor logging bug 1580176 because Nika says it won't help diagnose this crash.
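
A rough sketch of the idea behind bug 1623981 in the list above (the function and wiring here are hypothetical, not the landed patch): replace the bare assert with a crash whose reason string names the failing actor and message, so the crash reports become actionable.

    // Hypothetical illustration of the bug 1623981 idea, not actual Gecko code.
    #include "mozilla/Assertions.h"
    #include "nsString.h"

    void CrashOnUndecodableMessage(const nsCString& aActorName,
                                   const nsCString& aMessageName) {
    #ifdef MOZ_DIAGNOSTIC_ASSERT_ENABLED
      // The formatted string becomes the crash reason, so the report shows
      // which actor/message failed to decode instead of a generic assertion.
      MOZ_CRASH_UNSAFE_PRINTF("Should not receive non-decodable data (%s::%s)",
                              aActorName.get(), aMessageName.get());
    #endif
    }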

Fission Milestone: M5 → M6b
No longer depends on: 1580176
Crash Signature: [@ mozilla::dom::JSWindowActor::ReceiveRawMessage] → [@ mozilla::dom::JSWindowActor::ReceiveRawMessage] [@ mozilla::dom::JSActor::ReceiveRawMessage]

Hi Calixte,
Is the crash still happening? Do you have a recent crash report for me to investigate? Thank you.

Flags: needinfo?(cdenizet)

It does; there's a table and links in the "crash data" section of the bug page. See e.g. bp-13ce9caa-8ee9-4676-bdd8-9b95c0200616

Flags: needinfo?(cdenizet)

I looked through a bunch of these reports and wrote down the actor and message name. The bulk of them were in Conduits, and most of the Conduits messages were RuntimeMessage and RunListener.

Conduits messages:

  • RunListener (12 times)
  • RuntimeMessage (10 times)
  • PortConnect (twice)
  • CallResult (once)

Lots of them were in a preallocated process. I don't know if that's meaningful.

Here are some other actors that showed up in these crashes (and their message):

  • UnselectedTabHover (Browser:UnselectedTabHover, three times)
  • BrowserTab (Browser:Reload, three times)
  • BrowserElement (PermitUnload)
  • BrowserTab (Browser:AppTab)
  • AutoComplete (FormAutoComplete:HandleEnter)

Here's an example of a crash with Conduits and RunListener: https://crash-stats.mozilla.org/report/index/a898751d-4291-4b7e-994d-09bf10200616

Tom, could you look into this further, using :mccr8's info above?

Flags: needinfo?(tomica)

I couldn't find any instance where we send a BrowsingContext in the extension framework.

Other than that, I don't know of anything that we might be sending that would cause crashes. We do send extension-provided data, but we have it serialized into StructuredCloneHolders, so other than the size of those messages, anything that would cause issues on deserialization would presumably throw when we do the serialization in the first place.

I don't have any other leads here, except maybe to prioritize bug 1605098 so that we get a bit more info in crash reports that include message names.

Flags: needinfo?(tomica)
Depends on: 1605098

According to the previous comments, we need more information to move this bug forward. There's no clear action we can take, so I'm unassigning John for now. Please feel free to reach out or re-assign.

Assignee: jdai → nobody
Status: ASSIGNED → NEW
Crash Signature: [@ mozilla::dom::JSWindowActor::ReceiveRawMessage] [@ mozilla::dom::JSActor::ReceiveRawMessage] → [@ mozilla::dom::JSWindowActor::ReceiveRawMessage] [@ mozilla::dom::JSActor::ReceiveRawMessage] [@ mozilla::dom::JSActorManager::ReceiveRawMessage]

Collect JSActorName/Message info

Flags: needinfo?(rjesup)

Nika said she'll look into this.

Flags: needinfo?(nika)

I looked through a few of the reports here, and a large number of them seem to have the JSOutOfMemory annotation set to "Reported" on them, which somewhat implies that these assertion failures are being caused by an OOM while deserializing the structured clone data.

It would be nice if an OOM while deserializing structured clone data here could produce a different error from other types of deserialization failures, so we can handle them differently.
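
A hedged sketch of the distinction being asked for here (the enum and handler are hypothetical, not existing Gecko code): surface the decode-failure reason so OOMs can be treated differently from genuinely malformed data.

    // Hypothetical types and names, purely to illustrate the suggestion above.
    #include "mozilla/Assertions.h"

    enum class DecodeFailure {
      OutOfMemory,  // the JS engine reported an OOM while deserializing
      BadData,      // the structured clone data itself could not be decoded
    };

    void HandleDecodeFailure(DecodeFailure aReason) {
      if (aReason == DecodeFailure::OutOfMemory) {
        // Nothing actionable in the report; drop the message (or use an
        // OOM-specific crash signature) instead of the generic assertion.
        return;
      }
      // Malformed data still indicates a real bug worth investigating.
      MOZ_DIAGNOSTIC_ASSERT(false, "Should not receive non-decodable data");
    }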

Flags: needinfo?(nika)

Neha asked me to look at this.

Nika's point about JSOutOfMemory being set is a good one, and does seem like a plausible explanation. One line of investigation here would be to get a better error annotation. The structured clone code seems to generate more detailed error information in ReportDataCloneError, but right now we're not recording that in the annotation. Maybe we could even explicitly check if there's a pending OOM exception? I'm not sure where that gets cleaned up. The goal of this line of investigation would be to confirm that these are OOM crashes, or if they aren't, try to figure out what they are. I saw one report that had only 20MB of physical memory free, but I saw another one that seemed to have plenty of memory and pagefile (the latter being common for low memory situations). It would also be nice to know how large the allocation was that failed, but structured clone code lives in JS, and it is hard to find that out in SpiderMonkey, in my experience.

Another line of investigation would be to split up these crashes by actor type. The crash reports already contain JSActorName and JSActorMessage fields, which is good, but as far as I can see these are not indexed in a way that lets you do any aggregation. Maybe we could get the Socorro people to index these fields so a query could facet on these fields. If that proves to be useful, maybe the signature could reflect the actor name and the message. If this is really an OOM crash, maybe some actor is sending way too much data, so it would make more sense to ascribe the crash to the specific actor and not the general window actor infrastructure.

I looked at the first 20 reports, and the actor and message were as follows:
  • 7 BrowserTab, Browser:Reload
  • 5 Conduits, PortMessage
  • 3 AboutReader, Reader:PushState
  • 3 Conduits, RuntimeMessage
  • 1 Conduits, CallResult
  • 1 ExtensionContent, Execute

A third line of investigation would be to look over the actor messages that show up a lot (in this or another sample of crashes) and figure out if there's something we could do to make them smaller.

I guess I looked at the messages before. Browser:Reload is kind of interesting to see because as far as I can see it only sends an integer and a boolean, so maybe the contents of the message aren't to blame for whatever is happening, at least if it is an OOM.

kmag will add payload size so we get more info from these crash reports.

Assignee: nobody → kmaglione+bmo
Status: NEW → ASSIGNED
Fission Milestone: M6b → M6c
Flags: needinfo?(rjesup)

Visiting https://v8.github.io/test262/website/default.html# , clicking Run, clicking Run All and allowing the tests to run will trigger this crash semi-reliably:

==2160546==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000000 (pc 0x7f01e8f7bd5a bp 0x7ffcfa6983f0 sp 0x7ffcfa697fa0 T0)
==2160546==The signal is caused by a WRITE memory access.
==2160546==Hint: address points to the zero page.
    #0 0x7f01e8f7bd5a in mozilla::dom::JSActorManager::ReceiveRawMessage(mozilla::dom::JSActorMessageMeta const&, mozilla::dom::ipc::StructuredCloneData&&, mozilla::dom::ipc::StructuredCloneData&&) /builds/worker/checkouts/gecko/dom/ipc/jsactor/JSActorManager.cpp:161:5
    #1 0x7f01e8f5693c in mozilla::dom::WindowGlobalChild::RecvRawMessage(mozilla::dom::JSActorMessageMeta const&, mozilla::dom::ClonedMessageData const&, mozilla::dom::ClonedMessageData const&) /builds/worker/checkouts/gecko/dom/ipc/WindowGlobalChild.cpp:561:3
    #2 0x7f01e298397f in mozilla::dom::PWindowGlobalChild::OnMessageReceived(IPC::Message const&) /builds/worker/workspace/obj-build/ipc/ipdl/PWindowGlobalChild.cpp:1175:61
    #3 0x7f01e22a08ae in mozilla::dom::PContentChild::OnMessageReceived(IPC::Message const&) /builds/worker/workspace/obj-build/ipc/ipdl/PContentChild.cpp:8621:32
    #4 0x7f01e20a24e8 in mozilla::ipc::MessageChannel::DispatchAsyncMessage(mozilla::ipc::ActorLifecycleProxy*, IPC::Message const&) /builds/worker/checkouts/gecko/ipc/glue/MessageChannel.cpp:2150:25
    #5 0x7f01e209eba2 in mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message&&) /builds/worker/checkouts/gecko/ipc/glue/MessageChannel.cpp:2074:9
    #6 0x7f01e20a084a in mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask&) /builds/worker/checkouts/gecko/ipc/glue/MessageChannel.cpp:1922:3
    #7 0x7f01e20a0e8d in mozilla::ipc::MessageChannel::MessageTask::Run() /builds/worker/checkouts/gecko/ipc/glue/MessageChannel.cpp:1953:13
    #8 0x7f01e0dd4757 in mozilla::RunnableTask::Run() /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:245:16
    #9 0x7f01e0dd0404 in mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:515:26
    #10 0x7f01e0dcde65 in mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex&> const&) /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:374:15
    #11 0x7f01e0dce3f7 in mozilla::TaskController::ProcessPendingMTTask(bool) /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:171:36
    #12 0x7f01e0dd9a71 in operator() /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:85:37
    #13 0x7f01e0dd9a71 in mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal()::$_3>::Run() /builds/worker/workspace/obj-build/dist/include/nsThreadUtils.h:577:5
    #14 0x7f01e0df85c2 in nsThread::ProcessNextEvent(bool, bool*) /builds/worker/checkouts/gecko/xpcom/threads/nsThread.cpp:1197:14
    #15 0x7f01e0e025f1 in NS_ProcessNextEvent(nsIThread*, bool) /builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp:513:10
    #16 0x7f01e20a9e47 in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) /builds/worker/checkouts/gecko/ipc/glue/MessagePump.cpp:87:21
    #17 0x7f01e1fc2312 in RunInternal /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:334:10
    #18 0x7f01e1fc2312 in RunHandler /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:327:3
    #19 0x7f01e1fc2312 in MessageLoop::Run() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:309:3
    #20 0x7f01e987313a in nsBaseAppShell::Run() /builds/worker/checkouts/gecko/widget/nsBaseAppShell.cpp:137:27
    #21 0x7f01ed83844f in XRE_RunAppShell() /builds/worker/checkouts/gecko/toolkit/xre/nsEmbedFunctions.cpp:913:20
    #22 0x7f01e1fc2312 in RunInternal /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:334:10
    #23 0x7f01e1fc2312 in RunHandler /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:327:3
    #24 0x7f01e1fc2312 in MessageLoop::Run() /builds/worker/checkouts/gecko/ipc/chromium/src/base/message_loop.cc:309:3
    #25 0x7f01ed837ce1 in XRE_InitChildProcess(int, char**, XREChildData const*) /builds/worker/checkouts/gecko/toolkit/xre/nsEmbedFunctions.cpp:744:34
    #26 0x55ebc515dac8 in content_process_main /builds/worker/checkouts/gecko/browser/app/../../ipc/contentproc/plugin-container.cpp:56:28
    #27 0x55ebc515dac8 in main /builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp:304:18
    #28 0x7f01f7d03041 in __libc_start_main (/lib64/libc.so.6+0x27041)
    #29 0x55ebc50b09a8 in _start (/home/geeknik/firefox/firefox-bin+0xb69a8)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /builds/worker/checkouts/gecko/dom/ipc/jsactor/JSActorManager.cpp:161:5 in mozilla::dom::JSActorManager::ReceiveRawMessage(mozilla::dom::JSActorMessageMeta const&, mozilla::dom::ipc::StructuredCloneData&&, mozilla::dom::ipc::StructuredCloneData&&)
==2160546==ABORTING

This also appears in the console if helpful:
Assertion failure: false (Should not receive non-decodable data), at /builds/worker/checkouts/gecko/dom/ipc/jsactor/JSActorManager.cpp:161

It appears to be the result-get-matched-err test: it triggers the slow script warning, and the tab will crash if you don't do anything about it. Even if you do click "Stop It", the tab will likely crash. Whilst the browser "hangs", memory use eventually reaches 100%, at which point the tab crashes? Last image seen before the tab crashed: https://i.imgur.com/7e4dfhs.png

I wonder if bug 1660539 is related to this.

Could be, I haven’t been able to reproduce this since bug 1660539 was fixed.

I just experienced this with Fission enabled on today's Nightly on Linux, on a Google Doc: bp-011a96d9-0075-4352-ae51-d21fb0201201

Sylvestre, were you running out of memory?

In the (non-public) details tab of your crash report, I see

  • JSActorMessage Execute
  • JSActorName ExtensionContent
  • JSOutOfMemory Reported

The "Execute" message is certainly expected to be "decodable"; it is a JSON-serializable object.

Unlikely; this is the system on which I had the issue:

% cat /proc/meminfo
MemTotal:       65723420 kB
MemFree:        20876544 kB
MemAvailable:   51710984 kB
Buffers:         1378132 kB
Cached:         30175740 kB
SwapCached:        45928 kB
Active:         28126880 kB

Hello,
A process on my Firefox session just crashed.
And via about:crashes, I've been redirected to this bug.

So, here are STR (steps to reproduce); I hope this may help:

  • On 64-bit Windows, launch the latest Nightly (in this case, 2020-12-20).
  • Launch a live video on YouTube.
  • Watch this live stream for one hour to one and a half hours.

Obtained result:
During the live stream, memory rose to the maximum available limit (on my PC, around 10 GB on a machine with 12 GB of RAM).

Expected result:
Memory remains at a nominal level.

This is not the first time I have had this crash on a live video on YouTube, but it never occurs on a "classic" (i.e., not live) YouTube video.

I don't see evidence of frequent OOM conditions in the crash reports save for a few (~5% of them). Can you point us to your particular report? If Firefox ran out of memory we might find a memory report attached which could help us diagnose the issue.

Flags: needinfo?(lolo2bdx)
Assignee: kmaglione+bmo → continuation

Hello Gabriele,
I have been able to reproduce the conditions just before the crash (Firefox uses a lot of memory while a live video on YouTube is played), and I have obtained a memory report, so here it is.
And I have obtained a memory report, so here it is.

Flags: needinfo?(lolo2bdx)
Attachment #9197591 - Attachment description: High memory used by Firefox on a live video on outube → High memory used by Firefox on a live video on Youtube

Thanks, this is extremely useful. The process playing YouTube isn't particularly large, but the extension process is huge: it's taking almost 1.5 GiB of memory on its own. Looking at the various bits under that, it seems that VideoDownloadHelper has allocated and never freed hundreds of megabytes of strings. So it seems to be leaking memory somehow; can you try disabling the extension and seeing if the problem goes away? I'll inspect the other crashes to see if they're also using the same extension.

[edit] I misinterpreted the memory report: it's not 2-byte strings, it's TwoByte strings, i.e. non-Latin1 Unicode strings.

I inspected a few more memory reports in the crashes and I don't see a pattern, unfortunately. The report attached in comment 37 is definitely a leak, so this crash might also be triggered by OOM-like conditions, or they might make it more likely. I poked a few more crashes for URLs and comments, and it seems that pages with videos (Facebook feeds, YouTube, and other streaming services) are more common than others, but there's no clear pattern.

Is the value of the data that has been received by JSActorManager::ReceiveRawMessage() important? If it is I can crack open a few minidumps and see if I can extract some useful samples.

We have bug 1686267 on file regarding VideoDownloadHelper memory spiraling out of control, so you could look at that.

(In reply to Gabriele Svelto [:gsvelto] from comment #39)

Is the value of the data that has been received by JSActorManager::ReceiveRawMessage() important? If it is I can crack open a few minidumps and see if I can extract some useful samples.

I don't know if the data per se is important, but we are interested in the size of the message being received, or the size of the structured clone data.

I think we should reconsider tracking bug 1563825 as part of Fission m6c. The crash is showing up in Developer edition, where Fission can't be enabled, and only about 10% of crashes with this signature on Nightly have Fission enabled over the last month. If you look across all crashes on Nightly in the last month, 26% of them have Fission enabled. This might be skewed a bit due to the recent high frequency crash that affected Fission more, but it still suggests this isn't a Fission-specific problem, but rather a problem with infrastructure introduced to support Fission. As such, it feels like it shouldn't block Fission rollout.

Fission Milestone: M6c → ?

(In reply to Andrew McCreight [:mccr8] from comment #41)

I think we should reconsider tracking bug 1563825 as part of Fission m6c. The crash is showing up in Developer edition, where Fission can't be enabled, and only about 10% of crashes with this signature on Nightly have Fission enabled over the last month. If you look across all crashes on Nightly in the last month, 26% of them have Fission enabled. This might be skewed a bit due to the recent high frequency crash that affected Fission more, but it still suggests this isn't a Fission-specific problem, but rather a problem with infrastructure introduced to support Fission. As such, it feels like it shouldn't block Fission rollout.

Clearing Fission Milestone because Nika says this is not a Fission-specific bug.

26% of crash reports had Fission enabled in the last month, but that's expected for almost any crash because Fission is enabled for about 20-25% of Nightly users.

Fission Milestone: ? → ---
Whiteboard: [fission-]

Not a Fission bug

Whiteboard: [fission-] → [not-a-fission-bug]

Too late for 86 RC, but I am keeping the status for Firefox as fix-optional, as I would probably take a safe patch in a potential dot release.

I opened up a couple of minidumps but had a hard time figuring out where to find the size of the data being cloned. What field should I be looking at? In one minidump aData.mStorage.val.mExternalData.bufList_ has a mSize field of ~8KiB, in the other 1.5KiB. In both cases drilling down the error object I find a mErrorNumber field set to MSG_INVALID_ENUM_VALUE. Is any of this useful?

ni?mccr8 in case comment 45 helps.

Flags: needinfo?(continuation)

I don't know what comment 45 means.

Flags: needinfo?(continuation)
Severity: critical → S2

I'm not going to have time to look at this soon.

Assignee: continuation → nobody
Status: ASSIGNED → NEW
QA Whiteboard: qa-not-actionable
Crash Signature: [@ mozilla::dom::JSWindowActor::ReceiveRawMessage] [@ mozilla::dom::JSActor::ReceiveRawMessage] [@ mozilla::dom::JSActorManager::ReceiveRawMessage] → [@ mozilla::dom::JSWindowActor::ReceiveRawMessage] [@ mozilla::dom::JSActor::ReceiveRawMessage] [@ mozilla::dom::JSActorManager::ReceiveRawMessage] [@ OOM | unknown | mozilla::dom::JSActorManager::ReceiveRawMessage]

These crashes are all fallout from JS OOMs. Even if we ignore the message decoding errors instead of crashing, the browser is probably going to crash elsewhere soon or we will miss an important message, leaving the content process in an inconsistent state.

Has Regression Range: --- → yes

(In reply to Chris Peterson [:cpeterson] from comment #49)

These crashes are all fallout from JS OOMs. Even if we ignore the message decoding errors instead of crashing, the browser is probably going to crash elsewhere soon or we will miss an important message, leaving the content process in an inconsistent state.

I assume this makes it less severe.

FWIW, I see very few crashes like https://crash-stats.mozilla.org/report/index/fd18037f-32b8-4cd8-aaaa-926050220301 in release that seem to indicate that while trying to do something with the mPendingQueries of an actor returned from GetActor we have a nullptr access. And there seems to be a ::new on that stack, too.

Severity: S2 → S3
Priority: P2 → P3

Just hit this crash, in a mozilla.org process that was using >5.8 GB of memory (per system sysinfo). It may have happened when I ran about:memory, which requires allocating memory to return a result, which may have triggered a JS OOM.

See Also: → 1764415

(In reply to Randell Jesup [:jesup] (needinfo me) from comment #51)

Just hit this crash, in a mozilla.org process that was using >5.8 GB of memory (per system sysinfo). It may have happened when I ran about:memory, which requires allocating memory to return a result, which may have triggered a JS OOM.

Yup, this crash tends to occur due to a JS heap OOM while deserializing.

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 desktop browser crashes on nightly

:janv, could you consider increasing the severity of this top-crash bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jvarga)
Keywords: topcrash

I wonder if we should stop asserting when the failure is because of an OOM. I don't like the idea of continuing with a child process when it's failed to process a message, since that could mean its state could be out of sync with the parent in dangerous ways. But this isn't a release assert, so it isn't helping release users at all. And the number of crashes from OOMs means we don't actually see any reports that failed to deserialize the message for other reasons that we can actually fix...

(In reply to Kris Maglione [:kmag] from comment #54)

I wonder if we should stop asserting when the failure is because of an OOM. I don't like the idea of continuing with a child process when it's failed to process a message, since that could mean its state could be out of sync with the parent in dangerous ways. But this isn't a release assert, so it isn't helping release users at all. And the number of crashes from OOMs means we don't actually see any reports that failed to deserialize the message for other reasons that we can actually fix...

If I read StructuredCloneHolder::ReadFromBuffer correctly, it seems we get a specific error message from JS but always throw a DataCloneError.

IIUC, that makes it difficult to just check for OOM and exclude that case from the assertion, which might still be of interest in other cases?

Duplicate of this bug: 1751391

(In reply to Jens Stutte [:jstutte] from comment #55)

If I read StructuredCloneHolder::ReadFromBuffer correctly, it seems we get a specific error message from JS but always throw a DataCloneError.

IIUC, that makes it difficult to just check for OOM and exclude that case from the assertion, which might still be of interest in other cases?

Yes. I looked into it after I made the suggestion and came to the conclusion that the simplest thing would be to just check whether the OOM reported flag was set. Unfortunately, even the error we get from the JS engine is not very specific, and the situation isn't very easy to improve. The spec says that we need to throw a DataCloneError, but it would be nice if internal consumers could still get more specific error details when they want them.

Spike in crashes over the last 2 days aligns with the spike in bug 1405521.

See Also: → 1405521
See Also: → 1803675

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit auto_nag documentation.

Keywords: topcrash

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 10 desktop browser crashes on nightly

:aiunusov, could you consider increasing the severity of this top-crash bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(aiunusov)
Keywords: topcrash

Still need to collect more information.

Flags: needinfo?(aiunusov)
Component: DOM: Content Processes → DOM: Navigation
Flags: needinfo?(jvarga)

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit auto_nag documentation.

Keywords: topcrash

Sorry for removing the keyword earlier but there is a recent change in the ranking, so the bug is again linked to a topcrash signature, which matches the following criteria:

  • Top 10 desktop browser crashes on nightly
  • Top 10 content process crashes on beta

For more information, please visit auto_nag documentation.

Keywords: topcrash

Ideally, we don't want to continue running a child process if it fails to
handle a message from the parent, since that could mean child and parent state
could get out of sync. But since this assertion is only a diagnostic assert,
it isn't guaranteeing that in release builds anyway. And since the vast
majority of the crashes we are seeing in builds with diagnostic asserts
enabled appear to be OOMs, we can't really use crash reports to diagnose other
issues.

Ideally (again), we'd determine if the failure was caused by an OOM based on
the failure code returned by the structured clone decode call. Unfortunately,
though, since the spec requires that we return a generic DataCloneError on
failure, the structured clone code intentionally hides the specifics of
failure from callers. Propagating out more specific failure reasons for use by
privileged callers is nontrivial. So this patch essentially does the same
thing as crash reports do, and checks whether an OOM was reported recently,
and hasn't been recovered from by a successful GC.
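
A minimal sketch of the behavior the patch describes (the helper below is a hypothetical stand-in; the real code derives this from the existing JS OOM reporting and GC-recovery state):

    // Sketch only. RecentUnrecoveredJSOOM() stands in for "a JS OOM was
    // reported and no successful GC has recovered from it since".
    #include "mozilla/Assertions.h"

    static bool RecentUnrecoveredJSOOM() {
      return false;  // hypothetical; see the patch description above
    }

    void OnRawMessageDecodeFailure() {
      if (RecentUnrecoveredJSOOM()) {
        // Almost certainly an OOM during structured clone decode; dropping
        // the message avoids a crash report we cannot act on.
        return;
      }
      // Any other decode failure still trips the diagnostic assert on
      // channels that enable it, so real protocol bugs remain visible.
      MOZ_DIAGNOSTIC_ASSERT(false, "Should not receive non-decodable data");
    }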

Assignee: nobody → kmaglione+bmo
Status: NEW → ASSIGNED
Pushed by maglione.k@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/3b1e0a2fb06b
Don't assert on failure to decode message after OOM. r=mccr8
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 114 Branch
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 114 Branch → ---

Based on the topcrash criteria, the crash signatures linked to this bug are not in the topcrash signatures anymore.

For more information, please visit BugBot documentation.

Keywords: topcrash
Flags: needinfo?(kmaglione+bmo)
See Also: → 1836195
Assignee: kmaglione+bmo → nobody