Crash in [@ mozilla::ipc::MessageChannel::Send | mozilla::ipc::IPDLResolverInner::ResolveOrReject | IPC_Message_Name=PContent::Reply_FlushFOGData] (recent regression)
Categories
(Toolkit :: Telemetry, defect, P1)
Tracking
()
Version | Tracking | Status |
---|---|---|
firefox-esr91 | --- | unaffected |
firefox95 | --- | wontfix |
firefox96 | --- | fixed |
firefox97 | --- | fixed |
People
(Reporter: ole+mozilla, Assigned: chutten)
References
Details
(Whiteboard: qa-not-actionable)
Crash Data
Attachments
(1 file)
48 bytes, text/x-phabricator-request | diannaS: approval-mozilla-beta+
Since Firefox 94.0 I have been experiencing regular, though not very frequent, crashes, often after a page has been open for a long time.
https://crash-stats.mozilla.org/signature/?product=Firefox&signature=mozilla%3A%3Aipc%3A%3AMessageChannel%3A%3ASend%20%7C%20mozilla%3A%3Aipc%3A%3AIPDLResolverInner%3A%3AResolveOrReject%20%7C%20IPC_Message_Name%3DPContent%3A%3AReply_FlushFOGData&date=%3E%3D2021-06-01T17%3A43%3A00.000Z&date=%3C2021-12-12T17%3A43%3A00.000Z shows 145 crashes with this signature, so it seems like a ~recent regression happening in Windows 7 - Windows 11.
Maybe Fission related. (DOMFissionEnabled=1)
Crash report: https://crash-stats.mozilla.org/report/index/45d1f8bc-dc9a-4ff9-baac-777e90211212
MOZ_CRASH Reason: MOZ_CRASH(IPC message size is too large)
Top 10 frames of crashing thread:
0 xul.dll mozilla::ipc::MessageChannel::Send ipc/glue/MessageChannel.cpp:888
1 xul.dll mozilla::ipc::IPDLResolverInner::ResolveOrReject ipc/glue/ProtocolUtils.cpp:944
2 xul.dll std::_Func_impl_no_alloc<`lambda at /builds/worker/workspace/obj-build/ipc/ipdl/PContentChild.cpp:15893:45', void, mozilla::ipc::ByteBuf&&>::_Do_call
3 xul.dll mozilla::glean::FlushFOGData toolkit/components/glean/ipc/FOGIPC.cpp:64
4 xul.dll mozilla::dom::PContentChild::OnMessageReceived ipc/ipdl/PContentChild.cpp:15910
5 xul.dll mozilla::ipc::MessageChannel::DispatchMessage ipc/glue/MessageChannel.cpp:1968
6 xul.dll mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal xpcom/threads/TaskController.cpp:771
7 xul.dll mozilla::TaskController::ProcessPendingMTTask xpcom/threads/TaskController.cpp:391
8 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1175
9 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:107
First crash with this signature on my system:
https://crash-stats.mozilla.org/report/index/51777331-d720-42a9-bc8f-b52d70211111
Assignee | ||
Comment 1•2 years ago
|
||
The limit is like 256MB, how in the world are we hitting that. Ugh.
Assignee | ||
Comment 2•2 years ago
|
||
(I say "how in the world", but I have a pretty good idea it's the same thing inflating our db size in bug 1743683.)
Assignee | ||
Comment 3•2 years ago
|
||
Notes to self:
- Fixing this will increase the db size issue by sending more data (though not much given the frequencies of these crashes)
- I should send out an email to FOG data consumers to warn them that there is some missing data due to crashes
Assignee | ||
Comment 4•2 years ago
|
||
Pushed by chutten@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/383986e2f5cb Flush FOG IPC every 100k samples r=TravisLong
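The patch title describes flushing after a fixed number of samples instead of letting the payload grow without bound. A minimal sketch of that general idea follows. It is not the FOG implementation: `SampleBuffer`, `record`, `flushNow`, and the `flush` callback are hypothetical names, and only the 100k threshold comes from the patch title.

```javascript
// Hypothetical sketch: hand the buffered samples to a flush callback once a
// fixed number of samples has accumulated. Not taken from the FOG source.
const FLUSH_THRESHOLD = 100_000; // "every 100k samples", per the patch title

class SampleBuffer {
  constructor(flush, threshold = FLUSH_THRESHOLD) {
    this.flush = flush; // called with each full (or final partial) batch
    this.threshold = threshold;
    this.samples = [];
  }

  record(sample) {
    this.samples.push(sample);
    if (this.samples.length >= this.threshold) {
      this.flushNow();
    }
  }

  flushNow() {
    if (this.samples.length === 0) {
      return; // nothing buffered, so nothing to send
    }
    const batch = this.samples;
    this.samples = [];
    this.flush(batch);
  }
}

// Usage: record 250k samples, then flush whatever is left over.
const batchSizes = [];
const buffer = new SampleBuffer(batch => batchSizes.push(batch.length));
for (let i = 0; i < 250_000; i++) {
  buffer.record(i);
}
buffer.flushNow();
console.log(batchSizes); // [ 100000, 100000, 50000 ]
```

Capping the batch by sample count keeps each message bounded as long as individual samples are small, which is the assumption this sketch relies on.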
Comment 6•2 years ago
|
||
bugherder |
Assignee | ||
Comment 7•2 years ago
|
||
Comment on attachment 9255091 [details]
Bug 1745660 - Flush FOG IPC every 100k samples r?TravisLong!
Beta/Release Uplift Approval Request
- User impact if declined: Unlikely but unavoidable tab crash
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Medium
- Why is the change risky/not risky? (and alternatives if risky): Medium risk because this crash has escaped us before and I have a healthy distrust for anything that touches IPC... that being said, this is tested and is very small so only has so much room for bugs.
- String changes made/needed:
Assignee | ||
Comment 8•2 years ago
|
||
For manual testing, here's the STR I used to verify this on Nightly. I don't think QE needs to follow this, but in case we end up needing it:
1) Enable remote debugging in devtools settings
2) Load about:glean and open a devtools console (privileged parent process JS)
3) Load a page that starts a content process (mozilla.org will do)
4) On the content-process-having tab, open Tools > Browser Tools > Browser Content Toolbox (privileged content process JS)
5) Open Tools > Browser Tools > Browser Console so you can see the logging
6) In the Browser Content Toolbox, run this code:
    Cu.importGlobalProperties(["Glean"]);
    const { setTimeout } = ChromeUtils.import("resource://gre/modules/Timer.jsm");
    const { console } = ChromeUtils.import("resource://gre/modules/Console.jsm");
    var iterationCount = 0;
    function iteration() {
      for (let i = 0; i < 2600; i++) {
        Glean.testOnlyIpc.anEvent.record({extra1: "A string that isn't 100 bytes but it's long enough to be annoying. Oh, okay, let's make it 96B."});
      }
      if (iterationCount < 1000) {
        iterationCount++;
        setTimeout(iteration, 0);
      } else {
        console.log("DONE!");
      }
    }
This will set up all the things we need to record over 256MB of data to the FOG IPC Payload, yielding the main thread (so IPC can happen) every 2600 * (100 + overhead) bytes.
7) In the same Browser Content Toolbox, run the now-set-up code with
    iteration()
8) After a short while (under 10min definitely) you should see DONE! in the Browser Console
9) If the tab hasn't yet crashed you can use the privileged parent process JS console to
    await Services.fog.testFlushAllChildren();
    Glean.testOnlyIpc.anEvent.testGetValue().length
If the tab crashed, congratulations, you've reproduced the bug. You are running a build without the fix.
If the tab doesn't crash, congratulations, you've not reproduced the bug. You should get a value of around 2602601 in the parent process JS console. You are running a build with the fix.
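The numbers in these steps can be checked with a little arithmetic. The sketch below is plain JavaScript with no Firefox APIs. It assumes that "256MB" means 256 * 1024 * 1024 bytes and uses the nominal 100 bytes per event from the comment above; the real per-event overhead is not stated in this bug, so the sketch only computes how large each event would have to be for the total to cross the limit.

```javascript
// Back-of-the-envelope check of the STR numbers.
const EVENTS_PER_ITERATION = 2600;
// iteration() runs for iterationCount = 0, 1, ..., 1000, so 1001 times.
const ITERATIONS = 1001;
// Nominal size from the "2600 * (100 + overhead)" note; overhead is unknown.
const NOMINAL_BYTES_PER_EVENT = 100;
// Assumption: "like 256MB" is read as 256 MiB.
const ASSUMED_LIMIT_BYTES = 256 * 1024 * 1024;

const totalEvents = EVENTS_PER_ITERATION * ITERATIONS;
const nominalPayloadBytes = totalEvents * NOMINAL_BYTES_PER_EVENT;
const nominalBytesPerYield = EVENTS_PER_ITERATION * NOMINAL_BYTES_PER_EVENT;
const bytesPerEventToExceedLimit = ASSUMED_LIMIT_BYTES / totalEvents;

console.log(totalEvents); // 2602600
console.log(nominalPayloadBytes); // 260260000
console.log(nominalBytesPerYield); // 260000
console.log(bytesPerEventToExceedLimit.toFixed(2)); // 103.14
```

Under these assumptions, the event count matches the expected value of around 2602601, and a few bytes of per-event overhead beyond the nominal 100 would be enough to push an unflushed payload past the limit.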
Comment 9•2 years ago
|
||
Comment on attachment 9255091 [details]
Bug 1745660 - Flush FOG IPC every 100k samples r?TravisLong!
Approved for 96.0b6
Comment 10•2 years ago
|
||
bugherder uplift |
Comment 11•2 years ago
|
||
Hi Chris, this patch was uplifted to beta and there is now an ESLint failure: https://treeherder.mozilla.org/logviewer?job_id=361502134&repo=mozilla-beta&lineNumber=127
Can you please take a look?
[task 2021-12-16T15:27:08.252Z] /builds/worker/checkouts/gecko/testing
[task 2021-12-16T15:27:08.252Z] /builds/worker/checkouts/gecko/layout
[task 2021-12-16T15:27:08.252Z] /builds/worker/checkouts/gecko/dom
[task 2021-12-16T15:27:08.252Z] /builds/worker/checkouts/gecko/chrome
[task 2021-12-16T15:27:08.252Z] /builds/worker/checkouts/gecko/xpfe
[task 2021-12-16T15:27:08.252Z] /builds/worker/checkouts/gecko/remote
[task 2021-12-16T15:27:08.253Z] /builds/worker/checkouts/gecko/config
[task 2021-12-16T15:27:08.253Z] /builds/worker/checkouts/gecko/memory
[task 2021-12-16T15:27:08.253Z] /builds/worker/checkouts/gecko/intl
[task 2021-12-16T15:27:08.253Z] /builds/worker/checkouts/gecko/caps
[task 2021-12-16T15:27:08.253Z] /builds/worker/checkouts/gecko/taskcluster
[task 2021-12-16T15:27:08.263Z] 15:27:08.262 eslint (93) | Command: /usr/local/bin/node /builds/worker/checkouts/gecko/node_modules/eslint/bin/eslint.js --ext [js,jsm,jsx,xul,html,xhtml,sjs] --format json --no-error-on-unmatched-pattern --ignore-pattern testing/mochitest/pywebsocket3 --ignore-pattern dom/media/webspeech/recognition/endpointer.cc --ignore-pattern testing/mochitest/MochiKit --ignore-pattern dom/media/platforms/ffmpeg/ffmpeg58 --ignore-pattern dom/webauthn/tests/pkijs --ignore-pattern dom/canvas/test/webgl-conf/checkout --ignore-pattern dom/media/gmp/widevine-adapter/content_decryption_module_proxy.h --ignore-pattern dom/media/gmp/widevine-adapter/content_decryption_module_ext.h --ignore-pattern testing/talos/talos/tests/kraken --ignore-pattern dom/imptests --ignore-pattern dom/media/webvtt/vtt.jsm --ignore-pattern testing/modules/ajv-6.12.6.js --ignore-pattern dom/media/webspeech/recognition/energy_endpointer.cc --ignore-pattern dom/u2f/tests/pkijs --ignore-pattern dom/tests/mochitest/ajax --ignore-pattern testing/mochitest/tests/MochiKit-1.4.2 --ignore-pattern dom/media/webspeech/recognition/energy_endpointer_params.h --ignore-pattern testing/web-platform/tests/tools/third_party --ignore-pattern intl/icu --ignore-pattern dom/media/gmp/widevine-adapter/content_decryption_module.h --ignore-pattern testing/xpcshell/dns-packet --ignore-pattern remote/test/puppeteer --ignore-pattern dom/tests/mochitest/dom-level2-html --ignore-pattern remote/cdp/test/browser/chrome-remote-interface.js --ignore-pattern dom/tests/mochitest/dom-level1-core --ignore-pattern intl/unicharutil/util/nsUnicodePropertyData.cpp --ignore-pattern testing/talos/talos/tests/dromaeo --ignore-pattern dom/media/platforms/ffmpeg/libav54 --ignore-pattern dom/media/platforms/ffmpeg/libav55 --ignore-pattern layout/docs/css-gap-decorations --ignore-pattern testing/gtest/gtest --ignore-pattern testing/xpcshell/odoh-wasm --ignore-pattern testing/xpcshell/node-http2 --ignore-pattern 
dom/media/webspeech/recognition/energy_endpointer.h --ignore-pattern testing/gtest/gmock --ignore-pattern intl/unicharutil/util/nsUnicodeScriptCodes.h --ignore-pattern dom/media/webrtc/transport/third_party --ignore-pattern dom/media/webaudio/test/blink --ignore-pattern dom/media/webspeech/recognition/energy_endpointer_params.cc --ignore-pattern dom/tests/mochitest/dom-level2-core --ignore-pattern dom/media/platforms/ffmpeg/ffmpeg57 --ignore-pattern dom/media/gmp/rlz --ignore-pattern testing/modules/sinon-7.2.7.js --ignore-pattern testing/web-platform/tests/resources/webidl2 --ignore-pattern testing/talos/talos/tests/v8_7 --ignore-pattern testing/mozbase/mozproxy/mozproxy/backends/mitm/scripts/catapult --ignore-pattern dom/media/webspeech/recognition/endpointer.h --ignore-pattern dom/webauthn/cbor-cpp --ignore-pattern dom/media/platforms/ffmpeg/libav53 --ignore-pattern testing/xpcshell/node-ip --ignore-pattern intl/unicharutil/util/nsSpecialCasingData.cpp --ignore-pattern dom/media/gmp/widevine-adapter/content_decryption_module_export.h /builds/worker/checkouts/gecko/gradle /builds/worker/checkouts/gecko/testing /builds/worker/checkouts/gecko/layout /builds/worker/checkouts/gecko/dom /builds/worker/checkouts/gecko/chrome /builds/worker/checkouts/gecko/xpfe /builds/worker/checkouts/gecko/remote /builds/worker/checkouts/gecko/config /builds/worker/checkouts/gecko/memory /builds/worker/checkouts/gecko/intl /builds/worker/checkouts/gecko/caps /builds/worker/checkouts/gecko/taskcluster
[task 2021-12-16T15:28:49.997Z] 15:28:49.997 eslint (94) | Finished in 101.93 seconds
[task 2021-12-16T15:30:44.456Z] 15:30:44.456 eslint (91) | Finished in 216.40 seconds
[task 2021-12-16T15:31:34.152Z] 15:31:34.152 eslint (93) | Finished in 266.09 seconds
[task 2021-12-16T15:33:33.730Z] 15:33:33.730 eslint (92) | Finished in 385.67 seconds
[task 2021-12-16T15:33:33.739Z] TEST-UNEXPECTED-ERROR | /builds/worker/checkouts/gecko/toolkit/components/glean/tests/xpcshell/test_FOGIPCLimit.js:19:5 | 'Services' is not defined. (no-undef)
[taskcluster 2021-12-16 15:33:34.287Z] === Task Finished ===
[taskcluster 2021-12-16 15:33:34.906Z] Unsuccessful task run with exit code: 1 completed in 476.507 seconds
Assignee | ||
Comment 12•2 years ago
|
||
Huh. Could've sworn Services was in scope. How do you want this :apavel? Another patch on the same stack? Should just need a
    const { Services } = ChromeUtils.import("resource://gre/modules/Services.jsm");
...though mozilla-beta probably doesn't have Services.fog (came in bug 1715542) and requires instead
    const FOG = Cc["@mozilla.org/toolkit/glean;1"].createInstance(Ci.nsIFOG);
    FOG.initializeFOG();
instead of Services.fog.initializeFOG();
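The two ways of reaching FOG mentioned in this comment could be wrapped in a feature check. The sketch below is hypothetical and is not what landed: `getFog` is an invented helper, and `Services`, `Cc`, and `Ci` are passed in as parameters, with stand-in objects, so the snippet runs outside Firefox. In a real xpcshell test they would be the usual privileged globals.

```javascript
// Hypothetical helper: prefer Services.fog when it exists, otherwise create
// the nsIFOG instance directly, as described in the comment above.
function getFog({ Services, Cc, Ci }) {
  if (Services && Services.fog) {
    return Services.fog;
  }
  return Cc["@mozilla.org/toolkit/glean;1"].createInstance(Ci.nsIFOG);
}

// Stand-ins so the sketch runs in plain JavaScript. These are NOT Gecko objects.
function makeFakeFog(label) {
  return {
    label,
    initialized: false,
    initializeFOG() {
      this.initialized = true;
    },
  };
}
const fakeCi = { nsIFOG: "nsIFOG" };
const fakeCc = {
  "@mozilla.org/toolkit/glean;1": {
    createInstance: iface => makeFakeFog(`created via ${iface}`),
  },
};

// A tree where Services.fog exists.
const newer = getFog({
  Services: { fog: makeFakeFog("Services.fog") },
  Cc: fakeCc,
  Ci: fakeCi,
});
// A tree where Services.fog is missing.
const older = getFog({ Services: {}, Cc: fakeCc, Ci: fakeCi });

newer.initializeFOG();
older.initializeFOG();
console.log(newer.label); // Services.fog
console.log(older.label); // created via nsIFOG
```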
Comment 13•2 years ago
|
||
bugherder uplift |