Closed Bug 1664151 Opened 1 year ago Closed 4 months ago

Crash in [@ mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | mozilla::dom::PContentChild::OnMessageReceived] IPDL error: "Error deserializing 'SharedMemoryHandle[]'". abort()ing as a result.

Categories

(Core :: IPC, defect)

Tracking

RESOLVED FIXED
91 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox83 --- disabled
firefox89 --- disabled
firefox90 --- fixed
firefox91 --- fixed

People

(Reporter: sg, Assigned: jfkthame, NeedInfo)

References

(Regression)

Details

(Keywords: crash, regression, Whiteboard: [sharedUASheetHandle])

Crash Data

Attachments

(1 file)

Crash report: https://crash-stats.mozilla.org/report/index/713f0728-a0e2-4c05-8b3c-90e550200910

Top 10 frames of crashing thread:

0 libxul.so mozilla::ipc::FatalError ipc/glue/ProtocolUtils.cpp:193
1 libxul.so mozilla::ipc::IProtocol::HandleFatalError const ipc/glue/ProtocolUtils.cpp:422
2 libxul.so mozilla::dom::PContentChild::OnMessageReceived ipc/ipdl/PContentChild.cpp:11541
3 libxul.so mozilla::ipc::MessageChannel::DispatchMessage ipc/glue/MessageChannel.cpp:2074
4 libxul.so mozilla::ipc::MessageChannel::MessageTask::Run ipc/glue/MessageChannel.cpp:1953
5 libxul.so mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal xpcom/threads/TaskController.cpp:512
6 libxul.so mozilla::detail::RunnableFunction<mozilla::TaskController::InitializeInternal xpcom/threads/nsThreadUtils.h:577
7 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1234
8 libxul.so mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:87
9 libxul.so MessageLoop::Run ipc/chromium/src/base/message_loop.cc:309

There's a recent spike with this signature on Nightly starting with 20200907214307, but only from a few installations.

I looked at about a half dozen of these crashes and they were all happening here:

            if ((!(ReadIPDLParam((&(msg__)), (&(iter__)), this, (&(sharedUASheetHandle)))))) {
                FatalError("Error deserializing 'SharedMemoryHandle?'");
                return MsgValueError;
            }

It looks like this is for the message SetXPCOMProcessAttributes, and this argument is related to user agent style sheets. Any ideas, Heycam? Thanks.

Maybe somebody has a really huge style sheet that is causing issues, or maybe somebody is having general memory issues and happens to be using these.

Component: IPC → CSS Parsing and Computation
Flags: needinfo?(cam)
Whiteboard: [sharedUASheetHandle]
Severity: -- → S2

I don't think it would be anything to do with the size of the UA style sheets (which are constant, anyway), because this is just the receiving of the handle value. Nothing stands out to me here, but it looks like the spike has gone away now.

Flags: needinfo?(cam)

There is an increased volume of these crashes in 83 beta 1 and 2.

SetXPCOMProcessAttributes is one of the first messages (or the first message, maybe?) sent to a content process.

Reading a shared memory handle doesn't really have any normal failure cases… but it could fail if the message ended unexpectedly. (IPC messages have their own length fields, so it really would have to have been sent with the wrong length for that to happen.) For example, if there were a version mismatch between processes that somehow wasn't caught by the build ID check, and the type of that message (or some type contained in it) were changed, that could possibly show up like this.
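
As a minimal sketch of the "message ended unexpectedly" failure mode described above: a bounds-checked reader fails as soon as the receiver expects more bytes than the sender wrote. `PayloadReader` and `ReadTwoFields` are hypothetical names invented for illustration and are not Gecko's actual IPC classes.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Hypothetical stand-in for an IPC message payload reader (not Gecko's API).
class PayloadReader {
 public:
  explicit PayloadReader(const std::vector<std::uint8_t>& payload)
      : mPayload(payload) {}

  // Reads one fixed-size value, or returns nullopt if the payload ends early.
  template <typename T>
  std::optional<T> Read() {
    assert(mOffset <= mPayload.size());
    if (mPayload.size() - mOffset < sizeof(T)) {
      return std::nullopt;  // the message ended unexpectedly
    }
    T value;
    std::memcpy(&value, mPayload.data() + mOffset, sizeof(T));
    mOffset += sizeof(T);
    return value;
  }

 private:
  const std::vector<std::uint8_t>& mPayload;
  std::size_t mOffset = 0;
};

// Hypothetical receiver that expects two 32-bit fields. If the sender was
// built against a message layout with fewer bytes, the second read fails.
bool ReadTwoFields(const std::vector<std::uint8_t>& payload) {
  PayloadReader reader(payload);
  return reader.Read<std::uint32_t>().has_value() &&
         reader.Read<std::uint32_t>().has_value();
}
```

With an 8-byte payload both reads succeed; with a 6-byte payload the second read reports failure, which is the point at which a real receiver would raise a deserialization error.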

So I still don't know what's going on here, but it's probably not about stylesheets.

Component: CSS Parsing and Computation → IPC

…also, I think this isn't sharedUASheetHandle, even though the line number seems to point to the start of the block dealing with it (and that also seems wrong, because we're supposed to be in the FatalError call, or maybe the instruction after it returns). The reports on Linux are Error deserializing 'SharedMemoryHandle[]', which suggests sharedFontListBlocks, the last member — assuming we're actually in the SetXPCOMProcessAttributes case.

There's also a Windows crash where the message is about SystemFontListEntry[], which is the type of the field before sharedUASheetHandle, but that one has no line number.

Duplicate of this bug: 1674038

Looking at the crash reports again, I notice that the Windows crashes with this signature mention a variety of types, including plain data but not shared memory handles; however, the Linux crashes are over an order of magnitude more numerous and they're all about SharedMemoryHandle arrays.

This lines up with bug 1674038 comment #9 about failing to receive file descriptors sent with SCM_RIGHTS.

If we'd gotten MSG_CTRUNC we should get this error message and raise an error on the channel and (I think?) not get as far as message dispatch. Unreported loss of fds is also detected similarly.

But we can get a deserialization error if a received fd is invalid (received as a negative number). So, maybe the other process sent an invalid fd?

(In reply to Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧ from comment #9)

But we can get a deserialization error if a received fd is invalid (received as a negative number). So, maybe the other process sent an invalid fd?

Doubtful. The opposite of that function, WriteFileDescriptor, is called only if the fd is valid — the serialized form has a bool for whether the fd is valid, and the fd is attached if and only if that's true.
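
The invariant described above (a validity bool in the payload, with a descriptor attached if and only if that bool is true) can be sketched roughly as follows. `SketchMessage`, `WriteHandle` and `ReadHandle` are hypothetical, simplified stand-ins, not the real WriteFileDescriptor/ReadFileDescriptor code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical, simplified message: validity flags travel in the payload,
// descriptors travel out of band (via SCM_RIGHTS on Linux).
struct SketchMessage {
  std::vector<bool> validFlags;
  std::vector<int> attachedFds;
  std::size_t nextFlag = 0;
  std::size_t nextFd = 0;
};

// Writes the flag, and attaches the fd only when it is valid.
void WriteHandle(SketchMessage& msg, int fd) {
  const bool valid = fd >= 0;
  msg.validFlags.push_back(valid);
  if (valid) {
    msg.attachedFds.push_back(fd);
  }
  // There can never be more attached descriptors than flags.
  assert(msg.attachedFds.size() <= msg.validFlags.size());
}

// Returns false on a protocol violation. On success, `out` holds the fd,
// or -1 if the sender serialized an invalid handle.
bool ReadHandle(SketchMessage& msg, int& out) {
  if (msg.nextFlag >= msg.validFlags.size()) {
    return false;  // payload ended early
  }
  const bool valid = msg.validFlags[msg.nextFlag++];
  if (!valid) {
    out = -1;
    return true;
  }
  if (msg.nextFd >= msg.attachedFds.size()) {
    return false;  // flag says "valid" but no descriptor arrived
  }
  const int fd = msg.attachedFds[msg.nextFd++];
  if (fd < 0) {
    return false;  // descriptor arrived as a negative number
  }
  out = fd;
  return true;
}
```

Under this scheme a well-behaved sender cannot produce a "valid" flag without a descriptor, so a read failure implies that descriptors were lost or the two sides disagree about the message contents.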

So… in theory this shouldn't happen, unless there was data corruption, or an undetected build mismatch between the parent and content processes, or something else that could cause an IPC protocol violation.

But then there's bug 1674038 comment #1, which if I understand correctly is reporting it happened reliably… but only with add-ons enabled? I can't think of anything right now that would explain that.

Duplicate of this bug: 1686205

(In reply to Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧ from comment #10)

But then there's bug 1674038 comment #1, which if I understand correctly is reporting it happened reliably… but only with add-ons enabled? I can't think of anything right now that would explain that.

Not about add-ons, I think: the fact that Safe Mode avoided this crash was probably simply because in Safe Mode we (currently) disable the shared font list and just use the legacy per-process font management. So there'd be no list of sharedFontListBlocks being sent in the SetXPCOMProcessAttributes message.

Crash Signature: [@ mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | mozilla::dom::PContentChild::OnMessageReceived] → [@ mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | mozilla::dom::PContentChild::OnMessageReceived] [@ mozilla::ipc::FatalError | mozilla::dom::PContentChild::OnMessageReceived]
Duplicate of this bug: 1714310

I think this issue might be related to this one discussed here

Basically, this happens every time, no matter what I do. The only way I can make Firefox load websites is to use the troubleshoot option, which disables all add-ons etc., but I have a fresh new profile with no changes and it still crashes.

If someone can guide me, I can offer better troubleshooting.

Have the same issue with the following bug report https://crash-stats.mozilla.org/report/index/5592d1f3-25c3-4da0-8b2c-747640210608

A workaround for me: set gfx.e10s.font-list.shared to false.

Crash Signature: [@ mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | mozilla::dom::PContentChild::OnMessageReceived] [@ mozilla::ipc::FatalError | mozilla::dom::PContentChild::OnMessageReceived] → [@ mozilla::ipc::FatalError | mozilla::ipc::IProtocol::HandleFatalError | mozilla::dom::PContentChild::OnMessageReceived] [@ mozilla::ipc::FatalError | mozilla::dom::PContentChild::OnMessageReceived]
Keywords: regression
Regressed by: 1694174

Jed, do you have any ideas for how we could try to get to the bottom of this? The summary of comments above seems to be more-or-less "this can't happen" short of some kind of IPC data corruption or a broken build, yet it refuses to go away; most recently, shipping 89 to release with gfx.e10s.font-list.shared enabled has given us a spike, but it's the same thing as has been happening for months.

(FWIW, I notice looking at a couple of crashes that are on Windows, the errors are not about deserializing SharedMemoryHandle[]; one was an nsString, and another was VisitedQueryResult. So not every crash shown in the stats here is connected to the font-list shared memory, but that seems to be the most prevalent issue for some reason.)

Flags: needinfo?(jld)

(In reply to styang from comment #15)

Have the same issue with the following bug report https://crash-stats.mozilla.org/report/index/5592d1f3-25c3-4da0-8b2c-747640210608

A workaround for me: set gfx.e10s.font-list.shared to false.

This fixed the issue for me; Firefox now seems to be working as it should.

Hello!

This error has also started happening to me on FF 89.0.

  • this didn't happen using 88.x on the same machine.
  • setting gfx.e10s.font-list.shared to false fixed it for me.

Any help or testing I can do, don't hesitate to ask.

Duplicate of this bug: 1715613

(In reply to tbeckerson from comment #18)

Hello!

This error has also started happening to me on FF 89.0.

  • this didn't happen using 88.x on the same machine.
  • setting gfx.e10s.font-list.shared to false fixed it for me.

Any help or testing I can do, don't hesitate to ask.

As mentioned, this has been a valid workaround.
Do all of you have a large number of fonts installed? I've done some testing in Ubuntu via the ~/.fonts folder and found that, for me (x86_64 with 8 GB RAM):

  • <3 GB of fonts installed, FF works fine
  • >3 GB of fonts installed, new tabs crash occasionally
  • >4 GB of fonts installed, every new tab crashes. This also happens if my fonts are in /usr/local/share/fonts .

This happens even when the fonts folder is only full of duplicates of large fonts (Source Han fonts are >50MB for each weight, for instance), so I'm led to think it's not about which fonts are installed, but the total size of fonts installed. The thing about 3-4 GB being a "limit" could also be down to the RAM available on my system. Maybe some further testing can be done on this front.

Hope this helps.

Thanks for your testing efforts, 17sclu, that's really helpful detail.

I have two machines, both running Arch Linux.

  • A desktop with 32G RAM, 800MB of fonts in ~/.local/share/fonts, 3.4G of fonts in /usr/share/fonts, and Firefox 89 works fine without setting gfx.e10s.font-list.shared to false.
  • A laptop with 16G RAM, 800MB of fonts in ~/.local/share/fonts, 44MB of fonts in ~/.fonts, 3.3G of fonts in /usr/share/fonts, and Firefox 89 crashes on every tab unless gfx.e10s.font-list.shared is set to false. (On this machine, I have some additional fonts in ~/.fonts_backup, about 2.4G; I hope that does not make a difference.)

All sizes calculated with du -d 1 -a -h|tail -1

Confirmed, I also have lots of fonts installed.

5.1G in ~/.fonts, and 1.6G in /usr/share/fonts.
My desktop has 32G of RAM.

Assignee: nobody → jfkthame
Status: NEW → ASSIGNED

Thanks everyone for the information, that's been really helpful. I have a patch that should help here.

An experimental build is under way at https://treeherder.mozilla.org/jobs?repo=try&revision=561730de7b65cb9a128a380fbfd4d3f3cd65590a, and should be available for testing in an hour or two; if some people who have encountered this crash could confirm if it resolves the problem, that'd be great.

See also bug 1714652. The warning logs posted there show that we're running into the MAX_DESCRIPTORS_PER_MESSAGE limit, and the IPC code doesn't handle that, it just falls over.
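
For illustration, here is a minimal sketch of a sender that notices a per-message descriptor cap instead of falling over. The cap value below is made up for the example (the real MAX_DESCRIPTORS_PER_MESSAGE value may differ), and `DescriptorSet` and `AttachAll` are hypothetical names, not the actual IPC classes.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative cap only; an assumption, not the real limit.
constexpr std::size_t kMaxDescriptorsPerMessage = 4;

// Hypothetical stand-in for the descriptors attached to one message.
class DescriptorSet {
 public:
  // Refuses invalid fds and refuses to grow past the cap.
  bool Add(int fd) {
    if (fd < 0 || mFds.size() >= kMaxDescriptorsPerMessage) {
      return false;
    }
    mFds.push_back(fd);
    assert(mFds.size() <= kMaxDescriptorsPerMessage);
    return true;
  }

  std::size_t Size() const { return mFds.size(); }

 private:
  std::vector<int> mFds;
};

// Attaches every fd, or reports failure so the caller can react (for
// example by sending fewer handles) rather than ship a message that the
// receiver cannot deserialize.
bool AttachAll(DescriptorSet& set, const std::vector<int>& fds) {
  for (int fd : fds) {
    if (!set.Add(fd)) {
      return false;
    }
  }
  return true;
}
```

The design point is only that the failure is surfaced on the sending side, where something can still be done about it; how the real code should recover is a separate question.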

The patch here should greatly reduce the amount of shared memory being passed around with these massive font installations; we might also want to make the code more robust in the event that we still encounter MAX_DESCRIPTORS_PER_MESSAGE in any situation, but I believe this patch will help a lot.

See Also: → 1714652

(In reply to Jonathan Kew (:jfkthame) from comment #25)

Thanks everyone for the information, that's been really helpful. I have a patch that should help here.

An experimental build is under way at https://treeherder.mozilla.org/jobs?repo=try&revision=561730de7b65cb9a128a380fbfd4d3f3cd65590a, and should be available for testing in an hour or two; if some people who have encountered this crash could confirm if it resolves the problem, that'd be great.

It works with a large .fonts directory. Thank you!

Comment on attachment 9226336 [details]
Bug 1664151 - Drop FC_CHARSET element from fontconfig patterns for TrueType/OpenType fonts, as we will read the cmap directly anyhow. r=lsalzman

Beta/Release Uplift Approval Request

  • User impact if declined: Crashes for Linux users with extremely large numbers of big fonts installed.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce: (Fix confirmed on try build by an affected user.)
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Just filters out redundant, potentially-large data from the fontconfig patterns we store in shared memory, to reduce shmem pressure.
  • String changes made/needed:
Attachment #9226336 - Flags: approval-mozilla-beta?
Pushed by jkew@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/71ef967a7395
Drop FC_CHARSET element from fontconfig patterns for TrueType/OpenType fonts, as we will read the cmap directly anyhow. r=lsalzman
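
As a rough sketch of the idea in that commit message, the following drops the charset element from a pattern when the font format is one whose cmap can be read directly. It uses a plain map as a hypothetical stand-in for a fontconfig pattern; the real patch operates on FcPattern objects, and the format check shown here (treating "TrueType" and "CFF" as the relevant fontformat values) is an assumption for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>

// Hypothetical stand-in for a fontconfig pattern: element name -> value.
using SketchPattern = std::map<std::string, std::string>;

// Returns a copy of the pattern without its "charset" element when the
// font is TrueType/OpenType, since character coverage can be read from
// the font's cmap table instead. Other formats are left untouched.
SketchPattern DropRedundantCharset(const SketchPattern& pattern) {
  SketchPattern result = pattern;
  const auto format = result.find("fontformat");
  if (format != result.end() &&
      (format->second == "TrueType" || format->second == "CFF")) {
    result.erase("charset");
  }
  assert(result.size() <= pattern.size());
  return result;
}
```

Because the charset element can be large for fonts with broad coverage, removing it from every stored pattern reduces how much shared memory the font list needs.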

(In reply to Jonathan Kew (:jfkthame) from comment #25)

An experimental build is under way at https://treeherder.mozilla.org/jobs?repo=try&revision=561730de7b65cb9a128a380fbfd4d3f3cd65590a, and should be available for testing in an hour or two; if some people who have encountered this crash could confirm if it resolves the problem, that'd be great.

How can I test it? By checking out the code and building it on my machine?

Flags: needinfo?(jfkthame)

(In reply to Julien Cristau [:jcristau] from comment #31)

You can download a (linux x86_64) build from https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/A1jQ5XeCQpauLsPm2FoBzg/runs/0/artifacts/public/build/target.tar.bz2

Thanks! This build fixes the problem for me.

I can also confirm that build fixes the problem, thanks!

Status: ASSIGNED → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → 91 Branch

Comment on attachment 9226336 [details]
Bug 1664151 - Drop FC_CHARSET element from fontconfig patterns for TrueType/OpenType fonts, as we will read the cmap directly anyhow. r=lsalzman

approved for 90.0b7, thanks

Attachment #9226336 - Flags: approval-mozilla-beta? → approval-mozilla-beta+