Crash in [@ nsHtml5StreamParser::nsHtml5StreamParser]
Categories
(Core :: DOM: HTML Parser, defect, P5)
Tracking
()
People
(Reporter: gsvelto, Unassigned)
Details
(Keywords: crash, Whiteboard: [likely a CPU bug plus failure to update microcode])
Crash Data
Crash report: https://crash-stats.mozilla.org/report/index/dd696029-1a5c-4345-a2f4-c89730220131
Reason: SIGSEGV / SEGV_MAPERR
Top 10 frames of crashing thread:
0 libxul.so nsHtml5StreamParser::nsHtml5StreamParser parser/html/nsHtml5StreamParser.cpp:236
1 libxul.so nsHtml5Parser::MarkAsNotScriptCreated parser/html/nsHtml5Parser.cpp:536
2 libxul.so nsHTMLDocument::StartDocumentLoad dom/html/nsHTMLDocument.cpp:378
3 libxul.so nsContentDLF::CreateInstance layout/build/nsContentDLF.cpp:123
4 libxul.so nsDocShell::CreateContentViewer docshell/base/nsDocShell.cpp:7683
5 libxul.so nsDSURIContentListener::DoContent docshell/base/nsDSURIContentListener.cpp:186
6 libxul.so nsDocumentOpenInfo::TryContentListener uriloader/base/nsURILoader.cpp:632
7 libxul.so nsDocumentOpenInfo::OnStartRequest uriloader/base/nsURILoader.cpp:155
8 libxul.so nsBaseChannel::OnStartRequest netwerk/base/nsBaseChannel.cpp:819
9 libxul.so nsInputStreamPump::OnInputStreamReady netwerk/base/nsInputStreamPump.cpp:371
Not a new crash, but it seems to be gaining significant volume on nightly. The crash is happening in the nsHtml5StreamParser constructor (frame 0 above, parser/html/nsHtml5StreamParser.cpp:236), which suggests that nsHtml5Module::GetStreamParserThread() returned NULL. This might be related to bug 1642086 comment 2.
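To illustrate the reading above, here is a minimal, self-contained sketch of how a getter that returns a null pointer leads to a fault in a consumer that dereferences it without checking. All names (FakeThread, GetParserThreadOrNull, FakeStreamParser) are hypothetical stand-ins and not Gecko types or the actual constructor code. A null dereference touches an unmapped low address, which is typically reported as SIGSEGV / SEGV_MAPERR.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a thread/event-target object; not a Gecko type.
struct FakeThread {
  std::string name;
};

// Hypothetical stand-in for a module-level getter that may return nullptr,
// for example if the thread was never created or has already been cleared.
FakeThread* GetParserThreadOrNull(bool available) {
  static FakeThread thread{"parser"};
  return available ? &thread : nullptr;
}

// Hypothetical consumer. Reading mThread->name without the check below would
// dereference a null pointer whenever the getter returned nullptr.
class FakeStreamParser {
 public:
  explicit FakeStreamParser(FakeThread* thread) : mThread(thread) {}

  bool HasThread() const { return mThread != nullptr; }

  // Returns an empty string instead of dereferencing a null pointer.
  std::string ThreadName() const {
    return mThread ? mThread->name : std::string();
  }

 private:
  FakeThread* mThread;
};
```

Constructing FakeStreamParser with GetParserThreadOrNull(false) models the suspected state: HasThread() is false and ThreadName() returns an empty string rather than faulting. This only models the symptom; it says nothing about why the real getter would return NULL.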
Comment 1•2 years ago
(In reply to Gabriele Svelto [:gsvelto] from comment #0)
which suggests that nsHtml5Module::GetStreamParserThread() returned NULL.
Indeed.
All but two crashes are on Linux.
All are 64-bit.
A suspiciously high proportion is ALT 8 SP Workstation (cliff).
The crashes are clustered to Coffee Lake and Atom Z36xxx/Z37xxx, which hints at a CPU bug and Linux configurations that don't load microcode updates into the CPU.
Most crashes are near startup but still late enough that normal startup should have completed.
We don't seem to get microcode versions in Linux crash reports, but I think there are enough indications to dismiss this as a CPU bug that's probably remedied by a microcode update that ALT Linux probably isn't applying.
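For context on checking whether a microcode update is loaded: on x86 Linux, /proc/cpuinfo exposes a "microcode" field per logical CPU. The sketch below parses that field from text in /proc/cpuinfo format. The helper name ParseMicrocodeRevision is hypothetical, and this is not how crash reports collect the value; it only shows what a local check could look like.

```cpp
#include <cassert>
#include <cstdint>
#include <exception>
#include <sstream>
#include <string>

// Extracts the first "microcode" value from text in /proc/cpuinfo format.
// Returns false if the field is missing or cannot be parsed.
bool ParseMicrocodeRevision(const std::string& cpuinfo, std::uint32_t* out) {
  std::istringstream lines(cpuinfo);
  std::string line;
  while (std::getline(lines, line)) {
    if (line.rfind("microcode", 0) != 0) continue;  // key must start the line
    const std::string::size_type colon = line.find(':');
    if (colon == std::string::npos) continue;
    try {
      // Base 0 lets std::stoul accept the "0x" prefix used in this field.
      *out = static_cast<std::uint32_t>(
          std::stoul(line.substr(colon + 1), nullptr, 0));
      return true;
    } catch (const std::exception&) {
      return false;
    }
  }
  return false;
}
```

In practice the input would be the contents of /proc/cpuinfo read into a string; a line such as "microcode : 0xea" yields the value 0xEA.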
This might be related to bug 1642086 comment 2.
What relation do you mean?
Aside: We could remove StaticPrefs::html5_offmainthread(). I'm not aware of anyone flipping the pref for any legitimate reason anyway.
Comment 2•2 years ago
(In reply to Henri Sivonen (:hsivonen) from comment #1)
The crashes are clustered to Coffee Lake and Atom Z36xxx/Z37xxx, which hints at a CPU bug and Linux configurations that don't load microcode updates into the CPU.
Most crashes are near startup but still late enough that normal startup should have completed.
We don't seem to get microcode versions in Linux crash reports, but I think there are enough indications to dismiss this as a CPU bug that's probably remedied by a microcode update that ALT Linux probably isn't applying.
The data is in the dump but we don't surface it, see bug 1320921.
(In reply to Henri Sivonen (:hsivonen) from comment #1)
What relation do you mean?
This is happening on the main thread. IIUC, that object might be freed on the parser thread, so I thought we might be getting a NULL pointer because it was cleared on the parser thread.
Aside: We could remove StaticPrefs::html5_offmainthread(). I'm not aware of anyone flipping the pref for any legitimate reason anyway.
Good point.
Comment 3•2 years ago
(In reply to Gabriele Svelto [:gsvelto] from comment #2)
(In reply to Henri Sivonen (:hsivonen) from comment #1)
The crashes are clustered to Coffee Lake and Atom Z36xxx/Z37xxx, which hints at a CPU bug and Linux configurations that don't load microcode updates into the CPU.
Most crashes are near startup but still late enough that normal startup should have completed.
We don't seem to get microcode versions in Linux crash reports, but I think there are enough indications to dismiss this as a CPU bug that's probably remedied by a microcode update that ALT Linux probably isn't applying.
The data is in the dump but we don't surface it, see bug 1320921.
I'm leaving this open in order to have a bug number to attribute the stacks to, but I'm treating this as a WONTFIX given the information available so far. Hence, lowering severity and priority.
(In reply to Henri Sivonen (:hsivonen) from comment #1)
What relation do you mean?
This is happening on the main thread. IIUC, that object might be freed on the parser thread, so I thought we might be getting a NULL pointer because it was cleared on the parser thread.
nsHtml5StreamParser is supposed to be freed on the main thread, always. Bug 1642086 relates to the mechanism that handles the freeing on the main thread.
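The general idea of "always free on the main thread" can be sketched as follows. This is a generic illustration under stated assumptions, not the Gecko mechanism that bug 1642086 refers to: MainThreadQueue and ReleaseOnMainThread are hypothetical names. Any thread may hand off ownership, but the destructor only runs when the main thread drains the queue.

```cpp
#include <cassert>
#include <functional>
#include <memory>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical task queue that only the owning ("main") thread drains.
class MainThreadQueue {
 public:
  // May be called from any thread.
  void Dispatch(std::function<void()> task) {
    std::lock_guard<std::mutex> lock(mMutex);
    mTasks.push_back(std::move(task));
  }

  // Must be called on the main thread; runs and clears all pending tasks.
  void Drain() {
    std::vector<std::function<void()>> pending;
    {
      std::lock_guard<std::mutex> lock(mMutex);
      pending.swap(mTasks);
    }
    for (auto& task : pending) task();
  }

 private:
  std::mutex mMutex;
  std::vector<std::function<void()>> mTasks;
};

// Hands ownership to the queue so that the destructor runs inside Drain(),
// that is, on the main thread, regardless of which thread calls this.
template <typename T>
void ReleaseOnMainThread(MainThreadQueue& queue, std::unique_ptr<T> object) {
  T* raw = object.release();  // std::function needs a copyable callable
  queue.Dispatch([raw] { delete raw; });
}
```

With this shape, code on another thread never destroys the object directly. The trade-off is that an object handed off but never drained is leaked, so the main thread has to keep draining until shutdown.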
Comment 4•2 years ago
The bug is linked to a topcrash signature, which matches the following criteria:
- Top 10 content process crashes on beta
- Top 5 desktop browser crashes on Linux on beta
:hsivonen, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Comment 5•2 years ago
(In reply to Release mgmt bot [:suhaib / :marco/ :calixte] from comment #4)
:hsivonen, could you consider increasing the severity of this top-crash bug?
Continuing to assume that this crash is due to failure to apply CPU microcode updates on Linux.
Comment 6•2 years ago
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit auto_nag documentation.
Comment 7•1 year ago
The bug is linked to a topcrash signature, which matches the following criterion:
- Top 5 desktop browser crashes on Linux on beta
:hsivonen, could you consider increasing the severity of this top-crash bug?
For more information, please visit auto_nag documentation.
Comment 8•1 year ago
Bug 1791728 would make it easy to confirm the hypothesis at a glance.
I see a lot of crashes with Intel family 158 (decimal), which by now should be at least at microcode version 0xEC, but the crashes all show earlier microcode. This supports the hypothesis that this crash is due to failure to apply microcode updates.
gsvelto, now that crash stats has CPU microcode data for Linux, how do I see the CPU info and CPU microcode fields in a joined table so that I can systematically see what CPU models the microcode versions apply to?
Comment 9•1 year ago
You can use this query, which adds two columns to the reports table with the CPU info values and microcode versions. The majority of the crashes are indeed coming from Intel family 158 machines with microcode versions lower than 0xEC.
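The check described above (a given CPU identifier combined with a microcode version below 0xEC) can be expressed as a small filter over joined rows. This is a sketch only: CrashRow and CountOutdated are hypothetical names, the fields do not mirror the crash-stats schema, and any rows passed in would be supplied by the caller.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flattened crash-report row; field names are illustrative only.
struct CrashRow {
  std::uint32_t cpuId;      // the identifier the thread refers to as 158
  std::uint32_t microcode;  // microcode version reported for the machine
};

// Counts rows for the given CPU identifier whose microcode version is lower
// than minExpected.
std::size_t CountOutdated(const std::vector<CrashRow>& rows,
                          std::uint32_t cpuId, std::uint32_t minExpected) {
  std::size_t count = 0;
  for (const CrashRow& row : rows) {
    if (row.cpuId == cpuId && row.microcode < minExpected) {
      ++count;
    }
  }
  return count;
}
```

Calling CountOutdated(rows, 158, 0xEC) and comparing the result with the total number of rows for that CPU shows at a glance what share of the crashing machines runs outdated microcode.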
Comment 10•1 year ago
(In reply to Henri Sivonen (:hsivonen) from comment #1)
Aside: We could remove StaticPrefs::html5_offmainthread(). I'm not aware of anyone flipping the pref for any legitimate reason anyway.
Bug 1801862. It might perturb things enough to make this go away. Or it might just change the crash signature.
Comment 11•1 year ago
Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.
For more information, please visit auto_nag documentation.
Comment 12•1 year ago
(In reply to Henri Sivonen (:hsivonen) from comment #10)
Bug 1801862. It might perturb things enough to make this go away. Or it might just change the crash signature.
This has now landed, but after the bot remark in the previous comment.
Comment 13•1 year ago
Looks like this simply went away after 108.0rc2. Perhaps a compiler optimization was perturbed enough not to trigger the CPU bug anymore?
Comment 14•1 year ago
Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit auto_nag documentation.