Closed Bug 1448868 Opened 5 years ago Closed 4 years ago

Crash in xul.dll@0x... | style::media_queries::parse_media_query_list<T>

Categories

(Core :: CSS Parsing and Computation, defect, P3)

60 Branch
x86
Windows
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 --- wontfix
firefox59 --- unaffected
firefox60 --- wontfix
firefox61 --- wontfix

People

(Reporter: philipp, Unassigned)

References

Details

(Keywords: crash, regression)

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is
report bp-f39d8355-d79b-4e46-93a2-708a20180326.
=============================================================

Top 10 frames of crashing thread:

0 xul.dll xul.dll@0x3faa53 
1 xul.dll style::media_queries::parse_media_query_list<style::error_reporting::NullReporter> servo/components/style/media_queries.rs:265
2 xul.dll geckoservo::glue::Servo_MediaList_SetText servo/ports/geckolib/glue.rs:3102
3 xul.dll mozilla::ServoMediaList::SetTextInternal layout/style/ServoMediaList.cpp:53
4 xul.dll mozilla::dom::MediaList::Create layout/style/MediaList.cpp:82
5 xul.dll mozilla::dom::MediaQueryList::MediaQueryList layout/style/MediaQueryList.cpp:30
6 xul.dll nsIDocument::MatchMedia dom/base/nsDocument.cpp:6627
7 xul.dll nsGlobalWindowOuter::MatchMediaOuter dom/base/nsGlobalWindowOuter.cpp:3582
8 xul.dll nsGlobalWindowInner::MatchMedia dom/base/nsGlobalWindowInner.cpp:3452
9 xul.dll mozilla::dom::WindowBinding::matchMedia dom/bindings/WindowBinding.cpp:2915

=============================================================

this content crash is regressing in volume in firefox 60. it appears to hit 32bit users of firefox on windows mostly under memory pressure (in most reports the "System memory use percentage" value is in the 80s and 90s).
the first 60.0a1 nightly with a crash report like this was build 20180210100316

this problem is gonna be a bit hard to track, since it changes signature in each build. currently the signature is the #14 top content crash on 60.0b6, accounting for 0.74% of all content crashes. this query should catch all the reports:

https://crash-stats.mozilla.com/search/?signature=%24style%3A%3Amedia_queries%3A%3Aparse_media_query_list%3CT%3E&build_id=%3E%3D20180210100316&date=%3E%3D2017-12-24T01%3A00%3A00.000Z&date=%3C2018-03-26T15%3A33%3A24.000Z&_sort=-date&_facets=signature&_facets=version&_facets=user_comments&_facets=build_id&_facets=platform_pretty_version&_facets=useragent_locale&_facets=release_channel&_facets=cpu_arch&_facets=system_memory_use_percentage&_facets=available_virtual_memory&_facets=available_physical_memory#crash-reports
Emilio, is this maybe related to bug 1447433?
Flags: needinfo?(emilio)
I don't think this is related to bug 1447433, although some of the signatures linked there seem to belong this bug.

The crashes seem to show EXCEPTION_ILLEGAL_INSTRUCTION and it happens on a line seemingly innocent: https://searchfox.org/mozilla-central/rev/57bbc1ac58816dc054df242948f3ecf75e12df5f/servo/components/style/media_queries.rs#265

Given this is 32bit windows only, I suspect that the rust compiler may miscompile or generate something not compitable with those CPU settings.
Flags: needinfo?(emilio)
dmajor, do you have any suggestion here?

I guess we may need to know what the problematic instruction is, and check whether it should be there at all, but I don't know what's the best way to inspect that.

If that is indeed an illegal instruction which is not supposed to be there, this is a rustc bug we may need to workaround somehow.
Flags: needinfo?(dmajor)
When dealing with Rust code, EXCEPTION_ILLEGAL_INSTRUCTION is almost always `ud2` which is the mechanism of Rust panics. (Analogous to the EXCEPTION_BREAKPOINT used by MOZ_CRASH())
Flags: needinfo?(dmajor)
Hmm, but we compile with panic=abort, and we're supposed to have the panic message in MOZ_Crash_Reason: https://crash-stats.mozilla.com/report/index/f4bc372f-5382-4c7a-a926-d63c60180330

Is the compiler jumping to the wrong offset or something?
I see an "unreachable" in parse_media_query_list. What kind of assembly does that turn into? Does it generate a panic message? Or just straight up `ud2`?
It does generate a panic with "Entered unreachable code" as a message.
(See https://crash-stats.mozilla.com/report/index/d7796b23-6a54-424a-aedb-267a30180330 for an example, though that's another crash I cannot make any sense of)
Also it's suspicious that we got a nonsense top frame. A large fraction of these have dangerous (<200M) amounts of AvailableVirtualMemory. It's possible that the lack of memory is interfering with our collection of the crash report.
In that case it's possible that it's just Rust OOMing when pushing to the vector in:

  https://searchfox.org/mozilla-central/rev/42b34ba1961e37e0d2236deafdd126a0ba21b9ec/servo/components/style/media_queries.rs#247

Or while consuming the input, which can allocate.
Crash Signature: style::media_queries::parse_media_query_list<T>] → style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x3fa973 | style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x3faad3 | style::media_queries::parse_media_query_list<T>]
Crash Signature: style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x3fa973 | style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x3faad3 | style::media_queries::parse_media_query_list<T>] → style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x3fa973 | style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x3faad3 | style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x3faa63 | style::media_queries::parse_media_quer…
Needs a prioritization.
Flags: needinfo?(emilio)
See comment 9 / comment 10, at this point it's not clear to me whether it's OOM or a rust miscompilation for win32. If the first, we should have a rust panic message as the MOZ_CRASH_REASON...

It'd be nice to confirm that the rust panic infrastructure works in Win32, but I can't do so locally because I have no Win32 machine at all.
Flags: needinfo?(emilio)
(In reply to Emilio Cobos Álvarez [:emilio] from comment #12)
> See comment 9 / comment 10, at this point it's not clear to me whether it's
> OOM or a rust miscompilation for win32. If the first, we should have a rust
> panic message as the MOZ_CRASH_REASON...
> 
> It'd be nice to confirm that the rust panic infrastructure works in Win32,
> but I can't do so locally because I have no Win32 machine at all.

Milan: do we have a Win32 machine (not VM) in Toronto we can use to confirm that OOMs in Rust code correctly unwind on panic?
Flags: needinfo?(milan)
We may be able to dig something up, running Windows 7 on Pentium 4
Flags: needinfo?(milan) → needinfo?(a.beingessner)
Alex Crichton on the Rust team has a win32 machine setup for exactly this kind of thing; what's the test-case we should ask him to try?
Flags: needinfo?(a.beingessner)
Crash Signature: style::media_queries::parse_media_query_list<T>] → style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x3f9b53 | style::media_queries::parse_media_query_list<T>]
(For comment 15.)
Flags: needinfo?(emilio)
I chatted with Alex over IRC and tried to crash release, and received the panic message fine. I didn't have a nightly crasher ATM, so couldn't test on Nightly Win32 :(.
Flags: needinfo?(emilio)
(In reply to Emilio Cobos Álvarez [:emilio] from comment #17)
> I chatted with Alex over IRC and tried to crash release, and received the
> panic message fine. I didn't have a nightly crasher ATM, so couldn't test on
> Nightly Win32 :(.

Was going to suggest https://addons.mozilla.org/en-US/firefox/addon/crash-me-now-simple/, but I guess it isn't compatible with the current Firefox.
setting devtools.chrome.enabled to true and pasting the following lines into the browser console will still crash the browser: https://github.com/luser/crashme-simple/blob/master/bootstrap.js#L116-L120
Triage/Assign?
Flags: needinfo?(mreavy)
Crash Signature: style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x3f9b53 | style::media_queries::parse_media_query_list<T>] → style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x3f9b53 | style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x3f9b13 | style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x3f9a53 | style::media_queries::parse_media_quer…
Mike - I recall you mentioned a Rust unwinding issue. Is this bug related to that?
Flags: needinfo?(mh+mozilla)
I've asked Anthony as manager to drive this investigation to get us past the Rust unwind issue.  Then we should have a better idea of what's going on here.  It's likely an OOM we can't do anything about, but I'd feel A LOT better knowing that given the crash rate we're seeing in Beta.  (Thanks, Anthony and Mike!)
Flags: needinfo?(mreavy) → needinfo?(ajones)
(In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #21)
> Mike - I recall you mentioned a Rust unwinding issue. Is this bug related to
> that?

Probably not, since this bug was reported before we upgraded to rustc 1.25 (which we have now reverted to 1.24).

Did someone look at what the actual assembly looked like at xul.dll@0x3faa53 ?
Flags: needinfo?(mh+mozilla)
It is an OOM. There is a code path from frame 1 -> a failed HeapAlloc -> frame 0.

https://crash-stats.mozilla.com/report/index/13847536-9839-4ed9-b63e-7765e0180421#tab-rawdump

"frame": 1, 
"function": "style::media_queries::parse_media_query_list<style::error_reporting::NullReporter>",  "module": "xul.dll", 
"module_offset": "0xbe994",

0:000> u xul+0xbe994-4
xul!alloc::vec::Vec<style::media_queries::MediaQuery>::push+0x22a [z:\build\build\src\servo\components\style\media_queries.rs @ 265] [inlined in xul!style::media_queries::parse_media_query_list<style::error_reporting::NullReporter>+0x7a0 [z:\build\build\src\servo\components\style\media_queries.rs @ 265]]:
100be990 e88b740100      call    xul!alloc::raw_vec::RawVec<style::media_queries::MediaQuery, alloc::heap::Heap>::double<style::media_queries::MediaQuery,alloc::heap::Heap> (100d5e20)

0:000> u 100d5e20 L100
...
100d5e66 e84112a700      call    xul!HeapAlloc (10b470ac)
100d5e6b 85c0            test    eax,eax
100d5e6d 742f            je      xul!alloc::raw_vec::RawVec<style::media_queries::MediaQuery, alloc::heap::Heap>::double<style::media_queries::MediaQuery,alloc::heap::Heap>+0x7e (100d5e9e)
...
100d5e9e 8d4de4          lea     ecx,[ebp-1Ch]
100d5ea1 e83a3b3200      call    xul!core::char_private::check+0xf0 (103f99e0)

0:000> u 103f99e0 L30
xul!core::char_private::check+0xf0:
...
103f9a4b e88072ffff      call    xul!core::fmt::write (103f0cd0)
103f9a50 83c404          add     esp,4
103f9a53 0f0b            ud2

which matches the crash report (only the last five hex digits matter):
"frame": 0, 
"module": "xul.dll", 
"module_offset": "0x3f9a53",
Note that for the function where we ultimately crashed, my debugger gave me a bogus name ("
xul!core::char_private::check") and no source line information, so I think the Rust debug info here is missing/broken.
In the disassembly, just before the fmt::write, I found the address of this string: 

0:000> dd 127b8d0c L2
127b8d0c  127b8d20 00000015

0:000> da 127b8d20 L15
127b8d20  "fatal runtime error: "

Which means the terminal function (with the broken debug info) is https://dxr.mozilla.org/rust/rev/1515cded744d9d3597564ce3c08f6205f5816821/src/libstd/sys_common/util.rs#29
Thank you, David, for analyzing this for us.  And thanks, Anthony, for driving this work forward.  I feel a lot better knowing this is an OOM.  (This is almost certainly a dup of OOM_SMALL.)  Marking as P3 for now.
Flags: needinfo?(ajones)
Priority: -- → P3
See Also: → 1458161
Crash Signature: style::media_queries::parse_media_query_list<T>] → style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x3f9a13 | style::media_queries::parse_media_query_list<T>]
Crash Signature: style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x3f9a13 | style::media_queries::parse_media_query_list<T>] → style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x3f9a13 | style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x40ca22 | style::media_queries::parse_media_query_list<T>]
Crash Signature: style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x3f9a13 | style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x40ca22 | style::media_queries::parse_media_query_list<T>] → style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x3f9a13 | style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x40ca22 | static struct style::media_queries::MediaList style::media_queries::parse_media_query_list<T>]
Crash Signature: style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x3f9a13 | style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x40ca22 | static struct style::media_queries::MediaList style::media_queries::parse_media_query_list<T>] → style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x3f9a13 | style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x40ca22 | static struct style::media_queries::MediaList style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x3…
Duplicate of this bug: 1473014
Crash Signature: xul.dll@0x3f9bc3 | style::media_queries::parse_media_query_list<T> ] [@ xul.dll@0x412e22 | static struct style::media_queries::MediaList style::media_queries::parse_media_query_list<T> ] [@ xul.dll@0x3f9b63 | style::media_queries::parse_media_query_lis… → xul.dll@0x3f9bc3 | style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x412e22 | static struct style::media_queries::MediaList style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x3f9b63 | style::media_queries::parse_media_query_list<…
Crash Signature: style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x412e72 | static struct style::media_queries::MediaList style::media_queries::parse_media_query_list<T>] → style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x412e72 | static struct style::media_queries::MediaList style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x3f9743 | style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x3…
I get the crash signature from duplicate bug 1473014 when trying to scroll down on http://www.makeitmagic.ru/ with webrender enabled.

Does not occur if webrender is disabled.

bp-5e440243-dd5c-4987-97df-1a6340180723
2018-07-23T17:22:55: INFO : Narrowed inbound regression window from [34322841, 2a228292] (4 builds) to [38cfaf35, 2a228292] (2 builds) (~1 steps left)
2018-07-23T17:22:55: DEBUG : Starting merge handling...
2018-07-23T17:22:55: DEBUG : Using url: https://hg.mozilla.org/integration/mozilla-inbound/json-pushes?changeset=2a2282923b5abdf8b95bad5a59e694ac8e744883&full=1
2018-07-23T17:22:56: DEBUG : Found commit message:
Bug 1450015. Enable blob invalidation by default. r=gankro


2018-07-23T17:22:56: DEBUG : Did not find a branch, checking all integration branches
2018-07-23T17:22:56: INFO : The bisection is done.
2018-07-23T17:22:56: INFO : Stopped
That is not this bug... Kats, is comment 30 / comment 31 tracked somewhere?
Flags: needinfo?(bugmail)
I get a crash signature for duplicate bug 1473014. So I posted follow up to that bug here as well since duplicate bug 1473014 is marked as resolved duplicate of this bug.
Bug 1450015 is a regression that induces:

Bug 1468020 (instant tab crash on http://myphoneandme.vodafone.com.tr/ with webrender enabled)
Bug 1473014 (crash occurs when trying to scroll down on http://www.makeitmagic.ru/ with webrender enabled)
Firefox 62.0b10 when trying to scroll down on http://www.makeitmagic.ru/ with webrender enabled
bp-4eb64c50-747b-41c9-826c-6dfa00180723
We should probably update the signature generation of these OOM crashes to include an extra stack frame or two, so that we can distinguish different sources of OOMs. I've filed bug 1477726 for that. And I've filed bug 1477727 for the OOM crash that get_logan was seeing.
Flags: needinfo?(bugmail)
Will, any update on bug 1477726? Did that change land in production?

Kats: on the OOM crash, bug 1477727 eventually leads to bug 1456555. Do you want to keep this issue open or maybe close it and track it in bug 1456555?
Flags: needinfo?(willkg)
Flags: needinfo?(kats)
Liz: Yes, the fixes for bug #1477726 landed and are in production. That happened in socorro 327: https://bugzilla.mozilla.org/show_bug.cgi?id=1481226
Flags: needinfo?(willkg)
This issue and bug 1456555 have different root causes, so we should keep both.
Flags: needinfo?(kats)
Adding a few signatures. We no longer have crashes with these signatures starting with 62, so either it got fixed or the signature morphed. The volume today is very low on only affects ESR so I am marking it as worksforme since the bug has been stalled for several months.
Status: NEW → RESOLVED
Crash Signature: xul.dll@0x3f9b33 | style::media_queries::parse_media_query_list<T>] → xul.dll@0x3f9b33 | style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x3fcc03 | style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x3f9983 | style::media_queries::parse_media_query_list<T>] [@ xul.dll@0x40ca22 | style::media_querie…
Closed: 4 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.