Closed
Bug 1451741
Opened 8 years ago
Closed 7 years ago
Crash in _$LT$style..style_resolver..StyleResolverForElement$LT$$u27$a$C$$u20$$u27$ctx$C$$u20$$u27$le$C$$u20$E$GT$$GT$::cascade_style_and_visited::h01fb6becc6636cd4 on Snapdraon 820/821
Categories
(Core :: CSS Parsing and Computation, defect, P2)
Tracking
()
RESOLVED
WORKSFORME
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox59 | --- | unaffected |
firefox60 | + | wontfix |
firefox61 | --- | ? |
People
(Reporter: marcia, Unassigned)
References
Details
(Keywords: crash, regression, topcrash)
Crash Data
This bug was filed from the Socorro interface and is
report bp-4f442901-4009-4e71-8fdd-453e50180405.
=============================================================
New crash in Fennec B9: https://bit.ly/2q6OLJl.
Changes between B7-B9: https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=FENNEC_60_0b7_RELEASE&tochange=d2564a0c25e5084dbd064cf546d98a9f72eed1f3&full=1
https://mzl.la/2uFHvcP
Top 10 frames of crashing thread:
0 @0xaaa8a050
1 libxul.so _$LT$style..style_resolver..StyleResolverForElement$LT$$u27$a$C$$u20$$u27$ctx$C$$u20$$u27$le$C$$u20$E$GT$$GT$::cascade_style_and_visited::h01fb6becc6636cd4 servo/components/style/style_resolver.rs:304
2 libxul.so Gecko_AttrEquals dom/base/Element.h:1498
3 libxul.so wcsrtombs
4 libxul.so wcsrtombs
5 libxul.so _$LT$style..style_resolver..StyleResolverForElement$LT$$u27$a$C$$u20$$u27$ctx$C$$u20$$u27$le$C$$u20$E$GT$$GT$::resolve_style::hd220abc1e26e8639 servo/components/style/style_resolver.rs:383
6 libxul.so style::traversal::compute_style::h63ed2411e2498e09 servo/components/style/style_resolver.rs:268
7 libxul.so wcsrtombs
8 libxul.so wcsrtombs
9 libxul.so wcsrtombs
=============================================================
Reporter | ||
Comment 2•8 years ago
|
||
This is now the top crash in Fennec B9. APIs from 23-27 are affected.
Here are some URLs:
*http://m.ign.com/wikis/the-legend-of-zelda-breath-of-the-wild/Trial_of_the_Sword
*https://arstechnica.com/gadgets/2018/04/intel-drops-plans-to-develop-spectre-microcode-for-ancient-chips/?comments=1
*http://himasoku.com/archives/52033529.html
*https://www.reddit.com/r/news/comments/89phlf/theyve_given_us_a_raise_and_nothing_for_students/?utm_content=comments&utm_medium=hot&utm_source=reddit&utm_name=frontpage
Highest percentage of crashes on device ONEPLUS A3003.
Keywords: topcrash
Comment 3•8 years ago
|
||
The reports I looked at don't have a sensible stack either :(
status-firefox59:
--- → unaffected
status-firefox-esr52:
--- → unaffected
Comment 4•8 years ago
|
||
The summary page doesn't make sense either... How can 158 crash reports come from 181 installations?
Reporter | ||
Comment 5•8 years ago
|
||
This is currently the top crash in Beta 9 - we shall see if it shows up in Beta 11 as well.
Comment 6•8 years ago
|
||
(In reply to Julien Cristau [:jcristau] from comment #3)
> The reports I looked at don't have a sensible stack either :(
crash reporter won't generate better stack well due to https://github.com/rust-lang/rust/issues/49867.
:bholley - just keeping this on your radar
Flags: needinfo?(bobbyholley)
Comment 8•8 years ago
|
||
(In reply to Milan Sreckovic [:milan] from comment #7)
> :bholley - just keeping this on your radar
I don't think anyone is actively working on this.
I had a look at the regression range, and none of the patches touch the style system. The crashing thread stacks are all, garbage, as are most of the main thread ones. I did find one main-thread stack that was not garbage, and it ran through AndroidDynamicToolbarAnimator::UpdateFrameMetrics, which would presumably not be doing a style flush.
So I think this is in the category of things where android crash diagnostic expertise is more relevant than style system expertise.
Flags: needinfo?(bobbyholley)
Comment 9•8 years ago
|
||
This is the #2 top crash on beta this week. I'm concerned that we're shipping stylo-android in 60 and it sounds from comment 6 that we're going blind wrt crashes.
Updated•8 years ago
|
Flags: needinfo?(m_kato)
Comment 10•8 years ago
|
||
Does anyone modify rust, or servo code between 60.0b7 to 60.0b9? I cannot make sense this crash.... LTO?
Comment 11•8 years ago
|
||
Another crash signature that looks similar and also started showing up in 60.0b9
Crash Signature: [@ _$LT$style..style_resolver..StyleResolverForElement$LT$$u27$a$C$$u20$$u27$ctx$C$$u20$$u27$le$C$$u20$E$GT$$GT$::cascade_style_and_visited::h01fb6becc6636cd4] → [@ _$LT$style..style_resolver..StyleResolverForElement$LT$$u27$a$C$$u20$$u27$ctx$C$$u20$$u27$le$C$$u20$E$GT$$GT$::cascade_style_and_visited::h01fb6becc6636cd4]
[@ style::parallel::traverse_nodes::h59de8f1a2ce4849c]
Updated•8 years ago
|
Crash Signature: [@ _$LT$style..style_resolver..StyleResolverForElement$LT$$u27$a$C$$u20$$u27$ctx$C$$u20$$u27$le$C$$u20$E$GT$$GT$::cascade_style_and_visited::h01fb6becc6636cd4]
[@ style::parallel::traverse_nodes::h59de8f1a2ce4849c] → [@ _$LT$style..style_resolver..StyleResolverForElement$LT$$u27$a$C$$u20$$u27$ctx$C$$u20$$u27$le$C$$u20$E$GT$$GT$::cascade_style_and_visited::h01fb6becc6636cd4]
[@ style::parallel::traverse_nodes::h59de8f1a2ce4849c]
[@ style::stylist::Stylist::cascade_st…
Flags: needinfo?(m_kato)
Updated•8 years ago
|
Flags: needinfo?(m_kato)
Comment 12•8 years ago
|
||
interesting point is that most (all?) devices are Snapdragon 820/821.
Comment 13•8 years ago
|
||
snorp, do you know any errata information for snapdragon 820/821 that is memory synchronization issue? This crash is only on snapdragon 820/821, but no crash on 835 and others
Also, I found a chrome bug that is cache line bug of Kryo. (https://bugs.chromium.org/p/chromium/issues/detail?id=656693)
Flags: needinfo?(m_kato) → needinfo?(snorp)
Updated•8 years ago
|
Crash Signature: style::stylist::Stylist::cascade_style_and_visited::hc986e55be637888c] → style::stylist::Stylist::cascade_style_and_visited::hc986e55be637888c]
[@ @0x0 | wcsrtombs]
Comment 14•8 years ago
|
||
"@0x0 | wcsrtombs" is related, but not all.
Crash Signature: style::stylist::Stylist::cascade_style_and_visited::hc986e55be637888c]
[@ @0x0 | wcsrtombs] → style::stylist::Stylist::cascade_style_and_visited::hc986e55be637888c]
Updated•7 years ago
|
Summary: Crash in _$LT$style..style_resolver..StyleResolverForElement$LT$$u27$a$C$$u20$$u27$ctx$C$$u20$$u27$le$C$$u20$E$GT$$GT$::cascade_style_and_visited::h01fb6becc6636cd4 → Crash in _$LT$style..style_resolver..StyleResolverForElement$LT$$u27$a$C$$u20$$u27$ctx$C$$u20$$u27$le$C$$u20$E$GT$$GT$::cascade_style_and_visited::h01fb6becc6636cd4 on Snapdraon 820/821
Comment 15•7 years ago
|
||
(In reply to Julien Cristau [:jcristau] from comment #9)
> This is the #2 top crash on beta this week. I'm concerned that we're
> shipping stylo-android in 60 and it sounds from comment 6 that we're going
> blind wrt crashes.
I think this is a reasonable concern. We probably should ask the Rust team to prioritize rust-lang/rust#49867. Even if it would miss 60, we still want this to be available as soon as possible. It would be even better if Rust can issue a patch version for the compiler we are using for 60.
Alex, could you help on this?
Flags: needinfo?(acrichton)
Comment 16•7 years ago
|
||
(hmm, this crash may not occur on b13...)
![]() |
||
Comment 17•7 years ago
|
||
Sure yeah I can try to help with this!
I could definitely see how https://github.com/rust-lang/rust/issues/49867 is the underlying issue here, but I'd want to understand a few things first.
* What version of Rust are y'all using for 60?
* If you compile with `-C panic=unwind` instead of `-C panic=abort`, do better stack traces happen?
* How can you tell, inspecting the binary, whether it's built correctly?
* What target are you using for rustc for this platform?
Flags: needinfo?(acrichton)
(In reply to Makoto Kato [:m_kato] from comment #13)
> snorp, do you know any errata information for snapdragon 820/821 that is
> memory synchronization issue? This crash is only on snapdragon 820/821, but
> no crash on 835 and others
Sorry, no idea. This is a fascinating problem, though!
Flags: needinfo?(snorp)
Comment 19•7 years ago
|
||
(In reply to Alex Crichton [:acrichto] from comment #17)
> Sure yeah I can try to help with this!
>
> I could definitely see how https://github.com/rust-lang/rust/issues/49867 is
> the underlying issue here, but I'd want to understand a few things first.
>
> * What version of Rust are y'all using for 60?
The minimum requirement is 1.24 but it's not clear whether we can use a newer version for the official build, at least for Android. This would be a question for build team.
> * If you compile with `-C panic=unwind` instead of `-C panic=abort`, do
> better stack traces happen?
> * How can you tell, inspecting the binary, whether it's built correctly?
> * What target are you using for rustc for this platform?
These are questions for Kato-san.
Updated•7 years ago
|
Flags: needinfo?(m_kato)
Comment 20•7 years ago
|
||
(In reply to Alex Crichton [:acrichto] from comment #17)
> Sure yeah I can try to help with this!
>
> I could definitely see how https://github.com/rust-lang/rust/issues/49867 is
> the underlying issue here, but I'd want to understand a few things first.
>
> * What version of Rust are y'all using for 60?
> * If you compile with `-C panic=unwind` instead of `-C panic=abort`, do
> better stack traces happen?
Yes, panic=unwind is expected result for unwind information generation.
> * How can you tell, inspecting the binary, whether it's built correctly?
Bug 1453220 is original bug. When using -C panic=abort, object file doesn't have unwind information.
When using readelf command, you can show unwind information. When using -C panic=abort, all functions are "cantunwind".
Unwind section '.ARM.exidx.text.main' at offset 0x1e0 contains 1 entry:
0x0 <main>: 0x1 [cantunwind]
But using -C panic=unwind,
Unwind section '.ARM.exidx.text.main' at offset 0x1e0 contains 1 entry:
0x0 <main>: 0x80018408
Compact model index: 0
0x01 vsp = vsp + 8
0x84 0x08 pop {r7, r14}
> * What target are you using for rustc for this platform?
armv7-linux-androideabi.
Flags: needinfo?(m_kato)
Reporter | ||
Comment 21•7 years ago
|
||
[@ @0x0 | _$LT$style..style_resolver..StyleResolverForElement$LT$$u27$a$C$$u20$$u27$ctx$C$$u20$$u27$le$C$$u20$E$GT$$GT$::cascade_style_and_visited::h334d4cb73837e28a ] is another signature showing up on Nightly, but it looks more like a different chipset (looks as if the devices are mostly the 820)
Reporter | ||
Comment 22•7 years ago
|
||
(In reply to Marcia Knous [:marcia - needinfo? me] from comment #21)
> [@ @0x0 |
> _$LT$style..style_resolver..
> StyleResolverForElement$LT$$u27$a$C$$u20$$u27$ctx$C$$u20$$u27$le$C$$u20$E$GT$
> $GT$::cascade_style_and_visited::h334d4cb73837e28a ] is another signature
> showing up on Nightly, but it looks more like a different chipset (looks as
> if the devices are mostly the 820)
I added the signature since I missed the 820 in the subject line.
Crash Signature: [@ _$LT$style..style_resolver..StyleResolverForElement$LT$$u27$a$C$$u20$$u27$ctx$C$$u20$$u27$le$C$$u20$E$GT$$GT$::cascade_style_and_visited::h01fb6becc6636cd4]
[@ style::parallel::traverse_nodes::h59de8f1a2ce4849c]
[@ style::stylist::Stylist::cascade_st… → [@ _$LT$style..style_resolver..StyleResolverForElement$LT$$u27$a$C$$u20$$u27$ctx$C$$u20$$u27$le$C$$u20$E$GT$$GT$::cascade_style_and_visited::h01fb6becc6636cd4]
[@ @0x0 | _$LT$style..style_resolver..StyleResolverForElement$LT$$u27$a$C$$u20$$u27$ctx$C$$u20…
![]() |
||
Comment 23•7 years ago
|
||
Ok thanks for the info I've opened a PR to fix this in rust-lang/rust at https://github.com/rust-lang/rust/pull/50093. If that lands and you'd like we can also backport it to be released with 1.26. Unfortunately releasing it in 1.24 or 1.25 is probably not going to happen.
Comment 24•7 years ago
|
||
Do you think we can use a patched Rust 1.26 (maybe 1.26.1 or something?) for Firefox 60 release? Or at least for Fennec?
Flags: needinfo?(mh+mozilla)
Comment 25•7 years ago
|
||
Most certainly not. 1.26 is not even going to release before Firefox 60, and we're still trying to cope with the regressions from 1.25.
Flags: needinfo?(mh+mozilla)
Comment 26•7 years ago
|
||
Oh, 1.26 is still in beta... hmmm... Sounds like there is no way we would be able to get reasonable stack of style system for Fennec 60 :(
Comment 27•7 years ago
|
||
(In reply to Makoto Kato [:m_kato] from comment #6)
> (In reply to Julien Cristau [:jcristau] from comment #3)
> > The reports I looked at don't have a sensible stack either :(
>
> crash reporter won't generate better stack well due to
> https://github.com/rust-lang/rust/issues/49867.
Come to think of it, I don't think that's true. Crash reports get their stack trace using breakpad info, which is derived from the DWARF info in the binaries, which is independent from the unwind EHABI data. However, considering how many times we've had crashes in llvm-dsymutil on mac builds because of malformed DWARF data generated by the rust compiler, I wouldn't be entirely surprised if the DWARF data is broken in the first place.
Updated•7 years ago
|
Flags: needinfo?(m_kato)
Comment 28•7 years ago
|
||
(In reply to Mike Hommey [:glandium] from comment #27)
> (In reply to Makoto Kato [:m_kato] from comment #6)
> > (In reply to Julien Cristau [:jcristau] from comment #3)
> > > The reports I looked at don't have a sensible stack either :(
> >
> > crash reporter won't generate better stack well due to
> > https://github.com/rust-lang/rust/issues/49867.
>
> Come to think of it, I don't think that's true. Crash reports get their
> stack trace using breakpad info, which is derived from the DWARF info in the
> binaries, which is independent from the unwind EHABI data. However,
> considering how many times we've had crashes in llvm-dsymutil on mac builds
> because of malformed DWARF data generated by the rust compiler, I wouldn't
> be entirely surprised if the DWARF data is broken in the first place.
Hmm, does it mean that is another bug of LLVM?
Flags: needinfo?(m_kato)
Comment 29•7 years ago
|
||
Actually, it might or might not, but it might also not be relevant. I took a look at two of the crashes. The first thing I looked for is how the frames were found, and they all come from stack scanning. Which means they may just well be garbage. And the reason they come from stack scanning is that in the top frame, the instruction pointer, which is also the address of the crash, doesn't actually point in executable code. The return pointer does, however, point to the style code, but that may or may not be a red herring. In the two crashes I looked at, the crash address is suspiciously close to the stack pointer, which led me to look at a third, where it wasn't (but still not in executable code).
This means the lr pointer should be right, but when comparing its value to what the crash reports say the frame is at, that doesn't match (it doesn't even align to an instruction). If I manually look where lr points back to, that points back to code that did a `bl` (function call), but it does so with an immediate value, not a pointer from a vtable or otherwise, meaning there's no way /that/ `bl` call led to the crash itself. Something else along the way did a jump that didn't set lr.
Reporter | ||
Comment 30•7 years ago
|
||
It looks as if for 60 this was last seen in b12, but not seen in subsequent betas (for all three signatures).
Updated•7 years ago
|
Priority: -- → P2
Comment 31•7 years ago
|
||
It seems like this went away.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
Updated•7 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•