Closed Bug 1581121 Opened 5 years ago Closed 4 years ago

Crash in [@ Servo_CounterStyleRule_Debug]

Categories

(Core :: CSS Parsing and Computation, defect, P2)

69 Branch
x86
Linux
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox-esr68 --- unaffected
firefox69 --- wontfix
firefox70 + wontfix
firefox71 --- wontfix
firefox72 --- wontfix
firefox73 --- wontfix
firefox74 --- wontfix
firefox76 --- wontfix
firefox77 --- wontfix
firefox78 --- wontfix

People

(Reporter: marcia, Assigned: emilio)

References

Details

(Keywords: crash, csectype-uaf, sec-high)

Crash Data

This bug is for crash report bp-91022245-f376-48e2-9bc1-136980190913.

Linux-specific crash spike that was picked up in the email spike report: https://bit.ly/2kI644w. At least one of the signatures I saw had a potential UAF, so I'm marking this security sensitive as well.

The spike in this signature started on 9-12.

Top 10 frames of crashing thread:

0 libxul.so Servo_CounterStyleRule_Debug 
1 libxul.so Servo_CounterStyleRule_Debug 
2 libxul.so Servo_CounterStyleRule_Debug 
3 libxul.so Servo_SupportsRule_AddRef 
4 libxul.so std::_Rb_tree_node<std::pair<unsigned long long const, unsigned int> >* std::_Rb_tree<unsigned long long, std::pair<unsigned long long const, unsigned int>, std::_Select1st<std::pair<unsigned long long const, unsigned int> >, std::less<unsigned long long>, std::allocator<std::pair<unsigned long long const, unsigned int> > >::_M_copy<std::_Rb_tree<unsigned long long, std::pair<unsigned long long const, unsigned int>, std::_Select1st<std::pair<unsigned long long const, unsigned int> >, std::less<unsigned long long>, std::allocator<std::pair<unsigned long long const, unsigned int> > >::_Reuse_or_alloc_node> 
5 libxul.so std::_Rb_tree_node<std::pair<unsigned long long const, unsigned int> >* std::_Rb_tree<unsigned long long, std::pair<unsigned long long const, unsigned int>, std::_Select1st<std::pair<unsigned long long const, unsigned int> >, std::less<unsigned long long>, std::allocator<std::pair<unsigned long long const, unsigned int> > >::_M_copy<std::_Rb_tree<unsigned long long, std::pair<unsigned long long const, unsigned int>, std::_Select1st<std::pair<unsigned long long const, unsigned int> >, std::less<unsigned long long>, std::allocator<std::pair<unsigned long long const, unsigned int> > >::_Reuse_or_alloc_node> 
6 libxul.so std::_Rb_tree_node<std::pair<unsigned long long const, unsigned int> >* std::_Rb_tree<unsigned long long, std::pair<unsigned long long const, unsigned int>, std::_Select1st<std::pair<unsigned long long const, unsigned int> >, std::less<unsigned long long>, std::allocator<std::pair<unsigned long long const, unsigned int> > >::_M_copy<std::_Rb_tree<unsigned long long, std::pair<unsigned long long const, unsigned int>, std::_Select1st<std::pair<unsigned long long const, unsigned int> >, std::less<unsigned long long>, std::allocator<std::pair<unsigned long long const, unsigned int> > >::_Reuse_or_alloc_node> 
7 libxul.so std::_Rb_tree_node<std::pair<unsigned long long const, unsigned int> >* std::_Rb_tree<unsigned long long, std::pair<unsigned long long const, unsigned int>, std::_Select1st<std::pair<unsigned long long const, unsigned int> >, std::less<unsigned long long>, std::allocator<std::pair<unsigned long long const, unsigned int> > >::_M_copy<std::_Rb_tree<unsigned long long, std::pair<unsigned long long const, unsigned int>, std::_Select1st<std::pair<unsigned long long const, unsigned int> >, std::less<unsigned long long>, std::allocator<std::pair<unsigned long long const, unsigned int> > >::_Reuse_or_alloc_node> 
8 libxul.so std::_Rb_tree_node<std::pair<int const, int> >* std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_M_copy<std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_Alloc_node> 
9 libxul.so std::_Rb_tree_node<std::pair<int const, int> >* std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_M_copy<std::_Rb_tree<int, std::pair<int const, int>, std::_Select1st<std::pair<int const, int> >, std::less<int>, std::allocator<std::pair<int const, int> > >::_Alloc_node> 

That stack is nonsensical, yet other crash reports I look at have similarly odd stacks.

Another crash that looks a bit similar and is 69 and Linux only: https://bit.ly/2kwoAgp

Here are some correlations:

(98.28% in signature vs 43.00% overall) startup_crash = 0 [42.18% vs 174.76% if process_type = null]
(100.0% in signature vs 02.34% overall) Module "libfreeblpriv3.so" = true
(100.0% in signature vs 02.48% overall) Module "libwayland-cursor.so.0.0.0" = true
(100.0% in signature vs 02.54% overall) Module "libnss3.so" = true
(100.0% in signature vs 02.54% overall) platform = Linux
(99.66% in signature vs 02.21% overall) Module "libogg.so.0.8.2" = true
(99.49% in signature vs 02.14% overall) Module "libdatrie.so.1.3.3" = true
(90.05% in signature vs 01.81% overall) Module "libmozavutil.so" = true
(88.51% in signature vs 01.57% overall) Module "libgsm.so.1.0.12" = true
(88.51% in signature vs 01.63% overall) Module "libsoxr.so.0.1.1" = true
(85.25% in signature vs 01.88% overall) Addon "langpack-en-CA@firefox.mozilla.org" = true
(84.56% in signature vs 02.25% overall) Addon "langpack-en-GB@firefox.mozilla.org" = true

Hi Andrew, our telemetry based crash data is showing an uptick on linux crash incidence with release 69. There is a similar trend seen in crash-stats as mentioned in this bug. Can you help someone from engineering take a look? Thanks!

Flags: needinfo?(overholt)

I would have needinfoed heycam or svoisen but heycam's already commented here so I'll ask Sean :)

Flags: needinfo?(overholt) → needinfo?(svoisen)

I'll add the [@ gtk_xtbin_new] signature; it looks very similar, and both often have mozilla::layers::AsyncPanZoomController on the stack.

Crash Signature: [@ Servo_CounterStyleRule_Debug] → [@ Servo_CounterStyleRule_Debug] [@ gtk_xtbin_new]

I'm not sure here. And emilio and heycam are out this week.

Looking through the stacks I do also see mozilla::layers::AsyncPanZoomController commonly on both. For @Servo_CounterStyleRule_Debug I also see mozilla::AnimationEventInfo. Not sure that there's any correlation but maybe Botond or Hiro have ideas here.

Flags: needinfo?(svoisen)
Flags: needinfo?(hikezoe.birchill)
Flags: needinfo?(botond)

I had a look at a couple of stack traces involving AsyncPanZoomController (e.g. this one or this one), but I can't make heads or tails of them.

For example, in this stack, we have:

  • In stack frame 11, a function that looks like a helper in the standard library's implementation of std::vector<std::string>::insert(begin, end) or a similar range insertion function.
  • In stack frames 8-10, functions that look like helpers in the implementation of a std::map<int, int> method.
  • In stack frames 4-7, functions that look like helpers in the implementation of a std::vector<RefPtr<AsyncPanZoomController>> or nsTArray<RefPtr<AsyncPanZoomController>> method.

There's no reason for any one of these sets of functions to be calling into any other.

There also isn't any indication in the stack trace where in our code we may be calling into these standard library functions.

Flags: needinfo?(botond)

I also looked at a bunch of the reports, and none of the call stacks make sense to me. That being said, I suspect this is related to animation stuff. There are other Linux-only crash reports (since 69?) that crash at MatrixForTransformFunction. I can't recall which bug landed in the 68 time frame. CCing Boris, he might recall it. (Also keeping the NI on me for now.)

The only one bug I could find is bug 1555548, but it shouldn't affect release builds since it's for individual transform properties which is pref-ed off on release builds.

Marcia, can you see suspicious URLs which are able to reproduce this crash locally?

Flags: needinfo?(hikezoe.birchill) → needinfo?(mozillamarcia.knous)

I don't see any real strange URLs - there are a mix of youtube, news sites such as NY Times, Facebook, etc. Looks to be as if people are just doing normal browsing operations.

Here are some comments:

*mozilla crashed while using internet banking with ANZ bank in Australia
*Crashed during DuckDuckGo search
*On Face Book surfing posts

Flags: needinfo?(mozillamarcia.knous)

Curiously, the crashes with the signatures currently attached to this bug have mostly stopped during the past 24-48 hours, but there is another batch of signatures going through mozilla::layers::CompositorBridgeParent::CompositeToTarget which are spiking up:
https://bit.ly/2mn7PEN

Setting P3 for now since this is not actionable at the moment, and it seems that the crash has stopped in 69.0.1?

Priority: -- → P3

All these broken stacks are in Linux x86 builds... Is there anything wrong with the x86 stack walking or crash reporting code?

See Also: → 1581871

These crashes should all come from distro-packaged builds of Firefox on Ubuntu, Debian and possibly Arch. We didn't have symbols for those builds until a couple of weeks ago when I made scripts that would scrape those symbols directly from the various downstream packages. Unfortunately, during the first iteration of this scraping some of the symbol and CFI information generated was bogus, which led to a few false signatures such as these.

So basically all these crashes used to be in the libxul.so+<address> form before my changes and now they are properly symbolicated, but for a couple of days this didn't work as expected.

All recent crashes should have proper signatures and stacks - possibly including source line numbers.

From bug 1581871:

The platform version aggregation of those crashes suggests that this affects 32-bit Ubuntu-packaged builds of Firefox 69.0 and 69.0.1 across all their releases (there are kernel versions ranging from 3.13.x to 5.0.x). These all look like downstream builds because the modules aggregation shows four different builds of libxul.so, none of which correspond to ours. There's something very, very worrying about these crashes: a lot of them have the memory poison pattern as the crashing address, so these are UAFs. If we're not seeing these crashes in our builds then we should get in touch with Ubuntu's Firefox maintainer and warn them about this.

If I'm reading https://packages.ubuntu.com/bionic/firefox right they use the same build as debian?

Sylvestre, is there something interesting that Debian / Ubuntu do at build time for x86? This crash looks scary but architecture specific, so maybe a rustc bug or such? What's the difference between our build system configuration and Debian's / Ubuntu's?

Flags: needinfo?(sledru)
Crash Signature: [@ Servo_CounterStyleRule_Debug] [@ gtk_xtbin_new] → [@ Servo_CounterStyleRule_Debug] [@ gtk_xtbin_new] [@ free | _ZN4core3ptr18real_drop_in_place17hcf92816063beee25E]

Until Ubuntu moves to Snap for Firefox ( https://bugzilla.mozilla.org/show_bug.cgi?id=1297513#c6 ), Ubuntu still builds Firefox from sources.
So it will use the rustc + LLVM + gcc from the Ubuntu archive (and, in some cases, use the library packages separately instead of our in-tree version; this is what Marcia saw in comment #3). The easiest solution is to read the build log to figure out the versions used.
For example:
https://launchpadlibrarian.net/443140606/buildlog_ubuntu-disco-i386.firefox_69.0.1+build1-0ubuntu0.19.04.1_BUILDING.txt.gz
found from:
https://launchpad.net/~ubuntu-mozilla-security/+archive/ubuntu/ppa/+build/17782610

Gabriele also published an update on the stability ML, "Improved crash reports for Ubuntu/Debian", saying that the debug symbols from Ubuntu packages should be available.

chrisccoulson@ubuntu.com is the maintainer of this package.

Crash Signature: [@ Servo_CounterStyleRule_Debug] [@ gtk_xtbin_new] [@ free | _ZN4core3ptr18real_drop_in_place17hcf92816063beee25E] → [@ Servo_CounterStyleRule_Debug] [@ gtk_xtbin_new] [@ free | _ZN4core3ptr18real_drop_in_place17hcf92816063beee25E]
Flags: needinfo?(sledru)
See Also: → 1585383
Crash Signature: [@ Servo_CounterStyleRule_Debug] [@ gtk_xtbin_new] [@ free | _ZN4core3ptr18real_drop_in_place17hcf92816063beee25E] → [@ Servo_CounterStyleRule_Debug] [@ gtk_xtbin_new] [@ free | _ZN4core3ptr18real_drop_in_place17hcf92816063beee25E] [@ MatrixForTransformFunction ] [@ free | core::ptr::real_drop_in_place | servo_arc::Arc<T>::drop_slow ] [@ core::ptr::real_drop_in_pla…

I'll try to contact the maintainer.

Flags: needinfo?(emilio)

Ok, let's try ni? first...

Hi Chris, we're receiving a fair amount of reports of Linux x86-specific crashes which seem to all come from Ubuntu-packaged builds. The crashes are very weird, but they all seem to have to do with Rust code calling into C++ or vice-versa.

From what I can tell reading the build logs, ubuntu builds firefox from source with gcc and rustc 1.36 atm... Are you aware of any particular part of the build that could be causing this?

I'm thinking of lto configurations, or rust-related build-system hacks, or such...

I tried to come up with a test-case blindly with what I thought could be the case (the generics used in extern "C" functions, which gcc warns about), but I couldn't make it break (see the commits in https://github.com/emilio/ubuntu-x86-fun for things I tried)...

In particular, what could explain all these crashes is this function call or a related one not working as expected and releasing aResult twice, or the Rust code not bumping the refcount as intended, or something... I don't expect it to be just garbage pointers since all the crashes seem to point to a double-free of some sort. But that could also be a red herring...
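To make the hypothesis above concrete, here is a toy model of it, not the real Gecko/Servo code. The names `RefCounted`, `hand_out` and `run` are made up for illustration. It only shows why a missing refcount bump on one side of a handoff would surface later as a double-free when both owners release.

```python
class RefCounted:
    """Toy stand-in for a refcounted object shared between two owners."""

    def __init__(self):
        self.refcount = 1   # the creator holds the first reference
        self.freed = False

    def add_ref(self):
        if self.freed:
            raise RuntimeError("use-after-free: add_ref on a freed object")
        self.refcount += 1

    def release(self):
        if self.freed:
            raise RuntimeError("double free: release on a freed object")
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True   # a real allocator would free (and poison) here


def hand_out(obj, bump_refcount):
    """Give obj to a second owner; a correct callee bumps the count first."""
    if bump_refcount:
        obj.add_ref()
    return obj


def run(bump_refcount):
    """Both owners release once; report what the second release observes."""
    original = RefCounted()
    second_owner = hand_out(original, bump_refcount)
    original.release()
    try:
        second_owner.release()
    except RuntimeError as err:
        return str(err)
    return "ok"


if __name__ == "__main__":
    print(run(bump_refcount=True))
    print(run(bump_refcount=False))
```

With the bump in place both releases balance out; without it the second release hits an object that is already gone, which is the shape of failure the crash reports suggest.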

Glandium, you've probably seen more of this kind of bizarre stuff than me... Any idea or suggestion about how to approach this if Chris doesn't have any better idea?

Flags: needinfo?(mh+mozilla)
Flags: needinfo?(emilio)
Flags: needinfo?(chrisccoulson)

FYI bug 1581121 is most likely another instance of this and it's pretty high volume (100 crashes per day), see bug 1591974 comment 1. Did we open an issue on Ubuntu's tracker? They're not responding to our NI? but I found they're usually responsive on their own tracker.

Also NI?ing Olivier Tilloy who responded to previous bugs I filed on Launchpad. See comment 22.

Flags: needinfo?(olivier)

Starting with firefox 70, Ubuntu packages are built with clang (version 8 in Ubuntu 18.04), not gcc any longer.
I'm curious whether you're still seeing those crashes with these builds?

Flags: needinfo?(olivier)

OK, I may have a clue about what's happening: I can't find crashes for versions of Ubuntu newer than 18.04. All the crashes are coming from either 18.04 or some older version of which we don't have the version number in the platform string. You can still tell they're older than 18.04 because the kernels are older (all the way to 3.13.x, yay!).

Olivier, Firefox for 18.04 and older are built with a different toolchain than the newer versions, right? Is that clang-6 maybe? It's possible that might be the issue; IIRC we have a bunch of patches on top of vanilla clang-6 to make it work correctly for Firefox.

Flags: needinfo?(olivier)

Yes that's right. Firefox packages in Ubuntu are currently built for all supported releases:

  • 16.04 (clang 6.0)
  • 18.04 (clang 8.0)
  • 19.04 (clang 8.0)
  • 19.10 (clang 9.0)

[@ core::ptr::real_drop_in_place | core::ptr::real_drop_in_place | servo_arc::Arc<T>::drop_slow ] signature is #27 overall on 70.0.1 release. Marking 70 as affected.

(In reply to Olivier Tilloy from comment #28)

  • 16.04 (clang 6.0)

Would it be possible for you to use packages from https://apt.llvm.org/ to build on older ubuntu?
clang 6.0 has some miscompilation issues

18.04 seems also affected and that's using clang 8.0. I suspect they might just be missing one of the local fixes we had applied to clang when doing the builds ourselves. Also we need a fix real quick, these are the top 10 crashers on Linux for this week:

Rank Signature Count %
1 core::ptr::real_drop_in_place | core::ptr::real_drop_in_place | servo_arc::Arc<T>::drop_slow 493 6.19 %
2 core::ptr::real_drop_in_place | core::ptr::real_drop_in_place | style::gecko::wrapper::get_animation_rule 457 5.74 %
3 <style_traits::owned_slice::OwnedSlice<T> as style::values::computed::ToComputedValue>::from_computed_value 394 4.95 %
4 MatrixForTransformFunction 371 4.66 %
5 free | core::ptr::real_drop_in_place | style::gecko::wrapper::get_animation_rule 273 3.43 %
6 nv30_fp_state_bind 191 2.40 %
7 free | core::ptr::real_drop_in_place | servo_arc::Arc<T>::drop_slow 130 1.63 %
8 __pthread_mutex_lock | free | core::ptr::real_drop_in_place | style::gecko::wrapper::get_animation_rule 120 1.51 %
9 __pthread_mutex_lock | free | core::ptr::real_drop_in_place | servo_arc::Arc<T>::drop_slow 118 1.48 %
10 __pthread_mutex_lock 117 1.47 %

With the exception of 6 which is a mesa bug all the other entries are instances of this bug. It's our top crasher by far ATM.
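For a rough sense of scale, the ranks, counts and percentages from the table above can be summed while leaving out rank 6 (the mesa bug). This is only back-of-the-envelope arithmetic on the numbers quoted in this comment; the variable and function names are made up.

```python
# (rank, crash count, share of all Linux crashes in %) copied from the table.
TOP_CRASHERS = [
    (1, 493, 6.19),
    (2, 457, 5.74),
    (3, 394, 4.95),
    (4, 371, 4.66),
    (5, 273, 3.43),
    (6, 191, 2.40),   # nv30_fp_state_bind, a mesa bug, unrelated
    (7, 130, 1.63),
    (8, 120, 1.51),
    (9, 118, 1.48),
    (10, 117, 1.47),
]

UNRELATED_RANKS = {6}


def totals_for_this_bug(rows, unrelated=UNRELATED_RANKS):
    """Sum counts and percentage shares of every row not marked unrelated."""
    related = [row for row in rows if row[0] not in unrelated]
    count = sum(row[1] for row in related)
    share = round(sum(row[2] for row in related), 2)
    return count, share


if __name__ == "__main__":
    count, share = totals_for_this_bug(TOP_CRASHERS)
    print(f"{count} crashes, {share}% of Linux crashes this week")
```

On these figures that works out to 2473 crashes, or about 31% of all Linux crashes for the week, from the top 10 alone.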

CC'ing :dmajor who might help out.

Tracking for 70 since this looks high volume in socorro.

Changing the priority to p1 as the bug is tracked by a release manager for the current release.
See What Do You Triage for more information

Priority: P3 → P1

Couldn't it be, rather than clang, the llvm used by rust? (which rust patches, but linux distros usually use system llvm)

Flags: needinfo?(mh+mozilla)

Is there a way I can inspect a copy of the crashing libxul.so's on a non-Linux machine? In other words, is there an archive of builds somewhere?

These are the affected builds:

I believe there are tools to unpack .deb packages on Windows (7-zip maybe?). Once unpacked you'll find libxul.so under usr/lib/firefox/

Crash Signature: style::gecko::wrapper::get_animation_rule ] → style::gecko::wrapper::get_animation_rule ] [@ __pthread_mutex_lock] [@ __pthread_mutex_lock | free | core::ptr::real_drop_in_place | <style::gecko::wrapper::GeckoElement as style::dom::TElement>::needs_transitions_update::{{closure}}] [@ __pthread_mutex…

On second thought, if this is a miscompilation, it could be well before the crash and I'm unlikely to find anything by static inspection of the binary.

Given that >50% of our 32-bit Ubuntu crashes contain drop_in_place, we might have good odds for seeing it locally. Could we get someone to try to catch this in rr?

Crash Signature: style::gecko::wrapper::get_animation_rule ] [@ __pthread_mutex_lock] [@ __pthread_mutex_lock | free | core::ptr::real_drop_in_place | <style::gecko::wrapper::GeckoElement as style::dom::TElement>::needs_transitions_update::{{closure}}] [@ __pthread_mutex… → style::gecko::wrapper::get_animation_rule ] [@ __pthread_mutex_lock] [@ __pthread_mutex_lock | free | core::ptr::real_drop_in_place | <style::gecko::wrapper::GeckoElement as style::dom::TElement>::needs_transitions_update::{{closure}}] [@ __pthread_mu…

(In reply to :dmajor from comment #39)

On second thought, if this is a miscompilation, it could be well before the crash and I'm unlikely to find anything by static inspection of the binary.

Given that >50% of our 32-bit Ubuntu crashes contain drop_in_place, we might have good odds for seeing it locally. Could we get someone to try to catch this in rr?

I'll try installing Ubuntu 18.04/i386 in a VM today and see if I can repro. I've inspected both rustc packages for 16.04 and 18.04 and both are version 1.36 based on LLVM 8.0. I couldn't find the exact LLVM version but it seems to be the one that came with rust 1.36 stable release, they're not using the system LLVM for rustc.

(In reply to Sylvestre Ledru [:Sylvestre] from comment #30)

Would it be possible for you to use packages from https://apt.llvm.org/ to build on older ubuntu?
clang 6.0 has some miscompilation issues

Not directly, because build packages need to be in the Ubuntu archive, and apt.llvm.org offers packages for amd64 and i386 only (Ubuntu packages are also built on armhf, arm64, ppc64el, s390x).
However we could consider backporting a more recent version of clang to xenial.

Flags: needinfo?(olivier)

We're starting to see crashes in 71.0 beta and those are using rustc 1.37: https://crash-stats.mozilla.org/report/index/49c069b0-e997-461c-9bdd-f225f0191119

However we could consider backporting a more recent version of clang to xenial.

a dget from apt.llvm.org / dput should probably be enough

I just found another signature. Sifting through the crashes I can confirm that 71.0 beta (built with rustc 1.37) is also affected on both Ubuntu (18.04 and older) and Debian

Crash Signature: | free | core::ptr::real_drop_in_place | servo_arc::Arc<T>::drop_slow] [@ __pthread_mutex_lock | free | core::ptr::real_drop_in_place | style::gecko::wrapper::get_animation_rule] → | free | core::ptr::real_drop_in_place | servo_arc::Arc<T>::drop_slow] [@ __pthread_mutex_lock | free | core::ptr::real_drop_in_place | style::gecko::wrapper::get_animation_rule] [@ OOM | large | NS_ABORT_OOM | <name omitted> | DoDatabaseWork]

Good news, my symbol scraper just grabbed the last 70.0.1 build for Xenial (16.04) and it uses rust 1.38 so chances are this issue will be going away soon.

All supported Ubuntu releases now have firefox 71.0, built with clang 8.0 or 9.0, and rust 1.37.

It seems we're not out of the woods yet. This is a recent 72.0 beta build made with rustc 1.37 and it's still exhibiting the issue:

https://crash-stats.mozilla.org/report/index/222121c1-4a87-4558-9335-106e10191229

Unfortunately this is still a top crasher in Linux, is there any chance for builds to be switched to rustc 1.38? As I mentioned in comment 47 I noticed that at least one build had been switched to rustc 1.38 but all the others still seem to be on 1.37.

All beta builds are now done with rustc 1.39. When 73.0 becomes stable, the issue should go away.
It is not planned to publish and use rustc 1.38 until then, as firefox is what drives rustc updates in Ubuntu and the requirement wasn't bumped to 1.38 for 72.0.

That's excellent news Olivier, thanks! Clearing the NI since this should go away when we reach 73.

Flags: needinfo?(chrisccoulson)

FYI, the dup just added here is a Double-Free on linux that started in FF69.

Seems like Firefox 72 x86 builds of Ubuntu are also affected :(

bp-4bdff03c-bf49-4414-9f0b-38b6f0200204 is from a 73.0b12 build with rustc 1.39.

That's bad. That's terrible actually. I think we need to sit back and have another hard look at those crashes to figure out why they're happening.

These are the only data points I've gathered while looking at the reports so far:

  • This is happening only on Ubuntu-packaged 32-bit builds running on Ubuntu 18.04 and older. IIRC there should be no newer release of Ubuntu for 32-bit x86, is this correct?
  • The only thing the stacks seem to have in common is the traversal of graphics structures that hold rust objects (possibly crossing from C++ to Rust)
  • The crashes appear to be double-frees, we often see poison patterns in the crash address so even if we're not double-free'ing objects we're accessing already dead ones
  • The uptime in the crashes varies from a few seconds to multiple hours
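As an illustration of the poison-pattern point in the list above, a crash address can be checked against a repeated poison byte. This is a sketch under assumptions: the byte 0xe5 is assumed to be the fill value the allocator writes into freed memory (as mozjemalloc does), and the 4-byte width and the slack for small field offsets are arbitrary choices. `looks_poisoned` is a hypothetical helper, not part of any crash-stats tooling.

```python
POISON_BYTE = 0xE5  # assumption: fill byte written over freed allocations


def looks_poisoned(address, width_bytes=4, slack=0x1000):
    """Return True if address is the repeated poison byte, give or take slack.

    A pointer read out of freed memory is the poison pattern itself; the
    faulting address is usually that value plus a small field offset.
    """
    pattern = int.from_bytes(bytes([POISON_BYTE]) * width_bytes, "big")
    return abs(address - pattern) <= slack


if __name__ == "__main__":
    for addr in (0xE5E5E5E5, 0xE5E5E5F1, 0x0, 0x7FFE1234):
        print(hex(addr), looks_poisoned(addr))
```

A match is only a hint that freed memory was dereferenced; it does not say whether the underlying bug is a double-free or some other use-after-free.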

We've assumed that the issue was caused by interaction between rustc and clang, but if the generated code was broken we should see this issue more often. The fact that the uptimes are all over the place makes this smell like some kind of race, possibly with a thread destroying an object while another one still has a reference to it.

The only hypothesis I have is that we're missing some kind of memory barrier / atomic operation that's leading to the crash. Why it's not there I have no idea. Could this be coming from system header files instead of from the compiler? That would explain the difference between Mozilla and Ubuntu's builds given that we're using the same compilers at this point but not the same headers.

The relevant tricky barrier is here: https://searchfox.org/mozilla-central/rev/9855c6722fcfa96831d613d3062fdea87060a86d/servo/components/servo_arc/lib.rs#542.

But that looks about right (plus regular memory access is acquire/release on x86 if I recall correctly).

Plus if that were the code to blame we would be seeing issues all over the place, not only with AnimationValue...

Another potential issue is that LTO merged incorrectly some AddRef / Release symbols or such. This happened on mac when we tried to enable cross-language LTO for the first time (see bug 1486042).

Gabriele, do you know how I can get the binary that's crashing from the crash report in comment 58, and what's the relevant binary from us?

Relevant symbols that we should look into are Servo_AnimationValue_AddRef and Servo_AnimationValue_Release, probably along some of the related ones: https://searchfox.org/mozilla-central/rev/9855c6722fcfa96831d613d3062fdea87060a86d/servo/components/style/gecko/arc_types.rs#62

Also are we really sure they're really using a fixed clang to build Firefox? Though I agree that I'd expect a miscompilation to be more frequent, these values are pretty dynamic...

Flags: needinfo?(gsvelto)

Yes, I'll get you the executable for both, they should be available on our symbol servers.

Here you go. The Ubuntu libxul.so is unstripped because I could re-integrate the debug information into it. Our libxul.so is stripped because I couldn't find the appropriate taskcluster job with the debug information. I'll keep looking though.

Flags: needinfo?(gsvelto)

(In reply to Gabriele Svelto [:gsvelto] from comment #58)

  • This is happening only on Ubuntu-packaged 32-bit builds running on Ubuntu 18.04 and older. IIRC there should be no newer release of Ubuntu for 32-bit x86, is this correct?

That's correct.

Thanks!

Flags: needinfo?(emilio)
Crash Signature: | free | core::ptr::real_drop_in_place | servo_arc::Arc<T>::drop_slow] [@ __pthread_mutex_lock | free | core::ptr::real_drop_in_place | style::gecko::wrapper::get_animation_rule] [@ OOM | large | NS_ABORT_OOM | <name omitted> | DoDatabaseWork] → | free | core::ptr::real_drop_in_place | servo_arc::Arc<T>::drop_slow] [@ __pthread_mutex_lock | free | core::ptr::real_drop_in_place | style::gecko::wrapper::get_animation_rule] [@ OOM | large | NS_ABORT_OOM | <name omitted> | DoDatabaseWork] [@ core:…

So I looked at the builds that Gabriele sent me, but unfortunately I can't use any of the debuginfo on our build (I get weird GDB / DWARF errors). I poked a bit at the stripped builds and I didn't see anything particularly suspicious, though I may have missed something... :/

I also tried resurrecting a bunch of linux32-debug jobs but they don't show any hint of stuff that may be going wrong.

Removing some of the signatures that are unrelated to this problem (DoDatabaseWork is only tangentially related, that's bug 1601707).

Crash Signature: | free | core::ptr::real_drop_in_place | servo_arc::Arc<T>::drop_slow] [@ __pthread_mutex_lock | free | core::ptr::real_drop_in_place | style::gecko::wrapper::get_animation_rule] [@ OOM | large | NS_ABORT_OOM | <name omitted> | DoDatabaseWork] [@ core:… → | free | core::ptr::real_drop_in_place | servo_arc::Arc<T>::drop_slow] [@ __pthread_mutex_lock | free | core::ptr::real_drop_in_place | style::gecko::wrapper::get_animation_rule] [@ core::ptr::real_drop_in_place | core::ptr::real_drop_in_place | servo_…
Flags: needinfo?(emilio)

David, do you know how we can make more progress on this? Lacking a repro and a debuggable build I'm a bit stuck on this :(

Flags: needinfo?(dmajor)

does ubuntu ship any prefs that could be different from our builds or something?

I still think trying to get someone (whether that's developers, QA, users...) to capture this under rr is worthwhile.

There's a good number of users who have hit this issue 10+ times and provided their email address. We might try contacting them to see if they could run under rr. Asking users to run strange tools isn't normally very successful, but with Linux users I'd say we have better odds than most. https://crash-stats.mozilla.org/search/?cpu_arch=x86&platform_version=~Ubuntu&signature=~drop_in_place&cpu_info=~Intel&product=Firefox&platform=Linux&date=%3E%3D2020-02-09T22%3A14%3A00.000Z&date=%3C2020-03-09T22%3A14%3A00.000Z&_facets=signature&_facets=email&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-email (Filtered on Intel CPU for rr capability)
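For anyone who wants to rerun that search with a different date window, the query string can be rebuilt from its parts. This is a sketch that reuses the parameter names and values visible in the URL above; `build_search_url` is a hypothetical helper, and the `_columns` parameters and the `#facet-email` fragment are left out for brevity.

```python
from urllib.parse import urlencode

BASE_URL = "https://crash-stats.mozilla.org/search/"


def build_search_url(start, end):
    """Build the drop_in_place search for 32-bit Ubuntu on Intel CPUs.

    start and end are ISO 8601 timestamps; repeated keys such as date and
    _facets are passed as separate pairs so they all survive encoding.
    """
    params = [
        ("cpu_arch", "x86"),
        ("platform_version", "~Ubuntu"),
        ("signature", "~drop_in_place"),
        ("cpu_info", "~Intel"),      # rr needs a supported CPU
        ("product", "Firefox"),
        ("platform", "Linux"),
        ("date", ">=" + start),
        ("date", "<" + end),
        ("_facets", "signature"),
        ("_facets", "email"),
        ("_sort", "-date"),
    ]
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("2020-02-09T22:14:00.000Z",
                           "2020-03-09T22:14:00.000Z"))
```

The leading `~` in a value is taken from the original URL, where it appears to mean "contains".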

Flags: needinfo?(dmajor)

Assigning to Emilio for now to reflect reality, though this is currently somewhat stalled.

Assignee: nobody → emilio
Crash Signature: servo_arc::Arc<T>::drop_slow | Servo_AnimationValue_Release] → servo_arc::Arc<T>::drop_slow | Servo_AnimationValue_Release] [@ style::properties::animated_properties::AnimationValue::from_computed_values]
Crash Signature: servo_arc::Arc<T>::drop_slow | Servo_AnimationValue_Release] [@ style::properties::animated_properties::AnimationValue::from_computed_values] → servo_arc::Arc<T>::drop_slow | Servo_AnimationValue_Release] [@ style::properties::animated_properties::AnimationValue::from_computed_values]

Hi,
I tried to reproduce this crash on my machine Ubuntu 16. I installed clang 6. I installed Firefox Release 74 from the terminal or downloaded directly from the firefox page and after I run Firefox I didn't receive any crash. Maybe if you can help with more details, it would be great.
Thank you!

It needs to be a 32-bit version of Firefox / Ubuntu.

Sorry, I didn't mention but it was on Ubuntu 16 x32 and Firefox Release 74 x32

(In reply to Raluca from comment #74)

I tried to reproduce this crash on my machine Ubuntu 16. I installed clang 6. I installed Firefox Release 74 from the terminal or downloaded directly from the firefox page and after I run Firefox I didn't receive any crash. Maybe if you can help with more details, it would be great.

This happens only with the Ubuntu-packaged version of Firefox, our build doesn't seem to be affected. You also need to use it for a while: most crashes have at least a few minutes of uptime with almost half of them having over an hour before the crash occurred.

According to the crash report signature, this issue occurred mostly on Firefox ESR 68 x32 on Linux x32 and mostly on this build ID: 20200305175243. I can't attempt reproduction since I don't have the system requirements. Furthermore, it appears that there are many new reports in the last days.

Do the last reports suggest anything about the cause of the crash?

Flags: needinfo?(svoisen)

(In reply to Bodea Daniel [:danibodea] from comment #78)

Do the last reports suggest anything about the cause of the crash?

Unfortunately, no. Comments have not been helpful and URLs are all over the map :(

Flags: needinfo?(svoisen)

I was unable to reproduce it on Linux x32 with ESR canonical x32. I had one tab open with Facebook logged in where I scrolled through various content and a second tab where I searched for a few terms using the DuckDuckGo as the default search engine, after which I left our Firefox ESR on overnight.

This looks like a new signature:

[@ <style::values::specified::font::VariantAlternatesList as core::clone::Clone>::clone]

Though I'm not 100% sure because the failure is now happening in clone(). Still a UAF though.

I recently added symbols for Linux Mint builds for Firefox and even though they're separate from the Ubuntu ones they suffer from the same issue. Also we landed bug 1616194 which now makes it very easy to figure out if a crash is distro-specific.

Crash Signature: servo_arc::Arc<T>::drop_slow | Servo_AnimationValue_Release] [@ style::properties::animated_properties::AnimationValue::from_computed_values] → servo_arc::Arc<T>::drop_slow | Servo_AnimationValue_Release] [@ style::properties::animated_properties::AnimationValue::from_computed_values] [@ core::ptr::real_drop_in_place | core::ptr::real_drop_in_place | core::ptr::real_drop_in_place | servo_arc::…
Crash Signature: servo_arc::Arc<T>::drop_slow | Servo_AnimationValue_Release] [@ <style_traits::owned_slice::OwnedSlice<T> as style::values::computed::ToComputedValue>::from_computed_value] → servo_arc::Arc<T>::drop_slow | Servo_AnimationValue_Release] [@ <style_traits::owned_slice::OwnedSlice<T> as style::values::computed::ToComputedValue>::from_computed_value] [@ MatrixForTransformFunction]

FWIW: It's on my todo list to try and get in touch with some users who have had this as a recurring issue and see if any of them are willing to run rr. Seems like the only possible next step to get some traction on this.

Haven't had much luck on comment 84, but the crash rate seems to have dropped off precipitously around July 2 or so.

(In reply to Sean Voisen (:svoisen) from comment #85)

Haven't had much luck on comment 84, but the crash rate seems to have dropped off precipitously around July 2 or so.

It could be a signature change, I'll sift through the Linux crash reports later today and report back. NI? me so I don't forget.

Flags: needinfo?(gsvelto)

The crashes seem to have disappeared starting with version 78. I don't know what happened but they're all but gone from Linux top crashers list - which they dominated for a long time. Our new top Linux crasher is completely unrelated.

Flags: needinfo?(gsvelto)

That's pretty good news, even if not entirely satisfactory to not have identified the root cause.

(In reply to Gabriele Svelto [:gsvelto] from comment #87)

The crashes seem to have disappeared starting with version 78. I don't know what happened but they're all but gone from Linux top crashers list - which they dominated for a long time. Our new top Linux crasher is completely unrelated.

Thanks for checking on that Gabriele. Deprioritizing and marking stalled given the trend since 78. Will leave open for now.

Severity: critical → S2
Keywords: stalled
Priority: P1 → P2

There have been no recent instances of this crash. The reports that still appear under the signature are not related to the original problem and are most likely caused by bad hardware / corrupted installations. There are no more UAFs showing up in the crash reports, so I'd call this fixed, but given there were no changes on our side I think WFM is more appropriate.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WORKSFORME

Since the bug is closed, the stalled keyword is now meaningless.
For more information, please visit auto_nag documentation.

Keywords: stalled
Group: layout-core-security