In bug 1128255, we got reports of crashes that look related to updating telemetry in Necko. Here are some crash-stats links:
https://crash-stats.mozilla.com/report/index/bp-b137ac89-1966-4259-9990-e82912150402
https://crash-stats.mozilla.com/report/index/5e67fba4-a4ec-4283-8a98-ae67a2150305
More details and STR can be found in bug 1128255.
Stack, posted by Patrick in bug 1128255 comment 19:

Frame  Module  Signature  Source
0  libxul.so  base::Histogram::Add(int)  ipc/chromium/src/base/histogram.cc
1  libxul.so  mozilla::Telemetry::Accumulate(mozilla::Telemetry::ID, unsigned int)  toolkit/components/telemetry/Telemetry.cpp
2  libxul.so  mozilla::net::CacheStorageService::TelemetryRecordEntryRemoval(mozilla::net::CacheEntry const*)  netwerk/cache2/CacheStorageService.cpp
3  libxul.so  mozilla::net::CacheStorageService::UnregisterEntry(mozilla::net::CacheEntry*)  netwerk/cache2/CacheStorageService.cpp
4  libxul.so  mozilla::net::CacheEntry::Purge(unsigned int)  netwerk/cache2/CacheEntry.cpp
5  libxul.so  mozilla::net::CacheStorageService::MemoryPool::PurgeByFrecency(bool&, unsigned int)  netwerk/cache2/CacheStorageService.cpp
6  libxul.so  mozilla::net::CacheStorageService::MemoryPool::PurgeOverMemoryLimit()  netwerk/cache2/CacheStorageService.cpp
7  libxul.so  mozilla::net::CacheStorageService::PurgeOverMemoryLimit()  netwerk/cache2/CacheStorageService.cpp
8  libxul.so  nsRunnableMethodImpl<void (mozilla::net::CacheStorageService::*)(), void, true>::Run()  xpcom/glue/nsThreadUtils.h
9  libxul.so
I don't see anything wrong in the cache code. The telemetry call is not protected by the lock, but it's only ever called on one thread. On crash-stats I see a lot of crashes in base::Histogram::Add(int) called from code other than the cache, so why is this considered a bug in the cache instead of a bug in telemetry?
I have no idea where the fault lies. Vladan, does this seem like it's a bug in the Telemetry code?
From those 0x5a5a5a92 addresses, it looks like the histograms (or their buckets) have been freed. That sounds bad. What could cause that?
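The arithmetic behind that reading can be sketched as follows. This is only an illustration, assuming (as the comment implies) that 0x5a is the byte the allocator writes over freed memory; the helper names and the 0x1000 search window are hypothetical and are not taken from the Telemetry or allocator code. A pointer loaded from freed memory would read as 0x5a5a5a5a, and dereferencing a field at a small offset from it would fault at an address just above that value.

```cpp
#include <cstdint>
#include <optional>

// Assumption: freed heap memory is filled with this byte.
constexpr std::uint8_t kAssumedFreePoisonByte = 0x5a;

// Hypothetical helper: the 32-bit value a pointer would hold if it were
// read out of memory filled with `fillByte` (e.g. 0x5a -> 0x5a5a5a5a).
inline std::uint32_t PoisonWord(std::uint8_t fillByte) {
  return 0x01010101u * fillByte;
}

// Hypothetical helper: if `faultAddress` lies within `maxOffset` bytes above
// the poison word, return that offset (a plausible field offset from a
// pointer that was read out of freed memory). Otherwise return nullopt.
inline std::optional<std::uint32_t> OffsetFromPoison(
    std::uint32_t faultAddress, std::uint32_t maxOffset = 0x1000) {
  const std::uint32_t base = PoisonWord(kAssumedFreePoisonByte);
  if (faultAddress >= base && faultAddress - base <= maxOffset) {
    return faultAddress - base;
  }
  return std::nullopt;
}
```

Under these assumptions, OffsetFromPoison(0x5a5a5a92) yields an offset of 0x38, which is consistent with a member being accessed through a pointer that was itself read from freed memory. It does not say which object was freed or why.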
could it be a shutdown ordering thing?
(In reply to Patrick McManus [:mcmanus] from comment #5)
> could it be a shutdown ordering thing?

I don't think so; the STR in bug 1128255 don't involve shutdown.
Since it's on Linux and has STR, I bet rr will have the answer.
dmajor, does that mean you can take this to debug with rr?
Not really. My rr VM was super dusty and got lost to spring cleaning. According to the docs, reverse-execution (which is what we need here) needs real hardware anyway.
(In reply to David Major [:dmajor] from comment #7)
> Since it's on Linux and has STR, I bet rr will have the answer.

I don't think we actually have STR here. I tried to reproduce based on the descriptions in that bug on my linux machine and I didn't get any crash.
(In reply to Timothy Nikkel (:tn) from comment #10)
> (In reply to David Major [:dmajor] from comment #7)
> > Since it's on Linux and has STR, I bet rr will have the answer.
>
> I don't think we actually have STR here. I tried to reproduce based on the
> descriptions in that bug on my linux machine and I didn't get any crash.

Same here. Vladan set me up with a VNC linux machine and it didn't crash either.
Ryan, can you help David and Timothy reproduce this crash?
I will try in the next few days. I am still using aurora 35 since the bug has affected every release since 36. As I said before in the other thread, it takes between 5 and 20 crashes before the bug reporter even pops up.
http://officialfan.proboards.com/thread/519714/divas-pics-thread-bella-edition?page=35
http://officialfan.proboards.com/thread/516101/wwe-pics-gifs-vigilante-thread?page=72
I crash when going to the above threads, clicking the page number to move along the pages within the thread, and then scrolling to view more pictures. Most of the time I will crash or hard freeze after viewing a page or two or three of those threads. Those pages work fine under Stable and Nightly Firefox on Windows and Firefox 35 under Linux.
I still haven't seen the crash, but it's possible that my remote session is interfering with the scrolling. Timothy, does it crash for you?
Created attachment 8595039 [details]
stack

Hmm, so I was able to get three crashes using dev edition official builds. Two of them just said "Fatal IO error 11 (Resource temporarily unavailable) on X server :0." One of them dumped some hex addresses (I'll attach). I tried for quite some time in my own m-c build with debug+opt, both under gdb and not under gdb, but I never got a single crash. I even tried tagging my build as official and specifically enabling telemetry in my mozconfig. I'm not sure what to try.
Any ideas on how I can use official builds (I'm assuming a try build will also reproduce the crash) to track this down (get a stack or something)?
I made a try build with some printfs at the crashing site from comment 0 but I couldn't get it to crash.
I don't have much experience debugging on Linux so I don't really know what the options are. Can you take the build that does crash, and have gdb auto-attach when it crashes? Or run it under gdb from the start? Does it crash if you run under rr?
Running under gdb it crashes but it never breaks in gdb. One session ended with this crash:
[NPAPI 15507] ###!!! ABORT: Aborting on channel error.: file /builds/slave/m-aurora-l64-ntly-000000000000/build/src/ipc/glue/MessageChannel.cpp, line 1597
[NPAPI 15507] ###!!! ABORT: Aborting on channel error.: file /builds/slave/m-aurora-l64-ntly-000000000000/build/src/ipc/glue/MessageChannel.cpp, line 1597
[Inferior 1 (process 15418) exited with code 01]
I guess I'll try rr next.
rr does not seem to like official builds of firefox.
Um... that's not good. Talk to roc, I'm sure he'll want to know!
https://github.com/mozilla/rr/wiki/Building-And-Installing says that one needs to build with --disable-gstreamer, so that is probably why.
There does not seem to be any more crashing or freezing on those web pages with the few 40.0a2 builds I tested over the past couple of weeks.
Thanks for the update, Ryan. I'm going to resolve this but please re-open if you run into the issue again.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → WORKSFORME