Bug 1151548: Crash while updating telemetry in Necko
Status: RESOLVED WORKSFORME (opened 9 years ago, closed 9 years ago)
Component: Core :: Networking (defect)
Reporter: seth; Assignee: Unassigned
Attachments: 1 file (14.35 KB, text/plain)
Description:
In bug 1128255, we got reports of crashes that look related to updating telemetry in Necko. Here are some crash-stats links:
https://crash-stats.mozilla.com/report/index/bp-b137ac89-1966-4259-9990-e82912150402
https://crash-stats.mozilla.com/report/index/5e67fba4-a4ec-4283-8a98-ae67a2150305
More details and STR (steps to reproduce) can be found in bug 1128255.
Comment 1 (Reporter) • 9 years ago
Stack, posted by Patrick in bug 1128255 comment 19:

Frame  Module  Signature  Source
0  libxul.so  base::Histogram::Add(int)  ipc/chromium/src/base/histogram.cc
1  libxul.so  mozilla::Telemetry::Accumulate(mozilla::Telemetry::ID, unsigned int)  toolkit/components/telemetry/Telemetry.cpp
2  libxul.so  mozilla::net::CacheStorageService::TelemetryRecordEntryRemoval(mozilla::net::CacheEntry const*)  netwerk/cache2/CacheStorageService.cpp
3  libxul.so  mozilla::net::CacheStorageService::UnregisterEntry(mozilla::net::CacheEntry*)  netwerk/cache2/CacheStorageService.cpp
4  libxul.so  mozilla::net::CacheEntry::Purge(unsigned int)  netwerk/cache2/CacheEntry.cpp
5  libxul.so  mozilla::net::CacheStorageService::MemoryPool::PurgeByFrecency(bool&, unsigned int)  netwerk/cache2/CacheStorageService.cpp
6  libxul.so  mozilla::net::CacheStorageService::MemoryPool::PurgeOverMemoryLimit()  netwerk/cache2/CacheStorageService.cpp
7  libxul.so  mozilla::net::CacheStorageService::PurgeOverMemoryLimit()  netwerk/cache2/CacheStorageService.cpp
8  libxul.so  nsRunnableMethodImpl<void (mozilla::net::CacheStorageService::*)(), void, true>::Run()  xpcom/glue/nsThreadUtils.h
9  libxul.so
Updated by Reporter • 9 years ago
Flags: needinfo?(michal.novotny)
Comment 2 • 9 years ago
I don't see anything wrong in the cache code. The telemetry call is not protected by the lock, but it is only ever made on one thread. On crash-stats I see a lot of crashes in base::Histogram::Add(int) called from code other than the cache, so why is this considered a bug in the cache rather than a bug in telemetry?
Flags: needinfo?(michal.novotny)
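Comment 2 argues that the unlocked telemetry call is safe because only one thread ever makes it. The following is a minimal C++ sketch of how such a single-thread invariant can be checked; `SingleThreadHistogram` and its methods are hypothetical names for illustration and are not Firefox's Telemetry or base::Histogram API. The sketch records the thread that created the object and asserts on every unsynchronized Add().

```cpp
#include <cassert>
#include <cstddef>
#include <thread>
#include <vector>

// Hypothetical histogram wrapper (not Firefox code). It has no lock, so it is
// only correct if every Add() happens on the thread that constructed it.
class SingleThreadHistogram {
 public:
  explicit SingleThreadHistogram(std::size_t buckets)
      : owner_(std::this_thread::get_id()), counts_(buckets, 0) {}

  // True when called from the thread that created this object.
  bool OnOwnerThread() const { return std::this_thread::get_id() == owner_; }

  // Unsynchronized write. The assert (active only when NDEBUG is not
  // defined) catches callers that break the single-thread assumption.
  void Add(std::size_t bucket) {
    assert(OnOwnerThread() && "Add() called off the owning thread");
    if (bucket < counts_.size()) ++counts_[bucket];  // ignore out-of-range buckets
  }

  int Count(std::size_t bucket) const {
    return bucket < counts_.size() ? counts_[bucket] : 0;
  }

 private:
  std::thread::id owner_;
  std::vector<int> counts_;
};
```

In a debug build, an Add() issued from a second thread would trip the assertion instead of silently racing on the bucket counts; with NDEBUG defined, the check compiles away and the class behaves like a plain unlocked counter.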
Comment 3 (Reporter) • 9 years ago
I have no idea where the fault lies. Vladan, does this seem like it's a bug in the Telemetry code?
Flags: needinfo?(vdjeric)
Updated • 9 years ago
Flags: needinfo?(vdjeric)
Comment 4 • 9 years ago
From those 0x5a5a5a92 addresses, it looks like the histograms (or their buckets) have been freed. That sounds bad. What could cause that?
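The reasoning in comment 4 can be made concrete with a little arithmetic. Assuming freed memory is overwritten with the byte 0x5a (the pattern the comment is keying on), a pointer loaded from a freed object reads as 0x5a5a5a5a, and 0x5a5a5a92 - 0x5a5a5a5a = 0x38. On that reading, the faulting access is to a field 0x38 (56) bytes into whatever the stale pointer was supposed to reference. The helper below is a hypothetical sketch of that check, not a crash-stats feature; `PoisonOffset` and `kFreedPoison32` are illustrative names.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: freed heap memory is filled with the byte 0x5a, so a 32-bit
// pointer read out of a freed object looks like this value.
constexpr std::uint32_t kFreedPoison32 = 0x5a5a5a5au;

// Hypothetical helper: returns true if crashAddr is the poison pattern plus a
// small non-negative offset (at most maxOffset), and stores that offset,
// i.e. the likely field offset being dereferenced through a dangling pointer.
bool PoisonOffset(std::uint32_t crashAddr, std::uint32_t maxOffset,
                  std::uint32_t* offsetOut) {
  if (crashAddr < kFreedPoison32) return false;
  const std::uint32_t offset = crashAddr - kFreedPoison32;
  if (offset > maxOffset) return false;
  *offsetOut = offset;
  return true;
}
```

For example, passing 0x5a5a5a92 with a `maxOffset` of a few KB yields an offset of 0x38, while an unrelated address such as 0x1234 is rejected.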
Comment 5 • 9 years ago
could it be a shutdown ordering thing?
Comment 6 (Reporter) • 9 years ago
(In reply to Patrick McManus [:mcmanus] from comment #5)
> could it be a shutdown ordering thing?

I don't think so; the STR in bug 1128255 don't involve shutdown.
Comment 8 • 9 years ago
dmajor, does that mean you can take this and debug it with rr?
Flags: needinfo?(dmajor)
Comment 9 • 9 years ago
Not really. My rr VM was super dusty and got lost to spring cleaning. According to the docs, reverse execution (which is what we need here) needs real hardware anyway.
Flags: needinfo?(dmajor)
Comment 10 • 9 years ago
(In reply to David Major [:dmajor] from comment #7)
> Since it's on Linux and has STR, I bet rr will have the answer.

I don't think we actually have STR here. I tried to reproduce based on the descriptions in that bug on my Linux machine, and I didn't get any crash.
Comment 11 • 9 years ago
(In reply to Timothy Nikkel (:tn) from comment #10)
> (In reply to David Major [:dmajor] from comment #7)
> > Since it's on Linux and has STR, I bet rr will have the answer.
>
> I don't think we actually have STR here. I tried to reproduce based on the
> descriptions in that bug on my linux machine and I didn't get any crash.

Same here. Vladan set me up with a VNC Linux machine, and it didn't crash either.
Comment 12 • 9 years ago
Ryan, can you help David and Timothy reproduce this crash?
Flags: needinfo?(yixxt)
Comment 13 • 9 years ago
I will try in the next few days. I am still using Aurora 35, since the bug has affected every release since 36. As I said before in the other thread, it takes between 5 and 20 crashes before the crash reporter even pops up.
Comment 14 • 9 years ago
http://officialfan.proboards.com/thread/519714/divas-pics-thread-bella-edition?page=35
http://officialfan.proboards.com/thread/516101/wwe-pics-gifs-vigilante-thread?page=72
I crash when I go to the threads above, click the page numbers to move through the pages of a thread, and then scroll to view more pictures. Most of the time I crash or hard-freeze after viewing one, two, or three pages of those threads. Those pages work fine in stable and Nightly Firefox on Windows, and in Firefox 35 on Linux.
Flags: needinfo?(yixxt)
Comment 15 • 9 years ago
I still haven't seen the crash, but it's possible that my remote session is interfering with the scrolling. Timothy, does it crash for you?
Flags: needinfo?(tnikkel)
Comment 16 • 9 years ago
Hmm, so I was able to get three crashes using official Developer Edition builds. Two of them just said "Fatal IO error 11 (Resource temporarily unavailable) on X server :0". One of them dumped some hex addresses (I'll attach them). I tried for quite some time with my own m-c build, debug+opt, both under gdb and not, but I never got a single crash. I even tried tagging my build as official and explicitly enabling telemetry in my mozconfig. I'm not sure what to try next.
Flags: needinfo?(tnikkel)
Comment 17 • 9 years ago
Any ideas on how I can use official builds (I'm assuming a try build will also reproduce the crash) to track this down, e.g. to get a stack?
Flags: needinfo?(dmajor)
Comment 18 • 9 years ago
I made a try build with some printfs at the crashing site from comment 0 but I couldn't get it to crash.
Comment 19 • 9 years ago
I don't have much experience debugging on Linux, so I don't really know what the options are. Can you take the build that does crash and have gdb auto-attach when it crashes? Or run it under gdb from the start? Does it crash if you run it under rr?
Flags: needinfo?(dmajor)
Comment 20 • 9 years ago
Running under gdb, it crashes, but it never breaks into gdb. One session ended with this crash:

[NPAPI 15507] ###!!! ABORT: Aborting on channel error.: file /builds/slave/m-aurora-l64-ntly-000000000000/build/src/ipc/glue/MessageChannel.cpp, line 1597
[NPAPI 15507] ###!!! ABORT: Aborting on channel error.: file /builds/slave/m-aurora-l64-ntly-000000000000/build/src/ipc/glue/MessageChannel.cpp, line 1597
[Inferior 1 (process 15418) exited with code 01]

I guess I'll try rr next.
Comment 21 • 9 years ago
rr does not seem to like official builds of Firefox.
Comment 22 • 9 years ago
Um... that's not good. Talk to roc, I'm sure he'll want to know!
Comment 23 • 9 years ago
https://github.com/mozilla/rr/wiki/Building-And-Installing says that the build needs --disable-gstreamer, so that is probably why.
Comment 24 • 9 years ago
There does not seem to be any more crashing or freezing on those web pages with the few 40.0a2 builds I tested over the past couple of weeks.
Comment 25 • 9 years ago
Thanks for the update, Ryan. I'm going to resolve this but please re-open if you run into the issue again.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME