Closed Bug 722545 Opened 8 years ago Closed 8 years ago

Telemetry deserialization causes debug startup crashes, presumably due to corrupt data on disk

Categories

(Toolkit :: Telemetry, defect, critical)

x86_64
Linux

RESOLVED INVALID
Tracking Status
firefox12 + unaffected

People

(Reporter: justin.lebar+bug, Unassigned)

Details

(Keywords: crash, regression)

Attachments

(1 file)

I get the following stack trace when I start up my debug build.  I have no patches applied.

I presume this happens because my Telemetry data was corrupted.  Perhaps I killed Firefox in the middle of an async write.

But we should not crash like this when bad data is found.  Instead, we should just ignore the data on disk.  In fact, a checksum should be written along with the data so we can catch other forms of corruption.

#0  0x00007ffff7fe84df in TouchBadMemory () at ../../../src/memory/mozalloc/mozalloc_abort.cpp:68
#1  0x00007ffff7fe8532 in mozalloc_abort (msg=0x7fffffffab60 "###!!! ABORT: file ../../../src/ipc/chromium/src/base/histogram.cc, line 778") at ../../../src/memory/mozalloc/mozalloc_abort.cpp:89
#2  0x00007ffff4c70db6 in Abort (aMsg=0x7fffffffab60 "###!!! ABORT: file ../../../src/ipc/chromium/src/base/histogram.cc, line 778") at ../../../src/xpcom/base/nsDebugImpl.cpp:388
#3  0x00007ffff4c70cba in NS_DebugBreak_P (aSeverity=3, aStr=0x0, aExpr=0x0, aFile=0x7ffff59703c8 "../../../src/ipc/chromium/src/base/histogram.cc", aLine=778) at ../../../src/xpcom/base/nsDebugImpl.cpp:345
#4  0x00007ffff4cb088a in mozilla::Logger::~Logger (this=0x7fffffffb040, __in_chrg=<optimized out>) at ../../../src/ipc/chromium/src/base/logging.cc:47
#5  0x00007ffff35992e4 in mozilla::LogWrapper::~LogWrapper (this=0x7fffffffb040, __in_chrg=<optimized out>) at ../../../src/ipc/chromium/src/base/logging.h:57
#6  0x00007ffff4caa3b9 in base::Histogram::SampleSet::Deserialize (this=0x120cc98, iter=0x7fffffffb1c8, pickle=...) at ../../../src/ipc/chromium/src/base/histogram.cc:778
#7  0x00007ffff478b4c3 in (anonymous namespace)::TelemetrySessionData::DeserializeHistogramData (this=0x1202d40, pickle=..., iter=0x7fffffffb1c8) at ../../../../src/toolkit/components/telemetry/Telemetry.cpp:824
#8  0x00007ffff478b793 in (anonymous namespace)::TelemetrySessionData::LoadFromDisk (file=0xe7aee0, ptr=0x7fffffffb240) at ../../../../src/toolkit/components/telemetry/Telemetry.cpp:876
#9  0x00007ffff478be0c in (anonymous namespace)::LoadHistogramEvent::Run (this=0xe7b9c0) at ../../../../src/toolkit/components/telemetry/Telemetry.cpp:994
#10 0x00007ffff4c620cb in nsThread::ProcessNextEvent (this=0x501af0, mayWait=false, result=0x7fffffffb34f) at ../../../src/xpcom/threads/nsThread.cpp:657
#11 0x00007ffff4bf6c82 in NS_ProcessNextEvent_P (thread=0x501af0, mayWait=false) at nsThreadUtils.cpp:245
#12 0x00007ffff4abc1ac in mozilla::ipc::MessagePump::Run (this=0x4e7120, aDelegate=0x4e6e10) at ../../../src/ipc/glue/MessagePump.cpp:110
#13 0x00007ffff4cb1469 in MessageLoop::RunInternal (this=0x4e6e10) at ../../../src/ipc/chromium/src/base/message_loop.cc:208
#14 0x00007ffff4cb13fa in MessageLoop::RunHandler (this=0x4e6e10) at ../../../src/ipc/chromium/src/base/message_loop.cc:201
#15 0x00007ffff4cb13d3 in MessageLoop::Run (this=0x4e6e10) at ../../../src/ipc/chromium/src/base/message_loop.cc:175
#16 0x00007ffff4959ee8 in nsBaseAppShell::Run (this=0x7b1540) at ../../../src/widget/xpwidgets/nsBaseAppShell.cpp:189
#17 0x00007ffff46a5e40 in nsAppStartup::Run (this=0x7a1060) at ../../../../src/toolkit/components/startup/nsAppStartup.cpp:220
#18 0x00007ffff358b82a in XRE_main (argc=3, argv=0x7fffffffe008, aAppData=0x407c40) at ../../../src/toolkit/xre/nsAppRunner.cpp:3537
#19 0x0000000000401c85 in do_main (exePath=0x7fffffffcf00 "/home/jlebar/code/moz/ff-git/debug/dist/bin/", argc=4, argv=0x7fffffffe008) at ../../../src/browser/app/nsBrowserApp.cpp:205
#20 0x0000000000401eec in main (argc=4, argv=0x7fffffffe008) at ../../../src/browser/app/nsBrowserApp.cpp:295
Ah, like an idiot, I just deleted my savedHistograms file!

Sorry, that was pretty dumb.
Blocks: 707320
Severity: normal → critical
No longer depends on: 707320
Keywords: crash, regression
I see the source of the bug.  Yes, something was corrupt, and the chromium code aborts on it in debug builds, whereas in opt builds it just returns failure.

Deleting the appropriate debug-only checks would be one way to fix this.

Having a datafile to look at would be interesting, though.
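
To illustrate what "just returning failure" could look like in every build type, here is a minimal sketch. It is not the Chromium or Telemetry code: SampleSetSketch, ByteReader, DeserializeSampleSet, and the on-disk layout are all hypothetical names and assumptions made for this example. The point is only that each sanity check on untrusted data becomes an early `return false`, so the caller can discard the file.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical stand-in for a histogram sample set; not the Chromium class.
struct SampleSetSketch {
  int64_t sum = 0;
  std::vector<int32_t> counts;
};

// Minimal cursor over an untrusted byte buffer (hypothetical helper).
class ByteReader {
 public:
  ByteReader(const uint8_t* data, size_t size) : data_(data), size_(size) {}

  // Copies sizeof(T) bytes into *out; returns false if the buffer is too short.
  template <typename T>
  bool Read(T* out) {
    if (size_ - pos_ < sizeof(T)) return false;
    std::memcpy(out, data_ + pos_, sizeof(T));
    pos_ += sizeof(T);
    return true;
  }

  size_t remaining() const { return size_ - pos_; }

 private:
  const uint8_t* data_;
  size_t size_;
  size_t pos_ = 0;
};

// Assumed layout: int64 sum, uint32 bucket count, then that many int32 counts.
// Every check on the data is a recoverable failure, never a fatal assertion.
// *out is only modified when the whole record has been validated.
bool DeserializeSampleSet(ByteReader* reader, size_t expected_buckets,
                          SampleSetSketch* out) {
  int64_t sum = 0;
  uint32_t n = 0;
  if (!reader->Read(&sum) || !reader->Read(&n)) return false;
  if (n != expected_buckets) return false;  // wrong shape: reject, don't abort
  if (reader->remaining() / sizeof(int32_t) < n) return false;

  std::vector<int32_t> counts(n);
  for (uint32_t i = 0; i < n; ++i) {
    if (!reader->Read(&counts[i])) return false;
    if (counts[i] < 0) return false;  // negative counts cannot be valid
  }
  out->sum = sum;
  out->counts = std::move(counts);
  return true;
}
```

A caller would treat a false return as "ignore the data on disk" and start the session with empty histograms.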
> Deleting the appropriate debug-only checks would be one way to fix this.

Or at least making them non-fatal, yes.

But there's a bigger problem, which is that data can be corrupted in ways that won't be caught by such assertions.  We need to catch that too, because otherwise we're polluting our datastore.  But we could do so in a separate bug.
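
As a rough sketch of the checksum idea, the writer could append a CRC-32 of the payload and the reader could refuse any file whose stored and recomputed values differ. None of this is the actual Telemetry format: WithChecksum, LoadChecked, and the "payload followed by a 4-byte little-endian CRC" layout are assumptions made for illustration, and CRC-32 is just one possible choice of checksum.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Bitwise CRC-32 (reflected polynomial 0xEDB88320), as used by zlib and PNG.
uint32_t Crc32(const uint8_t* data, size_t size) {
  uint32_t crc = 0xFFFFFFFFu;
  for (size_t i = 0; i < size; ++i) {
    crc ^= data[i];
    for (int bit = 0; bit < 8; ++bit) {
      // Shift right; XOR in the polynomial when the low bit was set.
      crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
    }
  }
  return ~crc;
}

// Hypothetical writer side: payload followed by its CRC, little-endian.
std::vector<uint8_t> WithChecksum(const std::vector<uint8_t>& payload) {
  std::vector<uint8_t> file = payload;
  const uint32_t crc = Crc32(payload.data(), payload.size());
  for (int shift = 0; shift < 32; shift += 8) {
    file.push_back(static_cast<uint8_t>(crc >> shift));
  }
  return file;
}

// Hypothetical reader side: returns false (and leaves *payload untouched)
// when the file is too short or the checksum does not match.
bool LoadChecked(const std::vector<uint8_t>& file,
                 std::vector<uint8_t>* payload) {
  if (file.size() < 4) return false;
  const size_t body = file.size() - 4;
  uint32_t stored = 0;
  for (int i = 0; i < 4; ++i) {
    stored |= static_cast<uint32_t>(file[body + i]) << (8 * i);
  }
  if (Crc32(file.data(), body) != stored) return false;
  payload->assign(file.begin(), file.begin() + body);
  return true;
}
```

A checksum like this catches truncation and bit flips that structural assertions would miss, though it does nothing against data that was wrong before it was written.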
Duplicate of this bug: 722214
We'll need a fix for this startup crasher on mozilla-aurora prior to our first build. Please make this a priority.
This got backed out on 12.
Going to declare this fixed after the relanding of bug 707320; we've made the checking more robust in the face of bad data.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INVALID