Closed Bug 961860 Opened 8 years ago Closed 8 years ago

Intermittent null-deref in Seer::Observe()

Categories

(Core :: Networking, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla29
Tracking Status
firefox27 --- unaffected
firefox28 --- unaffected
firefox29 --- fixed
firefox-esr24 --- unaffected

People

(Reporter: n.nethercote, Assigned: u408661)

References

()

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

Valgrind catches this.

https://tbpl.mozilla.org/php/getParsedLog.php?id=33224002&tree=Mozilla-Inbound

> ==20525== Invalid read of size 8
> ==20525==    at 0x80FC00F: mozilla::net::Seer::Observe(nsISupports*, char const*, char16_t const*) (Seer.cpp:355)
> ==20525==    by 0x80AC7EF: nsTimerImpl::Fire() (nsTimerImpl.cpp:559)
> ==20525==    by 0x80AC8C9: nsTimerEvent::Run() (nsTimerImpl.cpp:635)
> ==20525==    by 0x80A880C: nsThread::ProcessNextEvent(bool, bool*) (nsThread.cpp:637)
> ==20525==    by 0x80580F4: NS_ProcessNextEvent(nsIThread*, bool) (nsThreadUtils.cpp:263)
> ==20525==    by 0x80AD128: nsThread::Shutdown() (nsThread.cpp:488)
> ==20525==    by 0x8E79DE3: mozilla::dom::workers::RuntimeService::NoteIdleThread(mozilla::dom::workers::RuntimeService::WorkerThread*) (RuntimeService.cpp:2198)
> ==20525==    by 0x8E79F21: (anonymous namespace)::WorkerThreadPrimaryRunnable::FinishedRunnable::Run() (RuntimeService.cpp:2597)
> ==20525==    by 0x80A880C: nsThread::ProcessNextEvent(bool, bool*) (nsThread.cpp:637)
> ==20525==    by 0x80580F4: NS_ProcessNextEvent(nsIThread*, bool) (nsThreadUtils.cpp:263)
> ==20525==    by 0x80AD128: nsThread::Shutdown() (nsThread.cpp:488)
> ==20525==    by 0x8CC2E86: nsHtml5ParserThreadTerminator::Observe(nsISupports*, char const*, char16_t const*) (nsHtml5Module.cpp:100)
> ==20525==    by 0x808726B: nsObserverList::NotifyObservers(nsISupports*, char const*, char16_t const*) (nsObserverList.cpp:96)
> ==20525==    by 0x8087327: nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*) (nsObserverService.cpp:302)
> ==20525==    by 0x805B69F: mozilla::ShutdownXPCOM(nsIServiceManager*) (nsXPComInit.cpp:708)
> ==20525==    by 0x805BB87: NS_ShutdownXPCOM (nsXPComInit.cpp:656)
> ==20525==    by 0x97B4512: ScopedXPCOMStartup::~ScopedXPCOMStartup() (nsAppRunner.cpp:1163)
> ==20525==    by 0x97BAB55: XREMain::XRE_main(int, char**, nsXREAppData const*) (nsAppRunner.cpp:4116)
> ==20525==    by 0x97BAD9E: XRE_main (nsAppRunner.cpp:4331)
> ==20525==    by 0x404933: do_main(int, char**, nsIFile*) (nsBrowserApp.cpp:280)
> ==20525==  Address 0x0 is not stack'd, malloc'd or (recently) free'd
> ==20525==

This is a simple NULL-deref:

    gSeer->mIOThread->Dispatch(event, NS_DISPATCH_NORMAL);

Either |gSeer| or |gSeer->mIOThread| is NULL. This causes a segfault, and we enter Breakpad, and Breakpad does a clone() call that Valgrind can't handle.

This would also cause a crash if it happened when not running under Valgrind. It's likely that it manifested under Valgrind because thread scheduling is different to native runs.
Blocks: 961614
i believe nick is away for a few days
Assignee: nobody → hurley
I don't disappear to the mountains until Friday :)

It's almost certainly mIOThread that's null - gSeer only gets set to null when the dtor runs, so unless there's something incredibly crazy going on, there's no way gSeer will be null (since the seer is a singleton). That said, there's no need to be referencing gSeer in Seer::Observe, anyway (it's likely an artifact of when I had a separate observer object), so it's probably worth cleaning that up when I add the check to make sure we don't do anything stupid like deref a null pointer :)
Attached patch patchSplinter Review
Here's a thing that should do what we want (along with an extra sanity check for threading in Seer::Observe)

Running through try for valgrind (will retrigger a few times for reasonable certainty that it's fixed): https://tbpl.mozilla.org/?tree=Try&rev=3484a34b04aa
Attachment #8363839 - Flags: review?(mcmanus)
> Running through try for valgrind (will retrigger a few times for reasonable
> certainty that it's fixed):

It's sufficiently rare -- it has happened only two or three times in the past few days -- that you'd need a *lot* of retriggers to be confident. But looking at the patch it seems pretty likely that you've fixed it.
Attachment #8363839 - Flags: review?(mcmanus) → review+
https://hg.mozilla.org/mozilla-central/rev/b89f28d7f0f9
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
AFAICT, this affects 27/28 as well. Probably too late for an uplift to 27, but can we nominate it for 28 please?
Actually doesn't affect 27 or 28 - the code that included the null deref is new in 29. And even if it weren't, the seer is disabled on 27/28.
You need to log in before you can comment on or make changes to this bug.