Closed Bug 637461 Opened 10 years ago Closed 10 years ago

data race in StartupCache.cpp

Categories

(Core :: XPCOM, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla5

People

(Reporter: taras.mozilla, Assigned: taras.mozilla)

References

Details

Attachments

(1 file, 1 obsolete file)

==22869== Possible data race during write of size 8 at 0x162f6d10 by thread #1
==22869==    at 0x5739380: mozilla::scache::StartupCache::WriteTimeout(nsITimer*, void*) (StartupCache.cpp:415)
==22869==    by 0x63A37C4: nsTimerImpl::Fire() (nsTimerImpl.cpp:425)
==22869==    by 0x63A38A8: nsTimerEvent::Run() (nsTimerImpl.cpp:517)
==22869==    by 0x639FFFC: nsThread::ProcessNextEvent(int, int*) (nsThread.cpp:633)
==22869==    by 0x63604F9: NS_ProcessNextEvent_P(nsIThread*, int) (nsThreadUtils.cpp:250)
==22869==    by 0x6259882: mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) (MessagePump.cpp:110)
==22869==    by 0x63E0ECA: MessageLoop::RunInternal() (message_loop.cc:219)
==22869==    by 0x63E0ED6: MessageLoop::RunHandler() (message_loop.cc:202)
==22869==    by 0x63E0F45: MessageLoop::Run() (message_loop.cc:176)
==22869==    by 0x6178787: nsBaseAppShell::Run() (nsBaseAppShell.cpp:192)
==22869==    by 0x5FE21D4: nsAppStartup::Run() (nsAppStartup.cpp:220)
==22869==    by 0x563AC11: XRE_main (nsAppRunner.cpp:3781)
==22869==    by 0x400F7E: main (nsBrowserApp.cpp:158)
 
==22869==  This conflicts with a previous read of size 8 by thread #26
==22869==    at 0x573939B: mozilla::scache::StartupCache::WaitOnWriteThread() (StartupCache.cpp:385)
==22869==    by 0x573960C: mozilla::scache::StartupCache::WriteToDisk() (StartupCache.cpp:326)
==22869==    by 0x57397DF: mozilla::scache::StartupCache::ThreadedWrite(void*) (StartupCache.cpp:398)
==22869==    by 0x7A43A1C: _pt_root (ptthread.c:189)
==22869==    by 0x4C2CBE7: mythread_wrapper (hg_intercepts.c:221)
==22869==    by 0x4E369C9: start_thread (pthread_create.c:300)
==22869==    by 0xB2DA70C: clone (clone.S:112)
This is a race on StartupCache::mWriteThread, which is accessed MT
without locking.  Am tracking it as HGANN 030 in the patch on bug
551155.
Attached patch locking (obsolete) — Splinter Review
Assignee: nobody → tglek
Attachment #516435 - Flags: review?(mwu)
Why not just return immediately from WaitOnWriteThread if NS_IsMainThread is false?

If we could avoid the write thread from calling WaitOnWriteThread completely, we could assert/abort on !NS_IsMainThread and ensure callers only use StartupCache on the main thread.
(In reply to comment #3)
> Why not just return immediately from WaitOnWriteThread if NS_IsMainThread is
> false?

Good idea, but I don't think we can rely on cache being used solely from main thread.

Afaik cache accesses are not happening frequently enough to worry about locking perf.
 
> If we could avoid the write thread from calling WaitOnWriteThread completely,
> we could assert/abort on !NS_IsMainThread and ensure callers only use
> StartupCache on the main thread.

It's convenient to be able to reopen the cache after writing on a separate thread.
(In reply to comment #4)
> (In reply to comment #3)
> > Why not just return immediately from WaitOnWriteThread if NS_IsMainThread is
> > false?
> 
> Good idea, but I don't think we can rely on cache being used solely from main
> thread.
> 
Currently, we can, as its only user will only use it on the main thread. As far as I can tell, StartupCache does not go out of its way to be thread safe. I suspect there are going to be far more severe bugs if you try to use StartupCache from another thread.

An alternate fix would be to not call WaitOnWriteThread on the write thread. There's no reason to call it in WriteToDisk, and LoadArchive doesn't need it because either its caller already called it (InvalidateCache) or its caller knows for sure that the thread isn't running (Init).
(In reply to comment #5)
> (In reply to comment #4)
> > (In reply to comment #3)
> > > Why not just return immediately from WaitOnWriteThread if NS_IsMainThread is
> > > false?
> > 
> > Good idea, but I don't think we can rely on cache being used solely from main
> > thread.
> > 
> Currently, we can, as its only user will only use it on the main thread. As far
> as I can tell, StartupCache does not go out of its way to be thread safe. I
> suspect there are going to be far more severe bugs if you try to use
> StartupCache from another thread.
> 
> An alternate fix would be to not call WaitOnWriteThread on the write thread.
> There's no reason to call it in WriteToDisk, and LoadArchive doesn't need it
> because either its caller already called it (InvalidateCache) or its caller
> knows for sure that the thread isn't running (Init).

WriteToDisk is called from the destructor too. In that case it should wait in case there is a write thread going on(but this can be fixed by waiting in destructor).

I'll see about changing code according to your suggestion, i'm just worried about causing fragility down the road when new features are added to the code.
At this point I think we should assert (or runtimeabort) that the startup cache clients only operate on the main thread. Trying to make the implementation completely threadsafe doesn't sound like a fruitful activity.
Attached patch no lockingSplinter Review
This adds assertions for running on non-main thread/process. no more WaitOnWrite thread on write thread
Attachment #516435 - Attachment is obsolete: true
Attachment #516435 - Flags: review?(mwu)
Attachment #516986 - Flags: review?(mwu)
Comment on attachment 516986 [details] [diff] [review]
no locking

Nice.
Attachment #516986 - Flags: review?(mwu) → review+
(In reply to comment #8)
> Created attachment 516986 [details] [diff] [review]

With this in place I can no longer detect the race.  +1 therefore.
http://hg.mozilla.org/mozilla-central/rev/2bee01c51ed6
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whiteboard: fixed-in-cedar
Target Milestone: --- → mozilla2.2
You need to log in before you can comment on or make changes to this bug.