Closed Bug 828630 Opened 7 years ago Closed 7 years ago

data race on nsSocketTransportService::mOffline

Categories

(Core :: Networking, defect)

x86_64
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla21
Tracking Status
firefox20 --- fixed
firefox21 --- fixed

People

(Reporter: jseward, Assigned: jaas)

References

Details

(Keywords: valgrind)

Attachments

(1 file)

nsSocketTransportService::mOffline and possibly mGoingOffline are
accessed from multiple threads but there is no lock that consistently
protects it.

nsSocketTransportService::Run() does protect them both with mLock.
But nsSocketTransportService::SetOffline(bool offline) accesses them
without any holding mLock.
STR: start up on desktop and let it idle.  Then quit.

Helgrind's complainery follows.

----------------------------------------------------------------

Lock at 0x4EDFDA0 was first observed
   at 0x4A0A0C9: pthread_mutex_init (hg_intercepts.c:429)
   by 0x50589FD: PR_NewLock (ptsynch.c:147)
   by 0x6BF5F07: nsSocketTransportService::nsSocketTransportService() (Mutex.h:49)
   by 0x6BCBECC: nsSocketTransportServiceConstructor(nsISupports*, nsID const&, void**) (in /home/sewardj/MOZ/MC-08-01-2013-HG/ff-opt/toolkit/library/libxul.so)
   by 0x7C3E49A: mozilla::GenericFactory::CreateInstance(nsISupports*, nsID const&, void**) (GenericFactory.cpp:16)
   by 0x7C6E930: nsComponentManagerImpl::CreateInstanceByContractID(char const*, nsISupports*, nsID const&, void**) (nsComponentManager.cpp:1034)
   by 0x7C6FD0D: nsComponentManagerImpl::GetServiceByContractID(char const*, nsID const&, void**) (nsComponentManager.cpp:1426)
   by 0x7C380DC: CallGetService(char const*, nsID const&, void**) (nsComponentManagerUtils.cpp:62)
   by 0x7C38367: nsGetServiceByContractIDWithError::operator()(nsID const&, void**) const (nsComponentManagerUtils.cpp:256)
   by 0x7C37105: nsCOMPtr_base::assign_from_gs_contractid_with_error(nsGetServiceByContractIDWithError const&, nsID const&) (nsCOMPtr.cpp:101)
   by 0x6BDCF13: nsIOService::InitializeSocketTransportService() (nsCOMPtr.h:682)
   by 0x6BDD03A: nsIOService::SetOffline(bool) (nsIOService.cpp:728)

Possible data race during write of size 1 at 0x4EDFD03 by thread #1
Locks held: none
   at 0x6BF570E: nsSocketTransportService::SetOffline(bool) (nsSocketTransportService2.cpp:536)
   by 0x6BDD153: nsIOService::SetOffline(bool) (nsIOService.cpp:714)
   by 0x6BDF24D: nsIOService::Observe(nsISupports*, char const*, unsigned short const*) (nsIOService.cpp:917)
   by 0x7C4CD16: nsObserverList::NotifyObservers(nsISupports*, char const*, unsigned short const*) (nsObserverList.cpp:99)
   by 0x7C4D210: nsObserverService::NotifyObservers(nsISupports*, char const*, unsigned short const*) (nsObserverService.cpp:161)
   by 0x6BB04A8: nsXREDirProvider::DoShutdown() (nsXREDirProvider.cpp:847)
   by 0x6BA9B42: ScopedXPCOMStartup::~ScopedXPCOMStartup() (nsAppRunner.cpp:1119)
   by 0x6BAF33B: XREMain::XRE_main(int, char**, nsXREAppData const*) (nsAppRunner.cpp:3915)
   by 0x6BAF5A6: XRE_main (nsAppRunner.cpp:4093)
   by 0x40208E: do_main(int, char**, nsIFile*) (nsBrowserApp.cpp:195)
   by 0x402195: main (nsBrowserApp.cpp:388)

This conflicts with a previous read of size 1 by thread #4
Locks held: 1, at address 0x4EDFDA0
   at 0x6BF6F81: nsSocketTransportService::Run() (nsSocketTransportService2.cpp:660)
   by 0x7C74686: nsThread::ProcessNextEvent(bool, bool*) (nsThread.cpp:627)
   by 0x7C3B1D0: NS_ProcessNextEvent_P(nsIThread*, bool) (nsThreadUtils.cpp:237)
   by 0x7C751F9: nsThread::ThreadFunc(void*) (nsThread.cpp:265)
   by 0x505DFD6: _pt_root (ptthread.c:158)
   by 0x4A09F6C: mythread_wrapper (hg_intercepts.c:219)
   by 0x30E2C07D13: start_thread (in /usr/lib64/libpthread-2.15.so)
   by 0x30E28F167C: clone (in /usr/lib64/libc-2.15.so)

Address 0x4EDFD03 is 83 bytes inside a block of size 168 alloc'd
   at 0x4A077BC: malloc (vg_replace_malloc.c:270)
   by 0x62A7078: moz_xmalloc (mozalloc.cpp:54)
   by 0x6BCBEC1: nsSocketTransportServiceConstructor(nsISupports*, nsID const&, void**) (mozalloc.h:200)
   by 0x7C3E49A: mozilla::GenericFactory::CreateInstance(nsISupports*, nsID const&, void**) (GenericFactory.cpp:16)
   by 0x7C6E930: nsComponentManagerImpl::CreateInstanceByContractID(char const*, nsISupports*, nsID const&, void**) (nsComponentManager.cpp:1034)
   by 0x7C6FD0D: nsComponentManagerImpl::GetServiceByContractID(char const*, nsID const&, void**) (nsComponentManager.cpp:1426)
   by 0x7C380DC: CallGetService(char const*, nsID const&, void**) (nsComponentManagerUtils.cpp:62)
   by 0x7C38367: nsGetServiceByContractIDWithError::operator()(nsID const&, void**) const (nsComponentManagerUtils.cpp:256)
   by 0x7C37105: nsCOMPtr_base::assign_from_gs_contractid_with_error(nsGetServiceByContractIDWithError const&, nsID const&) (nsCOMPtr.cpp:101)
   by 0x6BDCF13: nsIOService::InitializeSocketTransportService() (nsCOMPtr.h:682)
   by 0x6BDD03A: nsIOService::SetOffline(bool) (nsIOService.cpp:728)
   by 0x6BDE479: nsIOService::InitializeNetworkLinkService() (nsIOService.cpp:278)

----------------------------------------------------------------
Keywords: valgrind
Taking this in order to find an owner.
Assignee: nobody → joshmoz
this is probably from adam's patch..
(In reply to Patrick McManus [:mcmanus] from comment #3)
> this is probably from adam's patch..

I don't know what this is a reference to.
(In reply to Josh Aas (Mozilla Corporation) from comment #4)
> (In reply to Patrick McManus [:mcmanus] from comment #3)
> > this is probably from adam's patch..
> 
> I don't know what this is a reference to.

bug 87717
Blocks: 87717
Attached patch fix v1.0Splinter Review
Maybe the fix is this simple? I don't know this code, just trying to help move things along.
Attachment #700706 - Flags: review?(unusualtears)
Comment on attachment 700706 [details] [diff] [review]
fix v1.0

Thanks Josh.  That should do it.  I am building a patched, valgrind-enabled build to verify it with helgrind, but I don't really have any doubt.  

The traces in comment 1 are clear: the main thread toggles |mGoingOffline| while another thread may be locking it in |Run|.
Attachment #700706 - Flags: review?(unusualtears)
Attachment #700706 - Flags: review?(mcmanus)
Attachment #700706 - Flags: review+
Getting helgrind to behave nicely is more work than I anticipated (building suppression file and all).  I don't have the patience at the moment for it.  But I stand by my comment that Josh's patch to grab the lock in |SetOffline| will eliminate the race for |mGoingOffline|.
Getting H to behave nicely requires marking up the source in a few critical
places; see bug 551155.

Anyway, with the patch in place I am no longer seeing the race.  So, +1 from me.
Attachment #700706 - Flags: review?(mcmanus) → review+
Comment on attachment 700706 [details] [diff] [review]
fix v1.0

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 87717
User impact if declined: This was found by a tool, not a user. I guess Firefox could get confused about its offline state. Generally races like this are pretty problematic, hard to diagnose issues because they can be hard to reproduce.
Testing completed (on m-c, etc.): on m-c now
Risk to taking this patch (and alternatives if risky): doesn't seem risky
String or UUID changes made by this patch: none
Attachment #700706 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/6e56521bea69
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla21
Comment on attachment 700706 [details] [diff] [review]
fix v1.0

We'll take a low risk fix on Aurora to prevent the need to diagnose amorphous networking bugs. Approving.
Attachment #700706 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.