Closed Bug 551155 Opened 15 years ago Closed 6 months ago

mark up Fx codebase with race-detector annotations

Tracking

()

Status:

RESOLVED WORKSFORME

People

(Reporter: jseward, Assigned: jseward)

References

(Depends on 3 open bugs)

Details

(Keywords: sec-want, Whiteboard: [sg:want])

Attachments

(4 files, 9 obsolete files)

patch as referred to in comment 1 (WIP) 15 years ago Julian Seward [:jseward] 18.67 KB, patch		Details \| Diff \| Splinter Review
list of remaining errors as referred to in comment 1 15 years ago Julian Seward [:jseward] 57.19 KB, text/plain		Details
WIP rebased to m-c 40484:523d0f8e0926 15 years ago Julian Seward [:jseward] 18.54 KB, patch		Details \| Diff \| Splinter Review
WIP patch against M-C of 24 Feb 2011 14 years ago Julian Seward [:jseward] 37.06 KB, patch		Details \| Diff \| Splinter Review
WIP patch against M-C of 24 Feb 2011 (revised) 14 years ago Julian Seward [:jseward] 40.95 KB, patch		Details \| Diff \| Splinter Review
suppressions file referred to in comment 17 14 years ago Julian Seward [:jseward] 1.82 KB, application/octet-stream		Details
WIP patch against M-C of 23 Mar 2011 14 years ago Julian Seward [:jseward] 45.22 KB, patch		Details \| Diff \| Splinter Review
Suppressions to go with markup in comment #19 (for Ubuntu 10.04 64 bit) 14 years ago Julian Seward [:jseward] 2.68 KB, text/plain		Details
WIP patch against M-C of 6 Apr 2011 14 years ago Julian Seward [:jseward] 43.52 KB, patch		Details \| Diff \| Splinter Review
WIP patch against M-C of 23 Apr 2012 13 years ago Julian Seward [:jseward] 44.12 KB, patch		Details \| Diff \| Splinter Review
Re-created patch against later C-C mozilla subdirectory (with USEHELGRIND macro) 13 years ago ISHIKAWA, Chiaki 41.70 KB, text/plain		Details
Re-created patch against later C-C mozilla subdirectory (with USEHELGRIND macro) 13 years ago ISHIKAWA, Chiaki 41.70 KB, patch		Details \| Diff \| Splinter Review
A minimalist starting-off patch, for consideration 13 years ago Julian Seward [:jseward] 6.88 KB, patch	cjones : feedback+	Details \| Diff \| Splinter Review

Julian Seward [:jseward]

Assignee

Description

•

15 years ago

It would be nice to be able to use Valgrind/Helgrind and possibly other run-time race detectors to routinely find and remove data races in Fx, in the same way that Valgrind/Memcheck and Purify are currently used to find and remove memory errors. The ultimate aim is to ship a codebase with no detectable threading errors, in the style of bug 549224. For this to be viable requires race detectors to be able to "see" all inter-thread synchronisation events. Most such tools are limited to intercepting pthread_* calls, and require markup of the to-be-tested application do deal with any other custom lock primitives, etc. Annotations need to be added in two sets of places: around implementations of custom threading primitives (eg, JSThinLock) at allocation points (constructors) for memory areas which are known to be harmlessly raced on (eg, JSRuntime::gcPoke). These annotations tell race detectors to ignore (don't track) such memory areas, at least until they are deallocated. bug 547736 makes a start, with annotations for Valgrind/Helgrind for jsengine.

Julian Seward [:jseward]

Assignee

Updated

•

15 years ago

Depends on: 547736

Julian Seward [:jseward]

Assignee

Updated

•

15 years ago

Assignee: nobody → jseward

Julian Seward [:jseward]

Assignee

Comment 1

•

15 years ago

I ran the simplest full-system test I could think of, which is to start Fx, wait for a blank page, then do File->Quit. What follows is a rough partial triage and commentary of findings. This is: m-c, 11 Mar 2010, x86_64-Linux, -g -O2, release build, running on valgrind --tool=helgrind, trunk svn r11089 or later. DISPLAY is a VNC X server, 960x720x16, one xterm, no window manager. I started with the SpiderMonkey markup patch at bug 547736 comment 15. Even with that patch, hundreds of races are reported. As per this triage I extended it by adding annotations to mark & hide obviously-harmless races. Result is that Fx startup/exit now shows only 21 races or other errors (lock ordering problems). Patch (very much WIP) is attached. Also a file showing the remaining 21 complaints. --- Not-obviously-harmless races ------------------------------ jsengine gc as per bug 551739, JSContext::requestDepth is raced on is probably harmless as per comments by Brendan & Jason (but rules about why this is OK are complex) xpcom/threads/TimerThread.cpp:262 vs xpcom/threads/nsTimerImpl.cpp:572 nsTimerImpl::mTimeout is accessed while mLock is held at TimerThread.cpp:262, but mLock is not held during access at nsTimerImpl.cpp:572. Intentional? I don't know. 262: if (!TIMER_LESS_THAN(now, timer->mTimeout + mTimeoutAdjustment)) { 572: mTimeout = now; --- Harmless races -------------------------------------------- nsprpub/pr/src/pthreads/ptthread.c: PRThread::id is raced on (line 185 vs 521) Harmless and commented-on in the source ANNOTATED so we won't see this again nsprpub/pr/src/pthreads/ptsynch.c: PRMonitor::owner is raced on, at least in PR_EnterMonitor if (!pthread_equal(mon->owner, self)) vs _PT_PTHREAD_COPY_THR_HANDLE(self, mon->owner); Harmless, I'm pretty sure. ANNOTATED so we won't see this again xpcom/glue/nsISupportsImpl.h (deep in the innards of ref-counting) NS_IMETHODIMP_(nsrefcnt) _class::Release(void) which implements threadsafe 'Release' (drop-a-reference) for whatever class, writes to mRefCnt without a lock once it has become zero. This is harmless because no other thread can access it once the refcount is zero. ANNOTATED so we won't see this again xpcom/ds/nsRecyclingAllocator.cpp Block::freeNode is accessed MT without a lock. As per comments in the source, this is harmless. ANNOTATED so we won't see this again xpcom/ds/nsRecyclingAllocator.cpp nsRecyclingAllocator::Malloc(PRSize bytes, PRBool zeroit) hands out chunks of memory which previously "belonged" to other threads. We need to tell race checkers that the memory now belongs to the new owner. ANNOTATED accordingly jsengine structures as per bug 547736, the following fields have harmless races JSRuntime::gcPoke JSRuntime::protoHazardShape JSContext::operationCallbackFlag JSGCStats::arenaStats (debug builds only) The patch incorporates annotations from 547736 to handle them.

Julian Seward [:jseward]

Assignee

Comment 2

•

15 years ago

Attached patch patch as referred to in comment 1 (WIP) (obsolete) — Details — Splinter Review

Contains a couple of things that should be split out into separate bugs: * nsprpub/pr/src/md/unix/os_Linux_x86_64.s: add .type/.size pseudo ops so the functions names defined in this file appear in stack traces * js_InitTitle(JSContext *cx, JSTitle *title): actually call js_InitLock on title->lock; don't just memset-0 it

Julian Seward [:jseward]

Assignee

Comment 3

•

15 years ago

Attached file list of remaining errors as referred to in comment 1 (obsolete) — Details

Julian Seward [:jseward]

Assignee

Comment 4

•

15 years ago

Attached patch WIP rebased to m-c 40484:523d0f8e0926 (obsolete) — Details — Splinter Review

Attachment #432142 - Attachment is obsolete: true

Julian Seward [:jseward]

Assignee

Comment 5

•

15 years ago

More races triaged: * XPCPerThreadData::~XPCPerThreadData, bug 557586. Due to the particular order of thread exiting, is harmless. * nsTArray_base::IncrementLength(PRUint32 n) will non-atomically add zero to the mLength field of nsTArray_base::sEmptyHdr, which is a special header for zero-length arrays, in the case where a zero-length array is appended to an existing zero-length array. This gives a R-M-W race against reads from other threads of nsTArray_base::sEmptyHdr.mLength (presumably for arrays which are also empty). So is harmless. One fix is to change IncrementLength thusly: void IncrementLength(PRUint32 n) { NS_ASSERTION(mHdr != &sEmptyHdr || n == 0, "bad data pointer"); if (n > 0) mHdr->mLength += n; } or can just annotate to say, don't track sEmptyHdr.mLength. (w/ thanks to cjones and timeless_mbp for helping me make sense of this)

Julian Seward [:jseward]

Assignee

Comment 6

•

15 years ago

Possible vtable-overwrite races in xpcom Pretty mind-bending stuff. I don't know yet if these are harmful or not, or even if they are real -- perhaps some synchronisation event is being missed by the checker. Imagine we have a base class A with a virtual destructor, and class B that inherits from it: class A { ... virtual ~A() { ... } } class B : public A { ... virtual ~B() { ... } } When an object of (dynamic) type B is destroyed, we enter ~B. This does any B-specific finalisation, and then calls onwards to ~A. However, in order that ~A runs with the correct vtable, ~B must first overwrite the first word of the object -- the vtable pointer -- with the address of the vtable for A. If a second thread is meanwhile doing virtual function calls on the object, then it will read the vtable pointer and so there is a race. You might well ask, in that case how can the second thread avoid segfaulting in the case where ~A deallocates the memory holding the object and then it is munmapped? Well, by causing ~A to wait for a signal of some kind from the second thread before deallocating (viz, returning). Such code is still thread-unsafe, though, because the second thread will have done virtual function calls on an object whose vtable pointer is being changed by the first thread. I have observed various races in xpcom along these lines. They are also exhibited more simply by xpcom's unit test, xpcom/tests/TestThreads.cpp. They are all characterised by: * one thread does a write, the other a read * the writing thread is claimed to be at the opening brace of a virtual destructor for some class C * the raced-on address is the first word of an object of class C, being destroyed by the writing thread * the old value (that is just about to be overwritten) is a pointer to C's vtable eg: Possible data race during write of size 8 at 0xc6c31b0 by thread #1 at 0x52CC66B: nsThreadStartupEvent::~nsThreadStartupEvent() (nsThread.cpp:173) by 0x528E696: nsRunnable::Release() (nsThreadUtils.cpp:55) by 0x52CBB4F: nsThread::Init() (nsThread.cpp:342) by 0x52CCA61: nsThreadManager::NewThread(unsigned int, nsIThread**) (nsThreadManager.cpp:245) by 0x528E4FE: NS_NewThread_P(nsIThread**, nsIRunnable*) (nsThreadUtils.cpp:74) by 0x400C62: TestThreads() (TestThreads.cpp:14760) by 0x400CEC: main (TestThreads.cpp:14787) This conflicts with a previous read of size 8 by thread #2 at 0x528831B: nsCOMPtr_base::assign_assuming_AddRef(nsISupports*) (nsCOMPtr.h:456) by 0x52CC5D8: nsCOMPtr<nsIRunnable>::operator=(nsIRunnable*) (nsCOMPtr.h:640) by 0x52CBC45: nsThread::ThreadFunc(void*) (nsThread.cpp:250) by 0x5B56FC2: _pt_root (ptthread.c:230) by 0x4C2A5D8: mythread_wrapper (hg_intercepts.c:213) by 0x4E34A03: start_thread (pthread_create.c:300) by 0x67F880C: clone (clone.S:112) Address 0xc6c31b0 is 0 bytes inside a block of size 32 alloc'd at 0x4C25B08: malloc (vg_replace_malloc.c:236) by 0x5524E8B: moz_xmalloc (mozalloc.cpp:75) by 0x52CC5FA: nsThreadStartupEvent::Create() (mozalloc.h:222) by 0x52CBAAF: nsThread::Init() (nsThread.cpp:315) by 0x52CCA61: nsThreadManager::NewThread(unsigned int, nsIThread**) (nsThreadManager.cpp:245) by 0x528E4FE: NS_NewThread_P(nsIThread**, nsIRunnable*) (nsThreadUtils.cpp:74) by 0x400C62: TestThreads() (TestThreads.cpp:14760) by 0x400CEC: main (TestThreads.cpp:14787) The old value at address 0xc6c31b0 was 0x551de70 and 0x551de70 is symbol "vtable for nsThreadStartupEvent+16"

Julian Seward [:jseward]

Assignee

Comment 7

•

15 years ago

(In reply to comment #6) > Possible vtable-overwrite races in xpcom These were analysed and found to be false positives in bug 558402, now closed-as-invalid. Issue is Helgrind does not understand that the last Release-r of an object becomes its exclusive owner and may arbitrarily mess with it without any locking during its destruction. This is trivially fixed by annotating NS_IMPL_THREADSAFE_RELEASE.

Julian Seward [:jseward]

Assignee

Comment 8

•

15 years ago

Update of current status: About 2/3 of the races reported for "start browser to blank page; wait for a while; quit" have been triaged. Outcome is: two false positives to do with reference counting, and a number of races that are harmless. It's not clear that all of the latter were known about or intended by the code's authors. Currently there remain untriaged race reports in the following methods: NS_SetGlobalThreadObserver(nsIThreadObserver*) nsTimerImpl::SetDelayInternal(unsigned int) TimerThread::UpdateFilter TimerThread::Run() JSBackgroundThread::busy() JSBackgroundThread::work() nsSocketTransport::OnLookupComplete nsHttpTransaction::~nsHttpTransaction() nsHttpTransaction::WritePipeSegment nsDiskCacheOutputStream::Close() base::MessagePumpLibevent::~MessagePumpLibevent() There's also a possible lock order problem which looks like it might have to do with /usr/lib/libORBit-2.so.0.1.0.

Robert Sayre

Comment 9

•

15 years ago

(In reply to comment #8) > > nsDiskCacheOutputStream::Close() oooh, please look into that one first.

Julian Seward [:jseward]

Assignee

Comment 10

•

15 years ago

> (In reply to comment #8) > > nsDiskCacheOutputStream::Close() > oooh, please look into that one first. The race isn't always reported, only on about 50% of startup-then-idle runs. Relevant source file is netwerk/cache/src/nsDiskCacheStreams.cpp. It appears nsDiskCacheOutputStream::mClosed is accessed by multiple threads without locking. I'm seeing nsDiskCacheOutputStream::Close racing against nsDiskCacheOutputStream::Write and also (I suspect) against itself: 232 NS_IMETHODIMP 233 nsDiskCacheOutputStream::Close() 234 { 235 if (!mClosed) { 236 mClosed = PR_TRUE; 237 // tell parent streamIO we are closing 238 mStreamIO->CloseOutputStream(this); 239 } 240 return NS_OK; 241 } 253 NS_IMETHODIMP 254 nsDiskCacheOutputStream::Write(const char *buf, PRUint32 count, PRUint32 *bytesWritten) 255 { 256 if (mClosed) return NS_BASE_STREAM_CLOSED; 257 return mStreamIO->Write(buf, count, bytesWritten); 258 } * Write at line 236 races against read at line 256 (::Write vs ::Close) * Write at line 236 races against read at line 235 (::Close vs ::Close) * Write at line 236 races against write at line 236 (::Close vs ::Close) So imagine 2 threads enter ::Close in lockstep, both find mClosed == PR_FALSE, so mStreamIO->CloseOutputStream(this) is done twice (and without locking) for the same stream. Is that ok? Also (probably more serious) imagine two threads that want to do ::Write and then ::Close. Initially mClose == PR_FALSE. T1 enters ::Write, does its write, then enters ::Close, does ::mClosed = PR_TRUE, then stalls. T2 enters ::Write, sees mClosed == PR_TRUE, returns, hence its data is never written.

Julian Seward [:jseward]

Assignee

Comment 11

•

15 years ago

Hmm, possibly is more of the same thread-safe-reference-counting confuses Helgrind problem. Needs further analysis.

Julian Seward [:jseward]

Assignee

Comment 12

•

15 years ago

Making sense of race reports is way more difficult than making sense of memory error reports. I'll need help from the module owners. Am adopting the strategy of filing what I have as separate bugs, marking them UNCONFIRMED for the time being. If they turn out to be bogus or harmless they can be closed-as-invalid-ed. AFAICS triaging a race report can have 3 outcomes: (1) it's a real race, which is harmful. This is the only "useful" outcome. (2) it's a real race, but is not dangerous. Options are to fix it anyway, or add an annotation to the source directing Helgrind to ignore it. At a minimum we need to ensure the source contains a comment making it clear that the race is known to exist, and (preferably) come up with a rigorous argument why the race is harmless. (3) it's not a race. This happens when Helgrind fails to see some inter-thread synchronisation event. The "fix" is again to annotate the Fx source to describe what is going on. A good example of (3) are Release methods as described in comment 7.

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 13

•

14 years ago

Hi Julian, are you still working on this or has this moved to the far back burner?

Julian Seward [:jseward]

Assignee

Comment 14

•

14 years ago

Working actively on this -- see also bug 547736 comment 17. This week I've fixed enough scaling problems in Helgrind to make it feasible to run substantial chunks of mochitests in one go. Now looking at redoing this patch, including redoing annotations of thread safe ref counting (nsCOMPtr et al), which I realise now I did incorrectly in previous attempts.

Julian Seward [:jseward]

Assignee

Comment 15

•

14 years ago

Attached patch WIP patch against M-C of 24 Feb 2011 (obsolete) — Details — Splinter Review

WIP patch, suitable for Helgrinding SpiderMonkey. Also greatly reduces the noise level from a browser startup/quit. General idea is for the patch to contain a mix of - annotations which are essential to describe to Helgrind, sync operations which it doesn't otherwise understand (thread safe refcounting, mostly) - annotations to hide detected races. Idea is for the patch to serve as a record of the detected races. Races that are considered harmful should get filed as bugs and eventually fixed, in which case their hiding-annotations can be removed from future versions of the patch. All annotations are marked with a HGANN comment line, to make them easily greppable. Currently there are 26 annotations, of which 6 describe thread-safe refcounting, 7 describe other sync events, 11 hide known-harmless races, and 2 hide non-harmless races. The complete list is below. n-i-bz means not-in-bugzilla. // HGANN 001 BEGIN 551155 HIDES_HARMLESS sEmptyHdr::mLength += 0 // HGANN 001 END /* HGANN 002 BEGIN REQD_ANN thread-safe-ref-counting */ \ /* HGANN 002 END */ \ /* HGANN 003 BEGIN REQD_ANN thread-safe-ref-counting */ \ /* HGANN 003 END */ \ // HGANN 004 BEGIN REQD_ANN thread-safe-ref-counting // HGANN 004 END /* HGANN 005 BEGIN REQD_ANN thread-safe-ref-counting */ /* HGANN 005 END */ // HGANN 006 BEGIN n-i-bz HIDES_HARMLESS on JSRuntime::various_fields // HGANN 006 END // HGANN 007 BEGIN n-i-bz HIDES_HARMLESS on JSThreadData::requestDepth // HGANN 007 END // HGANN 008 BEGIN n-i-bz HIDES_HARMLESS on JSContext::outstandingRequests // HGANN 008 END // HGANN 009 BEGIN n-i-bz HIDES_HARMLESS on LocalTZA (global var) // HGANN 009 END // HGANN 010 BEGIN REQD_ANN js_InitLock // HGANN 010 END // HGANN 011 BEGIN REQD_ANN js_FinishLock // HGANN 011 END // HGANN 012 BEGIN REQD_ANN ThinLock // HGANN 012 END // HGANN 013 BEGIN REQD_ANN ThinUnlock // HGANN 013 END // HGANN 014 BEGIN n-i-bz HIDES_HARMLESS on Assembler::nHints (global) // HGANN 014 END // HGANN 015 BEGIN n-i-bz HIDES_HARMLESS on Assembler::nHints (global) // HGANN 015 END // HGANN 016 BEGIN 559927 REQD_FIX unclear-if-real-bug-nor-if-correct-fix // HGANN 016 END // HGANN 017 BEGIN n-i-bz HIDES_HARMLESS on nsSocketTransport::mResolving // HGANN 017 END // HGANN 018 BEGIN REQD_ANN thread-safe-ref-counting // HGANN 018 END // HGANN 019 BEGIN 560579 REQD_FIX avoid bitfield race in nsHttpTransaction // HGANN 019 END // HGANN 020 BEGIN REQD_ANN PRCondVar-send (CHECKME: REQUIRED ??) // HGANN 020 END // HGANN 021 BEGIN REQD_ANN PRCondVar-recv (CHECKME: REQUIRED ??) // HGANN 021 END // HGANN 022 BEGIN n-i-bz HIDES_HARMLESS on PRMonitor::owner (RECHECK) // HGANN 022 END // HGANN 023 BEGIN n-i-bz HIDES_HARMLESS on PRThread::id (RECHECK) // HGANN 023 END // HGANN 024 BEGIN n-i-bz HIDES_HARMLESS on nsRecyclingAllocator::mFreeList // HGANN 024 END // HGANN 025 BEGIN REQD_ANN nsRecyclingAllocator::Malloc // HGANN 025 END /* HGANN 026 BEGIN REQD_ANN thread-safe-ref-counting */ /* HGANN 026 END */

Attachment #437280 - Attachment is obsolete: true

Julian Seward [:jseward]

Assignee

Comment 16

•

14 years ago

Above results achieveable with Valgrind-3.6.1 or trunk. Trunk is much preferable due to significant scalability improvements in Helgrind. In either case the as-yet-uncommitted patch below is needed for correct operation. Index: helgrind/hg_main.c =================================================================== --- helgrind/hg_main.c (revision 11576) +++ helgrind/hg_main.c (working copy) @@ -3207,7 +3207,8 @@ so = map_usertag_to_SO_lookup_or_alloc( usertag ); tl_assert(so); - libhb_so_send( thr->hbthr, so, True/*strong_send*/ ); + //libhb_so_send( thr->hbthr, so, True/*strong_send*/ ); + libhb_so_send( thr->hbthr, so, False/*!strong_send*/ ); } static

Julian Seward [:jseward]

Assignee

Comment 17

•

14 years ago

Attached patch WIP patch against M-C of 24 Feb 2011 (revised) (obsolete) — Details — Splinter Review

Adds description and hiding-annotations of 6 further races shown below. Together with a small suppressions file to hide complainage to do with Gnome libraries and to do with glibc's DNS lookups, this allows a clean (no-reported-races) startup to blank page, then exit, of the browser. Note this doesn't fix any of the races. It merely continues the process of cataloguing them. // HGANN 027 BEGIN n-i-bz HIDES_UNASSESSED on ::sGlobalObserver (global var) // HGANN 027 END // HGANN 028 BEGIN n-i-bz HIDES_UNASSESSED on ns[..]::mInitialized (d-c locking) // HGANN 028 END // HGANN 029 BEGIN n-i-bz HIDES_UNASSESSED on Canary::sOutputFD (global var) // HGANN 029 END // HGANN 030 BEGIN 637461 HIDES_UNASSESSED on StartupCache::mWriteThread // HGANN 030 END // HGANN 031 BEGIN 559793 HIDES_UNASSESSED on TimerThread::mTimeoutAdjustment // HGANN 031 END // HGANN 032 BEGIN 559793 HIDES_UNASSESSED on nsTimerImpl::mArmed // HGANN 032 END

Attachment #516080 - Attachment is obsolete: true

Julian Seward [:jseward]

Assignee

Comment 18

•

14 years ago

Attached file suppressions file referred to in comment 17 (obsolete) — Details

For use w/ valgrind trunk on x64-linux (Ubuntu 10.04.2)

Julian Seward [:jseward]

Assignee

Comment 19

•

14 years ago

Attached patch WIP patch against M-C of 23 Mar 2011 (obsolete) — Details — Splinter Review

Attachment #432143 - Attachment is obsolete: true

Attachment #516246 - Attachment is obsolete: true

Julian Seward [:jseward]

Assignee

Updated

•

14 years ago

Attachment #516248 - Attachment is obsolete: true

Julian Seward [:jseward]

Assignee

Comment 20

•

14 years ago

Attached file Suppressions to go with markup in comment #19 (for Ubuntu 10.04 64 bit) — Details

Julian Seward [:jseward]

Assignee

Comment 21

•

14 years ago

Attached patch WIP patch against M-C of 6 Apr 2011 (obsolete) — Details — Splinter Review

Attachment #521198 - Attachment is obsolete: true

Christian Holler (:decoder)

Comment 22

•

13 years ago

What's required to get this into the tree? It would be very nice to have this also from a security standpoint.

Whiteboard: [sg:want]

Christian Holler (:decoder)

Updated

•

13 years ago

Keywords: sec-want

Julian Seward [:jseward]

Assignee

Comment 23

•

13 years ago

Attached patch WIP patch against M-C of 23 Apr 2012 — Details — Splinter Review

Attachment #524238 - Attachment is obsolete: true

ISHIKAWA, Chiaki

Comment 24

•

13 years ago

Attached file Re-created patch against later C-C mozilla subdirectory (with USEHELGRIND macro) (obsolete) — Details

While I am investigating strange usage of uninitialized value (Bug 803816 - Valgrind warnings about uninitialised memory use (Thunderbird)), I tried to run thunderbird under helgrind. After trying to apply the patch automatically and saw there were many FAILED hunks, I re-created the patch against ./mozilla subdirectory of comm-central. The version of mozilla I used is (in comm-central/mozilla subdirectory) $ hg identify a517f7ea5bef+ tip I enclosed the changes into #ifdef USEHELGRIND ... #endif appropriately as much as possible. In order to enable the if-defed parts, when I ran configure and make I added -DUSHELGRIND=1 to CC and CXX macros manually. (Someone might be able to figure out how to add these to compiler flags automagically). E.g., like this in a shell script: export CC="/usr/bin/gcc -DDEBUG_4GB_CHECK -DUSEHELGRIND=1" export CXX="/usr/bin/g++ -DDEBUG_4GB_CHECK -DUSEHELGRIND=1" time make -f client.mk \ CC="/usr/bin/gcc -DDEBUG_4GB_CHECK -DUSEHELGRIND=1" \ CXX="/usr/bin/g++ -DDEBUG_4GB_CHECK -DUSEHELGRIND=1" \ $* For creating the patch, I found a few issues: However, there is one header file which is not amenable to a small patch using #if/#else/#endif like this. Also, there was an assembler source file which I left as is. For one file, already a patch has been incorporated into the repository to include the suggested mlock and so I did not bother to do anything for this file. >patching file netwerk/base/src/nsAsyncStreamCopier.cpp >Hunk #1 FAILED at 272. >1 out of 1 hunk FAILED -- saving rejects to file netwerk/base/src/nsAsyncStreamCopier.cpp.rej The following file may be a problem. I found that no code remains at all in the header file that looks like the ones that was patched. (???) So this change is not in the re-created patch at all. I have no idea what is going on here. >patching file netwerk/dns/nsHostResolver.h >Hunk #1 FAILED at 62. >1 out of 1 hunk FAILED -- saving rejects to file netwerk/dns/nsHostResolver.h.rej >Assembly Source for 64 bits:? >patching file nsprpub/pr/src/md/unix/os_Linux_x86_64.s >Hunk #1 succeeded at 26 (offset -32 lines). >Use of IF/ENDIF may not be possible in macro definition. >patching file xpcom/glue/nsISupportsImpl.h Hope this may help the inclusion of the code into the main repository. Someone ought to check the operation of thunderbird and firefox occasionally under valgrind (memgrind, helgrind, etc.)

ISHIKAWA, Chiaki

Comment 25

•

13 years ago

Attached patch Re-created patch against later C-C mozilla subdirectory (with USEHELGRIND macro) — Details — Splinter Review

(I am uploading the patch again as "patch" so that "diff" display feature can be used.) I am obsoleting the previous upload. TIA

Attachment #678282 - Attachment is obsolete: true

ISHIKAWA, Chiaki

Comment 26

•

13 years ago

I think that the portion of the patch to xpcom/base/nsAtomicRefcnt.h below got the type wrong: i.e., nsrefcnt needs to be replaced with int32_t in two places. I am running with such modification for now. -inline int32_t +#ifndef USEHELGRIND +inline nsrefcnt +#else +__attribute__((noinline)) static nsrefcnt +#endif NS_AtomicDecrementRefcnt(int32_t &refcnt) { return PR_ATOMIC_DECREMENT(&refcnt); }

Julian Seward [:jseward]

Assignee

Comment 27

•

13 years ago

Attached patch A minimalist starting-off patch, for consideration — Details — Splinter Review

This is a minimal starting-off patch, for consideration. It is the first of a series of patches which makes Gecko race-detector friendly. The patch describes the synchronisation effects caused by threadsafe ref counting, as this is not something tools can themselves infer. In effect it says that the thread that causes a 1->0 refcount transition acquires exclusive ownership of the object, so that subsequent accesses to it (to destruct it) can legitimately be done without locking. This is for the classes: base::subtle::RefCountedBase, nsHttpTransaction and (most importantly) nsISupports. The patch also tightens up the relevant feature test (--enable-valgrind) to ensure all the required headers are present. The annotations are commented in a particular way, that makes it possible to get a summary of all annotations in the code base by grepping for the string HGANN. This is important because there will eventually be about 30 ish annotations needed, most of which hide known races.

Attachment #699268 - Flags: feedback?(benjamin)

Benjamin Smedberg

Comment 28

•

13 years ago

Comment on attachment 699268 [details] [diff] [review] A minimalist starting-off patch, for consideration This is fine by me, but I think cjones is probably the best person to provide feedback, since he's my expert on concurrent programming.

Attachment #699268 - Flags: feedback?(benjamin) → feedback?(jones.chris.g)

Chris Jones [:cjones] inactive; ni?/f?/r? if you need me

Comment 29

•

13 years ago

Comment on attachment 699268 [details] [diff] [review] A minimalist starting-off patch, for consideration The approach looks fine. Do helgrind/VEX have any provisions to run the HB/HA client requests atomically with the atomic memory mutation?

Attachment #699268 - Flags: feedback?(jones.chris.g) → feedback+

Julian Seward [:jseward]

Assignee

Comment 30

•

13 years ago

(In reply to Chris Jones [:cjones] [:warhammer] from comment #29) > Do helgrind/VEX have any provisions to run the HB/HA client requests > atomically with the atomic memory mutation? No, it doesn't. In this case (to do with threadsafe refcounting) I don't think it matters much if we say that the HB event happens slightly before the atomic op and the HA happens after, so long as it doesn't report a race on the atomically modified location (mRefCnt) itself, which it indeed doesn't.

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 31

•

12 years ago

Julian, let's get the minimal patch in. Is cjones' f+ enough here?

Ben Turner (not reading bugmail, use the needinfo flag!)

Updated

•

12 years ago

Depends on: 847764

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 32

•

12 years ago

NSPR changes moved to bug 847764.

Ben Turner (not reading bugmail, use the needinfo flag!)

Updated

•

12 years ago

Depends on: 848303

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 33

•

12 years ago

MFBT patch in bug 848303.

Ben Turner (not reading bugmail, use the needinfo flag!)

Updated

•

12 years ago

Depends on: 848305

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 34

•

12 years ago

XPCOM refcounting changes pushed to bug 848305.

Ben Turner (not reading bugmail, use the needinfo flag!)

Updated

•

12 years ago

Depends on: 848312

BMO Automation

Updated

•

3 years ago

Severity: normal → S3

Jens Stutte [:jstutte]

Comment 36

•

1 year ago

Hi :jseward! Am I wrong in assuming that the original goal of this effort has been largely reached by having adopted tsan ? Is there anything specific left we want to do here?

Severity: S3 → N/A

Type: defect → enhancement

Component: General → Sanitizers

Flags: needinfo?(jseward)

Jens Stutte [:jstutte]

Updated

•

6 months ago

Status: NEW → RESOLVED

Closed: 6 months ago

Flags: needinfo?(jseward)

Resolution: --- → WORKSFORME

You need to log in before you can comment on or make changes to this bug.