Closed Bug 1153409 Opened 9 years ago Closed 3 years ago

TSan: data race db/sqlite3/src/sqlite3.c:50742 walTryBeginRead

Categories

(Toolkit :: Storage, defect, P3)

x86_64
Linux
defect

Tracking

()

RESOLVED WONTFIX

People

(Reporter: froydnj, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [tsan])

Attachments

(1 file)

The attached logfile shows a thread/data race detected by TSan (ThreadSanitizer).

* Specific information about this bug

This race seems to be a classic read/write between two different threads without proper locking.  I don't understand the stacks here; it looks like the previous write on the thread is actually doing things with the database while a different statement is executing on a different thread?  That seems dodgy.

What I don't know is whether this is an sqlite bug (insufficient internal locking) or a storage bug (improperly locked usage of sqlite), so hopefully somebody can look more closely at this.  ni?'ing Andrew.

* General information about TSan, data races, etc.

Typically, races reported by TSan are not false positives, but it is possible that the race is benign. Even in this case though, we should try to come up with a fix unless this would cause unacceptable performance issues. Also note that seemingly benign races can possibly be harmful (also depending on the compiler and the architecture) [1][2].

If the bug cannot be fixed, then this bug should be used to either make a compile-time annotation for blacklisting or add an entry to the runtime blacklist.

[1] http://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
[2] _How to miscompile programs with "benign" data races_: https://www.usenix.org/legacy/events/hotpar11/tech/final_files/Boehm.pdf
Flags: needinfo?(bugmail)
We know about this race, and we think that it is benign.

There is structure stored in shared memory (the *-shm file) that is read without taking any type of read-lock. So a write may occur at the same time as the read. We think this is safe because:

1) The structure has a 64-bit checksum attached to it.

2) There are two identical copies of the structure maintained in the shared-memory. 

3) When reading, SQLite reads the first copy, invokes a memory-barrier operation, then reads the second. A read is only valid if both copies are identical and the checksums match the rest of the structure. If a concurrent write is happening, the reader does not care if it sees the "old" or "new" version of the structure - only that it gets a valid version.

4) When the structure is written, the writer writes the first copy, invokes a memory-barrier operation, then writes the second.
Thanks for the explanation.

I don't have enough compute power to try to verify that investigation at the moment.  It does sound like it describes bug 1153415 better than this bug, though; the comments and code in walIndexTryHdr sound exactly like the above description.

The accesses for this bug are for WalCkptInfo::aReadMark, and the comments prior to that structure say, in part:

** We assume that 32-bit loads are atomic and so no locks are needed in
** order to read from any aReadMark[] entries.

which at least TSan is unhappy about.
(clearing my needinfo since RyanVM did the right thing and handed this off to the most excellent sqlite developers.)
Flags: needinfo?(bugmail)
I've made a mistake in comment #1. That explanation does actually apply to bug 1153415, not to this one. Thanks for picking up on that!

Anyway, we think this one is benign as well.

The issue in this case is an entry in the aReadMark[] array - an array of integers also stored in shared memory. Each member of this array is logically protected by a posix file lock - SHARED for reading and EXCLUSIVE for writing. In order to open a read-transaction on the database, SQLite needs to take a SHARED lock on an aReadMark[] array entry that contains an appropriate integer value.

To do this, SQLite first checks the values of all the aReadMark[] slots to see if any of them already contain a suitable value. It does this without obtaining any locks (this is the bit Tsan doesn't like). Here:

  http://www.sqlite.org/src/artifact/878c8e1a51cb2?ln=2240

If one of the values does look suitable, then SQLite takes the SHARED lock and then checks that the value has not changed in the meantime. If it has, the whole process starts over. A comment describing this is here, the actual code is just above and just below the comment:

  http://www.sqlite.org/src/artifact/878c8e1a51cb2?ln=2273-2275

So the unsafe read is really just an optimization. Even if we do read a random value from aReadMark[i] at that point the second check, made safely after the SHARED lock is obtained, should ensure that things work properly.

We could of course fix this by protecting line 2240 linked above with a SHARED lock on the aReadMark[] slot. But this part of the code is on the critical path and the posix locks we use are quite expensive.
I agree this comment is "interesting": ( http://www.sqlite.org/src/artifact/878c8e1a51cb2?ln=361-362 )

** We assume that 32-bit loads are atomic and so no locks are needed in
** order to read from any aReadMark[] entries.

It relates to code here:

  http://www.sqlite.org/src/artifact/878c8e1a51cb2?ln=1733

The code is assuming that when it reads a 32-bit value from aReadMark[] it will see a value that was valid at some point since the most recent memory-barrier operation. i.e. if some other client has changed the value from 1 to 2, we might read 1, or might read 2, but will not see 0.

If this assumption does not hold, a database reader might see a spurious SQLITE_CORRUPT report.

This particular piece of code is not as performance critical as that discussed above. So if this is a real problem, it's possible that we could fix it by protecting the read with the posix SHARED lock. That will take some analysis though.

Any opinions on this?
32-bit loads are atomic, but compilers are free to store temporary values anywhere as long as such an optimization wouldn't affect the behavior of a "race-free" program. So you might see 1, 2, or anything... It just depends on what kind of tricks the compiler gets up to.
(Check the links at the end of comment 0... They're very depressing.)
They're always accessed as part of a "volatile" structure, FWIW. i.e.

    volatile WalCkptInfo *pInfo = walCkptInfo(pWal);

    /* Unprotected read of aReadMark[] */
    if( pInfo->aReadMark[0] > mxFrame ) { ...

I think our reasoning was that the "volatile" keywords should stop the compiler from reordering things, and the memory-barrier operations limit how out-of-date a value we might read as a result of hardware effects.
(In reply to Dan Kennedy from comment #8)
> I think our reasoning was that the "volatile" keywords should stop the
> compiler from reordering things, and the memory-barrier operations limit how
> out-of-date a value we might read as a result of hardware effects.

Where are the memory barrier operations coming from?  I guess this is supposed to be from unixShmLock:

http://www.sqlite.org/src/info/bd7b5374a279f8fc0f0cc7d827709117e4293268?txt=1&ln=5292

and the mutex lock and unlock operations are implicitly providing memory barriers:

http://www.sqlite.org/src/info/bd7b5374a279f8fc0f0cc7d827709117e4293268?txt=1&ln=5319
http://www.sqlite.org/src/info/bd7b5374a279f8fc0f0cc7d827709117e4293268?txt=1&ln=5392

?

Also, comment 5 makes it sound like these values could be written outside of locked regions:

> The code is assuming that when it reads a 32-bit value from aReadMark[] it will see a value that was
> valid at some point since the most recent memory-barrier operation. i.e. if some other client has 
> changed the value from 1 to 2, we might read 1, or might read 2, but will not see 0.

At least, the "if some other client has changed the value..." can be interpreted as if the other client has changed the value after the memory barrier has completed.  Is that what you meant?  Or do writes always occur under locks and the reads are assumed to be OK with stale data (e.g. if the write has completed but the memory barrier hasn't yet executed)?
> Where are the memory barrier operations coming from?

In practice, from sqlite3_mutex_enter(), as you say. wal.c assumes any call to sqlite3OsLock(), ShmLock() or ShmBarrier() is a memory barrier.

> Also, comment 5 makes it sound like these values could be written outside of locked regions:
> ...
> At least, the "if some other client has changed the value..." can be interpreted as if the other
> client has changed the value after the memory barrier has completed.  Is that what you meant? 
> Or do writes always occur under locks and the reads are assumed to be OK with stale data (e.g. if
> the write has completed but the memory barrier hasn't yet executed)?

The latter - writes to the aReadMark[] array are always protected by a posix write-lock.

However, SQLite reads from the aReadMark[] array without a corresonding posix read-lock in two cases:

1) In walTryBeginRead(). This is the case that Tsan has caught in the stack trace attached to this bug. The explanation is in comment 4 - the unsafe read is an optimistic optimization. The same read is repeated safely after the read-lock is obtained and if the results don't match the earlier unsafe read, the process starts over.

2) In walCheckpoint() here:

  http://www.sqlite.org/src/artifact/878c8e1a51cb2?ln=1733

That block of code needs to set mxSafeFrame to a value that is equal to or less than all values stored in the aReadMark[] array. Cutting out the bit where it attempts to lock and set the aReadMark[] slot, the code is equivalent to:

    mxSafeFrame = pWal->hdr.mxFrame;
    for(i=i; i<WAL_READER; i++){
      u32 y = pInfo->aReadMark[i];
      if( y<mxSafeFrame ) mxSafeFrame = y;
    }

When this code is executed, the client is holding POSIX locks that ensure that values stored in the aReadMark[] array may not be decreased by other clients. They may be increased though. So we think this code is safe assuming that all values read from the pInfo->aReadMark[] array are "real" values - either the value that was in the aReadMark[] slot when the POSIX locks were obtained, or a value written by a client after that point.

But, if we were to read some arbitrary value store in aReadMark[] by a compiler on the basis that it would not bother a race-free program as suggested in comment 6, then we would have problems.
> I think our reasoning was that the "volatile" keywords should stop the compiler from reordering things

The linked-to documents in comment 0 make it clear that you can't trust "volatile" to give you any help at all at avoiding data races, alas.
Priority: -- → P3
See Also: → 1471041

Per this comment in the suppression list, these races are probably benign and are permanent. If this becomes an issue we would want to file an upstream bug.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: