Closed Bug 581946 Opened 14 years ago Closed 14 years ago

Firefox segfaulting shortly after startup [@ sqlite3VdbeExec] in sqlite3.c:54321

Component: Core :: SQLite and Embedded Database Bindings
Type: defect
Version: 1.9.2 Branch
Platform: x86, Linux
Priority: Not set
Severity: critical
Status: RESOLVED FIXED
Reporter: chris.sherlock79
Assignee: Unassigned
Keywords: crash
Attachments: 2 files

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.7) Gecko/20100715 Ubuntu/10.04 (lucid) Firefox/3.6.7
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2.7) Gecko/20100715 Ubuntu/10.04 (lucid) Firefox/3.6.7

I've actually reported this on Ubuntu Launchpad at https://bugs.launchpad.net/ubuntu/+source/firefox/+bug/610039.

Basically, every three minutes or so Firefox segfaults unexpectedly. I'm not sure of the exact cause, but I enabled Apport and installed the debug symbols, and the top of the stack trace that Apport reported to Launchpad is:

StacktraceTop:
 sqlite3VdbeExec (p=<value optimized out>) at sqlite3.c:54321
 sqlite3_step (pStmt=0xadd454e8) at sqlite3.c:50603
 mozilla::storage::AsyncExecuteStatements::executeStatement (
 mozilla::storage::AsyncExecuteStatements::executeAndProcessStatement (this=0xaeaea4c0, aStatement=0xadd454e8, aLastStatement=false)
 mozilla::storage::AsyncExecuteStatements::bindExecuteAndProcessStatement (this=0xaeaea4c0, aData=@0xacdf52e8, aLastStatement=false)

Interestingly, a few days ago I noticed I couldn't post any links to Facebook. Once I typed the URL into the Facebook status update box and clicked the Attach button, it would submit the link and change the URL in the address bar, but then it wouldn't go any further.

I couldn't work out what was causing this, so I decided to clear all my cached settings using the normal Ctrl+Shift+Delete (I chose everything). After that I started getting segfaults every few minutes.

To get around this issue, as it appears that something is going badly wrong in sqlite, I moved ~/.mozilla to ~/.mozilla.backup and restarted Firefox. The crashes have now stopped occurring.

I'm including the Stacktrace that apport gathered in case this is useful - unlike many stacktraces I've captured in the past, this one actually has symbols :-) Hope this is helpful!

Reproducible: Always
Attached file Apport stacktrace
Attached file Thread stack trace
Version: unspecified → 1.9.2 Branch
Keywords: crash
Is your build using the sqlite that comes with mozilla itself or the system sqlite?

(about:buildconfig should contain that info)
Seems to be the mozilla one, as the configure arguments include --disable-system-sqlite

Configure arguments:

--build=i486-linux-gnu --prefix=/usr '--includedir=/usr/include' '--mandir=/usr/share/man' '--infodir=/usr/share/info' --sysconfdir=/etc --localstatedir=/var '--libexecdir=/usr/lib/firefox' --disable-maintainer-mode --disable-dependency-tracking --disable-silent-rules --srcdir=. --enable-optimize --enable-ipc --enable-tests --enable-mochitest --disable-system-cairo --disable-system-sqlite --without-system-nspr --without-system-nss --disable-debug --with-user-appdir=.mozilla --without-system-jpeg --without-system-zlib --enable-system-myspell --disable-crashreporter --disable-composer --disable-elf-dynstr-gc --disable-gtktest --disable-install-strip --disable-installer --disable-ldap --disable-mailnews --disable-profilesharing --disable-strip --disable-strip-libs --disable-tests --disable-mochitest --disable-updater --disable-xprint --enable-application=browser --enable-canvas --enable-default-toolkit=cairo-gtk2 --enable-gnomevfs --enable-pango --enable-postscript --enable-svg --enable-mathml --enable-xft --enable-xinerama --enable-extensions=default,-reporter --enable-safe-browsing --enable-single-profile --with-distribution-id=com.ubuntu --enable-startup-notification --enable-official-branding
Incidentally, I downloaded the source package and in ./db/sqlite3/src/sqlite3.c it says it's 3.6.22. 

When I look at the line of code it seems to be faulting on, it's:

   /* The following assert is true in all cases accept when
    ** the database file has been corrupted externally.
    **    assert( u.am.zRec!=0 || u.am.avail>=u.am.payloadSize || u.am.avail>=9 ); */
    u.am.szHdr = getVarint32((u8*)u.am.zData, u.am.offset);

Sorry if I'm just spamming up the works here, btw!
If I had to guess, this would be the places database segfaulting. :(
Anything special about the location of your profile?  Is it on a network drive?
Nope, I'm running on a laptop!

Do you want me to provide a data file?
It could be useful, but we'll have you hold off on that for now.
So reading through the sqlite3.c file, basically it seems to be reading in the datafile header? 

The union exposes a struct:

    am = {
        payloadSize = 9862,
        payloadSize64 = 819804799658280,
        p1 = 1,
        p2 = 2,
        pC = 0xad7cbae8,
        zRec = 0x0,
        pCrsr = 0xad7cbb60,
        aType = 0xad7cbb40,
        aOffset = 0xad7cbb50,
        nField = 4,
        len = 3,
        i = 2,
        zData = 0x0,
        pDest = 0xad8f54a8,
        sMem = {
            u = {
                i = 0,
                nZero = 0,
                pDef = 0x0,
                pRowSet = 0x0,
                pFrame = 0x0
            },
            r = 0,
            db = 0x0,
            z = 0x0,
            n = 0,
            flags = 0,
            type = 0 '\0',
            enc = 0 '\0',
            xDel = 0,
            zMalloc = 0x0
        },
        zIdx = 0xb426e29d "",
        zEndHdr = 0xb426e29d "",
        offset = 3,
        offset64 = 14,
        szHdr = 1,
        avail = 0,
        pReg = 0xb4238618
    },

Now when I look at the area this is segfaulting, it is:

    /* Figure out how many bytes are in the header */
    if( u.am.zRec ){
      u.am.zData = u.am.zRec;
    }else{
      if( u.am.pC->isIndex ){
        u.am.zData = (char*)sqlite3BtreeKeyFetch(u.am.pCrsr, &u.am.avail);
      }else{
        u.am.zData = (char*)sqlite3BtreeDataFetch(u.am.pCrsr, &u.am.avail);
      }
      /* If KeyFetch()/DataFetch() managed to get the entire payload,
      ** save the payload in the u.am.pC->aRow cache.  That will save us from
      ** having to make additional calls to fetch the content portion of
      ** the record.
      */
      assert( u.am.avail>=0 );
      if( u.am.payloadSize <= (u32)u.am.avail ){
        u.am.zRec = u.am.zData;
        u.am.pC->aRow = (u8*)u.am.zData;
      }else{
        u.am.pC->aRow = 0;
      }
    }
    /* The following assert is true in all cases accept when
    ** the database file has been corrupted externally.
    **    assert( u.am.zRec!=0 || u.am.avail>=u.am.payloadSize || u.am.avail>=9 ); */
    u.am.szHdr = getVarint32((u8*)u.am.zData, u.am.offset);

Should u.am.zData ever be zero? In this case it is, for some reason.

Even if the data file is corrupted (I'm not sure how that happened!), shouldn't this be handled a little more gracefully?

Apologies if I've totally misread this code.
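
To illustrate the "handle it more gracefully" idea in the comment above, here is a minimal sketch in Python rather than SQLite's C. It decodes an SQLite-style varint (big-endian, 7 bits per byte, with a ninth byte that carries a full 8 bits) and reports an error when the buffer is missing or too short, instead of dereferencing it blindly. The names read_varint and CorruptRecordError are hypothetical and are not part of SQLite; this is not the fix that SQLite actually shipped, only a picture of the kind of guard being asked for.

```python
class CorruptRecordError(Exception):
    """Hypothetical error type: raised instead of crashing on bad record data."""


def read_varint(buf, pos=0):
    """Decode one SQLite-style varint from buf, starting at index pos.

    Returns a (value, bytes_consumed) tuple. Raises CorruptRecordError if
    buf is None (the analogue of zData == 0x0) or ends mid-varint.
    """
    if buf is None:
        raise CorruptRecordError("no record data available")
    value = 0
    for i in range(9):
        if pos + i >= len(buf):
            raise CorruptRecordError("record header is truncated")
        byte = buf[pos + i]
        if i == 8:
            # The ninth byte contributes all 8 of its bits.
            return (value << 8) | byte, 9
        value = (value << 7) | (byte & 0x7F)
        if byte < 0x80:
            # High bit clear: this was the last byte of the varint.
            return value, i + 1
```

A caller would catch CorruptRecordError and surface it as a "database is corrupt" error to the application, which is roughly what the SQLITE_CORRUPT result code exists for in SQLite itself.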
Severity: major → critical
Summary: Firefox segfaulting every 3 minutes or so at sqlite3VdbeExec in sqlite3.c:54321 → Firefox segfaulting every 3 minutes or so [@ sqlite3VdbeExec] in sqlite3.c:54321
I believe that Shawn is correct - it's places.sqlite that's having this issue. 

To test, what I did was to move my .mozilla folder to mozilla.bad, then I restarted firefox and allowed it to create the new profile. I waited about 3-4 minutes and surfed Facebook for a bit (not because of the site, but because Facebook is one of the greatest timewasters known to humankind), and it didn't segfault. 

Then I copied places.sqlite into my newly created profile - sure enough Firefox segfaulted. So I decided to do a timing test, and in fact it's crashing far earlier than 3 minutes. Also, the app seems to freeze before the crash. 

chris@ubuntu:~$ date && mozilla && date
Thu Jul 29 00:17:56 EST 2010
WARNING: pipe error (3): Connection reset by peer: file ./src/chrome/common/ipc_channel_posix.cc, line 404
Segmentation fault (core dumped)
chris@ubuntu:~$
OK, I've just fired up gdb and I'm examining some variables. 

For some unknown reason, p->db is 0x0. 

Can I suggest that we check whether this is NULL right at the start and, if it is, then raise an exception?
Summary: Firefox segfaulting every 3 minutes or so [@ sqlite3VdbeExec] in sqlite3.c:54321 → Firefox segfaulting shortly after startup [@ sqlite3VdbeExec] in sqlite3.c:54321
OK, I've sent in places.sqlite to one of the sqlite developers... rather not add it to this bug :-)
I've rebuilt firefox and have removed compiler optimization. 

Here's what I'm getting:


(gdb) backtrace
#0  0xb5a5c78c in sqlite3VdbeExec (p=0xa45fe168) at sqlite3.c:54321
#1  0xb5a58003 in sqlite3Step (p=0xa45fe168) at sqlite3.c:50603
#2  0xb5a581f8 in sqlite3_step (pStmt=0xa45fe168) at sqlite3.c:50662
#3  0xb77dbcd1 in mozilla::storage::AsyncExecuteStatements::executeStatement (this=0xad1af8c0, aStatement=0xa45fe168)
    at mozStorageAsyncStatementExecution.cpp:330
#4  0xb77dbbae in mozilla::storage::AsyncExecuteStatements::executeAndProcessStatement (this=0xad1af8c0, aStatement=0xa45fe168, aLastStatement=false)
    at mozStorageAsyncStatementExecution.cpp:280
#5  0xb77dbb10 in mozilla::storage::AsyncExecuteStatements::bindExecuteAndProcessStatement (this=0xad1af8c0, aData=..., aLastStatement=false)
    at mozStorageAsyncStatementExecution.cpp:262
#6  0xb77dc686 in mozilla::storage::AsyncExecuteStatements::Run (this=0xad1af8c0) at mozStorageAsyncStatementExecution.cpp:551
#7  0xb7b129f5 in nsThread::ProcessNextEvent (this=0xae8788d0, mayWait=1, result=0xae7ff26c) at nsThread.cpp:527
#8  0xb7ac1f2a in NS_ProcessNextEvent_P (thread=0xae8788d0, mayWait=1) at nsThreadUtils.cpp:250
#9  0xb7b11e94 in nsThread::ThreadFunc (arg=0xae8788d0) at nsThread.cpp:254
#10 0xb6761401 in _pt_root (arg=0xae843530) at ptthread.c:228
#11 0xb7fb296e in start_thread (arg=0xae7ffb70) at pthread_create.c:300
#12 0xb5b84a4e in clone () at ../sysdeps/unix/sysv/linux/i386/clone.S:130
(gdb) frame 0
#0  0xb5a5c78c in sqlite3VdbeExec (p=0xa45fe168) at sqlite3.c:54321
54321	    u.am.szHdr = getVarint32((u8*)u.am.zData, u.am.offset);
(gdb) print *p
$8 = {db = 0xb43b8118, pPrev = 0xa45fe248, pNext = 0xa45fd608, nOp = 38, nOpAlloc = 51, aOp = 0xa3d55008, nLabel = 6, nLabelAlloc = 6, aLabel = 0x0, 
  apArg = 0xa3d55350, aColName = 0xa9135198, pResultSet = 0x0, nResColumn = 4, nCursor = 4, apCsr = 0xa3d55358, errorAction = 2 '\002', okVar = 0 '\000', 
  nVar = 2, aVar = 0xa3d55300, azVar = 0xa3d55350, magic = 3186757027, nMem = 13, aMem = 0xa3a06be0, cacheCtr = 1, pc = 0, rc = 0, zErrMsg = 0x0, 
  explain = 0 '\000', changeCntOn = 0 '\000', expired = 0 '\000', minWriteFileFormat = 255 '\377', inVtabMethod = 0 '\000', usesStmtJournal = 0 '\000', 
  readOnly = 1 '\001', isPrepareV2 = 1 '\001', nChange = 0, btreeMask = 1, startTime = 0, aMutex = {nMutex = 0, aBtree = {0x0 <repeats 11 times>}}, 
  aCounter = {0, 0}, 
  zSql = 0xa46fea18 "SELECT h.url, v.visit_date, h.hidden, 0 AS whole_entry FROM moz_places h JOIN moz_historyvisits v ON h.id = v.place_id WHERE v.visit_date < :visit_date ORDER BY v.visit_date ASC LIMIT :max_expire", pFree = 0xa3a06c08, nFkConstraint = 0, nStmtDefCons = 0, iStatement = 0, pFrame = 0x0, 
  nFrame = 0, expmask = 0}
(gdb) print u.am
$9 = {payloadSize = 13, payloadSize64 = 819804719240248, p1 = 1, p2 = 2, pC = 0xa3a2ce08, zRec = 0x0, pCrsr = 0xa3a2ce80, aType = 0xa3a2ce60, 
  aOffset = 0xa3a2ce70, nField = 4, len = 3, i = 2, zData = 0x0, pDest = 0xa3a06ca8, sMem = {u = {i = 0, nZero = 0, pDef = 0x0, pRowSet = 0x0, 
      pFrame = 0x0}, r = 0, db = 0x0, z = 0x0, n = 0, flags = 0, type = 0 '\000', enc = 0 '\000', xDel = 0, zMalloc = 0x0}, zIdx = 0xb1089071 "", 
  zEndHdr = 0xb1089071 "", offset = 3, offset64 = 14, szHdr = 1, avail = 0, pReg = 0xb1039708}

(gdb) print p->zSql
$10 = 0xa46fea18 "SELECT h.url, v.visit_date, h.hidden, 0 AS whole_entry FROM moz_places h JOIN moz_historyvisits v ON h.id = v.place_id WHERE v.visit_date < :visit_date ORDER BY v.visit_date ASC LIMIT :max_expire"
(gdb) 

I've now been in discussion with one of the sqlite3 developers, and it turns out my places.sqlite has a corrupted index. Next step: run this query in sqlite3 directly to see if I can reproduce the crash.
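
For anyone who wants to try the same reproduction outside Firefox, here is a rough sketch using Python's standard sqlite3 module. The query text is the one shown in p->zSql above. The function name run_expiration_query and the values bound to :visit_date and :max_expire are placeholders of my own, since the thread does not show what Firefox actually bound. Run it against a copy of the database, not the live profile.

```python
import sqlite3

# Query text copied from p->zSql in the gdb output above.
EXPIRATION_QUERY = (
    "SELECT h.url, v.visit_date, h.hidden, 0 AS whole_entry "
    "FROM moz_places h JOIN moz_historyvisits v ON h.id = v.place_id "
    "WHERE v.visit_date < :visit_date "
    "ORDER BY v.visit_date ASC LIMIT :max_expire"
)


def run_expiration_query(db_path, visit_date, max_expire):
    """Run the expiration query against db_path and return all rows.

    On a corrupt file this may raise sqlite3.DatabaseError, or, with a
    SQLite build that lacks the fix, crash the interpreter outright.
    """
    conn = sqlite3.connect(db_path)
    try:
        # Named parameters are bound from a dict keyed by parameter name.
        params = {"visit_date": visit_date, "max_expire": max_expire}
        return conn.execute(EXPIRATION_QUERY, params).fetchall()
    finally:
        conn.close()
```

Something like run_expiration_query("places-copy.sqlite", visit_date, 10) would then exercise the same statement. Note that Python links against whatever SQLite version it was built with, which is not necessarily the 3.6.22 bundled with this Firefox build.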
This was a bug in SQLite where it failed to detect a corrupt index in a
database file, tried to use that index, and subsequently segfaulted.
Changes to SQLite to fix the problem can be seen here:

    http://www.sqlite.org/src/ci/83395a3d24

Note that this problem can only appear if an SQLite database file is
corrupted in a very specific way.  There is a very low probability of
hitting this bug, we believe, though if a database file does become
corrupt and the corruption takes the very specific form that is required
to express this bug, then the bug will be hit over and over.

The question of how the database file became corrupt in the first place
is a whole other issue.  In the absence of further information, the
SQLite developers will blame a power loss on Ext3 with barrier=0. :-)

The change above will appear in the 3.7.1 release of SQLite.  Prerelease
snapshots are available (if desired) from

    http://www.sqlite.org/draft/download.html

Many thanks to Chris Sherlock for coming up with a reproducible test case
to this problem!
More thanks to the sqlite team here than me :-) You guys rock!

The workaround for this issue, incidentally, was to fix the broken index, which
can be done as follows:

chris@ubuntu:~$ cd /home/chris/.mozilla/firefox/1u64q3v3.default/
chris@ubuntu:~/.mozilla/firefox/1u64q3v3.default$ sqlite3 places.sqlite 
SQLite version 3.6.22
Enter ".help" for instructions
Enter SQL statements terminated with a ";"
sqlite> reindex;
I've documented a way of reindexing all the data files at http://randomtechnicalstuff.blogspot.com/2010/07/how-to-debug-segfaults-in-ubuntu.html 

for i in *.sqlite; do echo "Reindexing $i"; echo "reindex;" | sqlite3 "$i"; done
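
A slightly more cautious variant of the same workaround, sketched with Python's standard sqlite3 module: it runs PRAGMA integrity_check before and after REINDEX so you can see whether the rebuild changed anything. The function names check_and_reindex and reindex_all are hypothetical. I'm assuming Firefox is closed and the profile has been backed up first. I have not verified that integrity_check flags the specific corruption from this bug.

```python
import glob
import os
import sqlite3


def integrity_report(conn):
    """Return the rows of PRAGMA integrity_check; ['ok'] means no problems found."""
    return [row[0] for row in conn.execute("PRAGMA integrity_check")]


def check_and_reindex(path):
    """Check one database, rebuild all of its indices, then check it again.

    Returns (before, after) integrity reports. A badly damaged file may
    raise sqlite3.DatabaseError instead of returning.
    """
    conn = sqlite3.connect(path)
    try:
        before = integrity_report(conn)
        conn.execute("REINDEX")  # with no argument, rebuilds every index
        conn.commit()
        after = integrity_report(conn)
        return before, after
    finally:
        conn.close()


def reindex_all(directory):
    """Apply check_and_reindex to every *.sqlite file in directory."""
    pattern = os.path.join(directory, "*.sqlite")
    return {path: check_and_reindex(path) for path in sorted(glob.glob(pattern))}
```

Calling reindex_all on the profile directory returns a dict mapping each file to its before and after reports, so any file whose "before" report is not ['ok'] is the one worth a closer look.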
Status: UNCONFIRMED → NEW
Ever confirmed: true
Depends on: SQLite3.7.1
Status: NEW → UNCONFIRMED
Ever confirmed: false
Target Milestone: --- → mozilla2.0b3
undoing effects of firefox session restore.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Target Milestone: mozilla2.0b3 → ---
This is logged in the sqlite database at http://www.sqlite.org/src/info/168d0f7176
Fixed by bug 583611
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Crash Signature: [@ sqlite3VdbeExec]
Product: Toolkit → Core