Last Comment Bug 523405 - (SQLite-OP_If) crash [@sqlite3VdbeExec ] [@sqlite3Step ] [@sqlite3DbMallocRaw ]
(SQLite-OP_If)
: crash [@sqlite3VdbeExec ] [@sqlite3Step ] [@sqlite3DbMallocRaw ]
Status: RESOLVED FIXED
: crash, topcrash
Product: Toolkit
Classification: Components
Component: Storage (show other bugs)
: Trunk
: All All
: -- critical (vote)
: mozilla1.9.3a1
Assigned To: Nobody; OK to take it and work on it
:
: Marco Bonardo [::mak]
Mentors:
http://crash-stats.mozilla.com/report...
: 524709 (view as bug list)
Depends on: SQLite3.6.16.1
Blocks:
  Show dependency treegraph
 
Reported: 2009-10-20 11:26 PDT by David Baron :dbaron: ⌚️UTC-8
Modified: 2011-06-09 14:58 PDT (History)
14 users (show)
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
beta2-fixed
.6-fixed


Attachments

Description David Baron :dbaron: ⌚️UTC-8 2009-10-20 11:26:30 PDT
The #5 trunk (3.7a1) topcrash is crashes in sqlite3VdbeExec.  It's present on 3.6b1, 3.5.*, and 3.0.* as well, although it's not as high on the list.  Crash query:

http://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=exact&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=sqlite3VdbeExec

The comments that are comprehensible all mention bookmarking a page.
Comment 1 David Baron :dbaron: ⌚️UTC-8 2009-10-20 11:32:52 PDT
Possibly related to bug 520541.
Comment 2 Shawn Wilsher :sdwilsh 2009-10-20 12:19:31 PDT
I think I should just backout bug 519270 at this point.  The next release of SQLite contains a fix for this crash.
Comment 3 D. Richard Hipp 2009-10-27 15:25:52 PDT
Many of the stack traces at http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.5.3&query_search=signature&query_type=exact&query=sqlite3VdbeExec&date=&range_value=1&range_unit=weeks&do_query=1&signature=sqlite3VdbeExec show sqlite3VdbeExec() being called directly from nsSocketTransportService::DoPollIteration() through a pointer to a function.  See, for example http://crash-stats.mozilla.com/report/index/0f46c5d7-a838-4a85-b353-f00982091027

Even if DoPollIteration() had a reason to call SQLite, there ought to be several layers of intermediate functions before you get into sqlite3VdbeExec().  I'm guessing that the problem here is that DoPollIteration is invoking a faulty function pointer which is causing a jump into the middle of SQLite, resulting the crash.
Comment 4 Shawn Wilsher :sdwilsh 2009-10-27 15:44:33 PDT
(In reply to comment #3)
> Even if DoPollIteration() had a reason to call SQLite, there ought to be
> several layers of intermediate functions before you get into sqlite3VdbeExec().
>  I'm guessing that the problem here is that DoPollIteration is invoking a
> faulty function pointer which is causing a jump into the middle of SQLite,
> resulting the crash.
Indeed this is true.  Any idea how this is getting a bad pointer folks (bz, jduell)?
Comment 5 David Baron :dbaron: ⌚️UTC-8 2009-10-27 16:18:31 PDT
Frames can get omitted from stack traces, particularly when compilers are doing tail-call optimization, but also for other reasons.

Do these stacks make any more sense?
bp-d875c3a1-6c05-49b0-997c-c48232091021
bp-62206e5b-3214-4de4-b7eb-e27542091022
Comment 6 D. Richard Hipp 2009-10-27 16:25:10 PDT
Those latter traces do make sense.  Frames have been omitted, yes, but in ways that are explainable as "optimization" or "inlining".  But there are way too many inexplicably missing frames in http://crash-stats.mozilla.com/report/index/0f46c5d7-a838-4a85-b353-f00982091027 to blame it on the optimizer, I think.
Comment 7 Boris Zbarsky [:bz] (still a bit busy) 2009-10-27 17:40:11 PDT
The stack linked from comment 6 has some other issues too.  DoPollIteration could conceivably call into sqlite3VdbeExec if it had a dead object for that virtual function call.  But PR_GetThreadPrivate can't possibly be calling DoPollIteration....  That said, ProcessNextEvent calling DoPollIteration via nsSocketTransportService::OnProcessNextEvent does make sense...

That said, the ownership model for the SocketContext's mHandler is pretty simple: it's reference counted and the socket context holds a reference.  So unless someone else is screwing up the refcounting somewhere else, I wouldn't expect this object to be dead much.
Comment 8 Shawn Wilsher :sdwilsh 2009-10-28 10:18:51 PDT
3.5.4 was out last night, and it looks like this is still a crasher, but the stacks are different.

This is on SQLite 3.6.16 now.  URL field is updated.
Comment 9 Boris Zbarsky [:bz] (still a bit busy) 2009-10-28 10:35:14 PDT
Lots of stacks through nsNavHistory::PreparePlacesForVisitsDelete in there.
Comment 10 Marco Bonardo [::mak] 2009-10-28 10:55:35 PDT
are we sure this is not another way to hit bug 520541?
Comment 11 Shawn Wilsher :sdwilsh 2009-10-28 10:57:08 PDT
(In reply to comment #9)
> Lots of stacks through nsNavHistory::PreparePlacesForVisitsDelete in there.
Yeah - places is the biggest consumer of SQLite, so that doesn't really surprise me.
Comment 12 Shawn Wilsher :sdwilsh 2009-10-29 10:18:13 PDT
*** Bug 524709 has been marked as a duplicate of this bug. ***
Comment 13 Shawn Wilsher :sdwilsh 2009-10-29 10:20:20 PDT
bug 524709 is believed to be the same as this bug, but shows up as a different stack due to compiler optimizations being different.
Comment 14 Shawn Wilsher :sdwilsh 2009-10-29 11:04:42 PDT
From drh:
When user's try to delete more than 32K rows of history, a 16-bit integer in the SQLite code generator is overflowing.  This is the likely cause of the OP_If problem.

Sam - they can produce a build based off of 3.6.16 (currently shipping in 3.5.4) with a change that should make this crash go away (changing a 16 bit integer to a 32 bit one).  This should work around the issue, and they'll have a better fix in 3.6.20 which we can decided to take on 1.9.1 on a later date.  Do we want them to make this build?
Comment 15 Samuel Sidler (old account; do not CC) 2009-10-29 15:15:23 PDT
(In reply to comment #14)
> Do we want them to make this build?

Yes, please.

When you file the "upgrade sqlite" bug, please be sure to mention that this is the only thing that changed in the new version we're taking and that we're specifically taking it to fix this crash. If you have a diff, that'd help Dan feel even more comfortable. :)

Thanks a lot for looking into this! (You and drh, both.)
Comment 16 D. Richard Hipp 2009-10-29 20:03:04 PDT
The diff for our candidate change is here:

   http://www.sqlite.org/src/vinfo/65a1f1334

The above passes all of our regression tests and seems to clear the problem as I was able to reproduce it.  But we'd like to spend a little more time with it looking for related problems.  We'll tag the above as a release if we don't find anything wrong with it by noon (EDT) tomorrow (2009-10-30).
Comment 17 Samuel Sidler (old account; do not CC) 2009-10-29 20:13:44 PDT
Yeah, no hurry on the tag. We don't need it ASAP; code freeze is scheduled for November 10 and this is a pretty simple fix.
Comment 18 D. Richard Hipp 2009-10-30 06:46:20 PDT
SQLite version 3.6.16.1 is now available at

   http://www.sqlite.org/sqlite-amalgamation-3.6.16.1.tar.gz

Version 3.6.16.1 is version 3.6.16 with the simple patch shown above.  The
use of version 3.6.16.1 should resolve this issue.

SQLite version 3.6.20 will also resolve this issue, though in a different way.
Comment 20 Shawn Wilsher :sdwilsh 2009-10-30 13:19:35 PDT
Filed bug 525539 to upgrade to SQLite 3.6.16.1 on mozilla-central, 1.9.2, and 1.9.1.
Comment 21 Shawn Wilsher :sdwilsh 2009-10-30 14:02:54 PDT
Fixed in mozilla-central via bug 525539.
Comment 22 Shawn Wilsher :sdwilsh 2009-11-02 11:56:16 PST
Fixed in mozilla-1.9.2 via bug 525539 as well.
Comment 23 Shawn Wilsher :sdwilsh 2009-11-02 15:50:10 PST
Fixed in mozilla1.9.1 via bug 525539 too!

Note You need to log in before you can comment on or make changes to this bug.