Crashes after Firefox startup on OS/2, possible storage problem (regressed between Jan 3rd and Jan 7th)

RESOLVED FIXED

Status

()

Toolkit
Storage
--
critical
RESOLVED FIXED
11 years ago
11 years ago

People

(Reporter: Peter Weilbacher, Assigned: sdwilsh)

Tracking

({crash})

Trunk
x86
OS/2
crash
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 1 obsolete attachment)

495.99 KB, application/x-gzip
Peter Weilbacher
: review+
Mike Schroepfer
: approval1.9+
Details
(Reporter)

Description

11 years ago
When starting up the OS/2 nightly with checkout date 2008-01-07 13:00 it crashes soon after startup in XUL.DLL. The nightly before was 2008-01-03 06:00 and didn't crash.

I now built a debug version (code checked out between 2008-01-10 11:17 and 11:24) and because that doesn't have libxul it crashes in TKITCMPS.DLL. I think this is actually a side effect, because when running I see NS_ENSURE_SUCCESS failures about opening SQLite files. I added a printf in mozStorageConnection::Initialize() to see the filename and with that the last few lines of console output are

aDatabaseFile=M:\MOZCOMPILE\FX_D\DIST\BIN\Mozilla\Firefox\Profiles\ssmtzlbo.default\places.sqlite
WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file M:/mozcompile/mozilla/storage/src/mozStorageService.cpp, line 139
WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file M:/mozcompile/mozilla/toolkit/components/places/src/nsNavHistory.cpp, line 524
WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file M:/mozcompile/mozilla/toolkit/components/places/src/nsNavHistory.cpp, line 326
JavaScript error: , line 0: uncaught exception: [Exception... "Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIDocShellHistory.useGlobalHistory]"  nsresult: "0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: chrome://browser/content/browser.js :: prepareForStartup :: line 727"  data: no]
aDatabaseFile=M:\MOZCOMPILE\FX_D\DIST\BIN\Mozilla\Firefox\Profiles\ssmtzlbo.default\urlclassifier3.sqlite
WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file M:/mozcompile/mozilla/storage/src/mozStorageService.cpp, line 139
WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005: file M:/mozcompile/mozilla/toolkit/components/url-classifier/src/nsUrlClassifierDBService.cpp, line 2034
###!!! ASSERTION: Unable to open database: 'Error', file M:/mozcompile/mozilla/toolkit/components/url-classifier/src/nsUrlClassifierDBService.cpp, line 1890
###!!! ASSERTION: You can't dereference a NULL nsCOMPtr with operator->().: 'mRawPtr != 0', file ../../../../dist/include/xpcom/nsCOMPtr.h, line 868

The only bug about storage/SQLite in the regression range is bug 408914, although I don't see why disabling async should have this effect on OS/2. I tried to step through this in the debugger but the symbols must be messed up in SQLite because it stepped the wrong lines...
(Reporter)

Comment 1

11 years ago
Indeed, backing out the two patches from bug 408914 fixes the issue, then disabling async again by patching attachment 293774 [details] [diff] [review] back in gives me the crash on startup.

Shawn, does the synchronous operation make use of SQLite features that the async version before didn't? Just to get a starting point to see what could be broken on OS/2.
(Assignee)

Comment 2

11 years ago
not that I know of
(Reporter)

Comment 3

11 years ago
OK, this seems to be a OS/2 problem that has to do with the thread implementation in SQLite. I found out that pTsdro->pBtree==NULL in sqlite3BtreeOpen(). Not sure what that means, too tired now to continue debugging. Perhaps building with SQLITE_OMIT_SHARED_CACHE (and deactivating any relevant calls in storage) could be a _temporary_ option.

Comment 4

11 years ago
Peter, FYI Andy MacIntyre (Python OS/2) offers a standalone sqlite OS/2 build on his homepage. He made a handful of patches to sqlite.{h,c} and shell.c. Maybe you want to have a look at it, if any of these patches would help us here or could be useful in general (the patches are contained in the zip file).
(Assignee)

Comment 5

11 years ago
Those changes need to be pushed upstream because I will quickly r- any patch to the sqlite files.

The only way I'd accept a patch is if the sqlite folks won't take the patch.
(Reporter)

Comment 6

11 years ago
Well, it would be difficult to apply any patches upstream, given that upstream has moved to 3.5.x now while we are stuck with 3.4.2 at the moment... In any case, those patches were already fixed upstream, but some of them only in 3.5.4. I don't think any of them is relevant to the problem at hand (it was mostly stuff to get building with other compilers).
(Assignee)

Comment 7

11 years ago
We upgraded to 3.5.4 last night ;)
(Reporter)

Comment 8

11 years ago
OK, so all my debugging was wasted time...
(Assignee)

Comment 9

11 years ago
Sorry - it was sorta a rushed decision to upgrade.  Hopefully the issue is fixed now.
(Reporter)

Comment 10

11 years ago
Well, now it crashes even before showing the window, saying

LIBC PANIC!!
_um_free_maybe_lock: Tried to free block twice - block=20143600 lock=0x1

I suspected something like that would happen because until now we haven't gotten the Tcl test system of SQLite running on OS/2. Of course some OS/2 fixes went into SQLite CVS _after_ the 3.5.4 release, but I don't think they are related to this. Will test that.
(Reporter)

Comment 11

11 years ago
Created attachment 296701 [details] [diff] [review]
merge post 3.5.4 OS/2 changes into in-tree SQLite

This patch corresponds to 4646, 4647, 4648 as documented in the SQLite Timeline at <http://www.sqlite.org/cvstrac/timeline> on Sunday, 2007-Dec-30 and somewhat surprisingly indeed fixes the Firefox startup problem. The part of the patch in os2GetTempname() actually does that but for completeness I would like all three parts in our tree.
Assignee: nobody → mozilla
Status: NEW → ASSIGNED
Attachment #296701 - Flags: review?(comrade693+bmo)
(Assignee)

Comment 12

11 years ago
FYI - sqlite will be making a branch for us with these changes.  See http://www.mail-archive.com/sqlite-users%40sqlite.org/msg30618.html and related messages in the thread.
(Reporter)

Comment 13

11 years ago
I don't really see how that helps. The amalgamation is not in their CVS. I also don't see why that's necessary. When you upgrade next time (and perhaps CC me on the relevant bug beforehand or at least discuss it in the agenda of a status meeting!) you just copy the files over the ones in your CVS tree and check them in. This isn't even something that you would need to put in README.MOZILLA...
(Assignee)

Comment 14

11 years ago
The amalgamation is part of their release.  The issues is that we tag the dll with a version number and other information.  If we patch it ourselves, nothing is correct with that.

I'm a student who actually has very little time to do Mozilla work.  Attending the meetings is next to impossible for me.  CC'ing you is possible, but I don't usually think of it.  Might I suggest you watch the QA contact for this component (it's very low traffic).
(Reporter)

Comment 15

11 years ago
(In reply to comment #14)
> The amalgamation is part of their release.

Yes, but you were only talking about a branch, seems quite strange to get the to produce a release which is changed only in sources that they neither you nor they build binaries for! It's just holding us up a few more days of getting the fix.

> The issues is that we tag the dll
> with a version number and other information.  If we patch it ourselves, nothing
> is correct with that.

What does that mean "nothing is correct"? You still have exactly 3.5.4 for all the other platforms, the changes are #ifdef'd out for everything but OS/2, so you won't hurt anybody (not even OS/2 users who unfortunately hardly use SQLite for anything else).

> I'm a student who actually has very little time to do Mozilla work.  Attending
> the meetings is next to impossible for me.

So it is for me. But you could still add it to the agenda of ongoing work.

> CC'ing you is possible, but I don't
> usually think of it.  Might I suggest you watch the QA contact for this
> component (it's very low traffic).

Nice suggestion, done.
(Assignee)

Comment 16

11 years ago
(In reply to comment #15)
> Yes, but you were only talking about a branch, seems quite strange to get the
> to produce a release which is changed only in sources that they neither you nor
> they build binaries for! It's just holding us up a few more days of getting the
> fix.
Sorry, but we don't want to ship anything that isn't official from the SQLite folks.  I'm sure that from your perspective it seems strange.

> So it is for me. But you could still add it to the agenda of ongoing work.
I believe this last decision was done after the meeting for the week, and it landed well before the next one, so it wouldn't have helped in this case.  In the past I've posted to m.dev.planning requesting for feedback and gotten nothing.  I stopped doing that because it seemed to me like nobody cared.
(Reporter)

Comment 17

11 years ago
(In reply to comment #16)
> Sorry, but we don't want to ship anything that isn't official from the SQLite
> folks.

Why not and who is we, i.e. who really makes this decision and how? And sure, if you cannot give a good reason for this, it seems strange indeed. This all seems like somebody it trying to cause extra work for you, me, and the SQLite folks for no good reason...
(Assignee)

Comment 18

11 years ago
we = myself and engineering.  I talked to schrep about how he wanted to handle this last night.  Mozilla actually has a support contract with the SQLite folks for this very reason.
(Reporter)

Comment 19

11 years ago
It's really weird that you need a support contract for fixes that _I_ do, unpaid in my spare time...

Do you have an update of how long this SQLite release will take? I would really like to produce some some usable nightlies before the next beta. Given that the freeze is supposed to be in 5 days or so this is somewhat urgent!
(Assignee)

Comment 20

11 years ago
working on it - once I get the build from them, I'll get a patch, get review from you, then we'll get approval.  If you can land it before me, that'll be fine.
(Reporter)

Comment 21

11 years ago
Shawn, any updates on SQLite 3.5.4.1?

[I just hope you have other communication channels with drh than the SQLite list, because the messages I saw there about this were not really clear to the point that you need a new release urgently, and I haven't seen anything more on that matter since last weekend.]
(Assignee)

Comment 22

11 years ago
Yes, I am talking with them privately via e-mail.  My e-mail was flakey at best for a short period though, so I pinged them again with my last set of questions in case it a) didn't get sent properly or b) I didn't get their reply.  I'll keep you updated as I get more information.
(Assignee)

Comment 23

11 years ago
Created attachment 298147 [details]
v1.0

upgrade to sqlite 3.5.4.1
Assignee: mozilla → sdwilsh
Attachment #296701 - Attachment is obsolete: true
Attachment #298147 - Flags: review?(mozilla)
Attachment #296701 - Flags: review?(sdwilsh)
(Assignee)

Updated

11 years ago
Attachment #298147 - Attachment is patch: false
Attachment #298147 - Attachment mime type: text/plain → application/x-gzip
(Reporter)

Comment 24

11 years ago
While I test this, can you tell me why there are so many changed lines? I had thought the few OS/2 lines, plus version numbers, and maybe some $Id...$ stuff but I see thousands of changes that I don't see a reason for (there's nothing yet in the SQLite timeline about 3.5.4.1).
(Reporter)

Comment 25

11 years ago
Comment on attachment 298147 [details]
v1.0

OK, I can at least give this a positive review regarding the OS/2 issue. The patch does contain the lines that we need and after rebuilding SQLite with the patch in tree, Firefox runs nicely.
Attachment #298147 - Flags: review?(mozilla) → review+
(Assignee)

Comment 26

11 years ago
Comment on attachment 298147 [details]
v1.0

(In reply to comment #25)
> (From update of attachment 298147 [details])
> OK, I can at least give this a positive review regarding the OS/2 issue. The
> patch does contain the lines that we need and after rebuilding SQLite with the
> patch in tree, Firefox runs nicely.
> 
This is what the SQLite folks gave us.  It does contain more than just what you need, but it's a stability update.

Once this gets approval, feel free to land it as I don't care about getting the blame for this.
Attachment #298147 - Flags: approval1.9?

Comment 27

11 years ago
(In reply to comment #26)
> (From update of attachment 298147 [details])
> (In reply to comment #25)
> > (From update of attachment 298147 [details] [details])
> > OK, I can at least give this a positive review regarding the OS/2 issue. The
> > patch does contain the lines that we need and after rebuilding SQLite with the
> > patch in tree, Firefox runs nicely.
> > 
> This is what the SQLite folks gave us.  It does contain more than just what you
> need, but it's a stability update.
> 
> Once this gets approval, feel free to land it as I don't care about getting the
> blame for this.
> 

Shawn what all is in the changelist for 3.5.4.1?
(Assignee)

Comment 28

11 years ago
It contains the following check-ins:
http://www.sqlite.org/cvstrac/chngview?cn=4636
http://www.sqlite.org/cvstrac/chngview?cn=4637
http://www.sqlite.org/cvstrac/chngview?cn=4638
http://www.sqlite.org/cvstrac/chngview?cn=4639
http://www.sqlite.org/cvstrac/chngview?cn=4640
http://www.sqlite.org/cvstrac/chngview?cn=4641
http://www.sqlite.org/cvstrac/chngview?cn=4642
http://www.sqlite.org/cvstrac/chngview?cn=4643
http://www.sqlite.org/cvstrac/chngview?cn=4644
http://www.sqlite.org/cvstrac/chngview?cn=4645
http://www.sqlite.org/cvstrac/chngview?cn=4646 (OS/2)
http://www.sqlite.org/cvstrac/chngview?cn=4647 (OS/2)
http://www.sqlite.org/cvstrac/chngview?cn=4648 (OS/2)

Relevant tickets (that *may* be important to us) are:
http://www.sqlite.org/cvstrac/tktview?tn=2854 ("BEGIN EXCLUSIVE" excludes other shared cache users from using the database.)
[no ticket in checkin comment] (Fix a race condition that can occur when reloading the database schema in shared-cache mode.)
[no ticket in checkin comment] (Mem3.c enhanced so that an allocation of N bytes only requires (N+11)&~7 bytes instead of (N+15)&~7 bytes of heap storage. Minimum heap usage per allocation is still 16 bytes. 8-byte alignment is preserved.)

Nothing super major that I see, and it passes all of our storage tests locally for me.

Comment 29

11 years ago
Comment on attachment 298147 [details]
v1.0

go for it - thanks for the details
Attachment #298147 - Flags: approval1.9? → approval1.9+
(Reporter)

Comment 30

11 years ago
Patch checked into trunk:

db/sqlite3/README.MOZILLA;
new revision: 1.21; previous revision: 1.20
db/sqlite3/src/sqlite3.c;
new revision: 1.12; previous revision: 1.11
db/sqlite3/src/sqlite3.h;
new revision: 1.17; previous revision: 1.16
Status: ASSIGNED → RESOLVED
Last Resolved: 11 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.