Closed
Bug 1309696
Opened 8 years ago
Closed 8 years ago
Crash on Mac OS X [@ je_malloc | libsystem_malloc.dylib@0x...] from infinite recursion of sqlite3SelectDup
Categories
(Core :: SQLite and Embedded Database Bindings, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: esthermonchari, Unassigned)
References
(Depends on 1 open bug)
Details
(Keywords: crash)
Crash Data
This bug was filed from the Socorro interface and is report bp-10a90335-a476-4fb3-a15d-597942160929. ============================================================= Crash involves multiple users,Started with build id 20160929030426 and is still continuing. adding ni on:Carsten,could this be related with Bug 1011839 ? adding ni on:Stuart, any ideas on what could be causing the crash?
Flags: needinfo?(pavlov)
Flags: needinfo?(cbook)
Comment 1•8 years ago
|
||
Hi Esther, not sure if this is the same bug. Some Steps to reproduce like urls etc would useful i guess
Flags: needinfo?(cbook)
Updated•8 years ago
|
Flags: needinfo?(pavlov)
Comment 2•8 years ago
|
||
Crashes seem to be on 10.12 only: http://bit.ly/2e16ZYt. After reviewing the data, this seems to the same user crashing repeatedly.
Comment 3•8 years ago
|
||
(In reply to Esther Monchari from comment #0) > This bug was filed from the Socorro interface and is > report bp-10a90335-a476-4fb3-a15d-597942160929. > ============================================================= > > Crash involves multiple users,Started with build id 20160929030426 and is > still continuing. Looking at your crash history, I see this dates to well prior to the above build. See bp-dbfe9b53-72ff-4db7-bcdc-b54312160815 which is Build ID 20160815030201 At least 1/3 in first 20 seconds, and another ~1/4 in the first 120 seconds And in the past few days you crash on average several times per day [1]. Ditto for early for early November. But in the period between 2016-11-11 2016-12-1 you only have 2 crashes. So what happened in mid to late Noevember that might have led to so few crashes? Can you provide steps? Does starting in safe mode tend to reduce the startup crashes? [1] bp-2109115f-3a6e-44a6-ba68-aed902161202 2016-12-02 17:36:52 arena_avail_tree_remove | arena_run_split | arena_run_alloc | arena_malloc | je_malloc | libsystem_malloc.dylib@0x22d9 bp-4804a2ba-407d-4687-a836-cbfc82161202 2016-12-02 17:26:31 arena_malloc | je_malloc | libsystem_malloc.dylib@0x22d9 bp-c362b50a-ed06-4ed3-bf4f-26b892161202 2016-12-02 01:48:12 arena_avail_tree_remove | arena_run_split | arena_run_alloc | arena_malloc | je_malloc | libsystem_malloc.dylib@0x22d9 bp-0aa097d7-d3bf-49d1-a3e9-27c1f2161202 2016-12-02 01:44:06 arena_malloc | je_malloc | libsystem_malloc.dylib@0x22d9 bp-88c0da7e-831d-445f-a535-b98ed2161202 2016-12-02 01:32:22 arena_malloc | je_malloc | libsystem_malloc.dylib@0x22d9 bp-a0fe4285-88f4-4c40-b047-e47322161202 2016-12-02 01:32:01 arena_malloc | je_malloc | libsystem_malloc.dylib@0x22d9 bp-d057cd11-5e66-4dcc-a4e4-f48b72161202 2016-12-02 01:31:38 EMPTY: no crashing thread identified; ERROR_NO_THREAD_LIST
Severity: normal → critical
Flags: needinfo?(esthermonchari)
Comment 4•8 years ago
|
||
The crash stack in bp-10a90335-a476-4fb3-a15d-597942160929 is interesting: Frame Module Signature Source 0 libmozglue.dylib je_malloc memory/mozjemalloc/jemalloc.c:4323 Ø 1 libsystem_malloc.dylib libsystem_malloc.dylib@0x2331 Ø 2 libsystem_malloc.dylib libsystem_malloc.dylib@0x12af 3 libnss3.dylib sqlite3Malloc db/sqlite3/src/sqlite3.c:24067 4 libnss3.dylib dbMallocRawFinish db/sqlite3/src/sqlite3.c:24424 5 libnss3.dylib exprDup db/sqlite3/src/sqlite3.c:24492 6 libnss3.dylib sqlite3ExprListDup db/sqlite3/src/sqlite3.c:90468 7 libnss3.dylib sqlite3SelectDup db/sqlite3/src/sqlite3.c:90578 8 libnss3.dylib sqlite3SelectDup db/sqlite3/src/sqlite3.c:90585 9 libnss3.dylib sqlite3SelectDup db/sqlite3/src/sqlite3.c:90585 ... 1022 libnss3.dylib sqlite3SelectDup db/sqlite3/src/sqlite3.c:90585 1023 libnss3.dylib sqlite3SelectDup db/sqlite3/src/sqlite3.c:90585 1024 libnss3.dylib sqlite3SelectDup db/sqlite3/src/sqlite3.c:90585 Given that sqlite3SelectDup actually is recursive, it seems like this might be a bug in sqlite3: (Note that this file is huge and takes a while to load) https://hg.mozilla.org/mozilla-central/annotate/f7d5008ee2ab/db/sqlite3/src/sqlite3.c#l90585
Crash Signature: [@ je_malloc | libsystem_malloc.dylib@0x2331 ] → [@ je_malloc | libsystem_malloc.dylib@0x2331 ]
[@ arena_malloc | je_malloc | libsystem_malloc.dylib@0x22d9]
[@ je_malloc | libsystem_malloc.dylib@0x22d9 ]
[@ arena_avail_tree_remove | arena_run_split | arena_run_alloc | arena_malloc | je_malloc | libsy…
Summary: Crash on Mac OS X [@ je_malloc | libsystem_malloc.dylib@0x2331] → Crash on Mac OS X [@ je_malloc | libsystem_malloc.dylib@0x...] from infinite recursion of sqlite3SelectDup
Comment 6•8 years ago
|
||
I withdraw my questions of comment 1 - Esther obviously isn't the person crashing. Plus, Stephen's crashes haven't stopped.
Flags: needinfo?(esthermonchari)
Comment 7•8 years ago
|
||
I see a similar issue at another address. For instance: bp-3e7044bc-b1dd-4357-8619-a0ddb2170201
Crash Signature: libsystem_malloc.dylib@0x22d9 ]
[@ arena_run_alloc | arena_malloc | je_malloc | libsystem_malloc.dylib@0x22d9 ] → libsystem_malloc.dylib@0x22d9 ]
[@ arena_run_alloc | arena_malloc | je_malloc | libsystem_malloc.dylib@0x22d9 ] [@ arena_malloc | je_malloc | libsystem_malloc.dylib@0x22c1 ]
Comment 8•8 years ago
|
||
I guess I'll move it over to Toolkit::Storage for now.
Component: General → Storage
Product: Core → Toolkit
Comment 9•8 years ago
|
||
(In reply to Mats Palmgren (:mats) from comment #4) > Given that sqlite3SelectDup actually is recursive, it seems like this might > be a bug in sqlite3: I wonder if Richard can gather anything useful out of these stacks (comment 4 and comment 7).
Flags: needinfo?(drh)
Comment 10•8 years ago
|
||
Dan was able to reproduce the stack trace by running an SQL statement like this: CREATE VIEW v1 AS VALUES(0,0,0),(0,0,0),<Repeat 130,000 times>,(0,0,0); It appears to be overflowing the stack. Smaller VALUES clauses (less than about 131,000 terms) work. We will be working on a fix. Does Firefox ever generate a VIEW like that? Another construct that invokes sqlite3SelectDup() on a massive VALUES clause would be an INSERT statement inside of a TRIGGER. Does Firefox ever generate the SQL for triggers at run-time? One work-around to this problem is to rebuild with -DSQLITE_MAX_SQL_LENGTH=1000000 or some other reasonable limit on the length of an SQL statement. (The default limit is 1GB.) The CREATE VIEW above that crashes is about 5MB in size, so setting an SQL size limit of 1MB seems reasonable.
Comment 11•8 years ago
|
||
A patch has been checked into the SQLite trunk that avoids the deep recursion when processing a VALUES clause with hundreds of thousands of rows. We can backport this patch to version 3.14.1, or any other version you need, if desired. Otherwise, the patch will appear in the 3.17.0 release, probably sometime in February.
Comment 12•8 years ago
|
||
(In reply to D. Richard Hipp from comment #10) > Does Firefox ever generate a VIEW like that? We don't seem to use VIEWs, I just did a plain code search for "CREATE.*VIEW". Though, I can't exclude add-ons. > Another construct that invokes sqlite3SelectDup() on a massive VALUES clause > would be an INSERT statement inside of a TRIGGER. Does Firefox ever > generate the SQL for triggers at run-time? I couldn't find any in our codebase. All the triggers that do an insert are just inserting a single entry. Again, I can't exclude add-ons. The strange thing is that this is Mac 10.12.1 only, if this would be something in the queries I'd expect it to hit on all the platforms. Off-hand I can't think of Mac-only code that uses Sqlite, I'll keep digging. > One work-around to this problem is to rebuild with > -DSQLITE_MAX_SQL_LENGTH=1000000 or some other reasonable limit on the length > of an SQL statement. (The default limit is 1GB.) The CREATE VIEW above that > crashes is about 5MB in size, so setting an SQL size limit of 1MB seems > reasonable. We could evaluate that as it seems a good idea in general, but we clearly couldn't apply that to the system sqlite on Linux. Also, unfortunately we still have some queries that can generate large IN clauses, we should first check we're not hitting a limit there, and that's also a perf problem (we should use TEMP tables instead).
Comment 13•8 years ago
|
||
(In reply to D. Richard Hipp from comment #11) > We can backport this patch to version 3.14.1, or any other version > you need, if desired. Otherwise, the patch will appear in the 3.17.0 > release, probably sometime in February. The crash rate is quite low, I think it will be fine to wait for the next release, I just wanted to be sure you were informed about this stack.
Comment 14•8 years ago
|
||
Add-on correlations in Nightly are: BrowserStack Desktop messenger for WhatsApp™ FoxClocks Tamper Data FoxyProxy Standard @activity-streams LastPass Password Manager wayback_machine@mozilla.org I'm sure some of these use Sqlite. Just in case I'm ni Nan to ensure Activity Stream doesn't build large VALUES() clauses in triggers/views, since we can check that easily.
Flags: needinfo?(najiang)
Comment 15•8 years ago
|
||
(In reply to Marco Bonardo [::mak] from comment #12) > > One work-around to this problem is to rebuild with > > -DSQLITE_MAX_SQL_LENGTH=1000000 or some other reasonable limit on the length > > of an SQL statement. (The default limit is 1GB.) The CREATE VIEW above that > > crashes is about 5MB in size, so setting an SQL size limit of 1MB seems > > reasonable. > > We could evaluate that as it seems a good idea in general, but we clearly > couldn't apply that to the system sqlite on Linux. Also, unfortunately we > still have some queries that can generate large IN clauses, we should first > check we're not hitting a limit there, and that's also a perf problem (we > should use TEMP tables instead). If you want, the SQL length limit can also be set at run-time for each database connection using sqlite3_limit(db, SQLITE_LIMIT_SQL_LENGTH, 250000). Rather than generating larger IN clauses, could you use the "carray" extension: https://www.sqlite.org/carray.html
Comment 16•8 years ago
|
||
Hi Marco, So Activity Stream doesn't use triggers/views at all. Also, it uses SQLITE_MAX_VARIABLE_NUMBER (default as 999) to cap the # of host parameters. However, it doesn't check the length of SQL queries. Hope this helps.
Flags: needinfo?(najiang)
Comment 17•8 years ago
|
||
(In reply to D. Richard Hipp from comment #15) > If you want, the SQL length limit can also be set at run-time for each > database connection using sqlite3_limit(db, SQLITE_LIMIT_SQL_LENGTH, 250000). we could do that, carefully. Will file a bug. > Rather than generating larger IN clauses, could you use the "carray" > extension: https://www.sqlite.org/carray.html Yes, it's something we want to implement, but we don't have resources to do that (bug 483318). One of the problems we have to solve is that our Sqlite.jsm API currently tries to bind arrays as blobs. We may have to change blobs to use a typedArray to distinguish the 2 cases. Will file a bug.
Comment 18•8 years ago
|
||
Given that 52 is our next ESR, I think it would be good to get some sort of in-product fix that we can uplift. I'm not sure if that means we take a targeted SQLite fix or an in-tree fix, though?
Flags: needinfo?(mak77)
Comment 19•8 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #18) > Given that 52 is our next ESR, I think it would be good to get some sort of > in-product fix that we can uplift. I'm not sure if that means we take a > targeted SQLite fix or an in-tree fix, though? Not worth the complexity based on the number of reports, we'll take a fix when it's ready, unless the crash ratio increases to meaningful numbers.
Flags: needinfo?(mak77)
Comment 20•8 years ago
|
||
Hello. Just to say I've got this bug and it's rather regular / consistent. See: - https://crash-stats.mozilla.com/report/index/792262b8-dc7a-45dc-bb66-252b52170208 - https://crash-stats.mozilla.com/report/index/af55709e-9283-4a0a-9b21-f692e2170208 - https://crash-stats.mozilla.com/report/index/cd3a10a6-7e9d-4daf-a82d-b005b2170208 - https://crash-stats.mozilla.com/report/index/257bafc0-12ec-46b3-87b7-3584f2170206 - https://crash-stats.mozilla.com/report/index/6d23fbbb-2704-49ae-a950-bdf4b2170206 - https://crash-stats.mozilla.com/report/index/d0d61cee-56e2-416e-875b-0071f2170206 - https://crash-stats.mozilla.com/report/index/05bfb3b4-c84c-43ba-981f-05ed52170206 - https://crash-stats.mozilla.com/report/index/4a77b500-9fe8-4eb7-a8a5-858442170206 - https://crash-stats.mozilla.com/report/index/a1920145-2291-446d-9602-d9ff22170206 - https://crash-stats.mozilla.com/report/index/bb969d6d-6b59-4e34-80d9-7a5a52170206 - https://crash-stats.mozilla.com/report/index/2bab7a58-423a-484e-a55a-321442170206
Comment 21•8 years ago
|
||
I get this reproducibly, a couple of minutes after start up, even without add-ons. Just turn on sync. Started on FF51. I get the same on 52 and 54. Several thousand bookmarks. Going back to 49 things are stable again eg for 54 e2791c60-c8e6-46bf-b4e7-276cd2170209
Comment 22•8 years ago
|
||
Glad you said that @AM because I tried to not sync bookmarks and it seems to be stable. Indeed, I've got lots of bookmarks too in "Unsorted bookmarks".
Comment 23•8 years ago
|
||
Thank you, that's useful information, at least we have a possible path to investigate. Sync recently changed some code and queries, I wonder if due to those changes we may end up building a very large query. Kit, we should check all the Sync changes that went into 51 and try to identify the query causing this with many thousands bookmarks in a single folder. It's likely once we take Sqlite 3.17.0 the query will fail, and the sync will be interrupted. At this point possible culprits could be bug 1274108, bug 1293365 (In reply to AM from comment #21) > Just turn on sync. Started on FF51. I get the same on 52 and 54. Several > thousand bookmarks. > > Going back to 49 things are stable again Why not 50? Does it crash or not? Can you tell us how many thousands bookmarks you have in the Unsorted folder? Richard, can this also happen for WITH queries? Something like this: WITH sorting(g, p) AS ( VALUES ${valuesTable} ) UPDATE moz_bookmarks SET position = ( SELECT CASE count(*) WHEN 0 THEN -position ELSE count(*) - 1 END FROM sorting a JOIN sorting b ON b.p <= a.p WHERE a.g = guid ) WHERE parent = :parentId where ${valuesTable} is a very large list.
Flags: needinfo?(kit)
Flags: needinfo?(augustmiles)
Updated•8 years ago
|
Depends on: SQLite3.17.0
Comment 24•8 years ago
|
||
It's very likely this affects a small number of users that have a lot of bookmarks into unsorted, that would explain why we see repeated reports from the same users, and on a single platform. A possible workaround to upgrade to a newer version may be to move some of those bookmarks into subfolders.
Comment 25•8 years ago
|
||
I've got 7000 bookmarks in "other bookmarks". :) And to be fair, "other bookmarks" is for me my "read later" folder.
Comment 27•8 years ago
|
||
First I thought it was because of lack of RAM on my macbookair (4gb) when it didn't crash on my windows computer (6gb) but now i've got a 16gb macbookpro and it still crashes. So I think it's a macOS only bug to be fair.
Comment 28•8 years ago
|
||
Sync switched to using `PlacesUtils.bookmarks.reorder` in bug 1274108. Assuming the CTE in `reorder` is the problem, that lines up with comment 23.
Comment 29•8 years ago
|
||
(In reply to Marco Bonardo [::mak] from comment #23) > Richard, can this also happen for WITH queries? Something like this: > WITH sorting(g, p) AS ( > VALUES ${valuesTable} > ) > UPDATE ... > where ${valuesTable} is a very large list. Yes. Any VALUES clause with a large number of rows (130,000 or more) would use excessive stack space in 3.16.2 and earlier. In 3.17.0, it should work and get the correct answer without deep recursion.
(In reply to Erwann Mest from comment #25) > I've got 7000 bookmarks in "other bookmarks". :) And to be fair, "other > bookmarks" is for me my "read later" folder. I have ~6900 (so, very close to the same number, obviously), and have been crashing (though it's been stable lately).
Comment 31•8 years ago
|
||
I have 12,000 bookmarks. 4/day for ten years. I tend to annotate in detail using tags -- and work by keywords. Almost all are professional links. It seems a reasonable use of a tagging system. I work on a big campus, so I tend to have a fair number of machines (5 or 6) that synchronize depending on where I am working. I went back to 49 because 50 was already using lots of cpu and eating battery on my macbook.
Comment 32•8 years ago
|
||
"The strange thing is that this is Mac 10.12.1 only", no, it also happens on the previous macOS.
Updated•8 years ago
|
Flags: needinfo?(augustmiles)
Comment 34•8 years ago
|
||
All development versions have been updated to Sqlite 3.17.0. Thus the crash should not happen anymore on Nightly, and in the next DevEdition and Beta releases (not sure when they will be released exactly, likely in a week). If you could reproduce easily, please try Nightly, or wait for next dev/beta releases, and let us know. We are still investigating ways to reduce the number of times we need to reorder large list of bookmarks during a Sync in bug 1339390, but there isn't much more we can do here.
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(kit)
Flags: needinfo?(drh)
Resolution: --- → FIXED
Comment 35•8 years ago
|
||
Firefox 52 beta 7, released today, should contain the fix for this. Or a recently Nightly/DevEdition build will have it. Would be great if anybody in this bug who could reproduce the problem would be able to verify that things work now :)
Comment 36•8 years ago
|
||
It seems to be fixed. I don't have any crash for some days. :) Thank you all.
Updated•29 days ago
|
Product: Toolkit → Core
You need to log in
before you can comment on or make changes to this bug.
Description
•