Closed Bug 1392937 Opened 7 years ago Closed 7 years ago

TEST-UNEXPECTED-FAIL | mailnews/db/gloda/test/unit/test_fts3_tokenizer.js | xpcshell return code: 1

Categories

(MailNews Core :: Database, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
Thunderbird 57.0

People

(Reporter: jorgk-bmo, Unassigned)

References

Details

(Whiteboard: [Thunderbird-testfailure: X all debug])

TEST-UNEXPECTED-FAIL | mailnews/db/gloda/test/unit/test_fts3_tokenizer.js | xpcshell return code: 1 [log…] PID 5144 | Assertion failed: (p->flags & MEM_Dyn)==0 || p->szMalloc==0, file c:/builds/moz2_slave/tb-c-cen-w32-d-000000000000000/build/mozilla/db/sqlite3/src/sqlite3.c, line 70285 [log…] PROCESS-CRASH | mailnews/db/gloda/test/unit/test_fts3_tokenizer.js | application crashed [@ abort + 0x5a] [log…] TEST-UNEXPECTED-FAIL | mailnews/db/gloda/test/unit/test_msg_search.js | xpcshell return code: 1 [log…] PID 12144 | Assertion failed: (p->flags & MEM_Dyn)==0 || p->szMalloc==0, file c:/builds/moz2_slave/tb-c-cen-w32-d-000000000000000/build/mozilla/db/sqlite3/src/sqlite3.c, line 70285 [log…] PROCESS-CRASH | mailnews/db/gloda/test/unit/test_msg_search.js | application crashed [@ abort + 0x5a] M-C last good: c7570eb46382 M-C first bad: eb72c8c07751 https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=c7570eb46382&tochange=eb72c8c07751 Most likely: Bug 1386401 - Upgrade to SQLite 3.20.0. Marco, can you please take a look. As you know we have a "Global Search Engine", Gloda, which exercises SQLite quite heavily. Since your upgrade, we get assertions and hence crashes in debug compiles. Something is fishy here.
Flags: needinfo?(mak77)
The last executed query looks like SELECT messages.*, messagesText.*, offsets(messagesText) AS osets FROM messagesText, messages WHERE messagesText MATCH ?1 AND messagesText.docid IN (SELECT docid FROM messagesText JOIN messages ON messagesText.docid = messages.id WHERE messagesText MATCH ?1 ORDER BY (((glodaRank(matchinfo(messagesText), 1.0, 2.0, 2.0, 1.5, 1.5) + messages.notability) * 604800000000) + messages.date) DESC LIMIT ?2 ) AND messages.id = messagesText.docid AND +messages.deleted = 0 AND +messages.folderID IS NOT NULL AND +messages.messageKey IS NOT NULL ARGS: "bbb",1000 I see some changed in 3.20.0 about affinity, valueFromExpr and sqlite3VdbeCheckMemInvariants, but it's hard to tell for me without knowledge of the internals and gloda. This looks like the interesting part of the stack: 17:47:45 INFO - 18 libmozsqlite3.so!sqlite3VdbeCheckMemInvariants [sqlite3.c:80fafbaaf848 : 70285 + 0x1f] 17:47:45 INFO - rbp = 0x00007fa3c34523cf rsp = 0x00007fa3b0afdde0 17:47:45 INFO - rip = 0x00007fa3c33c1d24 17:47:45 INFO - Found by: stack scanning 17:47:45 INFO - 19 libmozsqlite3.so!releaseMemArray [sqlite3.c:80fafbaaf848 : 73556 + 0x8] 17:47:45 INFO - rbp = 0x00007fa3c34523cf rsp = 0x00007fa3b0afde00 17:47:45 INFO - rip = 0x00007fa3c33c4392 17:47:45 INFO - Found by: stack scanning 17:47:45 INFO - 20 libmozsqlite3.so!sqlite3VdbeHalt [sqlite3.c:80fafbaaf848 : 74137 + 0xa] 17:47:45 INFO - rbp = 0x00007fa3c34523cf rsp = 0x00007fa3b0afde30 17:47:45 INFO - rip = 0x00007fa3c341a0b4 17:47:45 INFO - Found by: stack scanning 17:47:45 INFO - 21 libmozsqlite3.so!fts3ClearCursor [sqlite3.c:80fafbaaf848 : 148727 + 0x9] 17:47:45 INFO - rbp = 0x00007fa3c34523cf rsp = 0x00007fa3b0afde60 17:47:45 INFO - rip = 0x00007fa3c341c8e7 17:47:45 INFO - Found by: stack scanning 17:47:45 INFO - 22 libmozsqlite3.so!cursorOwnsBtShared [sqlite3.c:80fafbaaf848 : 60026 + 0x9] 17:47:45 INFO - rbp = 0x00007fa3c34523cf rsp = 0x00007fa3b0afde70 17:47:45 INFO - rip = 0x00007fa3c33c6799 17:47:45 INFO - Found by: stack scanning 17:47:45 INFO - 23 libmozsqlite3.so!sqlite3VdbeExec [sqlite3.c:80fafbaaf848 : 79969 + 0x8] 17:47:45 INFO - rbp = 0x00007fa3c34523cf rsp = 0x00007fa3b0afdeb0 17:47:45 INFO - rip = 0x00007fa3c341dd9d 17:47:45 INFO - Found by: stack scanning 17:47:45 INFO - 24 libxul.so!nsStringBuffer::Alloc [nsSubstring.cpp:80fafbaaf848 : 244 + 0x5] 17:47:45 INFO - rbp = 0x00007fa3c34523cf rsp = 0x00007fa3b0afdee0 17:47:45 INFO - rip = 0x00007fa3c858c7fd 17:47:45 INFO - Found by: stack scanning 17:47:45 INFO - 25 libxul.so!nsACString::MutatePrep [nsTSubstring.h:80fafbaaf848 : 1114 + 0x5] 17:47:45 INFO - rbp = 0x00007fa3c34523cf rsp = 0x00007fa3b0afdf10 17:47:45 INFO - rip = 0x00007fa3c858edc7 17:47:45 INFO - Found by: stack scanning 17:47:45 INFO - 26 libmozsqlite3.so!_fini + 0x23e468 17:47:45 INFO - rbp = 0x00007fa3c34523cf rsp = 0x00007fa3b0afdf40 17:47:45 INFO - rip = 0x00007fa3c3690780 17:47:45 INFO - Found by: stack scanning 17:47:45 INFO - 27 libxul.so!nsACString::ReplacePrepInternal [nsTSubstring.cpp:80fafbaaf848 : 241 + 0xd] 17:47:45 INFO - rbp = 0x00007fa3c34523cf rsp = 0x00007fa3b0afdf70 17:47:45 INFO - rip = 0x00007fa3c858ee7c 17:47:45 INFO - Found by: stack scanning 17:47:45 INFO - 28 libxul.so!_fini + 0x5a018 17:47:45 INFO - rbp = 0x00007fa3c34523cf rsp = 0x00007fa3b0afdf90 17:47:45 INFO - rip = 0x00007fa3cbd6b880 17:47:45 INFO - Found by: stack scanning 17:47:45 INFO - 29 libxul.so!nsACString::ReplacePrep [nsTSubstring.cpp:80fafbaaf848 : 211 + 0x11] 17:47:45 INFO - rbp = 0x00007fa3c34523cf rsp = 0x00007fa3b0afdfd0 17:47:45 INFO - rip = 0x00007fa3c85925cb 17:47:45 INFO - Found by: stack scanning 17:47:45 INFO - 30 libxul.so!_fini + 0x811ea4 17:47:45 INFO - rbp = 0x00007fa3c34523cf rsp = 0x00007fa3b0afe008 17:47:45 INFO - rip = 0x00007fa3cc52370c 17:47:45 INFO - Found by: stack scanning 17:47:45 INFO - 31 libxul.so!nsACString::Assign [nsTSubstring.cpp:80fafbaaf848 : 377 + 0xf] 17:47:45 INFO - rbp = 0x00007fa3c34523cf rsp = 0x00007fa3b0afe020 17:47:45 INFO - rip = 0x00007fa3c85927aa 17:47:45 INFO - Found by: stack scanning 17:47:45 INFO - 32 libmozsqlite3.so!sqlite3_step [sqlite3.c:80fafbaaf848 : 77351 + 0x8] 17:47:45 INFO - rbp = 0x00007fa3c34523cf rsp = 0x00007fa3b0afe090 17:47:45 INFO - rip = 0x00007fa3c342add8 17:47:45 INFO - Found by: stack scanning Maybe Andrew, who worked on gloda, has any idea, or we could ni someone from the Sqlite team.
Flags: needinfo?(mak77) → needinfo?(bugmail)
Thanks for looking into it. Andrew isn't really involved in Thunderbird or Gloda any more, despite doing the occasional review (thanks!). That said, I don't see why the author of an SQL statement should understand why the underlying database is spitting the dummy, especially since it (seemingly) worked fine for years. Wouldn't it be fair to assume that you should be able to pass any old SQL and the database should handle is gracefully? I think this is an issue for the SQLite team, who are they? Just for reference, SQL and crash can be inspected in various logs, for example: https://archive.mozilla.org/pub/thunderbird/tinderbox-builds/comm-central-linux64-debug/1503442342/comm-central_ubuntu64_vm-debug_test-xpcshell-2-bm54-tests1-linux64-build241.txt.gz https://archive.mozilla.org/pub/thunderbird/tinderbox-builds/comm-central-macosx64-debug/1503442342/comm-central_yosemite_r7-debug_test-xpcshell-bm106-tests1-macosx-build0.txt.gz https://archive.mozilla.org/pub/thunderbird/tinderbox-builds/comm-central-win32-debug/1503442342/comm-central_win7_ix-debug_test-xpcshell-bm127-tests1-windows-build0.txt.gz
Flags: needinfo?(mak77)
(In reply to Jorg K (GMT+2) from comment #2) > Thanks for looking into it. Andrew isn't really involved in Thunderbird or > Gloda any more, despite doing the occasional review (thanks!). That said, I > don't see why the author of an SQL statement should understand why the > underlying database is spitting the dummy, especially since it (seemingly) > worked fine for years. Wouldn't it be fair to assume that you should be able > to pass any old SQL and the database should handle is gracefully? > > I think this is an issue for the SQLite team, who are they? Gloda is using a custom tokenizer and a custom ranking function that's derived from the example I cite in comment 3 via copy-and-paste. These go far beyond simple SQL statements. Having said that, I would expect the breakage to mainly just be minimally around the improved security around the sketchy pointer-passing going on in all these cases. As a related heads up... pragmatically (and dramatically speaking), now that we've hit Firefox 57 and no longer need to worry about maintaining compatibility for legacy extensions, there's a non-trivial potential for mozStorage to break gloda going forward in other ways. I think one of the following needs to happen: - Thunderbird does the gecko fork thing that was discussed at some point (I haven't kept up), maybe forking prior to this change. - Someone on the Thunderbird team needs to take on responsibility for updating gloda to compensate for mozStorage changes. Gloda has very meager binding needs other than the FTS3 stuff, so this might not be that hard to do. Especially if someone already has to un-break the calendar pieces. (I think calendar does database stuff?) This is the type of thing I'm willing to help mentor someone on as needed. - Turn off/disable gloda.
Flags: needinfo?(bugmail)
(In reply to Andrew Sutherland [:asuth] from comment #3) not in reply to comment #4: Thanks, interesting analysis, but where do we go from here? I'm not a storage or SQLite expert and had hoped not to become one. How is our relation to the SQLite team, could they advise us, especially given (from https://sqlite.org/bindptr.html): "Questions and confusion quickly arose on the mailing lists about the purpose behind these new interfaces, why they were introduced, and what problem they solve. This essay attempts to answer those questions and clear up the confusion." Also: "The traditional way of communicating this information was to transform a C-language pointer into a BLOB or a 64-bit integer, then move that BLOB or integer through SQLite using the usual interfaces like sqlite3_bind_blob(), sqlite3_result_blob(), sqlite3_value_blob() or the integer equivalents." I noticed BindBlobByIndex() calls, so that's an M-C API.
(In reply to Andrew Sutherland [:asuth] from comment #4) > Gloda is using a custom tokenizer and a custom ranking function that's > derived from the example I cite in comment 3 via copy-and-paste. These go > far beyond simple SQL statements. Having said that, I would expect the > breakage to mainly just be minimally around the improved security around the > sketchy pointer-passing going on in all these cases. OK, the line you pointed at is https://dxr.mozilla.org/comm-central/rev/7b75c2e0b9deb590c18aaddb2a6168146cde8ca4/mailnews/extensions/fts3/src/nsGlodaRankerFunction.cpp#57 uint32_t *aArgsData = (uint32_t *)aArguments->AsSharedBlob(0, &lenArgsData); AsSharedBlob() is still an M-C function. The example from https://sqlite.org/fts3.html#appendix_a had: aMatchinfo = (unsigned int *)sqlite3_value_blob(apVal[0]); So wouldn't have M-C to adapt this?
I forwarded the request to Andrew not just because he was the original author, but also because he's a very experienced Storage peer (I'd say more a co-owner honestly), his experience on threading and FTS is far larger than mine and it looked like the simplest path to move this forward. (In reply to Jorg K (GMT+2) from comment #5) > How is our > relation to the SQLite team, could they advise us, especially given (from > https://sqlite.org/bindptr.html): > "Questions and confusion quickly arose on the mailing lists about the > purpose behind these new interfaces, why they were introduced, and what > problem they solve. This essay attempts to answer those questions and clear > up the confusion." They could surely provide hints about proper usage of the new APIs, but we can't expect them to fix our code. On the other side, you could send a message to the sqlite-users mailing list and ask for help, it's likely other consumers hit your same problem. For example you could ask if someone could provide an updated versione of the ranker at https://sqlite.org/fts3.html#appendix_a (and likely at that point the Sqlite team will take care of updating the doc!) > "The traditional way of communicating this information was to transform a > C-language pointer into a BLOB or a 64-bit integer, then move that BLOB or > integer through SQLite using the usual interfaces like sqlite3_bind_blob(), > sqlite3_result_blob(), sqlite3_value_blob() or the integer equivalents." > > I noticed BindBlobByIndex() calls, so that's an M-C API. Right, based on the Sqlite document, we cannot use anymore a blob to pass or receive a pointer, instead we must use sqlite3_bind_pointer, sqlite3_result_pointer, sqlite3_value_pointer. I think Andrew in comment 3 suggested to directly use the sqlite3_bind_pointer API in the tokenizer since your code is in cpp and you can access selectStatement->nativeStatement. For the ranker, you could surely experiment on it, I still think the best path forward is asking in the mailing list if the ranker example on the docs could be updated. (In reply to Jorg K (GMT+2) from comment #6) > https://dxr.mozilla.org/comm-central/rev/ > 7b75c2e0b9deb590c18aaddb2a6168146cde8ca4/mailnews/extensions/fts3/src/ > nsGlodaRankerFunction.cpp#57 > uint32_t *aArgsData = (uint32_t *)aArguments->AsSharedBlob(0, > &lenArgsData); > AsSharedBlob() is still an M-C function. The example from > https://sqlite.org/fts3.html#appendix_a had: > aMatchinfo = (unsigned int *)sqlite3_value_blob(apVal[0]); > > So wouldn't have M-C to adapt this? This likely shouldn't use AsSharedBlob. Whether we need to add another util (AsPointer or such) depends on the updated ranker snippet, it's possible but we don't know yet.
Flags: needinfo?(mak77)
(In reply to Marco Bonardo [::mak] from comment #7) Yes, I realised that you sent this to Andrew as a storage peer. > On the other side, you could send a message to the sqlite-users mailing list > and ask for help, it's likely other consumers hit your same problem. > For example you could ask if someone could provide an updated versione of > the ranker at https://sqlite.org/fts3.html#appendix_a (and likely at that > point the Sqlite team will take care of updating the doc!) I sent an e-mail (CC to you and Andrew) although I think you would have been better qualified to do so as I assume that you are already subscribed to the lists and are knowledgable of the issue. > Right, based on the Sqlite document, we cannot use anymore a blob to pass or > receive a pointer, instead we must use sqlite3_bind_pointer, > sqlite3_result_pointer, sqlite3_value_pointer. > I think Andrew in comment 3 suggested to directly use the > sqlite3_bind_pointer API in the tokenizer since your code is in cpp and you > can access selectStatement->nativeStatement. Sadly my knowledge is so limited that I don't follow you here. We should replace rv = selectStatement->BindBlobByIndex() with a native call? If this is so, why is BindBlobByIndex() there if it doesn't do what it advertises to do? To me it looks like you have some blob functions in the box that need a refresh (or removal). In the former case there doesn't appear any need for action on behalf of the caller of these functions. > For the ranker, you could surely experiment on it, I still think the best > path forward is asking in the mailing list if the ranker example on the docs > could be updated. Done. > This likely shouldn't use AsSharedBlob. Whether we need to add another util > (AsPointer or such) depends on the updated ranker snippet, it's possible but > we don't know yet. Once again, if your toolkit offers that function and it doesn't work, it should be fixed or removed or an appropriate alternative offered. "But we don't know yet" - Hmm, and you're leaving it to me to find out :-( Digging a bit through the function, I get to GetSharedBlob() which is where the native calls are.
I think there may be a bug in the new sqlite3_result_pointer() interface. I think the following patch below is necessary. I'm still working on test cases though, and it might be a few hours before I can check in a change. Jorg: Could you manually make the one-line change shown below and rerun your tests to see if the patch below clears the problem? --- ../historical/3.20.0/sqlite3.c +++ sqlite3.c @@ -77117,17 +77117,17 @@ SQLITE_API void sqlite3_result_pointer( sqlite3_context *pCtx, void *pPtr, const char *zPType, void (*xDestructor)(void*) ){ Mem *pOut = pCtx->pOut; assert( sqlite3_mutex_held(pOut->db->mutex) ); - sqlite3VdbeMemSetNull(pOut); + sqlite3VdbeMemRelease(pOut); sqlite3VdbeMemSetPointer(pOut, pPtr, zPType, xDestructor); } SQLITE_API void sqlite3_result_subtype(sqlite3_context *pCtx, unsigned int eSubtype){ Mem *pOut = pCtx->pOut; assert( sqlite3_mutex_held(pOut->db->mutex) ); pOut->eSubtype = eSubtype & 0xff; pOut->flags |= MEM_Subtype; }
Thanks for the quick "service" (your own words) ;-) I hadn't run the test manually, but if I do, I can reproduce the crash. For the record: mozilla/mach xpcshell-test mailnews/db/gloda/test/unit/test_msg_search.js Applying - sqlite3VdbeMemSetNull(pOut); + sqlite3VdbeMemRelease(pOut); at line 77125 fixes the crash. So thanks again for the quick "service", I'll let the storage peers, the ones who look after SQLite integration, take it from here. Macro, can we just land that patch or will that upset the next upgrade?
Flags: needinfo?(mak77)
We can do both, if Sqlite team is willing to do a chemspill release shortly, we can take it, otherwise once the fix is approved upstream and merged to Sqlite, we can also do our own patching. We usually tend to not patch locally until a change was merged upstream. Please ensure that the proposed patch fixes completely the reported issues in Thunderbird, and no further changes are required. (fwiw, it's likely you may still want to update your code to the new APIs, but that code is a bit obscure to me atm).
Flags: needinfo?(mak77)
Note that if we do our own patching and Thunderbird supports system Sqlite, this will still be broken there, since system Sqlite won't have the patch.
(1) Further testing reveals that the patch above is not quite correct. The call to sqlite3VdbeMemRelease() should definitely be added, but the sqlite3VdbeMemSetNull() call is necessary too, and should occur after the call to sqlite3VdbeMemRelease(). (2) I'm not 100% sure what a "chemspill release" is, though I can guess. I'll make the initial bug fix on the SQLite trunk, then back-port the one-line fix to do a 3.20.1 release which is just 3.20.0 with the one-line fix.
Nice. I'm sure when 3.20.1 comes out, we can adopt it quickly. ETA?
Expect SQLite 3.20.1 later today.
I did a search for the meaning of "chemspill release" and found it to be a Mozilla-ism meaning an emergency release to fix a security vulnerability. By that definition, 3.20.1 is not a chemspill release, since the bug is not a security vulnerability. Without -DSQLITE_DEBUG, the only problem is an obscure memory leak. The assert() that is failing when -DSQLITE_DEBUG is used merely detects the memory leak. Joerg: Thunderbird is only using -DSQLITE_DEBUG during testing, right? I ask because SQLite can be between 3x and 10x slower when -DSQLITE_DEBUG is turned on, due to the extensive use of assert() statements inside of inner loops. In SQLite, the assert() statements are off by default and are only active when SQLITE_DEBUG is defined.
Sorry didn't surely mean to request a security release, it was just a way to name an "immediate" .0.1 version :/ (In reply to D. Richard Hipp from comment #16) > Joerg: Thunderbird is only using -DSQLITE_DEBUG during testing, right? Both TB and Firefox use SQLITE_DEBUG in debug builds. Those are mostly used by devs, not intended for general audience (even if in the past some dev liked to use debug builds every day...). Releases surely don't define it.
Marco has answered your question, so all that's left for me to do is to thank you again, Richard.
Depends on: SQLite3.20.1
Fixed by updating to SQLite 3.20.1 in bug 1393526. Thanks again to all involved.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → Thunderbird 57.0
You need to log in before you can comment on or make changes to this bug.