Closed Bug 413589 Opened 14 years ago Closed 13 years ago

enable SQLite full-text (fulltext) index module (fts3) so extensions can use it

Categories

(Toolkit :: Storage, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla1.9beta3

People

(Reporter: myk, Assigned: myk)

References

(Blocks 1 open bug)

Details

(Keywords: dev-doc-complete)

Attachments

(1 file, 4 obsolete files)

Firefox doesn't use SQLite's full-text index module, but there are a number of interesting applications of that feature in the browser space (f.e. full-text history search), and it's a compile time option in the version of SQLite we integrate, so it's worth making the module available to extension authors who would like to use the capability in Firefox 3 extensions.

To do that, we need to make two changes:

1. enable the compile-time option;

2. allow storage consumers to open a database connection that does not share
   a pager cache, since full-text indexing depends on virtual tables, which are
   not available to connections that share caches.

Here's a patch that enables the module and adds the method openUnsharedDatabase to mozIStorageService for opening a database connection that doesn't share its page cache.  As far as I can tell, this is a low-risk change, given that we don't use the feature in core code and that the routine we use to disable the cache, sqlite3_enable_shared_cache, is designed to be used this way <http://www.sqlite.org/capi3ref.html#sqlite3_enable_shared_cache>.

This patch also updates the description of mozIStorageService::openDatabase to reflect the change in the thread safety of the pager cache and database connections given our recent upgrade to SQLite version 3.5.x.

Requesting wanted-gecko1.9 for this change, as I think it's worth enabling experimentation with this technology in the extension space for Firefox 3 at this late stage in the game, even though we aren't using it ourselves.

More information about the module itself is available at these URLs:

  http://www.sqlite.org/cvstrac/wiki?p=FullTextIndex
  http://www.sqlite.org/cvstrac/wiki?p=FtsTwo
  http://www.sqlite.org/cvstrac/wiki?p=CompilingFts
Flags: blocking1.9?
Attachment #298639 - Flags: review?(sdwilsh)
Moving to wanted list
Flags: wanted1.9+
Flags: blocking1.9?
Flags: blocking1.9-
So, the full text indexing is already part of the Amalgamation?  That's neat, and I was completely unaware of that.

I've got some questions that I'll get to soon - just letting you know that this is on my radar.
see also bug 342915, which is obsolete from the looks of this patch.
(In reply to comment #2)
> So, the full text indexing is already part of the Amalgamation?  That's neat,
> and I was completely unaware of that.

Yeah, apparently they stick that module in the Amalgamation (I think they might do that with other modules as well).


> I've got some questions that I'll get to soon - just letting you know that this
> is on my radar.

Cool, good to know.

Note that most of the code in openDatabase and openUnsharedDatabase could be refactored.  I duplicated it because I thought that would be the lowest-risk approach, but we could also implement a common function that both methods call and that twiddles the sqlite3_enable_shared_cache option when some parameter (aUnshared?) so indicates.
(In reply to comment #3)
> see also bug 342915, which is obsolete from the looks of this patch.

Argh, I keep not finding these bugs when searching for them before filing a new one.
This version contains additional tests and some minor improvements to the test infrastructure over the previous patch.

It also contains an additional warning in the description of openDatabase (you can't open a shared cache connection to a database containing a virtual table).

I know we've gone back and forth a couple of times over whether or not to have the shared cache on by default.  It seems to me that the shared cache is no longer as useful now that database connections are thread-safe and we can share a single database connection with all consumers.

But we should probably have that conversation in another bug.
Attachment #298639 - Attachment is obsolete: true
Attachment #298892 - Flags: review?(sdwilsh)
Attachment #298639 - Flags: review?(sdwilsh)
(In reply to comment #6)
> I know we've gone back and forth a couple of times over whether or not to have
> the shared cache on by default.  It seems to me that the shared cache is no
> longer as useful now that database connections are thread-safe and we can share
> a single database connection with all consumers.
That's not entierly accurate.  mozIStorageConnection isn't threadsafe, but the underlying sqlite connection is.
So, how does match handle unicode here?  I'm willing to bet it doesn't so well (we had to implement our own unicode functions for LIKE and upper/lowercase functions.
(In reply to comment #8)
> So, how does match handle unicode here?  I'm willing to bet it doesn't so well
> (we had to implement our own unicode functions for LIKE and upper/lowercase
> functions.

According to http://www.sqlite.org/cvstrac/wiki?p=FullTextIndex it's ASCII-only at this point.  That would be a problem if we were using it in core code, but it seems reasonable to make this feature available to extenders, even though it has this limitation and others, since its current feature set still gives them plenty of room to do interesting things.

It looks like the 3.5.4.1 branch patch checked in for bug 411780 removed the fts3 module from our source drop, breaking the patch in this bug.  It's not clear why, as the amalgamated package of 3.5.4 as distributed by SQLite <http://sqlite.org/sqlite-amalgamation-3_5_4.zip> includes the module, and none of the SQLite bugs listed in bug 411780 comment 28 appear to pertain to it.
I'm aware that sqlite only supports ASCII, but what I'm asking is "is it possible for us to fix this like we did with LIKE?".

I'm looking into the missing fts stuff now.
(In reply to comment #10)
> I'm aware that sqlite only supports ASCII, but what I'm asking is "is it
> possible for us to fix this like we did with LIKE?".

I'm not very familiar with the code, but if I understand sqlite3_create_function correctly, then it should be possible to register our own function to override the default behavior of MATCH.  Nevertheless, it seems likely to be significantly harder, since MATCH is a much more complicated function.

But maybe what we want is not to override MATCH in toto but just the part of it that searches for individual term matches (a strict equality function?).
(In reply to comment #11)
> But maybe what we want is not to override MATCH in toto but just the part of it
> that searches for individual term matches (a strict equality function?).
Something like that if it were possible.
(In reply to comment #12)
> (In reply to comment #11)
> > But maybe what we want is not to override MATCH in toto but just the part of it
> > that searches for individual term matches (a strict equality function?).
> Something like that if it were possible.

Hmm, we'd also have to change the tokenizer so it breaks text into words in the appropriate places.
Comment on attachment 298892 [details] [diff] [review]
patch v2: more & better tests and comments

>   /**
>    * Open a connection to the specified file.
>    *
>+   * All connections share the same sqlite cache, and the cache is threadsafe.
>+   * The connection object returned by this function is also threadsafe.
>+   *
This isn't completely accurate.  The SQLite connection is threadsafe, but I can't vouch mozIStorageConnection (I don't think it is).

>+  /**
>+   * Open a connection to the specified file that doesn't share a sqlite cache.
>+   *
>+   * Each connection uses its own sqlite cache, which is inefficient, so you
>+   * should use openDatabase instead of this method unless you need a feature
>+   * of SQLite that is incompatible with a shared cache, like virtual table
>+   * and full text indexing support.  The caches are, however, threadsafe.
>+   * The connection object returned by this function is also threadsafe.
So it is perfectly safe to use more than one connection and we won't get weird results where one writes and the other's cache is out of date, right?

>+
>+/* mozIStorageConnection openUnsharedDatabase(in nsIFile aDatabaseFile); */
>+NS_IMETHODIMP
>+mozStorageService::OpenUnsharedDatabase(nsIFile *aDatabaseFile, mozIStorageConnection **_retval)
>+{
>+    nsresult rv;
>+
>+    mozStorageConnection *msc = new mozStorageConnection(this);
>+    if (!msc)
>+        return NS_ERROR_OUT_OF_MEMORY;
>+
>+    sqlite3_enable_shared_cache(0);
>+    rv = msc->Initialize (aDatabaseFile);
>+    sqlite3_enable_shared_cache(1);
I'd love to see a comment indicating what we are doing with the shared cache here.

>-function getOpenedDatabase()
>+function getOpenedDatabase(unshared)
JavaDoc style comments please - I'm trying to make it easier for folks to add tests when they submit patches, and well documented test suites help that out a lot

> function test_openDatabase_file_exists()
> {
>   // it should already exist from our last test
>   var db = getTestDB();
>   do_check_true(db.exists());
>   getService().openDatabase(db);
>   do_check_true(db.exists());
>+
>+  // Clean up in preparation for testing openUnsharedDatabase.
>+  cleanup();
wouldn't this be better to run in the test that needs it to be cleaned up?
Alternatively, if you need a new environment, why not make a new test file?

General nit on your sql statements.  Please have works like SELECT, INSERT INTO, WHERE, etc in all caps.  It makes it easier to read and that's how sql statements are in this test suite as well as throughout the codebase.

>+function run_test()
>+{
>+  // It's extra important to start from scratch, since these tests won't work
>+  // with an existing shared cache connection, so we do it even though the last
>+  // test probably did it already.
>+  cleanup();
>+
>+  try {
>+    for (var i = 0; i < tests.length; i++) {
>+      tests[i]();
>+    }
>+  }
>+  // It's extra important to clean up afterwards, since later tests that use
>+  // a shared cache connection will not be able to read the database we create,
>+  // so we do this in a finally block to ensure it happens even if some of our
>+  // tests fail.
That's why we generally cleanup before we run tests too.  This is still Ok though.
>+  finally {
>+    cleanup();
>+  }
>+}

We'll need a new patch addressing the unicode stuff as well as getting fts back in our sqlite file (or separate if that's what we want to do too).
Attachment #298892 - Flags: review?(sdwilsh)
Version: unspecified → Trunk
Blocks: 414102
(In reply to comment #14)
> (From update of attachment 298892 [details] [diff] [review])

> >+   * The connection object returned by this function is also threadsafe.
> >+   *
> This isn't completely accurate.  The SQLite connection is threadsafe, but I
> can't vouch mozIStorageConnection (I don't think it is).

I pinged Vlad about this, and he wasn't sure and doesn't remember the code well enough to say, so I have reverted the comment.


> So it is perfectly safe to use more than one connection and we won't get
> weird results where one writes and the other's cache is out of date, right?

I don't know this for a fact, but since non-shared caches are the default connection type in sqlite3, I expect that this is true.


> >+    sqlite3_enable_shared_cache(0);
> >+    rv = msc->Initialize (aDatabaseFile);
> >+    sqlite3_enable_shared_cache(1);
> I'd love to see a comment indicating what we are doing with the shared cache
> here.

Ok, I've added one.


> >-function getOpenedDatabase()
> >+function getOpenedDatabase(unshared)
> JavaDoc style comments please - I'm trying to make it easier for folks to add
> tests when they submit patches, and well documented test suites help that out
> a lot

Sure, makes sense, I've added a comment.


> >+  // Clean up in preparation for testing openUnsharedDatabase.
> >+  cleanup();
> wouldn't this be better to run in the test that needs it to be cleaned up?

Yes, good point.


> Alternatively, if you need a new environment, why not make a new test file?

This test file is called test_storage_service.js, and it seems to make the most sense for all the tests that are testing the storage service methods to be located in this one file, even if a couple need a new environment, since it'll be less confusing for future developers than having some of those tests live elsewhere.


> General nit on your sql statements.  Please have works like SELECT, INSERT
> INTO, WHERE, etc in all caps.  It makes it easier to read and that's how sql
> statements are in this test suite as well as throughout the codebase.

Yup, I like that style, too, I had just copied these verbatim from the FTS examples in the SQLite docs.  I have updated them now to use all caps for SQL reserved words.


> We'll need a new patch addressing the unicode stuff as well as getting fts
> back in our sqlite file (or separate if that's what we want to do too).

Per our conversation on IRC, I have filed bug 414102 on supporting Unicode text in the fts3 module, and I have updated the sqlite3.c file in my tree to include the fts3 module code, but I have not included those changes in this patch.
Attachment #298892 - Attachment is obsolete: true
Attachment #299396 - Flags: review?(sdwilsh)
Comment on attachment 299396 [details] [diff] [review]
patch v3: addresses review comments

> # -DSQLITE_SECURE_DELETE=1 will cause SQLITE to 0-fill delete data so we
> # don't have to vacuum to make sure the data is not visible in the file.
>-DEFINES = -DSQLITE_SECURE_DELETE=1 -DTHREADSAFE=1
>+# -DSQLITE_CORE=1 and -DSQLITE_ENABLE_FTS3=1 enable the full-text index module.
>+DEFINES = -DSQLITE_SECURE_DELETE=1 -DTHREADSAFE=1 -DSQLITE_CORE=1 -DSQLITE_ENABLE_FTS3=1
Missed this last time - why do we need SQLITE_CORE?  Can you add a comment above to explain it please?

>+
>+/**
>+ * Get a connection to the test database.  Creates and caches the connection
>+ * if necessary, otherwise reuses the existing cached connection.
>+ *
>+ * @param unshared {boolean} whether or not to open a connection to the database
>+ *                           that doesn't share its cache; if true, we use
>+ *                           mozIStorageService::openUnsharedDatabase to create
>+ *                           the connection; otherwise we use openDatabase.
>+ */
nit: I'm much more of a fan of doing the description on a newline:
@param unshared {boolean}
       Description here...

r=sdwilsh
Attachment #299396 - Flags: review?(sdwilsh) → review+
(In reply to comment #16)
> >+# -DSQLITE_CORE=1 and -DSQLITE_ENABLE_FTS3=1 enable the full-text index module.
> >+DEFINES = -DSQLITE_SECURE_DELETE=1 -DTHREADSAFE=1 -DSQLITE_CORE=1 -DSQLITE_ENABLE_FTS3=1
> Missed this last time - why do we need SQLITE_CORE?  Can you add a comment
> above to explain it please?

We can either compile FTS3 as a standalone shared library or statically link it into the SQLite library.  If we were to build it as a shared library, we'd have to load it before using it via:

  SELECT load_extension(<library filename>);

And we'd also have to call sqlite3_enable_load_extension before loading it.  Also, I think we wouldn't be able to compile it as part of the amalgamation.  So I think it's better (simpler, easier, safer) to link it in statically, which is what defining SQLITE_CORE does.

I've added a brief comment to this effect.


> >+ * @param unshared {boolean} whether or not to open a connection to the database
> >+ *                           that doesn't share its cache; if true, we use
> >+ *                           mozIStorageService::openUnsharedDatabase to create
> >+ *                           the connection; otherwise we use openDatabase.
> >+ */
> nit: I'm much more of a fan of doing the description on a newline:
> @param unshared {boolean}
>        Description here...

You got it.

Here's a version of patch v3 with these two issues addressed.  This is the version of the patch I'll check in.


Requesting approval to land this wanted-gecko1.9 patch that enables the full-text index module in the copy of SQLite that we compile as part of mozStorage and ship with Firefox.

The patch enables the compile-time option that compiles and statically links the module into the SQLite library, but it doesn't actually change any part of Firefox to use this functionality, so this change is low-risk provided the presence of the module doesn't cause any problems in other parts of the SQLite library, and there are no indications in conversations about the module between other SQLite users on the SQLite users mailing list that it does.

The patch also adds a method to the mozIStorageService interface for opening a database connection without a shared cache, which is a well-understood way of opening databases (indeed, it's the default way of opening databases in stock SQLite; we override the default to open them with shared caches) and necessary for using the module.  Again, it does not change Firefox to use this new method, so the risk of regression from the presence of the method is low.

The purpose of the change is to enable extenders to utilize the module, and it is possible that use by extensions could cause unforeseen problems, but these are unlikely based on my review of discussion by other users of the module, and they would be likely to be caught during the extension development process in any case.

The patch includes tests that exercise both the new mozIStorageService method and the basic functionality of the module.

In addition to this patch, I would also check in an update to sqlite3.c that adds the module code to that file.  The code is normally in that file (as distributed by the SQLite project), and it was in our version of it until we updated to the custom version 3.5.4.1 last week, which unintentionally left it out.  This change is thus low risk, as the code has a demonstrated history of its presence in that file not causing problems.
Attachment #299396 - Attachment is obsolete: true
Attachment #299483 - Flags: approval1.9?
Keywords: dev-doc-needed
Comment on attachment 299483 [details] [diff] [review]
patch v4: addresses remaining review nits

Myk: thanks for the description, that really helped. I have two follow up questions:

 - what's the binary size hit for taking this?

 - should we get a security review of this code?

I'm generally in favour, though.
(In reply to comment #18)
>  - should we get a security review of this code?
I don't think that will be needed.  This is only exposed to chrome code.
Duplicate of this bug: 342915
Here's a patch that rearranges the tests to work around that vexing Windows bug that stung sdwilsh when he was writing tests for bug 394789.


(In reply to comment #18)

>  - what's the binary size hit for taking this?

On my machines, compiling with fts3 adds about 20K, 10K, and 29K to packages for Windows, Mac, and Linux, respectively:

-rwxr-xr-x 1 myk Administrators 6312322 Jan 27 12:18 firefox-3.0b3pre.en-US.win32.installer.exe.with-fts3
-rwxr-xr-x 1 myk Administrators 6292521 Jan 27 10:27 firefox-3.0b3pre.en-US.win32.installer.exe.without-fts3

-rw-r--r--@ 1 myk  myk  10727825 Jan 27 02:55 firefox-3.0b3pre.en-US.mac.dmg.with-fts3
-rw-r--r--@ 1 myk  myk  10717044 Jan 27 03:15 firefox-3.0b3pre.en-US.mac.dmg.without-fts3

-rw-r--r-- 1 myk myk 9267182 2008-01-27 02:36 firefox-3.0b3pre.en-US.linux-i686.tar.bz2.with-fts3
-rw-r--r-- 1 myk myk 9238139 2008-01-27 03:27 firefox-3.0b3pre.en-US.linux-i686.tar.bz2.without-fts3


>  - should we get a security review of this code?

Like sdwilsh, I don't think the amount of exposure here rises to that level of concern.
Attachment #299483 - Attachment is obsolete: true
Attachment #299611 - Flags: review?(sdwilsh)
Attachment #299483 - Flags: approval1.9?
Comment on attachment 299611 [details] [diff] [review]
patch v5: rearranges tests to work around Windows bug

r=sdwilsh
Attachment #299611 - Flags: review?(sdwilsh) → review+
Comment on attachment 299611 [details] [diff] [review]
patch v5: rearranges tests to work around Windows bug

Requesting approval for this wanted bug.  Details on the risk and impact are in comment 17 and comment 21.
Attachment #299611 - Flags: approval1.9?
Comment on attachment 299611 [details] [diff] [review]
patch v5: rearranges tests to work around Windows bug

a1.9+=damons
Attachment #299611 - Flags: approval1.9? → approval1.9+
/sqlite3/src/sqlite3.c
Checking in db/sqlite3/src/Makefile.in;
/cvsroot/mozilla/db/sqlite3/src/Makefile.in,v  <--  Makefile.in
new revision: 1.34; previous revision: 1.33
done
Checking in db/sqlite3/src/sqlite3.c;
/cvsroot/mozilla/db/sqlite3/src/sqlite3.c,v  <--  sqlite3.c
new revision: 1.13; previous revision: 1.12
done
Checking in storage/public/mozIStorageService.idl;
/cvsroot/mozilla/storage/public/mozIStorageService.idl,v  <--  mozIStorageService.idl
new revision: 1.7; previous revision: 1.6
done
Checking in storage/src/mozStorageService.cpp;
/cvsroot/mozilla/storage/src/mozStorageService.cpp,v  <--  mozStorageService.cpp
new revision: 1.14; previous revision: 1.13
done
Checking in storage/test/unit/head_storage.js;
/cvsroot/mozilla/storage/test/unit/head_storage.js,v  <--  head_storage.js
new revision: 1.6; previous revision: 1.5
done
RCS file: /cvsroot/mozilla/storage/test/unit/test_storage_fulltextindex.js,v
done
Checking in storage/test/unit/test_storage_fulltextindex.js;
/cvsroot/mozilla/storage/test/unit/test_storage_fulltextindex.js,v  <--  test_storage_fulltextindex.js
initial revision: 1.1
done
Checking in storage/test/unit/test_storage_service.js;
/cvsroot/mozilla/storage/test/unit/test_storage_service.js,v  <--  test_storage_service.js
new revision: 1.2; previous revision: 1.1
done
RCS file: /cvsroot/mozilla/storage/test/unit/test_storage_service_unshared.js,v
done
Checking in storage/test/unit/test_storage_service_unshared.js;
/cvsroot/mozilla/storage/test/unit/test_storage_service_unshared.js,v  <--  test_storage_service_unshared.js
initial revision: 1.1
done
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Blocks: 377244
Blocks: 342913
Blocks: 342916
Summary: enable SQLite full-text (fulltext) index module so extensions can use it → enable SQLite full-text (fulltext) index module (fts3) so extensions can use it
full-text indexing support is now mentioned in various places in the docs.
I don't recall seeing this before, but now that I've found it, reopening:
It is illegal to call sqlite3_enable_shared_cache() if one or more open database connections were opened by the calling thread. If the argument is non-zero, shared-cache mode is enabled. If the argument is zero, shared-cache mode is disabled. The return value is either SQLITE_OK (if the operation was successful), SQLITE_NOMEM (if a malloc() failed), or SQLITE_MISUSE (if the thread has open database connections).

So as it stands right now, what we are doing is totally wrong...
Nominating for blocking because this is really bad :(
Status: RESOLVED → REOPENED
Flags: blocking1.9?
Resolution: FIXED → ---
Per comment 27 moving to blocking list.  Perhaps this is the cause of the sqlite crashes?

Flags: blocking1.9? → blocking1.9+
Priority: -- → P1
Can this be fixed, rather than backed out? I am already using FTS in my extension, others may be doing it too, so this is late-compat. :(
Keywords: late-compat
Unless the sqlite documentation is wrong, I don't think it can be fixed...

I'm hoping the documentation is wrong for what it's worth, or that maybe they can fix it so we can do this.  Myk - can you contact them?
(In reply to comment #30)
> I'm hoping the documentation is wrong for what it's worth, or that maybe they
> can fix it so we can do this.  Myk - can you contact them?

Yes, I just did so and will report what I hear back.

Note: I also explicitly tested for this case, and the behavior is not as described on that page.  sqlite3_enable_shared_cache on a thread with open database connections returns SQLITE_OK, not SQLITE_MISUSE, and open_unshared_database has the desired effect (the database connection is opened with the shared cache disabled).  So unless I'm missing something, the documentation would appear to be wrong (perhaps out-of-date).
Documentation was out of date.  Our code here is fine.
Status: REOPENED → RESOLVED
Closed: 14 years ago13 years ago
Resolution: --- → FIXED
Keywords: late-compat
(In reply to comment #32)
> Documentation was out of date.  Our code here is fine.

Indeed, this was fixed when the shared cache mode was modified to work across an entire process rather than individual threads.
Great!
Is it fixed with UTF8 support or not?
Any documents in mozdev?
(In reply to comment #34)
> Is it fixed with UTF8 support or not?

No, that's dependent bug 414102.


> Any documents in mozdev?

According to comment 26, there is now documentation, presumably on developer.mozilla.org.
You need to log in before you can comment on or make changes to this bug.