Closed Bug 581374 Opened 14 years ago Closed 14 years ago

Firefox SQLite fails to use compression; code attached

Categories

(Toolkit :: Storage, enhancement)

Type: enhancement
Priority: Not set
Severity: normal

Tracking


RESOLVED WONTFIX

People

(Reporter: d4345, Unassigned)

References

Details

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:2.0b1) Gecko/20100630 Firefox/3.7a4pre (.NET CLR 3.5.30729)
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:2.0b1) Gecko/20100630 Firefox/3.7a4pre

SQLite makes it very easy to enable compression, giving 50-80% savings in database size at very low CPU cost:

1. Enable loadable-extension support when you compile SQLite:
edit "Makefile.in" and comment out the following line:
 # TCC += -DSQLITE_OMIT_LOAD_EXTENSION=1

2. Use a loadable extension like the ~60-line zlib one below (a build-and-load sketch follows step 4).

3. Call the functions from SQL:

sqlite> insert into test values(mycompress('text'),mycompress('text2')); 

sqlite> select myuncompress(one),myuncompress(two) from test;


4. That's it!
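For completeness, a minimal build-and-load sequence (a sketch assuming gcc on Linux with the zlib headers installed, and the source below saved as mycompress.c; file names are illustrative):

$ gcc -fPIC -shared mycompress.c -o mycompress.so -lz
$ sqlite3 test.db
sqlite> .load ./mycompress
sqlite> create table test(one blob, two blob);

Applications that don't go through the shell can instead call sqlite3_enable_load_extension(db, 1) at runtime and load the library with sqlite3_load_extension().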


Plain zlib works, but a preset dictionary nearly doubles the compression on short strings.
Also, VACUUM should be run more often, at least every two weeks.
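To make the dictionary point concrete: the one-shot compress() call used below cannot take a preset dictionary, so you would switch to zlib's streaming API and call deflateSetDictionary() right after deflateInit(); the decompressor answers inflate()'s Z_NEED_DICT return with inflateSetDictionary() on the same bytes. A minimal sketch; the dictionary contents here are hypothetical and would be tuned against real profile data:

#include <string.h>
#include <zlib.h>

/* Hypothetical preset dictionary: fragments common in the stored strings.
** zlib prefers matches near the end of the dictionary, so the most
** frequent fragments should go last. */
static const char kDict[] = ".html?id=index/.org/.com/www.https://http://";

static int compressWithDict(unsigned char *out, uLongf *pnOut,
                            const unsigned char *in, uLong nIn){
  z_stream s;
  int rc;
  memset(&s, 0, sizeof(s));
  if( deflateInit(&s, Z_DEFAULT_COMPRESSION)!=Z_OK ) return Z_STREAM_ERROR;
  deflateSetDictionary(&s, (const Bytef*)kDict, sizeof(kDict)-1);
  s.next_in  = (Bytef*)in;  s.avail_in  = (uInt)nIn;
  s.next_out = out;         s.avail_out = (uInt)*pnOut;
  rc = deflate(&s, Z_FINISH);   /* one shot: the whole value fits in memory */
  *pnOut = s.total_out;
  deflateEnd(&s);
  return rc==Z_STREAM_END ? Z_OK : Z_BUF_ERROR;
}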

Reproducible: Always




-- http://lserinol.googlepages.com/mycompress.c
   see: http://sites.google.com/site/lserinol/sqlitecompress
-------------------------------------------------

#include <stdlib.h>
#include <sqlite3ext.h>
#include <zlib.h>
#include <assert.h>
SQLITE_EXTENSION_INIT1

static void compressFunc( sqlite3_context *context, int argc, sqlite3_value **argv){
  int nIn, nOut;
  uLongf nOut2;
  const unsigned char *inBuf;
  unsigned char *outBuf;
  assert( argc==1 );
  nIn = sqlite3_value_bytes(argv[0]);
  inBuf = sqlite3_value_blob(argv[0]);
  /* Worst-case deflate output size (the bound zlib documents for
  ** compress()), plus 4 bytes for a length header. */
  nOut = 13 + nIn + (nIn+999)/1000;
  outBuf = malloc( nOut+4 );
  if( outBuf==0 ){
    sqlite3_result_error_nomem(context);
    return;
  }
  /* Prepend the uncompressed size, big-endian, so myuncompress()
  ** knows how large a buffer to allocate. */
  outBuf[0] = nIn>>24 & 0xff;
  outBuf[1] = nIn>>16 & 0xff;
  outBuf[2] = nIn>>8 & 0xff;
  outBuf[3] = nIn & 0xff;
  nOut2 = (uLongf)nOut;
  if( compress(&outBuf[4], &nOut2, inBuf, nIn)!=Z_OK ){
    free(outBuf);
    sqlite3_result_error(context, "mycompress: zlib compress() failed", -1);
    return;
  }
  sqlite3_result_blob(context, outBuf, nOut2+4, free);
}

static void uncompressFunc( sqlite3_context *context, int argc, sqlite3_value **argv){
  unsigned int nIn, nOut;
  int rc;
  const unsigned char *inBuf;
  unsigned char *outBuf;
  uLongf nOut2;

  assert( argc==1 );
  nIn = sqlite3_value_bytes(argv[0]);
  if( nIn<=4 ){
    /* Too short to contain the 4-byte length header; result stays NULL. */
    return;
  }
  inBuf = sqlite3_value_blob(argv[0]);
  /* Read back the big-endian uncompressed size written by mycompress(). */
  nOut = (inBuf[0]<<24) + (inBuf[1]<<16) + (inBuf[2]<<8) + inBuf[3];
  outBuf = malloc( nOut );
  if( outBuf==0 ){
    sqlite3_result_error_nomem(context);
    return;
  }
  nOut2 = (uLongf)nOut;
  rc = uncompress(outBuf, &nOut2, &inBuf[4], nIn-4);
  if( rc!=Z_OK ){
    free(outBuf);
    sqlite3_result_error(context, "myuncompress: corrupt or non-compressed blob", -1);
  }else{
    sqlite3_result_blob(context, outBuf, nOut2, free);
  }
}



/* Extension entry point: registers the two one-argument SQL functions. */
int sqlite3_extension_init(
  sqlite3 *db,
  char **pzErrMsg,
  const sqlite3_api_routines *pApi
){
  (void)pzErrMsg;  /* unused */
  SQLITE_EXTENSION_INIT2(pApi)
  sqlite3_create_function(db, "mycompress", 1, SQLITE_UTF8, 0, compressFunc, 0, 0);
  sqlite3_create_function(db, "myuncompress", 1, SQLITE_UTF8, 0, uncompressFunc, 0, 0);
  return SQLITE_OK;
}

---------------------------
See Also: → SQLite3.7.0.1
See Also: → 334010
We can't use this code since it doesn't come with any license.  Not sure we'd even want to do this either...
Component: Location Bar → Storage
Product: Firefox → Toolkit
QA Contact: location.bar → storage
Version: unspecified → Trunk
(In reply to comment #1)
> We can't use this code since it's not licensed with anything.  Not sure we'd
> even want to do this either...

I think we'd generally like to discourage people from storing large blobs of data in the db... If there are cases where that's unavoidable, then perhaps we should provide compression.
If Firefox starts using FTS3 (the full-text search engine, which requires the indexed text to be stored alongside its index), it will have those blobs...

It would be nice if SQLite supported compression directly so that other SQLite tools could have more of a chance of interpreting our databases.  Which is to say, maybe we should kick such a feature request up-stream.  (Or maybe the feature request should be to make FTS3 stop requiring the text body be available...)
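For concreteness, here is that storage requirement in a plain sqlite3 session: an FTS3 table keeps the full text and returns it on query (table and column names are illustrative):

sqlite> CREATE VIRTUAL TABLE pages USING fts3(body);
sqlite> INSERT INTO pages(body) VALUES('full page text, stored verbatim');
sqlite> SELECT body FROM pages WHERE body MATCH 'verbatim';
full page text, stored verbatim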
The original author of that code is D. Richard Hipp (see:
http://www.mail-archive.com/sqlite-users@sqlite.org/msg17018.html )

who, as you may know, *is* the upstream. He's repeating a pattern used many times before under his "The author disclaims copyright to this source code" license with the addition of 3 lines for zlib, but if necessary you or I could email him for approval. 

I think shrinking profile db size by 50-80% (80MB->35MB for me after compression+VACUUM) at negligible cycle cost is an excellent tradeoff. The problem is not just future blobs, but today's long strings.
(In reply to comment #4)

> 
> I think shrinking profile db size by 50-80% (80MB->35MB for me after
> compression+VACUUM) at negligible cycle cost is an excellent tradeoff. The
> problem is not just future blobs, but today's long strings.

How are you shrinking the profile db? This patch looks like it's opt-in only, so unless add-ons or Firefox itself explicitly call the compress routines, nothing changes.
Right, you'll need to add mycompress() calls (just for URLs or other long strings). For testing purposes I compared the *.sqlite sizes against the profile dir copied to a FuseCompress mount set to zlib. Doing it in SQLite directly should save cycles while giving up a few percent in size.
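One way to keep those calls out of application code would be an updatable view over a compressed base table; a hypothetical sketch reusing the mycompress()/myuncompress() functions from comment 0:

CREATE TABLE test_z(one BLOB, two BLOB);              -- compressed storage
CREATE VIEW test AS
  SELECT myuncompress(one) AS one, myuncompress(two) AS two FROM test_z;
CREATE TRIGGER test_insert INSTEAD OF INSERT ON test
BEGIN
  INSERT INTO test_z VALUES(mycompress(NEW.one), mycompress(NEW.two));
END;

Readers and writers then use the view as if it were the plain table, at the cost of a function call per column access.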
OS: Windows 7 → All
Hardware: x86 → All
(For anyone unfamiliar with compression, this mycompress() scheme obviously works best with a shared dictionary.)
The proposed implementation won't help nearly as much as that. Compressing whole pages (a la FuseCompress) will give you good results, but treating 50 byte strings individually certainly won't (what dictionary are you going to use? some dictionary of "common URLs"?). It'd be effective for large BLOBs, but for the stuff that gets stored (URLs mostly, and a lot of the data isn't even text to begin with) it's just asking for trouble. Also this would make search a few orders of magnitude slower. You can't even use indexes on this "compressed" data.

Something that could be attempted is whole-page compression, but doing it correctly requires a *TON* of work in sqlite. There's an extension that does this, but it's read-only (http://www.hwaci.com/sw/sqlite/cerod.html). Making it support writing with reasonable performance would be exceptionally involved.
If you believe that, try your own URL lists with the Python sample here: http://stackoverflow.com/questions/479218/how-to-compress-small-strings/1825570#1825570

URLs have a lot of structure. If they're actually making up most of the data, it would be more rational to use a coding scheme with a few INTEGER columns, using only 3-7 bits per URL and preserving all the same search properties (sketched below).
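A hedged sketch of what such a scheme could look like (schema and column names are hypothetical, not anything Firefox ships):

-- Factor the highly repetitive parts out into small integer keys.
CREATE TABLE schemes(id INTEGER PRIMARY KEY, scheme TEXT UNIQUE); -- http, https...
CREATE TABLE hosts  (id INTEGER PRIMARY KEY, host   TEXT UNIQUE);
CREATE TABLE urls(
  id        INTEGER PRIMARY KEY,
  scheme_id INTEGER REFERENCES schemes(id),
  host_id   INTEGER REFERENCES hosts(id),
  path      TEXT        -- the only per-row string left; still plainly indexable
);
-- 'http://www.mozilla.org/firefox' becomes (scheme_id of 'http',
-- host_id of 'www.mozilla.org', '/firefox'); equality and prefix searches
-- become integer lookups plus an ordinary index on path.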

Chromium apparently does/will do something similar, but chucks sqlite for improved performance: http://code.google.com/p/chromium/issues/detail?id=25336
This would also make our databases not work with older versions of Firefox, which is highly undesirable.

The gains here are small (smaller filesize) for a high cost (CPU time, code complexity, backwards compatibility).  As a result, I do not think we should take a change like this.
Status: UNCONFIRMED → RESOLVED
Closed: 14 years ago
Resolution: --- → WONTFIX