Closed Bug 889583 Opened 11 years ago Closed 9 years ago

notes on sqlite footprint inefficiency

Categories: Toolkit :: Places, defect
Platform: x86_64 Windows 8
Priority: Not set
Severity: normal

Tracking: RESOLVED DUPLICATE of bug 977149

Reporter: taras.mozilla
Assignee: Unassigned

My database is 30 MB and has ~42K rows in moz_places.
Of that, 8 MB is an index on URLs (bug 889561).

Removing all indexes gets the db down to 18 MB.

Exporting all tables to CSV -> 13 MB.

Compressing that with lz4 -> 4.8 MB,

with bzip2 -> 3.5 MB.

It feels like we could keep the database in memory and write it out in a more efficient format to get a 7x reduction in footprint plus a reduction in query times.
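For reference, a rough Python sketch of that kind of measurement (not the exact commands behind the numbers above): dump every table in a local copy of places.sqlite to CSV, then compare the raw, lz4, and bzip2 sizes. The path and the dump format are assumptions.

# Dump all tables to one in-memory CSV blob and compare compressed sizes.
import bz2
import csv
import io
import sqlite3

import lz4.frame  # pip install lz4

con = sqlite3.connect("places.sqlite")
tables = [r[0] for r in con.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

buf = io.StringIO()
writer = csv.writer(buf)
for table in tables:
    cur = con.execute(f'SELECT * FROM "{table}"')
    writer.writerow([d[0] for d in cur.description])  # header row
    writer.writerows(cur)                             # all rows of this table

raw = buf.getvalue().encode("utf-8")
print("csv   :", len(raw) // 1024, "KiB")
print("lz4   :", len(lz4.frame.compress(raw)) // 1024, "KiB")
print("bzip2 :", len(bz2.compress(raw)) // 1024, "KiB")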
Summary: sqlite footprint inefficiency → notes on sqlite footprint inefficiency
I still like LevelDB or similar stores where storage is lexicographically ordered both conceptually and on-disk, does log-structured merge to avoid gratuitous I/O, and supports block-compression that can take advantage of locality.
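As a small illustration of the "lexicographically ordered" point, a Python sketch using the plyvel LevelDB binding; the key layout is made up for illustration and is not anything Places actually uses.

# Keys sharing a prefix sit adjacently on disk, so an awesomebar-style
# prefix lookup is a single contiguous range scan. Key scheme is invented.
import plyvel  # pip install plyvel

db = plyvel.DB("/tmp/places-ldb-demo", create_if_missing=True)
db.put(b"url:mozilla.org/", b"{}")
db.put(b"url:mozilla.org/firefox", b"{}")
db.put(b"url:mxr.mozilla.org/", b"{}")

# All keys starting with "url:moz" come back in key order from one scan.
for key, value in db.iterator(prefix=b"url:moz"):
    print(key.decode())

db.close()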
(In reply to Andrew Sutherland (:asuth) from comment #2)
> I still like LevelDB or similar stores where storage is lexicographically
> ordered both conceptually and on-disk, does log-structured merge to avoid
> gratuitous I/O, and supports block-compression that can take advantage of
> locality.

LevelDB seems like it would help here, but then you look at the mailing list and see people asking how to avoid corruption.
On a lark I decided to write a node tool to convert a places SQLite db to a LevelDB representation that I made up.  I went a little overboard and tried to create a sane-looking denormalized schema, including some awesomebar support involving pre-generating prefixes based on frecency.  Values are JSON-encoded; an implementation that used structured clone or something like Avro to encode things against a fixed schema, without so much descriptive metadata, could of course do much better.

My personal places.sqlite went from 20480 KiB to somewhere between 7844 KiB and 10120 KiB.  I got 7840 KiB, then made a very minor change that bloated it back up.  My guess is that what gets compacted, and when, is somewhat non-deterministic (I use a bunch of batches rather than one giant batch).  I added a script to compact the database, but then it decides that some data got lost and creates a 'lost' subdirectory.  When you delete that, we're back down to 7844 KiB.  I'm wondering a bit whether my shutdown is not as clean as it should be and horrible things are happening.

In any event, my experiment repo is here:
https://github.com/asutherland/places-leveldb-experiment
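This is not the code in that repo, just a rough Python/plyvel sketch of the same idea, with an invented key scheme, JSON values, and the standard moz_places columns.

# Denormalize moz_places rows into JSON values keyed by URL, plus extra
# "prefix" keys ordered by frecency for awesomebar-style lookups. The key
# scheme and field selection are illustrative, not the experiment's schema.
import json
import sqlite3

import plyvel  # pip install plyvel

con = sqlite3.connect("places.sqlite")
ldb = plyvel.DB("places-ldb", create_if_missing=True)

rows = con.execute(
    "SELECT id, url, title, frecency, visit_count FROM moz_places")

with ldb.write_batch() as batch:  # one big batch; the experiment used many smaller ones
    for place_id, url, title, frecency, visit_count in rows:
        value = json.dumps({
            "url": url,
            "title": title,
            "frecency": frecency,
            "visitCount": visit_count,
        }).encode("utf-8")
        batch.put(b"place:" + url.encode("utf-8"), value)

        # Secondary key: a short slice of the host/path, with frecency inverted
        # so higher-frecency entries sort first within a given prefix.
        prefix = url.split("://", 1)[-1][:8].encode("utf-8")
        rank = b"%010d" % (10**9 - max(frecency or 0, 0))
        batch.put(b"prefix:" + prefix + b":" + rank + b":" +
                  str(place_id).encode("utf-8"), value)

ldb.close()
con.close()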
(In reply to Taras Glek (:taras) from comment #3)
> LevelDB seems like it would help here, but then you look at the mailing list
> and see people asking how to avoid corruption.

In my limited search, it seems like there are concerns about the atomic file operations not being truly atomic because of a failure to fsync the directory, but this is now being addressed: https://groups.google.com/d/msg/leveldb/MvufD2U6gzI/5KYilkXvd4kJ

Since the whole append-only strategy is otherwise sound, I am quite willing to believe this may be a key step toward fixing it.  I also believe that sometimes the OS and hardware conspire against you, and you may have to accept limited database corruption in the interest of speed.  I know that in Thunderbird we ran SQLite with normal sync (rather than full sync) and still experienced db corruption amongst people who would use their Mac until the battery ran down.  (I think we then went to full sync, and we still might have seen corruption.)
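For context, the generic POSIX pattern that thread is about looks roughly like this (a sketch, not LevelDB's actual implementation); the file name is hypothetical, and the directory fsync step is the part that was reportedly missing.

# Atomically replace a file and make the replacement durable: fsync the file,
# rename it into place, then fsync the containing directory so the new
# directory entry itself survives a crash (Linux/POSIX).
import os

def atomic_replace(path, data):
    tmp = path + ".tmp"
    fd = os.open(tmp, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
    try:
        os.write(fd, data)
        os.fsync(fd)            # flush file contents to stable storage
    finally:
        os.close(fd)
    os.rename(tmp, path)        # atomic swap of the directory entry

    # Without this, a crash can lose the rename itself.
    dirfd = os.open(os.path.dirname(path) or ".", os.O_DIRECTORY)
    try:
        os.fsync(dirfd)
    finally:
        os.close(dirfd)

atomic_replace("CURRENT.example", b"example pointer contents\n")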
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → DUPLICATE