Closed
Bug 889583
Opened 11 years ago
Closed 9 years ago
notes on sqlite footprint inefficiency
Categories
(Toolkit :: Places, defect)
RESOLVED
DUPLICATE
of bug 977149
People
(Reporter: taras.mozilla, Unassigned)
My database is 30 MB and has ~42K rows in moz_places. Of that, 8 MB is an index on URLs (bug 889561). Removing all indexes gets the db down to 18 MB; exporting all tables to CSV yields 13 MB. Compressing that with lz4 gives 4.8 MB, and with bzip2, 3.5 MB. It feels like we could keep the database in memory and write it out in a more efficient format to get a 7x reduction in footprint plus an increase in query times.
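The measurement described above can be reproduced with a short sketch. This is a hypothetical helper, not part of any tooling mentioned in the bug; it dumps all tables to an in-memory CSV blob and compresses it with bzip2 (Python's stdlib bz2, since lz4 is not in the standard library):

```python
import bz2
import csv
import io
import sqlite3

def dump_tables_csv(db_path):
    """Export every table in a SQLite db to one in-memory CSV blob."""
    conn = sqlite3.connect(db_path)
    buf = io.StringIO()
    writer = csv.writer(buf)
    tables = [r[0] for r in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table'")]
    for table in tables:
        for row in conn.execute("SELECT * FROM %s" % table):
            writer.writerow(row)
    conn.close()
    return buf.getvalue().encode("utf-8")

def footprint_report(db_path):
    """Compare raw CSV size vs bzip2-compressed size, as in the numbers above."""
    raw = dump_tables_csv(db_path)
    packed = bz2.compress(raw)
    return len(raw), len(packed)
```

Running `footprint_report("places.sqlite")` on a real profile would give numbers comparable to the 13 MB → 3.5 MB figures quoted, since URL data is highly repetitive.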
Reporter | ||
Comment 1•11 years ago
s/increase/reduction/
Reporter | ||
Updated•11 years ago
Summary: sqlite footprint inefficiency → notes on sqlite footprint inefficiency
Comment 2•11 years ago
I still like LevelDB or similar stores where storage is lexicographically ordered both conceptually and on-disk, which do log-structured merges to avoid gratuitous I/O, and which support block compression that can take advantage of locality.
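As a rough illustration of why lexicographic ordering matters (this is a toy model, not LevelDB's actual API), an ordered store makes any prefix query a contiguous range scan, which is also what makes block compression effective on neighboring keys:

```python
import bisect

class OrderedKV:
    """Toy model of a lexicographically ordered store (LevelDB-style):
    keys are kept sorted, so a prefix scan is a contiguous range."""

    def __init__(self):
        self._keys = []
        self._vals = {}

    def put(self, key, value):
        if key not in self._vals:
            bisect.insort(self._keys, key)
        self._vals[key] = value

    def prefix_scan(self, prefix):
        # All keys sharing a prefix are adjacent in sorted order, so the
        # scan starts at the first match and stops at the first miss.
        i = bisect.bisect_left(self._keys, prefix)
        while i < len(self._keys) and self._keys[i].startswith(prefix):
            yield self._keys[i], self._vals[self._keys[i]]
            i += 1
```

A real LevelDB keeps this ordering on disk across its sorted-table files, so the same range locality holds for I/O, not just for in-memory lookups.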
Reporter | ||
Comment 3•11 years ago
(In reply to Andrew Sutherland (:asuth) from comment #2)
> I still like LevelDB or similar stores where storage is lexicographically
> ordered both conceptually and on-disk, does log-structured merge to avoid
> gratuitous I/O, and supports block-compression that can take advantage of
> locality.

LevelDB seems like it would help here, but then you look at the mailing list and see people asking how to avoid corruption.
Comment 4•11 years ago
On a lark I decided to write a node tool to convert a places SQLite db to a LevelDB representation that I made up. I went a little overboard and tried to create a sane-looking denormalized schema, including some awesomebar support involving pre-generating prefixes based on frecency. Values are JSON-encoded; an implementation that used structured clone or something like Avro to encode things according to a fixed schema, without so much descriptive meta-data, could of course do much better.

My personal places.sqlite went from 20480 KiB to somewhere between 7844 KiB and 10120 KiB. I got 7840 KiB, then made a very minor change that bloated it up. My guess is that what gets compacted, and when, is somewhat non-deterministic (I use a bunch of batches rather than one giant batch). I added a script to compact the database, but then it decides that some data got lost and creates a 'lost' subdirectory. When you delete that, we're back down to 7844 KiB. I'm wondering a bit whether my shutdown is not as clean as it should be and horrible things are happening.

In any event, my experiment repo is here: https://github.com/asutherland/places-leveldb-experiment
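For flavor, here is a minimal sketch of the two ideas described above. The key scheme, field names, and constants are made up for illustration and are not the actual schema from the linked repo: frecency-ordered prefix keys (so higher-frecency places sort first under an ascending iterator) plus JSON-encoded values:

```python
import json

def frecency_key(frecency, url):
    """Hypothetical key scheme: invert frecency into a zero-padded
    descending component so that an ascending lexicographic scan
    (how LevelDB iterates) yields higher-frecency places first."""
    inverted = 10**9 - frecency
    host = url.split("/")[2]  # crude host extraction for the sketch
    return "prefix/%s/%09d/%s" % (host, inverted, url)

def encode_place(place):
    """JSON-encode a denormalized place record. A fixed-schema encoding
    (structured clone, Avro) would avoid repeating field names in every
    value, which is where much of the JSON overhead comes from."""
    return json.dumps(place, separators=(",", ":")).encode("utf-8")
```

The zero-padding matters: without it, a frecency of 900 would sort after 10000 lexicographically, breaking the intended ordering.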
Comment 5•11 years ago
(In reply to Taras Glek (:taras) from comment #3)
> leveldb seems like it would help here, but then you look at the mailing list
> and see people asking how to avoid corruption

In my limited search, it seems like the concerns are about the atomic file operations not being truly atomic because of a failure to fsync the directory, but this is now being addressed: https://groups.google.com/d/msg/leveldb/MvufD2U6gzI/5KYilkXvd4kJ

Since the whole append-only strategy is otherwise sound, I am quite willing to believe this may be the key step to fixing it. I also believe that sometimes the OS and hardware conspire against you, and you may have to accept limited database corruption in the interest of speed. I know in Thunderbird we ran SQLite with normal sync (rather than full sync) and still experienced db corruption among people who would use their Mac until the battery ran down. (I think we then went to full sync, and we still might have seen corruption.)
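The normal-vs-full sync trade-off mentioned here corresponds to SQLite's `PRAGMA synchronous`. A minimal sketch of toggling it (the helper names are hypothetical, not from Thunderbird's code):

```python
import sqlite3

def open_db(path, full_sync=True):
    """Open a SQLite db choosing the durability/speed trade-off discussed
    above: FULL fsyncs at every critical moment, NORMAL syncs less often
    and risks corruption on power loss with some OS/hardware combinations."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA synchronous = %s" % ("FULL" if full_sync else "NORMAL"))
    return conn

def current_sync_mode(conn):
    # PRAGMA synchronous reports 0 = OFF, 1 = NORMAL, 2 = FULL, 3 = EXTRA.
    return conn.execute("PRAGMA synchronous").fetchone()[0]
```

Even FULL only guarantees as much as the OS and hardware honor fsync, which is the "conspire against you" caveat above.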
Updated•9 years ago
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → DUPLICATE