Closed Bug 679852 (leveldb) Opened 13 years ago Closed 12 years ago

integrate leveldb

Categories

(Core :: General, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX

People

(Reporter: jaas, Unassigned)

References

(Depends on 2 open bugs, )

Details

Attachments

(5 files, 16 obsolete files)

747.14 KB, patch
Details | Diff | Splinter Review
21.45 KB, patch
Details | Diff | Splinter Review
35.20 KB, patch
Details | Diff | Splinter Review
7.88 KB, patch
Details | Diff | Splinter Review
10.85 KB, patch
Details | Diff | Splinter Review
leveldb keeps coming up as a possibly better choice for upcoming projects than sqlite. We should investigate integrating it and do so if it makes sense so that we can start making use of it.
Blocks: 679267
Blocks: 580104
Blocks: 580099
Blocks: 679883
Assignee: nobody → mozilla
As far as I can tell leveldb doesn't even build on Windows.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #1) > As far as I can tell leveldb doesn't even build on Windows. I think we'd do something like chromium does, see https://src.chromium.org/viewvc/chrome/trunk/src/third_party/leveldatabase/
I'm starting work on this. I agree with Ben Turner about doing what chromium does: Implement an environment, port, and integrate it into the build system.
Status: NEW → ASSIGNED
Blocks: 663979
Do we have any story on how to evict entries when we use levelDB? I.e for the HTTP cache we want the ability to do least-recently-used (LRU) or ideally a plug-in-able algorithm. Does leveldb give us any efficient way to get LRU data? This is an issue for DNS prefetching too, and probably many other potential use cases. Given the docs here http://leveldb.googlecode.com/svn/trunk/doc/index.html the only algorithm that comes to mind is to 1) store a "last use" timestamp in the data, and 2) have a background thread do occasional scans of all key/value pairs and evict those that are older than a certain date (or which have expired). That's a pretty blunt hammer: I'm not sure how we could do things like "evict 10 MB", or prioritize evicting larger objects.
Blocks: 687566
No update for quite a while. Josh or Matt, are you actually pursuing this? If so, what is the ETA? We've got a number of potential uses for a smaller hammer/footgun than SQLite both in desktop and mobile Firefox. We also might want to have Taras and the perf team do an analysis of how LevelDB could work in the context of one of our potential use-cases. It would be useful to have a comparative analysis of memory usage, IO patterns, binary size (both lib and data) between this and SQLite.
(In reply to Jason Duell (:jduell) from comment #4) > I'm not sure how we could do things like > "evict 10 MB", or prioritize evicting larger objects. GetApproximateSizes looks like it can give bytes-used for specific key ranges, which sounds like it might do what you want.
(In reply to Dietrich Ayala (:dietrich) from comment #5) > No update for quite a while. Josh or Matt, are you actually pursuing this? Yes, I am actively working on this. Initial patches to be posted soon. Sorry for any slowness, but I'm new to both of these codebases.
That's great news Matthew! When possible, please do put status updates and work-in-progress patches here, so that interested parties can help and follow along :)
Attached patch Code import, V1 (obsolete) — Splinter Review
First in a series of in-progress patches: Import LevelDB code into source tree.
Attached patch Makefiles, V0.5 (obsolete) — Splinter Review
In progress: Makefiles to build leveldb as part of Firefox build.
Attached patch Lock primitives V0.5 (obsolete) — Splinter Review
Some lock primitives that LevelDB requires.
Attachment #568187 - Attachment is patch: true
Progress checkpoint. One patch for working additions (apply code import first). Individual patches will be posted later today.
Attachment #571100 - Attachment is patch: true
Comment on attachment 571100 [details] [diff] [review] Checkpoint: Basic API on Linux|Mac with test Review of attachment 571100 [details] [diff] [review]: ----------------------------------------------------------------- ::: db/leveldb/public/IMozLevelDB.idl @@ +14,5 @@ > + ldbStatus Get(in ldbReadOptions options, in string key, out ACString value); > + ldbStatus Put(in ldbWriteOptions options, in string key, in string value); > + ldbStatus Write(in ldbWriteOptions options, in ldbWriteBatch updates); > + ldbStatus Delete(in ldbWriteOptions options, in string key); > +}; In general, please add doxygen-like comments to all method and also their arguments. Even when these interfaces are so young. How can I change the default options? As I have seen in the test bellow in this patch, passing null as options means to use the default options, right? How is the API for range-requests/gets gonna look? That is what I'm most interested in. And thanks for these patches!
Is all the I/O happening in a non-main-thread? And does the API and implementation make sure that the main-thread can't be blocked?
(In reply to Olli Pettay [:smaug] from comment #14) > Is all the I/O happening in a non-main-thread? And does the API and > implementation make sure that the > main-thread can't be blocked? I/O happens on background threads, however, the API can block on locks while waiting for the background thread to catch up if buffers fill. The API is intended to be used synchronously. If there's a strong desire for an async API, I can make something happen. I envisioned the async behaviour to take place at the level of services calling levelDB. A documented, feature-complete API will be posted within 24 hours for review, btw.
We must stop using blocking calls in the main thread, and because of that, I think it would be better to not even provide synchronous/blocking API in the main thread. Or is there some reason why the API needs to by synchronous?
I think when LevelDB already does things in the background we could try using the sync APIs first. Advantage is easy of use, mainly when using e10s (e.g. localStorage.getItem). Question is how fast reading will be. If our async API could be build on the natural "threadiness" of LevelDB then it would be much better then inventing another layer of ours over it - i.e. no new threads or whatever. I don't remember whether LevelDB has an API we could use for that.
Could we add some assertions or aborts if the sync API is used in the main thread?
For E10s I would assume chrome would run leveldb in a background thread and content process would communicate with that. We must not block chrome process' main thread. Blocking content process' main thread is kind ok, and, well, required by localStorage.
IMO I think we should just integrate leveldb now with it's normal sync API. We need this for indexedDB, for instance. Then, we should just strictly enforce (via hard abort) that the sync API is never used on the main thread in the chrome process. We can build a generic async API later. This bug should be about just getting it building in our tree, right?
(In reply to ben turner [:bent] from comment #20) > IMO I think we should just integrate leveldb now with it's normal sync API. > We need this for indexedDB, for instance. Then, we should just strictly > enforce (via hard abort) that the sync API is never used on the main thread > in the chrome process. We can build a generic async API later. This bug > should be about just getting it building in our tree, right? That sounds ok if "Then, we should just strictly enforce (via hard abort) that the sync API is never used on the main thread in the chrome process" is changed to "When landing the patch, we should just strictly enforce (via hard abort) that the sync API is never used on the main thread in the chrome process"
(In reply to Olli Pettay [:smaug] from comment #21) > That sounds ok if "Then, we should just strictly > enforce (via hard abort) that the sync API is never used on the main thread > in the chrome process" is changed to "When landing the patch, we should just > strictly > enforce (via hard abort) that the sync API is never used on the main thread > in the chrome process" This.
(In reply to Olli Pettay [:smaug] from comment #21) > (In reply to ben turner [:bent] from comment #20) > > IMO I think we should just integrate leveldb now with it's normal sync API. > > We need this for indexedDB, for instance. Then, we should just strictly > > enforce (via hard abort) that the sync API is never used on the main thread > > in the chrome process. We can build a generic async API later. This bug > > should be about just getting it building in our tree, right? > > That sounds ok if "Then, we should just strictly > enforce (via hard abort) that the sync API is never used on the main thread > in the chrome process" is changed to "When landing the patch, we should just > strictly > enforce (via hard abort) that the sync API is never used on the main thread > in the chrome process" This sounds reasonable. I'll do this if nobody has any strong objections.
I'm attaching the .idl files that describe LevelDB's public xpcom interface. Please provide commentary on the API. In particular, I'm looking for feedback from people who have uses in mind, and how well it fits with their use cases.
Attached file V1, DB interface. Start here. (obsolete) —
Attached file V1, Options objects IDL file. (obsolete) —
Attached file V1, Iterator over database. (obsolete) —
This is the second most important file. The interface for iterating over DB keys/values.
Depends on: 701522
Attached patch Code Import, V1.1 (obsolete) — Splinter Review
Attachment #568183 - Attachment is obsolete: true
Attached patch Makefiles, V1.0 (obsolete) — Splinter Review
Attachment #568187 - Attachment is obsolete: true
Attached patch Part of the port. V0.7 (obsolete) — Splinter Review
Porting is incomplete. This is the "port" portion (locks, primitives), "Env" portion (nsIthread, file I/O) is still in progress and will be posted within a week. Once we have both, then this should run anywhere.
Attachment #568213 - Attachment is obsolete: true
Attached patch XPCOM interface, V1 (obsolete) — Splinter Review
PCOM interface for LevelDB. Documented and working. Please comment. (N.B. Writing batches is failing currently; will determine why ASAP).
Attachment #571100 - Attachment is obsolete: true
Attachment #573978 - Attachment is obsolete: true
Attachment #573979 - Attachment is obsolete: true
Attachment #573980 - Attachment is obsolete: true
Attachment #573981 - Attachment is obsolete: true
Attachment #573982 - Attachment is obsolete: true
Attachment #574842 - Attachment is patch: true
Attached patch xpcshell tests, V0.7 (obsolete) — Splinter Review
Test of the majority of the interesting APIs. Consider these as "examples", but they may be a little tiny bit crufty.
Attachment #574844 - Attachment is patch: true
Attachment #574842 - Attachment is obsolete: true
Attached patch xpcshell tests, V1.0.1 (obsolete) — Splinter Review
fix batches and test them
Some general comments on the API: 1) Would prefixing the interface names with "ns" be a good idea? 2) The SeekToFirst/SeekToLast comments call them SeekStart/SeekEnd. 3) Does getNext return false when you reach the end? If so, it should document that clearly. 4) We typically use camelCase, not underscores (so create_if_missing may be better as createIfMissing; similar for error_if_exists). 5) Are the various unWrap methods actually planned to be used? If not, should we still have them? 6) Should these interfaces be marked as only being implementable via C++? Seems like it, given all the noscript stuff on them... 7) The [notxpcom] parts should probably also be [nostdcall]. 8) Should ILdbStatus be using readonly attributes? 9) Method names should generally start with a lowercase letter. This is all over the interfaces. 10) Nit on the C++: add a space between "if" and '('.
Attachment #574844 - Attachment is obsolete: true
(In reply to Boris Zbarsky (:bz) from comment #37) > 1) Would prefixing the interface names with "ns" be a good idea? or use moz, so mozILevelDB instead of IMozLevelDB, though someone still prefers sticking to the old ns prefix.(In reply to ben turner [:bent] from comment #20) > IMO I think we should just integrate leveldb now with it's normal sync API. > We can build a generic async API later. This bug > should be about just getting it building in our tree, right? While I agree we may need both kind of APIs, this is exactly what happened with Storage, and now we have issues to solve. We should try to learn from errors. An abort is a decent solution but does not block addons from abusing the API on mainthread. Since the issue is mostly thread-locking and contention, would it be possible, when working on mainthread, to enforce a sort of exclusive lock, so that it's impossible for any other thread or process to even try access the same "database"? This way we'd at least have protection against contention on mainthread. Unless the idea was to use a RUNTIME_ABORT.
We should totally use release aborts to prevent addons from doing synchronous main thread i/o. We have a quick way to abort if off the main thread in a release build already that's used for the cycle collector. Just flip the polarity and we're good to go.
If we completely disallow mainthread, can we still use levelDB for localStorage API (that is a synchronous API)?
(In reply to Marco Bonardo [:mak] from comment #38) > An abort is a decent solution but does not block addons from abusing the API > on mainthread. I mean an abort even for release builds. No funny business from extensions.
(In reply to Marco Bonardo [:mak] from comment #40) > If we completely disallow mainthread, can we still use levelDB for > localStorage API (that is a synchronous API)? Do we care? I thought we were done trying to improve that API and were instead funneling people towards indexedDB.
We care because it's one of the current mainthread offenders (bug 627635), and the plan to reduce that pain, was to move to levelDB (bug 687566). If we won't be able to do that, a different plan should be made. Actually, if the web standards are moving far away from localStorage, and it's going to be deprecated soon for indexedDB, we may even just survive with some bandaid SQLite improvement (bug 704933 is a start).
Please, let's not repeat past mistakes. No main thread I/O. (In reply to Marco Bonardo [:mak] from comment #43) > We care because it's one of the current mainthread offenders (bug 627635), > and the plan to reduce that pain, was to move to levelDB (bug 687566). If we > won't be able to do that, a different plan should be made. localStorage will just have to get a different engineered solution. We can't give developers APIs that allow synchronous disk I/O on the GUI thread because they will use those instead (as we've seen).
(In reply to Shawn Wilsher :sdwilsh from comment #44) > (In reply to Marco Bonardo [:mak] from comment #43) > > We care because it's one of the current mainthread offenders (bug 627635), > > and the plan to reduce that pain, was to move to levelDB (bug 687566). If we > > won't be able to do that, a different plan should be made. > localStorage will just have to get a different engineered solution. We > can't give developers APIs that allow synchronous disk I/O on the GUI thread > because they will use those instead (as we've seen). We give browser developers(us) private C++ apis. We can't block a useful backend api on "people might use it wrong" grounds.
(In reply to Taras Glek (:taras) from comment #45) > We give browser developers(us) private C++ apis. We can't block a useful > backend api on "people might use it wrong" grounds. We are already the ones who use it wrong. Footgun APIs are going to get misused, that's why we shouldn't make them (and that's why you've got a super reviewer saying the same thing in comment 21).
Depends on: 44959
Assignee: mozilla → nobody
Attachment #574839 - Attachment is obsolete: true
Attached patch port/env v2.0Splinter Review
Attachment #574841 - Attachment is obsolete: true
Attachment #575024 - Attachment is obsolete: true
Attached patch Makefiles, V2.0Splinter Review
Attachment #574840 - Attachment is obsolete: true
Attached patch tests, v2.0Splinter Review
Attachment #575025 - Attachment is obsolete: true
Attachment #588298 - Attachment is patch: true
Something make be worth looking into? ( May be too late in the cycle ) https://github.com/shuttler/nessDB
So I guess we are sticking with the synchronous API, eh?
"Asynchronous API later" is like a "California maybe". Synchronous APIs snowball in engineering cost. My fro will twitch just as soon as you're done fsyncing. Without reading this whole thread, I'll just add that Sync runs on the chrome thread, accessing storage APIs from js, and will block the hell out of your ui if there is no async option available. I am happy to provide a Sync perspective on blocking calls.
(In reply to Shawn Wilsher :sdwilsh from comment #53) > So I guess we are sticking with the synchronous API, eh? I don't think this decision has been finalized. Development is paused while we are looking for a new owner since Matthew's internship is done.
Due to success in optimizing sqlite usage we've decided to stop work on leveldb integration. Marking WONTFIX, we can re-open if we change course in the future. If we do re-open I strongly recommend that we do not provide any synchronous APIs. I was quiet on the subject before but in the mean time I've seen more of what such footguns can do and I'm strongly against them. If we are going to provide a synchronous API that can do disk i/o it needs to have code enforcing non-main-thread-only usage, failing if called on the main thread.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → WONTFIX
No longer blocks: 580104
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: