Bug 679852 (leveldb)

integrate leveldb

RESOLVED WONTFIX

Status

()

Core
General
RESOLVED WONTFIX
6 years ago
3 years ago

People

(Reporter: Josh Aas, Unassigned)

Tracking

(Depends on: 2 bugs)

Trunk
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(5 attachments, 16 obsolete attachments)

747.14 KB, patch
Details | Diff | Splinter Review
21.45 KB, patch
Details | Diff | Splinter Review
35.20 KB, patch
Details | Diff | Splinter Review
7.88 KB, patch
Details | Diff | Splinter Review
10.85 KB, patch
Details | Diff | Splinter Review
(Reporter)

Description

6 years ago
leveldb keeps coming up as a possibly better choice for upcoming projects than sqlite. We should investigate integrating it and do so if it makes sense so that we can start making use of it.
(Reporter)

Updated

6 years ago
(Reporter)

Updated

6 years ago
Blocks: 679267
Blocks: 580104
Blocks: 580099
Blocks: 679883
(Reporter)

Updated

6 years ago
Assignee: nobody → mozilla
As far as I can tell leveldb doesn't even build on Windows.
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #1)
> As far as I can tell leveldb doesn't even build on Windows.

I think we'd do something like chromium does, see https://src.chromium.org/viewvc/chrome/trunk/src/third_party/leveldatabase/
I'm starting work on this.

I agree with Ben Turner about doing what chromium does: Implement an environment, port, and integrate it into the build system.
Status: NEW → ASSIGNED
Blocks: 663979
Do we have any story on how to evict entries when we use levelDB?  I.e for the HTTP cache we want the ability to do least-recently-used (LRU) or ideally a plug-in-able algorithm.  Does leveldb give us any efficient way to get LRU data?

This is an issue for DNS prefetching too, and probably many other potential use cases.

Given the docs here

   http://leveldb.googlecode.com/svn/trunk/doc/index.html

the only algorithm that comes to mind is to 

 1) store a "last use" timestamp in the data, and 

 2) have a background thread do occasional scans of all key/value pairs and evict those that are older than a certain date (or which have expired).

That's a pretty blunt hammer: I'm not sure how we could do things like "evict 10 MB", or prioritize evicting larger objects.
Blocks: 687566
No update for quite a while. Josh or Matt, are you actually pursuing this?

If so, what is the ETA? We've got a number of potential uses for a smaller hammer/footgun than SQLite both in desktop and mobile Firefox.

We also might want to have Taras and the perf team do an analysis of how LevelDB could work in the context of one of our potential use-cases. It would be useful to have a comparative analysis of memory usage, IO patterns, binary size (both lib and data) between this and SQLite.
(In reply to Jason Duell (:jduell) from comment #4)
> I'm not sure how we could do things like
> "evict 10 MB", or prioritize evicting larger objects.

GetApproximateSizes looks like it can give bytes-used for specific key ranges, which sounds like it might do what you want.
(In reply to Dietrich Ayala (:dietrich) from comment #5)
> No update for quite a while. Josh or Matt, are you actually pursuing this?

Yes, I am actively working on this.  Initial patches to be posted soon.

Sorry for any slowness, but I'm new to both of these codebases.
That's great news Matthew! When possible, please do put status updates and work-in-progress patches here, so that interested parties can help and follow along :)
Created attachment 568183 [details] [diff] [review]
Code import, V1

First in a series of in-progress patches:

Import LevelDB code into source tree.
Created attachment 568187 [details] [diff] [review]
Makefiles, V0.5

In progress: Makefiles to build leveldb as part of Firefox build.
Created attachment 568213 [details] [diff] [review]
Lock primitives V0.5

Some lock primitives that LevelDB requires.
Attachment #568187 - Attachment is patch: true
Created attachment 571100 [details] [diff] [review]
Checkpoint: Basic API on Linux|Mac with test

Progress checkpoint.
One patch for working additions (apply code import first).

Individual patches will be posted later today.
(Reporter)

Updated

6 years ago
Attachment #571100 - Attachment is patch: true
Comment on attachment 571100 [details] [diff] [review]
Checkpoint: Basic API on Linux|Mac with test

Review of attachment 571100 [details] [diff] [review]:
-----------------------------------------------------------------

::: db/leveldb/public/IMozLevelDB.idl
@@ +14,5 @@
> +  ldbStatus Get(in ldbReadOptions options, in string key, out ACString value);
> +  ldbStatus Put(in ldbWriteOptions options, in string key, in string value);
> +  ldbStatus Write(in ldbWriteOptions options, in ldbWriteBatch updates);
> +  ldbStatus Delete(in ldbWriteOptions options, in string key);
> +};

In general, please add doxygen-like comments to all method and also their arguments.  Even when these interfaces are so young.


How can I change the default options?  As I have seen in the test bellow in this patch, passing null as options means to use the default options, right?

How is the API for range-requests/gets gonna look?  That is what I'm most interested in.

And thanks for these patches!
Is all the I/O happening in a non-main-thread? And does the API and implementation make sure that the 
main-thread can't be blocked?
(In reply to Olli Pettay [:smaug] from comment #14)
> Is all the I/O happening in a non-main-thread? And does the API and
> implementation make sure that the 
> main-thread can't be blocked?

I/O happens on background threads, however, the API can block on locks while waiting for the background thread to catch up if buffers fill.

The API is intended to be used synchronously.  If there's a strong desire for an async API, I can make something happen.  I envisioned the async behaviour to take place at the level of services calling levelDB.

A documented, feature-complete API will be posted within 24 hours for review, btw.
We must stop using blocking calls in the main thread, and because of that, I think it would be
better to not even provide synchronous/blocking API in the main thread.

Or is there some reason why the API needs to by synchronous?
I think when LevelDB already does things in the background we could try using the sync APIs first.  Advantage is easy of use, mainly when using e10s (e.g. localStorage.getItem).  

Question is how fast reading will be.  If our async API could be build on the natural "threadiness" of LevelDB then it would be much better then inventing another layer of ours over it - i.e. no new threads or whatever.  I don't remember whether LevelDB has an API we could use for that.
Could we add some assertions or aborts if the sync API is used in the main thread?
For E10s I would assume chrome would run leveldb in a background thread and
content process would communicate with that. We must not block chrome process' main thread.
Blocking content process' main thread is kind ok, and, well, required by localStorage.
IMO I think we should just integrate leveldb now with it's normal sync API. We need this for indexedDB, for instance. Then, we should just strictly enforce (via hard abort) that the sync API is never used on the main thread in the chrome process. We can build a generic async API later. This bug should be about just getting it building in our tree, right?
(In reply to ben turner [:bent] from comment #20)
> IMO I think we should just integrate leveldb now with it's normal sync API.
> We need this for indexedDB, for instance. Then, we should just strictly
> enforce (via hard abort) that the sync API is never used on the main thread
> in the chrome process. We can build a generic async API later. This bug
> should be about just getting it building in our tree, right?

That sounds ok if "Then, we should just strictly
enforce (via hard abort) that the sync API is never used on the main thread
in the chrome process" is changed to "When landing the patch, we should just strictly
enforce (via hard abort) that the sync API is never used on the main thread
in the chrome process"
(In reply to Olli Pettay [:smaug] from comment #21)
> That sounds ok if "Then, we should just strictly
> enforce (via hard abort) that the sync API is never used on the main thread
> in the chrome process" is changed to "When landing the patch, we should just
> strictly
> enforce (via hard abort) that the sync API is never used on the main thread
> in the chrome process"
This.
(In reply to Olli Pettay [:smaug] from comment #21)
> (In reply to ben turner [:bent] from comment #20)
> > IMO I think we should just integrate leveldb now with it's normal sync API.
> > We need this for indexedDB, for instance. Then, we should just strictly
> > enforce (via hard abort) that the sync API is never used on the main thread
> > in the chrome process. We can build a generic async API later. This bug
> > should be about just getting it building in our tree, right?
> 
> That sounds ok if "Then, we should just strictly
> enforce (via hard abort) that the sync API is never used on the main thread
> in the chrome process" is changed to "When landing the patch, we should just
> strictly
> enforce (via hard abort) that the sync API is never used on the main thread
> in the chrome process"

This sounds reasonable. I'll do this if nobody has any strong objections.
I'm attaching the .idl files that describe LevelDB's public xpcom interface.

Please provide commentary on the API.  In particular, I'm looking for feedback from people who have uses in mind, and how well it fits with their use cases.
Created attachment 573978 [details]
V1, DB interface.  Start here.
Created attachment 573979 [details]
V1, Options objects IDL file.
Created attachment 573980 [details]
V1, Status object. Returned by many DB methods.
Created attachment 573981 [details]
V1, Iterator over database.

This is the second most important file. The interface for iterating over DB keys/values.
Created attachment 573982 [details]
V1, Batch object. Group writes together.
Depends on: 701522
Created attachment 574839 [details] [diff] [review]
Code Import, V1.1
Attachment #568183 - Attachment is obsolete: true
Created attachment 574840 [details] [diff] [review]
Makefiles, V1.0
Attachment #568187 - Attachment is obsolete: true
Created attachment 574841 [details] [diff] [review]
Part of the port.  V0.7

Porting is incomplete. This is the "port" portion (locks, primitives), "Env" portion (nsIthread, file I/O) is still in progress and will be posted within a week.  Once we have both, then this should run anywhere.
Attachment #568213 - Attachment is obsolete: true
Created attachment 574842 [details] [diff] [review]
XPCOM interface, V1

XPCOM interface for LevelDB.  Documented and working.  Please comment.

(N.B. Writing batches is failing currently; will determine why ASAP).
Attachment #571100 - Attachment is obsolete: true
Attachment #573978 - Attachment is obsolete: true
Attachment #573979 - Attachment is obsolete: true
Attachment #573980 - Attachment is obsolete: true
Attachment #573981 - Attachment is obsolete: true
Attachment #573982 - Attachment is obsolete: true
Attachment #574842 - Attachment is patch: true
Created attachment 574844 [details] [diff] [review]
xpcshell tests, V0.7

Test of the majority of the interesting APIs.  Consider these as "examples", but they may be a little tiny bit crufty.
Attachment #574844 - Attachment is patch: true
Alias: leveldb
Created attachment 575024 [details] [diff] [review]
XPCOM interface, V1.0.1  (fix batches)
Attachment #574842 - Attachment is obsolete: true
Created attachment 575025 [details] [diff] [review]
xpcshell tests, V1.0.1

fix batches and test them
Some general comments on the API:

1)  Would prefixing the interface names with "ns" be a good idea?
2)  The SeekToFirst/SeekToLast comments call them SeekStart/SeekEnd.
3)  Does getNext return false when you reach the end?  If so, it should document that
    clearly.
4)  We typically use camelCase, not underscores (so create_if_missing may be better as
    createIfMissing; similar for error_if_exists).
5)  Are the various unWrap methods actually planned to be used?  If not, should we still
    have them?
6)  Should these interfaces be marked as only being implementable via C++?  Seems like it,
    given all the noscript stuff on them...
7)  The [notxpcom] parts should probably also be [nostdcall].
8)  Should ILdbStatus be using readonly attributes?
9)  Method names should generally start with a lowercase letter.  This is all over the
    interfaces.
10) Nit on the C++: add a space between "if" and '('.
Attachment #574844 - Attachment is obsolete: true
(In reply to Boris Zbarsky (:bz) from comment #37)
> 1)  Would prefixing the interface names with "ns" be a good idea?

or use moz, so mozILevelDB instead of IMozLevelDB, though someone still prefers sticking to the old ns prefix.(In reply to ben turner [:bent] from comment #20)

> IMO I think we should just integrate leveldb now with it's normal sync API.
> We can build a generic async API later. This bug
> should be about just getting it building in our tree, right?

While I agree we may need both kind of APIs, this is exactly what happened with Storage, and now we have issues to solve. We should try to learn from errors.
An abort is a decent solution but does not block addons from abusing the API on mainthread. Since the issue is mostly thread-locking and contention, would it be possible, when working on mainthread, to enforce a sort of exclusive lock, so that it's impossible for any other thread or process to even try access the same "database"? This way we'd at least have protection against contention on mainthread.
Unless the idea was to use a RUNTIME_ABORT.
We should totally use release aborts to prevent addons from doing synchronous main thread i/o.  We have a quick way to abort if off the main thread in a release build already that's used for the cycle collector.  Just flip the polarity and we're good to go.
If we completely disallow mainthread, can we still use levelDB for localStorage API (that is a synchronous API)?
(In reply to Marco Bonardo [:mak] from comment #38)
> An abort is a decent solution but does not block addons from abusing the API
> on mainthread.

I mean an abort even for release builds. No funny business from extensions.
(In reply to Marco Bonardo [:mak] from comment #40)
> If we completely disallow mainthread, can we still use levelDB for
> localStorage API (that is a synchronous API)?

Do we care? I thought we were done trying to improve that API and were instead funneling people towards indexedDB.
We care because it's one of the current mainthread offenders (bug 627635), and the plan to reduce that pain, was to move to levelDB (bug 687566). If we won't be able to do that, a different plan should be made.
Actually, if the web standards are moving far away from localStorage, and it's going to be deprecated soon for indexedDB, we may even just survive with some bandaid SQLite improvement (bug 704933 is a start).
Please, let's not repeat past mistakes.  No main thread I/O.

(In reply to Marco Bonardo [:mak] from comment #43)
> We care because it's one of the current mainthread offenders (bug 627635),
> and the plan to reduce that pain, was to move to levelDB (bug 687566). If we
> won't be able to do that, a different plan should be made.
localStorage will just have to get a different engineered solution.  We can't give developers APIs that allow synchronous disk I/O on the GUI thread because they will use those instead (as we've seen).

Comment 45

6 years ago
(In reply to Shawn Wilsher :sdwilsh from comment #44)
> (In reply to Marco Bonardo [:mak] from comment #43)
> > We care because it's one of the current mainthread offenders (bug 627635),
> > and the plan to reduce that pain, was to move to levelDB (bug 687566). If we
> > won't be able to do that, a different plan should be made.
> localStorage will just have to get a different engineered solution.  We
> can't give developers APIs that allow synchronous disk I/O on the GUI thread
> because they will use those instead (as we've seen).

We give browser developers(us) private C++ apis. We can't block a useful backend api on "people might use it wrong" grounds.
(In reply to Taras Glek (:taras) from comment #45)
> We give browser developers(us) private C++ apis. We can't block a useful
> backend api on "people might use it wrong" grounds.
We are already the ones who use it wrong.  Footgun APIs are going to get misused, that's why we shouldn't make them (and that's why you've got a super reviewer saying the same thing in comment 21).
Depends on: 44959
(Reporter)

Updated

5 years ago
Assignee: mozilla → nobody
Created attachment 588293 [details] [diff] [review]
Code import, V2.0
Attachment #574839 - Attachment is obsolete: true
Created attachment 588294 [details] [diff] [review]
port/env v2.0
Attachment #574841 - Attachment is obsolete: true
Created attachment 588295 [details] [diff] [review]
XPCOM interface, V2.0
Attachment #575024 - Attachment is obsolete: true
Created attachment 588296 [details] [diff] [review]
Makefiles, V2.0
Attachment #574840 - Attachment is obsolete: true
Created attachment 588298 [details] [diff] [review]
tests, v2.0
Attachment #575025 - Attachment is obsolete: true
Attachment #588298 - Attachment is patch: true

Comment 52

5 years ago
Something make be worth looking into? ( May be too late in the cycle )

https://github.com/shuttler/nessDB
So I guess we are sticking with the synchronous API, eh?
"Asynchronous API later" is like a "California maybe". Synchronous APIs snowball in engineering cost. My fro will twitch just as soon as you're done fsyncing. 

Without reading this whole thread, I'll just add that Sync runs on the chrome thread, accessing storage APIs from js, and will block the hell out of your ui if there is no async option available. 

I am happy to provide a Sync perspective on blocking calls.
(Reporter)

Comment 55

5 years ago
(In reply to Shawn Wilsher :sdwilsh from comment #53)
> So I guess we are sticking with the synchronous API, eh?

I don't think this decision has been finalized. Development is paused while we are looking for a new owner since Matthew's internship is done.
(Reporter)

Comment 56

5 years ago
Due to success in optimizing sqlite usage we've decided to stop work on leveldb integration. Marking WONTFIX, we can re-open if we change course in the future.

If we do re-open I strongly recommend that we do not provide any synchronous APIs. I was quiet on the subject before but in the mean time I've seen more of what such footguns can do and I'm strongly against them. If we are going to provide a synchronous API that can do disk i/o it needs to have code enforcing non-main-thread-only usage, failing if called on the main thread.
(Reporter)

Updated

5 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → WONTFIX
No longer blocks: 580104
You need to log in before you can comment on or make changes to this bug.