Closed Bug 600307 Opened 14 years ago Closed 11 years ago

Rewrite and cleanup DOMStorage code

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla23

People

(Reporter: mayhemer, Assigned: mayhemer)

References

(Blocks 3 open bugs, )

Details

(Keywords: qawanted, Whiteboard: [MemShrink:P3])

Attachments

(3 files, 42 obsolete files)

19.71 KB, text/plain
Details
16.62 KB, patch
smaug
: review+
Details | Diff | Splinter Review
434.19 KB, patch
mayhemer
: review+
mayhemer
: superreview+
mayhemer
: checkin+
Details | Diff | Splinter Review
The dom storage code and its items manipulation starts to be a mess, I want to rewrite it.  Also I have heard Shawn wanted to base the indexedDB code on the DOM storage database.  We should then probably coordinate on this bug to stick to all demands.


What we do now and what I really don't like:
- we always load items from the database, although we also cache them in mItems hash table ; this hash table is trying to keep in sync with the db as it is used for some operations prior to database
- mItems is used for sessionStorage implementation (to get no db store)
- we have an in-memory implementation of the database API that is used in PB and SO modes ; hard to maintain both persistent and non-persistent codes ; mItems is still used along with an engaged in-memory "database" having also the same implementation
- we store data for offline allowed and common web sites in a single database and have a special code to handle offline and non-offline data


What I would like to have:
- all operations should be made only in the database
- use temporary tables exclusively where we don't want to store to disk
- sum usage and check quota only by db queries
- have just a single API implementation for the database
- probably have a separate database for offline data (needs discussion)
(In reply to comment #0)
> The dom storage code and its items manipulation starts to be a mess, I want to
> rewrite it.  Also I have heard Shawn wanted to base the indexedDB code on the
> DOM storage database.  We should then probably coordinate on this bug to stick
> to all demands.
I think you heard that from bent, and I think he was joking.  Regardless, happy to see this filed, and I'm happy to provide input.
Bug 495337 should probably be fixed prior to this one.

I'm also thinking of dropping support for globalStorage.  It is too late to do this in Firefox 4, bug this bug will neither be there.

(In reply to comment #1)
> I think you heard that from bent, and I think he was joking.  
It sounded to me a little bit crazy :D
Depends on: 495337
Adding dep on bug 622112.  If using only sqlite for managing DOM storage data and operations we may get to a situation it blocks on I/O on the main thread, which is highly undesired.
Depends on: 622112
Becomes actual again.  The description might be a little bit outdated now, but the main goals are:

- remove any duplicate caching
- make the code much simpler for maintenance

Jonas is suggesting to start from scratch and I really am not against.
Assignee: nobody → honzab.moz
Status: NEW → ASSIGNED
Tagging this for memshrink as there is potential for significant memory reduction here.
Whiteboard: [MemShrink]
It is realistic that I could have by the end of the next week (by Nov 18th) a fully working patch that would also include async (out-of-main-thread) transactional writes.

Based on that it seems that work on async writes may become worthless.
(In reply to Taras Glek (:taras) from comment #5)
> Tagging this for memshrink as there is potential for significant memory
> reduction here.

How so?  We've never seen an about:memory where the local storage SQLite DB is very big.
Whiteboard: [MemShrink] → [MemShrink:P3]
(In reply to Nicholas Nethercote [:njn] from comment #7)
> (In reply to Taras Glek (:taras) from comment #5)
> > Tagging this for memshrink as there is potential for significant memory
> > reduction here.
> 
> How so?  We've never seen an about:memory where the local storage SQLite DB
> is very big.

I'm not sure what you define as very big. I'm also not sure how temporary(ie in-memory) tables are displayed in about:memory. Also note that half of the memory used here is in dom cache.
So, it is no longer realistic I have a compete patch till the end of this week.  

However, I have already done most of this major overhaul with a much better concept.  Still needs to add things like security checks and change API where needed, but 75% of the patch is done already.
Attached patch WIP1 (no eTLD+1 quota support) (obsolete) — Splinter Review
Making a progress here!

https://tbpl.mozilla.org/?tree=Try&rev=3102958d63d7

This patch passes try for win and mac and manual testing implies it works as expected.  Setting a new key and updating an existing key is 2 times faster, getting a key is 3 times faster (on i7/ssd).

There is just one exception: quota is not counted per eTLD+1 but for each origin separately.  I currently work on an update to have eTLD+1 quota checking again as specified ; I assume in two days I can have a complete patch for first review round.
https://tbpl.mozilla.org/?tree=Try&rev=c034f3832009

I'll add comments, logs and telemetry and I can get reviews.
Attachment #687198 - Flags: feedback?(vdjeric)
Attachment #686683 - Flags: feedback?(vdjeric)
Hi Honza, I apologize, I've been very busy over the last two days. I will review your patch on Wednesday
(In reply to Vladan Djeric (:vladan) from comment #13)
> Hi Honza, I apologize, I've been very busy over the last two days. I will
> review your patch on Wednesday

No problem.  Don't look at the eTLD+1 patch, it is dead anyway.

I have (working on) a newer patch that removes the etld+1 quota check based on some discussion with Jonas and fixes one major issue with IPC.  However, the concept is going to be the same, except I also plan to introduce coalescing of clean/remove/set for keys and scopes and simplify the task scheduling logic.  There will only be two sets: preloads and updates.  Preloads will be processed as first and immediately.  Updates will be collected, coalesced during collection (such as that when an update to a key is made twice, then only the second will actually be made), and then after some period (5s) will be flushed in a single transaction on the background thread.  This will make the logic simpler and more optimal.  I adopt your ideas here, btw ;)
Comment on attachment 687198 [details] [diff] [review]
support for eTLD+1 quota [obsolete, left for reference]

Dead based on bug 767944 comment 11.
Attachment #687198 - Flags: feedback?(vdjeric)
Vladan, ping on feedback.
(In reply to Honza Bambas (:mayhemer) from comment #16)
> Vladan, ping on feedback.

I am reviewing your patch but it's going to take me a while longer to get through all of it, but most likely by the end of the day. Have you done any tests to determine how often pre-loading is benefecial? i.e. the fraction of preloads where the page does end up reading from localStorage
(In reply to Vladan Djeric (:vladan) from comment #17)
> (In reply to Honza Bambas (:mayhemer) from comment #16)
> > Vladan, ping on feedback.
> 
> I am reviewing your patch but it's going to take me a while longer to get
> through all of it, but most likely by the end of the day. Have you done any
> tests to determine how often pre-loading is benefecial? i.e. the fraction of
> preloads where the page does end up reading from localStorage

Don't do any deep review please, just look around the concept and if you have some ideas I'll be glad to get inspired.

I will today or on Monday submit a newer patch, final, that will have a different way of how we collect write operations.  I like the feature of your patch from bug 807021 that we actually do only writes that are necessary, like not writing all changes made to a single key during one flush window.  So, there will be a coalescing logic and data will write in a single and short living transaction (as your patch is doing) what may save some of the performance.
Attachment #686683 - Flags: feedback?(vdjeric)
Attachment #687198 - Attachment description: WIP1 addition1: support for eTLD+1 quota → support for eTLD+1 quota [obsolete, left for reference]
Here are some of my initial impressions, I will take another look tomorrow:

- This is going to be a big code change so it might be nice to have a pref that switches off some of the optional behaviors initially like pre-fetching
- You should add back the Telemetry probes for timing common LocalStorage operations, they'll be useful for quantifying perf gains in the field
- As I mentioned before, I'm particularly interested in knowing the fraction of beneficial pre-fetches. It looks like we check the disk for site LocalStorage on first site visit during a Firefox session.. Could we maybe fetch the list of sites with localStorage at some point after startup so we don't have to keep hitting disk to do this check?
- It looks like a new DOMStorageCache object is created for every site visited even when the page doesn't have any localStorage data and might never call the localStorage API? This could be wasteful wrt memory.
- Am I correct in concluding that these empty DOMStorageCaches will never get evicted from memory? Wouldn't this cause memory use to grow without bound?
- DOMStorageManager.h could use some documentation 
- Nitpick: the "Data" class should probably have a more specific name
- wasRunningOnes -> sWasRunningOnes... what does this do exactly?
- enqueuing a new Task for every write-type operation seems excessive, but you've already said you're rewriting it to batch updates in some form
- I think having a single batch flush outstanding at any one time would make the code simpler, e.g. helping to get rid of things like ForgetUpdatesForMatchingScope
- Are the HigherPriority/DefaultPriority methods just going to change the helper thread's scheduling priority? Has this priority toggling shown itself to be beneficial in initial testing? 
- Nitpick: As a personal perference, it would be nice if method names started with verbs, e.g. "SetHigherPriority"/"SetDefaultPriority"
- How does the helper thread indicate that a DB operation failed with an error to the synchronous LocalStorage calls blocking on it? e.g. shouldn't DOMStorageCache::GetKey return the error condition encountered by the Preload task?
- How do you handle outstanding flush items on shutdown? If I understand correctly, you enqueue a flush operation and set mStop = true, and the helper thread stops when it sees mStop == true? If there were multiple write tasks queued on the helper thread, wouldn't that cause us to stop processing queued tasks after the first one?
- I think waiting to flush outstanding data until "xpcom-shutdown" is probably too late, especially with the exit(0) work trying to switch us to a model where components save profile data at "profile-before-change" and then exit(0) is called
- Why did you change the stored value to "" in test_brokenUTF-16.html instead of the diacritic data? I assume that's probably a debug change, I think your code can handle diacritics as data remains cached for the duration of a browser session
First, thanks for looking at this.

I have a newer patch concerning some of your comments already, just needs to fix some last nits.

(In reply to Vladan Djeric (:vladan) from comment #19)
> Here are some of my initial impressions, I will take another look tomorrow:
> 
> - This is going to be a big code change so it might be nice to have a pref
> that switches off some of the optional behaviors initially like pre-fetching

I'm not personally worried about that and I would like to have data on how prefetch helps or how long we have to wait for it.  Why do you think prefetching would be a problem and needed a pref?

> - You should add back the Telemetry probes for timing common LocalStorage
> operations, they'll be useful for quantifying perf gains in the field

Done already, but it is not (cannot be) exactly the same.  However a compare will be possible.

> - As I mentioned before, I'm particularly interested in knowing the fraction
> of beneficial pre-fetches. It looks like we check the disk for site
> LocalStorage on first site visit during a Firefox session.. Could we maybe
> fetch the list of sites with localStorage at some point after startup so we
> don't have to keep hitting disk to do this check?

This is a good point.  I can try to incorporate this.

> - It looks like a new DOMStorageCache object is created for every site
> visited even when the page doesn't have any localStorage data and might
> never call the localStorage API? This could be wasteful wrt memory.

It is depending on the previous point, if we know there are no data per a scope, we may simply prevent the manager to create a cache for that scope.

> - Am I correct in concluding that these empty DOMStorageCaches will never
> get evicted from memory? Wouldn't this cause memory use to grow without
> bound?

I already implemented eviction of a cache when there are DOMStorage objects using it.

> - DOMStorageManager.h could use some documentation 

Already done ;)

> - Nitpick: the "Data" class should probably have a more specific name

Hm.. it is a nested class and I would like to keep the name short.  What better name do you have in mind?

> - wasRunningOnes -> sWasRunningOnes... what does this do exactly?

Ah, this had only been added as part of testing, actually just for try run to check we don't spawn the thread after shutdown again.  I can completely remove it.  I never hit MOZ_ASSERT(wasRunningOnes).

> - enqueuing a new Task for every write-type operation seems excessive, but
> you've already said you're rewriting it to batch updates in some form

Yes, I collect them to batches in the new patch.

> - I think having a single batch flush outstanding at any one time would make
> the code simpler, e.g. helping to get rid of things like
> ForgetUpdatesForMatchingScope

There is still need to look up existing updates per scope even with batching.  It will be well documented why.  It comes from my intention of having a list of tasks from the start and not build any intermediate record of changes as you do.  I would have to keep track of dirty items in the thread anyway, since of IPC isolation (cannot do it in the cache as you do) and having immediately a track of tasks is IMO better.

> - Are the HigherPriority/DefaultPriority methods just going to change the
> helper thread's scheduling priority? Has this priority toggling shown itself
> to be beneficial in initial testing? 

I will remove them.

> - Nitpick: As a personal perference, it would be nice if method names
> started with verbs, e.g. "SetHigherPriority"/"SetDefaultPriority"

moot.

> - How does the helper thread indicate that a DB operation failed with an
> error to the synchronous LocalStorage calls blocking on it? e.g. shouldn't
> DOMStorageCache::GetKey return the error condition encountered by the
> Preload task?

Hmm.. good idea!

(In reply to Vladan Djeric (:vladan) from comment #20)
> - How do you handle outstanding flush items on shutdown? If I understand
> correctly, you enqueue a flush operation and set mStop = true, and the
> helper thread stops when it sees mStop == true? If there were multiple write
> tasks queued on the helper thread, wouldn't that cause us to stop processing
> queued tasks after the first one?

If you read the condition carefully it cannot happen.  This code has changed in the newer patch anyway.

> - I think waiting to flush outstanding data until "xpcom-shutdown" is
> probably too late, especially with the exit(0) work trying to switch us to a
> model where components save profile data at "profile-before-change" and then
> exit(0) is called

Depends.  Currently we flush at "xpcom-shutdown" and don't have any trouble.  However, what you suggest might be a good point.  I'll try to check which notification is better, but unless there is a really strong reason, I would like not to change this.

> - Why did you change the stored value to "" in test_brokenUTF-16.html
> instead of the diacritic data? I assume that's probably a debug change, I
> think your code can handle diacritics as data remains cached for the
> duration of a browser session

Ah!  Just my script for patch cleaning wiped that out.  It is not intended for landing!

Thanks.
Attached patch v0.9 (obsolete) — Splinter Review
Introduced:
- batching of update tasks
- reintroduced telemetry
- cache eviction after last storage using it goes away

Still not introduced (hence 0.9 and not 1.0):
- checking on which origins we have data to prevent prefetch of caches that would be empty, probably never user and thus wasting memory also because they would be hardly evicted
- preference to turn of preload (which I, however, don't see a need of)
- let getItem throw when prefetch failed (I also have to think of this a bit more first)
- not using mozilla/dom namespace

Try run:
https://tbpl.mozilla.org/?tree=Try&rev=928c8096b601

Implementation documentation:
https://developer.mozilla.org/en-US/docs/DOM/Storage/Implementation
Attachment #686683 - Attachment is obsolete: true
Attachment #690675 - Flags: review?(vdjeric)
Looks like a single statement cannot be used concurrently by two threads:
https://tbpl.mozilla.org/php/getParsedLog.php?id=17813969&tree=Try

When preload is about to happen on the main thread, a different scoped statement has to be used.  Will be fixed on v1.0.
(In reply to Honza Bambas (:mayhemer) from comment #21)
> Depends.  Currently we flush at "xpcom-shutdown" and don't have any trouble.

I was wrong, we do shutdown at "profile-before-change", so I'm reverting it.
Attached patch v1 (obsolete) — Splinter Review
This can be considered as a first full candidate.

Missing:
- methods don't throw when preload fails, I don't think its a good idea (but will check when there is time)
- no pref for disabling preloads, more and more I'm against it
Attachment #690675 - Attachment is obsolete: true
Attachment #690675 - Flags: review?(vdjeric)
Attachment #691030 - Flags: review?(vdjeric)
Attached patch v1.0.1 (obsolete) — Splinter Review
Stopping the database in "profile-before-change" is not working on child process (leaking).  We must shutdown in both "profile-before-change" and "xpcom-shutdown".  Duplicate call of DOMStorageCache::StopDatabase() is a no-op.

Otherwise identical to v1.

Try: https://tbpl.mozilla.org/?tree=Try&rev=afcd5f70e56f
Attachment #691030 - Attachment is obsolete: true
Attachment #691030 - Flags: review?(vdjeric)
Attachment #691135 - Flags: review?(vdjeric)
Comment on attachment 691135 [details] [diff] [review]
v1.0.1

This review took longer than I expected because of the size & complexity of the patch. A lot of new classes are added + the threading and some of the models in the code take a good bit of time to understand and verify, e.g. the behavior of the helper thread is controlled by a lot of different flags & other state. I would suggest splitting the patch into multiple pieces + writing an overview of the code as a bug comment to make it easier for the next reviewer.

> I already implemented eviction of a cache when there are DOMStorage objects using it.

Oh I see, this is the DOMStorage::~DOMStorage code? i.e. a scope is evicted from memory after the last reference to window.localStorage goes away?
Does this model work well in practice? I can think of a scenario where a user goes from site to site and most pages have a GoogleAnalytics widget that uses LocalStorage. LocalStorage would be constantly dumping & re-fetching the scope data. Also, what happens when you navigate a window from page A to page B on the same site (same scope). Would the scope cache be evicted during the navigation?

> Why do you think prefetching would be a problem and needed a pref?

I was trying to think of ideas for testing this patch, it's a big change and I thought it would be nice to test it in phases. What is your plan for testing?

>diff --git a/dom/src/storage/DOMStorage.h b/dom/src/storage/DOMStorage.h
>new file mode 100644
>+uint32_t
>+DOMStorageCache::GetLength(DOMStorage* storage)
>+{
>+  if (Persist(storage))
>+    WaitForPreload(Telemetry::LOCALDOMSTORAGE_GETLENGTH_MS);
>+
>+  return DataSet(storage).mKeys.Count();

I think the "DataSet" and "SetIndex" patterns make it hard to read & understand the code, but I'm not sure how to improve them. Maybe replace the "DataSet().mKeys" call with a private getter (e.g. "GetKeysAndValues")

>+void
>+DOMStorageCache::CloneFrom(DOMStorageCache *that)
>+{
>+  mLoaded = that->mLoaded;
>+  mInitialized = that->mInitialized;
>+  mPersistent = that->mPersistent;
>+  mSessionOnlyDataSetActive = that->mSessionOnlyDataSetActive;
>+
>+  for (uint32_t i = 0; i < 3; ++i) {
>+    that->mData[i].mKeys.EnumerateRead(CloneSetData, &mData[i]);
>+    that->ProcessUsageDelta(i, mData[i].mSize);
>+  }

Add a comment explaining the "3" in the for-loop

>+bool
>+DOMStorageCache::ProcessUsageDelta(uint32_t setIndex, const int64_t delta)
>+{
>+  // Check limit per this origin
>+  Data& data = mData[setIndex];
>+  uint64_t newUsage = data.mSize + delta;
>+  if (delta > 0 && newUsage > DOMStorageManager::GetQuota())
>+    return false;
>+
>+  // TODO here we will check the global or any other scoped kind of quota
>+  // if that will not pass, just return false early

How will this work? Will ProcessUsageDelta "reserve" quota with QuotaManager? You don't want to postpone quota checks until you do the flush.

>+nsresult
>+DOMStorageDBThread::Done()
>+{

Nit: maybe a more descriptive name for the method (with a verb)

>+void
>+DOMStorageDBThread::SyncPreload(DOMStorageCacheBridge * cache, bool force)
>+{
>+  if (!force && cache->LoadedCount()) {
>+    // Preload already started for this cache, just wait for it to finish.
>+    // LoadWait will exit after LoadDone on the cache has been called.
>+    SetHigherPriority();
>+    cache->LoadWait();
>+    SetDefaultPriority();
>+    return;
>+  }
>+

What if preload has already started but LoadedCount() == 0?

>+  if (mJournalMode) {
>+    // Do the load synchronously on the main thread, WAL is enabled.
>+    Task preload(Task::taskPreload, cache);
>+    preload.Run(this);

mJournalMode name is misleading: Write-Ahead-Logging (mJournalMode = true) vs Rollback *JOURNAL* (mJournalMode = false)

>+  }
>+  else {
>+    // Need to go asynchronously since WAL is not allowed.
>+    // Schedule preload for this cache as the first task.
>+    InsertTask(new Task(Task::taskPreloadUrgent, cache));
>+
>+    // LoadWait will exit after LoadDone of the cache has been called.
>+    cache->LoadWait();
>+  }
>+}

Isn't there a possibility of a race here? You grab the monitor lock in "InsertTask", which wakes the helper thread, and then you *release* the lock. The higher-priority helper thread could finish preloading and call "LoadDone" before the main thread ever calls "LoadWait". The main thread would end up waiting forever.

>+void
>+DOMStorageDBThread::SetDefaultPriority()
>+{
>+  if (!--mPriorityCounter)
>+    PR_SetThreadPriority(mThread, PR_PRIORITY_LOW);
>+}

I would prefer an "mPriorityCounter <= 0" check here, just in case

>+  // List scopes having any stored data
>+  nsCOMPtr<mozIStorageStatement> stmt;
>+  rv = mConnection->CreateStatement(
>+             NS_LITERAL_CSTRING("SELECT DISTINCT scope FROM webappsstore2"),
>+                                getter_AddRefs(stmt));

Indentation of NS_LITERAL_CSTRING line

>+const nsCString
>+DOMStorageDBThread::Task::Target()
>+{
>+  switch (mType) {
>+    case taskAddItem:
>+    case taskUpdateItem:
>+    case taskRemoveItem:
>+      return Scope() + NS_LITERAL_CSTRING("|") + NS_ConvertUTF16toUTF8(mKey);
>+
>+    default:
>+      return Scope();
>+  }
>+}

This method needs to be renamed at the very least, e.g. GetScopeAndKey()

>+nsresult
>+DOMStorageDBThread::Task::Execute(DOMStorageDBThread* thread)
>+{
>+  nsresult rv;
>+
>+  switch (mType) {
>+  case taskPreload:
>+  case taskPreloadUrgent:
>+  {
>+    // Already loaded?
>+    if (mCache->Loaded())
>+      break;
>+
>+#define PRELOAD_STATEMENT \
>+      "SELECT key, value FROM webappsstore2 " \
>+      "WHERE scope = :scope ORDER BY key LIMIT -1 OFFSET :offset"

When is an offset load needed? Document please

>+#define SYNCLOAD_STATEMENT \
>+      "/* sync load */" \
>+      PRELOAD_STATEMENT

These defines are ugly, maybe just write out the full strings as static const char*

>+void
>+DOMStorageDBThread::Task::Finalize(nsresult rv)
>+{
>+  switch (mType) {
>+  case taskPreloadUrgent:
>+  case taskPreload:
>+    if (NS_FAILED(rv)) {
>+      // When we are here, something failed when loading from the database.
>+      // Notify that the storage is loaded to prevent deadlock of the main thread,
>+      // even though it is actually empty or incomplete.
>+      NS_WARNING("Failed to preload localStorage");

Why not pass the error code to the blocking task? This will be valuable for preventing corruption if, for example, the DB is temporarily unavailable because the profile is on a network share. Same goes for SyncPreload, what's the benefit of hiding errors from the callers and continuing with corrupted state?

>+void
>+DOMStorageDBThread::PendingTasks::Add(DOMStorageDBThread::Task* task)
>+{
>+  switch (task->Type())
>+  {
>+  // Operations on single keys
>+
>+  case Task::taskRemoveItem:
>+  {
>+    DOMStorageDBThread::Task* pendingTask;
>+    if (mUpdates.Get(task->Target(), &pendingTask) &&
>+        pendingTask->Type() == Task::taskAddItem)
>+    {
>+      // The key was never put on disk, just forget the add task
>+      // This is only an optimization.
>+      mUpdates.Remove(task->Target());
>+      delete task;
>+      return;
>+    }
>+    // NO BREAK

This method needs more documentation. I'm not a fan of the fall-through case style when the logic is as elaborate as here.

>+DOMStorageDBThread::PendingTasks::Execute(DOMStorageDBThread* thread)
>+{
>+  mozStorageTransaction transaction(thread->mConnection, false);
>+
>+  nsresult rv;
>+
>+  for (uint32_t i = 0; i < mExecList.Length(); ++i) {
>+    DOMStorageDBThread::Task* task = mExecList[i];
>+    rv = task->Execute(thread);
>+    if (NS_FAILED(rv))
>+      break;
>+  }
>+
>+  mExecList.Clear();

Why clear the exec list if one of the flushes fails? Shouldn't we retry the flush? Otherwise, this won't be able to handle intermittent fails.

>+    typedef enum {
>+      // Only task that reads data from the database
>+      taskPreload,
>+      // The same as taskPreload, just executed with highest priority
>+      taskPreloadUrgent,
>+
>+      // Operations invoked by the DOM content API
>+      taskAddItem,
>+      taskUpdateItem,
>+      taskRemoveItem,
>+      taskClear,
>+
>+      // Operations invoked by chrome
>+      taskClearAll,
>+      taskClearMatchingScope,
>+    } TType;

Rename "TType" to "TaskType". Also, I think we usually use all-caps+underscores for enum values.

>+    // Finilize the task, i.e. do any internal cleanup and finish calls
>+    void Finalize(nsresult rv);

Typo in comment

>diff --git a/dom/src/storage/DOMStorageManager.cpp b/dom/src/storage/DOMStorageManager.cpp
>new file mode 100644
>--- /dev/null
>+++ b/dom/src/storage/DOMStorageManager.cpp
>+// Only allow relatively small amounts of data since performance of
>+// the synchronous IO is very bad.
>+#define DEFAULT_QUOTA_LIMIT (5 * 1024)

Comment that you're not doing "per site" quota enforcement in this file.
Btw, how is this going to integrate with 

>+    if (!_retval) {
>+      // This is demand to just preload the cache, if the scope has
>+      // no data stored, bypass creation and preload of the cache.
>+      DOMStorageDBBridge * db = DOMStorageCache::GetDatabase();
>+      if (db && !db->ShouldPreloadScope(scope))
>+        return NS_OK;
>+    }

Isn't it possible for mScopesHavingData to not be initialized at this point?

>+NS_IMETHODIMP
>+DOMStorageManager::CloneStorage(nsIDOMStorage *aStorage)
>+{
>+  nsCOMPtr<nsPIDOMStorage> pstorage = do_QueryInterface(aStorage);
>+  if (!pstorage)
>+    return NS_ERROR_UNEXPECTED;
>+
>+  DOMStorageCache* origcache = pstorage->GetCache();
>+
>+  DOMStorageCache* existingcache = GetCache(origcache->Scope());

Use camel-case for variable names

>+  if (existingcache) {
>+    // Do not replace an existing session storage.
>+    return NS_ERROR_NOT_AVAILABLE;
>+  }

Shouldn't this really be NS_ERROR_UNEXPECTED?

>+// static
>+nsresult
>+DOMStorageObserver::Init()
>+{
>+  if (sSelf)
>+    return NS_OK;
>+
>+  nsCOMPtr<nsIObserverService> obs = mozilla::services::GetObserverService();
>+  if (!obs)
>+    return NS_ERROR_UNEXPECTED;
>+
>+  sSelf = new DOMStorageObserver();
>+  NS_ADDREF(sSelf);
>+  // Chrome clear operations.
>+
>+  obs->AddObserver(sSelf, "final-ui-startup", true);

Am I right in deducing "StartDatabase" will be first called as a result of this event? If that's correct, there will be extra I/O on startup creating extra I/O contention. You can verify this by setting homepage to about:blank and then comparing Telemetry's startupSessionRestore[Read|Write]Bytes with and without this event listener. 

I have a few questions related to how prefetch affects startup in general:

1) Will "about:home" have its LS data prefetched? I noticed the "IsChromeWindow()" check in PrecacheLocalStorage but I don't know if it affects "about:home".
2) If a user restores a session but he is using the default "Don't load tabs until selected" behavior, will the tabs' LS data be prefetched on session restore? I suspect the answer is no.
3) Assuming that "about:home" is refactored to not use LocalStorage and we find that most sessions start only with "about:home", do you think it would make sense to initialize the database off of a timer scheduled a couple of seconds after startup is done (e.g. 2 seconds after "session-restored" event)?


>diff --git a/dom/tests/mochitest/localstorage/Makefile.in b/dom/tests/mochitest/localstorage/Makefile.in
>--- a/dom/tests/mochitest/localstorage/Makefile.in
>+++ b/dom/tests/mochitest/localstorage/Makefile.in
>@@ -33,40 +33,45 @@ MOCHITEST_FILES	= \
>-    test_localStorageQuota.html \
>-    test_localStorageQuotaSessionOnly.html \
>-    test_localStorageQuotaSessionOnly2.html \

There should still be tests for the 5MB per-scope quota

>diff --git a/embedding/components/windowwatcher/src/nsWindowWatcher.cpp b/embedding/components/windowwatcher/src/nsWindowWatcher.cpp
>--- a/embedding/components/windowwatcher/src/nsWindowWatcher.cpp
>+++ b/embedding/components/windowwatcher/src/nsWindowWatcher.cpp
>+    if (parentStorageManager && newStorageManager) {
>+      nsCOMPtr<nsIDOMStorage> storage;
>+      parentStorageManager->GetStorage(subjectPrincipal, isPrivateBrowsingWindow, getter_AddRefs(storage));
>+      if (storage)
>+        newStorageManager->CloneStorage(storage);

Would this create a copy of cached persistent storage data as well as session storage data?

>diff --git a/toolkit/components/telemetry/Histograms.json b/toolkit/components/telemetry/Histograms.json
>--- a/toolkit/components/telemetry/Histograms.json
>+++ b/toolkit/components/telemetry/Histograms.json
>   "LOCALDOMSTORAGE_GETVALUE_MS": {
>     "kind": "exponential",
>     "high": "3000",
>     "n_buckets": 10,
>-    "description": "Time to return a value for a key in LocalStorage (ms)"
>+    "description": "Time to block before we return a value for a key in LocalStorage (ms)"
>   },

It will be hard to do apples-to-apples comparisons if you change the meaning of these histograms. Why not just leave them in the top-level DOMStorage functions? I like the current probe sites because they measure the distribution of the main-thread costs for each LocalStorage function, regardless of the underlying implementation.
Attachment #691135 - Flags: review?(vdjeric)
My apologies, I didn't see your documentation at https://developer.mozilla.org/en-US/docs/DOM/Storage/Implementation
(In reply to Vladan Djeric (:vladan) from comment #27)
> Comment on attachment 691135 [details] [diff] [review]
> v1.0.1
> 
> I would suggest splitting the patch into multiple pieces

If you suggest how to split it (I have no idea how to split this patch) then I'm OK to do it, but I personally prefer one patch for review.

> > I already implemented eviction of a cache when there are DOMStorage objects using it.
> 
> Oh I see, this is the DOMStorage::~DOMStorage code? i.e. a scope is evicted
> from memory after the last reference to window.localStorage goes away?
> Does this model work well in practice? I can think of a scenario where a
> user goes from site to site and most pages have a GoogleAnalytics widget
> that uses LocalStorage. LocalStorage would be constantly dumping &
> re-fetching the scope data. Also, what happens when you navigate a window
> from page A to page B on the same site (same scope). Would the scope cache
> be evicted during the navigation?

LocalStorage instance is held by the windows (nsGlobalWindow).  It lives a long time while a tab with the content is open.  Then GC gives us some time after the tab gets closed.  This needs testing, manual testing with watching logs or traces.

> 
> > Why do you think prefetching would be a problem and needed a pref?
> 
> I was trying to think of ideas for testing this patch, it's a big change and
> I thought it would be nice to test it in phases. What is your plan for
> testing?

Hard to say what the plan is.  I am using the test build on 3 machines of mine to get telemetry and check whether there are any crashes or so (so far few telemetry data gathered and no crashes experienced).  Data loss seems not occur either.

> I think the "DataSet" and "SetIndex" patterns make it hard to read &
> understand the code, but I'm not sure how to improve them. Maybe replace the
> "DataSet().mKeys" call with a private getter (e.g. "GetKeysAndValues")

Harder to read IMO.  Reviewer/coder has to figure out what GetKeysAndValues and do.  There is also mSize so we have to have GetSize?  How will we write to it?  Via SetSize?  The way I did it seems to me cleaner.

What the sets mean, why are where they are is well documented in the overview.

> >+bool
> >+DOMStorageCache::ProcessUsageDelta(uint32_t setIndex, const int64_t delta)
> >+{
> >+  // Check limit per this origin
> >+  Data& data = mData[setIndex];
> >+  uint64_t newUsage = data.mSize + delta;
> >+  if (delta > 0 && newUsage > DOMStorageManager::GetQuota())
> >+    return false;
> >+
> >+  // TODO here we will check the global or any other scoped kind of quota
> >+  // if that will not pass, just return false early
> 
> How will this work? Will ProcessUsageDelta "reserve" quota with
> QuotaManager? You don't want to postpone quota checks until you do the flush.

Yes, so, this is something to discuss.  My patch addition for eTLD+1 quota checks had a simple, but not sure whether proven API: a single method bool CheckQuota(scope, delta).  The method does the check whether delta would exceed the quota for the scope.  If so, false is immediately returned.  If the test passes, then the cached value for quota is updated by delta and true is returned.  Then the consumer (=localStorage) MUST write the data to its cache and also to the database.

I need to check Jan's work on the quota manager and maybe somehow pre-integrate it here.

> >+void
> >+DOMStorageDBThread::SyncPreload(DOMStorageCacheBridge * cache, bool force)
> >+{
> >+  if (!force && cache->LoadedCount()) {
> >+    // Preload already started for this cache, just wait for it to finish.
> >+    // LoadWait will exit after LoadDone on the cache has been called.
> >+    SetHigherPriority();
> >+    cache->LoadWait();
> >+    SetDefaultPriority();
> >+    return;
> >+  }
> >+
> 
> What if preload has already started but LoadedCount() == 0?

The this will not pass :)

Well, then we do a sync preload.  This is place to optimize anyway, I need some telemetry first here.

> 
> >+  if (mJournalMode) {
> >+    // Do the load synchronously on the main thread, WAL is enabled.
> >+    Task preload(Task::taskPreload, cache);
> >+    preload.Run(this);
> 
> mJournalMode name is misleading: Write-Ahead-Logging (mJournalMode = true)
> vs Rollback *JOURNAL* (mJournalMode = false)

This is directly copied from your patch.

> 
> >+  }
> >+  else {
> >+    // Need to go asynchronously since WAL is not allowed.
> >+    // Schedule preload for this cache as the first task.
> >+    InsertTask(new Task(Task::taskPreloadUrgent, cache));
> >+
> >+    // LoadWait will exit after LoadDone of the cache has been called.
> >+    cache->LoadWait();
> >+  }
> >+}
> 
> Isn't there a possibility of a race here? You grab the monitor lock in
> "InsertTask", which wakes the helper thread, and then you *release* the
> lock. The higher-priority helper thread could finish preloading and call
> "LoadDone" before the main thread ever calls "LoadWait". The main thread
> would end up waiting forever.

A quick look at all LoadWait() implementations will tell you it is not possible ;)

LoadWait() exits immediately when mLoaded is already true.  If you see a mistake in any of them, then it needs to be fixed, however I don't see any.

> >+const nsCString
> >+DOMStorageDBThread::Task::Target()
> >+{
> >+  switch (mType) {
> >+    case taskAddItem:
> >+    case taskUpdateItem:
> >+    case taskRemoveItem:
> >+      return Scope() + NS_LITERAL_CSTRING("|") + NS_ConvertUTF16toUTF8(mKey);
> >+
> >+    default:
> >+      return Scope();
> >+  }
> >+}
> 
> This method needs to be renamed at the very least, e.g. GetScopeAndKey()

No.  If you look at the implementation then you clearly see it is not always scope+key.  It is a task-type dependent value.  And it is document in the header what this method is intended to be used for and what it does.

> >+#define PRELOAD_STATEMENT \
> >+      "SELECT key, value FROM webappsstore2 " \
> >+      "WHERE scope = :scope ORDER BY key LIMIT -1 OFFSET :offset"
> 
> When is an offset load needed? Document please

This isn't clear??  Wops.  It is documented in the overview.  Only child process cache already having some, but not all, items loaded triggers use of it.  It is intended to load only the remaining keys and not all when some keys have already been loaded.

> >+void
> >+DOMStorageDBThread::Task::Finalize(nsresult rv)
> >+{
> >+  switch (mType) {
> >+  case taskPreloadUrgent:
> >+  case taskPreload:
> >+    if (NS_FAILED(rv)) {
> >+      // When we are here, something failed when loading from the database.
> >+      // Notify that the storage is loaded to prevent deadlock of the main thread,
> >+      // even though it is actually empty or incomplete.
> >+      NS_WARNING("Failed to preload localStorage");
> 
> Why not pass the error code to the blocking task? This will be valuable for
> preventing corruption if, for example, the DB is temporarily unavailable
> because the profile is on a network share. Same goes for SyncPreload, what's
> the benefit of hiding errors from the callers and continuing with corrupted
> state?

I was thinking of it.  I'm not sure how to propagate the error up.  What should we do when preload fail?  let getKey fail?  I'm still not sure it's a good idea.

When a write fail, we can either retry later (just keep the mExecList array) and cut it only when it exceeds some limit.  This way we may retry writes easily.

> 
> >+void
> >+DOMStorageDBThread::PendingTasks::Add(DOMStorageDBThread::Task* task)
> This method needs more documentation. 

I'll add it.

> I'm not a fan of the fall-through case
> style when the logic is as elaborate as here.

And what style would you like?

> 
> >+  mExecList.Clear();
> 
> Why clear the exec list if one of the flushes fails? Shouldn't we retry the
> flush? Otherwise, this won't be able to handle intermittent fails.

Yes, exactly my idea as well.  I made it here this way just for keeping things simpler.

There however has to be some limit.  Maybe drop the list when it simply exceeds some length limit (like 2000 operations?)

> >diff --git a/dom/src/storage/DOMStorageManager.cpp b/dom/src/storage/DOMStorageManager.cpp
> >+#define DEFAULT_QUOTA_LIMIT (5 * 1024)
> 
> Comment that you're not doing "per site" quota enforcement in this file.
> Btw, how is this going to integrate with 

With what?

> 
> >+    if (!_retval) {
> >+      // This is demand to just preload the cache, if the scope has
> >+      // no data stored, bypass creation and preload of the cache.
> >+      DOMStorageDBBridge * db = DOMStorageCache::GetDatabase();
> >+      if (db && !db->ShouldPreloadScope(scope))
> >+        return NS_OK;
> >+    }
> 
> Isn't it possible for mScopesHavingData to not be initialized at this point?

Sure it is possible, we are async.  I really don't want to block on this.  It will just make early loaded caches to hang.  We can solve the cleanup somehow in a followup patch.

> >+  if (existingcache) {
> >+    // Do not replace an existing session storage.
> >+    return NS_ERROR_NOT_AVAILABLE;
> >+  }
> 
> Shouldn't this really be NS_ERROR_UNEXPECTED?

I think this is what use now for this case.  I'll double check.

> >+  obs->AddObserver(sSelf, "final-ui-startup", true);
> 
> Am I right in deducing "StartDatabase" will be first called as a result of
> this event? If that's correct, there will be extra I/O on startup creating
> extra I/O contention. You can verify this by setting homepage to about:blank
> and then comparing Telemetry's startupSessionRestore[Read|Write]Bytes with
> and without this event listener. 

Good idea.  I will.  I was trying to find a place where to start the database up early.  Maybe this is not the best place, I'm very open to suggestions.

> 
> I have a few questions related to how prefetch affects startup in general:
> 
> 1) Will "about:home" have its LS data prefetched? I noticed the
> "IsChromeWindow()" check in PrecacheLocalStorage but I don't know if it
> affects "about:home".

It is being prefetched, yes.

> 2) If a user restores a session but he is using the default "Don't load tabs
> until selected" behavior, will the tabs' LS data be prefetched on session
> restore? I suspect the answer is no.

No, it will come from session store.  I will check on this specifically.

> 3) Assuming that "about:home" is refactored to not use LocalStorage and we
> find that most sessions start only with "about:home", do you think it would
> make sense to initialize the database off of a timer scheduled a couple of
> seconds after startup is done (e.g. 2 seconds after "session-restored"
> event)?

Interesting idea.  I will check on this.

> There should still be tests for the 5MB per-scope quota

I will introduce them.

> >+    if (parentStorageManager && newStorageManager) {
> >+      nsCOMPtr<nsIDOMStorage> storage;
> >+      parentStorageManager->GetStorage(subjectPrincipal, isPrivateBrowsingWindow, getter_AddRefs(storage));
> >+      if (storage)
> >+        newStorageManager->CloneStorage(storage);
> 
> Would this create a copy of cached persistent storage data as well as
> session storage data?

:) managers here are sessionStorage managers.  So it will just clone sessionStorage data, and not the persistent data.  If you read the overview, it would be simpler to understand this.

> 
> >diff --git a/toolkit/components/telemetry/Histograms.json b/toolkit/components/telemetry/Histograms.json
> >--- a/toolkit/components/telemetry/Histograms.json
> >+++ b/toolkit/components/telemetry/Histograms.json
> >   "LOCALDOMSTORAGE_GETVALUE_MS": {
> >     "kind": "exponential",
> >     "high": "3000",
> >     "n_buckets": 10,
> >-    "description": "Time to return a value for a key in LocalStorage (ms)"
> >+    "description": "Time to block before we return a value for a key in LocalStorage (ms)"
> >   },
> 
> It will be hard to do apples-to-apples comparisons if you change the meaning
> of these histograms. Why not just leave them in the top-level DOMStorage
> functions? I like the current probe sites because they measure the
> distribution of the main-thread costs for each LocalStorage function,
> regardless of the underlying implementation.

I'm writing about this in the overview:telemetry as well.  It really may be better to preserve these probes as are.  But, I want to have probes that measure how long we block on first access and on what operation.  That is critical to determine for following optimizations.
(In reply to Honza Bambas (:mayhemer) from comment #29)
> (In reply to Vladan Djeric (:vladan) from comment #27)
> > Comment on attachment 691135 [details] [diff] [review]
> > v1.0.1
> > 
> > I would suggest splitting the patch into multiple pieces
> 
> If you suggest how to split it (I have no idea how to split this patch) then
> I'm OK to do it, but I personally prefer one patch for review.

It's up to you & next reviewer. I would split it into IPC, test, deleted files, DOMStorageCache, DOMStorageManager+Observer, DOMStorageDBThread parts

> > >+  if (mJournalMode) {
> > >+    // Do the load synchronously on the main thread, WAL is enabled.
> > >+    Task preload(Task::taskPreload, cache);
> > >+    preload.Run(this);
> > 
> > mJournalMode name is misleading: Write-Ahead-Logging (mJournalMode = true)
> > vs Rollback *JOURNAL* (mJournalMode = false)
> 
> This is directly copied from your patch.

Hmm, I don't have an mJournalMode variable -- I keep track of DB mode indirectly via the NULL-ness of the mFlushThread variable

> > I'm not a fan of the fall-through case
> > style when the logic is as elaborate as here.
> 
> And what style would you like?

If's and else's

> > Comment that you're not doing "per site" quota enforcement in this file.
> > Btw, how is this going to integrate with 
> 
> With what?

I didn't notice I forgot to remove this line. I was going to ask how this integrates with QuotaManager, but it sounds like that's not decided yet

> > >+  if (existingcache) {
> > >+    // Do not replace an existing session storage.
> > >+    return NS_ERROR_NOT_AVAILABLE;
> > >+  }
> > 
> > Shouldn't this really be NS_ERROR_UNEXPECTED?
> 
> I think this is what use now for this case.  I'll double check.

This is what is used now, but I don't understand why "not available" is returned

> I'm writing about this in the overview:telemetry as well.  It really may be
> better to preserve these probes as are.  But, I want to have probes that
> measure how long we block on first access and on what operation.  That is
> critical to determine for following optimizations.

We could have new probes for these new measurements
> I was thinking of it.  I'm not sure how to propagate the error up. What should we do
> when preload fail?  let getKey fail?  I'm still not sure it's a good idea.

We already let nsDOMStoragePersistentDB::GetKeyValue() fail on DB errors. 

In general, it's impossible to deal with this type of error intelligently across all possible situations, so let's allow the application logic to deal with it. That has to be better than pretending localStorage is empty and then writing to the DB later and causing inconsistent state in the DB. 

If you really want to hide the error from the callers, you could assume localStorage is blank, return an empty localStorage but then also schedule a "clear" of the scope on next flush to actually make it true/consistent.
(In reply to Vladan Djeric (:vladan) from comment #30)
> > If you suggest how to split it (I have no idea how to split this patch) then
> > I'm OK to do it, but I personally prefer one patch for review.
> 
> It's up to you & next reviewer. I would split it into IPC, test, deleted
> files, DOMStorageCache, DOMStorageManager+Observer, DOMStorageDBThread parts

I don't know, those are just files, you as a reviewer can check these pieces one by one in splinter.  I personally prefer one big patch where I can see all pieces together and see all dependencies.  This patch has no layers so it is hard to provide incrementally.  But split removals, new code, integration changes and tests (=4 parts) is probably a good way to split.  I can do that next time.

Btw, from your comment it looks like you don't want to review the next patch version?

> Hmm, I don't have an mJournalMode variable -- I keep track of DB mode
> indirectly via the NULL-ness of the mFlushThread variable

I really copied this from some of your patches.  However I will address your comment.

> 
> > > I'm not a fan of the fall-through case
> > > style when the logic is as elaborate as here.
> > 
> > And what style would you like?
> 
> If's and else's

I don't believe it would be better and more readable.  Fall-through is in place here.

> We could have new probes for these new measurements

Yep, I agree.
(In reply to Honza Bambas (:mayhemer) from comment #32)
> Btw, from your comment it looks like you don't want to review the next patch
> version?

I'd be happy to review future versions, but I'm not a module peer so I assumed that you would want a DOM person to do an "official" review in the near future.
(In reply to Vladan Djeric (:vladan) from comment #27)
> What if preload has already started but LoadedCount() == 0?

I've misread this first.  We may want to stop the pending preload or just wait for it.  This is something for a followup I think.

> >+#define SYNCLOAD_STATEMENT \
> >+      "/* sync load */" \
> >+      PRELOAD_STATEMENT
> 
> These defines are ugly, maybe just write out the full strings as static
> const char*

I'll leave it because I want to have the statement be written on just a single place.

> >+void
> >+DOMStorageDBThread::PendingTasks::Add(DOMStorageDBThread::Task* task)
>
> This method needs more documentation. I'm not a fan of the fall-through case
> style when the logic is as elaborate as here.

I finally rewrote this but just because there was a small bug: the mUpdates hash table was walked unnecessarily twice.

> >diff --git a/dom/src/storage/DOMStorageManager.cpp b/dom/src/storage/DOMStorageManager.cpp
> >new file mode 100644
> >--- /dev/null
> >+++ b/dom/src/storage/DOMStorageManager.cpp
> >+// Only allow relatively small amounts of data since performance of
> >+// the synchronous IO is very bad.
> >+#define DEFAULT_QUOTA_LIMIT (5 * 1024)
> 
> Comment that you're not doing "per site" quota enforcement in this file.
> Btw, how is this going to integrate with 

But I do.

> >+  if (existingcache) {
> >+    // Do not replace an existing session storage.
> >+    return NS_ERROR_NOT_AVAILABLE;
> >+  }
> 
> Shouldn't this really be NS_ERROR_UNEXPECTED?

No, because it is in some case expected.
(In reply to Vladan Djeric (:vladan) from comment #27)
> 3) Assuming that "about:home" is refactored to not use LocalStorage and we
> find that most sessions start only with "about:home", do you think it would
> make sense to initialize the database off of a timer scheduled a couple of
> seconds after startup is done (e.g. 2 seconds after "session-restored"
> event)?

I forgot to answer this question...  I really don't know.  I will check what all notifications I can get and which one can be the best to use.  I may raise this on dev-platform too probably.
Attached patch v2 [Complete patch] (obsolete) — Splinter Review
Major changes:

- errors of preload (read) are now propagated up to DOM
  - then localStorage methods throw until clear() method on it is called
- write operations are queued and retried when flushing them fails for any reason
  - the write backlog is cleared when more then 1000 write operations failed to flush
- reintroduced (rewritten) quota check tests to reflect we do quota check per origin and not eTLD+1
- existing telemetry probes are left unchanged, probes for measuring blocking times introduced

Addresses most of review comments, some nits left unchanged as discussed.

Try: https://tbpl.mozilla.org/?tree=Try&rev=1b0439614815
Attachment #691135 - Attachment is obsolete: true
Attachment #695185 - Flags: superreview?(jst)
Attachment #695185 - Flags: review?(vdjeric)
Attachment #695187 - Attachment is patch: true
Attached patch v2 [part5] - changes to tests (obsolete) — Splinter Review
The [partN] reference-only patches don't overlap, each contains different set of files, may be applied in any order and applied all form the [Complete patch].
Comment on attachment 695187 [details] [diff] [review]
v2 [part2] - changes to existing interfaces

Review of attachment 695187 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/interfaces/storage/nsPIDOMStorage.h
@@ +14,3 @@
>  class nsIPrincipal;
>  
> +namespace mozilla { namespace dom {

Two lines

@@ +17,5 @@
> +
> +class DOMStorageCache;
> +class DOMStorageManager;
> +
> +}} // mozilla::dom

And here

@@ +19,5 @@
> +class DOMStorageManager;
> +
> +}} // mozilla::dom
> +
> +using namespace mozilla::dom;

No using statements in headers.
Documentation has been updated to the v2 patch.
The documentation wiki page is looking good :) 

Btw, you said in the wiki page that "according to Telemetry, reads from the database are very fast". The LOCALDOMSTORAGE_FETCH_DOMAIN_MS histogram indeed looks very good but you have to take into account that this probe is timing scope fetches even when the scope has *zero* data in storage. The "Slow SQL" section on the Metrics site shows that the SQL for fetching LS scope data is still in the Top 15 most common SQL statements taking more than 150ms. Your implementation is going to be no worse than the current one, but we should still remain vigilant about doing extra I/O on reads.

> LocalStorage instance is held by the windows (nsGlobalWindow).  It lives a
> long time while a tab with the content is open.  Then GC gives us some time
> after the tab gets closed.  This needs testing, manual testing with watching
> logs or traces.

Ok. One simple approach would be to use a "grace period" of X seconds after the scope's last tab is closed before evicting its DOMStorageCache.

> > 2) If a user restores a session but he is using the default "Don't load tabs
> > until selected" behavior, will the tabs' LS data be prefetched on session
> > restore? I suspect the answer is no.
>
> No, it will come from session store.  I will check on this specifically.

What did you find out?

> > Comment that you're not doing "per site" quota enforcement in this file.
>
> But I do.

By "per site", I meant eTLD+1 (as opposed to "per origin")

> Try: https://tbpl.mozilla.org/?tree=Try&rev=1b0439614815

1) I took a look at the Try logs for "Mac OS X Lion 10.7 debug" MochiTest-1. It looks like a few of the NS_ENSURE_SUCCESS calls in DOMStorage failed + it's complaining about leaks. Do you know what these are about?
2) I'd also be curious to know the results of Talos tests for startup & shutdown numbers.

>+  if (Persist(storage)) {
>+    WaitForPreload(Telemetry::LOCALDOMSTORAGE_GETKEY_BLOCK_MS);

It would be nice to have a Telemetry probe that counts the number of preloaded scopes that never get used in a Firefox session

Nit: _BLOCKING_ is a bit clearer than _BLOCK_

----

Question: Is this race possible?

0. A scope has data in the LocalStorage DB
1. User requests to "Clear Browsing History"
    -> DOMStorageDBThread::PendingTasks::Add taskClearAll
2. Start processing taskClearAll on helper thread
    -> DOMStorageDBThread::ThreadFunc calls mPendingTasks.Prepare() while holding the monitor lock
    -> PendingTasks::Prepare empties mClears while holding the lock
    -> DOMStorageDBThread::ThreadFunc unlocks before calling mPendingTasks.Execute()
3. Helper thread is taking a long time executing the SQL stmt to empty the DB
4. An iframe belonging to the scope in step 0 is loaded
    -> DOMStorageDBThread::ShouldPreloadScope returns true
    -> DOMStorageDBThread::InsertTask grabs the monitor lock
    -> mPendingTasks.IsScopeClearPending() returns false since mClears is empty
    -> taskPreload task gets enqueued
4. A script in the iframe requests a LocalStorage item before taskClearAll and taskPreload are done
    -> WaitForPreload is called
    -> cache->LoadedCount() is 0 + WAL mode is on, so load happens on main thread
    -> If my understanding of WAL is correct, main thread load might read stale values from the DB and return them to the caller (which is incorrect)
5. taskClearAll finishes
    -> DB is emptied
    -> taskClearAll clears mScopesHavingData

>+nsTArray<nsString> *
>+DOMStorage::GetKeys()
>+{
>+  if (!CanUseStorage(this))
>+    return nullptr;
>+
>+  return mCache->GetKeys(this);

It looks like you've changed the semantics for GetKeys. We used to return a zero-element array when the LS was empty and now we return a nullptr. Did you check that this can't cause a null pointer exception?

>+
>+  ERROR(NS_SUCCESS_DOM_NO_OPERATION,               SUCCESS(1)),

Is this return code documented somewhere?

>+nsresult
>+DOMStorageCache::Clear(DOMStorage* storage)
>+{
>+  Telemetry::AutoTimer<Telemetry::LOCALDOMSTORAGE_REMOVEALL_MS> autoTimer;
>+

This probe is currently used for timing the clearing of the *entire* DB. You'll need a new probe for timing clearing of a single scope

>+nsresult
>+DOMStorageCache::Clear(DOMStorage* storage)
>+{
>+  Telemetry::AutoTimer<Telemetry::LOCALDOMSTORAGE_REMOVEALL_MS> autoTimer;
>+
>+  bool refresh = false;
>+  if (Persist(storage)) {
>+    WaitForPreload(Telemetry::LOCALDOMSTORAGE_REMOVEALL_BLOCK_MS);

Why do we need to wait for preload to clear a scope?

>+void
>+DOMStorageCache::LoadDone(nsresult rv)
>+{
>+  MonitorAutoLock monitor(mMonitor);
>+  mLoaded = true;
>+  monitor.Notify();
>+}

Do you get a compiler warning for the unused parameter?

>+  virtual void LoadDone(nsresult rv)
>+  {
>+    // Called on the cache background thread
>+    MonitorAutoLock monitor(mMonitor);
>+    mLoaded = true;
>+    *mRv = rv;
>+    monitor.Notify();
>+  }

Nit: Be consistent about putting return types on their own line

> <...> I was trying to find a place where to start the
> database up early.  Maybe this is not the best place, I'm very open to
> suggestions.

"final-ui-startup" seems too early. Instead, we could have preload turned off at startup so that we don't initialize the DB until either first use (e.g. google.com homepage) or 5 seconds after "sessionstore-windows-restored" (whichever ends up happening first).

>+void
>+DOMStorageDBThread::PendingTasks::Add(DOMStorageDBThread::Task* task)
>+{
>+  // Optimize: when a key to remove has never been written to disk
>+  // just bypass this operation.  A kew is new when a task scheduled
>+  // to write it to the database is of type taskAddItem.
>+  if (task->Type() == Task::taskRemoveItem) {
>+    DOMStorageDBThread::Task* pendingTask;
>+    if (mUpdates.Get(task->Target(), &pendingTask) &&
>+        pendingTask->Type() == Task::taskAddItem) {
>+      mUpdates.Remove(task->Target());
>+      delete task;
>+      return;
>+    }
>+  }

What would happen if there was an existing key "foo" in the DB and the memory cache, and the web page made these calls:

localStorage.removeItem("foo"); // taskRemoveItem is enqueued in mUpdates
localStorage.setItem("foo", "bar") // overwrite taskRemoveItem with taskAddItem 
localStorage.removeItem("foo"); // nothing gets changed on disk?

>+bool
>+DOMStorageDBThread::PendingTasks::Prepare()
>+{
>+  if (mExecList.Length() > 1000) {
>+    NS_WARNING("Failed to flush more then 1000 writes"
>+               " of localStorage data");
>+    mExecList.Clear();
>+  }

I don't think we can avoid flushing old changes if we want to flush new ones. I think we should just stop flushing in this case but also set a system-wide error flag so we error out on all future write attempts.

Essentially, I believe "data loss + exposing error conditions to clients" is preferable to "data loss + risk of inconsistency".
Attachment #695185 - Flags: review?(vdjeric)
Thanks for great review!

I have already updated the patch for some issue you have found as well some others I have found.

(In reply to Vladan Djeric (:vladan) from comment #46)
> The documentation wiki page is looking good :) 
> 
> Btw, you said in the wiki page that "according to Telemetry, reads from the
> database are very fast". The LOCALDOMSTORAGE_FETCH_DOMAIN_MS histogram
> indeed looks very good but you have to take into account that this probe is
> timing scope fetches even when the scope has *zero* data in storage. The
> "Slow SQL" section on the Metrics site shows that the SQL for fetching LS
> scope data is still in the Top 15 most common SQL statements taking more
> than 150ms. Your implementation is going to be no worse than the current
> one, but we should still remain vigilant about doing extra I/O on reads.

This is interesting info.  Locally I never get times more then 20ms.  Sometimes content is faster even for such short time.

I want to let the initial implementation simple but some optimization will be in place.  Telemetry will tell.

> 
> > LocalStorage instance is held by the windows (nsGlobalWindow).  It lives a
> > long time while a tab with the content is open.  Then GC gives us some time
> > after the tab gets closed.  This needs testing, manual testing with watching
> > logs or traces.
> 
> Ok. One simple approach would be to use a "grace period" of X seconds after
> the scope's last tab is closed before evicting its DOMStorageCache.

I had to introduce a timer for holding the cache alive, so I can easily turn that to a keep-alive functionality when released by a window.

> 
> > > 2) If a user restores a session but he is using the default "Don't load tabs
> > > until selected" behavior, will the tabs' LS data be prefetched on session
> > > restore? I suspect the answer is no.
> >
> > No, it will come from session store.  I will check on this specifically.
> 
> What did you find out?

Ah, I forgot about this one.  I'll investigate before submit another version.

> 
> > > Comment that you're not doing "per site" quota enforcement in this file.
> >
> > But I do.
> 
> By "per site", I meant eTLD+1 (as opposed to "per origin")

Now I understand.

> 
> > Try: https://tbpl.mozilla.org/?tree=Try&rev=1b0439614815
> 
> 1) I took a look at the Try logs for "Mac OS X Lion 10.7 debug" MochiTest-1.
> It looks like a few of the NS_ENSURE_SUCCESS calls in DOMStorage failed +
> it's complaining about leaks. Do you know what these are about?

I'll check those ensures.

The leaks seems to be unrelated, but I'll try to find out.

> 2) I'd also be curious to know the results of Talos tests for startup &
> shutdown numbers.

There are try runs with talos, I'll find them and put here if not that old.  Next try can be with -t all.

> Question: Is this race possible?

Yes, good catch, and it has already been fixed in the updated patch.

> >+nsresult
> >+DOMStorageCache::Clear(DOMStorage* storage)
> >+{
> >+  Telemetry::AutoTimer<Telemetry::LOCALDOMSTORAGE_REMOVEALL_MS> autoTimer;
> >+
> >+  bool refresh = false;
> >+  if (Persist(storage)) {
> >+    WaitForPreload(Telemetry::LOCALDOMSTORAGE_REMOVEALL_BLOCK_MS);
> 
> Why do we need to wait for preload to clear a scope?

I had a comment here (not sure where it's gone).  I'll re-add.

> 
> >+void
> >+DOMStorageCache::LoadDone(nsresult rv)
> >+{
> >+  MonitorAutoLock monitor(mMonitor);
> >+  mLoaded = true;
> >+  monitor.Notify();
> >+}
> 
> Do you get a compiler warning for the unused parameter?

Good catch.  On windows?  Hardly or lost among tun of others...  However, this build on mac where warns in this module are fatal...

> "final-ui-startup" seems too early. Instead, we could have preload turned
> off at startup so that we don't initialize the DB until either first use
> (e.g. google.com homepage) or 5 seconds after
> "sessionstore-windows-restored" (whichever ends up happening first).

I'll try this one.  Seems like you've already done the research here.

> What would happen if there was an existing key "foo" in the DB and the
> memory cache, and the web page made these calls:
> 
> localStorage.removeItem("foo"); // taskRemoveItem is enqueued in mUpdates
> localStorage.setItem("foo", "bar") // overwrite taskRemoveItem with
> taskAddItem 
> localStorage.removeItem("foo"); // nothing gets changed on disk?

Very good catch.  Yes, the optimization is wrong for this case, I'll introduce a test to verify.  I may need to remove this logic...

> I don't think we can avoid flushing old changes if we want to flush new
> ones. I think we should just stop flushing in this case but also set a
> system-wide error flag so we error out on all future write attempts.
> 
> Essentially, I believe "data loss + exposing error conditions to clients" is
> preferable to "data loss + risk of inconsistency".

Agree.

Do you have experience with e.g. for a short time non-responding network disks or web storages that may start reacting after some short time again?  We may be prepared for this case.  The retry task capacity may be better based on time then on number.
(In reply to Vladan Djeric (:vladan) from comment #46)
> 1) I took a look at the Try logs for "Mac OS X Lion 10.7 debug" MochiTest-1.
> It looks like a few of the NS_ENSURE_SUCCESS calls in DOMStorage failed +
> it's complaining about leaks. Do you know what these are about?

NS_ENSURES are copied from the original code.  I may however call them from some new places that invoke these failures.  Anyway, they should not be logged.

According leaks, it was caused by some other changeset, different run based on the same parent changeset has the same leaks: https://tbpl.mozilla.org/?tree=Try&rev=9bb94815661b
(In reply to Vladan Djeric (:vladan) from comment #46)
> > > 2) If a user restores a session but he is using the default "Don't load tabs
> > > until selected" behavior, will the tabs' LS data be prefetched on session
> > > restore? I suspect the answer is no.
> >
> > No, it will come from session store.  I will check on this specifically.
> 
> What did you find out?

I misinterpreted this for sessionStorage.  localStorage is not saved in session store.  However, the cache is not preloaded (even created) until user clicks the tab.  Only then the window is assigned a document and preload starts.  If we find it useful to preload sooner, it can be added as a followup fix in a different bug I think.

> It looks like you've changed the semantics for GetKeys. We used to return a
> zero-element array when the LS was empty and now we return a nullptr. Did
> you check that this can't cause a null pointer exception?

I've changed this to always return an object - simplest and safest solution.  However, in some followup bug we should change signature of that function to return nsresult and return the new object as an out arg.

> 5 seconds after "sessionstore-windows-restored".

Why 5 seconds?  Why not 1 or 2 only?
(In reply to Honza Bambas (:mayhemer) from comment #47)
> Do you have experience with e.g. for a short time non-responding network disks 
> or web storages that may start reacting after some short time again?  We may be 
> prepared for this case.  The retry task capacity may be better based on time 
> then on number.

No, I haven't tested this scenario outside of a debugger, i.e. breakpoints, 
changing return values, etc

(In reply to Honza Bambas (:mayhemer) from comment #48)
> NS_ENSURES are copied from the original code.  I may however call them from
> some new places that invoke these failures.  Anyway, they should not be
> logged.

Personally, I'm not opposed to NS_ENSURE as long they're used in such way that they only log during a truly unexpected condition.

(In reply to Honza Bambas (:mayhemer) from comment #49)
> Why 5 seconds?  Why not 1 or 2 only?

It's just a a rough estimate, it all depends on when we think that data will be needed. I want to give the user a chance to interact with the browser after startup before doing any additional initialization that could risk I/O contention leading to UI pauses ("jank"). On the other hand, delaying initialization too long would lead to blocking on first use or unnecessary prefetching if the user starts visiting new sites and we don't have the list of LS domains.

5 seconds seemed roughly like the amount of time before a user types in a URL or clicks on a link on his homepage that could lead to a localStorage page. However if about:home, facebook.com and google.com are the most common homepages/pinned tabs (we can ask Marketing) and they all use localStorage (about:home will get fixed soon?), then we might as well kick off LS initialization ASAP.. maybe even at "first-paint"
(In reply to Vladan Djeric (:vladan) from comment #50)
> (In reply to Honza Bambas (:mayhemer) from comment #49)
> > Why 5 seconds?  Why not 1 or 2 only?
> 
> It's just a a rough estimate, it all depends on when we think that data will
> be needed. I want to give the user a chance to interact with the browser
> after startup before doing any additional initialization that could risk I/O
> contention leading to UI pauses ("jank"). On the other hand, delaying
> initialization too long would lead to blocking on first use or unnecessary
> prefetching if the user starts visiting new sites and we don't have the list
> of LS domains.
> 
> 5 seconds seemed roughly like the amount of time before a user types in a
> URL or clicks on a link on his homepage that could lead to a localStorage
> page. However if about:home, facebook.com and google.com are the most common
> homepages/pinned tabs (we can ask Marketing) and they all use localStorage
> (about:home will get fixed soon?), then we might as well kick off LS
> initialization ASAP.. maybe even at "first-paint"

After some testing I decided to use "before-first-paint" and trigger 0ms delayed timer from it to start the DB.
(In reply to Honza Bambas (:mayhemer) from comment #51)
> After some testing I decided to use "before-first-paint" and trigger 0ms
> delayed timer from it to start the DB.

If we initialize LocalStorage that early, that means we'll *always* pay the I/O cost on startup, even when there is no need. We're already reading ~35MB of data on startup and any extra I/O could create delays in first paint. We can fix about:home to not use LS relatively easily and then I think the trade-off of extra I/O before first-paint vs reduced risk of main thread I/O becomes contentious.

In any case, I'd be curious to see if Talos startup numbers are affected by this decision.
Also, please share your test numbers :)
Attached patch v2.1 interdiff [FOR REFERENCE] (obsolete) — Splinter Review
Now requesting just feedback.  I will update again with few changes (listed at the bottom) soon anyway.  I believe you are most interested in talos results:

https://tbpl.mozilla.org/?tree=Try&rev=86f7c68cb6f6.  The patch uploaded to try is missing few features (write error propagation to DOM, wait for preload when first filling session only data)


The patch does:
- corrected constness, mainly of DOMStorage and its usage
- comment 44
- storage referencing is now different (see the latest version of the docs for that please)
  - referencing by tasks also during their execution gets rid of the race condition you found
- telemetry probe names updated
- fixed the remove/set/remove optimization by changing taskAdd to taskUpdate when taskRemove is pending for a key
- there is a potential for a rare race condition: when a child process throws its cache away and tries to preload it again sooner then a flush for it is done on the parent process, it would load inconsistent (same as the single process race condition, but harder to solve in IPC model)
  - I fixed that by forcing flush before preload when there is a pending update operation for the scope (cannot happen in a single process model, and extremely rarely in IPC model)
- fixed CloneFrom quota update bug
- fixed mLoadResult = rv in LoadDone
- measuring time spent by initiation of the database including the time needed to load the has-data scopes
- when 1000 update operation fail to flush, we stop accepting writes to the database
- caches in DB thread *parent* are no more being held but only used as single operation helper classes (always new for each thread->Async*() call) ; I realized there is no need to keep the 
mLoaded flag and it just makes the code more complicated
  - thanks that in DOMStorageDBParent::RecvPreload I cannot stop potentially running preload but DOMStorageDBParent::RecvPreload will be removed anyway
- removed unnecessary ENSUREs
- starting the DB in "before-first-paint" ; not starting it when instantiating localStorage manager (was too soon and not necessary)



The following changes that still has to be done:
- propagate write errors up to DOM in IPC
- remove need for sync load in IPC model based on discussion in bug 627635, this will simplify the code as well (maybe in a separate bug rather)
- coalescing of operations on child to prevent sending of large data when modifying just a single key too often
Attachment #701130 - Flags: feedback?(vdjeric)
I'm very busy this week with several performance investigations so I likely won't be able to get you a review until early next week.
(In reply to Vladan Djeric (:vladan) from comment #55)
> I'm very busy this week with several performance investigations so I likely
> won't be able to get you a review until early next week.

No problem.  I will try to update the patch be actually complete and give you a new patch for full review this time.
(In reply to Honza Bambas (:mayhemer) from comment #54)
> https://tbpl.mozilla.org/?tree=Try&rev=86f7c68cb6f6.

If I read the talos numbers correctly, then I see no impact (I assume small reasonable jitter) on ts_paint and tpaint, as well as on dromeo tests and tp5* tests.
No longer depends on: 622112
Blocks: 832540
Blocks: 832541
Blocks: 832544
No longer blocks: 823720
Blocks: 834301
Attached patch v2.1 [Complete patch] (obsolete) — Splinter Review
https://tbpl.mozilla.org/?tree=Try&rev=27a6630bcafc
Attachment #695185 - Attachment is obsolete: true
Attachment #695185 - Flags: superreview?(jst)
Attachment #707084 - Flags: superreview?(ehsan)
Attached patch v2.1 interdiff [FOR REFERENCE] (obsolete) — Splinter Review
The patch introduces:
- corrected constness, mainly of DOMStorage and its usage
- comment 44
- storage referencing is now different (see the latest version of the docs for that please)
  - referencing by tasks also during their execution gets rid of the race condition you've found
- telemetry probe names updated
- fixed the remove/set/remove optimization by changing taskAdd to taskUpdate when taskRemove is pending for a key
- there is a potential for a rare race condition: when a child process throws its cache away and tries to preload it again sooner then a flush for it is done on the parent process, it would load inconsistent (same as the single process race condition, but harder to solve in IPC model)
  - I fixed that by forcing flush before preload when there is a pending update operation for the scope (cannot happen in a single process model, and extremely rarely in IPC model)
- fixed CloneFrom quota update bug
- fixed mLoadResult = rv in LoadDone
- measuring time spent by initiation of the database including the time needed to load the has-data scopes
- when 1000 update operation fail to flush, we stop accepting writes to the database
- propagate write errors up to DOM in IPC
- caches in DB thread *parent* are no more being held but only used as single operation helper classes (always new for each thread->Async*() call) ; I realized there is no need to keep the 
mLoaded flag and it just makes the code more complicated
  - thanks that in DOMStorageDBParent::RecvPreload I cannot stop potentially running preload but DOMStorageDBParent::RecvPreload will be removed anyway
- removed unnecessary ENSUREs
- starting the DB in "before-first-paint" ; not starting it when instantiating localStorage manager (was too soon and not necessary)
- introduced new deterministic test to check for all corner cases with concurrent flushes/preloads for a single origin (scope)
  - also code has been slightly instrumented to support this test (observer notifications)

My only concern is intermittent failure of "case 3" in the new automated test.  It might be that WAL-enabled connection cannot be used to UPDATE on one thread and immediately after to SELECT on the main (or any other) thread.

I'm adding that particular test case just because I cannot reproduce it recently.


However, that test case could be considered as invalid and we may just want to remove it from the test at all.  I have an idea to make completely moot, can be done in a low-priority followup bug.

There are followup bugs to do in the blocks list, like more optimizations and some coding improvements. But in this state the patch is OK to land as is now.
Attachment #707084 - Attachment is obsolete: true
Attachment #707084 - Flags: superreview?(ehsan)
Attachment #707085 - Flags: review?(vdjeric)
Attached patch v2.1 [part5] - changes to tests (obsolete) — Splinter Review
- changes to existing tests
- introduced new deterministic test for database operations
- instrumentation code to allow the deterministic test
Attachment #701130 - Attachment is obsolete: true
Attachment #701130 - Flags: feedback?(vdjeric)
Attachment #707087 - Flags: review?(vdjeric)
Attachment #707084 - Attachment is obsolete: false
Attachment #695191 - Attachment is obsolete: true
Blocks: 748620
Blocks: 836039
Blocks: 836394
Comment on attachment 707085 [details] [diff] [review]
v2.1 interdiff [FOR REFERENCE]

General performance questions:

- Can you share any performance data you've gathered + any other insights you've gained, e.g. the fraction of "beneficial" pre-fetches during a regular browsing session, trade-offs of initializing the DB at various points during startup, profiles of Firefox startup before & after the patch, etc.

- Can you do "before & after" try runs of the Talos "o" & "tp" suites with the latest patch? For the "before" data, submit the parent changeset as a try build. For some reason, there are differences between try builds and regular mozilla-inbound builds for the same rev :( Also, please request 5 instances of each of the two Talos tests for each build (to help with noise)

- The Talos numbers will tell us a bit about the impact on startup & shutdown, but the pages in the Talos page-set use very little LocalStorage. If you have time, it would be great if you could also post timings from a couple of cold & warm startups using your own PC + your real Firefox profile. The Firefox profiles used in Talos tests have empty webappstore.sqlite DBs which is not very realistic. Please use a mechanical hard drive for the test if you have one. I can help you with setting up the test and getting the timings out of about:telemetry.

- I've noticed that LocalStorage scope fetches take slightly longer with WAL than with a rollback journal. This makes sense since reads now have also have to read the WAL. However, we can reduce this regression by more frequently checkpointing the write-ahead. We could investigate this in a follow up bug


>+  virtual DOMStorageManager* GetManager() const { return mManager; }
>+  virtual const DOMStorageCache * GetCache() const { return mCache; }

Typo: consistent spacing for pointer returns

> bool
> DOMStorageCache::ProcessUsageDelta(uint32_t setIndex, const int64_t delta)
> {
[..]
>+  // Keep the preloaded cache alive for a time
>+  KeepAlive();
>+
>   sDatabase->AsyncPreload(this);
> }

What happens if the pre-load takes longer than 60 seconds? e.g. if the computer goes to sleep with the preload request outstanding. Could it cause the DOMStorageCache to be accessed from the helper thread after DOMStorageCache has been freed by the timer callback? If so, you might want to de-allocate the caches in the DOMStorageDBThread loop instead.

>+  nsRefPtr<DOMStorageCacheHolder> observer = new DOMStorageCacheHolder(this);
>+  timer->Init(observer, 60000, nsITimer::TYPE_ONE_SHOT);

- Have you talked to mobile people about increased memory use (e.g. from pre-fetching)?
- Just as an aside, this 60s delay might cause "regressions" in the areweslimyet.com benchmark since it measures memory use 30seconds after navigating through the Tp5 pageset

>     if (DOMStringIsNull(old))
>-      sDatabase->AsyncAddItem(this, key, value);
>+      return sDatabase->AsyncAddItem(this, key, value);
>     else
>-      sDatabase->AsyncUpdateItem(this, key, value);
>+      return sDatabase->AsyncUpdateItem(this, key, value);

I think style guide now requires braces around these control statements

>+bool
>+DOMStorageDBThread::PendingTasks::CheckTaskTransit(Task* newTask,
>+                                                   Task::TaskType pendingType,
>+                                                   Task::TaskType newType)
>+{

This helper method prevents code duplication, but it's a bit hard to understand what it does on its own (despite the documentation in the header). It might be clearer if the "newTask->type != newType" check was done by the callers, or perhaps if the method was renamed to "CheckForOptimization" or "CheckForCoalesceOpportunity" or something more descriptive.

>+    NS_WARNING("Flush operation on localStorage database failed");
>+
>+    return mExecList.Length() < 1000;

So localStorage will be deemed "failed" after we reach a backlog of 1000 operations? Would it be more meaningful to express this threshold in terms of "# of failed consecutive flushes" instead of the size of the backlog? It seems unlikely that anyone will ever reach the 1000 backlogged DB operations limit + we could end up in a state where the storage system is AWOL (e.g. USB stick) but we keep trying to flush the same 5 or 10 operations forever.

>-DOMStorageDBParent::PutCache(const nsACString &scope)
>+DOMStorageDBParent::Cache(const nsACString &scope)

Nit: I think methods that start with verbs are a bit clearer

>+// For now we want to start the database immediately
>+// in the first before-first-paint.
>+static const char kStartupTopic[] = "before-first-paint";
>+static const uint32_t kStartupDelay = 0;

So if I'm understanding correctly, you're setting up the timer on the before-first-paint event, then presumably the first paint happens, then the timer callback spawns the helper thread, and the helper thread opens the DB. Did you evaluate the impact of this change on the time required to reach the "session-restored" milestone?

>+  "LOCALDOMSTORAGE_INIT_DATABASE_MS": {
>+    "kind": "exponential",
>+    "high": "3000",
>+    "n_buckets": 10,
>+    "description": "Time to open the localStorage database (ms)"
>+  },

Can you also add a probe for shutdown time? i.e. time to PR_JoinThread. It will help us catch any unexpected shutdown behavior.
Attachment #707085 - Flags: review?(vdjeric)
Comment on attachment 707087 [details] [diff] [review]
v2.1 [part5] - changes to tests

Why did you remove the frameBug624047 regression test?
Attachment #707087 - Flags: review?(vdjeric)
(In reply to Vladan Djeric (:vladan) from comment #64)
> >+  virtual DOMStorageManager* GetManager() const { return mManager; }
> >+  virtual const DOMStorageCache * GetCache() const { return mCache; }
> 
> Typo: consistent spacing for pointer returns

And * should go to the left :)

> >     if (DOMStringIsNull(old))
> >-      sDatabase->AsyncAddItem(this, key, value);
> >+      return sDatabase->AsyncAddItem(this, key, value);
> >     else
> >-      sDatabase->AsyncUpdateItem(this, key, value);
> >+      return sDatabase->AsyncUpdateItem(this, key, value);
> 
> I think style guide now requires braces around these control statements

Yep. Also, no else after a return.
I think we should find a second reviewer for this patch before landing. The patch is quite sizeable (400KB, 27 new classes) and a second pair of eyes would help catch things I've missed + fill in the gaps where I lack experience (e.g. the IPC code).

If Olli has the time, he might be a good choice since he's module peer + he already has some of the context from reviewing my patch in bug 807021.
I've been testing the per-window private browsing changes.  I raised bug 837091 related to cookie changed notifications from private browsing windows being received in non-private browsing windows, and now I'm seeing the same sort of thing with DOM storage.

If I open a particular document in both a private browsing window and a normal window, then one of them puts something in local storage, the other one receives a storage event.  I only just found this because I was being bamboozled by the fact that the document making the local storage change doesn't get the event.

So a bit of a privacy issue.  Also a nightmare to use the events since the storage area in the event is not the storage area in the window receiving it.
I also came across an issue related to permissions, something that has changed in the rewrite but I think was correct before.

DOM storage is driven off the cookie permissions.  I set cookie permissions to "session only", cookies to be discarded at the end of the session.  Prior to this bug, this would result in local storage being written normally and then wiped at the end of the session.  Presumably never committed to disk?  The behaviour now appears to be that local storage is never written when cookie permissions are to discard at the end of the session.  I don't seem to get any errors, but writing an item to local storage leaves me with a local storage object of zero length.
Ian, to confirm, are you testing the patches attached here and reporting issues caused by them?
(In reply to Josh Matthews [:jdm] from comment #70)
> Ian, to confirm, are you testing the patches attached here and reporting
> issues caused by them?

Yes, I'm running a try build from a few days ago.  Seems silly to raise bugs elsewhere on code that isn't even reviewed yet, although the private browsing event issue is not new.
Thanks for clearing that up. Could you file the storage event issue as a new bug? I'll need to fix that for FF 20, so it requires its own space.
Raised bug 838541 for storage events firing across the private browsing boundary.
(In reply to Ian Nartowicz from comment #73)
> Raised bug 838541 for storage events firing across the private browsing
> boundary.

Thanks!
Sorry, I think I was wrong about the permissions.  Testing with various versions from FF4 to FF17ESR, it seems that local storage is not allowed when cookie permissions are set to session only.  The behaviour with this patch seems to match.  I'm still a little unsure because I have code from a couple of years back that says different, but this is what I see today.  Is it correct though?  Seems that ideally local storage would be allowed in such situation, just like cookies are allowed, but wiped at the end of the session.

However, session storage was allowed in those old versions even when cookie permissions were set to session only.  Makes sense I guess.  With this patch, I don't seem to be able to write to session storage unless the cookie permissions are set to allow all instead of allow for the session.
I'd expect localStorage in private browsing to behave just like sessionStorage, bascially. ;-)
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #76)
> I'd expect localStorage in private browsing to behave just like
> sessionStorage, bascially. ;-)

To clarify, I was talking about storage permissions generally and not just in private browsing.  In private browsing, the behaviour is basically the same, except that everything is thrown away when the session ends.  Or now probably when the last private browsing window closes?  Edit: see next paragraph.

I just did some more testing, all with cookie permissions set to session only, and the plot thickens.  What I see in my try build is that normal browsing windows behave as described in comment 75: DOM storage is denied.  However, in private browsing mode, with the same cookie permissions, DOM storage is allowed.  Obviously in a private browsing window all the storage items will be discarded at the end of the session so this may be intended behaviour.  However allowing DOM storage items to be written, even temporarily, in a private browsing window but not in a normal window is probably not right.

I do hope my testing is correct.  For so long it has been hard to reliably spot storage items from chrome, and I still distrust what my eyes tell me :)
Ian, you cannot access sessionStorage when cookies are set to session-only?  With the current code or with this new patch?

Now I can see we only have test for localStorage behavior in session-only mode.  I will introduce a test for sessionStorage as well and check.

This may be a separate bug, long lasting.  I'll report asap my build env is back on line (should be soon).
(In reply to Vladan Djeric (:vladan) from comment #64)
> Comment on attachment 707085 [details] [diff] [review]
> v2.1 interdiff [FOR REFERENCE]
> 
> General performance questions:
> 
> - Can you share any performance data you've gathered + any other insights
> you've gained, e.g. the fraction of "beneficial" pre-fetches during a
> regular browsing session, trade-offs of initializing the DB at various
> points during startup, profiles of Firefox startup before & after the patch,
> etc.

Regarding startup: I was trying to find a place to open the DB as late as possible before first real demand for localstorage.  Doing it sooner caused to preload before even UI showed up and could potentially block other loading ops.  Doing it later was useless since about:home already demanded localstorage.  After usage of localStorage is removed from about:home, we can try moving forward, but I suspect when e.g. google.com is a homepage we are at the same state.

On the other hand, I'm currently testing only on my older mac with mechanical hdd, since nightly with telemetry on has serious troubles for me on win (bug 805817).  So I cannot collect data there.

Usually significant number (even 100%) of preloads happens before first localstorage access.  The blocking it self is extremely short (milliseconds top, or even unmeasurable, 0ms).

Opening database ones took 250ms, but it could be just because of I was building Firefox in the background.  I tried to restart few times when being "resource-free" and the time is usually about 5-10ms (quit good!).  But this is a hot start, so things are probably cached by the system.

> 
> - Can you do "before & after" try runs of the Talos "o" & "tp" suites with
> the latest patch? For the "before" data, submit the parent changeset as a
> try build. For some reason, there are differences between try builds and
> regular mozilla-inbound builds for the same rev :( 

PGO?

> Also, please request 5
> instances of each of the two Talos tests for each build (to help with noise)

I already did this compare, but didn't share it.  I didn't see any changes (at least significant) in the numbers.  I will do this again after update and share.

> 
> - The Talos numbers will tell us a bit about the impact on startup &
> shutdown, but the pages in the Talos page-set use very little LocalStorage.
> If you have time, it would be great if you could also post timings from a
> couple of cold & warm startups using your own PC + your real Firefox
> profile. The Firefox profiles used in Talos tests have empty
> webappstore.sqlite DBs which is not very realistic. Please use a mechanical
> hard drive for the test if you have one. I can help you with setting up the
> test and getting the timings out of about:telemetry.

Warm starts: 5-10ms.
Cold starts (first nightly start after system restart):
- when starting nightly along with other apps very very soon after desktop allows it and some 4 apps are starting concurrently: 9700ms, whole nightly start takes... ages :)
- after all apps are up (skype, itunes, vidyo): 70-350ms, nightly starts very acceptably fast

Hard to say from these test what influence on startup time this has.  Also remember the thread has lower priority.  On win we can turn it to even have low i/o prio.

> What happens if the pre-load takes longer than 60 seconds? e.g. if the
> computer goes to sleep with the preload request outstanding. Could it cause
> the DOMStorageCache to be accessed from the helper thread after
> DOMStorageCache has been freed by the timer callback? If so, you might want
> to de-allocate the caches in the DOMStorageDBThread loop instead.

I more think we should KeepAlive the cache when the load is done.  For the time of the preload cache is referenced (and thus kept alive) by the bck thread.

> 
> >+  nsRefPtr<DOMStorageCacheHolder> observer = new DOMStorageCacheHolder(this);
> >+  timer->Init(observer, 60000, nsITimer::TYPE_ONE_SHOT);
> 
> - Have you talked to mobile people about increased memory use (e.g. from
> pre-fetching)?
> - Just as an aside, this 60s delay might cause "regressions" in the
> areweslimyet.com benchmark since it measures memory use 30seconds after
> navigating through the Tp5 pageset

I didn't talk to them, this is an interesting point.  We may even want to shorten the time to say 20s.  It's hard to say when a page is about to call window.localStorage to grab and and keep the object for life time of the window.  I assume it's going to be soon.

> >+bool
> >+DOMStorageDBThread::PendingTasks::CheckTaskTransit(Task* newTask,
> >+                                                   Task::TaskType pendingType,
> >+                                                   Task::TaskType newType)
> >+{
> 
> This helper method prevents code duplication, but it's a bit hard to
> understand what it does on its own (despite the documentation in the
> header). It might be clearer if the "newTask->type != newType" check was
> done by the callers, or perhaps if the method was renamed to
> "CheckForOptimization" or "CheckForCoalesceOpportunity" or something more
> descriptive.

I like CheckForCoalesceOpportunity.  Also, bug 832540 will remove this completely.  I will use technique of coalescing closer to your current code.

> 
> >+    NS_WARNING("Flush operation on localStorage database failed");
> >+
> >+    return mExecList.Length() < 1000;
> 
> So localStorage will be deemed "failed" after we reach a backlog of 1000
> operations? Would it be more meaningful to express this threshold in terms
> of "# of failed consecutive flushes" instead of the size of the backlog? It
> seems unlikely that anyone will ever reach the 1000 backlogged DB operations
> limit + we could end up in a state where the storage system is AWOL (e.g.
> USB stick) but we keep trying to flush the same 5 or 10 operations forever.

Agree, this was just a first idea of how to impl the limit.  Also, I was testing with an sd card, a usb stick and a net drive.  After hard (non-gracefull) removal or NIC shutdown, sqlite and sys buffers ate the changes as "flushed".  After replugging the card/stick or turning NIC up again writes were failing anyway, so retry is IMO not possible.  This is how win7 behaves.

But trying say 5 flushes tops is a good idea.

> >+// For now we want to start the database immediately
> >+// in the first before-first-paint.
> >+static const char kStartupTopic[] = "before-first-paint";
> >+static const uint32_t kStartupDelay = 0;
> 
> So if I'm understanding correctly, you're setting up the timer on the
> before-first-paint event, then presumably the first paint happens, then the
> timer callback spawns the helper thread, and the helper thread opens the DB.
> Did you evaluate the impact of this change on the time required to reach the
> "session-restored" milestone?

No, I think.  Will do that.

> 
> >+  "LOCALDOMSTORAGE_INIT_DATABASE_MS": {
> >+    "kind": "exponential",
> >+    "high": "3000",
> >+    "n_buckets": 10,
> >+    "description": "Time to open the localStorage database (ms)"
> >+  },
> 
> Can you also add a probe for shutdown time? i.e. time to PR_JoinThread. It
> will help us catch any unexpected shutdown behavior.

I will also check how the exit(0) work can influence this.


(In reply to Vladan Djeric (:vladan) from comment #65)
> Comment on attachment 707087 [details] [diff] [review]
> v2.1 [part5] - changes to tests
> 
> Why did you remove the frameBug624047 regression test?

bug 624047 test and all its parts were removed since those were strictly implementation specific test that took 15+s to execute.  Now we have test_bug600307-DBOps.html that does much more in a fully deterministic way and in few hundreds of milliseconds only.
(In reply to Honza Bambas (:mayhemer) from comment #78)
> Ian, you cannot access sessionStorage when cookies are set to session-only? 
> With the current code or with this new patch?
> 
> Now I can see we only have test for localStorage behavior in session-only
> mode.  I will introduce a test for sessionStorage as well and check.
> 
> This may be a separate bug, long lasting.  I'll report asap my build env is
> back on line (should be soon).

So, I've adapted test_cookieSession-phase1.html, test_cookieSession-phase2.html and test_localStorageBaseSessionOnly.html to work with sessionStorage.

test_cookieSession fails for the current code, but not that accessing sessionStorage would throw an exception, however items are not correctly removed (reverted) when resetting the cookie behavior back to default.  With the patch for this bug (the new DOM storage code) the test behaves as expected.

test_sessionStorageBaseSessionOnly works correctly with the current code and with the new code as well.

Try run for the current (faulty) code: https://tbpl.mozilla.org/?tree=Try&rev=8c91901d9a60
This is all very confusing.  Some more testing suggests that session storage can be set by content when the cookie permissions are session only, but my chrome javascript cannot see them (no private browsing here).  When cookies are fully allowed, chrome javascript can see them using window.content.sessionStorage.  It is as if chrome is being shown a different storage area when the cookie permissions (global or by site exception) are session only.  I've tried to confirm this with Firebug, but it uses an old workaround to find storage items and it seems to be buggy with the FF21 build.
(In reply to Ian Nartowicz from comment #81)

> When cookies are fully allowed, chrome javascript can see them using 
> window.content.sessionStorage

OK, and when cookies are session-only and you access window.content.sessionStorage, you don't get what you expect?  And, with the original (current) code OR with patch for this bug?
(In reply to Vladan Djeric (:vladan) from comment #64)
> So if I'm understanding correctly, you're setting up the timer on the
> before-first-paint event, then presumably the first paint happens, then the
> timer callback spawns the helper thread, and the helper thread opens the DB.
> Did you evaluate the impact of this change on the time required to reach the
> "session-restored" milestone?

As I'm thinking of this, I am not sure how to actually measure this.  Do you have an idea how to do this?  I'm also worried this needs some system cache flushing that is hard to do 100% well ; warm starts are very fast, I expect very low influence.  This makes automation hard.  Since the open happens on the background thread it will not block execution time of the main thread radically.  But it may influence I/O on the main thread happening during start.  That is also a reason I wanted to put this to before-first-paint, expecting that critical stuff has loaded that time.  I though don't have that good knowledge of how startup actually works.  So I'd like to ask for some help here.  Thanks.
(In reply to Honza Bambas (:mayhemer) from comment #82)
> (In reply to Ian Nartowicz from comment #81)
> 
> > When cookies are fully allowed, chrome javascript can see them using 
> > window.content.sessionStorage
> 
> OK, and when cookies are session-only and you access
> window.content.sessionStorage, you don't get what you expect?  And, with the
> original (current) code OR with patch for this bug?

Found it!  In canUseStorage, mIsSessionOnly is set for content based on the cookie permissions, but is not set for chrome because it bailed out earlier.  So chrome sees the default area, but content sees the session only area.  In private browsing windows, both content and chrome see the private browsing area and all is well.
(In reply to Honza Bambas (:mayhemer) from comment #78)
> This may be a separate bug, long lasting.  I'll report asap my build env is
> back on line (should be soon).

bug 365772 is the first mention I found of it.
bug 771890 is the most recent.
(In reply to Mardeg from comment #85)
> (In reply to Honza Bambas (:mayhemer) from comment #78)
> > This may be a separate bug, long lasting.  I'll report asap my build env is
> > back on line (should be soon).
> 
> bug 365772 is the first mention I found of it.
> bug 771890 is the most recent.

There are quite a few bugs related to DOM storage throwing or being inaccessible, most of them sorted by this rewrite.  These two are unrelated to the one I describe.  Some more are: bug 709238 and bug 710204.  All fixed by this rewrite so far as I can tell.

The particular bug I describe was effectively present in the original code, but couldn't possibly be observed because chrome code simply couldn't access sessionStorage and localStorage directly.  In fact because of the workarounds necessary to access DOM storage from chrome code, the previous versions (prior to this patch) worked better in the particular case of session only cookie permissions.
Vladan, please see comment 83.
Flags: needinfo?(vdjeric)
(In reply to Vladan Djeric (:vladan) from comment #64)
> So if I'm understanding correctly, you're setting up the timer on the
> before-first-paint event, then presumably the first paint happens, then the
> timer callback spawns the helper thread, and the helper thread opens the DB.
> Did you evaluate the impact of this change on the time required to reach the
> "session-restored" milestone?

The patches as are (timer(0) at before-first-paint) *do have impact*.

Tested with an optimized build.

On i7@4Ghz/ssd machine I can see 3ms average regression, 1ms mean value regression.  Stddev of samples w/ preload is 9.3ms, of samples w/o preload 18ms.  So hard to say.

On Core2@2Ghz/hdd old imac I can see 4.8ms average regression, 1.2 mean value gain (probably just bad set of values, I don't know whether mean value is proper for compare here).  Stddev of sets are 60.4 and 59ms.

Anyway, which ever average function I use (arithmetic, geometric, harmonic) I'm getting larger number in both scenarios when doing the preload.

Based on that, this seems not to be a good way to start the database.  Maybe the idea of the preload is bad at all.  If the session-restored point is a major milestone, we could fire the timer from there with some or zero delay.
(In reply to Honza Bambas (:mayhemer) from comment #83)
> As I'm thinking of this, I am not sure how to actually measure this.  Do you
> have an idea how to do this?

I don't know what a "typical session restore" page set would look like, so maybe the next best option would be to use pages from a browser benchmark? e.g. this Lifehacker browser benchmark I looked into recently: http://lifehacker.com/5575535/how-we-test-web-browsers

>  I'm also worried this needs some system cache
> flushing that is hard to do 100% well ; warm starts are very fast, I expect
> very low influence. 

Right. I would suggest testing cold & warm startups separately. You can test the cold startup (unprepared disk cache) by rebooting your computer. You should also probably wait for the other programs to stop using the hard drive (i.e. for the hard drive to stop whirring) before launching Firefox after a reboot.

> This makes automation hard.  

I think doing manual testing across 5 runs for each scenario is enough to answer our questions for now. We can try to draw more general conclusions from Telemetry field data.

> Since the open happens on
> the background thread it will not block execution time of the main thread
> radically.  But it may influence I/O on the main thread happening during
> start.  That is also a reason I wanted to put this to before-first-paint,
> expecting that critical stuff has loaded that time.

I think xul.dll, startup cache, extensions and a few other things will be in memory by the time "before-first-paint" fires, but various other Firefox components read in configuration/state from disk after "before-first-paint".
Flags: needinfo?(vdjeric)
Could somebody from the QA team do the manual warm/cold start up testing?


I'm just about to push to try a new version.  The start up is implemented as following:
- preloads for "about:blank" that would invoke early db start are disabled (hardcoded)
- I do: observe("sessionstore-windows-restored").timeout(0).startDB()

After the sessionstore-windows-restored point the database will start very soon anyway by showing about:home or a regular page from session restore.

I have an interesting data on this scenario, having an http:// homapage that has some localstorage data:

INIT: for scope 0x0036f0fc "knalb.:about" << until the home page loads, the tab origin is about:blank that is prevented from performing a preload and thus preventing initiation of the database
NOTIFY: 0x52954cd0 "content-document-global-created"
NOTIFY: 0x528f24b8 "before-first-paint"
NOTIFY: 0x0cfc2f90 "inner-window-destroyed"
NOTIFY: 0x528f24b8 "before-first-paint"
NOTIFY: 0x0d279668 "browser-window-before-show"
NOTIFY: 0x529244d8 "user-interaction-active"
NOTIFY: 0x52926c70 "ime-enabled-state-changed"
NOTIFY: 0x0d2c9e10 "StartDocumentLoad"
NOTIFY: 0x0cfd57b8 "FailDocumentLoad"
NOTIFY: 0x52821598 "http-on-opening-request"
NOTIFY: 0x0d5a4b48 "sessionstore-windows-restored"  << our milestone, so far unaffected
NOTIFY: 0x0d40be78 "sessionstore-state-write"
NOTIFY: 0x0d6c6080 "browser-delayed-startup-finished"
NOTIFY: 0x529bbbcc "xul-window-visible"
NOTIFY: 0x528f1380 "document-shown"
NOTIFY: 0x528f1380 "document-shown"
NOTIFY: 0x528f1380 "document-shown"
NOTIFY: 0x5282eef8 "cookie-db-read"
NOTIFY: 0x528215b0 "http-on-modify-request"  << http channel of the homepage opens
INIT: 	xul.dll!mozilla::dom::DOMStorageDBThread::Init()  << on main thread, we init the db
	xul.dll!mozilla::dom::DOMStorageCache::StartDatabase() 
	xul.dll!mozilla::dom::DOMStorageObserver::Observe()  << from the obsever
	xul.dll!nsTimerImpl::Fire() 
	xul.dll!nsTimerEvent::Run() 
NOTIFY: 0x528f1380 "document-shown"
NOTIFY: 0x0d75d668 "sessionstore-state-write-complete"
START: Function: mozilla::dom::DOMStorageDBThread::InitDatabase(void), Thread: 0xDC0 localStorage DB
NOTIFY: 0x52821604 "http-on-examine-cached-response"
END: Function: mozilla::dom::DOMStorageDBThread::InitDatabase(void), Thread: 0xDC0 localStorage DB << DB opening on the background thread is done
NOTIFY: 0x529148d8 "dom-window-destroyed"
INIT: for scope 0x0036f028 "7i.lacol.tset.:https:443"  << preloading the storage


NOTIFY are traces of ObserverService::Notify() calls.
INIT are logs of some calls of the DOMStorage code, I think self explanatory after some basic examination.

The point is that we init the database while the http channel of the homepage waits for the net or cache.  Which is IMO perfect timing.
QA: see comment 91.  (Honza, you might want to push a build to try so they can download it w/o having to build w/patch themselves?)
Keywords: qawanted
To QA people:

What we need to check on is how these patches influence cold and warm Firefox start up time (the patches opens localStorage database at some point during the start, what in the current Firefox code doesn't happen.)

- try builds with most recent patches at [1]
- to compare, at [2] will soon be optimized builds of the changeset the try run [1] is based on

[1] https://tbpl.mozilla.org/?tree=Try&rev=1326b9c22816
[2] https://tbpl.mozilla.org/?tree=Try&rev=1be53796f505

Thanks!
Honza: Would it make sense to remove all call-sites to the preloading code and check that in as a first step? Or does the patch change enough behavior that even that can change performance enough that we need to do the same types of tests?
(In reply to Jonas Sicking (:sicking) from comment #94)
> Honza: Would it make sense to remove all call-sites to the preloading code
> and check that in as a first step? Or does the patch change enough behavior
> that even that can change performance enough that we need to do the same
> types of tests?

Also might be worth checking, but...  If we remove the early database open (I think this covers yours "all call-sites to the preloading code") then the change is that localStorage for a page pre-loads on a background thread when the window gets the document.  In the current code localStorage opens the DB + loads the origin data when window.localStorage is first accessed by a content script, all done on the main thread.

Since we really are concern only about the start up time influence and the way we (pre)read data from the localStorage database in the new code cannot influence start up time milestones, I think it's more important to check against the current code then just against tweaked new code.

Testing whether page load time is better is another task, that can be automated by talos.
Attached patch v2.2 [Complete patch] (obsolete) — Splinter Review
- interdiff between 2.1 and 2.2 will be supplied as well
- patch parts follows as well
- recent jdm's changes to private event leaks have been merged to this patch

Try runs, including talos:
https://tbpl.mozilla.org/?tree=Try&rev=1326b9c22816
https://tbpl.mozilla.org/?tree=Try&rev=706446aa2fc2 (clobbered)
Attachment #695186 - Attachment is obsolete: true
Attachment #695187 - Attachment is obsolete: true
Attachment #695188 - Attachment is obsolete: true
Attachment #695189 - Attachment is obsolete: true
Attachment #695190 - Attachment is obsolete: true
Attachment #707084 - Attachment is obsolete: true
Attachment #707085 - Attachment is obsolete: true
Attachment #707087 - Attachment is obsolete: true
Attachment #714711 - Flags: review?(vdjeric)
Attachment #714711 - Flags: review?(bugs)
- white space ignoring patch to prevent bloat with * and & placement corrections
- removed use of nsIPrivacyTransitionObserver
- DOMStorageCache::Release called on a non-main thread calls Release directly when dispatch of release call to the main thread fails (it was causing shutdown leaks of caches)
- keeping the cache alive after it has been preloaded (instead after the object it self has been created)
- keeping alive only for 20 seconds (however, I checked with mobile people the timeout can be as long as we wish to)
- LOCALDOMSTORAGE_SHUTDOWN_DATABASE_MS telemetry introduced
- s/CheckTaskTransit/CheckForCoalesceOpportunity/
- DB switches to a failure state after 5 continuously failed flushes
- s/DOMStorageDBParent::Cache/DOMStorageDBParent::NewCache/
- preload for "about:blank" pages is hard-blocked when database is still not up (then its blocked by fact there are no data for about:blank scope to preload)
- localStorage database is opened as: observe("sessionstore-windows-restored").timeout(0).startDBthread() ; the DB thread then opens the DB (see comment 91 for more info)
So the complete patch contains all the changes the other non-obsolete patches have?
(In reply to Olli Pettay [:smaug] from comment #104)
> So the complete patch contains all the changes the other non-obsolete
> patches have?

Yes, you may chose to review the whole patch or each part separately.  If you combine par1 - part5 patches (they don't overlap, btw) you get the complete patch.
Comment on attachment 714711 [details] [diff] [review]
v2.2 [Complete patch]

:mak, adding you as a reviewer here, please check the sqlite parts.  It is all in just a single file - DOMStorageDBThread.cpp/.h.  Thanks.
Attachment #714711 - Flags: review?(mak77)
No longer blocks: 836039
Here is a summary of Talos runs from comment 93:

Values are medians rather then average.

System:      unpatched vs patched: -speedup/+slowdown

o/tscrollr_paint
Fedora32:    10055.6 vs 10055.5: -0.1
Fedora64:    10055.5 vs 10055.2: -0.3
OS X 10.8:   11308.8 vs 11308.4: -0.4
Windows 7:   10097.4 vs 10091.1: -6.3

tp/tp5n_paint
Fedora32:    353.9 vs 351.4: -2.5
Fedora64:    311.1 vs 310.2: -0.9
OS X 10.8:   170.4 vs 169.4: -1.0
Windows 7:   296.7 vs 296.3: -0.4

o/ts_paint
Fedora32:    770.4 vs 776.2: +5.8
Fedora64:    721.5 vs 723.5: +2.0
OS X 10.8:   664.2 vs 658.8: -5.4 (stdev unpatched: 13.9)
Windows 7:   805.3 vs 804.4: -0.9 (stdev unpatched: 32.3)

o/tpaint
Fedora32:    239.0 vs 239.0:  0.0
Fedora64:    225.5 vs 221.0: -4.5
OS X 10.8:   258.0 vs 250.0: -8.0 (stdev both ~10.5)
Windows 7:   235.0 vs 232.0: -3.0


Not sure what to think of ts_paint..
How stable are those numbers. I would expect to see differences bigger than that simply due to noise, but if you've retested multiple times and the numbers are consistently worse, then that might be a problem.
Comment on attachment 714711 [details] [diff] [review]
v2.2 [Complete patch]

Review of attachment 714711 [details] [diff] [review]:
-----------------------------------------------------------------

I just skimmed through the SQLite parts, so this is not to be considered a review for all of DOMStorageDBThread.cpp logic.

Apart some possible hints, I also have some questions, especially regarding the mixed usage of the same connection across 2 threads.

::: dom/src/storage/DOMStorageDBThread.cpp
@@ +44,5 @@
> +}
> +
> +nsresult
> +DOMStorageDBThread::Init()
> +{

You must get a handle mozIStorageService here while you are on main-thread, since in bug 836493 we are changing it to be like that and avoid race conditions, just get a handle so you are sure the service is up, you can do whatever you want later (there's no need to pass this handle down the chain)

@@ +107,5 @@
> +  }
> +
> +  // Bypass sync load when an update is pending in the queue to write, we would
> +  // get incosistent data in the cache
> +  if (mWALModeEnabled) {

Could you please better explain me why wal makes a difference here?

@@ +367,5 @@
> +    rv = service->OpenUnsharedDatabase(mDatabaseFile, getter_AddRefs(mConnection));
> +  }
> +  NS_ENSURE_SUCCESS(rv, rv);
> +
> +  mozStorageTransaction transaction(mConnection, false);

I there a specific reason you don't set journal and other eventual options before starting the migration?
To me looks like would be logically cleaner to TryJournalMode() just after opening.
As an additional hint, you may even try to use PRAGMA locking_mode = EXCLUSIVE; just before setting the WAL journal for the first time, that would make so we don't need a shared memory file (.shm) and speed up some of the wal internals (less locking). The downside is that it locks the database to your connection so nothing else can access it, I don't think we care about this since you appear to have just one connection.

@@ +369,5 @@
> +  NS_ENSURE_SUCCESS(rv, rv);
> +
> +  mozStorageTransaction transaction(mConnection, false);
> +
> +  // Ensure Gecko 1.9.1 storage table

Instead of doing these ensure queries every time, you may consider using Get/SetSchemaVersion, the first time you won't find the version and may set it to 0, do everything and bump it to 1, next times you will know you already prepared the database properly.
The logic we usually follow is:
if (schema == CURRENT)
  return;
if (schema < SCHEMA_N)
  upgrade_db_to_N()
if (schema < CURRENT_SCHEMA) {
  upgrade_db_to_CURRENT()
}
schema = CURRENT_SCHEMA;

this way downgrades set the number back and on upgrade it runs again the upgrade steps (this is the most common behavior, it enforces you to have backwards compatible upgrades that can be applied multiple times though, but that's not too hard, indeed your upgrade here has already that protection)
in the end you would save some cpu and IO cycles, that even if are on another thread, still exist (moreover you are executing an empty transaction out of wal mode here, that is basically a direct fsync call)

@@ +446,5 @@
> +                                    getter_AddRefs(stmt));
> +  NS_ENSURE_SUCCESS(rv, rv);
> +  mozStorageStatementScoper scope(stmt);
> +
> +  while (NS_SUCCEEDED(rv = stmt->ExecuteStep(&exists)) && exists) {

Don't think you need the rv assignment here

@@ +468,5 @@
> +    MOZ_STORAGE_UNIQUIFY_QUERY_STR "PRAGMA journal_mode = ");
> +  if (aIsWal)
> +    stmtString.AppendLiteral("wal");
> +  else
> +    stmtString.AppendLiteral("truncate");

why not directly passing the literal instead of an unreadable bool?

@@ +485,5 @@
> +  nsAutoCString journalMode;
> +  rv = stmt->GetUTF8String(0, journalMode);
> +  NS_ENSURE_SUCCESS(rv, rv);
> +  if ((aIsWal && !journalMode.EqualsLiteral("wal")) ||
> +      (!aIsWal && !journalMode.EqualsLiteral("truncate"))) {

and here you could just compare the literals

@@ +505,5 @@
> +    rv = SetJournalMode(false);
> +    NS_ENSURE_SUCCESS(rv, rv);
> +  }
> +  else {
> +    mWALModeEnabled = true;

here you want to PRAGMA wal_autocheckpoint = N (pages), in Cookies/Places we use 512KB (N=16). The problem is that we changed the page_size from 1024 to 32768 some years ago, and I don't know if there could be any webappsstore.sqlite using the old page size in the wild... if so, you should first read page_size and do a calculation (wanted_wal_size / page_size).
The default wal size is 32MB, that means in this case there will be a single checkpoint at shutdown, and read performances may degrade with a large wal.  Likely not what we want.

Another useful pragma, for both the journal modes, is PRAGMA journal_size_limit = M, the journal always grows and is never truncated, this means that if you run a very large transaction it may become even bigger than the database itself. This makes so that if the journal goes over the specified size, it is truncated at the next checkpoint. I usually set M = 3N.

@@ +666,5 @@
> +    // Main thread synchronous load must use a different statememt otherwise
> +    // the thread-shared cached statement gets out of sync, e.g. mParams may
> +    // get flooded or nullified when accessed concurently.
> +    nsCOMPtr<mozIStorageStatement> stmt = thread->mStatements.GetCachedStatement(
> +      nsDependentCString(NS_IsMainThread() ? SYNCLOAD_STATEMENT : PRELOAD_STATEMENT)

I'm not sure what happens here, I though this was all running on a separate thread, are we using the same connection on two different threads? That looks a bit risky regarding thread contention.
When does this happen on mainthread and when on the other one? Is this just on startup before launching the async thread?

@@ +787,5 @@
> +  case taskClearMatchingScope:
> +  {
> +    nsCOMPtr<mozIStorageStatement> stmt = thread->mStatements.GetCachedStatement(
> +      "DELETE FROM webappsstore2"
> +      " WHERE scope GLOB :scope"

if you have examples of the scopes we may hit here, it's possible this could be made faster. GLOB is slow, and here looks like you are doing scope GLOB token*, that looks basically the same as "scope BETWEEN token AND token || X'FFFF'" (|| is not an OR, it's the SQLite string concatenation operator), that is _much_ faster if you have an index on scope.
Attachment #714711 - Flags: review?(mak77)
Comment on attachment 714722 [details] [diff] [review]
v2.2 [part1] - removal of current implementation files

(I couldn't find any use for nsIDOMStorageObsolete in addon's mxr)
rs=me for the removal assuming other patches look ok.
Obviously this can't land without rest of the patches.
Attachment #714722 - Flags: review+
Comment on attachment 714711 [details] [diff] [review]
v2.2 [Complete patch]

I'll be reviewing the partial patches
Attachment #714711 - Flags: review?(bugs)
Attachment #714723 - Flags: review?(bugs)
Attachment #714724 - Flags: review?(bugs)
Attachment #714725 - Flags: review?(bugs)
Attachment #714726 - Flags: review?(bugs)
Attachment #714727 - Flags: review?(bugs)
Comment on attachment 714723 [details] [diff] [review]
v2.2 [part2] - changes to existing interfaces

I'm a bit worried about changing nsIDOMStorageManager so heavily.
The change will break quite some addons, I believe.
Any chance to stay more backwards compatible?

( https://mxr.mozilla.org/addons/search?string=nsIDOMStorageManager )
Attachment #714723 - Flags: review?(bugs) → review-
It will break almost any addon that uses DOM storage.  It would break mine for sure except that I've already modified it.  nsIDocShell also changed, I think, for the session storage side of things.

However it is worth pointing out that DOM storage is currently broken in many ways for addons, so giving addons that use it a kick might be no bad thing once we have something that works.
Addons really shouldn't be using DOM storage but rather indexedDB. Though of course it's nice if we can give them a heads'up so that they have time to migrate.
A(In reply to Jonas Sicking (:sicking) from comment #114)
> Addons really shouldn't be using DOM storage but rather indexedDB. Though of
> course it's nice if we can give them a heads'up so that they have time to
> migrate.

I agree that addons shoudn't really be using DOM storage for their own storage requirements, but they certainly should be able to look at the storage items that have been set by pages.  Currently they have to jump through hoops to do this, using the interface methods that are being removed by this patch (very very few are doing this, it would be interesting to know just how many, only two that I know of).  Even then it doesn't always work.  When this bug is finished, addons will be able to access content.sessionStorage, etc. directly and it will work all the time.

Addons should also be able to modify DOM storage.  For example my addon allows some or all storage keys to be deleted.  And not just addons.  There are bugs to make Firefox itself display DOM storage in the same way that it displays cookies, which currently cannot be done from a script and would have to be written in C++.
Well, I hope there will be a decent management interface for DOM Storage in the way like we do have interfaces for cookies, passwords, content prefs, etc. - Even right now doing something like I do with Tahoe Data Manager sucks for all kinds of web storage, including DOM Storage, indexedDB, and appCache.
(In reply to Marco Bonardo [:mak] from comment #109)
> Comment on attachment 714711 [details] [diff] [review]
> v2.2 [Complete patch]
> 
> Review of attachment 714711 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I just skimmed through the SQLite parts, so this is not to be considered a
> review for all of DOMStorageDBThread.cpp logic.

Thanks.  This is great feedback.

> @@ +107,5 @@
> > +  }
> > +
> > +  // Bypass sync load when an update is pending in the queue to write, we would
> > +  // get incosistent data in the cache
> > +  if (mWALModeEnabled) {
> 
> Could you please better explain me why wal makes a difference here?

This is based on Vladan's work and claims.  WAL according him allows one thread to write and another to read (a different set of keys).  W/o WAL this may not be possible.

My knowledge of sqlite in this area is not that good, so this might be all wrong.

> 
> @@ +367,5 @@
> > +    rv = service->OpenUnsharedDatabase(mDatabaseFile, getter_AddRefs(mConnection));
> > +  }
> > +  NS_ENSURE_SUCCESS(rv, rv);
> > +
> > +  mozStorageTransaction transaction(mConnection, false);
> 
> I there a specific reason you don't set journal and other eventual options
> before starting the migration?
> To me looks like would be logically cleaner to TryJournalMode() just after
> opening.
> As an additional hint, you may even try to use PRAGMA locking_mode =
> EXCLUSIVE; just before setting the WAL journal for the first time, that
> would make so we don't need a shared memory file (.shm) and speed up some of
> the wal internals (less locking). The downside is that it locks the database
> to your connection so nothing else can access it, I don't think we care
> about this since you appear to have just one connection.

Re EXCLUSIVE locking: in debug mode, I'd rather not use it.  I like to watch changes in the database through e.g. sqlite3 shell.  In opt builds however, this may be good think to do.

> 
> @@ +369,5 @@
> > +  NS_ENSURE_SUCCESS(rv, rv);
> > +
> > +  mozStorageTransaction transaction(mConnection, false);
> > +
> > +  // Ensure Gecko 1.9.1 storage table
> 
> Instead of doing these ensure queries every time, you may consider using
> Get/SetSchemaVersion, the first time you won't find the version and may set
> it to 0, do everything and bump it to 1, next times you will know you
> already prepared the database properly.
> The logic we usually follow is:
> if (schema == CURRENT)
>   return;
> if (schema < SCHEMA_N)
>   upgrade_db_to_N()
> if (schema < CURRENT_SCHEMA) {
>   upgrade_db_to_CURRENT()
> }
> schema = CURRENT_SCHEMA;
> 
> this way downgrades set the number back and on upgrade it runs again the
> upgrade steps (this is the most common behavior, it enforces you to have
> backwards compatible upgrades that can be applied multiple times though, but
> that's not too hard, indeed your upgrade here has already that protection)
> in the end you would save some cpu and IO cycles, that even if are on
> another thread, still exist (moreover you are executing an empty transaction
> out of wal mode here, that is basically a direct fsync call)

I'll do this from now (increment version), but I'll leave the original code (which I'm not author of all of it!) as is.

> 
> @@ +446,5 @@
> > +                                    getter_AddRefs(stmt));
> > +  NS_ENSURE_SUCCESS(rv, rv);
> > +  mozStorageStatementScoper scope(stmt);
> > +
> > +  while (NS_SUCCEEDED(rv = stmt->ExecuteStep(&exists)) && exists) {
> 
> Don't think you need the rv assignment here
> 
> @@ +468,5 @@
> > +    MOZ_STORAGE_UNIQUIFY_QUERY_STR "PRAGMA journal_mode = ");
> > +  if (aIsWal)
> > +    stmtString.AppendLiteral("wal");
> > +  else
> > +    stmtString.AppendLiteral("truncate");
> 
> why not directly passing the literal instead of an unreadable bool?
> 
> @@ +485,5 @@
> > +  nsAutoCString journalMode;
> > +  rv = stmt->GetUTF8String(0, journalMode);
> > +  NS_ENSURE_SUCCESS(rv, rv);
> > +  if ((aIsWal && !journalMode.EqualsLiteral("wal")) ||
> > +      (!aIsWal && !journalMode.EqualsLiteral("truncate"))) {
> 
> and here you could just compare the literals

This is just took from vladan's patch.  Many of the code is.

> 
> @@ +505,5 @@
> > +    rv = SetJournalMode(false);
> > +    NS_ENSURE_SUCCESS(rv, rv);
> > +  }
> > +  else {
> > +    mWALModeEnabled = true;
> 
> here you want to PRAGMA wal_autocheckpoint = N (pages), in Cookies/Places we
> use 512KB (N=16). The problem is that we changed the page_size from 1024 to
> 32768 some years ago, and I don't know if there could be any
> webappsstore.sqlite using the old page size in the wild... if so, you should
> first read page_size and do a calculation (wanted_wal_size / page_size).
> The default wal size is 32MB, that means in this case there will be a single
> checkpoint at shutdown, and read performances may degrade with a large wal. 
> Likely not what we want.

Is this what bug 842852 deals with?

> 
> Another useful pragma, for both the journal modes, is PRAGMA
> journal_size_limit = M, the journal always grows and is never truncated,
> this means that if you run a very large transaction it may become even
> bigger than the database itself. This makes so that if the journal goes over
> the specified size, it is truncated at the next checkpoint. I usually set M
> = 3N.

Same?

> 
> @@ +666,5 @@
> > +    // Main thread synchronous load must use a different statememt otherwise
> > +    // the thread-shared cached statement gets out of sync, e.g. mParams may
> > +    // get flooded or nullified when accessed concurently.
> > +    nsCOMPtr<mozIStorageStatement> stmt = thread->mStatements.GetCachedStatement(
> > +      nsDependentCString(NS_IsMainThread() ? SYNCLOAD_STATEMENT : PRELOAD_STATEMENT)
> 
> I'm not sure what happens here, I though this was all running on a separate
> thread, are we using the same connection on two different threads? That
> looks a bit risky regarding thread contention.
> When does this happen on mainthread and when on the other one? Is this just
> on startup before launching the async thread?

A bug, next patch version will have two connections, main thread r/o and bck thread r/w

> 
> @@ +787,5 @@
> > +  case taskClearMatchingScope:
> > +  {
> > +    nsCOMPtr<mozIStorageStatement> stmt = thread->mStatements.GetCachedStatement(
> > +      "DELETE FROM webappsstore2"
> > +      " WHERE scope GLOB :scope"
> 
> if you have examples of the scopes we may hit here, it's possible this could
> be made faster. GLOB is slow, and here looks like you are doing scope GLOB
> token*, that looks basically the same as "scope BETWEEN token AND token ||
> X'FFFF'" (|| is not an OR, it's the SQLite string concatenation operator),
> that is _much_ faster if you have an index on scope.

Good idea!
Since I want to move forward on this quickly now and we are closing the next merge (OK, 4 weeks, but still...) I want to do only a minimum number of changes to the current patch to land it, and finish all the rest only as followups.


The major changes to add to this patch(es):
- idl compatibility
- two connections (main and worker thread)


All remaining suggestions, mainly comment 109, will be addressed in separate followup bugs.

So, I want to ask all interested parties for really seriously review this stuff and help getting this new code land ASAP.

So far, thanks everybody involved here!
(In reply to Honza Bambas (:mayhemer) from comment #118)
> > Could you please better explain me why wal makes a difference here?
> 
> This is based on Vladan's work and claims.  WAL according him allows one
> thread to write and another to read (a different set of keys).  W/o WAL this
> may not be possible.

From a SQLite point of view this is true, I'm just unsure it's true for Storage since it uses the sqlite mutex to control the connection resource. Plus your setup using the sync API on 2 threads is exotic and sort of unexpected, so I'm not sure we can make assumptions without testing.
Btw, your approach shouldn't hurt afaict, at a maximum is additional flow control and if you move to multiple connections it will be surely useful.

> I'll do this from now (increment version), but I'll leave the original code
> (which I'm not author of all of it!) as is.

Unfortunately that loses the biggest advantage of the migration approach, that is to win some startup time avoiding useless IO work. And if you start doing incremental changes from now on it's unlikely we can go back and introduce a version 0 to save on startup.
At this point would probably be better to leave it as-is and file a bug apart to rewrite all the db initialization properly, before adding any versioning on top.

> > here you want to PRAGMA wal_autocheckpoint = N (pages), in Cookies/Places we
> > use 512KB (N=16). The problem is that we changed the page_size from 1024 to
> > 32768 some years ago, and I don't know if there could be any
> > webappsstore.sqlite using the old page size in the wild... if so, you should
> > first read page_size and do a calculation (wanted_wal_size / page_size).
> > The default wal size is 32MB, that means in this case there will be a single
> > checkpoint at shutdown, and read performances may degrade with a large wal. 
> > Likely not what we want.
> 
> Is this what bug 842852 deals with?

yes, but my assumption on that bug was that it was relative to the current implementation, not this new one, not sure what's the plan, if it's for this version please add dependency.

> A bug, next patch version will have two connections, main thread r/o and bck
> thread r/w

good, one connection per thread is better to avoid any contention issue.
- reintroduced nsIDOMStorageManager.getLocalStorageForPrincipal
- reintroduced nsIDocShell.getSessionStorageForPrincipal and addSessionStorage
  - personally I'd love to remove this one too, the storage cloning is processed using a different and better API right now

- nsIDocShell cloneSessionStoragesTo and clearSessionStorages still left removed, those are unused and should be removed anyway ; also would just make implementation harder with the current approach
Attachment #721265 - Flags: review?(bugs)
- there are now two connections: worker and reader, each with it's own statement cache
- worker is used strictly (ensured by the code logic) on the background I/O thread for all read and write operations
- reader is used strictly on the main thread only for read operations ; actually just in one case of doing a sync load when preload didn't make it (should be rare) and in IPC race case (rare, and will be removed as part of bug 627635)
- moz storage service ensured on the main thread first
Attachment #721266 - Flags: review?(mak77)
Comment on attachment 714724 [details] [diff] [review]
v2.2 [part3] - new implementation files

Olly asked me to delegate sqlite and general database code review to mak.

Marco, can you take a look at those parts (DOMStorageDBThread.* files) please?
Attachment #714724 - Flags: review?(mak77)
Attached patch v2.3 [Complete patch] (obsolete) — Splinter Review
jst, can you please check only the idl file changes?  Compatibility is preserved, I only remove two unused methods on nsIDOMStorageManager that I wanted to remove regardless this DOM storage work anyway (comment 121), none of the two methods is used by extensions.
Attachment #721413 - Flags: superreview?(jst)
(In reply to Honza Bambas (:mayhemer) from comment #124)
> For ref v2.3 try run: https://tbpl.mozilla.org/?tree=Try&rev=14945a899357

Looks like I haven't chosen the best cs to base on:
https://tbpl.mozilla.org/?rev=c95439870e05
So, a better one, v2.3 try run:
https://tbpl.mozilla.org/?tree=Try&rev=b266c5aa822e
Comment on attachment 714724 [details] [diff] [review]
v2.2 [part3] - new implementation files

Review of attachment 714724 [details] [diff] [review]:
-----------------------------------------------------------------

the db part looks ok, modulo the other patch I still have to check.

::: dom/src/storage/DOMStorageDBThread.cpp
@@ +21,5 @@
> +
> +// How long we collect write oprerations
> +// before they are flushed to the database
> +// In milliseconds.
> +#define FLUSHING_INTERVAL 5000

nit: I suggest delegate the unit to the name than to the comment like FLUSHING_INTERVAL_MS

@@ +113,5 @@
> +                     mPendingTasks.IsScopeClearPending(cache->Scope());
> +    }
> +
> +    if (!pendingTasks) {
> +      // Do the load synchronously on the main thread, WAL is enabled.

the second part of the comment looks truncated, it should read as "WAL is enabled, thus..."

@@ +281,5 @@
> +        }
> +
> +        if (!mPendingTasks.Finalize(rv)) {
> +          mStatus = rv;
> +          NS_WARNING("localStorage DB access broken");

Unless there is a test explicitly checking for this, I think a MOZ_ASSERT here could make more sense, it shouldn't be expected.

@@ +636,5 @@
> +      break;
> +
> +    // OFFSET is an optimization when we have to do a sync load
> +    // and cache has already loaded some parts asynchronously.
> +    // It skips keys we have already loaded.

just as a side note, afaik OFFSET doesn't have any kind of optimization in SQLite, it still gets the row and skips it. It should basically be identical to you calling executeStep() offset times. It can save some function call overhead though.

@@ +768,5 @@
> +  case taskClearMatchingScope:
> +  {
> +    nsCOMPtr<mozIStorageStatement> stmt = thread->mStatements.GetCachedStatement(
> +      "DELETE FROM webappsstore2"
> +      " WHERE scope GLOB :scope"

doesn't look like you implemented my suggestion to use between here or in the other patch
Are you keeping optimization for a later time or did you figure something may be wrong?

::: dom/src/storage/DOMStorageDBThread.h
@@ +149,5 @@
> +    // to flush what indicates a long standing issue with the database access.
> +    bool Finalize(nsresult rv);
> +
> +    // true when a clear that deletes the given |scope| is among the pending tasks;
> +    // when a preload for that scope is being scheduled, it must be finished right a way

typo: a way

@@ +214,5 @@
> +  PRThread* mThread;
> +  // The monitor we drive the thread with
> +  Monitor mMonitor;
> +  // Flag to stop, protected by the monitor
> +  bool mStop;

it's not that clear what "stop" means in this context

@@ +232,5 @@
> +  // Flag to force immediate flush of all pending tasks
> +  bool mFlushImmediately;
> +
> +  // List of preloading tasks, in chronological or priority order
> +  nsTArray<Task*> mPreloads;

Could be nice to expand comment explaining who has ownership of these Tasks, or how it's managed

@@ +258,5 @@
> +  // Called when flush of pending tasks is being executed
> +  void UnscheduleFlush();
> +  // - PR_INTERVAL_NO_TIMEOUT when no pending tasks are scheduled
> +  // - some non zero value when tasks have been scheduled, but it is still not
> +  //   time to perform the flush

"some non zero value" is a bit generic, is it microseconds until the next flush?
Attachment #714724 - Flags: review?(mak77) → review+
(In reply to Marco Bonardo [:mak] from comment #128)
> Comment on attachment 714724 [details] [diff] [review]
> v2.2 [part3] - new implementation files
> the db part looks ok, modulo the other patch I still have to check.

Thanks!

> > +        if (!mPendingTasks.Finalize(rv)) {
> > +          mStatus = rv;
> > +          NS_WARNING("localStorage DB access broken");
> 
> Unless there is a test explicitly checking for this, I think a MOZ_ASSERT
> here could make more sense, it shouldn't be expected.

I don't think I agree.  This may happen as part of a normal operation when a hdd or netlink failure occurs that disallows writes to the database.  This is handling of expected error, definitely not something to assert(false) for.

> just as a side note, afaik OFFSET doesn't have any kind of optimization in
> SQLite, it still gets the row and skips it. It should basically be identical
> to you calling executeStep() offset times. It can save some function call
> overhead though.

This will go away in the "don't let remote dom storage block the main thread" bug.

> > +      "DELETE FROM webappsstore2"
> > +      " WHERE scope GLOB :scope"
> 
> doesn't look like you implemented my suggestion to use between here or in
> the other patch
> Are you keeping optimization for a later time or did you figure something
> may be wrong?

I will do this in a followup bug.  I don't want to mess with these patches here more then necessary.  See comment 119 for my plan.

> @@ +258,5 @@
> > +  // Called when flush of pending tasks is being executed
> > +  void UnscheduleFlush();
> > +  // - PR_INTERVAL_NO_TIMEOUT when no pending tasks are scheduled
> > +  // - some non zero value when tasks have been scheduled, but it is still not
> > +  //   time to perform the flush
> 
> "some non zero value" is a bit generic, is it microseconds until the next
> flush?

No, it's interval time.  Hard to say what is proper here.  But for me "non-zero" is quit clear ;)  Result of this method is used two ways, both in the thread loop:
- result is passed to monitor.Wait()
- when result is == 0, do the flush

I can try to explain this a bit better in the comment

Thanks again!
Comment on attachment 721266 [details] [diff] [review]
v2.3 interdiff 2 - added two connection per-thread

Review of attachment 721266 [details] [diff] [review]:
-----------------------------------------------------------------

The idea looks correct, I'm unsure about the concurrency on schema init, could you please clarify it?

::: dom/src/storage/DOMStorageDBThread.cpp
@@ +396,5 @@
>    Telemetry::AutoTimer<Telemetry::LOCALDOMSTORAGE_INIT_DATABASE_MS> timer;
>  
>    nsresult rv;
>  
> +  // Here we are on the worker thread. This opens the worker connection.

a MOZ_ASSERT that this is not on mainthread would be nice.

Btw, all of the schema init is done in the worker, I think it's worth to add a comment to explain how it prevents a synchronous preload to happen while the schema is still being applied by the worker thread, it doesn't look obvious just looking at it

@@ +689,5 @@
>  
> +    StatementCache* statements;
> +    if (MOZ_UNLIKELY(NS_IsMainThread())) {
> +      if (MOZ_UNLIKELY(!thread->mReaderConnection)) {
> +        // Do lazy main thread connection openning

typo "openning"

@@ +739,5 @@
>  
>    case taskAddItem:
>    case taskUpdateItem:
>    {
> +    NS_ASSERTION(!NS_IsMainThread(), "localStorage write operation on main thread?");

while I'm aware there is an effort ongoing to convert NS_ASSERTION to an abort, I think for new code is better to use MOZ_ASSERT
(In reply to Marco Bonardo [:mak] from comment #130)
> Comment on attachment 721266 [details] [diff] [review]
> v2.3 interdiff 2 - added two connection per-thread
> 
> Review of attachment 721266 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> The idea looks correct, I'm unsure about the concurrency on schema init,
> could you please clarify it?

Actually, there is no protection.  Good point.  It can be solved by a simple volatile flag - initially false, when db init is up -> change it to true.  In the sync preload invocation code check this flag is true, otherwise do an async preload.
ok, then I think my positive review is just blocked by protecting the sync preload from a non-inited database. Next patch implementing that should be fine.
(In reply to Marco Bonardo [:mak] from comment #132)
> ok, then I think my positive review is just blocked by protecting the sync
> preload from a non-inited database. Next patch implementing that should be
> fine.

OK, I'll create an interdiff with this fix to let you review + v2.4 full patch to look over it.

Thanks!
- fixes review objections from comment 128 to the DB thread code
- carrying forward mac's r+
Attachment #723742 - Flags: review+
- addresses concerns from comment 130
- main change is addition of a protection flag to prevent main thread access while DB opening is not finished on the background I/O thread

Marco, please also finish review of v2.3 interdiff 2 along with this one (r- or r+), it's still pending.  Thanks.
Attachment #723747 - Flags: review?(mak77)
Attachment #721266 - Flags: review?(mak77) → review+
Comment on attachment 723747 [details] [diff] [review]
v2.4 interdiff 2 - db thread opening vs main thread access concurency

Review of attachment 723747 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/src/storage/DOMStorageDBThread.cpp
@@ +121,5 @@
>  
>    // Bypass sync load when an update is pending in the queue to write, we would
> +  // get incosistent data in the cache.  Also don't allow sync main-thread preload
> +  // when DB open is still pending on the background thread.
> +  if (mWALModeEnabled && mDBOpened) {

could you please add a MOZ_ASSERT(mDBOpened) (or whatever other name it will get) where we open the synchronous connection just as added safety?

@@ +478,5 @@
>    NS_ENSURE_SUCCESS(rv, rv);
>  
> +  // Database open and all initiation operation are done.  Switching this flag
> +  // to true allow main thread to read directly from the database when it's found
> +  // more optimal.

nit: this comment is not scary enough, the db would be totally broken before this point, not just suboptimal, at a point that the sync read would fail immediately :)

::: dom/src/storage/DOMStorageDBThread.h
@@ +219,5 @@
>    // Whether WAL is enabled
>    bool mWALModeEnabled;
> +  // Whether DB has already been open, avoid races between main thread reads
> +  // and pending DB init in the background I/O thread
> +  bool volatile mDBOpened;

the db has not just been opened, also inited and updated, maybe mDBReady?
Attachment #723747 - Flags: review?(mak77) → review+
Comment on attachment 714711 [details] [diff] [review]
v2.2 [Complete patch]

Since Marco made his reviews (thanks!), after discussion with Vladan he doesn't want to do another review round on this.
Attachment #714711 - Flags: review?(vdjeric)
(In reply to Marco Bonardo [:mak] from comment #137)
> could you please add a MOZ_ASSERT(mDBOpened) (or whatever other name it will
> get) where we open the synchronous connection just as added safety?

I'm not sure I understand this requirement... did you want MOZ_ASSERT(!mDBOpened) ?
(In reply to Honza Bambas (:mayhemer) from comment #139)
> (In reply to Marco Bonardo [:mak] from comment #137)
> > could you please add a MOZ_ASSERT(mDBOpened) (or whatever other name it will
> > get) where we open the synchronous connection just as added safety?
> 
> I'm not sure I understand this requirement... did you want
> MOZ_ASSERT(!mDBOpened) ?

No, I'd like an assert than when we try to access the db synchronously it has been properly inited before by the async open. The point where we open the sync connection was looking a promising point, but maybe there's a better one?
That's just to catch possible future coding errors.
Attachment #714726 - Flags: review?(bugs) → review+
Comment on attachment 721265 [details] [diff] [review]
v2.3 interdiff 1 - IDL compatibility

>+nsDocShell::GetSessionStorageForPrincipal(nsIPrincipal* aPrincipal,
>+                                          const nsAString& aDocumentURI,
>+                                          bool aCreate,
>+                                          nsIDOMStorage** aStorage)
>+{
>+    nsCOMPtr<nsIDOMStorageManager> manager = TopSessionStorageManager();
>+    if (!manager)
>+        return NS_ERROR_UNEXPECTED;
>+
>+    if (aCreate)
>+        return manager->CreateStorage(aPrincipal, aDocumentURI,
>+                                      mInPrivateBrowsing, aStorage);
>+
>+    return manager->GetStorage(aPrincipal, mInPrivateBrowsing, aStorage);
>+}
DocShell a bit odd coding style, but please use {} with if/else/while/etc.


>+NS_IMETHODIMP
>+DOMStorageManager::GetLocalStorageForPrincipal(nsIPrincipal* aPrincipal,
>+                                               const nsAString& aDocumentURI,
>+                                               bool aPrivate,
>+                                               nsIDOMStorage * *_retval)
nsIDOMStorage** aRetval
Attachment #721265 - Flags: review?(bugs) → review+
Why is the complete v2.4 patch 50kb smaller than the complete v2.3 patch?

Did you test the OpenDatabaseConnection logic with WAL mode off? Does mReaderConnection ever become non-null in that case?
(In reply to Vladan Djeric (:vladan) from comment #142)
> Why is the complete v2.4 patch 50kb smaller than the complete v2.3 patch?
> 

Answered on IRC, but also for others: I forgot to remove nsDOMStorage.cpp file after merge.

> Did you test the OpenDatabaseConnection logic with WAL mode off? Does
> mReaderConnection ever become non-null in that case?

In the past I did test, but not with the most recent changes.  Good point.

When WAL is off, with the most recent patch in this bug (this changes in bug 850362), mReaderConnection will be left null.
Comment on attachment 714727 [details] [diff] [review]
v2.2 [part5] - changes to tests and test changes to DOMStorage code


>+  if (unloadFlags & kReload) {
>+    WaitForPreload(Telemetry::LOCALDOMSTORAGE_UNLOAD_BLOCKING_MS);
>+
>+    mData[kDefaultSet].mKeys.Clear();
>+    mLoaded = false; // ONLY FOR TESTING, NEVER CALLED IN PRODUCTION CODE
No need to shout in the comment.
And if kReload is for testing only, better to say so. kTestReload and perhaps the
notification activating reload should be "test-reload" or some such to make it less likely that
addons will misuse this (for whatever reason).


>+  if (!strcmp(aTopic, "reload")) {
>+    // FOR TESTING PURPOSES ONLY
No need to shout

>+    if (mType != LocalStorage)
>+      return NS_OK;
This code lives under dom/ so please use Gecko coding style.
if (expr) {
  stmt;
}

>+++ b/dom/tests/mochitest/localstorage/test_localStorageQuota.html
>@@ -35,70 +35,64 @@ function doNextTest()
> 
>       slaveOrigin = "http://example.com";
>       slave.location = slaveOrigin + slavePath + "frameQuota.html?add&A&success";
>       break;
> 
>     // In subdomain now set another key with length 500 bytes, i.e.
>     // allocate 501 bytes
>     case 2:
>-      slaveOrigin = "http://test1.example.com";
>+      slaveOrigin = "http://example.com";
Why are you changing this?
It is worrisome to change existing tests.

>     case 3:
>-      slaveOrigin = "http://test1.example.com";
>+      slaveOrigin = "http://example.com";
ditto, and elsewhere


r- because we shouldn't need to change existing tests. If changes are required, it means there is high probability for regression bugs.
But if you can explain why the changes are ok, ask review again.
Attachment #714727 - Flags: review?(bugs) → review-
(In reply to Olli Pettay [:smaug] from comment #144)
> >-      slaveOrigin = "http://test1.example.com";
> >+      slaveOrigin = "http://example.com";
> Why are you changing this?
> It is worrisome to change existing tests.
And remember http://feross.org/fill-disk/
I just passed localStorage mochitests with WAL disabled.  Everything ok.  The only change when WAL==off is actually that the sync load operation is dispatched to IO thread and is "synchronized" by waiting for the cache to load.

More important is to test WAL==on, the concurrency is something that needs thorough testing.
(In reply to Olli Pettay [:smaug] from comment #144)
> Comment on attachment 714727 [details] [diff] [review]
> v2.2 [part5] - changes to tests and test changes to DOMStorage code

> r- because we shouldn't need to change existing tests. If changes are
> required, it means there is high probability for regression bugs.
> But if you can explain why the changes are ok, ask review again.

We have to since the functionality according quota has changed.

This test is (was) built to test we enforce quota eTLD+1 scope.  This patch for simplicity and mainly performance reasons changes quota enforcement to origin only.

This opens us to filldisk attack, but other browsers (IE, chrome) are behaving the same way - enforce only per-origin quota for localStorage.

There is still a patch to re-enforce eTLD+1 for this new code, however, it's not 100% reliable (it's asynchronous, and when eTLD+1 consumption is unknown during first write, it's assumed to be 0).

This has been announced very early in this bug and all the work is based on fact eTLD+1 restriction will not be enforced.

There is a work under progress on having a much better quota management in bug 767944.  I want to use it for this new code when done.
Comment on attachment 714727 [details] [diff] [review]
v2.2 [part5] - changes to tests and test changes to DOMStorage code

Politely asking for review again, based on comment 147.

Other concerns will be addressed.
Attachment #714727 - Flags: review- → review?(bugs)
I don't think we can land the patch if it opens us to trivial fulldisk attacks, especially given the
recent buzz around http://feross.org/fill-disk/. It was a major PR win to us.
Should we land bug 767944 first?

I want to hear what sicking says about this.
Flags: needinfo?(jonas)
Jonas originally confirmed me to bypass this ;)  However, we can adapt the async patch (the first patch attached to this bug), it's some merging and work to do, but definitely doable.  It can protect us against fill disk easily.
(In reply to Olli Pettay [:smaug] from comment #149)
> I don't think we can land the patch if it opens us to trivial fulldisk
> attacks, especially given the
> recent buzz around http://feross.org/fill-disk/. It was a major PR win to us.
> Should we land bug 767944 first?
> 
> I want to hear what sicking says about this.

https://bugzilla.mozilla.org/show_bug.cgi?id=767944#c11
The interesting part of that comment is
"Additionally, as long as we have a centralized quota manager which limits total filesize per eTLD+1 we are fulfilling the above rule."

We don't have that without bug 767944.
Flags: needinfo?(Jan.Varga)
Please note that the bug 767944 doesn't provide the temporary storage, although it's almost ready to land and is required for temp storage. Temporary storage is bug 785884 and that's only generic temporary storage infrastructure + IndexedDB support, reviews will take some time and then we need to plugin LocalStorage to QuotaManager (implement the quota client interface) and that's not trivial since we need to figure out how to check quota synchronously, etc.

So I guess we need a "temporary" solution here.
Flags: needinfo?(Jan.Varga)
Btw, I didn't check the latest patch here ...
Do we create sqlite files per origin now ?
(In reply to Olli Pettay [:smaug] from comment #152)
> The interesting part of that comment is
> "Additionally, as long as we have a centralized quota manager which limits
> total filesize per eTLD+1 we are fulfilling the above rule."
> 
> We don't have that without bug 767944.

Yup and based on Jan's comments as well it seems we should preserve it then.

Olli, how quickly can you then review interdiff for eTLD+1 quota?  I can have it tomorrow afternoon.
(In reply to Jan Varga [:janv] from comment #154)
> Btw, I didn't check the latest patch here ...
> Do we create sqlite files per origin now ?

No.
(In reply to Honza Bambas (:mayhemer) from comment #156)
> (In reply to Jan Varga [:janv] from comment #154)
> > Btw, I didn't check the latest patch here ...
> > Do we create sqlite files per origin now ?
> 
> No.

It's a bit off topic, but do you think it will be possible to do it w/o hurting performance, the design, etc. of LocalStorage ?
Comment on attachment 714727 [details] [diff] [review]
v2.2 [part5] - changes to tests and test changes to DOMStorage code

So this is still r- because of those test changes.
Attachment #714727 - Flags: review?(bugs) → review-
Comment on attachment 714724 [details] [diff] [review]
v2.2 [part3] - new implementation files

Coding style is wrong.
* Use {} always with 'if', 
* Arguments should be in form aArgumentName
* Be consistent where * and & go with arguments are variables (* and & are with type, not with name, nsIFoo* foo, not nsIFoo *foo)
* Change _retval to something better, or at least aRetVal
* It is
  if (expr) {
    stmt1;
  } else {
    stmt2;
  }


>+void
>+DOMStorage::BroadcastChangeNotification(const nsSubstring& aKey,
>+                                        const nsSubstring& aOldValue,
>+                                        const nsSubstring& aNewValue)
>+{
>+  nsresult rv;
>+  nsCOMPtr<nsIDOMEvent> domEvent;
>+  NS_NewDOMStorageEvent(getter_AddRefs(domEvent), nullptr, nullptr);
>+
>+  nsCOMPtr<nsIDOMStorageEvent> event = do_QueryInterface(domEvent);
>+  rv = event->InitStorageEvent(NS_LITERAL_STRING("storage"),
>+                               false,
>+                               false,
>+                               aKey,
>+                               aOldValue,
>+                               aNewValue,
>+                               mDocumentURI,
>+                               static_cast<nsIDOMStorage*>(this));
>+  if (NS_FAILED(rv)) {
>+    return;
>+  }
>+
>+  nsRefPtr<StorageNotifierRunnable> r = new StorageNotifierRunnable(event);
>+  NS_DispatchToMainThread(r);
Could you keep the existing comment 
http://mxr.mozilla.org/mozilla-central/source/dom/src/storage/nsDOMStorage.cpp?rev=eaff15332579&mark=1631-1632#1624

>+NS_IMETHODIMP_(nsrefcnt)
>+DOMStorageCache::Release(void)
>+{
>+  // We must actually release on the main thread since the cache removes it
>+  // self from the manager's hash table.  And we don't want to lock access to
>+  // that hash table.
>+  if (NS_IsMainThread())
>+    return DOMStorageCacheBridge::Release();
>+
>+  nsrefcnt result = mRefCnt - 1; // 32-bit is atomic
Please use NS_AtomicDecrementRefcnt


>+  // Flag that is initially false.  When the cache is about to work with
>+  // the database (i.e. it is persistent) this flags is set to true after
>+  // all keys and coresponding values are loaded from the database.
>+  // This flag never goes from true back to false.
>+  volatile bool mLoaded;
I don't understand why this is volatile.
Note, C++ volatile != Java volatile


>+
>+  // Result of load from the database.  Valid after mLoaded flag has been set.
>+  nsresult mLoadResult;
>+
>+  // Init() method has been called
>+  bool mInitialized : 1;
>+  // This cache is about to be bound with the database (i.e. it has
>+  // to load from the DB first and has to persist when modifying the
>+  // default data set.)
>+  bool mPersistent : 1;
>+  // false when the session-only data set was never used
>+  // true after access to session-only data is first made, we also fill
>+  // session-only data set with the default one at that moment
>+  // drops back to false when session-only data are cleared by chrome
>+  bool mSessionOnlyDataSetActive : 1;
>+  // Whether we have already captured state of the cache preload on our first access.
>+  bool mPreloadTelemetryRecorded : 1;
Could you add a newline after each member variable.



Will go through this few times, but r- anyhow.
Attachment #714724 - Flags: review?(bugs) → review-
Comment on attachment 714725 [details] [diff] [review]
v2.2 [part4] - integration of the new implementation

Same coding style nits as for the previous patch.

nsDocShell is special (it is really old and odd).
It uses 4 spaces indentation, but DOM code uses 2.

>+nsGlobalWindow::PrecacheLocalStorage()
Maybe PreCacheLocalStorage ?

>+++ b/toolkit/components/telemetry/Histograms.json
You may want to ask vladan to look at these changes.
Attachment #714725 - Flags: review?(bugs) → review+
(In reply to Jan Varga [:janv] from comment #157)
> It's a bit off topic, but do you think it will be possible to do it w/o
> hurting performance, the design, etc. of LocalStorage ?

It will be a bigger change, but definitely doable.  In a followup.
(In reply to Olli Pettay [:smaug] from comment #159)

Thanks!

> Comment on attachment 714724 [details] [diff] [review]
> v2.2 [part3] - new implementation files
> 
> * Be consistent where * and & go with arguments are variables (* and & are
> with type, not with name, nsIFoo* foo, not nsIFoo *foo)

Some had to be missed with regexp.  I already have fixed this one time.

> Could you keep the existing comment 
> http://mxr.mozilla.org/mozilla-central/source/dom/src/storage/nsDOMStorage.
> cpp?rev=eaff15332579&mark=1631-1632#1624

It's already in the latest version (2.4), returned as part of merging process.  

I am adding this note, since the change won't be in the interdiff.

> >+  nsrefcnt result = mRefCnt - 1; // 32-bit is atomic
> Please use NS_AtomicDecrementRefcnt

No.  mRefCnt must stay intact here.  Actual decrement is about to happen on the main thread in standard isupports thread-safe implementation of Release().

Here we only have to return some meaningful value to the caller, i.e. refcnt - 1.  And no need to atomize it.

> >+  volatile bool mLoaded;
> I don't understand why this is volatile.
> Note, C++ volatile != Java volatile

Because it's checked on in a loop like:

while (!mLoaded)
  monitor.wait(INDEFINITELY);

During wait the value may change.  When not volatile, compiler may optimize the read and actual change will not be read.

I understand how c++ volatile works.
Raised bug 851310 to keep visibility on the issue with chrome code not seeing storage under certain permission conditions.
Crap! Yeah, I think we need to do the eTLD+1 limit for now. I think I was thinking too far ahead.

Once we integrate localStorage with Jan's new temporary-storage backend we can remove the eTLD+1 limit in the localStorage code, since the temporary-storage backend will enforce something similar. But for now we'll need the eTLD+1 limit in localStorage.

Really sorry I sent you down the wrong path :(
Flags: needinfo?(jonas)
(In reply to Jonas Sicking (:sicking) from comment #164)
> Yeah, I think we need to do the eTLD+1 limit for now.

Just to make sure, this means google.com has 5MB, and docs.google.com shares those 5MB? I.e. the current (in Release) functionality does not change?
Attached patch v2.5 [Complete patch] (obsolete) — Splinter Review
- reintroduced eTLD+1 quota checking
- with it, changes to existing quota tests were completely reverted
- addresses comments 141, 144, 159, 160

Interdiffs follow
Attachment #687198 - Attachment is obsolete: true
Attachment #714711 - Attachment is obsolete: true
Attachment #714715 - Attachment is obsolete: true
Attachment #714722 - Attachment is obsolete: true
Attachment #714723 - Attachment is obsolete: true
Attachment #714724 - Attachment is obsolete: true
Attachment #714725 - Attachment is obsolete: true
Attachment #714726 - Attachment is obsolete: true
Attachment #714727 - Attachment is obsolete: true
Attachment #721265 - Attachment is obsolete: true
Attachment #721266 - Attachment is obsolete: true
Attachment #721413 - Attachment is obsolete: true
Attachment #723742 - Attachment is obsolete: true
Attachment #723747 - Attachment is obsolete: true
Attachment #723748 - Attachment is obsolete: true
Attachment #721413 - Flags: superreview?(jst)
[ currently not reflected in the implementation document, will update shortly ]

- DOMStorageDBThread (and DOMStorageDBChild) objects keep cache (indefinitely) of how many bytes an eTLD+1 scope consumes
- initially this information is not cached for any etld+1 scope
- usage cache is created and is asynchronously shortly loaded from database after first load of some scope falling into that eTLD+1 scope
- the usage cache is updated at runtime when adding/deleting data by deltas of |new - old| ; this is happening independently whether usage cache has already been loaded from the database 
- for a very short time the usage cache may not be filled with actual consumption on the disk, but still it the mechanism protects against filldisk


However, let's imagine following scenario:
- you've just started firefox ; hence, there is are no caches, neither key+values nor usage
- an evil.com has cached 5MB already during the last browser session, just at the quota
- you visit evil.com/ that very early does localStorage.setItem('evil key', '5mb long data...');
- when async etld+1 usage determination doesn't make it before setItem is invoked by the script, the script may store another 5MB of data to disk (in sum 10MB)
- with this patch, there is some chance that such an evil script may fill your disk on every first access during a browser session
- however, filldisk cannot work (we are protected) even async usage determination is late before setItem call

So, we may want to further improve this as one of:
1. do etld+1 load before data preload (may prolog blocking of main thread)
2. don't write the data to disk (only keep in the cache) until etld+1 is loaded and new usage is checked against it (a bit more work and we falsely tell the script the data has been stored, but we may do that on write errors anyway)
Attachment #725480 - Flags: review?(bugs)
Attachment #725480 - Flags: feedback?(jonas)
- address comment 144:
- added "test" string portions to notifications
- and fixed comments
- testing instrumentation now compiles only in a test-enabled build
Attachment #725484 - Flags: review?(bugs)
- addresses comment 159 and comment 160
- coding style fixes ( {}, *, &, aArgName )
- removed usage of volatile on all mLoaded flags implementations
Attachment #725488 - Flags: review?(bugs)
(In reply to Honza Bambas (:mayhemer) from comment #169)
> - with this patch, there is some chance that such an evil script may fill
> your disk on every first access during a browser session

Better explained: 
- evil.com/ can potentially store additional 5MB at first access during every new browser session
Comment on attachment 725480 [details] [diff] [review]
2.4 -> 2.5 interdiff 1 of 3 [eTLD+1 quota checking]

The policy described in comment 169 sounds like a good compromise to me. Once we have the new storage backend none of this will be a problem at all.
Attachment #725480 - Flags: feedback?(jonas) → feedback+
Comment on attachment 725484 [details] [diff] [review]
2.4 -> 2.5 interdiff 2 of 3 [testing instrumentation improvement]

drop _ prefix from the define name.
Attachment #725484 - Flags: review?(bugs) → review+
Attachment #725455 - Flags: review?(bugs)
Comment on attachment 725480 [details] [diff] [review]
2.4 -> 2.5 interdiff 1 of 3 [eTLD+1 quota checking]

(I need to look at certain things in the full patch and in the patch which fixes coding style.)
Attachment #725480 - Flags: review?(bugs) → review+
Comment on attachment 725488 [details] [diff] [review]
2.4 -> 2.5 interdiff 3 of 3 [code styling, no volatile]


> nsTArray<nsString> *
>-DOMStorageCache::GetKeys(const DOMStorage* storage)
Extra space after nsTArray<nsString>


> // static
> DOMStorageDBBridge *
> DOMStorageCache::StartDatabase()
Extra space after DOMStorageDBBridge

>-DOMStorageDBThread::GetScopesHavingData(InfallibleTArray<nsCString> *scopes)
>+DOMStorageDBThread::GetScopesHavingData(InfallibleTArray<nsCString> *aScopes)
InfallibleTArray<nsCString>* aScopes
>       mFlushImmediately = true;
>     }
>-    else if (mPendingTasks.IsScopeClearPending(task->Scope())) {
>+    else if (mPendingTasks.IsScopeClearPending(aTask->Scope())) {
else if ... goes to the previous line

if (expr) {
  stmt;
} else if (expr2) {
  stmt2;
}
>+ForgetUpdatesForScope(const nsACString& aMapping,
>+                      nsAutoPtr<DOMStorageDBThread::Task> &aPendingTask,
>+                      void* aArg)
nsAutoPtr<DOMStorageDBThread::Task>& aPendingTask

>-CollectTasks(const nsACString& mapping, nsAutoPtr<DOMStorageDBThread::Task> &task, void* arg)
>+CollectTasks(const nsACString& aMapping, nsAutoPtr<DOMStorageDBThread::Task> &aTask, void* aArg)
nsAutoPtr<DOMStorageDBThread::Task>& aTask

> nsTHashtable<nsCStringHashKey> &
> DOMStorageDBChild::ScopesHavingData()
nsTHashtable<nsCStringHashKey>&


Thanks for doing this boring (yet IMO useful) change.
Attachment #725488 - Flags: review?(bugs) → review+
(In reply to Honza Bambas (:mayhemer) from comment #172)
> (In reply to Honza Bambas (:mayhemer) from comment #169)
> > - with this patch, there is some chance that such an evil script may fill
> > your disk on every first access during a browser session
> 
> Better explained: 
> - evil.com/ can potentially store additional 5MB at first access during
> every new browser session


I need to fix these comments event more.


tl;dr: it's even better!  evil.com cannot store more then 5MB.


In detail:

When there has been already 5MB from the previous browser session stored by evil.com and evil.com/ script does setItem('rnd-key-name', '5MB long string...') before eTLD+1 consumption is know, it won't pass, since we first load all data for the evil.com origin before we do any actual changes, thus we do check the per-origin quota that already is on 5MB edge.

So, to attack the potential delay in eTLD+1 loading one have to:
- some mechanism must make the browser do access to <random-name-everytime>.evil.com/ that stores 5MB
- since it has to be always a different domain, it always has to load from server, that gives us advantage to win the etld+1 load race when the server is not ultra-super fast (RTT)

Since the low impact (5MB per restart) and forcing the user to visit evil webpage somehow and the need for subdomain randomization, this attack would be impractical.
Attachment #725455 - Flags: review?(bugs) → review-
Comment on attachment 726743 [details]
comments for attachment 725455 [details] [diff] [review] (full patch)

>checkStorage is quite vague name. 
>But couldn't come up with any better name :/

it could be something like isMyStorage, but that also sounds crazy.

>It is a bit odd if a method named CanUseStorage sets some flags.

Yes, but not sure about a better name.

>SetIndex sounds really odd. 

GetDataIndex ?

> Also, having two boolean parameters is odd.
>Better to use an enum for tri-state.

No. Those are two independent flags provided by the cache and it is very clear this way what they mean.  I will not change this API.

>+  nsrefcnt result = mRefCnt - 1; // 32-bit is atomic
>Is it guaranteed that mRefCnt is aligned properly? Or why is the read atomic?

Good question.

>But it isn't clear to me why you need to have this method returning nsresult - 1.
>Couldn't the method be void Release() in DOMStorageCacheBridge and in this class?
>The classes don't inherit nsISupports so virtual void Addref(); and virtual void Release(); should work, I think.

And good suggestion.  I think it should work.

>And why isn't EnumerateRead causing perf problems?

It actually is and I plan to fix it in a followup bug.  This hasn't changes from the current code.

>+  nsTArray<nsString> * result = new nsTArray<nsString>();
>ditto

Grrr... still escaping!!!

>Could you perhaps move refcounting to DOMStorageDBBridge and always NS_RELEASE here.

I'm not very happy to do such changes so closely to point of landing.  Followup?

>Could you rename mSize? maybe mQuotaUsage?

Then better would be mOriginQuotaUsage.


>Rather magical value "262144". Any

It's just chosen number... no arguments why this one exactly.

>Also, have you tested if using normal priority would work?

Low priority is intentional.  I want the I/O thread block anything else as little as possible.  When something urgent is about to happen, like an urgent preload (when preload hasn't started before script accesses the storage) the priority is raised.

>
>Is there a reason to not use nsIThread? I'd prefer that. That way you wouldn't need to re-invent Tasks, just
>use nsRunnables.
>Or is there some strong reason to use raw PRThread?

I thought it was clear when looked at the thread function body.  I could use raw thread but it would reduce to have just a single Run() implementation that would do what the current thread function does.  I see no point in posting events to this thread.  I need to wake it up when necessary (periodic flushes) and I really don't see any point why to mess with timers that are btw not thread safe.

>+    nsRefPtr<DOMStorageCache> mCacheRef;
>Maybe mSessionStorageRef ?

It's not referring SessionStorage.

> "dom/tests/mochitest/pointerlock/test_pointerlock-api.html":" uncaught exception - NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsIPermissionManager.addFromPrincipal] at chrome://specialpowers/content/specialpowersAPI.js:1219",
>+    "dom/tests/mochitest/sessionstorage/test_cookieSession.html":" uncaught exception - NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsICookiePermission.setAccess] at chrome://specialpowers/content/specialpowersAPI.js:120",
>     "dom/tests/mochitest/sessionstorage/test_sessionStorageBase.html":" received the correct number of events - got 0, expected 8",
>+    "dom/tests/mochitest/sessionstorage/test_sessionStorageBaseSessionOnly.html":" uncaught exception - NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsICookiePermission.setAccess] at chrome://specialpowers/content/specialpowersAPI.js:120",
>Please explain these changes.

Where?
(In reply to Honza Bambas (:mayhemer) from comment #179)
> 
> I thought it was clear when looked at the thread function body.
Certainly not clear. You're effectively reimplementing nsThread and nsRunnables with some scheduling stuff.
As far as I see, all that could be done with nsIThread too. If you need to do certain things before handling a runnable, 
you can use threadobserver, or just have a "prehandle" phase in Runnable's Run.

>  I could use
> raw thread
Raw thread? You're using more raw thread than nsIThread.

> but it would reduce to have just a single Run() implementation
> that would do what the current thread function does.  I see no point in
> posting events to this thread.  I need to wake it up when necessary
> (periodic flushes) and I really don't see any point why to mess with timers
> that are btw not thread safe.
When you need to wake nsIThread, you just post a runnable to it.
Using nsIThread API is less error prone than raw PRThread, IMHO. Less manual lock/monitor usage needed.

> >Please explain these changes.
> 
> Where?
Here :)
(In reply to Honza Bambas (:mayhemer) from comment #179)
> >But it isn't clear to me why you need to have this method returning nsresult - 1.
> >Couldn't the method be void Release() in DOMStorageCacheBridge and in this class?
> >The classes don't inherit nsISupports so virtual void Addref(); and virtual void Release(); should work, I think.
> 
> And good suggestion.  I think it should work.

Taking back.  This prevents me from using nice NS_IMPL_THREADSAFE_RELEASE to implement the release method.

I have no idea how class allocation are aligned in memory.  mRefCnt.mValue is the first member of this class, so I would presume it's aligned.

The method should return the resulting number of references, but I don't know who is actually the consumer here.
NS_IMPL_THREADSAFE_RELEASE is trivial
http://mxr.mozilla.org/mozilla-central/source/xpcom/glue/nsISupportsImpl.h?rev=c5dd0f5ef517&mark=1384-1396#1383
so you can just reimplement it without the return refcnt parts.
(In reply to Olli Pettay [:smaug] from comment #180)
> (In reply to Honza Bambas (:mayhemer) from comment #179)
> > 
> > I thought it was clear when looked at the thread function body.
> Certainly not clear. You're effectively reimplementing nsThread and
> nsRunnables with some scheduling stuff.
> As far as I see, all that could be done with nsIThread too. If you need to
> do certain things before handling a runnable, 
> you can use threadobserver, or just have a "prehandle" phase in Runnable's
> Run.
> 
> >  I could use
> > raw thread
> Raw thread? You're using more raw thread than nsIThread.

Ah, sorry, wanted to write DOM thread.

But I don't know why to insist on using nsIThread when PRThread is more optimal for this code.

> 
> > but it would reduce to have just a single Run() implementation
> > that would do what the current thread function does.  I see no point in
> > posting events to this thread.  I need to wake it up when necessary
> > (periodic flushes) and I really don't see any point why to mess with timers
> > that are btw not thread safe.
> When you need to wake nsIThread, you just post a runnable to it.
> Using nsIThread API is less error prone than raw PRThread, IMHO. Less manual
> lock/monitor usage needed.

I need a timer.  And timers can only be used on the main thread.

Tasks are not to be run as separate events on the thread.  Those are representations of database operations that are about to be run in a single shot, atomically, during flush.  That was what I hope was clear.  Maybe the better name would then be DBOperation instead of Task.

Also, I need to jump some tasks over, like urgent preloads.  Not sure the thread API can do this - put an event before other events.
Ok, perhaps I can live with DBOperation and PRThread.

But please change the names Run and Execute. It is not clear what the difference is.
And in DOMStorageDBThread::InsertTask, don't call .forget() at the end, but when you're
passing the Task (==DBOperation) to some array, call .forget().
Attached patch v2.6 [Complete patch] (obsolete) — Splinter Review
https://tbpl.mozilla.org/?tree=Try&rev=bf2647a56e1b
Attachment #725455 - Attachment is obsolete: true
Attachment #725480 - Attachment is obsolete: true
Attachment #725484 - Attachment is obsolete: true
Attachment #725488 - Attachment is obsolete: true
Attached patch 2.5 -> 2.6 interdiff (obsolete) — Splinter Review
Attachment #726996 - Flags: review?(bugs)
(In reply to Honza Bambas (:mayhemer) from comment #179)
> > "dom/tests/mochitest/pointerlock/test_pointerlock-api.html":" uncaught exception - NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsIPermissionManager.addFromPrincipal] at chrome://specialpowers/content/specialpowersAPI.js:1219",
> >+    "dom/tests/mochitest/sessionstorage/test_cookieSession.html":" uncaught exception - NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsICookiePermission.setAccess] at chrome://specialpowers/content/specialpowersAPI.js:120",
> >     "dom/tests/mochitest/sessionstorage/test_sessionStorageBase.html":" received the correct number of events - got 0, expected 8",
> >+    "dom/tests/mochitest/sessionstorage/test_sessionStorageBaseSessionOnly.html":" uncaught exception - NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsICookiePermission.setAccess] at chrome://specialpowers/content/specialpowersAPI.js:120",
> >Please explain these changes.

So, test_cookieSession and test_sessionStorageBaseSessionOnly are using cookie service.  That cannot be used on content process and these two test are then added to the group of test expected to fail here.
(In reply to Olli Pettay [:smaug] from comment #184)
> Ok, perhaps I can live with DBOperation and PRThread.
> 
> But please change the names Run and Execute. It is not clear what the
> difference is.
> And in DOMStorageDBThread::InsertTask, don't call .forget() at the end, but
> when you're
> passing the Task (==DBOperation) to some array, call .forget().

Ah, I forgot to address there... 2.7 coming to you soon.

I was thinking of changing the the code to use nsIThread but it would actually end up close to what I have implemented with PRThread with just lesser control.

If there would be a strong recommendation to not use any NPSR stuff in DOM, then I for sure would change it somehow.
Attached patch v2.7 [Complete patch] (obsolete) — Splinter Review
Attachment #726994 - Attachment is obsolete: true
Attachment #726996 - Attachment is obsolete: true
Attachment #726996 - Flags: review?(bugs)
Attached patch 2.5 -> 2.7 idiff (obsolete) — Splinter Review
I will push another try after I check this builds on Linux/Mac ok (no warns).
Attachment #727007 - Flags: review?(bugs)
I'm investigating the B2G VM opt oranges.  Seems to be caused by this patch.
So I'm waiting for the fixes for those b2g oranges, but you can create a patch on top of the
latest full patch.
I won't get to this sooner then the next week (after March 25) since I'm occupied with other stuff now (mainly cache design).
(In reply to Honza Bambas (:mayhemer) from comment #191)
> Comment on attachment 727004 [details] [diff] [review]
> v2.7 [Complete patch]
> 
> https://tbpl.mozilla.org/?tree=Try&rev=76ff16f64bf5

INFO | setup.js | error loading or parsing 'http://mochi.test:8888/b2g.json'

I forgot to add a trailing comma at the end of the newly added record to this file...  This patch will slip a release because of a missing ',' !
Attached patch v2.7.1 [Complete patch] (obsolete) — Splinter Review
- merged to current m-c
- only change is an added comma after my first added line in b2g.json file that, when missing, was causing all the oranges on b2g VM...

https://tbpl.mozilla.org/?tree=Try&rev=e2be14a6e502
Attachment #727004 - Attachment is obsolete: true
Attachment #730462 - Flags: superreview?(bugs)
https://tbpl.mozilla.org/?tree=Try&rev=e2be14a6e502&showall=1 - we are green.  Ready to land after sr!
At the risk of throwing a spanner in the works: should chrome be accessing DOM storage using window.sessionStorage, content.sessionStorage, or through the shell or dom storage manager?  Behaviour is different with each, and in particular I cannot see DOM storage from chrome when dom.storage.enabled is false.  The items still exist and presumably shouldn't be visible to content, but chrome should be able to see them, as with cookies that exist but are not sent depending on permissions.
(In reply to Ian Nartowicz from comment #198)
> At the risk of throwing a spanner in the works: should chrome be accessing
> DOM storage using window.sessionStorage, content.sessionStorage, or through
> the shell or dom storage manager?  Behaviour is different with each, and in
> particular I cannot see DOM storage from chrome when dom.storage.enabled is
> false.  The items still exist and presumably shouldn't be visible to
> content, but chrome should be able to see them, as with cookies that exist
> but are not sent depending on permissions.


In case this bug affects the behavior, please report a new bug.

If that is something that *does* work in the current code and you want to change it, please report a new bug.

Unless this bug introduces a serious behavioral malfunction, the patches here shouldn't be further modified.

Thanks.
Attachment #730462 - Flags: superreview?(bugs) → superreview+
Comment on attachment 727007 [details] [diff] [review]
2.5 -> 2.7 idiff


> NS_IMPL_THREADSAFE_ADDREF(DOMStorageCacheBridge)
>-NS_IMPL_THREADSAFE_RELEASE(DOMStorageCacheBridge)
>+NS_IMETHODIMP_(void) DOMStorageCacheBridge::Release(void)
>+{
>+  MOZ_ASSERT(int32_t(mRefCnt) > 0, "dup release");
>+  nsrefcnt count = NS_AtomicDecrementRefcnt(mRefCnt);
>+  NS_LOG_RELEASE(this, count, "DOMStorageCacheBridge");
>+  if (0 == count) {
>+    mRefCnt = 1; /* stabilize */
>+    /* enable this to find non-threadsafe destructors: */
>+    /* NS_ASSERT_OWNINGTHREAD(_class); */
>+    delete (this);
>+  }
>+}
Could you add some comment here why this needs to return void and why the
NS_IMPL_THREADSAFE_RELEASE can't be used. (Something about DOMStorageCache::Release )



Please rename Task to DBOperation as you suggested earlier.
Attachment #727007 - Flags: review?(bugs) → review+
If you could update those, I'd like to do still a _quick_ go-through before you land the patch.
Attached patch v2.7.2 [Complete patch] (obsolete) — Splinter Review
- Task -> DBOperation
- comments, localvars and args also renamed
Attachment #727007 - Attachment is obsolete: true
Attachment #730462 - Attachment is obsolete: true
Attachment #735388 - Flags: superreview?(bugs)
Attachment #735388 - Flags: review?(bugs)
Attached patch v2.7.1 -> 2.7.2 idiff (obsolete) — Splinter Review
Comment on attachment 735389 [details] [diff] [review]
v2.7.1 -> 2.7.2 idiff

>+    // The operation type
>+    OperationType Type() { return mOperation; }
> 
>-    // The task scope (=origin)
>+    // The operation scope (=origin)
>     const nsCString Scope();
> 
>-    // |Scope + key| the task is working with
>+    // |Scope + key| the operation is working with
>     const nsCString Target();
> 
>   private:
>-    // The task implementation body
>+    // The operation implementation body
>     nsresult Perform(DOMStorageDBThread* aThread);
> 
>     friend class PendingTasks;
>-    TaskType mType;
>+    OperationType mOperation;
Why you call the member variable mOperation, but its getter is Type().
I think mOperation could be mType as it was before. It is the type of this Operation object.


>     nsRefPtr<DOMStorageCacheBridge> mCache;
>     DOMStorageUsageBridge* mUsage;
>     nsString mKey;
>     nsString mValue;
>     nsCString mScope;
>   };
> 
>-  // Encapsulation of collective and coalescing logic for all pending tasks
>-  // except preloads that are handled separately as a priority tasks
>+  // Encapsulation of collective and coalescing logic for all pending operations
>+  // except preloads that are handled separately as priority operations
>   class PendingTasks {
This should be PendingOperations


> 
>-  // Time a first pending task has been added to the pending tasks list
>+  // Time the first pending operation has been added to the pending tasks list
>   PRIntervalTime mDirtyEpoch;
pending operations ?

going through still the full patch...
Attachment #735389 - Flags: review+
Blocks: 851310
Comment on attachment 735388 [details] [diff] [review]
v2.7.2 [Complete patch]


>       let principal = Services.scriptSecurityManager.getDocShellCodebasePrincipal(uri, aDocShell);
>-      let storage = aDocShell.getSessionStorageForPrincipal(principal, "", true);
>+      let storageManager = aDocShell.QueryInterface(Components.interfaces.nsIDOMStorageManager);
>+
>+      // XXX there should be the document URI passed, do we know it here?
>+      let storage = storageManager.createStorage(principal, "", aDocShell.usePrivateBrowsing);
Could you add some comment here why passing "" as uri isn't catastrophic - 
something about just iterating items as you mentioned on IRC.
And file also a followup to fix this stuff (which is already problematic with the old implementation).

>@@ -2478,17 +2481,46 @@ nsGlobalWindow::SetNewDocument(nsIDocume
>     if (itemType != nsIDocShellTreeItem::typeChrome ||
>         nsContentUtils::IsSystemPrincipal(mDoc->NodePrincipal())) {
>       newInnerWindow->mHasNotifiedGlobalCreated = true;
>       nsContentUtils::AddScriptRunner(
>         NS_NewRunnableMethod(this, &nsGlobalWindow::DispatchDOMWindowCreated));
>     }
>   }
> 
>-  return NS_OK;
>+  PreloadLocalStorage();
>+
>+  return NS_OK;
>+}
Btw, have you run also Talos tests on try to see whether this patch affect to tp.
Also, have you looked at other performance results, though I don't expect this to affect those.


>+nsGlobalWindow::PreloadLocalStorage()
>+{
>+  if (!Preferences::GetBool(kStorageEnabled)) {
>+    return;
>+  }
>+
>+  if (IsChromeWindow()) {
>+    return;
>+  }
>+
>+  nsIPrincipal *principal = GetPrincipal();
nsIPrincipal*




>+inline uint32_t
>+GetDataSetIndex(const DOMStorage* aStorage)
>+{
>+  return GetDataSetIndex(aStorage->IsPrivate(), aStorage->IsSessionOnly());
>+}
This is rather error prone. It depends on whether or not DOMStorage::CanUseStorage has been
called before. 
But looks like we have a bug for this, bug 851310. Let fix the problem there.



>+void
>+DOMStorageCache::KeepAlive()
>+{
>+  // Missing reference back to the manager means the cache is not responsible
>+  // for its lifetime.  Used for keeping sessionStorage live forever.
>+  if (!mManager) {
>+    return;
>+  }
>+
>+  if (!NS_IsMainThread()) {
>+    // Timer and the holder must be initialized on the main thread.
>+    nsRefPtr<nsRunnableMethod<DOMStorageCache> > event =
>+      NS_NewRunnableMethod(this, &DOMStorageCache::KeepAlive);
>+
>+    NS_DispatchToMainThread(event);
>+    return;
>+  }
>+
>+  nsCOMPtr<nsITimer> timer = do_CreateInstance("@mozilla.org/timer;1");
>+  if (!timer) {
>+    return;
>+  }
>+
>+  nsRefPtr<DOMStorageCacheHolder> holder = new DOMStorageCacheHolder(this);
>+  timer->InitWithCallback(holder, 20000, nsITimer::TYPE_ONE_SHOT);
Perhaps make a #define for the 20000 or even a pref, which has 20000 as default value.
(and no need to add the pref to prefs.js)

>+DOMStorageCache::WaitForPreload(Telemetry::ID aTelemetryID)
>+{
>+  if (!mPersistent) {
>+    return;
>+  }
>+
>+  // mLoaded is volatile
it is not ;)

>+  bool loaded = mLoaded;
>+
>+  // Telemetry of rates of pending preloads
>+  if (!mPreloadTelemetryRecorded) {
>+    mPreloadTelemetryRecorded = true;
>+    Telemetry::Accumulate(
>+        Telemetry::LOCALDOMSTORAGE_PRELOAD_PENDING_ON_FIRST_ACCESS,
>+        !loaded);
2 space indentation



>+  // Measure which operation blocks and for how long
>+  TelemetryAutoTimer timer(aTelemetryID);
>+
>+  // If preload already started (i.e. we got some first data, but not all)
>+  // SyncPreload will just wait for it to finish rather then synchronously
>+  // read from the database.  It seems to me more optimal.
>+
>+  // TODO place for A/B testing (force main thread load vs. let preload finish)
>+  sDatabase->SyncPreload(this);
>+}
>+
>+nsresult
>+DOMStorageCache::GetLength(const DOMStorage* aStorage, uint32_t* aRetval)
>+{
>+  Telemetry::AutoTimer<Telemetry::LOCALDOMSTORAGE_GETLENGTH_MS> autoTimer;
>+
Could you add a comment to class TelemetryAutoTimer why it is needed (non-compile time telemetry id)


>+DOMStorageCache::GetKey(const DOMStorage* aStorage, uint32_t aIndex, nsAString& aRetval)
>+{
>+  // XXX: This does a linear search for the key at index, which would
>+  // suck if there's a large numer of indexes. Do we care? If so,
>+  // maybe we need to have a lazily populated key array here or
>+  // something?
Make sure have a bug filed for this.

>+class LoadUsageRunnable : public nsRunnable
>+{
>+public:
>+  LoadUsageRunnable(int64_t* aUsage, const int64_t aDelta)
>+    : mTarget(aUsage)
>+    , mDelta(aDelta)
>+  {}
>+
>+private:
>+  int64_t* mTarget;
>+  int64_t mDelta;
>+
>+  NS_IMETHOD Run() { *mTarget = mDelta; return NS_OK; }
>+};
>+
>+} // anon
>+
>+void
>+DOMStorageUsage::LoadUsage(const int64_t aUsage)
>+{
>+  if (!NS_IsMainThread()) {
>+    // In single process scenario we get this call from the DB thread
>+    nsRefPtr<LoadUsageRunnable> r =
>+      new LoadUsageRunnable(mUsage + kDefaultSet, aUsage);
>+    NS_DispatchToMainThread(r);
>+  } else {
>+    // On a child process we get this on the main thread already
>+    mUsage[kDefaultSet] += aUsage;
>+  }
>+}
This could use some comment why kDefaultSet (which is always 0) is use here.



>+DOMStorageUsage::CheckETLD1UsageDelta(uint32_t aGetDataSetIndex, const int64_t aDelta)
CheckAndSet ... since this actually sets the value

>+{
>+  int64_t newUsage = mUsage[aGetDataSetIndex] + aDelta;
>+  if (aDelta > 0 && newUsage > DOMStorageManager::GetQuota()) {
>+    return false;
>+  }
>+
>+  mUsage[aGetDataSetIndex] = newUsage;
>+  return true;
>+}
Should there be a MOZ_ASSERT(NS_IsMainThread()) here.
>+DOMStorageDBThread::OpenDatabaseConnection()
>+{
>+  nsresult rv;
>+
>+  nsCOMPtr<mozIStorageService> service
>+      = do_GetService(MOZ_STORAGE_SERVICE_CONTRACTID, &rv);
= could go to previous line and 2 space indentation
>+  if (exists) {
>+      rv = mWorkerConnection->ExecuteSimpleSQL(
>+        NS_LITERAL_CSTRING("INSERT OR IGNORE INTO "
>+                           "webappsstore2(scope, key, value, secure, owner) "
>+                           "SELECT REVERSESTRING(domain) || '.:', key, value, secure, domain "
>+                           "FROM moz_webappsstore"));
>+      NS_ENSURE_SUCCESS(rv, rv);
>+
>+      rv = mWorkerConnection->ExecuteSimpleSQL(
>+        NS_LITERAL_CSTRING("DROP TABLE moz_webappsstore"));
>+      NS_ENSURE_SUCCESS(rv, rv);
>+  }
2 space indentation

>+DOMStorageDBThread::ScheduleFlush()
>+{
>+  if (mDirtyEpoch) {
>+    return; // Already scheduled
>+  }
>+
>+  mDirtyEpoch = PR_IntervalNow() | 1; // Must be non-zero to indicate we are scheduled
>+
>+  // Wake the monitor from indefinite sleep...
>+  mMonitor.Notify();
I guess we'll assert hard if we don't have the lock here.

>+PLDHashOperator
>+CollectTasks(const nsACString& aMapping, nsAutoPtr<DOMStorageDBThread::DBOperation>& aOperation, void* aArg)
CollectOperations

>+FindPendingClearForScope(const nsACString& aMapping,
>+                         DOMStorageDBThread::DBOperation* aPendingTask,
aPendingOperation or aPendingOp

>+FindPendingUpdateForScope(const nsACString& aMapping,
>+                         DOMStorageDBThread::DBOperation* aPendingTask,
>+                         void* aArg)
ditto and fix indentation


>+} // ::dom
>+} // ::mozilla
>+  // Helper for creation of DOM storage objects
>+  nsresult GetStorageInternal(bool aCraete,
aCreate


This is so huge patch that it needs to land early in the cycle (like now), and need to be prepared to back it out if there are problems.
Attachment #735388 - Flags: superreview?(bugs)
Attachment #735388 - Flags: superreview+
Attachment #735388 - Flags: review?(bugs)
Attachment #735388 - Flags: review+
(In reply to Olli Pettay [:smaug] from comment #206)
> Btw, have you run also Talos tests on try to see whether this patch affect
> to tp.
> Also, have you looked at other performance results, though I don't expect
> this to affect those.

yep, comment 107

> Perhaps make a #define for the 20000 or even a pref, which has 20000 as
> default value.
> (and no need to add the pref to prefs.js)

Changed to #define, I'm not sure it's worth having a pref for this, but I'll file a followup for it.
Attached patch v2.7.3 (obsolete) — Splinter Review
https://tbpl.mozilla.org/?tree=Try&rev=eca80900ba6e
Attachment #735388 - Attachment is obsolete: true
Attachment #735389 - Attachment is obsolete: true
Attachment #736330 - Flags: superreview+
Attachment #736330 - Flags: review+
Attached patch 273 -> 274 idiffSplinter Review
Issues and respective fixes (everything was IPC related only):

- assertion failure on using a closed IPC channel

  FIX: protecting all Send*() calls with |if (mIPCOpen)| properly

- crash on using deleted mParent object in CacheParentBridge and UsageParentBridge

  FIX: use nsRefPtr for mParent instead of only a weak ptr

- leak of DOMStorageCache (taking DOMStorageManager along) when async preload of a cache (triggered by cache preload logic) didn't make it before the child process has exited ; the cache on IPC child side was missing any reference during preload operation so it was only "hanging in the air."

  FIX: 1. DOMStorageManager::PutCache(), a method responsible for creating and initializing a cache, now adds a ref locally, keeping the XPCOM contract "callee addrefs, caller releases" ; this prevents any such leaks in the future
       2. DOMStorageDBBridge::AsyncPreload() implementation, that is called during initiation of localStorage cache, is responsible for either hard-referring the cache for the time of preload or immediately call LoadDone (that engages KeepAlive) on the cache ; this simple contract ensurers the cache is alive or properly released after DOMStorageManager::GetStorageInternal() exits.

       to explain: in single process, there was no need for any of these fixes since preload is a DB operation keeping the cache referenced until preload is done (the contract has already been fulfilled, but not completely error free).  In IPC model, DOMStorageDBChild::AsyncPreload didn't ensure the cache got addref'ed for the time of preload.

I also removed an unnecessary call to KeepAlive() on cloned sesstionStorage cache.  There is no need to do that (was a left over only from earlier version of the patch) since sessionStorage caches are hard referenced by its manager (=docshell).


https://tbpl.mozilla.org/?tree=Try&rev=c15d1ff57b70 - failing test only try
https://tbpl.mozilla.org/?tree=Try&rev=1621860330fc - try: -a
Attachment #736330 - Attachment is obsolete: true
Attachment #736821 - Flags: review?(bugs)
Comment on attachment 736821 [details] [diff] [review]
273 -> 274 idiff

Could you please document what DOMStorageDBChild::mIPCOpen means.
Attachment #736821 - Flags: review?(bugs) → review+
https://hg.mozilla.org/mozilla-central/rev/8480e7da11d1
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
Seems to stick.  Thanks everybody, mainly Olli for his final detailed reviews!
Depends on: 875631
Depends on: 883928
Depends on: 896026
Depends on: 886184
Depends on: 916781
See Also: → 1045989
You need to log in before you can comment on or make changes to this bug.