IndexedDB blocks DOM rendering

NEW
Unassigned

Status

()

Core
DOM: IndexedDB
2 years ago
2 months ago

People

(Reporter: nolan, Unassigned)

Tracking

(Depends on: 1 bug, {perf})

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

(Reporter)

Description

2 years ago
Created attachment 8666460 [details]
database-comparison.zip

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:41.0) Gecko/20100101 Firefox/41.0
Build ID: 20150917150946

Steps to reproduce:

1. Do a long-running IndexedDB operation in a transaction
2. Meanwhile, run an animated GIF in an <img/> tag or a JavaScript animation

See a live demo here: http://nolanlawson.github.io/database-comparison/

Steps:
1. Choose "IndexedDB"
2. Choose 10000 or 100000
3. Click "Insert docs"
4. Notice that Kirby stops dancing while the documents are being inserted.

The DOM is not affected by IndexedDB when run in a web worker. Also I note that this bug also exists in Chrome, but doesn't exist in Safari or Edge.


Actual results:

The animation is paused during the IDB transaction.


Expected results:

The animation should continue running smoothly even while IndexedDB operations are in progress.
(Reporter)

Comment 1

2 years ago
Issue filed on Chromium: https://code.google.com/p/chromium/issues/detail?id=536620
(In reply to nolan from comment #0)
> Expected results:
> 
> The animation should continue running smoothly even while IndexedDB
> operations are in progress.

From running on non-e10s nightly using the Firefox devtools profiler, it appears that the expected thing is happening: the animated gif is unable to paint while the testing code on the main thread is generating 10,000 documents and issuing 10,000 put calls as part of the over-arching transaction.  Once the JS code yields control flow, painting can resume.

I took 2 screenshots of my run; unfortunately this is on an active profile because the IndexedDB cleanup logic seems broken (I got "Error: InvalidStateError: A mutation operation was attempted on a database that did not allow mutations." after trying to re-run on a cleaner profile after running clean.):

* The waterfall view: https://clicky.visophyte.org/files/screenshots/20150927-150058.png
* The JS Flame Chart view: https://clicky.visophyte.org/files/screenshots/20150927-150146.png

Familiarizing ourselves quickly, on the big waterfall and the mini-timeline waterfall, the green squares represent the paints Kirby is making us do, and the yellow stuff is the testing logic.  I've zoomed the selection area down to the approximate time window of the test.

The longest promise callback is 107.69ms in test.js:133 which is this function from http://nolanlawson.github.io/database-comparison/tester.js:
      return new Promise(function (resolve, reject) {
        var txn = db.transaction('docs', 'readwrite');
        var oStore = txn.objectStore('docs');
        for (var i = 0; i < numDocs; i++) {
          var doc = createDoc();
          doc.id = 'doc_' + i;
          oStore.put(doc);
        }
        txn.oncomplete = resolve;
        txn.onerror = reject;
        txn.onblocked = reject;
      });

Now, the testing logic for me says "Took 371ms".  This is consistent with the initial promise callbacks through to the "complete" DOM event on the waterfall.  The testing logic is tracking how long the commit took to happen.  We can see paints did happen asynchronously while the transaction was running in the background, but of course, none happened while JS was on the main thread issuing the transaction.

And if we flip over to the JS Flame Chart we can see that part of the reason the paints were further delayed was because the flipping Google +1 buttons (plusone.js) from the Chrome web extensions documentation pages are thrashing the browser because I apparently desperately need realtime updates on how many people have +1'ed the documentation page.  But I'll go address this by making a snarky tweet now.
(Reporter)

Comment 3

2 years ago
Maybe I'm misunderstanding, but why does the function you posted block for so long? When I run the "Regular object" test with 100000 documents, it only blocks the GIF for a split-second, then reports ~150ms. When I run the same number of documents with IndexedDB, though, it reports ~7600ms and blocks Kirby for at least 4 seconds by my eye.

It's creating the same number of objects, but it's 7 seconds slower with significantly more dropped frames. Isn't that IndexedDB blocking the DOM?
Your regular object test is just:
  function createDoc() {
    return {
      data: Math.random()
    };
  }

  function regularObjectTest(numDocs) {
    var obj = {};
    for (var i = 0; i < numDocs; i++) {
      obj['doc_' + i] = createDoc();
    }
  }

Although this will generate a fair amount of garbage memory, this is the type of micro-test the SpiderMonkey JIT should eat for breakfast, especially since it's creating small objects with a stable shape/type that do not require cross-compartment wrappers, etc., so it's not surprising that it runs very quickly.

The IndexedDB test I ran generated 10,000 separate IndexedDB put requests, which includes running a structured clone operation, creating non-trivial C++/WebIDL bound objects using a native non-continuous-arena allocator, etc.  This is inherently more expensive.

My point is that the JS execution model is that you are going to block other main-thread things while your JS is running on the main thread.  Unless off-the-main-thread-animation is taking responsibility for animating the gif (it may on some platforms/in some cases? or maybe in the future), during the idbTest for loop, we expect that Kirby won't animate.  What I meant to convey previously is that you really should be capturing 2 numbers:
1) The wall clock time for the write to durably complete.
2) The amount of time your code is blocking the main thread.  The dominating factor will be your for-loop, but you could also wrap other pieces of your code too.

We do expect Kirby to stop animating for the time interval(s) covered by that second number.  We may also potentially expect some secondary fallout related to the cycle collector potentially needing to be involved in garbage collecting stuff (or not).  But if Kirby is stopping significantly more than covered by that interval, then there might be something IndexedDB could do to improve its performance.  (Albeit, limited by what the test does.  For example, if you put event-handlers on the requests returned by the put() call, we would expect the main thread to be slaughtered by N callbacks firing.)

This is likely to explain why Chrome/Chromium would behave similarly (assuming they depend on the main thread to help animate gifs on your platform.)

The profiler definitely helps you and me understand what might be going on.  Especially since the JS Flame Chart is actually showing what Gecko is getting up to even for things not caused by your page/compartment.  If I zoomed in more, I saw some stuff attributed to mozStorage, which is suspicious, but quite possibly related to the Places API and its shenanigans.  It would be great if you could play around with the profiler in your Firefox instance to see if you are getting similar results, etc.
(Reporter)

Comment 5

2 years ago
I'm on OS X 10.10.5. I don't know much about Firefox internals, but Kirby is definitely blocked for several seconds if you insert 100000 objects into IndexedDB. Whereas with both LocalStorage and in-memory ("regular object"), he barely misses a beat when you insert the same number of objects.

I took a performance snapshot and attached it (profile.json), but I'm seeing roughly the same thing as you, except that the "promise callback" is taking around 4 seconds on my machine, corresponding to the amount of time Kirby is blocked.

The timer is indeed a bit inaccurate. Is there a better way to calculate the number of dropped frames during the GIF animation? requestAnimationFrame()?

Side note: this is the first time I've reported a bug on Firefox's IndexedDB, which is saying a lot because I've probably reported a dozen IndexedDB bugs on the other browsers. I've always been super happy with your work (especially the performance improvements over the past couple years!). This result only surprises me because I've always been told that IndexedDB is preferable to LocalStorage since LocalStorage blocks the DOM, but here I'm seeing nearly the opposite.
(Reporter)

Comment 6

2 years ago
Created attachment 8666576 [details]
Profile of adding 100000 docs to IndexedDB
So, fwiw, the reason the animated GIF freezes is because we don't do off-main-thread animations for GIFs, and the main thread is stalled.

It's possible that the loop falls off the JIT or something ... we should investigate.
Is this something either of you would be interested in profiling?
Flags: needinfo?(bzbarsky)
Flags: needinfo?(bugs)

Comment 9

2 years ago
I see 33% of total in IDBObjectStore::AddOrPut, and when focus on that 33% (so that it is 100%)
- almost 50% is ___memmove_ssse3_back somewhere in IDBTransaction::StartRequest. SendPBackgroundIDBRequestConstructor is 12%
- Structured clone handling 11% (in GetAddInfo) (of that 11, 3.5 is VectorBase::growStorageBy)
- IDBRequest::Create 12%, (of that 12, 6 is IDBRequest::CaptureCaller)
Flags: needinfo?(bugs)
The memmove is likely also IPDL overhead.

What is the other 66% in?  JS?
That 33% was actually with all the threads. So of main thread time it is 77% and the rest is random JS or Promise handling.
That's an important detail!

So this means it's basically all IPDL overhead.
So I poked a bit at this too, kinda cheating (running in non-e10s mode, but this stuff mostly uses PBackground anyway, so I think that's OK).  I see significant mainthread time in the following places:

1)  Under ipc::DoWorkRunnable::Run.  This is about 40% (or more; my profile includes
    some painting while I was mousing over to the right button) of the mainthread time.  
  a)  About 1/5 of that is
      mozilla::dom::indexedDB::BackgroundRequestChild::Recv__delete__
      doing the DispatchSuccessEvent thing (converting idb keys to jsvals, firing the
      actual DOM events, etc).  
  b)  About 3/5 is
      mozilla::dom::indexedDB::PBackgroundIDBTransactionChild::RemoveManagee,
      almost all of this under
      nsTArray_Impl<PBackgroundIDBRequestChild*>::RemoveElementSorted calling
      memmove.  Sounds like we have tons of transactions and end up with nice O(N^2)
      behavior here.
  c)  About 1/5 is in other stuff here and there (IPC overhead with
      MessageChannel::CxxStackFrame, etc).

In an e10s build I guess this would happen in the parent process, which might help some, if these calls from the child are non-blocking.

2)  Another 40%+ under IDBObjectStoreBinding::Put.  This includes:
  a) About 2/5 of this is under
     PBackgroundIDBTransactionChild::SendPBackgroundIDBRequestConstructor.  This is
     doing MessageChannel::Send (locks, SendMessage) and doing memmoves via
     nsTArray_Impl<PBackgroundIDBRequestChild*>::InsertElementAt.  Yay more O(N^2) stuff.
  b) About 1/4 is under IDBObjectStore::GetAddInfo, mostly doing the JS structured clone
     write.  Something like 1/3 of that time is just overhead getting to the internal JS
     "write" impl.  :(
  c) 1/5 under IDBRequest::Create, which is doing various stuff like 
     DOMEventTargetHelper::BindToOwner, but half of it is nsJSUtils::GetCallingLocation
     which does JS::DescribeScriptedCaller.  This is mostly the frameiter stuff; using
     SavedStack won't help much here I expect.  Getting rid of the need to do this, on
     the other hand, might help.  ;)
  d) A bit of long-tail stuff.
Flags: needinfo?(bzbarsky)
Oh, so conclusions:  We should fix the data structure we use for storing Managees to deal with having lots of them, and then remeasure/reprofile...  ;)
Status: UNCONFIRMED → NEW
Ever confirmed: true
Depends on: 1212027
Keywords: perf
OS: Unspecified → All
Hardware: Unspecified → All
Version: 43 Branch → unspecified
Per comment 14 looks like we need more profiling here.
Flags: needinfo?(bzbarsky)
Zoom says 32% time in main thread goes to _raw_write_unlock_irqrestore
whatever it means. Lots of context switches?

29% is spent under IDBObjectStore::AddOrPut.
Of that 29% 35% is IDBRequest::Create, under which 19% (of that 29%) is CaptureCaller. Why we need such thing? Sort of a regression from Bug 1083285 ?
Of that 29% 21% is structured cloning write.
21% goes to SendPBackgroundRequestConstructor.
there is some odd 2.2% of that 29% going to LoggingString ctor.
Most of the interesting stuff happens on a background thread. An idle main thread would mean this bug is fixed though.
OK, so I do still see a pause in the animation if I insert 100000 things on the given page.  Happens in both e10s and non-e10s mode.

A profile, in non-e10s mode (let me know if you think an e10s mode one would be useful) shows time split about evenly across three threads:

1)  The one running indexedDB::ConnectionPool::ThreadRunnable::Run (calling DoDatabaseWork, mostly).
2)  The one running ipc::MessagePumpForNonMainThreads (several thiings going on in there).
3)  The main thread.

Restricting to just the main thread, I see 1787ms under IDBObjectStoreBinding::put calls.  Going to focus on that, because the other 500ms was probably the time it took me to move from the "start the profiler" button to the "Insert docs" button.

So the time under put() breaks down as follows (all percentages of the time under put():

1)  33% IDBTransaction::StartRequest.  The time is mostly under MessageChannel::Send, taking and releasing locks, doing MessagePump::ScheduleWork (8% of the time is under here, calling nsThread::PutEvent, which involves _more_ locks).  Overall, MessageChannel::Seend is 25% of the time.  The remainder is PBackgroundIDBTransactionChild::Write (3%), hashtable addition (2%), PBackgroundChild::Register (2%, mostly malloc).

2)  30% under IDBObjectStore::GetAddInfo.  25% of this is JSAutoStructuredCloneBuffer::write, which is mostly writing strings, traversing objects, mallocing in the destructor of JSStructuredCloneWriter, that sort of thing.  5% is KeyPath::ExtractKey, about half under Key:::EncodeString, which is mostly doing the malloc under EnsureMutable/SetCapacity.  This is the char16_t* overload of EncodeString, and that EnsureMutable looks like the mBuffer.GetMutableData call to me.

3)  20% under IDBRequest::Create.  11% of this is nsJSUtils::GetCallingLocation (!).  4% under DOMEventTargetHelper:::BindToOwner, 3% cyclecollector::HoldJSObjectsImpl, 1% malloc.

Apart from those big three, there's the usual long tail of stuff: 3% in binding code doing WrapObject, 3% LoggingString constructors under AddOrPut, 2% ~ObjectStoreAddPutParams, 2% RequestParams::MaybeDestroy, 2% RequestParams::operator=, 2% ObjectStoreAddPutParams::Assign, and smaller things of various sorts.
Flags: needinfo?(bzbarsky)
Alright, so it sounds like the next two pieces are misc IPDL overhead (under IDBTransaction::StartRequest) and nsJSUtils::GetCallingLocation.

Updated

a year ago
Depends on: 1284369
You need to log in before you can comment on or make changes to this bug.