[meta] Performance investigations surrounding large IndexedDB responses
Categories
(Core :: Storage: IndexedDB, task, P3)
Performance Impact: low
People
(Reporter: mstange, Unassigned)
References
(Depends on 4 open bugs)
Details
(Keywords: meta, perf:responsiveness, Whiteboard: dom-lws-bugdash-triage)
Attachments
(1 file, 4 obsolete files)
5.41 KB, text/html
Profile: https://share.firefox.dev/2JTARaI
I just experienced severe jank when interacting with the profiler during symbolication, so I took a profile of it (linked above). I was moving my mouse continuously the whole time, and you can see the mousemove events for that in the parent process, but the mousemove feedback in the content process was interrupted multiple times, for 2.3 seconds at a time.
You can see a consistent increase in memory usage during those gaps.
After each gap, the first thing that runs is a success event from IndexedDB.
This indicates to me that "somehow IndexedDB is blocking the pipes".
Comment 1•5 years ago
Hi,
Thanks for reporting!
To be honest, I am not so sure the jank was introduced by IDB, since the content process was idle for most of the time. I don't understand why the main thread of the content process was idle for around 2.3 seconds out of every 3 seconds when there are many mousemove events that need to be propagated to the content process.
I guess it might be helpful to check the Background thread of the parent process (for IndexedDB actors) and the call stack/thread for mouse events on the parent process (I thought it's the main thread, but I couldn't find the call stack for it in the profiler) to ensure nothing blocks them from being propagated to the content process.
cc :sg and :janv in case they know something.
Comment 2•5 years ago
I guess a big structured clone is being processed (deserialized) during a "pause".
Comment 3•5 years ago
(In reply to Jan Varga [:janv] from comment #2)
> I guess a big structured clone is being processed (deserialized) during a "pause".
Right, ~StructuredCloneReadInfo took 57 ms during the IDBCursor.delete() which was executed in the success event handler, but I thought that was acceptable. Or maybe I was wrong.
The mousemove events are coalesced in the content process and processed along with vsync. From the profile, the content process didn't receive/process vsync during the "pause"; maybe the background thread of the parent process was busy with something and blocked vsync from being sent to the content process?
Comment 5•5 years ago
Would it be possible to get a profile with a few more threads? IPC, PBackground, etc.
Reporter
Comment 6•5 years ago
Yeah, I'll try to reproduce this again.
(In reply to Edgar Chen [:edgar] from comment #4)
> From the profile, the content process didn't receive/process vsync during the "pause"; maybe the background thread of the parent process was busy with something and blocked vsync from being sent to the content process?
That sounds likely! I see PBackgroundIDBCursor::Msg_Response in the profile, and PVsync also goes through PBackground from the parent process background thread to the content process main thread. So my guess is that vsync was blocked on serializing / deserializing a large Response message.
Reporter
Comment 7•5 years ago
I managed to capture a profile of this with more detail: https://share.firefox.dev/39TjDW2
The IPC data in the profile shows that, indeed, all IPC messages are delayed while a large message is sent, not just messages that go through PBackground.
Reporter
Comment 8•5 years ago
Oh, that profile wasn't actually an IndexedDB message. Here's a better profile: https://share.firefox.dev/3qEePcY
Reporter
Comment 9•5 years ago
Well, that profile does show a problem, but I'm not sure that it's the same problem as in the original profile. It's only showing one pause (when we get the libxul symbol table from the database), and the pause is not very long. I'm having a hard time reproducing the original bug with multiple long pauses.
Reporter
Comment 10•5 years ago
Here's a better profile showing the original bug: https://share.firefox.dev/36BchEe
- 450ms in GetStructuredCloneReadInfoFromExternalBlob
- 108ms on the "IPDL Background" thread, putting the structured clone into an IPC message
- 2 seconds (!) on the "IPC I/O Parent" thread, sending chunks to the child process. During this time, all other PBackground messages (such as vsync) are blocked.
- On the content process, once all the chunks have arrived, the delivery to the main thread is actually pretty quick.
This bug is about the fact that the PBackground queue is clogged up by a big IndexedDB message, and in particular the fact that this causes vsync notifications to be delayed during that time.
Reporter
Comment 11•5 years ago
Jed, can you take a look at https://share.firefox.dev/36BchEe and advise? Should large IDB responses use shmems? Should they use bigger chunks?
Comment 12•5 years ago
This is… weird. The maximum message size is 256MiB, so if one message takes 2s to send, that's less than 128MiB/s to send data to a local process. The way we're doing it isn't very optimized for large messages, but that's still surprising. In fact it's probably less than that: the memory track in the profiler shows an increase of ~118MB in the child process while receiving, which would put the effective rate closer to 60MB/s.
I remember that we'd previously found that local-domain sockets on macOS have relatively small buffers and don't perform well on messages that would exceed the buffer size, but this is still surprising.
I wonder if the profiler being active is part of this — we've had other bugs where frequently interrupting an operation with signals causes performance problems (e.g., we have to block SIGPROF while forking on Linux, because of how signals interact with the page table manipulations for copy-on-write VM).
It would help if we had a microbenchmark for sending a large message, so we could check that on different OSes, with the Gecko profiler vs. an OS-level profiler vs. no profiler, etc.
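[Editor's note: a JS-level proxy for such a microbenchmark could look like the sketch below. It times one large IndexedDB get(), which travels as a single big PBackground message; the database/store names and the 90MB payload size are illustrative, not taken from the attached testcase.]
// Rough JS-level proxy for the IPC microbenchmark: time a single large
// IndexedDB get(), which exercises parent-process read + IPC + deserialization.
function openDb() {
  return new Promise((resolve, reject) => {
    const req = indexedDB.open("large-message-bench", 1);
    req.onupgradeneeded = () => req.result.createObjectStore("store");
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
}

async function bench() {
  const db = await openDb();

  // Store a large value (90MB here; stays well under the 256MiB limit).
  await new Promise((resolve, reject) => {
    const tx = db.transaction("store", "readwrite");
    tx.objectStore("store").put(new Uint8Array(90 * 1024 * 1024), "big");
    tx.oncomplete = resolve;
    tx.onerror = () => reject(tx.error);
  });

  // Time the retrieval: this is the large-message path discussed above.
  const start = performance.now();
  await new Promise((resolve, reject) => {
    const req = db.transaction("store").objectStore("store").get("big");
    req.onsuccess = () => resolve(req.result);
    req.onerror = () => reject(req.error);
  });
  console.log("get() took", (performance.now() - start).toFixed(1), "ms");
}

bench();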
Comment 13•5 years ago
(In reply to Jed Davis [:jld] ⟨⏰|UTC-7⟩ ⟦he/him⟧ from comment #12)
> I wonder if the profiler being active is part of this — we've had other bugs where frequently interrupting an operation with signals causes performance problems (e.g., we have to block SIGPROF while forking on Linux, because of how signals interact with the page table manipulations for copy-on-write VM).
Except that we use Mach APIs on Mac, not signals. But there might be a similar issue.
Reporter
Comment 14•5 years ago
I noticed the bad performance without the profiler running. I only started the profiler because things were slow.
Comment 15•5 years ago
Large structured clones are stored as separate files on disk (outside of the database). However, the data is still read and decompressed in the parent process. We could instead send a file descriptor to the content process and let it read from the file and decompress the data itself.
There's already some support for this, originally added for WebAssembly.Module. See bug 1561876, which removed the support for WebAssembly.Module but left some infrastructure for using the "preprocessing" in the future. The preprocessing is not supported by indexes and cursors; that would have to be added. There's also another bug which needs to be addressed first: bug 1313185.
Comment 16•5 years ago
To summarize bug 1690921, there are serious but easily fixable problems with large message performance on Unix and especially on macOS. That should improve this situation, but there may still be noticeable latency (I've only tested on debug builds so far, which isn't an accurate measure of how much improvement we'll get on release builds).
Note that if the message had been over 256MiB, it would crash.
Comment 17•4 years ago
Comment 18•4 years ago
Here is a new profile I took from this testcase: https://share.firefox.dev/3nrWXRF
I ran it in Chrome too; in general, on my machine, we seem to have better numbers in Firefox than in Chrome.
Comment 19•4 years ago
This testcase is maybe different from Markus' initial bug: here I'm storing a big value in IDB, but never retrieving it.
Also, I'm using requestAnimationFrame callbacks, not mousemove events; I'm not sure whether that matters for this bug.
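[Editor's note: for reference, a minimal sketch of what such a store-only testcase might look like — this is an assumed shape, not the actual attachment. It stores one big value while logging long gaps between requestAnimationFrame callbacks.]
// Hypothetical store-only testcase: put a large value into IndexedDB
// while watching for requestAnimationFrame gaps (i.e., dropped frames).
const req = indexedDB.open("idb-jank-test", 1);
req.onupgradeneeded = () => req.result.createObjectStore("store");
req.onsuccess = () => {
  const db = req.result;

  // Watch for rAF gaps: anything much longer than one ~16.7ms frame is jank.
  let last = performance.now();
  function frame(now) {
    if (now - last > 50) console.log(`rAF gap: ${(now - last).toFixed(0)} ms`);
    last = now;
    requestAnimationFrame(frame);
  }
  requestAnimationFrame(frame);

  // Store a big (~90MB) value; this variant never reads it back.
  const tx = db.transaction("store", "readwrite");
  tx.objectStore("store").put(new Uint8Array(90 * 1024 * 1024), "big");
  tx.oncomplete = () => console.log("store complete");
};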
Comment 20•4 years ago
This testcase also retrieves the value. Here we're significantly worse than Chrome on my machine.
Comment 21•4 years ago
(fix typo)
Comment 22•4 years ago
https://share.firefox.dev/3ntT5Q0 <= this one is when retrieving from IDB.
Comment 23•4 years ago
Hey Iain, Bas told me you could be interested in this :-) We spend a lot of time in js::NativeObject::addPropertyInternal.
Comment 24•4 years ago
Actually access the result, so that Chrome actually transfers it; this makes the comparison fair.
Comment 26•4 years ago
Jan points out in chat that "JSStructuredCloneReader::read looks pretty suboptimal". He's currently working on improving our representation of shapes (bug 1704430); once that's done we can bypass a lot of the unnecessary work here.
Opened bug 1708272.
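[Editor's note: a small hedged demo of why this matters — it uses the structuredClone() global, which is newer than this bug, but the IDB read path goes through the same JSStructuredCloneReader. An object with many properties deserializes much more slowly per byte than a typed array, because each property addition goes through the shape machinery (e.g., addPropertyInternal).]
// Hedged micro-demo: structured-clone cost is dominated by property
// (shape) work, not by the raw byte count of the data.
function time(label, fn) {
  const t0 = performance.now();
  fn();
  console.log(label, (performance.now() - t0).toFixed(1), "ms");
}

// ~8MB of raw data in one typed array: few properties, many bytes.
const bytes = new Uint8Array(8 * 1024 * 1024);

// ~100k properties: few bytes each, lots of shape work on read.
const props = {};
for (let i = 0; i < 100_000; i++) props["p" + i] = i;

time("typed array", () => structuredClone(bytes));
time("many properties", () => structuredClone(props));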
Reporter
Comment 27•4 years ago
These testcases look like a rather different problem - they're spending time on the main thread. The original bug was about delays that are not due to work on the main thread, but due to work in the IPC system.
Comment 28•4 years ago
(In reply to Markus Stange [:mstange] from comment #27)
> These testcases look like a rather different problem - they're spending time on the main thread. The original bug was about delays that are not due to work on the main thread, but due to work in the IPC system.
Yeah, I agree.
In the profiles I took, I couldn't find anything interesting in the IPDL Background thread; that's why I didn't keep it when sharing. So could this specific bug be fixed by bug 1690921, then?
Here is another profile with all the same threads as your previous profile: https://share.firefox.dev/3u4SdE0 (store then retrieve a big object).
I think the time spent in IPC looks reasonable now, what do you think?
Comment 29•4 years ago
Some more things from looking at this profile:
- moveToTenured happens repeatedly when ReadStructuredClone runs; can we check the size of the data and deserialize directly into the tenured generation?
- In the "IPDL Background" thread, smaug thinks that free happens in the same task; we should move that to a separate task so that the thread isn't blocked.
- Bas thinks that ComputeCrc32c could be optimized significantly.
- IPC is ~100ms for ~90MB (that's roughly 900MB/s); on my profiling machine from 5 years ago, this looks OK.
IPC I/O Parent looks pretty good now. We think that it's not P1 responsiveness anymore; please change if you disagree.
I'll file bugs for these few things.
Comment 30•4 years ago
Florian noticed that in the "store" part of the profile (the first operation), in the IndexedDB thread, there are 2 long fsync operations that blocked the profiler sampling. It would be nice to double-check on more profiles.
Comment 31•4 years ago
(In reply to Julien Wajsberg [:julienw] from comment #30)
> Florian noticed that in the "store" part of the profile (the first operation), in the IndexedDB thread, there are 2 long fsync operations that blocked the profiler sampling. It would be nice to double-check on more profiles.
What does "blocked" mean in this context?
Comment 32•4 years ago
I mean there's a big hole in samples on this thread during fsync.
Florian filed bug 1709111 about this issue.
Comment 33•4 years ago
Does anyone know why we're moving this much memory around? Maybe there's a way to make these queries return less data. I saw in comment 16 that we will just crash when the result size hits 256MB.
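[Editor's note: one way a page can keep individual responses small today, without any platform changes, is to chunk large values across several records and reassemble them with a cursor. A hedged sketch follows; the store name, key scheme, and chunk size are all illustrative.]
// Hypothetical chunking scheme: no single IDB response (and thus no
// single IPC message) ever carries the whole payload.
const CHUNK = 8 * 1024 * 1024; // 8MB per record

function putChunked(store, key, bytes) {
  for (let i = 0, n = 0; i < bytes.length; i += CHUNK, n++) {
    // slice() copies; cloning a subarray view would serialize the
    // entire underlying buffer with every chunk.
    const id = `${key}:${String(n).padStart(6, "0")}`; // zero-pad so keys sort in order
    store.put(bytes.slice(i, i + CHUNK), id);
  }
}

function getChunked(db, key) {
  return new Promise((resolve, reject) => {
    const chunks = [];
    // ";" is the code point after ":", so this range covers every chunk key.
    const range = IDBKeyRange.bound(`${key}:`, `${key};`, false, true);
    const req = db.transaction("store").objectStore("store").openCursor(range);
    req.onerror = () => reject(req.error);
    req.onsuccess = () => {
      const cursor = req.result;
      if (cursor) { chunks.push(cursor.value); cursor.continue(); }
      else resolve(new Blob(chunks)); // reassemble lazily as a Blob
    };
  });
}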
Comment 34•4 years ago
IndexedDB keeps an approximation of its results size and will generate a failure (and/or stop prefetching records) if the results get too large. It won't actually hit the crash size.
We can't actually stop users from putting giant things in IndexedDB without cross-browser standardization, and the lack of a computed size for structured serialization means that it can't yet be standardized, although there have been some partial efforts to that end.
There is a preprocess mechanism that could be used to avoid sending IDB responses as a single big message, but this inherently introduces additional latency, and it's not a switch that can just be flipped: it also breaks the response ordering required by the spec, and compensations for that are not yet implemented.
I've re-titled the bug since it sounds like the quadratic performance hit of bug 1690921 was the cause of the very serious-sounding vsync jank.