Closed Bug 1453454 Opened 2 years ago Closed Last year

SourceBuffer::Compact janks the main thread

Categories: (Core :: ImageLib, enhancement, P3)
Status: RESOLVED FIXED
Target Milestone: mozilla62
Tracking Status:
firefox61 --- wontfix
firefox62 --- fixed
People: (Reporter: froydnj, Assigned: aosmond)
Whiteboard: [gfx-noted]
Attachments: (2 files, 1 obsolete file)

I have a profile with a ~700ms hang in the content process (#5), seemingly while SourceBuffer::Compact() is on the stack:

https://perfht.ml/2HhLjWS

This profile is not perfect evidence of the jank, because the actual samples in Compact() are spaced quite far apart.  Compare the ImgDecoder threads in the #3 and #4 content processes: they appear to be sampled very frequently.  The main thread for content process #5 is also sampled frequently...until Compact() starts, after which it only gets a sample every ~0.1s.  Once Compact() ends, we're back to frequent samples again.

I'm not sure why this is the case.  Maybe because we're so hip-deep in memcpy during that time, and memcpy (typically being written in assembly) doesn't have the unwind information needed at every point, so samples are being taken but we can't unwind the stack, and we just give up?

In any event, it is possible that Compact() *isn't* actually taking all that time, and the samples seen are multiple calls to Compact(), for multiple images.  But then I would think that we wouldn't have a hang, because we'd be re-entering the event loop for each image?  Unless ChannelEventQueue::FlushQueue() could cause imgRequest::OnStopRequest to be called on multiple different images during a single flush?  I guess that would depend on whether we have separate channels for each image loaded (guessing "yes") and whether ChannelEventQueue aggregates events for multiple channels (probably not?).

Is it possible to split up Compact() into steps, doing the work lazily or off the main thread?
It is certainly possible to move the compacting off the main thread; it can already happen there in some cases. OnImageDataComplete is called on the main thread, however, and if there is no active decoder, it triggers the compact right there. I see no reason why it couldn't post a message in that case and defer the operation to a decoder thread, as sketched below.
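Roughly, the deferral might look like this (a sketch only; mDecodePool and CompactInternal are illustrative names, not the actual identifiers in image/SourceBuffer.cpp):

// Hypothetical sketch: defer the compact from the main thread to a
// decoder thread instead of running it synchronously.
// Assumes nsThreadUtils.h for NS_IsMainThread/NS_NewRunnableFunction.
void SourceBuffer::Compact()
{
  if (NS_IsMainThread()) {
    // Keep the buffer alive until the deferred compact runs.
    RefPtr<SourceBuffer> self = this;
    mDecodePool->Dispatch(NS_NewRunnableFunction(
        "image::SourceBuffer::Compact",
        [self]() { self->CompactInternal(); }));
    return;
  }
  // Already on a decoder thread; compact in place.
  CompactInternal();
}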
(In reply to Nathan Froyd [:froydnj] from comment #0)
> I'm not sure why this is the case.  Maybe because we're so hip-deep in
> memcpy during that time, and memcpy (being written in assembly, typically)
> doesn't have the necessary bits to unwind out of it at all points, so we are
> getting samples taken, but we're unable to unwind, so we just give up?

If this happens, we should still get a sample; it would just be a one-frame-deep stack. And we'd still get stacks for all the other threads in this process.

But we're hardly getting any samples for a duration of 700ms. I don't know what causes this, but whatever it is, it's probably affecting both the sampler thread and the main thread, and it's probably the reason both that we're not getting samples and that SourceBuffer::Compact() is taking so long.

In general, when the profiler is having trouble sampling, I don't place much faith in the samples we do get during that time, because the sampled threads are likely suffering from the same resource starvation or deprioritization.
This moves the compaction to the decoder threads.
Assignee: nobody → aosmond
Status: NEW → ASSIGNED
Priority: -- → P3
Whiteboard: [gfx-noted]
Attachment #8967437 - Attachment is obsolete: true
Attachment #8973225 - Flags: review?(jmuizelaar)
Comment on attachment 8973225 [details] [diff] [review]
0001-Bug-1453454-Part-1.-Improve-SourceBuffer-support-for.patch, v1

Review of attachment 8973225 [details] [diff] [review]:
-----------------------------------------------------------------

::: image/SourceBuffer.h
@@ +372,5 @@
>  
> +  /**
> +   * The maximum chunk capacity we'll allocate.
> +   */
> +  static const size_t MAX_CHUNK_CAPACITY = 20971520;

20*1024*1024 looks less magical.

::: image/test/gtest/TestSourceBuffer.cpp
@@ +419,5 @@
>    // allocated.
>    CheckedAdvanceIterator(iterator, 1, 2, length + 1);
>    CheckIteratorIsComplete(iterator, 2, length + 1);
>  }
>  

How about we add a test that we cap the chunk size at 20MB?
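Something along these lines, perhaps (a sketch only; it reuses the CheckedAdvanceIterator/CheckIteratorIsComplete helpers quoted above, assumes their arguments mean expected-advance-length, expected-chunk-count, and expected-total-length, and assumes the first chunk fills exactly to the cap):

TEST_F(ImageSourceBuffer, ChunkCapacityIsCapped)
{
  const size_t kCap = 20 * 1024 * 1024;   // Expected MAX_CHUNK_CAPACITY.
  mSourceBuffer->ExpectLength(kCap + 1);  // Hint one byte past the cap.

  // Append kCap + 1 bytes in page-sized slices.
  char buf[4096];
  memset(buf, 0x2A, sizeof(buf));
  size_t remaining = kCap + 1;
  while (remaining > 0) {
    size_t n = std::min(remaining, sizeof(buf));
    ASSERT_TRUE(NS_SUCCEEDED(mSourceBuffer->Append(buf, n)));
    remaining -= n;
  }
  mSourceBuffer->Complete(NS_OK);

  // A capped implementation must start a second chunk rather than
  // reallocate the first one beyond 20MB.
  SourceBufferIterator iterator = mSourceBuffer->Iterator();
  CheckedAdvanceIterator(iterator, kCap, 2, kCap + 1);
  CheckedAdvanceIterator(iterator, 1, 2, kCap + 1);
  CheckIteratorIsComplete(iterator, 2, kCap + 1);
}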
Attachment #8973225 - Flags: review?(jmuizelaar) → review+
Comment on attachment 8973225 [details] [diff] [review]
0001-Bug-1453454-Part-1.-Improve-SourceBuffer-support-for.patch, v1

Review of attachment 8973225 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for working on this!

::: image/ImageFactory.cpp
@@ -161,5 @@
>      return;
>    }
>  
> -  // Bound by something reasonable
> -  uint32_t sizeHint = std::min<uint32_t>(aSize, 20000000);

Where does this magic number come from?  Can we document it somewhere?

::: image/SourceBuffer.h
@@ +372,5 @@
>  
> +  /**
> +   * The maximum chunk capacity we'll allocate.
> +   */
> +  static const size_t MAX_CHUNK_CAPACITY = 20971520;

Can we say something about how we arrived at this number?
(In reply to Nathan Froyd [:froydnj] from comment #7)
> Comment on attachment 8973225 [details] [diff] [review]
> 0001-Bug-1453454-Part-1.-Improve-SourceBuffer-support-for.patch, v1
> 
> Review of attachment 8973225 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Thanks for working on this!
> 
> ::: image/ImageFactory.cpp
> @@ -161,5 @@
> >      return;
> >    }
> >  
> > -  // Bound by something reasonable
> > -  uint32_t sizeHint = std::min<uint32_t>(aSize, 20000000);
> 
> Where does this magic number come from?  Can we document it somewhere?
> 
> ::: image/SourceBuffer.h
> @@ +372,5 @@
> >  
> > +  /**
> > +   * The maximum chunk capacity we'll allocate.
> > +   */
> > +  static const size_t MAX_CHUNK_CAPACITY = 20971520;
> 
> Can we say something about how we arrived at this number?

I just rounded 20000000 up to 20MB since we were already doing 4kB rounding. The original number doesn't seem to have any justification behind it. My plan was to monitor the telemetry and adjust the cap upwards if we find images split across multiple chunks significantly more often. Currently ~97% of images are placed in a single chunk. I expect the rate of splitting to go up, since we will now refuse to reallocate once images hit the 20MB cap, but hopefully not by too much.
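For the record, the arithmetic works out cleanly; if we want the alignment to be self-documenting, something like this would do (the static_assert is a suggestion here, not part of the landed patch):

// 20 * 1024 * 1024 = 20971520 bytes (20 MiB), and 20971520 / 4096 = 5120,
// so the cap is an exact multiple of the existing 4kB rounding granularity.
static const size_t MAX_CHUNK_CAPACITY = 20 * 1024 * 1024;
static_assert(MAX_CHUNK_CAPACITY % 4096 == 0,
              "MAX_CHUNK_CAPACITY should stay aligned to the 4kB rounding unit");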
Pushed by aosmond@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/66c63eaf500a
Improve SourceBuffer support for large encoded image data. r=jrmuizel
https://hg.mozilla.org/mozilla-central/rev/66c63eaf500a
Status: ASSIGNED → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Does the second patch need to land here still?
Flags: needinfo?(aosmond)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #12)
> Does the second patch need to land here still?

Ideally yes. But I suspect what I have already landed fixes the most egregious cases of jank that caused this bug to be filed. It may need some tweaking based on the telemetry numbers from nightly this week.
Flags: needinfo?(aosmond)
Attachment #8973228 - Flags: review?(jmuizelaar) → review+
Blocks: 1465775