Closed Bug 1352642 Opened 7 years ago Closed 5 years ago

Video streaming demo fails to load, uses up all system memory and swap until crash

Categories

(Core :: Graphics: ImageLib, defect, P3)

52 Branch
defect

Tracking


RESOLVED WORKSFORME

People

(Reporter: owen, Unassigned)

References

Details

(Keywords: csectype-dos, Whiteboard: [MemShrink:P2][gfx-noted])

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0
Build ID: 20170329150204

Steps to reproduce:

Checked out this repository
  https://github.com/miguelgrinberg/flask-video-streaming

Installed flask
  pip install flask

Run the demo
  python app.py

Open page in firefox
  http://localhost:5000



Actual results:

The page tries to load, but instead of displaying a video frame it continues loading while rapidly using up all the system memory, eventually leading to it crashing when the swap space runs out.

I tried running Firefox in safe mode with no extensions and the same thing happens.


Expected results:

The video stream should play (simple sequence of three images repeating)

Even if the demo is flawed (it works in Chrome, however), it should not be able to use up all the system memory in this way; triggering this bug essentially allows a denial of service on someone's machine by consuming all memory and halting the system.
Severity: normal → critical
Status: UNCONFIRMED → NEW
Has STR: --- → yes
Ever confirmed: true
Keywords: csectype-dos
OS: Unspecified → All
Product: Firefox → Core
Hardware: Unspecified → All
Whiteboard: [MemShrink]
Assigning a possibly related component.
Component: Untriaged → Audio/Video: Playback
Whiteboard: [MemShrink] → [MemShrink:P1]
Can you show me the source code of the page?
Flags: needinfo?(owen)
(In reply to JW Wang [:jwwang] [:jw_wang] from comment #2)
> Can you show me the source code of the page?

  <body>
    <h1>Video Streaming Demonstration</h1>
    <img src="/video_feed">
  </body>



>wget   http://localhost:5000/video_feed
--2017-04-10 16:46:18--  http://localhost:5000/video_feed
Resolving localhost (localhost)... ::1, ::1, 127.0.0.1, ...
Connecting to localhost (localhost)|::1|:5000... failed: Bad file descriptor.
Connecting to localhost (localhost)|::1|:5000... failed: Bad file descriptor.
Connecting to localhost (localhost)|127.0.0.1|:5000... connected.
HTTP request sent, awaiting response... 200 OK
Length: unspecified [multipart/x-mixed-replace]
Saving to: 'video_feed'

video_feed                   [              <=>                  ] 605.83M  40.6MB/s               ^C
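
For reference, a multipart/x-mixed-replace endpoint like this is typically a Flask view that returns a generator yielding one JPEG part after another with no delay. A minimal sketch along the lines of the linked demo (the Camera class, boundary name and frame source here are assumptions, not copied from the repository):

  from flask import Flask, Response
  from camera import Camera  # hypothetical stand-in for the demo's camera module

  app = Flask(__name__)

  def gen(camera):
      # Yield one multipart part per frame with no delay between parts, so a
      # localhost client receives data as fast as the server can produce it.
      while True:
          frame = camera.get_frame()  # raw JPEG bytes
          yield (b'--frame\r\n'
                 b'Content-Type: image/jpeg\r\n\r\n' + frame + b'\r\n')

  @app.route('/video_feed')
  def video_feed():
      # The browser replaces the displayed image with each new part it receives.
      return Response(gen(Camera()),
                      mimetype='multipart/x-mixed-replace; boundary=frame')

  if __name__ == '__main__':
      app.run(host='0.0.0.0', threaded=True)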
Flags: needinfo?(owen)
Component: Audio/Video: Playback → ImageLib
This only happens in e10s mode. And it happens in very old versions (42) too, so it's not related to the recent rewrite of multipart network code.

The app that serves the multipart image sends the parts as fast as it can with no waiting, and since it's local it delivers a lot of data very quickly. It looks like the main thread of the content process is only serving network-related events, not even painting. So the runnables that notify the main thread of a finished decode for every part image never get to run, and they hold a reference to the image. So we keep every part image alive until we run out of memory and crash.

Do we expect network code to monopolize the main thread in this situation? What should we do?
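
To illustrate the mechanism with a toy model (plain Python, not Gecko code, and the rates are made up): with a strict FIFO main-thread queue that fills faster than it drains, the decode-complete runnables sit ever deeper in the backlog, and the part images they reference are never released.

  from collections import deque

  main_thread = deque()   # strict FIFO "main thread" event queue
  live_images = []        # part images still referenced by queued runnables

  for tick in range(10_000):
      # The localhost server floods the queue with network-data events...
      for _ in range(8):
          main_thread.append(('network-data', None))
      # ...and each finished part queues a decode-complete runnable that
      # holds the only reference to its image (object() stands in for it).
      image = object()
      live_images.append(image)
      main_thread.append(('decode-complete', image))
      # The main thread only manages to service a couple of events per tick.
      for _ in range(2):
          kind, payload = main_thread.popleft()
          if kind == 'decode-complete':
              live_images.remove(payload)   # image finally released

  print(len(live_images), "part images still alive")          # keeps growing
  print(len(main_thread), "events backlogged on the queue")   # keeps growing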
Flags: needinfo?(odvarko)
Perhaps mayhemer knows?

Honza
Flags: needinfo?(odvarko) → needinfo?(honzab.moz)
(In reply to Owen Kaluza from comment #0)
> The page tries to load, but instead of displaying a video frame it
> continues loading while rapidly using up all the system memory, eventually
> leading to it crashing when the swap space runs out.

Please provide references to crash reports, look for them in about:crashes.  It may tell us what's happening very quickly.  Thanks.
Flags: needinfo?(honzab.moz) → needinfo?(owen)
Sorry I don't have crash reports because Firefox isn't crashing out, it's just allocating memory until the system becomes unresponsive.
Flags: needinfo?(owen)
(In reply to Owen Kaluza from comment #7)
> Sorry I don't have crash reports because Firefox isn't crashing out, it's
> just allocating memory until the system becomes unresponsive.

Aha, ok, it wasn't clear from the comment - "eventually leading to it crashing".

Anyway, looking closer at the report, I don't think this has so far been identified as a networking issue.  I'll leave this to the Media people to triage, as it seems not to be a recent regression and doesn't seem to block any Quantum work.
Component: ImageLib → Audio/Video
Please see comment 4. I have debugged this. It's a multipart image. The main thread of the content process does nothing but serve network-related events; nothing else gets a chance to run.
Component: Audio/Video → ImageLib
Flags: needinfo?(honzab.moz)
I completely missed that.  Will look at this, but not very soon.
(In reply to Timothy Nikkel (:tnikkel) from comment #4)
> This only happens in e10s mode. And it happens in very old versions (42)
> too, so it's not related to the recent rewrite of multipart network code.
> 
> The app that serves the multipart image sends the parts as fast as it can
> with no waiting, and since it's local it delivers a lot of data very
> quickly. It looks like the main thread of the content process is only
> serving network-related events, not even painting. So the runnables that
> notify the main thread of a finished decode for every part image never get
> to run, and they hold a reference to the image. So we keep every part
> image alive until we run out of memory and crash.
> 
> Do we expect network code to monopolize the main thread in this situation?
> What should we do?

If this is reproducible on very old versions, before we added the doc and tab groups for main thread events, it sounds impossible that network events would block other events.  I presume the decode-done events can be traced back to the net events, right?  OTOH, if the net stream is that fast, we could easily overload the main thread.  Decoders would then just put their results at the end of a queue already filled with network-originated events.

This sounds like a good example of why we should prioritize main thread events!  The decode-finished events should apparently be given much higher priority.

I haven't looked at this in detail (or tried to repro), so the above is more of a guess.

(keeping ni on me)
Note that the fix should not happen in the net code.  This is special-case content served from a localhost server, so adding throttling code just for that is out of the question.  Prioritization is the way to go here.  Still, the buffered data could become pretty huge, and I'm not sure off the top of my head what we could do about that easily.
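
Sketching the prioritization idea with the same toy model as above (illustrative Python only, not Gecko code): if decode-complete runnables go to a higher-priority queue that is always drained before normal network events, they run promptly and the images get released, even though the network backlog itself can still grow large.

  from collections import deque

  high, normal = deque(), deque()   # decode-complete vs. network-data events
  pending_images = 0                # images pinned by queued runnables

  for tick in range(10_000):
      for _ in range(8):
          normal.append('network-data')
      high.append('decode-complete')        # would release one part image
      pending_images += 1
      # Same drain rate as before, but high-priority events always run first.
      for _ in range(2):
          queue = high if high else normal
          if queue:
              if queue.popleft() == 'decode-complete':
                  pending_images -= 1

  print(pending_images, "part images still pinned")        # stays at 0
  print(len(normal), "network events still backlogged")    # keeps growing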
We do manage to handle this situation just fine in non-e10s mode.
Priority: -- → P3
Whiteboard: [MemShrink:P1] → [MemShrink:P1][gfx-noted]
Depends on: 1360591
Depends on: 1280629
No longer depends on: 1360591
FYI, I don't think bug 1280629 is going to be fixed for 57.  And this bug is clearly a duplicate of it.
Flags: needinfo?(honzab.moz)
Honza, now that bug 1280629 is fixed, should this be closed? Dropping to P2 since this is a localhost-only issue.
Flags: needinfo?(honzab.moz)
Whiteboard: [MemShrink:P1][gfx-noted] → [MemShrink:P2][gfx-noted]
A confirmation would be good first.  Can we ask QA to run the test case here with a recent Nightly?
Flags: needinfo?(honzab.moz) → needinfo?(erahm)
STR in comment 0, can we see if this still repros now that the associated bugs have been fixed?
Flags: needinfo?(erahm) → qe-verify+

(In reply to Eric Rahm [:erahm] (Away until 4/23) from comment #17)
> STR in comment 0, can we see if this still repros now that the associated
> bugs have been fixed?

I tested it in 66.0.3 on Kubuntu 19.04 and see no problems now: memory is stable and the video frames stream as expected, so that's one data point for what it's worth.

Okay, it sounds like it's fixed.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME