Closed
Bug 994877
Opened 11 years ago
Closed 10 years ago
Debug mochitest-1 nearly perma-fail in media mochitests
Categories
(Core :: Audio/Video, defect)
RESOLVED
WORKSFORME
People
(Reporter: RyanVM, Assigned: jwwang)
References
(Depends on 1 open bug)
Details
Attachments
(2 files)
5.91 KB, patch
4.26 KB, patch
In addition to the frequent leaks reported in bug 994289, since the end of last week, OSX 10.6 debug mochitest-1 has been nearly perma-fail in mochitest, primarily under test_seek.html, test_bug495145.html, and test_replay_metadata.html.
The spike is very visible in bugs like bug 762774 and bug 684173. We need this investigated ASAP or we will have to resort to mass test disablings.
Flags: needinfo?(cpearce)
Comment 1•11 years ago
OrangeFactor suggests that this started around April 6 or 7 PDT.
hg log content/media/ -d ">Apr 4" outputs:
changeset: 177651:26d87e24848b
user: Chris Pearce <cpearce@mozilla.com>
date: Wed Apr 09 16:45:32 2014 +1200
summary: Bug 993003 - Ensure we abort media load if IMFSourceReader creation fails. r=padenot
changeset: 177644:c333abd5318d
user: Kyle Huey <khuey@kylehuey.com>
date: Tue Apr 08 17:26:33 2014 -0700
summary: Back out bug 991812 for bustage on a CLOSED TREE. r=me
changeset: 177639:88ee33546b3a
user: Kyle Huey <khuey@kylehuey.com>
date: Tue Apr 08 16:37:05 2014 -0700
summary: Bug 991812: Remove uses of RefCounted in code that lives solely in Gecko. r=ehsan
changeset: 177628:de7487db16d9
user: Boris Zbarsky <bzbarsky@mit.edu>
date: Tue Apr 08 18:27:18 2014 -0400
summary: Bug 991742 part 8. Remove the "aScope" argument of WebIDL/nsWrapperCache WrapObject() methods. r=bholley
changeset: 177626:c438f7b1d1b5
user: Boris Zbarsky <bzbarsky@mit.edu>
date: Tue Apr 08 18:27:17 2014 -0400
summary: Bug 991742 part 6. Remove the "aScope" argument of binding Wrap() methods. r=bholley
changeset: 177534:57d7504371af
user: Gabriele Svelto <gsvelto@mozilla.com>
date: Mon Apr 07 13:20:57 2014 +0200
summary: Bug 988760 - Account extra time since blocking correctly. r=karlt
changeset: 177353:a201e70b790e
user: Peter Van der Beken <peterv@propagandism.org>
date: Mon Apr 07 22:18:53 2014 +0200
summary: Back out 75c95dac7fe0 (bug 984497) and f1b0d3d13755 (bug 990475) to fix bustage on a CLOSED TREE.
changeset: 177345:d5b0e9e6a849
user: Brian Hackett <bhackett1024@gmail.com>
date: Mon Apr 07 13:04:37 2014 -0700
summary: Bug 987508 - Create array buffers lazily for small typed arrays, r=sfink.
changeset: 177342:8b87a6adad14
user: Ryan VanderMeulen <ryanvm@gmail.com>
date: Mon Apr 07 15:49:48 2014 -0400
summary: Backed out changeset e35851f07b67 (bug 987508) for non-unified bustage.
changeset: 177339:423df46d8d57
user: Randell Jesup <rjesup@jesup.org>
date: Mon Apr 07 15:42:01 2014 -0400
summary: Backed out changeset 974c4db3003e (bug 818822)
changeset: 177338:670cb6d1750a
user: Randell Jesup <rjesup@jesup.org>
date: Mon Apr 07 15:40:55 2014 -0400
summary: Backed out changeset 5349ecd9c313 (bug 818822)
changeset: 177336:5d7494ed030d
user: Randell Jesup <rjesup@jesup.org>
date: Mon Apr 07 15:37:56 2014 -0400
summary: Backed out changeset 87f437be7de5 (bug 982490)
changeset: 177333:3ae7d42531c7
user: Randell Jesup <rjesup@jesup.org>
date: Mon Apr 07 15:37:52 2014 -0400
summary: Backed out changeset e3664615ecbf (bug 694814)
changeset: 177332:20aea86b3432
user: Randell Jesup <rjesup@jesup.org>
date: Mon Apr 07 15:37:51 2014 -0400
summary: Backed out changeset 74e5c32c6fa2 (bug 694814)
changeset: 177331:63be52cd09c5
user: Randell Jesup <rjesup@jesup.org>
date: Mon Apr 07 15:37:50 2014 -0400
summary: Backed out changeset 6dc08e9fc7e8 (bug 694814)
changeset: 177329:206169eef995
user: Randell Jesup <rjesup@jesup.org>
date: Mon Apr 07 15:37:48 2014 -0400
summary: Backed out changeset daf5df0306b2 (bug 985714)
changeset: 177322:e35851f07b67
user: Brian Hackett <bhackett1024@gmail.com>
date: Mon Apr 07 11:46:54 2014 -0700
summary: Bug 987508 - Create array buffers lazily for small typed arrays, r=sfink.
changeset: 177316:0cb71c012f85
user: Randell Jesup <rjesup@jesup.org>
date: Mon Apr 07 13:50:28 2014 -0400
summary: Bug 991504 - Temporary assertion removal to fix bustage in AudioSegment r=jesup
changeset: 177288:974c4db3003e
user: Randell Jesup <rjesup@jesup.org>
date: Mon Apr 07 08:48:24 2014 -0400
summary: Bug 818822: Reduce fake audio/video rates on b2g debug only to avoid overloading mochitest emulator VMs r=padenot
changeset: 177266:e31ba8d051be
user: Matt Woodrow <mwoodrow@mozilla.com>
date: Mon Apr 07 15:17:41 2014 +1200
summary: Bug 904890 - Part 4: Enable hardware accelerated video decoding for OMTC+D3D9/11. r=cpearce
changeset: 177259:814f77d08ee7
user: Matt Woodrow <mwoodrow@mozilla.com>
date: Mon Apr 07 13:32:49 2014 +1200
summary: Bug 991028 - Remove deprecated IPDL SurfaceDescriptor types. r=nical
changeset: 177229:2579095d0f7e
user: Phil Ringnalda <philringnalda@gmail.com>
date: Sun Apr 06 21:21:38 2014 -0700
summary: Backed out 4 changesets (bug 991028) for nonunified bustage
changeset: 177225:147581a518c3
user: Matt Woodrow <mwoodrow@mozilla.com>
date: Mon Apr 07 13:32:49 2014 +1200
summary: Bug 991028 - Remove deprecated IPDL SurfaceDescriptor types. r=nical
changeset: 177108:fcd79d6f4a7e
user: Ed Morley <emorley@mozilla.com>
date: Fri Apr 04 16:32:19 2014 +0100
summary: Backed out changeset 2ac8fe9a90c5 (bug 948269) for timeouts in gaia-integration tests; CLOSED TREE
changeset: 177107:b327711444ed
user: Ed Morley <emorley@mozilla.com>
date: Fri Apr 04 16:31:44 2014 +0100
summary: Backed out changeset e00d10064639 (bug 948269)
changeset: 177060:e00d10064639
user: Matthew Gregan <kinetik@flim.org>
date: Fri Apr 04 15:31:10 2014 +1300
summary: Bug 948269 - Remove incorrect assertion from AudioSink::Drain. r=cpearce
changeset: 177054:5fb973d5e276
user: Neil Rashbrook <neil@parkwaycc.co.uk>
date: Thu Apr 03 23:06:26 2014 +0100
summary: Bug 514280 Only use nsCOMPtr for interfaces r=bsmedberg
changeset: 177052:904297de3d1e
user: Chris Pearce <cpearce@mozilla.com>
date: Fri Apr 04 10:39:42 2014 +1300
summary: Bug 986947 - Make MP3 contained in MP4 playback again on Windows with WMF backend. r=padenot
changeset: 177051:9c208ea4d63c
user: Chris Pearce <cpearce@mozilla.com>
date: Fri Apr 04 10:39:15 2014 +1300
summary: Bug 991448 - Skip Theora decode to next keyframe after seek, so that we don't get visual artifacts after a fastSeek. r=cajbir
The only thing that stands out is Bug 991448, but it merged to m-c about a day earlier than the spike started, so I'm hesitant to declare it the cause.
Jwwang, are you able to take this?
Flags: needinfo?(cpearce) → needinfo?(jwwang)
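A rough way to narrow a log like the one in comment 1 to a suspect window is sketched below. This is a minimal sketch, not a tool used in this bug: `parse_hg_log` and `committed_between` are hypothetical helper names, and only two entries from the log are reproduced. Note that `hg log` prints commit dates, not the time a change merged to m-c, so this can only shortlist candidates; it cannot settle the merge-versus-spike timing question raised above.

```python
from datetime import datetime, timedelta, timezone

# Two entries copied from the log in comment 1; the rest are omitted here.
HG_LOG = """\
changeset: 177534:57d7504371af
user: Gabriele Svelto <gsvelto@mozilla.com>
date: Mon Apr 07 13:20:57 2014 +0200
summary: Bug 988760 - Account extra time since blocking correctly. r=karlt
changeset: 177051:9c208ea4d63c
user: Chris Pearce <cpearce@mozilla.com>
date: Fri Apr 04 10:39:15 2014 +1300
summary: Bug 991448 - Skip Theora decode to next keyframe after seek, so that we don't get visual artifacts after a fastSeek. r=cajbir
"""


def parse_hg_log(text):
    """Group 'key: value' lines into one dict per changeset (hypothetical helper)."""
    entries = []
    for line in text.splitlines():
        key, sep, value = line.partition(": ")
        if not sep:
            continue  # not a 'key: value' line
        if key == "changeset":
            entries.append({})  # a 'changeset' line starts a new record
        if entries:
            entries[-1][key] = value.strip()
    for entry in entries:
        # Timezone-aware commit date, e.g. "Mon Apr 07 13:20:57 2014 +0200".
        entry["when"] = datetime.strptime(entry["date"], "%a %b %d %H:%M:%S %Y %z")
    return entries


def committed_between(entries, start, end):
    """Return entries whose commit date falls in the half-open window [start, end)."""
    return [e for e in entries if start <= e["when"] < end]


# Assumed window: all of April 6 and 7, 2014 in PDT (UTC-7).
PDT = timezone(timedelta(hours=-7))
WINDOW_START = datetime(2014, 4, 6, tzinfo=PDT)
WINDOW_END = datetime(2014, 4, 8, tzinfo=PDT)
```

Calling `committed_between(parse_hg_log(HG_LOG), WINDOW_START, WINDOW_END)` keeps only changesets committed inside the assumed window, which for these two entries excludes Bug 991448.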
Assignee
Comment 2•11 years ago
test_seek.html might be related to Bug 995090. I am still debugging test_seek.html.
Assignee: nobody → jwwang
Flags: needinfo?(jwwang)
Reporter
Comment 3•11 years ago
This is really a cross-platform issue. Failure rates on media mochitests (timeouts, shutdown hangs/leaks, etc) are currently extremely high - I've heard it ballparked around 40%. Where do we stand on investigating here? I don't want to start indiscriminately disabling tests, but this is having a significantly negative impact on our overall failure rates.
OS: Mac OS X → All
Hardware: x86_64 → All
Summary: OSX 10.6 debug mochitest-1 nearly perma-fail in media mochitests → Debug mochitest-1 nearly perma-fail in media mochitests
In case JW doesn't notice your question here, setting needinfo for JW.
Flags: needinfo?(jwwang)
Assignee
Comment 5•11 years ago
We have 2 bugs here that could cause timeouts:
1. Bug 995090
2. Sometimes timer callbacks fail to fire and leave the MediaDecoderStateMachine stuck, which I am still investigating
For 1, the bug could be hard to fix given the current design of MediaResource. The cloned ChannelMediaResource doesn't have its own channel and depends on the cached data downloaded by the original ChannelMediaResource. If the original ChannelMediaResource is destroyed before the download completes, there is no way for the cloned ChannelMediaResource to acquire new data. If we create a new channel for the cloned ChannelMediaResource, it will defeat the purpose of resource caching and break some test cases. Moreover, if the cloned ChannelMediaResource seeks to a position where data is not present, there is no way to notify the original ChannelMediaResource to download the requested data.
For 2, it looks like a bug in our nsITimer implementation, which I am afraid will have an impact on the overall system.
I can try to find a workaround for the failures in the test cases, but (2) really concerns me and is worth investigating further.
Hi Chris, can you share your opinion on (1)? I could be wrong about it, since I am not very familiar with MediaResource.
Flags: needinfo?(jwwang) → needinfo?(cpearce)
Comment 6•11 years ago
Can we keep a count on the ChannelMediaResource of the number of clones, and only destroy the ChannelMediaResource when it reaches 0?
Roc wrote the MediaResource, so he may have something to say too.
Flags: needinfo?(cpearce) → needinfo?(roc)
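The clone-counting idea from comment 6 can be modelled roughly as follows. This is a toy sketch in Python, not Gecko code: `OriginalResource`, `Clone`, and all of their methods are hypothetical names invented for illustration, and the real ChannelMediaResource lifetime rules may differ.

```python
class OriginalResource:
    """Toy stand-in for the original ChannelMediaResource (not Gecko code)."""

    def __init__(self):
        self.clone_count = 0          # live clones that still read our cached data
        self.close_requested = False  # the owner has asked to destroy the original
        self.channel_open = True      # the download channel that only the original owns

    def add_clone(self):
        self.clone_count += 1
        return Clone(self)

    def request_close(self):
        # Called when the owner no longer needs the original itself.
        self.close_requested = True
        self._maybe_close()

    def _release_clone(self):
        self.clone_count -= 1
        self._maybe_close()

    def _maybe_close(self):
        # Tear the channel down only once no clone depends on it.
        if self.close_requested and self.clone_count == 0:
            self.channel_open = False


class Clone:
    """Toy clone: it has no channel of its own and relies on the original."""

    def __init__(self, original):
        self.original = original
        self.released = False

    def can_get_new_data(self):
        return self.original.channel_open

    def release(self):
        # Releasing twice must not decrement the count twice.
        if not self.released:
            self.released = True
            self.original._release_clone()
```

In this model the channel stays open after `request_close()` until the last clone calls `release()`. It does not cover the other problem from comment 5, where a clone seeks to a position whose data has not been downloaded.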
Let's discuss that in bug 995090.
Flags: needinfo?(roc)
Assignee
Comment 8•11 years ago
Disable resource cloning for some test cases that fail due to Bug 995090.
Attachment #8407499 -
Flags: review?(cpearce)
Assignee
Comment 9•11 years ago
Workaround for timer callbacks with timeout == 0 sometimes failing to fire.
Attachment #8407500 -
Flags: review?(cpearce)
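The contents of the attached patch are not shown in this thread. Going only by its file name (part2_dont_schedule_timeout_0.patch), one plausible shape for such a workaround is to skip the timer entirely when the delay is zero and dispatch the callback directly. The sketch below illustrates that idea on a toy event loop in Python; `ToyEventLoop` and `schedule` are hypothetical names, and nothing here is the actual patch or the nsITimer API.

```python
from collections import deque


class ToyEventLoop:
    """Minimal single-threaded loop with a ready queue and a timer list."""

    def __init__(self):
        self.ready = deque()  # callbacks to run on the next turn
        self.timers = []      # (due_ms, sequence_number, callback)
        self.now_ms = 0
        self._seq = 0

    def dispatch(self, callback):
        self.ready.append(callback)

    def start_timer(self, delay_ms, callback):
        self._seq += 1
        self.timers.append((self.now_ms + delay_ms, self._seq, callback))

    def advance(self, ms):
        """Move the clock forward, queue timers that are due, then drain the queue."""
        self.now_ms += ms
        due = sorted(t for t in self.timers if t[0] <= self.now_ms)
        self.timers = [t for t in self.timers if t[0] > self.now_ms]
        for _, _, callback in due:
            self.ready.append(callback)
        while self.ready:
            self.ready.popleft()()


def schedule(loop, delay_ms, callback):
    """Hypothetical helper: bypass the timer when there is nothing to wait for."""
    if delay_ms <= 0:
        loop.dispatch(callback)
    else:
        loop.start_timer(delay_ms, callback)
```

A workaround of this kind only sidesteps the zero-delay path at one call site; it does not explain why a zero-delay timer would fail to fire in the first place.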
Assignee
Comment 10•11 years ago
try: https://tbpl.mozilla.org/?tree=Try&rev=a89ab19dfdea
No test_seek.html or test_bug495145.html timeouts on OSX 10.6 debug in 50 runs.
Comment 11•11 years ago
If a timer with timeout == 0 isn't firing, that's a bug, a serious bug, and we need to fix it and any fallout, and not wallpaper it or force everyone to 0-check their timer starts.
Please spin off a bug on that and CC/needinfo bsmedberg, ehsan, and bz (and me). I'm sure there are others, but that's a start.
Comment 12•11 years ago
> 2. sometimes timer callbacks fail to fire and cause the MediaDecoderStateMachine stuck which I am still investigating
Please try adding something that logs timers with 0 timeouts (before actually starting them) and logs when they fire. Then we can see if they ever fail to do so, or if it's some other problem.
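The logging suggested here could take roughly the following shape: record each zero-delay timer just before it is started, record it again when its callback runs, and report whatever is left over. This is a toy Python sketch of the bookkeeping only; `TimerAudit`, its methods, and the labels are hypothetical, and hooking it into the real timer code is not shown.

```python
class TimerAudit:
    """Records when zero-delay timers are armed and when they fire."""

    def __init__(self):
        self.log = []      # human-readable event lines, in order
        self.pending = {}  # timer_id -> label for armed-but-not-yet-fired timers
        self._next_id = 0

    def wrap(self, delay_ms, callback, label):
        """Call this before starting the timer; start the timer with the returned callback."""
        self._next_id += 1
        timer_id = self._next_id
        if delay_ms == 0:
            self.pending[timer_id] = label
            self.log.append(f"armed #{timer_id} {label}")

        def wrapped():
            # Log the firing only for timers we logged as armed.
            if timer_id in self.pending:
                del self.pending[timer_id]
                self.log.append(f"fired #{timer_id} {label}")
            callback()

        return timer_id, wrapped

    def unfired(self):
        """Zero-delay timers that were armed but whose callback has not run yet."""
        return sorted(self.pending.items())
```

If `unfired()` is still non-empty once a test has timed out, that points at the timer never firing; if it is empty, the stall is more likely somewhere else.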
Comment 13•11 years ago
Comment on attachment 8407499
part1_disable_resource_clone.patch
Review of attachment 8407499:
-----------------------------------------------------------------
Let's try and fix the underlying issue in bug 995090. We can use this patch if we really need to.
Roc should review your patch for bug 995090.
Attachment #8407499 -
Flags: review?(cpearce)
Comment 14•11 years ago
Comment on attachment 8407500
part2_dont_schedule_timeout_0.patch
Review of attachment 8407500:
-----------------------------------------------------------------
I agree with Jesup, a 0 timer should still work, and we should figure out why. This could cause other bugs too.
Attachment #8407500 -
Flags: review?(cpearce)
Assignee
Comment 15•10 years ago
This has been fixed in other bugs.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME