Closed Bug 994877 Opened 11 years ago Closed 10 years ago

Debug mochitest-1 nearly perma-fail in media mochitests

Categories

(Core :: Audio/Video, defect)

defect
Not set
critical

RESOLVED WORKSFORME

People

(Reporter: RyanVM, Assigned: jwwang)

References

(Depends on 1 open bug)

Attachments

(2 files)

In addition to the frequent leaks reported in bug 994289, since the end of last week, OSX 10.6 debug mochitest-1 has been nearly perma-fail in mochitest, primarily under test_seek.html, test_bug495145.html, and test_replay_metadata.html. The spike is very visible in bugs like bug 762774 and bug 684173. We need this investigated ASAP or we will have to resort to mass test disablings.
Flags: needinfo?(cpearce)
OrangeFactor suggests that this started around April 6 or 7 PDT.

hg log content/media/ -d ">Apr 4" outputs:

changeset: 177651:26d87e24848b
user: Chris Pearce <cpearce@mozilla.com>
date: Wed Apr 09 16:45:32 2014 +1200
summary: Bug 993003 - Ensure we abort media load if IMFSourceReader creation fails. r=padenot

changeset: 177644:c333abd5318d
user: Kyle Huey <khuey@kylehuey.com>
date: Tue Apr 08 17:26:33 2014 -0700
summary: Back out bug 991812 for bustage on a CLOSED TREE. r=me

changeset: 177639:88ee33546b3a
user: Kyle Huey <khuey@kylehuey.com>
date: Tue Apr 08 16:37:05 2014 -0700
summary: Bug 991812: Remove uses of RefCounted in code that lives solely in Gecko. r=ehsan

changeset: 177628:de7487db16d9
user: Boris Zbarsky <bzbarsky@mit.edu>
date: Tue Apr 08 18:27:18 2014 -0400
summary: Bug 991742 part 8. Remove the "aScope" argument of WebIDL/nsWrapperCache WrapObject() methods. r=bholley

changeset: 177626:c438f7b1d1b5
user: Boris Zbarsky <bzbarsky@mit.edu>
date: Tue Apr 08 18:27:17 2014 -0400
summary: Bug 991742 part 6. Remove the "aScope" argument of binding Wrap() methods. r=bholley

changeset: 177534:57d7504371af
user: Gabriele Svelto <gsvelto@mozilla.com>
date: Mon Apr 07 13:20:57 2014 +0200
summary: Bug 988760 - Account extra time since blocking correctly. r=karlt

changeset: 177353:a201e70b790e
user: Peter Van der Beken <peterv@propagandism.org>
date: Mon Apr 07 22:18:53 2014 +0200
summary: Back out 75c95dac7fe0 (bug 984497) and f1b0d3d13755 (bug 990475) to fix bustage on a CLOSED TREE.

changeset: 177345:d5b0e9e6a849
user: Brian Hackett <bhackett1024@gmail.com>
date: Mon Apr 07 13:04:37 2014 -0700
summary: Bug 987508 - Create array buffers lazily for small typed arrays, r=sfink.

changeset: 177342:8b87a6adad14
user: Ryan VanderMeulen <ryanvm@gmail.com>
date: Mon Apr 07 15:49:48 2014 -0400
summary: Backed out changeset e35851f07b67 (bug 987508) for non-unified bustage.

changeset: 177339:423df46d8d57
user: Randell Jesup <rjesup@jesup.org>
date: Mon Apr 07 15:42:01 2014 -0400
summary: Backed out changeset 974c4db3003e (bug 818822)

changeset: 177338:670cb6d1750a
user: Randell Jesup <rjesup@jesup.org>
date: Mon Apr 07 15:40:55 2014 -0400
summary: Backed out changeset 5349ecd9c313 (bug 818822)

changeset: 177336:5d7494ed030d
user: Randell Jesup <rjesup@jesup.org>
date: Mon Apr 07 15:37:56 2014 -0400
summary: Backed out changeset 87f437be7de5 (bug 982490)

changeset: 177333:3ae7d42531c7
user: Randell Jesup <rjesup@jesup.org>
date: Mon Apr 07 15:37:52 2014 -0400
summary: Backed out changeset e3664615ecbf (bug 694814)

changeset: 177332:20aea86b3432
user: Randell Jesup <rjesup@jesup.org>
date: Mon Apr 07 15:37:51 2014 -0400
summary: Backed out changeset 74e5c32c6fa2 (bug 694814)

changeset: 177331:63be52cd09c5
user: Randell Jesup <rjesup@jesup.org>
date: Mon Apr 07 15:37:50 2014 -0400
summary: Backed out changeset 6dc08e9fc7e8 (bug 694814)

changeset: 177329:206169eef995
user: Randell Jesup <rjesup@jesup.org>
date: Mon Apr 07 15:37:48 2014 -0400
summary: Backed out changeset daf5df0306b2 (bug 985714)

changeset: 177322:e35851f07b67
user: Brian Hackett <bhackett1024@gmail.com>
date: Mon Apr 07 11:46:54 2014 -0700
summary: Bug 987508 - Create array buffers lazily for small typed arrays, r=sfink.

changeset: 177316:0cb71c012f85
user: Randell Jesup <rjesup@jesup.org>
date: Mon Apr 07 13:50:28 2014 -0400
summary: Bug 991504 - Temporary assertion removal to fix bustage in AudioSegment r=jesup

changeset: 177288:974c4db3003e
user: Randell Jesup <rjesup@jesup.org>
date: Mon Apr 07 08:48:24 2014 -0400
summary: Bug 818822: Reduce fake audio/video rates on b2g debug only to avoid overloading mochitest emulator VMs r=padenot

changeset: 177266:e31ba8d051be
user: Matt Woodrow <mwoodrow@mozilla.com>
date: Mon Apr 07 15:17:41 2014 +1200
summary: Bug 904890 - Part 4: Enable hardware accelerated video decoding for OMTC+D3D9/11. r=cpearce

changeset: 177259:814f77d08ee7
user: Matt Woodrow <mwoodrow@mozilla.com>
date: Mon Apr 07 13:32:49 2014 +1200
summary: Bug 991028 - Remove deprecated IPDL SurfaceDescriptor types. r=nical

changeset: 177229:2579095d0f7e
user: Phil Ringnalda <philringnalda@gmail.com>
date: Sun Apr 06 21:21:38 2014 -0700
summary: Backed out 4 changesets (bug 991028) for nonunified bustage

changeset: 177225:147581a518c3
user: Matt Woodrow <mwoodrow@mozilla.com>
date: Mon Apr 07 13:32:49 2014 +1200
summary: Bug 991028 - Remove deprecated IPDL SurfaceDescriptor types. r=nical

changeset: 177108:fcd79d6f4a7e
user: Ed Morley <emorley@mozilla.com>
date: Fri Apr 04 16:32:19 2014 +0100
summary: Backed out changeset 2ac8fe9a90c5 (bug 948269) for timeouts in gaia-integration tests; CLOSED TREE

changeset: 177107:b327711444ed
user: Ed Morley <emorley@mozilla.com>
date: Fri Apr 04 16:31:44 2014 +0100
summary: Backed out changeset e00d10064639 (bug 948269)

changeset: 177060:e00d10064639
user: Matthew Gregan <kinetik@flim.org>
date: Fri Apr 04 15:31:10 2014 +1300
summary: Bug 948269 - Remove incorrect assertion from AudioSink::Drain. r=cpearce

changeset: 177054:5fb973d5e276
user: Neil Rashbrook <neil@parkwaycc.co.uk>
date: Thu Apr 03 23:06:26 2014 +0100
summary: Bug 514280 Only use nsCOMPtr for interfaces r=bsmedberg

changeset: 177052:904297de3d1e
user: Chris Pearce <cpearce@mozilla.com>
date: Fri Apr 04 10:39:42 2014 +1300
summary: Bug 986947 - Make MP3 contained in MP4 playback again on Windows with WMF backend. r=padenot

changeset: 177051:9c208ea4d63c
user: Chris Pearce <cpearce@mozilla.com>
date: Fri Apr 04 10:39:15 2014 +1300
summary: Bug 991448 - Skip Theora decode to next keyframe after seek, so that we don't get visual artifacts after a fastSeek. r=cajbir

The only thing that stands out is Bug 991448, but it merged to m-c about a day earlier than the spike started, so I'm hesitant to declare it the cause. Jwwang, are you able to take this?
Flags: needinfo?(cpearce) → needinfo?(jwwang)
test_seek.html might be related to Bug 995090. I am still debugging test_seek.html.
Assignee: nobody → jwwang
Flags: needinfo?(jwwang)
Depends on: 995090
This is really a cross-platform issue. Failure rates on media mochitests (timeouts, shutdown hangs/leaks, etc.) are currently extremely high - I've heard it ballparked around 40%. Where do we stand on investigating here? I don't want to start indiscriminately disabling tests, but this is having a significantly negative impact on our overall failure rates.
OS: Mac OS X → All
Hardware: x86_64 → All
Summary: OSX 10.6 debug mochitest-1 nearly perma-fail in media mochitests → Debug mochitest-1 nearly perma-fail in media mochitests
In case JW doesn't notice your question here, ni jw here.
Flags: needinfo?(jwwang)
We have 2 bugs here that could cause timeouts:
1. Bug 995090
2. Timer callbacks sometimes fail to fire, which leaves the MediaDecoderStateMachine stuck. I am still investigating this.

For 1, the bug could be hard to fix given the current design of MediaResource. The cloned ChannelMediaResource doesn't have its own channel and depends on the cached data downloaded by the original ChannelMediaResource. If the original ChannelMediaResource is destroyed before the download completes, there is no way for the cloned ChannelMediaResource to acquire new data. If we create a new channel for the cloned ChannelMediaResource, it will defeat the purpose of resource caching and break some test cases. Moreover, if the cloned ChannelMediaResource seeks to a position where data is not present, there is no way to notify the original ChannelMediaResource to download the requested data.

For 2, it looks like a bug in our nsITimer implementation, which I am afraid will have an impact on the overall system. I can try to find a workaround for the failures in the test cases, but (2) really concerns me and is worth investigating further.

Hi Chris, can you share your opinion about (1)? I could be wrong about it, since I am not very familiar with MediaResource.
Flags: needinfo?(jwwang) → needinfo?(cpearce)
Can we keep a count on the ChannelMediaResource of the number of clones, and only destroy the ChannelMediaResource when it reaches 0? Roc wrote the MediaResource, so he may have something to say too.
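
For illustration only, the clone-counting idea above could be sketched roughly as follows. This is not Gecko code: SharedDownload and CloneHandle are hypothetical stand-ins for the original ChannelMediaResource and its clones, and the sketch assumes the download may only be torn down once the owner has let go and no clones remain.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical stand-in for the original resource that owns the channel.
// None of these names are real Gecko APIs.
class SharedDownload {
public:
  void AddClone() { ++mCloneCount; }

  void RemoveClone() {
    assert(mCloneCount > 0);
    --mCloneCount;
    MaybeClose();
  }

  // Called when the owner of the original resource no longer needs it.
  void ReleaseOwner() {
    mOwnerReleased = true;
    MaybeClose();
  }

  bool IsOpen() const { return mOpen; }
  int CloneCount() const { return mCloneCount; }

private:
  // Tear down only when the owner is gone AND no clone depends on the data.
  void MaybeClose() {
    if (mOwnerReleased && mCloneCount == 0) {
      mOpen = false;
    }
  }

  int mCloneCount = 0;
  bool mOwnerReleased = false;
  bool mOpen = true;
};

// RAII handle held by each clone (hypothetical), so the count cannot leak.
class CloneHandle {
public:
  explicit CloneHandle(std::shared_ptr<SharedDownload> aSource)
    : mSource(std::move(aSource)) {
    mSource->AddClone();
  }
  ~CloneHandle() { mSource->RemoveClone(); }

  CloneHandle(const CloneHandle&) = delete;
  CloneHandle& operator=(const CloneHandle&) = delete;

private:
  std::shared_ptr<SharedDownload> mSource;
};
```

In this sketch, releasing the owner while a CloneHandle is still alive leaves the download open, and it closes only when the last handle is destroyed. It does not address the seek case raised earlier, where a clone needs data the original has not requested.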
Flags: needinfo?(cpearce) → needinfo?(roc)
Let's discuss that in bug 995090.
Flags: needinfo?(roc)
Disable resource cloning for some test cases that fail due to Bug 995090.
Attachment #8407499 - Flags: review?(cpearce)
Workaround for timer callbacks with timeout == 0 sometimes failing to fire.
Attachment #8407500 - Flags: review?(cpearce)
try: https://tbpl.mozilla.org/?tree=Try&rev=a89ab19dfdea
No test_seek.html and test_bug495145.html timeouts on OSX 10.6 debug for 50 runs.
If a timer with timeout == 0 isn't firing, that's a bug, a serious bug, and we need to fix it and any fallout, and not wallpaper it or force everyone to 0-check their timer starts. Please spin off a bug on that and CC/needinfo bsmedberg, ehsan, and bz (and me). I'm sure there are others, but that's a start.
> 2. sometimes timer callbacks fail to fire and cause the MediaDecoderStateMachine stuck which I am still investigating

Please try adding something that logs timers with 0 timeouts (before actually starting them) and logs when they fire. Then we can see if they ever fail to do so, or if it's some other problem.
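
A rough sketch of the kind of logging being asked for, purely as an illustration: ZeroTimerLog is a hypothetical helper, not part of nsITimer or MediaDecoderStateMachine, and it assumes the caller can hook both the point just before a timer is started and the top of its callback.

```cpp
#include <cassert>
#include <cstdint>
#include <cstdio>
#include <map>
#include <string>
#include <vector>

// Hypothetical bookkeeping for zero-delay timers. Not a Gecko API.
class ZeroTimerLog {
public:
  // Call just before starting a timer. Returns a tracking id, or 0 when the
  // delay is non-zero and the timer is therefore not tracked.
  uint64_t NoteArmed(uint32_t aDelayMs, const char* aLabel) {
    if (aDelayMs != 0) {
      return 0;
    }
    uint64_t id = ++mNextId;
    mPending[id] = aLabel;
    std::fprintf(stderr, "[timer %llu] armed with 0 ms delay: %s\n",
                 static_cast<unsigned long long>(id), aLabel);
    return id;
  }

  // Call at the top of the timer callback with the id from NoteArmed().
  void NoteFired(uint64_t aId) {
    auto it = mPending.find(aId);
    if (it == mPending.end()) {
      return;  // Untracked (non-zero delay) or already reported.
    }
    std::fprintf(stderr, "[timer %llu] fired: %s\n",
                 static_cast<unsigned long long>(aId), it->second.c_str());
    mPending.erase(it);
  }

  // Ids of zero-delay timers that were armed but have not fired yet.
  std::vector<uint64_t> Unfired() const {
    std::vector<uint64_t> ids;
    for (const auto& entry : mPending) {
      ids.push_back(entry.first);
    }
    return ids;
  }

private:
  uint64_t mNextId = 0;
  std::map<uint64_t, std::string> mPending;
};
```

Dumping Unfired() when a test times out would show whether a zero-delay timer was armed and never fired, or whether every armed timer did fire and the hang comes from somewhere else. The sketch is single-threaded; real use across threads would need locking.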
Comment on attachment 8407499
part1_disable_resource_clone.patch

Review of attachment 8407499:
-----------------------------------------------------------------

Let's try and fix the underlying issue in bug 995090. We can use this patch if we really need to. Roc should review your patch for bug 995090.
Attachment #8407499 - Flags: review?(cpearce)
Comment on attachment 8407500
part2_dont_schedule_timeout_0.patch

Review of attachment 8407500:
-----------------------------------------------------------------

I agree with Jesup, a 0 timer should still work, and we should figure out why. This could cause other bugs too.
Attachment #8407500 - Flags: review?(cpearce)
Depends on: 997844
Depends on: 998168
This has been fixed in other bugs.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME