Closed Bug 1450607 Opened 6 years ago Closed 6 years ago

The connection was reset for partial content

Categories

(Core :: Audio/Video: Playback, defect, P2)

59 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla62
Tracking Status
firefox-esr52 --- unaffected
firefox-esr60 61+ fixed
firefox60 - wontfix
firefox61 + fixed
firefox62 + fixed

People

(Reporter: euthanasia_waltz, Assigned: jya)

References

(Blocks 2 open bugs)

Details

(Keywords: regression)

Attachments

(3 files)

STR:
1. Go to http://media.ca7.uscourts.gov/oralArguments/oar.jsp?caseyear=&casenumber=&period=Past+week
2. Click any oral argument

AR:
'The connection was reset' page appears after a while.(within a few minutes)

ER:
Playback to the end.

mozregression:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=0ff4a36cce94d0b9cb64c3ee7204e72749707e5c&tochange=fdb2a41005f2c8ae4d3528c893821f9eb06b3f0c

Former good case, firefox was fetching remained data of partial content. However, for current bad case, firefox is not fetching it.
Blocks: 1415090
Confirming and setting to NEW

Playback will work up to 30-45 seconds before the error occurs.  Reloading the page seems to allow the mp3 to play, I let it run for over 2 mins before the error occurred again.

No errors in the Browser Console
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: regression
Jya: who would be a good person to take a look at this?
No longer blocks: 1415090
Rank: 10
Depends on: 1415090
Flags: needinfo?(jyavenard)
Priority: -- → P2
Blocks: 1454247
(In reply to Nils Ohlmeier [:drno] from comment #2)
> Jya: who would be a good person to take a look at this?

I can't think of any as of today...
Flags: needinfo?(jyavenard)
I'm going to take this one. This looks very serious to me as it would affect any long mp3 played (and likely any long files)

At a glance, the change in bug 1415090, P2 changed from a synchronous seek to an async one. The next call following the OnTransferEnded see a closed stream and abort, while the synchronous seek would have reopened the stream immediately.
Regression reports are typically puts as blocker of the bug that introduced the regression.
Blocks: 1415090
No longer blocks: 1454247
No longer depends on: 1415090
See Also: → 1454247
Assignee: nobody → jyavenard
With bug 1415090 reverted, while I don't get a connection was reset error, playback typically stalls completely.... So it's likely the problem isn't just with 1415090 but with a change that occurred earlier.
STR (not that I can't reproduce it on a linux laptop, but always on my mac pro or in VM)

Go to preferences, select "Privacy & Security" 
Click on "Clear Data", make sure only "Cached Web content" is selected, click on "Clear"

now open: http://media.ca7.uscourts.gov/sound/external/rt.17-2132.17-2132_05_17_2018.mp3

Wait between 1m30s and 4m : the message will appear.

The regression first started with this commit: https://hg.mozilla.org/mozilla-central/rev/94287653dc31

Where ChannelMediaResource::OnStopRequest was changed to perform the call to Seek() synchronously, and instead then called mCacheStream.NotifyDataEnded(aStatus); which in turned called ChannelMediaResource::CacheClientSeek(mChannelOffset, false);

CacheClientSeek queue a task on the main thread to ChannelMediaResource::Seek.

So the regression was introduced as the ChannelMediaResource::Seek is no longer called immediately. Seek closes the http channel to re-open it immediately.

With a rr recording, I placed a breakpoint on ChannelMediaResource::CacheClientSeek and one on the lambda calling seek. I then added a breakpoint to every ChannelMediaResource entries to determine if any calls were made between the time the seek was called and the time it runs: the answer is negative.

Gerald, you last worked with this code, and reviewed JW's changes. I would very much appreciate if you could have a go, this is driving me insane.
Flags: needinfo?(gsquelart)
Blocks: 1464045
[Tracking Requested - why for this release]: this is a serious issue if ever attempting to play long content. should be uplifted.
Comment on attachment 8980268 [details]
Bug 1450607 - P2. Synchronously seek to prepare for resuming following stop request.

https://reviewboard.mozilla.org/r/246428/#review252744

::: dom/media/MediaCache.cpp:2423
(Diff revision 4)
> +Pair<int64_t,int64_t>
> +MediaCacheStream::GetLengthAndOffset() const
> +{
> +  MOZ_ASSERT(NS_IsMainThread());
> +  AutoLock lock(mMediaCache->Monitor());
> +  return Pair<int64_t, int64_t>(mStreamLength, mChannelOffset);

In this case (calling a constructor by explicitly specifying the exact same type as the return type) you could just do `return { mSL, mCO };` for the same effect without redundant type.
Of course, it's a question of taste whether you actually want to show the type again, for extra clarity; so you may leave it as-is if you prefer.
(Sorry if you already knew that.)
Attachment #8980268 - Flags: review?(gsquelart) → review+
Comment on attachment 8980333 [details]
Bug 1450607 - P3. Remove unused argument.

https://reviewboard.mozilla.org/r/246482/#review252748
Attachment #8980333 - Flags: review?(gsquelart) → review+
Flags: needinfo?(gsquelart)
Pushed by jyavenard@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/0ef665046a06
P1. Fix constness. r=gerald
https://hg.mozilla.org/integration/autoland/rev/2fce9c07ec2b
P2. Synchronously seek to prepare for resuming following stop request. r=gerald
https://hg.mozilla.org/integration/autoland/rev/4688aa455ae3
P3. Remove unused argument. r=gerald
Flags: qe-verify+
Comment on attachment 8980267 [details]
Bug 1450607 - P1. Fix constness.

Approval Request Comment
[Feature/Bug causing the regression]: 1415090
[User impact if declined]: Can't play some media files (typically the ones longer than a few minute), the code introduced in bug 1415090 was racy
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: tested manually
[Needs manual test from QE? If yes, steps to reproduce]: The STR were provided in comment 7
[List of other uplifts needed for the feature/fix]: all patches attached to this bug
[Is the change risky?]: likely not.
[Why is the change risky/not risky?]: We revert to an earlier method. 
[String changes made/needed]: none
Attachment #8980267 - Flags: approval-mozilla-esr60?
Attachment #8980267 - Flags: approval-mozilla-beta?
Comment on attachment 8980267 [details]
Bug 1450607 - P1. Fix constness.

Fix for possible playback issues of long media files. Approved for 61.0b10 and ESR 60.1.
Attachment #8980267 - Flags: approval-mozilla-esr60?
Attachment #8980267 - Flags: approval-mozilla-esr60+
Attachment #8980267 - Flags: approval-mozilla-beta?
Attachment #8980267 - Flags: approval-mozilla-beta+
I've tested on Windows 7 x64 and macOS 10.13 using latest Nightly 62.0a1 (2018-05-29) and the CI builds from  https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta and I still reproduce the issue: after 1-2 minutes, 'The connection was reset' page appears. 
I used the STR provided in comment 7. 

Any thoughts?
Flags: needinfo?(jyavenard)
what version of the tree was that build using?
Flags: needinfo?(jyavenard)
I can't reproduce in 62.0a1 (2018-05-30) (64-bit)

2018-05-29 probably doesn't have the fix yet.
I've retested on Windows 7 x64 and Windows 10 x64 using Nightly 62.0a1 (2018-05-30) and Firefox 61 Beta 10 x64 and x32 builds(buildID: 20180530184300) and the issue is still reproducible. 

Please see here how I reproduced the issue: https://imgur.com/BU09E0C .
Thanks for checking on that.

Could you run mozregression on that machine to find out when this broke, and if maybe there's another problem at play (we would have only fixed one)
Flags: needinfo?(camelia.badau)
I kicked out an old Windows 7 VM (32 bits) can't reproduce it in there either...

Could with an earlier release :(
Unfortunately, mozregression can't generate me a pushlog ("Unable to find enogh data to bisect" error).
I found that the last good build is 2017-05-05 (buildID: 20170505030252) and first bad build is 2017-05-06 (buildID: 20170506030204). I hope this helps you.
Flags: needinfo?(camelia.badau)
this is great thank you...

So this is a different regression than bug 1415090 which is what this bug fixed. need to find the culprit now
Flags: needinfo?(jyavenard)
Blocks: 1466569
Hi Jean-Yyves,

Considering your comment 7 and Camelia's comment 28, comment 31 and comment 34, I am unsure what fix should be verified in this issue: for the purpose of reevaluating the need for manual qe verification on this bug I am setting the qe-verify flag to ?.
Flags: qe-verify+ → qe-verify?
So we have two bugs.

The first one was fixed here, the other will be tracked in bug 1466569 (it's a year old bug, and likely was existing before that just hidden)
Flags: needinfo?(jyavenard)
Let's leave qe-verify+ on, and revisit this bug for verification after bug 1466569 is fixed.
Flags: qe-verify? → qe-verify+
Here, I think we wanted to verify the issue in nightly 62 back in May, for more certainty for uplifting to beta 61.  At this point, verifying the issue won't affect our decisions, since the fix already landed in 61 and 62.  So, you may not want to make this a priority for verification.
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: