Closed Bug 1333384 Opened 3 years ago Closed 3 years ago

100% CPU on stream transport thread in nsTemporaryFileInputStream::ReadSegments / nsTemporaryFileInputStream::ReadSegments busy loops if it encounters unexpected end-of-file

Categories

(Core :: Networking, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox52 + fixed
firefox-esr52 --- fixed
firefox53 --- fixed
firefox54 --- fixed

People

(Reporter: mstange, Assigned: baku)

References

Details

(Keywords: power)

Attachments

(1 file)

Yesterday I was noticing that my Firefox Nightly parent process was using 100% CPU when idle. An Instruments profile revealed that one of the stream transport threads was busy in this callstack:

> [...]
> nsThreadPool::Run()
> nsAStreamCopier::Run()
> nsAStreamCopier::Process()
> nsStreamCopierOB::DoCopy(nsresult*, nsresult*)
> nsPipeOutputStream::WriteSegments(nsresult (*)(nsIOutputStream*, void*, char*, unsigned int, unsigned int, unsigned int*), void*, unsigned int, unsigned int*)
> nsStreamCopierOB::FillOutputBuffer(nsIOutputStream*, void*, char*, unsigned int, unsigned int, unsigned int*)
> mozilla::net::nsInputStreamTransport::Read(char*, unsigned int, unsigned int*)
> nsTemporaryFileInputStream::ReadSegments(nsresult (*)(nsIInputStream*, void*, char const*, unsigned int, unsigned int, unsigned int*), void*, unsigned int, unsigned int*)
> (anonymous namespace)::interposedRead(PRFileDesc*, void*, int)
> pt_Read

It was stuck in the loop in nsTemporaryFileInputStream::ReadSegments at http://searchfox.org/mozilla-central/rev/02a56df6474a97cf84d94bbcfaa126979970905d/netwerk/base/nsTemporaryFileInputStream.cpp#84 , calling PR_Read over and over.

I attached a debugger but couldn't find out what file was being read. But I'm pretty sure that PR_Read returned zero bytes every time. The error branches (bytesRead < 0) were not being entered.

I think this indicates that the end of the file had been reached, but that less than "count" bytes had been read. The code limits "count" to mEndPos - mCurPos, so maybe those values have gone out of sync? Is it possible that the file decreased in size after mEndPos has been computed?
I haven't seen this problem since, and I don't know how to reproduce it.
Attached patch fd.patchSplinter Review
It seems to me that we are not handling PR_Read() == 0 correctly. We read a FD, and if PR_Read() returns 0, this means EOF.

Markus, which OK were you using? If it's posix (linux/mac/..) it can happen that the file has been modified in the meantime and it have changed size.
Assignee: nobody → amarchesini
Flags: needinfo?(mstange)
Attachment #8829828 - Flags: review?(bkelly)
I was using macOS.
Flags: needinfo?(mstange)
Comment on attachment 8829828 [details] [diff] [review]
fd.patch

Review of attachment 8829828 [details] [diff] [review]:
-----------------------------------------------------------------

Verified that prio.h states:

  "0 means end of file is reached or the network connection is closed."

Here:

  https://dxr.mozilla.org/mozilla-central/source/nsprpub/pr/include/prio.h#693

LGTM
Attachment #8829828 - Flags: review?(bkelly) → review+
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/86cd474c4894
nsTemporaryFileInputStream::ReadSegments must consider the stream as closed if PR_Read() return 0, r=bkelly
Duplicate of this bug: 1333470
I'm kind of surprised that this was regressed in bug 1005323 and we didn't find it until now.  At any rate, I think we should backport this all the way to beta.
Blocks: 1005323
https://hg.mozilla.org/mozilla-central/rev/86cd474c4894
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
(In reply to :Ehsan Akhgari from comment #7)
> I'm kind of surprised that this was regressed in bug 1005323 and we didn't
> find it until now.  At any rate, I think we should backport this all the way
> to beta.

Andrea can you please request uplift?
Flags: needinfo?(amarchesini)
Comment on attachment 8829828 [details] [diff] [review]
fd.patch

Approval Request Comment
[Feature/Bug causing the regression]: nsTemporaryFileInputStream
[User impact if declined]: 100% of CPU executing PR_Read()
[Is this code covered by automated tests?]: no
[Has the fix been verified in Nightly?]: I think it's racy. Probably it happens when the temporary file is changed locally.
[Needs manual test from QE? If yes, steps to reproduce]: no 
[List of other uplifts needed for the feature/fix]: 0
[Is the change risky?]: no
[Why is the change risky/not risky?]: I just added the check about the return value of PR_Read. If 0 we don't have to continue reading.
[String changes made/needed]: none
Flags: needinfo?(amarchesini)
Attachment #8829828 - Flags: approval-mozilla-aurora?
Modified from Uplift Dashboard.
Comment on attachment 8829828 [details] [diff] [review]
fd.patch

The user impact seems severe if you run into this race condition, Aurora53+
Attachment #8829828 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Shouldn't we uplift to beta too?
Flags: needinfo?(amarchesini)
Comment on attachment 8829828 [details] [diff] [review]
fd.patch

Yes, I agree. See comment 10.
Flags: needinfo?(amarchesini)
Attachment #8829828 - Flags: approval-mozilla-beta?
Comment on attachment 8829828 [details] [diff] [review]
fd.patch

don't go into an infinite loop on EOF, beta52+
Attachment #8829828 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.