closing an nsPipeInputStream should delete unused segments from the nsPipe

RESOLVED FIXED in Firefox 38

Status

()

defect
RESOLVED FIXED
4 years ago
2 months ago

People

(Reporter: bkelly, Assigned: bkelly)

Tracking

unspecified
mozilla39
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox38 fixed, firefox39 fixed)

Details

Attachments

(2 attachments, 15 obsolete attachments)

30.06 KB, patch
bkelly
: review+
bkelly
: checkin+
Details | Diff | Splinter Review
8.56 KB, patch
bkelly
: review+
bkelly
: checkin+
Details | Diff | Splinter Review
After bug 1100398, the nsPipe must now check to see if any other input streams are still referencing a segment before deleting it.  Once the slowest input stream finishes with a segment it can finally be deleted.

Currently the code only checks for this when advancing the read cursor from a ReadSegments().  There is, however, another way for a segment to become unused and ready for deletion.  If the slowest input stream is closed, then it should check to see if any segments can be free'd.

If we don't check for segment deletion at close we could in theory deadlock a pipe.  Consider a pipe were stream 1 has read to the end, but stream 2 has not read anything.  The writer blocks because the pipe is full.  Stream 2 is then closed, but its segments are not free'd.  The writer will never be able to write again and stream 1 will be stuck at the end.
Initial patch.  I need to add tests before submitting for review.
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Attachment #8565609 - Attachment is obsolete: true
Attachment #8566053 - Flags: review?(nfroyd)
Comment on attachment 8566053 [details] [diff] [review]
P1 Free buffer resources when an nsPipeInputStream is closed. (v1)

Review of attachment 8566053 [details] [diff] [review]:
-----------------------------------------------------------------

I think you have successfully moved code to AdvanceReadSegment, but I'm not sure about some of your other modifications.

::: xpcom/io/nsPipe3.cpp
@@ +291,5 @@
>    nsresult GetReadSegment(const nsPipeReadState& aReadState,
>                            const char*& aSegment, uint32_t& aSegmentLen);
>    void     AdvanceReadCursor(nsPipeReadState& aReadState, uint32_t aCount,
>                               uint32_t* aAvailableOut);
> +  bool     AdvanceReadSegment(nsPipeReadState& aReadState);

I think this would be a prime candidate for returning a real enum.

@@ +610,5 @@
> +}
> +
> +void
> +nsPipe::DrainInputStream(nsPipeReadState& aReadState, nsPipeEvents& aEvents,
> +                         uint32_t* aAvailableOut)

aAvailableOut doesn't appear to be modified by this function.

@@ +621,5 @@
> +  }
> +
> +  // if we've free'd up a segment, notify output stream that pipe has
> +  // room for a new segment.
> +  if (true && mOutput.OnOutputWritable(aEvents)) {

Um, I assume this is supposed to be |modified && mOutput.OnOutputWritable(aEvents)|?
Attachment #8566053 - Flags: review?(nfroyd) → feedback+
Comment on attachment 8566054 [details] [diff] [review]
P2 Add tests validating nsPipeOutputStream AsyncWait behavior. (v0)

Review of attachment 8566054 [details] [diff] [review]:
-----------------------------------------------------------------

f+ partly because the implementation patch seemed to be incomplete, so I don't know whether the tests are actually good or not.

::: xpcom/tests/gtest/TestPipes.cpp
@@ +666,5 @@
> +  nsCOMPtr<nsIAsyncInputStream> reader;
> +  nsCOMPtr<nsIAsyncOutputStream> writer;
> +
> +  uint32_t segmentSize = 1024;
> +  uint32_t numSegments = 1;

Nit: |const| on these, please.

@@ +669,5 @@
> +  uint32_t segmentSize = 1024;
> +  uint32_t numSegments = 1;
> +
> +  // Use async input streams so we can NS_ConsumeStream() the current data
> +  // while the pipe is still being written to.

Um.  I don't see tests where we intersperse NS_ConsumeStream calls with writing the data.  Is this comment still relevant, or does it need to be rewritten?

@@ +672,5 @@
> +  // Use async input streams so we can NS_ConsumeStream() the current data
> +  // while the pipe is still being written to.
> +  nsresult rv = NS_NewPipe2(getter_AddRefs(reader), getter_AddRefs(writer),
> +                           true, true,  // non-blocking - reader, writer
> +                           segmentSize, numSegments);

Nit: indentation of these continuation lines looks funny.

@@ +704,5 @@
> +  nsCOMPtr<nsIAsyncInputStream> reader;
> +  nsCOMPtr<nsIAsyncOutputStream> writer;
> +
> +  uint32_t segmentSize = 1024;
> +  uint32_t numSegments = 1;

Same comments from previous test apply here and below.  I'd like to figure out some way to refactor this, but I'm not sure it's worth it.
Attachment #8566054 - Flags: review?(nfroyd) → feedback+
Thanks.  I missed some things from when I was debugging another problem with this patch applied.  Also, I found a logic bug that needed fixing after removing the 'true' in that conditional.
Attachment #8566682 - Flags: review?(nfroyd)
Sorry, I don't have time to factor out the common parts of the test (unless you really require it).  I'm trying to get this and a few other patches in before merge next week.
Attachment #8566689 - Flags: review?(nfroyd)
I was so caught up in the fact I posted a patch with a hardcoded 'true', that I forgot to address the enum return value comment.  I'm not convinced this really improves code readability, but here it is.
Attachment #8566925 - Flags: review?(nfroyd)
This is another issue I noticed while converting the bool returns to enums.

Basically, if we only do Notify() in the cases where we want to wake up the writer stream, we run the risk of waking up a peer reader stream that's blocked instead.  This would leave the pipe deadlocked.

To avoid this we should be doing NotifyAll() in all cases.

This is only tangentially related to this bug and patch.  I could open a new bug if you prefer.  Its only a few lines changed, though, so it would be nice just to include it here.
Attachment #8566928 - Flags: review?(nfroyd)
Attachment #8566682 - Flags: review?(nfroyd) → review+
Attachment #8566689 - Flags: review?(nfroyd) → review+
Comment on attachment 8566925 [details] [diff] [review]
P1 interdiff 002 convert bool returns to enums

Review of attachment 8566925 [details] [diff] [review]:
-----------------------------------------------------------------

Hooray.

::: xpcom/io/nsPipe3.cpp
@@ +630,4 @@
>  
>    bool modified = false;
>    while(mWriteSegment >= aReadState.mSegment) {
>      modified = AdvanceReadSegment(aReadState) || modified;

This bit needs to be changed for the new return type of AdvanceReadSegment.
Attachment #8566925 - Flags: review?(nfroyd) → feedback+
Attachment #8566928 - Flags: review?(nfroyd) → review+
Good catch!  Would be nice if enums didn't coerce to boolean values in this case.
Attachment #8567136 - Flags: review?(nfroyd)
Comment on attachment 8567136 [details] [diff] [review]
P1 intediff 004 fix missed block from enum change

Review of attachment 8567136 [details] [diff] [review]:
-----------------------------------------------------------------

I approve of your restraint in not commenting how this error shows that bools are just as convenient. :)

I think you might be able to prohibit implicit coercion by defining them as enum classes.
Attachment #8567136 - Flags: review?(nfroyd) → review+
I see the problem.  I will update patch and push to try shortly.
Somewhat shocked the previous try and gtests didn't trigger this.  Basically we were deleting segments that were still being written to.  I'll run a full try before re-landing.
https://hg.mozilla.org/mozilla-central/rev/27de4b553912
https://hg.mozilla.org/mozilla-central/rev/212080d51fb7
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Depends on: 1136453

Updated

4 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
So the main issue here is that the pipe input streams do the following in ReadSegments:

1) Lock the pipe
2) Get a segment buffer
3) Unlock the pipe
4) Copy from the segment buffer
5) Lock the pipe
6) Advance the read cursor
7) Unlock the pipe

Problems occur if the pipe is closed during step (4).  The lock is not held in step (4), so the close is not blocked and can modify the pipe state out from under the read.  Previously this was happening with the mAvailable member being zero'd and then underflowed when the read completed.  (I added an assertion which catches this now.)  My previous landing here also deleted the buffer which is of course much worse in this condition.

This patch solves the problem by setting an mActiveRead in the nsPipeReadState.  If there is an active read, then the close knows not to drain the stream yet.  In these cases the completion of the read will trigger the pending drain.

To try to make this safer and easier to understand I've introduced an RAII class to represent read segments.  This is called AutoReadSegment.  It ensures that the active flag is always removed after a segment is no longer used.

This passes local gtests.  I will do a bunch of try runs now as well.
Attachment #8572852 - Flags: review?(nfroyd)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=089e8d0912a3

I'll focus on mac 10.6 mochitest-other retriggers since that is what triggered the assertion in bug 1136453.
Sorry.  I had some unsaved comments in my editor that were supposed to be included in the patch.
Attachment #8572852 - Attachment is obsolete: true
Attachment #8572852 - Flags: review?(nfroyd)
Attachment #8572871 - Flags: review?(nfroyd)
Actually, rather than try to verify with intermittents, I realized we can force the condition with a gtest.  This simulates a close-while-reading condition by calling Close() from the ReadSegments() function handler.

If we do this with only a partial segment write/read, we trigger exactly the assertion hit in bug 1136453.

If we do this with a full segment write/read, we hit the assertion on the line right before.
Attachment #8572969 - Flags: review?(nfroyd)
The gtest revealed there was still an issue in the close-while-reading case.  The segment advancing code expected the current streams state structure to be in the mInputList.  This is not the case if the stream has been closed, though.

Updating the interdiff directly since Nathan has not started reading yet.  With this update the new gtests pass.
Attachment #8572871 - Attachment is obsolete: true
Attachment #8572871 - Flags: review?(nfroyd)
Attachment #8572973 - Flags: review?(nfroyd)
Blocks: 1136453
No longer depends on: 1136453
Some minor fixes for non-debug and windows build issues revealed by try.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b8b194c117fe
Attachment #8572969 - Flags: review?(nfroyd) → review+
Comment on attachment 8572973 [details] [diff] [review]
P1 interdiff 006 wait for in-flight reads to complete

Review of attachment 8572973 [details] [diff] [review]:
-----------------------------------------------------------------

OK, so.  We now have public methods in nsPipe that AFAICT are really only supposed to be called by AutoReadSegment.  But AutoReadSegment doesn't really do anything auto-ly, but it does properly trigger things in nsPipe to set flags in nsPipeState that are set in an auto-ly fashion, which nsPipe will then respect?

I feel like this is a wee bit too complicated.  But I'm not sure whether that's a property of the problem being tackled or the particular implementation we have.
Attachment #8574772 - Flags: review?(nfroyd) → review+
Comment on attachment 8572973 [details] [diff] [review]
P1 interdiff 006 wait for in-flight reads to complete

Review of attachment 8572973 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/io/nsPipe3.cpp
@@ +610,5 @@
>      return NS_FAILED(mStatus) ? mStatus : NS_BASE_STREAM_WOULD_BLOCK;
>    }
>  
> +  // The input stream locks the pipe while getting the buffer to read from,
> +  // but then unlocks while actually data copying is taking place.  In

Nit: "...while actual data copying..."
Attachment #8572973 - Flags: review?(nfroyd) → review+
https://hg.mozilla.org/mozilla-central/rev/516fb2d378b0
https://hg.mozilla.org/mozilla-central/rev/76e28af52416
Status: REOPENED → RESOLVED
Last Resolved: 4 years ago4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Approval Request Comment
[Feature/regressing bug #]: Caused by 1100398.
[User impact if declined]: Intermittent test failures as in bug 1136453.
[Describe test coverage new/current, TreeHerder]: New gtests in P2 patch that positively confirm the fix.
[Risks and why]: Low risk.
[String/UUID change made/needed]: None.
Attachment #8572973 - Attachment is obsolete: true
Attachment #8574772 - Attachment is obsolete: true
Attachment #8566053 - Attachment is obsolete: true
Attachment #8566682 - Attachment is obsolete: true
Attachment #8566925 - Attachment is obsolete: true
Attachment #8566928 - Attachment is obsolete: true
Attachment #8567136 - Attachment is obsolete: true
Attachment #8567410 - Attachment is obsolete: true
Attachment #8580735 - Flags: review+
Attachment #8580735 - Flags: checkin+
Attachment #8580735 - Flags: approval-mozilla-aurora?
Approval Request Comment:  See P1 request.
Attachment #8566054 - Attachment is obsolete: true
Attachment #8566689 - Attachment is obsolete: true
Attachment #8572969 - Attachment is obsolete: true
Attachment #8573023 - Attachment is obsolete: true
Attachment #8580736 - Flags: review+
Attachment #8580736 - Flags: checkin+
Attachment #8580736 - Flags: approval-mozilla-aurora?
Why do you think it is low risk? The patch is pretty big and touching xpcom.
Moreover, 38 is marked as disabled.
(In reply to Sylvestre Ledru [:sylvestre] from comment #41)
> Why do you think it is low risk? The patch is pretty big and touching xpcom.
> Moreover, 38 is marked as disabled.

The previous patch was backed out and relanded with a fix.  I consider the new patches low risk because we now have tests that exercise the failure more consistently.
I would also note this has had a week of burn-in time on nightly before requesting uplift.
Also worth noting that we do appear to have several crashes (across all Firefox/Thunderbird versions) in the last week in AdvanceReadPipe.  All these crashes are version 36 or before, so they're prior to Ben's work in this area.  Can't say for certain that these patches would fix those, but it seems plausible.
Attachment #8580735 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8580736 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Updated

a year ago
See Also: → 1431646
You need to log in before you can comment on or make changes to this bug.