Open Bug 798300 Opened 8 years ago Updated 2 years ago

[mozprocess] mozprocess intermingles stdout/stderr poorly on Windows

Categories

(Testing :: Mozbase, defect)

All
Windows 7
defect
Not set
normal

Tracking

(Not tracked)

People

(Reporter: ted, Unassigned)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [mozprocess])

Attachments

(2 files, 2 obsolete files)

We have this same bug in our automation.py harnesses, so we might as well fix it in mozprocess and just switch them to use mozprocess.

The attached patch contains a unit test for mozprocess that demonstrates the problem.

The root cause here is that Windows only has two file handle buffering modes: character-based and block-based (4k blocks). There's no line-buffered mode. Therefore any time you connect a processes' output to a pipe you get block-buffering. If you connect stdout and stderr to the same pipe, you get them in 4k chunks and not by-line as you would on Linux.

The only way I can think to solve this is to pipe stdout and stderr to separate handles, use the same PeekNamedPipe stuff we already do to watch both handles, and do the line-buffering ourselves.
See also bug 794984. I would love to be able to differentiate between stderr and stdout *while preserving order*. If you need to line buffer before emitting, I'll accept that. A separate mode for line buffered and completely unbuffered would be nice to have as well.
Blocks: 881421
Summary: mozprocess intermingles stdout/stderr poorly on Windows → [mozprocess] mozprocess intermingles stdout/stderr poorly on Windows
Whiteboard: [mozprocess]
bug 794984 is now resolved, so the streams will be separated if you specify a processStderrOutputLine callback. This will works like explained here: bug 875508 comment 5.

Does this helps to resolve the bug ?
If you use the new callback then yes, that would work. However, I wonder if we can tweak your implementation slightly so that this works in the default case? I think this might seriously be a two-line fix, if you just:
https://hg.mozilla.org/mozilla-central/annotate/dcc86f78d75e/testing/mozbase/mozprocess/mozprocess/processhandler.py#l636

1) Always pass stderr=subprocess.PIPE
2) Change that "if processStderrLine" to instead be:
if not processStderrLine:
    processStderrLine = processStdoutLine

That way if the caller doesn't pass a stderr handler we still split stdout/stderr to separate pipes but just pass it all to the stdout handler.

Does that sound sensible?
Yep, sounds good for me.

Note that we will use three threads everytime then (for now when processStderrLine was empty, there werer only two used) - but I suppose it won't make a big difference.
So we talked about this with :ted on irc, and this bug also occurs on posix. The fix is to separate stdout / stderr streams in every case, as Ted proposed in comment 4.
Assignee: nobody → j.parkouss
Status: NEW → ASSIGNED
Attachment #8583787 - Flags: review?(ted)
Attached file MozReview Request: bz://798300/ahal (obsolete) —
/r/6133 - Bug 798300 - Fix kill and get stack regression with b2g devices/emulators, r=davehunt

Pull down this commit:

hg pull review -r d6fb638fe98234ee1c345126769ff56371c2a047
Attachment #8583801 - Flags: review?(dave.hunt)
Comment on attachment 8583801 [details]
MozReview Request: bz://798300/ahal

Sorry, put the wrong bug number in the commit message.
Attachment #8583801 - Attachment is obsolete: true
Attachment #8583801 - Flags: review?(dave.hunt)
Comment on attachment 8583787 [details] [diff] [review]
mozprocess sometimes mixes stdout/stderr

Review of attachment 8583787 [details] [diff] [review]:
-----------------------------------------------------------------

Such a simple patch compared to the other one!

::: testing/mozbase/mozprocess/tests/test_mozprocess_output.py
@@ +67,5 @@
>  
> +    def test_stdout_stderr_mixing(self):
> +        """
> +        Test that redirecting stdout and stderr to the same pipe
> +        doesn't result in them being intermingled per-line.

This isn't really an accurate description since your patch is splitting them to separate pipes. Maybe just say "Test that stdout and stderr don't get intermingled within a line."

@@ +69,5 @@
> +        """
> +        Test that redirecting stdout and stderr to the same pipe
> +        doesn't result in them being intermingled per-line.
> +        """
> +        got_output = [False]

I noticed this same silly thing when I ran the test locally (Python scoping!) You could use self.got_output if you don't want to use a list.
Attachment #8583787 - Flags: review?(ted) → review+
Carrying myself the r+ from previous patch reviewed by Ted (just fixed the test docstring).
Attachment #8583787 - Attachment is obsolete: true
Attachment #8584494 - Flags: review+
Ok, try is green.
Keywords: checkin-needed
Backed out for causing semi-frequent mochitest-bc/dt window leaks. Seems unplausible, I know, but retriggers are pretty strongly pointing to this push as when it started :(
https://hg.mozilla.org/integration/mozilla-inbound/rev/7dc91f45e235

https://treeherder.mozilla.org/logviewer.html#?job_id=8243495&repo=mozilla-inbound
https://treeherder.mozilla.org/logviewer.html#?job_id=8226618&repo=mozilla-inbound
Retriggers on the backout confirm that the leaks are gone.
I haven't looked at the logs, but we shouldn't rule out the possibility the previous code was buggy and not always printing all output, thus masking an existing leak failure. (This could also be what's happening in the new code!)
If I look at https://treeherder.mozilla.org/logviewer.html#?job_id=8226618&repo=mozilla-inbound and grep for one of the "leaking" domwindows by "[pid = 4951] [serial = 189]", there's both:

> 05:11:33 INFO - ++DOMWINDOW == 22 (0x11e7ad800) [pid = 4951] [serial = 189] [outer = 0x0] 

and

> 05:11:40 INFO - --DOMWINDOW == 1 (0x11e7ad800) [pid = 4951] [serial = 189] [outer = 0x0] [url = about:blank]

but between those the test ends, and I see:

> Completed ShutdownLeaks collections in process 4951

The ++/-- DOMWINDOW stuff comes from call like this: https://hg.mozilla.org/mozilla-central/file/8bc7561d7557/dom/base/nsGlobalWindow.cpp#l1296 going to stderr, so if this patch is changing output interleaving to see stderr later than before, it could be creating false positives from the ShutdownLeaks parser.
So, I would submit that this is a bug on the part of the shutdownleaks code. The actual DOMWINDOW logging goes to stderr, but that "Completed" method goes to stdout. This patch clearly changes the ordering of stdout and stderr messages slightly, but the messages on a single topic should be going to the same stream. I don't know if we have a straightforward way to log to stderr from chrome JS, but that doesn't seem impossible to fix.
Depends on: 1151515
chmanchester did a try run and pointed out that switching the "completed" log to stdout doesn't completely fix the problem because the ++/--DOMWINDOW log lines go to stderr but the TEST-{START,END} lines to go stdout.

What a hassle.
The run with the "completed" line logged to stderr is here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=369eabe33a48

I'm testing out the ++/-- stuff going to stdout here:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2a3f8a0c9c5a
That second run came back green with retriggers. As ted pointed out on irc, printf_stderr also has special behavior for Android we would want to emulate, and we'd have to get consensus from other tools that might be reading the ++/-- output, but this could be a way to go.
(In reply to Chris Manchester [:chmanchester] from comment #21)
> That second run came back green with retriggers. As ted pointed out on irc,
> printf_stderr also has special behavior for Android we would want to
> emulate, and we'd have to get consensus from other tools that might be
> reading the ++/-- output, but this could be a way to go.

Whats the ETA to get this fixed? days, weeks?
(In reply to Gregor Wagner [:gwagner] from comment #22)
> (In reply to Chris Manchester [:chmanchester] from comment #21)
> > That second run came back green with retriggers. As ted pointed out on irc,
> > printf_stderr also has special behavior for Android we would want to
> > emulate, and we'd have to get consensus from other tools that might be
> > reading the ++/-- output, but this could be a way to go.
> 
> Whats the ETA to get this fixed? days, weeks?

I'm not actively pursuing it, but I don't think I'm up to speed on what this is blocking. Are you interested in this because of bug 1140394 ?
(In reply to Chris Manchester [:chmanchester] from comment #23)
> (In reply to Gregor Wagner [:gwagner] from comment #22)
> > (In reply to Chris Manchester [:chmanchester] from comment #21)
> > > That second run came back green with retriggers. As ted pointed out on irc,
> > > printf_stderr also has special behavior for Android we would want to
> > > emulate, and we'd have to get consensus from other tools that might be
> > > reading the ++/-- output, but this could be a way to go.
> > 
> > Whats the ETA to get this fixed? days, weeks?
> 
> I'm not actively pursuing it, but I don't think I'm up to speed on what this
> is blocking. Are you interested in this because of bug 1140394 ?

Yes.
(In reply to Chris Manchester [:chmanchester] from comment #21)
> That second run came back green with retriggers. As ted pointed out on irc,
> printf_stderr also has special behavior for Android we would want to
> emulate, and we'd have to get consensus from other tools that might be
> reading the ++/-- output, but this could be a way to go.

I was trying to figure out why this is written to stderr instead of stdout in the first place and found the discussion in https://bugzilla.mozilla.org/show_bug.cgi?id=934091#c4 and below. This suggests that printing to stderr was a hard requirement to getting the shutdown leaks detector working with multiple processes, so this seems like a total non-starter.
Removing myself as assignee since we have no solution for this yet.
Assignee: j.parkouss → nobody
Status: ASSIGNED → NEW
Just for the record, since this came up in discussion on IRC again today (as we looked at a log from a Windows Try push with unreadable intermingled output), the crux of the issue is that the Visual C++ CRT does not implement `setvbuf`'s `_IOLBF` mode, which is supposed to offer line buffering:
https://docs.microsoft.com/en-us/cpp/c-runtime-library/reference/setvbuf?view=vs-2017#remarks
You need to log in before you can comment on or make changes to this bug.