Closed Bug 984194 Opened 10 years ago Closed 10 years ago

Perma-orange on Thunderbird-Trunk: TEST-UNEXPECTED-FAIL | /builds/slave/test/build/xpcshell/tests/uriloader/exthandler/tests/unit_ipc/test_encoding.js | false == true

Categories

(Core :: Networking: HTTP, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla33
Tracking Status
firefox31 --- fixed
firefox32 --- fixed
firefox33 --- fixed
firefox-esr24 --- unaffected

People

(Reporter: jcranmer, Assigned: mkmelin)

References

Details

(Keywords: intermittent-failure)

Attachments

(3 files, 4 obsolete files)

hg bisect indicates that http://hg.mozilla.org/mozilla-central/rev/cfd08d9f7ffb is to blame.

Some attempt at diagnosing the failure locally seems to indicate that the resulting output in the failing line is the gzip output instead of the decompressed contents.

<https://tbpl.mozilla.org/php/getParsedLog.php?id=36239807&tree=Thunderbird-Try&full=1#error2> is an example error, although this test file has really bad error reporting (do_check_true(x == y)? seriously?)
We need to fix bug 907732 and set MOZ_JSDOWNLOADS=1 in mail/confvars.sh, or test_encoding.js should be removed from mozilla-central?
Is it possible to suggest/patch that test_encoding.js is skipped if MOZ_JSDOWNLOADS is not set?
(In reply to Mark Banner (:standard8) from comment #9)
> Is it possible to suggest/patch that test_encoding.js is skipped if
> MOZ_JSDOWNLOADS is not set?

It can not be because the test is already skipped if MOZ_JSDOWNLOADS is *set*.
The failure test case in test_encodings.js now fails if MOZ_JSDOWNLOADS is not set. So the test case is useless now.
Attachment #8395115 - Flags: review?(paolo.mozmail)
Doesn't the test failure mean file download is actually broken in Thunderbird?
File download in Thunderbird has been broken before the blame for the test failure had checked in.

The breakage is still there without the blame commit.
Mike, can you shed some light on what file download is broken in TB, or if the test in question even applies to TB
Flags: needinfo?(mconley)
Looking at this, I actually think Paolo is the better choice.

I worked on the Download Panel for Firefox, but the whole JS downloads backend switchover was handled by Paolo, Enn and others.
Flags: needinfo?(mconley) → needinfo?(paolo.mozmail)
(In reply to Hiroyuki Ikezoe (:hiro) from comment #16)
> File download in Thunderbird has been broken before the blame for the test
> failure had checked in.
> 
> The breakage is still there without the blame commit.

Why the test didn't catch the failure before the blame commit?
As far as I know, downloading link in HTML message has been broken.
Comment on attachment 8395115 [details] [diff] [review]
Remove the failure test in test_encodins.js

This seems like a real regression with the code rather than a test issue.

I don't know why this does not happen on m-c, but maybe the tests in the "unit_ipc" folder are disabled or executed differently? Felipe, are you
familiar with the e10s testing done here?

Off the top of my head, this might be a problem with transferring some characteristics of the channel when the "divertion" is applied, like the channel encoding detection or the suspend count (given the warning right before the failure). Boris or Reuben might know more.
Attachment #8395115 - Flags: review?(paolo.mozmail)
Flags: needinfo?(reuben.bmo)
Flags: needinfo?(paolo.mozmail)
Flags: needinfo?(felipc)
Flags: needinfo?(bzbarsky)
Ah, nevermind for the reason test is not executed. I've just seen it's skipped on Android, that is the remaining m-c product where MOZ_JSDOWNLOADS is not used.

We'll need a new test if this is confirmed to be a real regression. Actually, we might need to set up proper e10s tests for downloads started by content.
Flags: needinfo?(felipc)
Comment on attachment 8395115 [details] [diff] [review]
Remove the failure test in test_encodins.js

Marking this patch with r- to record that I've already looked into this and this failing test probably refers to a real failure.
Attachment #8395115 - Flags: review-
(In reply to Hiroyuki Ikezoe (:hiro) from comment #21)
> As far as I know, downloading link in HTML message has been broken.

I far as I know links such as http://ftp.mozilla.org/pub/mozilla.org/thunderbird/nightly/latest-comm-central/thunderbird-31.0a1.en-US.win64-x86_64.zip Will download fine (Rightclick>>Save Link AS) on current trunk TB and win7 If there are other formats that are broken, please give an example, so that a bug can be filed on Thunderbird itself.
(In reply to Hiroyuki Ikezoe (:hiro) from comment #21)
> As far as I know, downloading link in HTML message has been broken.

Are you saying this bug is not a regression from bug 946239?
I'm not going to be much help here, sorry.  :(
Flags: needinfo?(bzbarsky)
(In reply to Hiroyuki Ikezoe (:hiro) from comment #21)
> As far as I know, downloading link in HTML message has been broken.

I can't reproduce it either.
Also works if gzip compression in used - at least dl of http://checkgzipcompression.com/ saves a decompressed file.
This bug is unlikely to have regressed the behavior in Thunderbird - but this automated Thunderbird test might actually have detected a Firefox OS regression.
Does anyone have an example of a file that is served with HTTP gzip compression and isn't handled by the browser?
Flags: needinfo?(reuben.bmo)
So for reference, the failure is at http://mxr.mozilla.org/comm-central/source/mozilla/uriloader/exthandler/tests/unit_ipc/test_encoding.js#258 where str=31,139,8,0,22,90,138,72,2,3,43,73,45,46,225,2,0,198,53,185,59,5,0,0,0 and decodedBody=116,101,115,116,10

testResponse3 (which fails) isn't documented like the others, but I assume it should decode test3.txt. However it disregards the "Content-Encoding: gzip" header.

And yes, I can confirm it is fallout from bug 946239.
If I modify the test not to shut down the server (and not to throw), and go to http://localhost:4444/test3.txt everything is still fine both in firefox and in thunderbird trunk AFAIKT.
In the log for the channel child (which occurs in the other process), we get entries like this:

0[c2d130]: HttpChannelChild::AsyncOpen [this=31d28a8 uri=http://localhost:4444/test3.txt]
0[c2d130]: HttpBaseChannel::SetRequestHeader [this=31d28c8 header="Cookie" value="" merge=0]
0[c2d130]: HttpChannelChild::OnStatus [this=31d28a8 status=804b000a]
0[c2d130]: HttpChannelChild::RecvOnStartRequest [this=31d28a8]
0[c2d130]: HttpBaseChannel::ApplyContentConversions [this=31d28c8]
0[c2d130]: Unexpected failure of AsyncConvertData gzip

I believe that is why this is failing. Haven't figured out yet how to get my debugger to talk to the alternate process to trace the error further.
The "Unexpected failure of AsyncConvertData gzip" occurs because mListener in HttpBaseChannel.cpp is null at this call (line 634):

      rv = serv->AsyncConvertData(from.get(),
                                  "uncompressed",
                                  mListener,
                                  mListenerContext,
                                  getter_AddRefs(converter));
Any ideas of what we can do to solve this (and by that I mean get rid of the permanent orange that TB builds are seeing)? As far as I can tell, this is a real problem that affects Firefox OS, but probably not Thunderbird, but it is the Thunderbird builds that are running the test. TB developers do not have the expertise nor motivation to fix this.
Can we disable this test on TB? From the comments in this bug it sounds like there isn't a TB error, and we also haven't seen B2G errors. This test runs and passes there…
Attached patch bug984194_disable_enc_test.patch (obsolete) — Splinter Review
Disabling it like this would seem to work at least locally.
Attached patch An attempt to fix the failure (obsolete) — Splinter Review
The root cause of this failure is that SuspendForDiversion() is too late to suspend HTTP channel. (In other words nsHttpChannel::OnDataAvailable and ::OnStopRequest had been proceeded before SuspendForDiversion.)

This patch is an attempt to fix the root cause. The patch fixes the failure of test_encoding.js.
Attachment #8395115 - Attachment is obsolete: true
Attachment #8415238 - Flags: feedback?(jduell.mcbugs)
Comment on attachment 8415238 [details] [diff] [review]
An attempt to fix the failure

Review of attachment 8415238 [details] [diff] [review]:
-----------------------------------------------------------------

Steve, can you take a first look at this?
Attachment #8415238 - Flags: feedback?(jduell.mcbugs) → feedback?(sworkman)
Comment on attachment 8415238 [details] [diff] [review]
An attempt to fix the failure

Review of attachment 8415238 [details] [diff] [review]:
-----------------------------------------------------------------

I think you're on the right track here, but I have a modification suggested below. Take a look and let me know if it makes sense.

::: netwerk/protocol/http/HttpChannelParent.cpp
@@ +641,5 @@
>    {
>      return NS_ERROR_UNEXPECTED;
>    }
> +
> +  mChannel->Suspend();

So, this suspends the channel in all cases until RecvOnStartRequestComplete is called. This means that OnStartRequest will be enqueued in the IPDL queue, but OnData and OnStop will not be enqueued until the child responds with SendOnStartRequestComplete.

Current behavior is to enqueue OnStart, OnData and OnStop events and let the child dequeue them asynchrnously. As I have understood it, this is for performance reasons.

So, the problem here is that the diversion code in ExternalAppHelperChild is called before the ExternalAppHandler gets a chance to request encoding conversion. For non-diversion cases, the child channel does the diversion, right? But in diversion cases it doesn't know that it should and just diverts the raw, unconverted data back to the parent.

Is there a way we can somehow find out from the handler if conversion is needed before diverting? Something in ExternalAppHelperChild::DivertToParent that calls mHandler->SetApplyConversionForRequest()? (Maybe we can rename it to MaybeApplyConversionForRequest()). It looks like that function only looks at the data in the channel.

So, in the child, you could call it in DivertToParent, just before diversion.
In the parent, you can call it in nsExternalAppHandler::OnStartRequest, like how you already changed it.

Then, we could change the child to do the conversion before diverting OnDataAvailable back to the parent.

How does that sound?

::: uriloader/exthandler/nsExternalHelperAppService.cpp
@@ +2561,5 @@
> +  }
> +
> +  nsresult rv;
> +  nsCOMPtr<nsIChannel> channel = do_QueryInterface(aRequest, &rv);
> +  NS_ENSURE_SUCCESS_VOID(rv);

Since you're returning nothing here, it's probably ok to not get rv from do_QueryInterface:

nsCOMPtr<nsIChannel> channel = do_QueryInterface(aRequest);
if (!channel) {
  return;
}

Same for rest of the function.

@@ +2594,5 @@
> +                                             &applyConversion);
> +    }
> +  }
> +
> +  encodedChannel->SetApplyConversion(applyConversion);

These parts of the patch in nsExternalAppHandler look ok to me because they're pretty much moving the existing code around. But you should have Reuben Morais <reuben.morais@gmail.com> look at these bits too.
Attachment #8415238 - Flags: feedback?(sworkman) → feedback+
(In reply to Steve Workman [:sworkman] from comment #241)
> Is there a way we can somehow find out from the handler if conversion is
> needed before diverting? Something in ExternalAppHelperChild::DivertToParent
> that calls mHandler->SetApplyConversionForRequest()? (Maybe we can rename it
> to MaybeApplyConversionForRequest()). It looks like that function only looks
> at the data in the channel.

As far as I've been investigating there is no chance to notify conversion necessity from the external app handler.

This is the log when the test failed:

-904800896[2b9fca214140]: nsHttpChannel::OnDataAvailable [this=2b9fcb0d7000 request=2b9fce82da20 offset=0 count=25]
-904800896[2b9fca214140]: sending status notification [this=2b9fcb0d7000 status=804b0006 progress=25/25]
-904800896[2b9fca214140]: HttpChannelParentListener::OnDataAvailable [this=2b9fcb0ea100]
-904800896[2b9fca214140]: HttpChannelParent::OnDataAvailable [this=2b9fca26c080]
-904800896[2b9fca214140]: nsHttpChannel::OnStopRequest [this=2b9fcb0d7000 request=2b9fce82da20 status=0]
-904800896[2b9fca214140]: nsHttpTransaction::DeleteSelfOnConsumerThread [this=2b9fcb8c8800]
-904800896[2b9fca214140]: Destroying nsHttpTransaction @2b9fcb8c8800
-904800896[2b9fca214140]: nsHttpChannel::FinalizeCacheEntry [this=2b9fcb0d7000]
-904800896[2b9fca214140]:   calling OnStopRequest
-946387584[2b4ac7a14140]: HttpChannelChild::RecvOnStartRequest [this=2b4acae94000]
-904800896[2b9fca214140]: HttpChannelParentListener::OnStopRequest: [this=2b9fcb0ea100 status=0l]
-904800896[2b9fca214140]: HttpChannelParent::OnStopRequest: [this=2b9fca26c080 status=0]
-904800896[2b9fca214140]: nsHttpChannel::CloseCacheEntry [this=2b9fcb0d7000] mStatus=0 mCacheEntryIsWriteOnly=1
-946387584[2b4ac7a14140]: nsExternalAppHandler::nsExternalAppHandler(nsIMIMEInfo*, const nsCSubstring&, nsIInterfaceRequestor*, nsExternalHelperAppService*, const nsAString_internal&, uint32_t, bool): [this=2b4ac7a4aaa0]

nsHttpChannel::OnStopRequest had been called before the constructor of nsExternalAppHandler, so the listener for content decoder has been already died.

Or we should move to ApplyConversion() into HttpChannelParent class and invoke it in HttpChannelParent::StartDiversion in case of diversion?
I am not sure it's possible though...
Comment on attachment 8415238 [details] [diff] [review]
An attempt to fix the failure

Review of attachment 8415238 [details] [diff] [review]:
-----------------------------------------------------------------

The nsExternalAppHandler changes look good to me.
Attachment #8415238 - Flags: feedback+
(In reply to Hiroyuki Ikezoe (:hiro) from comment #242)
> nsHttpChannel::OnStopRequest had been called before the constructor of
> nsExternalAppHandler, so the listener for content decoder has been already
> died.
> 
> Or we should move to ApplyConversion() into HttpChannelParent class and
> invoke it in HttpChannelParent::StartDiversion in case of diversion?
> I am not sure it's possible though...

Ah yes, but I'm talking about the child process. It looks like the handler is already created before ExternalHelperAppChild::OnStartRequest - http://dxr.mozilla.org/mozilla-central/source/uriloader/exthandler/ExternalHelperAppChild.cpp#61

So, it's possible we could call mHandler->SetApplyConversionForRequest() here, before diverting the channel.

Note, as far as I can tell, the content decoder is created and used in the child process for non-diverted channels. So, it seems like we could get HttpChannelChild to do the decoding.

What do you think?
Unfortunately,  another error has been happening there.  

from https://tbpl.mozilla.org/?tree=Thunderbird-Try&rev=c94e6e428f9d :

[Child 29404] WARNING: '!compMgr', file /builds/slave/tb-try-c-cen-l64-d-00000000000/build/mozilla/xpcom/glue/nsComponentManagerUtils.cpp, line 59
[Child 29404] WARNING: NS_ENSURE_TRUE(svc) failed: file /builds/slave/tb-try-c-cen-l64-d-00000000000/build/mozilla/dom/ipc/ContentChild.cpp, line 757
[Child 29404] WARNING: '!compMgr', file /builds/slave/tb-try-c-cen-l64-d-00000000000/build/mozilla/xpcom/glue/nsComponentManagerUtils.cpp, line 59
System JS : ERROR resource://gre/modules/AddonManager.jsm:20 - Error: You cannot use the AddonManager in child processes!
[Child 29404] WARNING: Cannot create startup observer : service,@mozilla.org/mail/mailglue;1: file /builds/slave/tb-try-c-cen-l64-d-00000000000/build/mozilla/embedding/components/appstartup/src/nsAppStartupNotifier.cpp, line 81
System JS : ERROR (null):0 - uncaught exception: 2147500037
System JS : ERROR (null):0 - uncaught exception: 2147500037
System JS : ERROR (null):0 - uncaught exception: 2147500037
System JS : ERROR (null):0 - uncaught exception: 2147500037
System JS : ERROR resource://gre/components/nsHandlerService.js:891 - NS_ERROR_FAILURE: Failure'Failure' when calling method: [nsIDirectoryServiceProvider::getFile]
Attached patch Decode content in child channel (obsolete) — Splinter Review
I did not think it's ok to decode the content in the child channel in case of diversion.

This patch is another attempt to fix this issue to decode the content in child channel.

Try server result seems good:
https://tbpl.mozilla.org/?tree=Thunderbird-Try&rev=8c47896408c7
Attachment #8415238 - Attachment is obsolete: true
Attachment #8423491 - Flags: feedback?(sworkman)
I've forked bug 1012917 for the code fix here, so we can keep the bot spam contained here.
Depends on: 1012917
Comment on attachment 8423491 [details] [diff] [review]
Decode content in child channel

Review of attachment 8423491 [details] [diff] [review]:
-----------------------------------------------------------------

Removing f? here as Jason has come up with a solution in the forked bug 1012917. This solution should put content-decoding in the parent, and avoid having to suspend the channel at OnStartRequest.
Attachment #8423491 - Flags: feedback?(sworkman)
This is an improvement for test_encoding.js.
Current failure message is not easy to understand what happens there.
Attachment #8423491 - Attachment is obsolete: true
Attachment #8425186 - Flags: review?(sworkman)
Comment on attachment 8425186 [details] [diff] [review]
bug984194_test_fix.patch

Review of attachment 8425186 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks Hiroyuki! That will be very helpful for debugging.
Attachment #8425186 - Flags: review?(sworkman) → review+
Attachment #8412108 - Attachment is obsolete: true
https://hg.mozilla.org/integration/mozilla-inbound/rev/8c09fdf15bc6
Assignee: nobody → hiikezoe
Keywords: checkin-needed
Whiteboard: [leave open]
Unless someone comes up with a test case where this fails in Thunderbird today (Hiro wasn't able to unfortunately, and I'm told it shouldn't affect us anyway as we're not using ipc yet), I'm going to resurrect attachment 8412108 [details] [diff] [review] to disable the test for Thunderbird until the core issue is resolved.
Magnus, you mentioned that the patch you attached previously worked for you. However when I tried it on try, it didn't work, I had to use this variant.

Can you see if this works for you please?
Attachment #8450446 - Flags: feedback?(mkmelin+mozilla)
Comment on attachment 8450446 [details] [diff] [review]
Possible test disable

Review of attachment 8450446 [details] [diff] [review]:
-----------------------------------------------------------------

Yep, this works for me.

I wonder if I tested it on a browser build, but mach xpcshell-test uriloader/exthandler/ doesn't run that test... (obviously!)
Attachment #8450446 - Flags: feedback?(mkmelin+mozilla) → feedback+
Comment on attachment 8450446 [details] [diff] [review]
Possible test disable

Jason, I think we should just disable the test for Thunderbird for now, is this fine?
Attachment #8450446 - Flags: review?(jduell.mcbugs)
Comment on attachment 8450446 [details] [diff] [review]
Possible test disable

Review of attachment 8450446 [details] [diff] [review]:
-----------------------------------------------------------------

Yes, I'm fine with disabling the test.  I'll add a note to bug 1012917 that we should figure out why this test was failing (and enable it again once we have a fix).
Attachment #8450446 - Flags: review?(jduell.mcbugs) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/a42840638d09
Assignee: hiikezoe → mkmelin+mozilla
Whiteboard: [leave open]
Target Milestone: --- → mozilla33
https://hg.mozilla.org/mozilla-central/rev/a42840638d09
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Mark, buildapp is "..\mail' on Windows. We have to deal with the difference of the directory separator.
Or just disable the test for all applications until bug 1012917 is fixed?
Status: RESOLVED → REOPENED
Flags: needinfo?(standard8)
Resolution: FIXED → ---
(In reply to Hiroyuki Ikezoe (:hiro) from comment #391)
> Mark, buildapp is "..\mail' on Windows. We have to deal with the difference
> of the directory separator.

Oh typical. At the expense of being verbose, could we do:

skip-if = (os == "android" || buildapp == '../mail' || buildapp == '..\mail')

I'm hoping this is a one-off, but if it gets too bad, we might want to look at other solutions later.
Flags: needinfo?(standard8)
(In reply to Mark Banner (:standard8) from comment #398)
> (In reply to Hiroyuki Ikezoe (:hiro) from comment #391)
> > Mark, buildapp is "..\mail' on Windows. We have to deal with the difference
> > of the directory separator.
> 
> Oh typical. At the expense of being verbose, could we do:
> 
> skip-if = (os == "android" || buildapp == '../mail' || buildapp == '..\mail')
> 
> I'm hoping this is a one-off, but if it gets too bad, we might want to look
> at other solutions later.

Joshua, I keep meaning to ask you about this - do you know an easy way we could account for this in the code, and avoid the extra or, or should we just use the extended statement for now?
Flags: needinfo?(Pidgeot18)
(In reply to Mark Banner (:standard8) from comment #411)
> (In reply to Mark Banner (:standard8) from comment #398)
> > (In reply to Hiroyuki Ikezoe (:hiro) from comment #391)
> > > Mark, buildapp is "..\mail' on Windows. We have to deal with the difference
> > > of the directory separator.
> > 
> > Oh typical. At the expense of being verbose, could we do:
> > 
> > skip-if = (os == "android" || buildapp == '../mail' || buildapp == '..\mail')
> > 
> > I'm hoping this is a one-off, but if it gets too bad, we might want to look
> > at other solutions later.
> 
> Joshua, I keep meaning to ask you about this - do you know an easy way we
> could account for this in the code, and avoid the extra or, or should we
> just use the extended statement for now?

If you're asking about being able to distinguish the \ versus /, it should be fairly trivial to patch the python code to do that, probably in whatever piece of code generates mozinfo.json is most palatable to standard build peers.
Flags: needinfo?(Pidgeot18)
Blocks: 1084997
Bug 1012917 forgot to re-enable this test, and it indeed works fine on comm-central now (at least locally).
Attachment #8529910 - Flags: review?(jduell.mcbugs)
Comment on attachment 8529910 [details] [diff] [review]
bug984194_enable_test_encodingjs.patch

Review of attachment 8529910 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good but let's run it by Steve too.
Attachment #8529910 - Flags: review?(sworkman)
Attachment #8529910 - Flags: review?(jduell.mcbugs)
Attachment #8529910 - Flags: feedback+
Comment on attachment 8529910 [details] [diff] [review]
bug984194_enable_test_encodingjs.patch

Review of attachment 8529910 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry for the late review! r=me
Attachment #8529910 - Flags: review?(sworkman) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/592ac8d27270
OS: Linux → All
Hardware: x86_64 → All
https://hg.mozilla.org/mozilla-central/rev/592ac8d27270
Status: REOPENED → RESOLVED
Closed: 10 years ago10 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: