Open Bug 1057558 Opened 5 years ago Updated 10 months ago

Structured logging ignores IPC test failures.

Categories

(Testing :: Mochitest, defect, P3, critical)

defect

Tracking

(firefox33 affected, firefox34 affected)

ASSIGNED
Tracking Status
firefox33 --- affected
firefox34 --- affected

People

(Reporter: drno, Assigned: akachkach)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 4 obsolete files)

An ok(false, "Foo") in any test execute as part of IPC tests slips through unnoticed by the structured logging framework!
That basically means all IPC tests are passing no matter if there are test failures. The only problems which are reported are exceptions from within the test_ipc.html itself (e.g. failing to parse the JS from the loaded test).
Attached patch ipc_failure.patch (obsolete) — Splinter Review
To reproduce the problem first run:

./mach mochitest-plain dom/media/tests/ipc/

Then apply the patch from attachment 8477662 [details] [diff] [review] (which just adds an artificial test failure) and run again:

./mach mochitest-plain dom/media/test/ipc/

In the iframe you will see actual test failure getting reported, but the main frame just keeps reporting 5 test successes, and no failure at all.

As a result TBPL fails to report us test failures, and only reports test failures if a test failure in the iframe results in an JS error in the test_ipc.html itself.
So we're pretty sure this is caused by the fact that TestRunner is shared between the second and first iframe, and replacing _dumpMessage by a function that sends IPC messages prevents the ok and info statements in the IPC message handling function to dump them to stdout (it just sends yet another IPC message that isn't handled anywhere).

This is a proposed patch that should dump the logs using the old function too (in addition to sending IPC tests), I tested it with the provided ipc failure and it works. But the problem is the old _dumpMessage will also be executed in the second iframe (even if weirdly enough I didn't get any duplicated logs).

I will try to add a test to send IPC messages in the second iframe and do the old _dumpMessage in the first iframe.

Quick try run to see how this works: https://tbpl.mozilla.org/?tree=Try&rev=83d192feb98f
Try run looks OK, here's a new try run with a seeded failure: https://tbpl.mozilla.org/?tree=Try&rev=7e99aec80b7c
The try run with errors seems to work fine. Can you confirm, Nils?

Interestingly enough, I just realized TEST-UNEXPECTED-FAIL messages do not contain a line break. Will file a different bug for that.
Flags: needinfo?(drno)
Blocks: 1046162
Blocks: 1048775
I have no interest in accepting this patch before the code is shared.
Comment on attachment 8477729 [details] [diff] [review]
0001-Bug-1057558-Fix-IPC-test-results-not-being-reported-.patch

This patch results in the test errors being printed when mach exists. But I'm not sure if that is enough, because the IPC test still gets counted as test pass and no failed test is added to the counters.

Sorry if that was not clear from my initial report, but for IPC tests we need either:
a) all test pass and fail counters to be added/reported to the parents/first iframe counters
b) or at least if the test in the second iframe encounter any test fail report that whole test as a failure to the parent/first iframe
Flags: needinfo?(drno)
Blocks: 1041832
This results in us flying blind when trying to land any bigger changes for WebRTC. So this is really important to get fixed ASAP.

And probably worth pointing out that this needs to get uplifted, once we have a fix.
Severity: normal → critical
Priority: -- → P1
(In reply to :Ms2ger from comment #6)
> I have no interest in accepting this patch before the code is shared.

Sorry I dis-agree. We need to get this fixed really soon, especially as we are getting closer to an uplift next week and don't know how many orange we have in the IPC tests.
Re-factoring the code so that all the IPC tests share the same code base is nice to have and should be tracked in a separate bug (which can also land after the uplift).
Already filed a code for sharing the IPC tests code (bug 1042035), and started working on it this weekend. Unfortunately it's still not ready and It probably won't be ready today.

So in the meantime we worked with :drno on finding a temporary solution, will post the patch in a bit.
This is clearly a temporary fix, just to make sure the failures are logged / taken into account in the summary (one problematic thing is TEST-START messages. I'm not sure how I can get access to the test runner inside the message callback to just dump them as is)
Attachment #8477729 - Attachment is obsolete: true
Attachment #8478676 - Flags: review?(drno)
Fixed a nit from previous patch.
Attachment #8478676 - Attachment is obsolete: true
Attachment #8478676 - Flags: review?(drno)
Attachment #8478678 - Flags: review?(drno)
Comment on attachment 8478678 [details] [diff] [review]
0001-Bug-1057558-Fix-IPC-test-results-not-being-reported-.patch

Review of attachment 8478678 [details] [diff] [review]:
-----------------------------------------------------------------

Verified locally that it properly updates the test pass and failure counts to the parent iframe.
Lgtm.
Attachment #8478678 - Flags: review?(drno) → review+
We should keep this bug open to find a durable solution.
Try run here: https://tbpl.mozilla.org/?tree=Try&rev=54b441767753

Catched some errors I'm afraid :/ also the logs look a bit weird; probably because we don't use the subtest field. I wonder if we even have to log the test_end messages after all. ('should probably just take into account the test_status)
Can't land this if it's going to cause perma-orange. Sucks, I know :(
Keywords: checkin-needed
So we have a patch that fixes the regression, but it turns out that there was in fact an error being hidden?

Depending how hard it is to fix it, we can either:
A) Temporarily disable the failing test and land the patch.
B) Fix the test first.

Nils, can you take a look and let us know what route you want to take?

Looking at how test_ipc.html works, I'm not even sure it's possible to disable an individual ipc test. So option A) might mean disable all ipc devicestorage tests.

As an aside, I'm pretty sad this test_ipc.html stuff got shoe-horned into mochitest. I wonder how hard it would be to implement a marionette-based harness that accomplishes the same goal.
There is bug 1010285, fwiw.
(In reply to Andrew Halberstadt [:ahal] from comment #17)
> So we have a patch that fixes the regression, but it turns out that there
> was in fact an error being hidden?

Sounds that way :( (assuming we were catching these correctly in the first place - I wouldn't be too confident in that personally)

> Depending how hard it is to fix it, we can either:
> A) Temporarily disable the failing test and land the patch.
> B) Fix the test first.

I think you should file a bug for the failing devicestorage test and get it fixed. I'd CC :dhylands and :yxl to it.

> As an aside, I'm pretty sad this test_ipc.html stuff got shoe-horned into
> mochitest. I wonder how hard it would be to implement a marionette-based
> harness that accomplishes the same goal.

Yeah, it's really bad. Bug 1010285 will hopefully make it go away, but who knows what the timeframe is for that.
Depends on: 1058597
Bug 960729 had been closed with what appears to be a band-aid rather than a fix. This issue has been biting us for quite some time now.
See Also: → 960729
Forgot to needinfo for comment 17. Also filed bug 1058597 for the test failure. We should still decide whether to block on the fix for that or disable ipc devicestorage tests in the meantime.
Flags: needinfo?(drno)
(In reply to Andrew Halberstadt [:ahal] from comment #17)
> So we have a patch that fixes the regression, but it turns out that there
> was in fact an error being hidden?
> 
> Depending how hard it is to fix it, we can either:
> A) Temporarily disable the failing test and land the patch.
> B) Fix the test first.
> 
> Nils, can you take a look and let us know what route you want to take?

First of all device storage is not my area.

But I think withholding this fix any longer is only going to cause us more harm, as more and more oranges (or even permanent test failures) will go undiscovered (as currently everything in test_ipc.html appears always green). So even though it hurts to get this land this now, I don't really see an alternative.

Maybe an alternative would be to copy the code from dom/media/tests/ipc/test_ipc.html which allows you to configure the IPC tests to be executed on a per test level. There is a dom/media/tests/ipc/ipc.json file which allows you to do that for the WebRTC IPC tests. I can try to look into that later today.

If that is not a good option I would vote for A) and open a high priority ticket for fixing the device storage IPC test(s).
Flags: needinfo?(drno)
Here's an alternate patch that should show better log output to help you debugging the current errors. Again, this diverges from what we used to do with unstructured logs, but it's hard to just convert what the IPC tests are doing to the new logging.
Comment on attachment 8479428 [details] [diff] [review]
Alternate IPC fix

Review of attachment 8479428 [details] [diff] [review]:
-----------------------------------------------------------------

Verified locally on my system with a seeded error, all counts seem to properly reported.
LGTM.
Attachment #8479428 - Flags: review+
This patch allows to selectively enable and disable certain test for the IPC tests.
For now I disabled the test_dir.html, which is causing the test failures on TBPL. We need to track fixing it in a separate ticket.
Attachment #8479481 - Flags: review?(akachkach)
Try run for the "Alternate IPC fix" together with selectively disabling the test_dir.html test from devicestorage IPC tests: https://tbpl.mozilla.org/?tree=Try&rev=fea476d41094
Comment on attachment 8479481 [details] [diff] [review]
Choose devicestorage IPC tests selectively

Review of attachment 8479481 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me as long as it works as expected on try.
Attachment #8479481 - Flags: review?(akachkach) → review+
Attachment #8477662 - Attachment is obsolete: true
Attachment #8478678 - Attachment is obsolete: true
Try looks green, besides one intermittent orange in an IPC test. But as the re-run is green it actually proves that IPC tests are now reporting test failures properly.
Keywords: checkin-needed
(In reply to Nils Ohlmeier [:drno] from comment #28)
> Try looks green, besides one intermittent orange in an IPC test. But as the
> re-run is green it actually proves that IPC tests are now reporting test
> failures properly.

I re-triggered that failure a bunch of times. Depending how frequent it is, we may need to disable that test as well. For now let's just inform the sheriffs and keep an eye on it.
Status: NEW → ASSIGNED
Depends on: 1063569
I think we can close this once bug 1063569 has been closed.
Keywords: leave-open
Moving to p3 because no activity for at least 24 weeks.
Priority: P1 → P3
You need to log in before you can comment on or make changes to this bug.