Closed Bug 1172468 Opened 5 years ago Closed 5 years ago

Intermittent browser_parsable_script.js | application terminated with exit code 11

Categories

Product: Core :: General
Type: defect
Platform: Unspecified
OS: Gonk (Firefox OS)
Priority: Not set

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla41
Tracking Status
firefox39 --- unaffected
firefox40 --- fixed
firefox41 --- fixed
firefox-esr31 --- unaffected
firefox-esr38 --- unaffected

People

(Reporter: cbook, Assigned: Gijs)

References

(Depends on 1 open bug)

Details

(Keywords: intermittent-failure)

mozilla-inbound_ubuntu32_vm_test_pgo-mochitest-browser-chrome-1

https://treeherder.mozilla.org/logviewer.html#?job_id=10525639&repo=mozilla-inbound

03:47:35 WARNING - TEST-UNEXPECTED-FAIL | browser/base/content/test/general/browser_parsable_script.js | application terminated with exit code 11
Depends on: 1172193
Peachy, bug 1164014 was uplifted to Aurora today, so now this is hitting there as well. Where do things stand with this? Leaving tests basically permafailing is not acceptable.
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #196)
> Peachy, bug 1164014 was uplifted to Aurora today, so now this is hitting
> there as well. Where do things stand with this? Leaving tests basically
> permafailing is not acceptable.

The test was already intermittent in 2-3 different ways and was quite frequent. I would argue that it should be turned off and the underlying problems should be fixed and then turned on again (like all the other intermittent tests). A randomly failing test that has nothing to do with the work I'm trying to get done should not become a blocker for me. If I could make the call I would turn it off and file a follow-up on the GC bug. But if folks disagree then let me know and we can back out my patches from Aurora, and I'll explain to people why Developer Edition must keep being painfully slow for them.
Flags: needinfo?(gkrizsanits)
Flags: needinfo?(ryanvm)
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #210)
> The test was already intermittent in 2-3 different ways and was quite
> frequent.

Evidence for that? I don't recall seeing this test fail anywhere near like it is now before bug 1164014 landed.

> I would argue that it should be turned off and the underlying
> problems should be fixed and then turned on again (like all the other
> intermittent tests). A randomly failing test that has nothing to do with the
> work I'm trying to get done should not become a blocker for me. If I could
> make the call I would turn it off and file a follow-up on the GC bug.

Decisions like that should be made before uplifting the regressing patch and spreading the failures around even more. And in consultation with the relevant test owners (which, AFAICT, hasn't happened anywhere).

> But if folks disagree then let me know and we can back out my patches from Aurora,
> and I'll explain to people why Developer Edition must keep being painfully slow
> for them.

I don't think guilt-tripping is an overly productive addition to this discussion.
Flags: needinfo?(ryanvm)
I plan to disable the test by end of day, given the effective permafail here and the lack of productive discussion toward getting it fixed.
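(For reference, disabling a browser-chrome test is a one-line manifest change. A sketch only; I'm assuming the test's entry lives in browser/base/content/test/general/browser.ini:)

  [browser_parsable_script.js]
  skip-if = true # Bug 1172468 - near-permafailing OOM crash (exit code 11)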
Flags: needinfo?(wmccloskey)
Flags: needinfo?(terrence)
Flags: needinfo?(jcoppeard)
Flags: needinfo?(gkrizsanits)
Flags: needinfo?(gijskruitbosch+bugs)
So the test loads all the JS we ship and checks that none of it produces parse errors.

If we're triggering OOM here, there are several ways to work around it. It seems bug 1172193 comment #0 already identified at least one, to wit:

> If I artificially call scheduleGC(this) after each Reflect.parse call from the test, then everything is good.

Seems like that would be a workable solution, though it might need to be combined with a requestLongerTimeout if the GCs take enough time to push the test's runtime past the time limits for our individual tests.
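A minimal sketch of what that could look like in the test. Note that scriptURIs and fetchSource() below are hypothetical placeholders for however the test enumerates and reads our shipped JS files, and Cu.forceGC() is my stand-in for the shell-only scheduleGC(this) mentioned in bug 1172193 comment #0:

  requestLongerTimeout(2); // the extra GCs may push the test past its time limit
  for (let uri of scriptURIs) {      // hypothetical list of shipped JS files
    let source = fetchSource(uri);   // hypothetical helper that reads the file
    try {
      Reflect.parse(source);         // the parse check the test performs
      ok(true, "no parse error in " + uri);
    } catch (ex) {
      ok(false, uri + " failed to parse: " + ex);
    }
    Cu.forceGC();                    // free the parse tree before the next file
  }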

I don't have time to write up the patch until either late tonight or potentially tomorrow, and I'd like confirmation from the JS folks that such a workaround works and/or is acceptable, considering bug 1172193 hasn't been fixed yet.

I'm also confused, because the offending patches landed with a supposed test fix that moved the Reflect calls into the same zone, which should already be avoiding the GC issues. Do we know why that fix is not "good enough" to avoid the issue?
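(For context, "moving the Reflect calls into the same zone" is usually done via a sandbox. This is a sketch only, assuming the standard sameZoneAs sandbox option and the reflect.jsm module path; it is not necessarily what the patch actually did:)

  // Allocate the Reflect parser in the test's own zone so its garbage
  // stays collectable by the zone GCs the test already triggers.
  let sandbox = Cu.Sandbox(window, { sameZoneAs: window });
  Cu.import("resource://gre/modules/reflect.jsm", sandbox);
  sandbox.Reflect.parse(source);     // source: the script text being checked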
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #222)
> (In reply to Gabor Krizsanits [:krizsa :gabor] from comment #210)
> > The test was already intermittent in 2-3 different ways and was quite
> > frequent.
> 
> Evidence for that? I don't recall seeing this test fail anywhere near like
> it is now before bug 1164014 landed.

I did quite a few try runs, and based on those it seemed to me that the failures went up from, let's say, 10% to 25%. I have not seen this one in around 50 runs or more, so I have no idea how it became perma-orange. In fact, if I had seen this one even once, I would not have pushed it to m-c; I clearly overlooked it. I wonder if other patches made it worse or something. I thought it was about one of the two other intermittent failures that belong to this test. I double-checked now, and the other frequent intermittent was actually from browser_social_activation.js; the other one was bug 1123438.

Because it was not backed out and I had seen progress on the GC bug, I found that acceptable (I was hoping it would get better soon, either with the related GC bug fixed or by turning the test off). I was not aware that it had become almost perma-orange, especially not with this new OOM failure! So I made a second mistake by assuming too much instead of communicating. Sorry about that.

> 
> > I would argue that it should be turned off and the underlying
> > problems should be fixed and then turned on again (like all the other
> > intermittent tests). A randomly failing test that has nothing to do with the
> > work I'm trying to get done should not become a blocker for me. If I could
> > make the call I would turn it off and file a follow-up on the GC bug.
> 
> Decisions like that should be made before uplifting the regressing patch
> and spreading the failures around even more. And in consultation with
> the relevant test owners (which, AFAICT, hasn't happened anywhere).
> 
> > But if folks disagree then let me know and we can back out my patches from Aurora,
> > and I'll explain to people why Developer Edition must keep being painfully slow
> > for them.
> 
> I don't think guilt-tripping is an overly productive addition to this
> discussion.

Sorry if that sounded passive-aggressive or something; that was totally not intentional. It's just a fact that we have to keep in mind. If this crash is more serious than that, backing out can be totally the right thing to do; I did not expect anyone to feel guilty about anything here. I wonder if there is any frequency change in the OOM crashes that would be a reason to back out my patch, no matter how painful that would be.

(In reply to :Gijs Kruitbosch from comment #224)
> I'm also confused, because the offending patches landed with a supposed test
> fix that moved the Reflect calls into the same zone, which should already be
> avoiding the GC issues. Do we know why that fix is not "good enough" to avoid
> the issue?

No. It worked fine when I pushed it to try and tested it quite thoroughly. The other intermittent failures became slightly more frequent, but I could not reproduce this crash. And I'm afraid just backing out my patch will not fix this issue.
Flags: needinfo?(gkrizsanits)
(In reply to :Gijs Kruitbosch from comment #224)
> > If I artificially call scheduleGC(this) after each Reflect.parse call from the test, then everything is good.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c0619e89a62c
Does that run include PGO? That's where the vast majority of the failures are happening.
https://wiki.mozilla.org/ReleaseEngineering/TryChooser#What_if_I_want_PGO_for_my_build
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #235)
> Does that run include PGO? That's where the vast majority of the failures
> are happening.
> https://wiki.mozilla.org/ReleaseEngineering/TryChooser#What_if_I_want_PGO_for_my_build

Nope, I hadn't even heard of that; thanks!
https://treeherder.mozilla.org/#/jobs?repo=try&revision=cf8a44565b91