Open Bug 1334336 Opened 7 years ago Updated 2 years ago

Intermittent toolkit/components/passwordmgr/test/browser/{browser_capture_doorhanger.js,browser_doorhanger_remembering.js} | Uncaught exception - Wait for form submission load (formsubmit.sjs) - timed out after 50 tries.

Categories

(Toolkit :: Password Manager, defect, P5)

People

(Reporter: intermittent-bug-filer, Unassigned)

Details

(Keywords: intermittent-failure, leave-open, Whiteboard: [stockwell disabled])

Attachments

(1 file, 2 obsolete files)

This hasn't hit our 30 failures/week threshold, but it is close. It occurs on linux* debug e10s only; I'm looking at linux32/debug bc-e10s-2 as the job:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=bc2%20linux%20debug%20e10s&tochange=a9c886b3517f9c5e2728539c6426df45b80b077b&fromchange=c726adc9b2174f3bf5dab4b12948d37537216a14&selectedJob=82012989

Hopefully we can narrow this down to the root cause, which would give us more information and possibly make the fix easier. This test runs for a long time, typically 100+ seconds.

Also note that this test is only run on every 5th push, so the actual failure rate is much higher than the raw failure count suggests.

ni myself to follow up on the root cause.
Flags: needinfo?(jmaher)
Retriggers didn't help much here; I think we just need to investigate the test as a test failure.
I will wait until the failure rate picks up a bit more.
Flags: needinfo?(jmaher)
Perhaps this picked up due to multiple content processes?
this has really increased in frequency, hard to tell if this is affected by multi-e10s, I believe we landed that a few weeks ago (bug 1303113) and this picked up 7-8 days ago.  The current failure rate is very high:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1334336&startday=2017-03-06&endday=2017-03-30&tree=all

still linux64-debug-bc-e10s (a little bit of linux32-debug-e10s as well).

Matt, do you have suggestions for additional debugging or others who could help out?
Flags: needinfo?(MattN+bmo)
Whiteboard: [stockwell needswork]
(In reply to Joel Maher ( :jmaher) from comment #9)
> this has really increased in frequency, hard to tell if this is affected by
> multi-e10s, I believe we landed that a few weeks ago (bug 1303113) and this
> picked up 7-8 days ago.

The bug only started 4 days after multi-e10s was enabled. They are still fixing/changing e10smulti code so I still really think this is multi-e10s related. It also sounds like another intermittent that mrbkap worked on due to e10smulti.

> The current failure rate is very high:
> https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1334336&startday=2017-03-06&endday=2017-03-30&tree=all
> 
> still linux64-debug-bc-e10s (a little bit of linux32-debug-e10s as well).
> 
> Matt, do you have suggestions for additional debugging or others who could
> help out?

I think disabling e10smulti and seeing if it happens would be useful.

I just scheduled a custom action on https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=03d602fd723ad6ff4588c04855884ffa1dee9410&filter-searchStr=linux32%20debug%20mochitest-browser-chrome-e10s-2 to see if it fails with 30 tries run-until-failure with processCount forced to 1.

If that doesn't fail then I think it would be useful to get input from the e10smulti team on common causes/fixes.
Flags: needinfo?(MattN+bmo)
:gbrown, would you have time to see if e10smulti is the root cause here?
Flags: needinfo?(gbrown)
Sure, I'll check into that.
(In reply to Matthew N. [:MattN] (behind on bugmail; PM if requests are blocking you) from comment #10)
> I just scheduled a custom action on
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=03d602fd723ad6ff4588c04855884ffa1dee9410&filter-searchStr=linux32%20debug%20mochitest-browser-chrome-e10s-2
> to see if it fails with 30 tries run-until-failure with processCount forced to 1.

It looks like that job failed because it ran for too long, but it didn't hit any of these failures.
https://treeherder.mozilla.org/#/jobs?repo=try&tochange=d4782d5ab64f07de229e86c617b5f5766ec7b371&fromchange=8cb4db8ca18fe17d804519935a6686605abff1d7&filter-searchStr=e10s compares dom.ipc.processCount=1 to normal. There are fewer failures with processCount=1, but this failure does still occur. 

I think we should look for other explanations besides multiple content processes.
Flags: needinfo?(gbrown)
(In reply to Geoff Brown [:gbrown] from comment #17)
> https://treeherder.mozilla.org/#/jobs?repo=try&tochange=d4782d5ab64f07de229e86c617b5f5766ec7b371&fromchange=8cb4db8ca18fe17d804519935a6686605abff1d7&filter-searchStr=e10s
> compares dom.ipc.processCount=1 to normal. There are fewer failures with
> processCount=1, but this failure does still occur.
> 
> I think we should look for other explanations besides multiple content
> processes.

There were changes made to support multiple content processes that may affect both modes so I still suspect it's related to e10s multi. I'm fine with disabling on linux debug though as I don't have time to investigate.
(In reply to Geoff Brown [:gbrown] from comment #17)
> I think we should look for other explanations besides multiple content processes.

Maybe I spoke too soon!

In https://treeherder.mozilla.org/#/jobs?repo=try&revision=38efd34864c18359ab669737334e58d0deb27a1f, remembering my previous experience with multiple content processes in another bug, I added a call to releaseCachedProcesses(). Now none of the failures are browser_capture_doorhanger.js! Most of the failures are timeouts in browser_context_menu.js; that does happen on trunk currently, but perhaps not with this frequency. 

I'll check into that a bit more; if that doesn't work out, will disable on linux debug.
Assignee: nobody → gbrown
:gabor -- As in bug 1348547, here's another case where setting processCount=1 and calling releaseCachedProcesses() seems to make the test run reliably. OK to use this strategy?
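
For readers unfamiliar with the strategy, here is a minimal sketch of what "set processCount=1 and call releaseCachedProcesses()" might look like in a test setup step. It is not the attached patch. The helper name forceSingleContentProcess is hypothetical, and the SpecialPowers and Services objects below are stand-ins defined only so the sketch runs outside the mochitest harness, where the real globals would be supplied.

```javascript
// Stand-ins so the sketch runs outside the mochitest harness. In a real
// browser-chrome test the harness supplies these globals; the shapes
// below are assumptions made for illustration only.
const calls = [];
const SpecialPowers = {
  // Records each pref that would be pushed for the duration of the test.
  pushPrefEnv: async ({ set }) => {
    for (const [name, value] of set) {
      calls.push(`pref ${name}=${value}`);
    }
  },
};
const Services = {
  ppmm: {
    // Records the request to release cached content processes.
    releaseCachedProcesses: () => calls.push("releaseCachedProcesses"),
  },
};

// Hypothetical setup helper: limit the browser to one content process,
// then ask the parent process message manager to release any content
// processes it is still keeping around from earlier tests.
async function forceSingleContentProcess() {
  await SpecialPowers.pushPrefEnv({ set: [["dom.ipc.processCount", 1]] });
  Services.ppmm.releaseCachedProcesses();
}
```

The order matters in this sketch: the pref is pushed first so that processes released afterwards are not replaced by more than one new content process.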
Attachment #8857175 - Flags: review?(gkrizsanits)
Summary: Intermittent browser_capture_doorhanger.js | Uncaught exception - Wait for form submission load (formsubmit.sjs) - timed out after 50 tries. → Intermittent toolkit/components/passwordmgr/test/browser/browser_capture_doorhanger.js | Uncaught exception - Wait for form submission load (formsubmit.sjs) - timed out after 50 tries.
(In reply to Geoff Brown [:gbrown] from comment #21)
> Created attachment 8857175 [details] [diff] [review]
> set processCount=1 and releaseCachedProcesses
> 
> :gabor -- As in bug 1348547, here's another case where setting
> processCount=1 and calling releaseCachedProcesses() seems to make the test
> run reliably. OK to use this strategy?

Let me check this with Blake; he has more experience than I do with these intermittent tests, and I would prefer to understand why this is happening instead of just turning off e10s-multi.

Blake, can you take a look at this function? http://searchfox.org/mozilla-central/source/toolkit/components/passwordmgr/test/browser/head.js#32

It times out in the waitForCondition part... do you have any idea how we might affect this code? If you don't know it either I'm leaning toward accepting this patch as it is, since it's happening only on linux debug. And then we'll figure it out in a followup.
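
To make the failure mode concrete, here is a rough, self-contained sketch of a polling helper in the spirit of waitForCondition. It is not the actual helper used by head.js. The 50-try limit mirrors the failure message in this bug's summary, while the 100 ms interval and the options-object signature are assumptions for illustration.

```javascript
// Hypothetical stand-in for a waitForCondition-style helper; the real
// test helper may differ in signature and behaviour.
function waitForCondition(condition, message, { interval = 100, maxTries = 50 } = {}) {
  return new Promise((resolve, reject) => {
    let tries = 0;
    const timer = setInterval(() => {
      let result;
      try {
        result = condition();
      } catch (e) {
        // A throwing condition aborts the wait immediately.
        clearInterval(timer);
        reject(e);
        return;
      }
      if (result) {
        // Condition met: stop polling and hand back the truthy value.
        clearInterval(timer);
        resolve(result);
        return;
      }
      tries += 1;
      if (tries >= maxTries) {
        // Condition never became true within the allowed number of polls.
        clearInterval(timer);
        reject(new Error(`${message} - timed out after ${maxTries} tries.`));
      }
    }, interval);
  });
}
```

With a helper like this, the timeout only says that the awaited condition (here, the form submission load) was never observed within the polling window; it does not say why, which is why the root cause is hard to read off the log.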
Flags: needinfo?(mrbkap)
I'm going to try a patch for this bug tomorrow to fix this class of problems.
looking forward to the results!
I got a bit confused. I was referring to my patch in bug 1358674 (currently on try), but that doesn't answer Gabor's question. I'll look again on Monday. Sorry for the delay.
As we are 4+ weeks into very high frequency failures, let's reduce the pain. I am happy to see either this landed or the other patch for reducing to 1 process.
Attachment #8861064 - Flags: review?(gbrown)
This is very high frequency and has languished, but :mrbkap said he'd have another look today and my r? patch could be used in a pinch...let's wait one more day before skipping.
Comment on attachment 8857175 [details] [diff] [review]
set processCount=1 and releaseCachedProcesses

Review of attachment 8857175 [details] [diff] [review]:
-----------------------------------------------------------------

I think the patch Blake has and will land soon in bug 1358674 will fix this problem.
Attachment #8857175 - Flags: review?(gkrizsanits)
Attached patch: set processCount=1 (obsolete)
> I think the patch Blake has and will land soon in bug 1358674 will fix this problem.

We still need to set processCount though, right?
Attachment #8857175 - Attachment is obsolete: true
Attachment #8861423 - Flags: review?(gkrizsanits)
Comment on attachment 8861423 [details] [diff] [review]
set processCount=1

Review of attachment 8861423 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, that's right. We should also file a followup bug to investigate why we need this at all in the first place.
Attachment #8861423 - Flags: review?(gkrizsanits) → review+
Comment on attachment 8861064 [details] [diff] [review]
temporarily disable doorhanger test on linux debug

Review of attachment 8861064 [details] [diff] [review]:
-----------------------------------------------------------------

I pushed one more try run to check my change + mrbkap's, but was disappointed in the result: https://treeherder.mozilla.org/#/jobs?repo=try&revision=dff0c5cb3e92a5662796ded0b8119d261b6bb5f9. It might be improved, but it's still failing. 

Let's skip until someone can give this test proper attention.
Attachment #8861064 - Flags: review?(gbrown) → review+
Attachment #8861423 - Attachment is obsolete: true
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/488d0f41b405
Intermittent toolkit/components/passwordmgr/test/browser/browser_capture_doorhanger.js. temporarily disable. r=gbrown
Assignee: gbrown → nobody
Whiteboard: [stockwell needswork] → [stockwell disabled]
Sorry for not updating here. I spent a bunch of time yesterday debugging this and found (as you did) that processCount doesn't seem to affect the orange. At least locally, I've been debugging with the patches here and in bug 1358674 applied. The problem appears to be related to the password manager prompt being rolled up before the test can inspect it. Given that this only seems to affect Linux, that would suggest a bug in the gtk widget code.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=38862dde9082fbafa6362e27bc638b928b279d75 might help shed some light as to whether this is actually a gtk problem.
Flags: needinfo?(mrbkap)
Summary: Intermittent toolkit/components/passwordmgr/test/browser/browser_capture_doorhanger.js | Uncaught exception - Wait for form submission load (formsubmit.sjs) - timed out after 50 tries. → Intermittent toolkit/components/passwordmgr/test/browser/{browser_capture_doorhanger.js,browser_doorhanger_remembering.js} | Uncaught exception - Wait for form submission load (formsubmit.sjs) - timed out after 50 tries.
Priority: -- → P5
Severity: normal → S3