Closed Bug 814430 Opened 12 years ago Closed 12 years ago

Allowing offline mode to connect to loopback doesn't shutdown network on profile-change-net-teardown

Categories

(Core :: Networking, defect)

19 Branch
All
Linux
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla20
Tracking Status
firefox17 --- unaffected
firefox18 + verified
firefox19 + verified
firefox20 + verified
b2g18 --- fixed

People

(Reporter: AlexLakatos, Assigned: unusualtears)

References

Details

(Keywords: hang, regression)

Attachments

(2 files, 2 obsolete files)

Attached file minimizedTestcase (obsolete) —
The landing of bug 87717 https://hg.mozilla.org/mozilla-central/rev/c122628565e5 introduced a regression with our Mozmill tests, on Ubuntu 12.04.

Whenever our tests use controller.startUserShutdown(), Firefox hangs on the next test. Attached is a minimized testcase to reproduce.

Steps to Reproduce:
1.Install Mozmill as described in https://developer.mozilla.org/en-US/docs/Mozmill#Ubuntu
2.Unzip the minimizedTestcase archive into a directory and run "mozmill-restart -t path/to/minimizedTestcase/ -b path/to/latest/nightly"

Expected Results:
3.The mozmill test runs and then closes

Actual Results:
4.The mozmill test starts, restarts Firefox and then hangs, as in nothing is happening anymore, and Firefox remains opened.
Blocks: 808671
No longer blocks: 808671
This is a critical issue for us with our Mozmill tests. It keeps our testruns hanging on Firefox shutdown/restart. I can't tell details because I don't know the networking code that well. So I hope someone from the networking team can jump on it hopefully soon.
Blocks: 87717
Severity: normal → critical
Keywords: regression
Hardware: x86 → All
Summary: Allowing offline mode to connect to loopback introduced a regression in mozmill on linux → Allowing offline mode to connect to loopback introduced a regression in mozmill on linux and keeps testruns hanging on shutdown/restart of Firefox
Whiteboard: [mozmill-test-failure]
Version: unspecified → 19 Branch
Attached file minimized testcase
Attachment #684422 - Attachment is obsolete: true
(In reply to Alex Lakatos[:AlexLakatos] from comment #3)
> Created attachment 684450 [details]
> minimized testcase
Here are new Steps to Reproduce to match the new minimized testcase:
1.Install Mozmill as described in https://developer.mozilla.org/en-US/docs/Mozmill#Ubuntu
2.Download the minimizedTestcase and run "mozmill -t path/to/minimizedTestCase.js -b path/to/latest/nightly"
This minimized testcase works fine for me on my own Linux64 VM. No freeze can be observed. Alex, please work on bug 797389 to get a better testcase or an idea why not all Ubuntu64 systems are affected.
No longer blocks: 812099
I quickly checked Mozmill and what I found so far is that we successfully restart Firefox but we can't connect anymore from the Python side to the browser via the bridge from localhost.
Summary: Allowing offline mode to connect to loopback introduced a regression in mozmill on linux and keeps testruns hanging on shutdown/restart of Firefox → Allowing offline mode to connect to loopback doesn't let Mozmill reconnect to the application after a restart
Whiteboard: [mozmill-test-failure] → [qa-automation-blocked]
Patrick - can you help the a-team out with this? Would also be good to understand why FF18 isn't called out as affected, given the target milestone of bug 87717.
Assignee: nobody → mcmanus
I'm going to pass this on to the author of the regressing patch.
Assignee: mcmanus → unusualtears
fwiw I doubt this is very interesting for FF proper.
I missed to update this bug after my investigation early yesterday. So there should be two sides here.

First, Mozmill needs a fix to gracefully handle a delayed disconnect of the back channel. Affected is the following code for Mozmill 1.5:

https://github.com/mozilla/mozmill/blob/hotfix-1.5/mozmill/mozmill/__init__.py#L616

In the past `frame.runTestFile(test)` never returned but an exception was thrown which meant for us that Firefox shutdown or restarted itself. Now with the patch on the other bug landed this method returns normally. As result we handle an user shutdown/restart wrongly and are trying to stop the runner. Which is wrong.

On the other side it means networking code has been changed in the Firefox product itself, which causes a delayed shutdown. Not sure what else could be affected here given that it is really hard to reproduce on other machines as the affected ubuntu 64bit one.
whimboo is correct in his analysis.  I was able to reproduce this (on Debian x86_64).

Prior to bug 87717 the network would go away entirely when offline, but in the patch it stopped going away except on shutdown.  

It should still go away on profile-change-net-teardown.  This remedies that, and Mozmill will not hang.

Pushed to try:
https://tbpl.mozilla.org/?tree=Try&rev=10e26e037305
Attachment #686342 - Flags: review?(mcmanus)
Wow, thanks Adam! That sounds reasonable. Is there a way to get a test for? If not we will be indeed able to have a test by Mozmill.

Alex, I know you are able to reproduce this too on your Mozmill machine. Can you please check with the following try server build if it is fixed?

https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/unusualtears@gmail.com-10e26e037305/
Flags: needinfo?(alex.lakatos)
Updating summary to better reflect what's going on here.
Summary: Allowing offline mode to connect to loopback doesn't let Mozmill reconnect to the application after a restart → Allowing offline mode to connect to loopback doesn't shutdown network on profile-change-net-teardown
(In reply to Henrik Skupin (:whimboo) from comment #12)
> Alex, I know you are able to reproduce this too on your Mozmill machine. Can
> you please check with the following try server build if it is fixed?
> 
> https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/
> unusualtears@gmail.com-10e26e037305/
Tested the x86 build with my local machine. It passes and does not hang.
Flags: needinfo?(alex.lakatos)
Attachment #686342 - Flags: review?(mcmanus) → review+
Patrick, can we get this landed so it will be part of tomorrows nightly?
(In reply to Henrik Skupin (:whimboo) from comment #15)
> Patrick, can we get this landed so it will be part of tomorrows nightly?

https://developer.mozilla.org/en-US/docs/Creating_a_patch_that_can_be_checked_in
(In reply to Patrick McManus [:mcmanus] from comment #16)
> https://developer.mozilla.org/en-US/docs/
> Creating_a_patch_that_can_be_checked_in

That's not that helpful. Is it only because of the missing 'r=mcmanus'? If that's the case I can add that and land it on inbound. Otherwise please tell me what's missing.
Also https://developer.mozilla.org/en-US/docs/Developer_Guide/How_to_Submit_a_Patch

This is all for the patch author to do; that's not me.
Thanks, Patrick!

Successful try: https://tbpl.mozilla.org/?tree=Try&rev=10e26e037305
Attachment #686342 - Attachment is obsolete: true
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/e2fd9e9e3788
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla20
Looks great on Linux64 and todays Nightly build. No abort happening anymore. Adam, can you please follow-up with all the necessary steps so we can land it on aurora too? Thanks.

Todays testrun:
http://10.250.73.243:8080/job/mozilla-central_functional/lastCompletedBuild/console
Status: RESOLVED → VERIFIED
Comment on attachment 686765 [details] [diff] [review]
Ready for checkin

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 87717
User impact if declined: Broken Mozmill (hanging on certain tests) on Linux
Testing completed (on m-c, etc.): m-i, m-c
Risk to taking this patch (and alternatives if risky): None
String or UUID changes made by this patch: None
Attachment #686765 - Flags: approval-mozilla-aurora?
Attachment #686765 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Keywords: checkin-needed
Whiteboard: [qa-automation-blocked] → [qa-automation-blocked], push to aurora
Target Milestone: mozilla20 → ---
Target Milestone: --- → mozilla20
https://hg.mozilla.org/releases/mozilla-aurora/rev/6952c37a602f
Keywords: checkin-needed
Whiteboard: [qa-automation-blocked], push to aurora → [qa-automation-blocked]
All fine here for Nightly and Aurora. No aborts anymore. Thanks a lot!
We are also hitting this with Firefox 18 Beta. I totally missed that bug 87717 also landed for that release. So we have to backport this patch to mozilla-beta now. 

Adam, can you please take care of?
Comment on attachment 686765 [details] [diff] [review]
Ready for checkin

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 87717
User impact if declined: Broken Mozmill (hanging on certain tests) on Linux
Testing completed (on m-c, etc.): m-i, m-c, m-a
Risk to taking this patch (and alternatives if risky): None
String or UUID changes made by this patch: None
Attachment #686765 - Flags: approval-mozilla-beta?
Attachment #686765 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Keywords: checkin-needed
Whiteboard: [qa-automation-blocked] → [qa-automation-blocked], push to beta
https://hg.mozilla.org/releases/mozilla-beta/rev/8af1e554573b
Keywords: checkin-needed
Whiteboard: [qa-automation-blocked], push to beta → [qa-automation-blocked]
No more aborts for Mozmill testruns with the latest Firefox 18.0 beta build. Marking as verified fixed!
Whiteboard: [qa-automation-blocked]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: