Marionette turns off NetworkManager's control of offline state too aggressively

VERIFIED FIXED

Status

defect
--
critical
VERIFIED FIXED
6 years ago
6 years ago

People

(Reporter: stephend, Assigned: hub)

Tracking

unspecified
Dependency tree / graph

Firefox Tracking Flags

(blocking-b2g:tef+, firefox21 wontfix, firefox22 wontfix, firefox23 fixed, b2g18+ fixed, b2g18-v1.0.0 wontfix, b2g18-v1.0.1 fixed)

Details

(Whiteboard: [qa-automation-blocked] [tef-triage])

Attachments

(4 attachments, 3 obsolete attachments)

STR:

1. Flash with the latest Unagi engineering build (unagi.zip from https://pvtbuilds.mozilla.org/pub/mozilla.org/b2g/nightly/mozilla-b2g18-unagi-eng/latest/)
2. run adb logcat
3. turn on wi-fi output in "more information" | "developer tools"
4. connect to Mozilla Guest
5. wait for the connection to be established
6. launch Mozilla Marketplace

Actual Results:

"Marketplace is having problems \n Marketplace is not loading properly"
Blake, is there anything obviously fail-ly going on in the log?
Summary: Connection to Mozilla Guest wi-fi looks successful, but actually isn → Connection to Mozilla Guest wi-fi looks successful, but actually isn't
Not seeing anything weird in the logcat, let me see if I can reproduce.
Works for me on 2/13 build.
Requesting blocking, as this affects both manual testing, and automation -- tons of folks are seeing it, at least in Mountain View, too (using the Guest Wi-Fi network).
blocking-b2g: --- → shira?
(In reply to Stephen Donner [:stephend] from comment #6)
> Requesting blocking, as this affects both manual testing, and automation --
> tons of folks are seeing it, at least in Mountain View, too (using the Guest
> Wi-Fi network).

Not a blocker until it impacts user testing (as opposed to internal testing).

Blake - since you're in MV, can you take a stab at this?
Assignee: nobody → mrbkap
blocking-b2g: shira? → -
tracking-b2g18: --- → +
(In reply to Alex Keybl [:akeybl] from comment #7)
> (In reply to Stephen Donner [:stephend] from comment #6)
> > Requesting blocking, as this affects both manual testing, and automation --
> > tons of folks are seeing it, at least in Mountain View, too (using the Guest
> > Wi-Fi network).
> 
> Not a blocker until it impacts user testing (as opposed to internal testing).

Have you seen the video? https://www.youtube.com/watch?v=RRodyY6Fg-g
(In reply to Stephen Donner [:stephend] from comment #8)
> Have you seen the video? https://www.youtube.com/watch?v=RRodyY6Fg-g

I have now :). We haven't seen these issues outside of Mozilla's strained WiFi. If we have evidence that this happens elsewhere (and with some sort of frequency), we can get this back on the blocker nom list.
(In reply to Alex Keybl [:akeybl] from comment #9)
> (In reply to Stephen Donner [:stephend] from comment #8)
> > Have you seen the video? https://www.youtube.com/watch?v=RRodyY6Fg-g
> 
> I have now :). We haven't seen these issues outside of Mozilla's strained
> WiFi. If we have evidence that this happens elsewhere (and with some sort of
> frequency), we can get this back on the blocker nom list.

I don't necessarily agree with that rationale. The rules of how wifi networks work apply here - though it happens in one place may imply a certain network configuration, network situation, etc that could happen in other places. Things like these really need analysis to diagnose the rate of probability of this happening before concluding a blocking call. And that's something we don't know right now.

Although this wasn't reproducing for me on 2/13 build, I have heard of problems going on with wifi on at the Mt View office with the phones. So there might be something we're not looking at here, which obviously is a tad bit concerning.
Blake - Can you assessment of risk and likelihood of this bug being hit out in the wild? We're trying to figure out if this is a one-off Mozilla wifi issue or a potentially more widespread issue that could happen elsewhere.
Flags: needinfo?(mrbkap)
(In reply to Jason Smith [:jsmith] from comment #11)
> Blake - Can you assessment of risk and likelihood of this bug being hit out
> in the wild? We're trying to figure out if this is a one-off Mozilla wifi
> issue or a potentially more widespread issue that could happen elsewhere.

Blake will be here (in the Mountain View office) tomorrow, so he and I will sit down together and hash this out a bit :-)
Depends on: 838909
Depends on: 838856
I'm going to morph Stephen's bug into the general bug that we're all seeing. I've linked up the two related bugs that I think all have the same root cause of this bug.

We've all confirmed something "weird" is going on with networking in the wifi stack in the Mt View office. Stephen has a reproducible test case as a gaia ui test unit test that shows this off, so we know we've got reproducibility.

The question is how likely could this happen in other places.

We need more information to make a blocking call here. I'll talk with Blake tomorrow to see what he thinks.
Summary: Connection to Mozilla Guest wi-fi looks successful, but actually isn't → Making Wifi & Data Connection in the Mozilla Mt View Office intermittently enters lame-network or complete loss of network connectivity
Whiteboard: [qa-automation-blocked]
(In reply to Jason Smith [:jsmith] from comment #13)
> I'm going to morph Stephen's bug into the general bug that we're all seeing.
> I've linked up the two related bugs that I think all have the same root
> cause of this bug.

Please don't hijack my bugs.  This is a very specific and reproducible case for which I've filed -- and while it *might* indeed be "the bug," for which your other bugs are experiencing issues, we don't know that, and I'd rather keep triage and verification issues pristine.
No longer depends on: 838856, 838909
Summary: Making Wifi & Data Connection in the Mozilla Mt View Office intermittently enters lame-network or complete loss of network connectivity → Connection to Mozilla Guest wi-fi looks successful, but actually isn't
(In reply to Stephen Donner [:stephend] from comment #14)
> (In reply to Jason Smith [:jsmith] from comment #13)
> > I'm going to morph Stephen's bug into the general bug that we're all seeing.
> > I've linked up the two related bugs that I think all have the same root
> > cause of this bug.
> 
> Please don't hijack my bugs.  This is a very specific and reproducible case
> for which I've filed -- and while it *might* indeed be "the bug," for which
> your other bugs are experiencing issues, we don't know that, and I'd rather
> keep triage and verification issues pristine.

Uhh...no. We've got the information to think they are the same root cause issue and this provides a more specific analysis of what the problem is. I spoke with Tony on this - the best way to resolve this if we think they could be the same issue is to link the bugs up as dependencies. Your bug specifies the most general case. I know mine is not an apps issue - it's definitely wifi problem with lame-networks.

In this case your bug is the best candidate to morph to the general case as it has the most general reproduction. I don't see a reason to file a new bug on this after talking with Tony - we'd rather track off of one bug and link up possible candidates. Please don't modify this again - the two of us already talked about this being the best way to track this problem in Mt View. Thanks.
Summary: Connection to Mozilla Guest wi-fi looks successful, but actually isn't → Making Wifi & Data Connection in the Mozilla Mt View Office intermittently enters lame-network or complete loss of network connectivity
Depends on: 838856, 838909
Btw - in the case that you think that they may be the "same" bug, that's when you use dependencies and not dupe directly. I've got a strong enough feeling that they are same problem that we are seeing.
(In reply to Jason Smith [:jsmith] from comment #16)
> Btw - in the case that you think that they may be the "same" bug, that's
> when you use dependencies and not dupe directly. I've got a strong enough
> feeling that they are same problem that we are seeing.

We're not going to have further discussion in this bug until we've had Blake look at my specific use-case.  If you'd like to file a new, all-purpose "Wi-Fi in MV in Gaia is flaky" bug, by all means, go right ahead; make it a [META] bug, add this and bug 838856 and bug 838909 as blockers to it.

I invite you to revisit https://bugzilla.mozilla.org/page.cgi?id=etiquette.html, particularly:

"2. Changing Fields

    No messing with other people's bugs. Unless you are the bug assignee, or have some say over the use of their time, never change the Priority or Target Milestone fields. If in doubt, do not change the fields of bugs you do not own - add a comment instead, suggesting the change."

(Restoring my original summary once more.)

I know about dependencies and duping; dependencies are for, well, dependencies, and in this case, "think that they are the same cause" is good, but not conclusive -- until we know what's going on, it makes more sense to either have a tracking bug (which this is not nor should be) or cross-reference (by comments with links to bug #s, which I encourage) and/or putting them in the "See also" textfield until we know more.
No longer depends on: 838856, 838909
Summary: Making Wifi & Data Connection in the Mozilla Mt View Office intermittently enters lame-network or complete loss of network connectivity → Connection to Mozilla Guest wi-fi looks successful, but actually isn't
Argh. We had a talk about this with tchung and concluded that this was root cause of the bug. Please talk with tchung about this. We've done the analysis to conclude this likely the root cause of the bug - please don't touch the bug again. Thanks. And the little comment on bugzilla etiquette - I do have some use of the time over this bug, so I do have the right to modify it if we know the general problem. Start making an argument for it that makes sense.

Your wiping dependencies that are completely related to the issue that we'll lose track of if they aren't linked up as possible dupe candidates.

b2g triage deemed this bug as the tracked bug for b2g18. They are the ones that have the control over how the bugs are managed. In this case, it makes this one to be the general bug.
Depends on: 838856, 838909
Summary: Connection to Mozilla Guest wi-fi looks successful, but actually isn't → Making Wifi & Data Connection in the Mozilla Mt View Office intermittently enters lame-network or complete loss of network connectivity
The rule about tracking and how bugs are modified is up to the triage drivers - in this case, we've already got one bug tracked for a release. Meta bugs are not allowed to be tracked at all - that's not an option on the table. If something already tracked, then this becomes the general bug that the others link up to.
I'm typing this from stephend's desk. I see the problem, but it isn't because wifi isn't connected. This seems to be mozbrowser thinking it's offline when it is in fact online. I'm going to see if I can debug.
Flags: needinfo?(mrbkap)
I debugged this for a while on Friday without making too much progress. What I did find while I was sitting at stephend's desk was that the phone was definitely online and the browser knew it immediately, but the marketplace, which is a hosted app thought that the phone was offline when it was started.

Annoyingly, while I saw this reproduce consistently on stephend's phone after every reboot, I never reproduced it on my Otoro and Fabrice reproduced it once on another device (and only after several attempts). With that in mind. Stephen, if you wouldn't mind trying a couple of things:
  * When the Marketplace doesn't open, can you try opening another hosted app (you'll have to install one from the marketplace) to see if this is a general problem with all hosted apps?
  * After a reboot, does connecting to wifi, and then cycling the connection (e.g. entering airplane mode and leaving it) cause marketplace to work?

I'm not really sure who should own this bug, I'm probably not the best owner.
Just a FYI to comment 21 - check the dependencies as well to see if there's any connection to this bug. I think there is based on the discussion on those bugs. That might address some of the comments you have above.

What I see as a connection between the set of networking bugs in Mt View is that there's a temporary intermittent lame network state we appear to enter occasionally.
FWIW, I've seen this behavior on my own unagi on my home wifi network.  I don't have an otoro to compare it with.
Gregor reproduced this. This is a Marionette bug as far as I can tell. We load marionette-actors.js unconditionally on phone startup and run the code found at <http://hg.mozilla.org/mozilla-central/file/cc37417e2c28/testing/marionette/marionette-actors.js#l44>. This code causes us to never set Gecko's online/offline state, which causes wacky things to happen (in particular, we think that we're online when we're not or vice versa).

Reading through bug 820617, it appears that we only need to set this pref when we're on pandaboards. Can we only run the code I linked to in that case, somehow?
Flags: needinfo?(jgriffin)
I'm pretty sure not having that breaks testing on the emulator as well.  To tell for sure, I'll have to do a couple of builds and do some testing on them.

If it's true that those lines are only needed for panda testing, then yes, we could handle that case via a special pref, although that won't be very easy to do due to the way that pandas are hooked up into test infrastructure.
Flags: needinfo?(jgriffin)
Component: Wifi → Marionette
No longer depends on: 838856, 838909
Product: Boot2Gecko → Testing
Summary: Making Wifi & Data Connection in the Mozilla Mt View Office intermittently enters lame-network or complete loss of network connectivity → Connection to Mozilla Guest wi-fi looks successful, but actually isn't
Version: unspecified → Trunk
I'm going to give this to jgriffin. Please feel free to bounce it to a more appropriate person. Based on comment 25, it seems like the Marionette startup code should detect if we're on an actual telephone, as opposed to a testing device such as the pandaboard or emulator and use that to trigger the "don't manage offline status" behavior.
Assignee: mrbkap → jgriffin
Summary: Connection to Mozilla Guest wi-fi looks successful, but actually isn't → Marionette turns off NetworkManager's control of offline state too aggressively
(In reply to Blake Kaplan (:mrbkap) from comment #26)
> I'm going to give this to jgriffin. Please feel free to bounce it to a more
> appropriate person. Based on comment 25, it seems like the Marionette
> startup code should detect if we're on an actual telephone, as opposed to a
> testing device such as the pandaboard or emulator and use that to trigger
> the "don't manage offline status" behavior.

I recall the problem we had without this setting, on unagis.  The tests toggle wifi on and off during the course of the tests, and without disabling the "manage offline status" behavior, this causes the Marionette connection to be dropped, which kills the tests, even though the Marionette connection is being delivered over adb port forwarding and not wifi.   Who owns the offline status management now that philikon is gone?

I'm moving this back to Boot2Gecko since this seems like a legitimate issue, and isn't something we can fix in Marionette.
Component: Marionette → Wifi
Product: Testing → Boot2Gecko
Version: Trunk → unspecified
Assignee: jgriffin → nobody
Not sure if this helps or not : I've seen the end result before manually testing.  It's hard to hit.
(In reply to Jonathan Griffin (:jgriffin) from comment #27)
> Who owns the
> offline status management now that philikon is gone?

For the general networking (more a consumer here) it would be me.
For the platform dependent detector of the state, I don't know, probably for windows it would be as well.  But I can take this for b2g as well.
A couple of things. The STR that Gregor was able to find yesterday were:
  * Turn off wifi (the phone should be offline)
  * Attempt to open the marketplace
    -> Close the error dialog (using the "close" option)
  * Turn on wifi and wait for a connection
  * Attempt to open the marketplace

I'd expect the marketplace to open on the first try. Instead you get the "offline" error page again.

Secondly, is there any way we could move the Marionette connection out of the parent process and into another process whose sole job is to send stuff to the other end of the Marionette stuff? That way, we could turn off Gecko's management of online/offline state in the new process and leave the main Gecko process's handling of the state alone? Looking at the code, it appears that we might have to change the platform to do that anyway, but it's an idea.
(In reply to Blake Kaplan (:mrbkap) from comment #30)
 
> Secondly, is there any way we could move the Marionette connection out of
> the parent process and into another process whose sole job is to send stuff
> to the other end of the Marionette stuff? That way, we could turn off
> Gecko's management of online/offline state in the new process and leave the
> main Gecko process's handling of the state alone? Looking at the code, it
> appears that we might have to change the platform to do that anyway, but
> it's an idea.

I'm not sure how this would work.  Marionette needs to live in the parent process, but if you can figure out a way to somehow proxy the information we get through the socket via some other process, that seems fine.  But this introduces another layer in Marionette, which makes me a bit nervous about test stability and the like.

It also seems buggy that disabling wifi would interrupt the Marionette socket, which isn't connected via wifi.  Fixing that problem might be preferable.
To reproduce the problem that we see when we don't turn off network.gonk.manage-offline-status:

1 - build and flash an engineering build with http://mxr.mozilla.org/mozilla-central/source/testing/marionette/marionette-actors.js#44 commented out
2 - adb forward tcp:2828 tcp:2828
3 - pip install marionette_client
4 - start up phone but don't enable wifi
5 - python
>>> import marionette
>>> m = marionette.Marionette()
>>> m.start_session()
u'6-b2g'
>>> m.page_source
(lots of HTML)
6 - enable wifi on the phone
7 - 
>>> m.page_source
(lots of HTML)
8 - disable wifi on the phone
>>> m.page_source
(socket exception)

Marionette is not connected to the phone via wifi, so why does disabling wifi sever the connection?
comment 30's STR looks familiar to a bug I recall Alive worked a while back with the network connection error pages that reproduced with marketplace. Alive - Any ideas?
Flags: needinfo?(alive)
Duplicate of this bug: 839690
The dupe here reveals this is most definitely not a Marionette issue. I am starting to think this is a Gaia::System bug in Alive's court.

Comment 36

6 years ago
Vicamo, could you take this on or find someone like Shian-yow or Vincent to take this on please?
Assignee: nobody → vyang
I have no idea and I couldn't reproduce this. Don't think this is a gaia issue because gecko error page is there.
Flags: needinfo?(alive)
Andrew?  Is this something that the Gecko team can take?  (based on comment 37)
Flags: needinfo?(overholt)
Hub, I recall you working on other NM-related bugs.  Do you have time to investigate here?
Flags: needinfo?(overholt) → needinfo?(hub)
(In reply to Faramarz (:faramarz) from comment #36)
> Vicamo, could you take this on or find someone like Shian-yow or
> Vincent to take this on please?

Yes, we're still investigating on it. No much idea yet. :(
For what it's worth, I can think of two (relatively) low-risk fixes here that cover both the pandaboard case and the Unagi case:
  * Only enable Marionette in special builds of Gecko and not register Marionette at all if that flag isn't set a build-time. This would mean that Marionette wouldn't work on normal user builds.
  * Add a flag to nsServerSocket called "system" or "protected" that protects that socket from being forcefully closed when Gecko is set "offline" as we do now with sockets connected to localhost. We would have to use the flag in the debug console code. That way, we could leave NetworkManager alone whether or not Marionette is enabled – this also has the advantage that Marionette wouldn't modify the behavior of the code that's being tested. I don't know if it would be considered a privacy or security problem to leave an open socket if we're set offline (what is the purpose of closing the sockets?).
(In reply to Blake Kaplan (:mrbkap) from comment #41)
> For what it's worth, I can think of two (relatively) low-risk fixes here
> that cover both the pandaboard case and the Unagi case:
>   * Only enable Marionette in special builds of Gecko and not register
> Marionette at all if that flag isn't set a build-time. This would mean that
> Marionette wouldn't work on normal user builds.

We already do this; Marionette is not enabled in user builds.

>   * Add a flag to nsServerSocket called "system" or "protected" that
> protects that socket from being forcefully closed when Gecko is set
> "offline" as we do now with sockets connected to localhost. We would have to
> use the flag in the debug console code. That way, we could leave
> NetworkManager alone whether or not Marionette is enabled – this also has
> the advantage that Marionette wouldn't modify the behavior of the code
> that's being tested. I don't know if it would be considered a privacy or
> security problem to leave an open socket if we're set offline (what is the
> purpose of closing the sockets?).

Even making this behavior dependent on a pref that we could toggle would work as well.

Comment 43

6 years ago
> I don't know if it would be considered a privacy or security problem to leave an 
> open [loopback TCP to localhost] socket if we're set offline

I can't think of any security/privacy issue with that.  :bsmith has a more devious mind, however... ;)
Flags: needinfo?(bsmith)
I thought we disabled TCP connections to localhost entirely.  There's a bunch of mischief those can cause.
(In reply to Chris Jones [:cjones] [:warhammer] from comment #44)
> I thought we disabled TCP connections to localhost entirely.  There's a
> bunch of mischief those can cause.

Do you mean from content? The code in question is chrome only.
Assignee

Comment 46

6 years ago
:vyang said he was looking at it. Do you still want me to have a look?
Flags: needinfo?(hub)
Vicamo is working on some other tef+ blockers so if you have time, that would be great, Hub!
Assignee

Comment 48

6 years ago
ok, taking it.
Assignee: vyang → hub
Posted patch Possible start of a patch (obsolete) — Splinter Review
I started sketching out my second solution in comment 41. I haven't tested it at all, but based on my understanding of how Marionette ties into the debugger server and the reason the debugger server gets terminated when we go offline, this API should allow the debugger to maintain connections out of band (via adb or ethernet as the case may be). With this, we should be able to leave Gecko's handling of on/offline state alone, which means that Marionette won't be changing the behavior of the code it's trying to test.

I also spoke with bsmith in person. He is fine with us not killing the listening debugger server socket.
Flags: needinfo?(bsmith)
Duplicate of this bug: 845384
Blocks: 824930
Assignee

Comment 51

6 years ago
I am currently being blocked by bug 847724
Depends on: 847724
Assignee

Updated

6 years ago
No longer depends on: 847724

Comment 52

6 years ago
Here is a test case based on the gaia-ui-tests[1] framework to repeat loading the marketplace x times.

There is an option to clear the cache which is intended to clear saved data of the Marketplace app
http://pastebin.mozilla.org/2198775

[1] https://github.com/mozilla/gaia-ui-tests
Assignee

Comment 53

6 years ago
Comment on attachment 718639 [details] [diff] [review]
Possible start of a patch

Review of attachment 718639 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/base/src/nsSocketTransportService2.cpp
@@ +707,3 @@
>              mIdleList[i].mHandler->IsLocal(&isGuarded);
> +            if (!isGuarded)
> +                mActiveList[i].mHandler->KeepWhenOffline(&isGuarded);

I think it should be mIdleList. Will of course deal with that.
Assignee

Comment 54

6 years ago
This patch works. Fix Blake previous patch, call the new API from Marionette and stop controlling the offline status.

If you feel there is a better reviewer, please let me know who.

r for Malini for the Marionette changes, Blake for the socket changes (his code)
Attachment #718639 - Attachment is obsolete: true
Attachment #726668 - Flags: review?(mrbkap)
Attachment #726668 - Flags: review?(mdas)
Comment on attachment 726668 [details] [diff] [review]
Bug 840612 - Don't close Marionette server socket when going offline.

Review of attachment 726668 [details] [diff] [review]:
-----------------------------------------------------------------

Does this patch work for the STR here: https://bugzilla.mozilla.org/show_bug.cgi?id=840612#c32? If it does, I'm surprised...

The reason I ask is that the change to marionettecomponent.js seems incomplete since the ServerSocket we open on line 94 of marionettecomponent.js is killed shortly after it is created. The Marionette server is actually started here: http://mxr.mozilla.org/mozilla-central/source/testing/marionette/components/marionettecomponent.js#132 which calls http://mxr.mozilla.org/mozilla-central/source/toolkit/devtools/debugger/server/dbg-server.js#204, so I think we'd have to patch the debugger server with the new initSpecialConnection call. That would require us to loop in the remote debugger team (robcee or past) for that particular change. If need be, you can land the ServerSocket bits, and jgriffin or I can deal with the marionette/debugger changes.
Comment on attachment 726668 [details] [diff] [review]
Bug 840612 - Don't close Marionette server socket when going offline.

I can't review my own code (and I'm not a netwerk peer). jduell, would you mind taking a look at the network/base/* changes in this patch?
Attachment #726668 - Flags: review?(mrbkap) → review?(jduell.mcbugs)
Assignee

Comment 57

6 years ago
Updated the debugger side following mdas comments. Socket part unchanged.
Attachment #726668 - Attachment is obsolete: true
Attachment #726668 - Flags: review?(mdas)
Attachment #726668 - Flags: review?(jduell.mcbugs)
Attachment #726897 - Flags: review?(jduell.mcbugs)
Assignee

Updated

6 years ago
Attachment #726897 - Flags: review?(mdas)
Comment on attachment 726897 [details] [diff] [review]
Bug 840612 - Don't close Marionette and debugger server socket when going offline.

Review of attachment 726897 [details] [diff] [review]:
-----------------------------------------------------------------

The Marionette changes look good, but we'll need remote-debugger team's approval for dbg-server.js changes.

past, we need to change the way we initialize the ServerSocket, so that if gecko is in 'offline' state, the server will persist. Right now, if gecko goes into the offline state, the server goes down.
Attachment #726897 - Flags: review?(past)
Attachment #726897 - Flags: review?(mdas)
Attachment #726897 - Flags: review+

Comment 59

6 years ago
Comment on attachment 726897 [details] [diff] [review]
Bug 840612 - Don't close Marionette and debugger server socket when going offline.

Review of attachment 726897 [details] [diff] [review]:
-----------------------------------------------------------------

Necko bits look fine to me, but Honza knows the serversocket code a lot better than I do.  Honza, I think comment 27 and comment 41 together give an idea of the use case here.

::: netwerk/base/public/nsIServerSocket.idl
@@ +64,5 @@
> +     *        Pass -1 to use the default value.
> +     */
> +    void initSpecialConnection(in long aPort,
> +                               in boolean aLoopbackOnly,
> +                               in boolean aKeepWhenOffline,

Might as well make this a flag, so we can have less IDL churn the next time we need another "special" feature?
Attachment #726897 - Flags: review?(jduell.mcbugs)
Attachment #726897 - Flags: review?(honzab.moz)
Attachment #726897 - Flags: feedback+

Comment 60

6 years ago
Comment on attachment 726897 [details] [diff] [review]
Bug 840612 - Don't close Marionette and debugger server socket when going offline.

Review of attachment 726897 [details] [diff] [review]:
-----------------------------------------------------------------

Another question for Honza: do we want to lock in the keeponline behavior of the socket at creation time, or make it a property of the socket that can be modified later on?  Latter more flexible but probably involves some locking/synchronization code complexity.
Comment on attachment 726897 [details] [diff] [review]
Bug 840612 - Don't close Marionette and debugger server socket when going offline.

Review of attachment 726897 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine to me.
Attachment #726897 - Flags: review?(past) → review+
Assignee

Comment 62

6 years ago
> > +    void initSpecialConnection(in long aPort,
> > +                               in boolean aLoopbackOnly,
> > +                               in boolean aKeepWhenOffline,
> 
> Might as well make this a flag, so we can have less IDL churn the next time
> we need another "special" feature?

I have a new version with this change. Will update the patch.
Assignee

Comment 63

6 years ago
Integrated feedback on the API from jduell. The Javascript bits have been adapted to that.
Attachment #726897 - Attachment is obsolete: true
Attachment #726897 - Flags: review?(honzab.moz)
Attachment #727693 - Flags: review?(past)
Attachment #727693 - Flags: review?(mdas)
Assignee

Updated

6 years ago
Attachment #727693 - Flags: review?(honzab.moz)
Comment on attachment 727693 [details] [diff] [review]
Bug 840612 - Don't close Marionette and debugger server socket when going offline.

Review of attachment 727693 [details] [diff] [review]:
-----------------------------------------------------------------

whoo flags, lgtm.
Attachment #727693 - Flags: review?(mdas) → review+
Comment on attachment 727693 [details] [diff] [review]
Bug 840612 - Don't close Marionette and debugger server socket when going offline.

Review of attachment 727693 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM
Attachment #727693 - Flags: review?(past) → review+
Comment on attachment 727693 [details] [diff] [review]
Bug 840612 - Don't close Marionette and debugger server socket when going offline.

Review of attachment 727693 [details] [diff] [review]:
-----------------------------------------------------------------

r=mcmanus on netwerk/* bits.
Attachment #727693 - Flags: review?(honzab.moz) → review+
Assignee

Comment 67

6 years ago
The question is which branch is this supposed to go. I'll check it in on inbound, but do I have to do this on gecko18-b2g too?
Yes, we'll want it there too if possible.  Can you request approval-mozilla-b2g18 status in your patch details?
Comment on attachment 727693 [details] [diff] [review]
Bug 840612 - Don't close Marionette and debugger server socket when going offline.

Review of attachment 727693 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/base/src/nsSocketTransportService2.cpp
@@ +700,4 @@
>              mActiveList[i].mHandler->IsLocal(&isGuarded);
> +            if (!isGuarded)
> +                mActiveList[i].mHandler->KeepWhenOffline(&isGuarded);
> +        }

When I see this (and it's duplicates) wouldn't be better to have just a single method for this?  KeepWhenOffline could also return true for local sockets.
Assignee

Comment 70

6 years ago
(In reply to Honza Bambas (:mayhemer) from comment #69)

> When I see this (and it's duplicates) wouldn't be better to have just a
> single method for this?  KeepWhenOffline could also return true for local
> sockets.

I have no problem doing a followup patch after.
(In reply to Hubert Figuiere [:hub] from comment #70)
> I have no problem doing a followup patch after.

Would be great thanks.  I understand you've needed this quickly done.
Assignee

Comment 73

6 years ago
I have a cosmetic followup to do.
Whiteboard: [qa-automation-blocked] → [qa-automation-blocked][leave-open]
(In reply to Hubert Figuiere [:hub] from comment #74)
> Backed out at philor request
> https://hg.mozilla.org/integration/mozilla-inbound/rev/123d7b2c7307

Do we have an idea on how/when to re-land?
(In reply to Jonathan Griffin (:jgriffin) from comment #76)
> It looks like this broke Marionette tests on B2G: 
> https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=96f2990b1124&showall=1

Ah, thx; would you be able to help with that, or whom else?
I can't fix this.  The problem that, with this patch, the emulator is stuck in offline mode, and can't e.g., load the Marketplace app or an external website.
Assignee

Comment 79

6 years ago
Sorry, I had issues building for the emulator here.

What angle should we take to solve that? Special case the emulator? Make sure the emulator and the NetworkManager interact properly?
(In reply to Hubert Figuiere [:hub] from comment #79)
> Sorry, I had issues building for the emulator here.
> 
> What angle should we take to solve that? Special case the emulator? Make
> sure the emulator and the NetworkManager interact properly?

Ideally the latter, I think, since the emulator is a critical part of our testing story.  If you need help building the emulator, feel free to ping me on #ateam.
Can we leave the NetworkManager hack on only for emulator and pandaboard builds?
Assignee

Comment 82

6 years ago
we could do that. Is there a reliable way to detect these at runtime?
You can use property_get (http://mxr.mozilla.org/mozilla-central/search?string=property_get) with the property "ro.product.device".  For emulator, it will be "generic",  for panda it will be "panda".
Assignee

Comment 85

6 years ago
Comment on attachment 734397 [details] [diff] [review]
Part 2: make sure that networkmanager is ignored on pandaboard and emulator. r=

This patch special case the emulator and pandaboard to deal with the offline status.

This unbreak Marionette from the previous patch
Attachment #734397 - Flags: review?(jgriffin)
Comment on attachment 734397 [details] [diff] [review]
Part 2: make sure that networkmanager is ignored on pandaboard and emulator. r=

Review of attachment 734397 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, let me run it through try.

::: testing/marionette/marionette-actors.js
@@ +53,5 @@
> +    logger.info("Platform detected is " + platform);
> +    bypassOffline = (platform == "generic" || platform == "panda");
> +  }
> +}
> +catch(e) {}

We should do a logger.warn() here, so that if this fails it shows up in the log.
Attachment #734397 - Flags: review?(jgriffin) → review+
Assignee

Comment 88

6 years ago
Comment on attachment 734397 [details] [diff] [review]
Part 2: make sure that networkmanager is ignored on pandaboard and emulator. r=

Review of attachment 734397 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/marionette/marionette-actors.js
@@ +53,5 @@
> +    logger.info("Platform detected is " + platform);
> +    bypassOffline = (platform == "generic" || platform == "panda");
> +  }
> +}
> +catch(e) {}

The problem is that it will fail also on non Gonk platforms because the component do not exit. That's the sole reason I left the exception silent.
Assignee

Comment 89

6 years ago
(In reply to Jonathan Griffin (:jgriffin) from comment #87)
> pushed to try: https://tbpl.mozilla.org/?tree=Try&rev=4f05eac5247c

BTW I already ran it on Try servers before I submitted the patch.
Cool, and this try run was green also.
Assignee

Comment 92

6 years ago
Keeping open, see comment 69. Just better style, shouldn't impact the feature.
Assignee

Comment 95

6 years ago
Comment on attachment 735508 [details] [diff] [review]
Part 3: refactor the Reset method. r=

Honza,

is that what you were talking about in comment 69?
Attachment #735508 - Flags: review?(honzab.moz)
Assignee

Comment 96

6 years ago
Comment on attachment 727693 [details] [diff] [review]
Bug 840612 - Don't close Marionette and debugger server socket when going offline.

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): marionette.

User impact if declined: Eng builds no longer have online-offline status. This cause a lot of test issues where coverage isn't completed.

Testing completed: tested device and emulator.

Risk to taking this patch (and alternatives if risky): side effect in ServerSocket clients. We believe is should be fairly localized.

String or UUID changes made by this patch: nsIServerSocket interface changes.
-[scriptable, uuid(a5b64be0-d563-46bb-ae95-132e46fcd42f)]
+[scriptable, uuid(0df6a0e2-a6b1-4d4c-b30d-f2cb093444e3)]

NOTE: this patch come with part 2 that have a much lower impact and is *required*.
Attachment #727693 - Flags: approval-mozilla-b2g18?
Assignee

Comment 97

6 years ago
re: approval, please let me check it in if approved as the patch does not apply cleanly on b2g18. Quite trivial, but it needs intervention.
Comment on attachment 727693 [details] [diff] [review]
Bug 840612 - Don't close Marionette and debugger server socket when going offline.

We're taking a small amount of risk here in order to support testing - it's the right tradeoff.
Attachment #727693 - Flags: approval-mozilla-b2g18? → approval-mozilla-b2g18+
Comment on attachment 735508 [details] [diff] [review]
Part 3: refactor the Reset method. r=

Review of attachment 735508 [details] [diff] [review]:
-----------------------------------------------------------------

Actually, no.  My suggestion was to merge IsLocal and KeepWhenOffline to a single method (having bool KeepWhenOffline() only).

However, let Patrick decide if it can be useful to have IsLocal info separated.

::: netwerk/base/src/nsSocketTransportService2.cpp
@@ +697,5 @@
> +            socketList[index].mHandler->KeepWhenOffline(&isGuarded);
> +    }
> +    if (!isGuarded)
> +        DetachSocket(socketList, &socketList[index]);
> +}

I'd slightly more prefer following structure:

if (aGuardLocals) {
  bool isGuarded = false;

  socketList[index].mHandler->IsLocal(&isGuarded);
  if (isGuarded)
    return;

  socketList[index].mHandler->KeepWhenOffline(&isGuarded);
  if (isGuarded)
    return;
}

DetachSocket(socketList, &socketList[index]);

@@ +704,1 @@
>  nsSocketTransportService::Reset(bool aGuardLocals)

Is "aGuardLocals" still the correct name for the argument?
Attachment #735508 - Flags: superreview?(mcmanus)
(In reply to Honza Bambas (:mayhemer) from comment #100)
> Comment on attachment 735508 [details] [diff] [review]

> 
> However, let Patrick decide if it can be useful to have IsLocal info
> separated.

I agree - that's useful. Not a big deal to me either way.
Attachment #735508 - Flags: superreview?(mcmanus)
Attachment #735508 - Flags: review?(honzab.moz) → review+
Assignee

Comment 102

6 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/7ff57f0a9265
Whiteboard: [qa-automation-blocked][leave-open] → [qa-automation-blocked]
https://hg.mozilla.org/mozilla-central/rev/7ff57f0a9265
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Verified FIXED using an Unagi engineering build, with the following:

<!-- Mercurial-Information: <project name="releases/mozilla-b2g18" path="gecko" remote="hgmozillaorg" revision="6a187d801193"/> -->
  <!-- Mercurial-Information: <project name="integration/gaia-v1-train" path="gaia" remote="hgmozillaorg" revision="2bb48b3f4cfe"/> -->
  <project name="gecko.git" path="gecko" remote="mozillaorg" revision="62328d51efcaa981e39735f07e24553b77b7d45d"/>
  <project name="gaia.git" path="gaia" remote="mozillaorg" revision="7ea7918013a012ddf04a1324592b898530488178"/>
Status: RESOLVED → VERIFIED
WebQA wants to begin running tests against 1.0.1, which would require that this patch is uplifted to mozilla-b2g_v101.  Requesting tef+ status.
blocking-b2g: - → tef?
Whiteboard: [qa-automation-blocked] → [qa-automation-blocked] [tef-triage]
in support of of testing as we move to new hardware
blocking-b2g: tef? → tef+
Assignee

Comment 107

6 years ago
taking care of the checkin
Keywords: checkin-needed
Assignee

Comment 108

6 years ago
Here is the try build for 1.0.1. Does not look good.

https://tbpl.mozilla.org/?tree=Try&rev=30dd2bda1021
You need to log in before you can comment on or make changes to this bug.