Closed Bug 1008091 Opened 10 years ago Closed 9 years ago

Send NS_NETWORK_LINK_DATA_CHANGED events on Firefox OS and Linux desktop

Categories

(Core :: Networking, defect, P3)

All
Gonk (Firefox OS)
defect

Tracking

()

RESOLVED FIXED
mozilla38
Tracking Status
b2g-v1.3T --- wontfix
b2g-v1.4 --- wontfix
b2g-v2.0 --- wontfix
b2g-v2.0M --- wontfix
b2g-v2.1 --- wontfix
b2g-v2.1S --- wontfix
b2g-v2.2 --- wontfix
b2g-master --- fixed

People

(Reporter: bagder, Assigned: bagder)

References

Details

(Whiteboard: [lame-network])

Attachments

(4 files, 13 obsolete files)

18.05 KB, patch
bagder
: review+
Details | Diff | Splinter Review
2.29 KB, patch
Details | Diff | Splinter Review
18.50 KB, patch
bagder
: review+
Details | Diff | Splinter Review
32.63 KB, application/gzip
Details
When the network topology/setup has changed, a NS_NETWORK_LINK_TOPIC CHANGED event should be sent to observers.

This functionality and event-listeners are introduced for Windows in bug 939318
Depends on: 939318
blocking-b2g: --- → 1.3T?
@ very late stage for tarako, let's move this to 1.4? to see if dolphin needs this
blocking-b2g: 1.3T? → 1.4?
Not needed in v1.4
blocking-b2g: 1.4? → -
Assignee: nobody → daniel
First take. All comments and feedback are welcome!

This patch implements code for "Linux" that reads a NETLINK socket from the kernel to figure out when there are network changes done, and subsequently sends the network changed event (NS_NETWORK_LINK_DATA_CHANGED) if so.

This is a FxOS and Linux desktop platform layer to the generic network-changed event code already landed as part of bug 939318.

Right now this code builds and gets used for XP_LINUX platforms, but I'm not sure if that is accurate enough as I guess this also includes Android - which already has been taken care of separately with bug 1024614.
Attachment #8497476 - Flags: feedback?(sworkman)
Comment on attachment 8497476 [details] [diff] [review]
0001-bug-1008091-send-events-on-network-changes-on-Linux-.patch

Review of attachment 8497476 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. Seems like a nice simple patch. f+

For the future, depending on whether it's needed, you might be able to write a common rate limiter function for all of the platforms in nsIOService. Just mentioning while the idea is in my head.

::: netwerk/build/nsNetModule.cpp
@@ +795,5 @@
>  #endif
>  #if defined(XP_WIN)
>  NS_DEFINE_NAMED_CID(NS_NETWORK_LINK_SERVICE_CID);
> +#elif defined(XP_LINUX)
> +NS_DEFINE_NAMED_CID(NS_NETWORK_LINK_SERVICE_CID);

Looks like you should put this later, after MOZ_WIDGET_ANDROID. MOZ_WIDGET_GONK is for FxOS, so you could #elif defined(MOZ_WIDGET_GONK) or else XP_LINUX if you want to cover all over cases.

Note: I think that _ANDROID is for Firefox on Android and doesn't include the Android base for FxOS.

::: netwerk/system/linux/netlink.cpp
@@ +63,5 @@
> +  *aLinkType = nsINetworkLinkService::LINK_TYPE_UNKNOWN;
> +  return NS_OK;
> +}
> +
> +void nsNotifyAddrListener::netlink()

OnNetlinkMessage? I'll not nitpick too much about names here, but there are a couple other places where var and function cases need aligned with the style guide. That's for another version.

@@ +82,5 @@
> +
> +    bool NetworkChange = false;
> +
> +    for (; NLMSG_OK(nlh, received_bytes);
> +         nlh = NLMSG_NEXT(nlh, received_bytes)) {

Could you do a checksum thing here like you did with Windows?

::: netwerk/system/linux/netlink.h
@@ +3,5 @@
> +/* This Source Code Form is subject to the terms of the Mozilla Public
> + * License, v. 2.0. If a copy of the MPL was not distributed with this
> + * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
> +#ifndef NETLINK_H_
> +#define NETLINK_H_

Let's just call the file nsNotifyAddrListener_Linux.cpp. You might want to rename the Windows one in a similar way if you do this.

@@ +21,5 @@
> +#include "nsThreadUtils.h"
> +#include "nsCOMPtr.h"
> +#include "mozilla/TimeStamp.h"
> +
> +class nsNotifyAddrListener : public nsINetworkLinkService,

namespace mozilla {
namespace net {

@@ +33,5 @@
> +    NS_DECL_NSINETWORKLINKSERVICE
> +    NS_DECL_NSIRUNNABLE
> +    NS_DECL_NSIOBSERVER
> +
> +    nsNotifyAddrListener();

We can prob make this a singleton since we'll only want one instance. XPCOM has macros to help with that a bit. See NS_GENERIC_FACTORY_SINGLETON_CONSTRUCTOR and the declaration for nsIOService in nsNetModule.cpp.

@@ +38,5 @@
> +
> +    nsresult Init(void);
> +    void CheckLinkStatus(void);
> +
> +protected:

Why protected? Are you going to extend the class?

@@ +39,5 @@
> +    nsresult Init(void);
> +    void CheckLinkStatus(void);
> +
> +protected:
> +    class ChangeEvent : public nsRunnable {

This could probably go in the cpp.

::: netwerk/system/moz.build
@@ +15,5 @@
>  
>  if CONFIG['MOZ_WIDGET_TOOLKIT'] == 'android':
>      DIRS += ['android']
>  
> +if CONFIG['OS_ARCH'] == 'Linux':

Maybe 'else if' to exclude Android from the linux list.
Attachment #8497476 - Flags: feedback?(sworkman) → feedback+
Thanks for excellent review comments, Steve. Here's another iteration of the patch with most feedback addressed, but some aren't and I'd like to mention why I didn't change a few things from your comments, see below.

I found that the b2g emulator seems to fail to create the pipe I use for telling the notifyaddr thread to shutdown which I have no explanation for and right now the patch will just make the thread not run in that case and thus it won't detect network changes properly for that setup.

Comments on feedback:

checksum

I could do it, but decided not to for these reasons:

1 - The code reading the NETLINK feed really only have to act on changes we care 
about so we should already there filter out non-intersting changes.

2 - to get a good checksum, we need to read all addresses off all interfaces
and 'getifaddrs' is not provided by bionic. We can provide our own
implementation of it but that would make the patch less simple.

3 - The checksum remains more suitable for plaforms where we cannot like in
(1) filter on which actions we get notified and thus on Windows for example we
get signalled several times when nothing we care about has changed. This is
not the case for Linux.

namespace

I went with not using the namespace there just to make it work exactly like
the Windows class as the code using this turn ugly if it becomes platform
dependent. I figure we should change both (all!) of them to use the namespace
properly but possibly as a separate patch in a single shot?

singleton

Similar argument here. It makes more sense and is simpler to fix all the
different platform backends to be singletons at once I think.

ChangeEvent

I left it in the class in the header file as that's exactly how the Windows
header does it (was implemented like that before I ever touched it) so I
decided it made sense to keep it as similar as possible to the Windows version
of the code in that regard.

Follow-ups

If we agree to the general approach to this patch for the FxOS + Linus fixes, I suggest I work on fixing (at least) namespace and singleton as separate patches and then have them done multi-platform accordingly and clean up broadly. I should probably do that as a separate bug just to keep the title and focus appropriate. I would prefer that way, to make sure we don't mix in too much clean-up work into this more basic detect-network-changes work.

Try-run: this version of the patch ran through try like this: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=95d7ce7447d8
Attachment #8497476 - Attachment is obsolete: true
Attachment #8500317 - Flags: review?(sworkman)
This patch also crashes in the "B2G ICS Emulator opt" run on a FD_SET() line. Could it be an FD_SETSIZE problem or is there something else? I'll see if I can avoid this problem if I use PR_Select() instead...
(In reply to Daniel Stenberg [:bagder] from comment #6)
> This patch also crashes in the "B2G ICS Emulator opt" run on a FD_SET()
> line. Could it be an FD_SETSIZE problem or is there something else? I'll see
> if I can avoid this problem if I use PR_Select() instead...

Scratch that. I'm trying a different take now using poll() instead and it seems to work even on the emulator. Posting an updated patch once I've did more testing.
its actually impossible to use pr_select() safely. don't do it :)
v3. This patch version mostly makes one change from v2: it moves to using poll() instead of select() to avoid FD_SET() problems I seemed to be getting on the b2g ics emulator.

This patch has been given a try and the results are here: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=579eb8df4327
Attachment #8500317 - Attachment is obsolete: true
Attachment #8500317 - Flags: review?(sworkman)
Attachment #8500870 - Flags: review?(sworkman)
Comment on attachment 8500870 [details] [diff] [review]
v3-0001-bug-1008091-send-network-change-events-on-FxOS-Li.patch

Review of attachment 8500870 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me. Some style nits, a question about Init and one about GetIsLinkUp and its friends.

Otherwise, r=me.

::: netwerk/build/moz.build
@@ +42,5 @@
>      LOCAL_INCLUDES += [
>          '../system/win32',
>      ]
>  
> +if CONFIG['OS_ARCH'] == 'Linux':

Exclude MOZ_WIDGET_ANDROID, right?

::: netwerk/system/linux/nsNotifyAddrListener_Linux.cpp
@@ +44,5 @@
> +
> +NS_IMETHODIMP
> +nsNotifyAddrListener::GetIsLinkUp(bool *aIsUp)
> +{
> +    // XXX This function has not yet been implemented for this platform

Is there a plan for these functions? It doesn't look like they're being called on FxOS from a quick scan on DXR. I know they have offline status enabled, but it's network.gonk.manage-offline-status in b2g.js. I see io.offline being set directly here http://dxr.mozilla.org/mozilla-central/source/dom/system/gonk/NetworkManager.js#657

Probably fine to come back to it; just want to know if you've had thoughts.

@@ +67,5 @@
> +  *aLinkType = nsINetworkLinkService::LINK_TYPE_UNKNOWN;
> +  return NS_OK;
> +}
> +
> +void nsNotifyAddrListener::OnNetlinkMessage(int NetlinkSocket)

s/NetlinkSocket/aNetlinkSocket/

@@ +73,5 @@
> +    struct  nlmsghdr *nlh;
> +    struct  rtmsg *route_entry;
> +    char buffer[4095];
> +
> +    NS_ASSERTION(NetlinkSocket >= 0, "bad mNetlinkSocket");

"bad aNetlinkSocket"

@@ +85,5 @@
> +    size_t netlink_bytes = rc;
> +
> +    nlh = reinterpret_cast<struct nlmsghdr *>(buffer);
> +
> +    bool NetworkChange = false;

'n'

@@ +124,5 @@
> +
> +NS_IMETHODIMP
> +nsNotifyAddrListener::Run()
> +{
> +    nsresult result = NS_OK;

s/result/rv/
Declare just before first usage.

@@ +128,5 @@
> +    nsresult result = NS_OK;
> +
> +    PR_SetCurrentThreadName("Link Monitor");
> +
> +    int NetlinkSocket = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

'n'

@@ +150,5 @@
> +    // switch the socket into non-blocking
> +    int flags = fcntl(NetlinkSocket, F_GETFL, 0);
> +    (void)fcntl(NetlinkSocket, F_SETFL, flags | O_NONBLOCK);
> +
> +    bool Shutdown = false;

nit: lower case 's'. Move to just before the while loop.

@@ +163,5 @@
> +    fds[1].revents = 0;
> +
> +    while (!Shutdown) {
> +
> +        int rc = poll(fds, 2, -1);

empty line.

@@ +196,5 @@
> +}
> +
> +nsresult
> +nsNotifyAddrListener::Init(void)
> +{

if (mInited) { return NS_OK; }

Not needed?

::: netwerk/system/linux/nsNotifyAddrListener_Linux.h
@@ +51,5 @@
> +        const char *mEventID;
> +    };
> +
> +    bool mLinkUp;
> +    bool mStatusKnown;

nit: Please put vars all together at the end of the class.

Some comments for the functions and vars in this class would be good.

@@ +60,5 @@
> +    nsCOMPtr<nsIThread> mThread;
> +
> +    void OnNetlinkMessage(int NetlinkSocket);
> +
> +    int mShutdownPipe[2]; // a pipe to signal shutdown over

nit: Put the comment on the line before; Capitalize and add a period. :)
Attachment #8500870 - Flags: review?(sworkman) → review+
Summary: Send NS_NETWORK_LINK_DATA_CHANGED events on Firefox OS → Send NS_NETWORK_LINK_DATA_CHANGED events on Firefox OS and Linux desktop
v4. Addressed the style nits. https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=e993c75f651c

Two answers to Steve's comments above:

> Is there a plan for these functions?

Yes, they're used by nsIOService which is used on desktop Linux and they're part of the NS_DECL_NSINETWORKLINKSERVICE.

> if (mInited) { return NS_OK; }
> Not needed?

I don't think it is, it isn't needed for the Windows version of the same functionality and I've not seen the need here either.
Attachment #8500870 - Attachment is obsolete: true
Attachment #8501692 - Flags: review+
I'd like bug 1077084 to get a fix first before this lands in order to not rock the boat more.
Depends on: 1077084
v5. Updated commit message.
Attachment #8501692 - Attachment is obsolete: true
Attachment #8504675 - Flags: review+
Keywords: checkin-needed
Backed out for B2G cpptest perma-fail. Was visible on your Try run too.
https://hg.mozilla.org/integration/mozilla-inbound/rev/980d33b2ca6c
Frequent B2G emu timeouts all around.
I see lots of them too, but all of them seemed to have a long history already of failures and I couldn't see any clear connection with this particular patch. I'll look closer and deeper.
Here's the same patch just pushed again to try, now with just two time-out errors:

https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=c712eeb6bf56
No longer depends on: 237623
And even if I pref off the actual events, these two test cases still fail: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=8f02db7c4c7c
Blocks: 1097440
Per https://bugzilla.mozilla.org/show_bug.cgi?id=1097440#c11
Partner need this patch for 2.0M.
blocking-b2g: - → 2.0M+
Status update on this patch:

As it is right now, it causes random hangs in the ICS b2g emulator which makes tests fail. Especially xpcshell tests, and it can hang on the second test or it can manage to get almost everyone to run through before it hits a problem. The Linux desktop uses the exact same code path but does not show the problem.

The use of pipe seems to be what is not working reliably, but I've not been able to reproduce the problems locally in either emulator runs nor native Linux tests. The pipe is used to signal the child thread to die, and I use a pipe since the thread is already sitting in a poll() waiting for activity on the netlink socket that gets network change information.
Blocks: Woodduck
Comment on attachment 8504675 [details] [diff] [review]
v5-0001-bug-1008091-send-network-change-events-on-FxOS-an.patch

Review of attachment 8504675 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/system/linux/nsNotifyAddrListener_Linux.cpp
@@ +35,5 @@
> +}
> +
> +nsNotifyAddrListener::~nsNotifyAddrListener()
> +{
> +    NS_ASSERTION(!mThread, "nsNotifyAddrListener thread shutdown failed");

NS_ASSERTION's are pretty useless (they only warn on the console IIRC).  MOZ_ASSERT is much better--aborts hard on debug builds.

@@ +175,5 @@
> +            break;
> +        }
> +    }
> +
> +    close(netlinkSocket);

perhaps you should also close the shutdown pipe in all the cases where you currently only close netLinkSocket?  That's just a crazy guess.

@@ +185,5 @@
> +nsNotifyAddrListener::Observe(nsISupports *subject,
> +                              const char *topic,
> +                              const char16_t *data)
> +{
> +    if (!strcmp("xpcom-shutdown-threads", topic)) {

So the issue could also conceivably be that the emulator isn't delivering xpcom-shutdown-threads, right?  I.e. without it we'll never write to or close the pipe.

@@ +207,5 @@
> +
> +    Preferences::AddBoolVarCache(&mAllowChangedEvent,
> +                                 NETWORK_NOTIFY_CHANGED_PREF, true);
> +
> +    rv = NS_NewThread(getter_AddRefs(mThread), this);

Ah, so you're using the same object one two threads: it sits on the main thread waiting for process exit notifications, and it's also the object that gets run as a thread.  That seems OK, although it makes me a little nervous.
dhylands:  a little birdie told me that you're a good person to ask about the B2G emulator.  Can you think of any reason why pipe() would be misbehaving as in comment 23, i.e. seemingly losing a write?  Or if there's any way we could periodically not observe "xpcom-shutdown-threads" during shutdown?
Flags: needinfo?(dhylands)
Comment on attachment 8504675 [details] [diff] [review]
v5-0001-bug-1008091-send-network-change-events-on-FxOS-an.patch

Review of attachment 8504675 [details] [diff] [review]:
-----------------------------------------------------------------

I think that the system calls all need to handle EINTR properly.

::: netwerk/system/linux/nsNotifyAddrListener_Linux.cpp
@@ +31,5 @@
> +    : mLinkUp(true)  // assume true by default
> +    , mStatusKnown(false)
> +    , mAllowChangedEvent(true)
> +{
> +}

The constructor should initialize mShutdownPipe[0] and [1] to -1 since there are code paths which would otherwise cause them to be uninitilized when the destructor is called (i.e. if ::Init fails)

@@ +38,5 @@
> +{
> +    NS_ASSERTION(!mThread, "nsNotifyAddrListener thread shutdown failed");
> +
> +    close(mShutdownPipe[0]);
> +    close(mShutdownPipe[1]);

close can fail with errno == EINTR - see note below

This should also verify that the pipe was actually opened. If the pipe call in ::Init fails, then this destructor will still get called and this will be closing random file handles

@@ +76,5 @@
> +
> +    NS_ASSERTION(aNetlinkSocket >= 0, "bad aNetlinkSocket");
> +
> +    // Receiving netlink socket data
> +    ssize_t rc = recv(aNetlinkSocket, buffer, sizeof(buffer), 0);

recv can fail with errno == EINTR - see note below

@@ +140,5 @@
> +        RTMGRP_IPV6_IFADDR | RTMGRP_IPV6_ROUTE;
> +
> +    if (bind(netlinkSocket, (struct sockaddr *)&addr, sizeof(addr)) < 0) {
> +        // failure!
> +        close(netlinkSocket);

close can fail with errno == EINTR (see note below)

@@ +161,5 @@
> +
> +    nsresult rv = NS_OK;
> +    bool shutdown = false;
> +    while (!shutdown) {
> +        int rc = poll(fds, 2, -1);

poll can fail with errno == EINTR, in which case you shouldn't exit the loop, but call poll again.

You should check each system call that you use, and any that can return EINTR should probably use the logic from MOZ_TEMP_FAILURE_RETRY macro:
http://dxr.mozilla.org/mozilla-central/source/xpcom/glue/FileUtils.h#173

MOZ_TEMP_FAILURE_RETRY seems to be only available for GONK, but you could also make it available for linux.

@@ +175,5 @@
> +            break;
> +        }
> +    }
> +
> +    close(netlinkSocket);

close can fail with errno == EINTR

@@ +227,5 @@
> +    if (observerService)
> +        observerService->RemoveObserver(this, "xpcom-shutdown-threads");
> +
> +    // awake the thread to make it terminate
> +    write(mShutdownPipe[1], "1", 1);

write can fail with errno == EINTR
Flags: needinfo?(dhylands)
Here's a slightly updated version of the patch.
Attachment #8504675 - Attachment is obsolete: true
Attachment #8528877 - Flags: review+
Here's the relative patch from the previous one when adding EINTR protection and retry logic to all write/recv/poll/close calls. It also changes an assert and removes one, based on jduell's comments.

Unfortunately, the emulator timeout problem seems to persist and here's a link to a try-run from just now with the two patches:

https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=ec2ac366af95
This is simply the two previous patches (provided for readability) merged into a single patch.
Attachment #8528882 - Flags: review+
Is the failure that the data appears to be sent and just not received?

There is a long standing problem with the emulator where that seems to happen and I spent quite a while looking at it and couldn't find anything to pin the problem on.

If this fails often enough on try then maybe there is a hope to dig into this further.

I think we need some prints from inside the kernel to see if we can figure out if there is a weird race happening. This code all runs in the parent process right? Our build infrastructure has improved quite a bit since I first investigated this and I can now submit customized kernels to try with the emulator.

Is there a particular test that you're running? I'd like to see if I can do the kernel patches to get the logging information locally and then do some try runs and see if I can get some more information.

One thing to keep in mind is that android has its own runtime (called bionic) (where on desktop glibc is typically used) and we've run into bugs in bionic in the past. This doesn't look like it should affect this particular problem since all of the pipe stuff appears to just translate to straight system calls.
(In reply to Dave Hylands [:dhylands] from comment #30)
> Is the failure that the data appears to be sent and just not received?

Yes, that's what it looks like. I'm right now running some extra tests to figure out what the final write() before the timeout returns.

> There is a long standing problem with the emulator where that seems to
> happen and I spent quite a while looking at it and couldn't find anything to
> pin the problem on.
> 
> If this fails often enough on try then maybe there is a hope to dig into
> this further.

This problem triggers basically in every try-run I do, but it isn't easy to tell at which point in time it'll fail so it can fail for basically any test. It just seems to happen often enough that all tests very rarely go through without triggering the problem.

> I think we need some prints from inside the kernel to see if we can figure
> out if there is a weird race happening. This code all runs in the parent
> process right?

The write() to the pipe does, yes I think so.

> Is there a particular test that you're running? I'd like to see if I can do
> the kernel patches to get the logging information locally and then do some
> try runs and see if I can get some more information.

I mostly do "try -b do -p emulator -u xpcshell -t none" to get the xpcshell tests to fail. I've had problems occur in cppunit tests (in the emulator) too but quite as frequently.
blocking-b2g: 2.0M+ → ---
No longer blocks: 1097440
See Also: → 1097440
A test with simply using a socketpair() instead of pipe() to signal the death of the child thread didn't change anything: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=0749e012c58f
I haven't figured out how I can get the standard logging output in a failed emulator try-run so I've not been able to get a lot of specific details out from the failures. Any suggestions from more experienced peeps?

Further, assuming the above logging exercise keeps pointing at a problem in the write/read area, do you have any particular advice Dave, on what logging/printing/monitoring I could add to try to get me further in this not always so straight-forward hunt?
Flags: needinfo?(dhylands)
I've filed bug 1112499 to track this b2g emulator bug separately, and I will implement a work-around for this patch.
Flags: needinfo?(dhylands)
This patch version introduces a work-around for the b2g emulator problems. When it detects that it runs in the emulator, it will poll a shared variable for when to shutdown the child thread.
Attachment #8528877 - Attachment is obsolete: true
Attachment #8528880 - Attachment is obsolete: true
Attachment #8528882 - Attachment is obsolete: true
Attachment #8537691 - Flags: review+
Ok, that wasn't a so successful patch. I started out with this try-run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9856cc0fbf1e which runs all the emulator xpcshell tests green in multiple runs (which never happened with earlier generations of the patch).

However, I suspect my cleanup that only is supposed to do the variable-thing for the emulator doesn't properly detect the the emulator...
Update.

This code no longer tries to detect it being run in the emulator (mostly because I failed  to make it work) and instead it unconditionally times out the poll and loops every second. This should have miniscule impact on the real world and yet it will make the b2g emulator tests stop failing. A recent try-run for emulator-only shows all green: https://treeherder.mozilla.org/#/jobs?repo=try&revision=5a98cd997a7a and I'm running a fresh full run to make sure there are no nasty surprises.
Attachment #8537691 - Attachment is obsolete: true
Attachment #8538756 - Flags: review+
Attachment #8538756 - Flags: review+ → review?(sworkman)
Comment on attachment 8538756 [details] [diff] [review]
v9-0001-bug-1008091-send-network-change-events-on-FxOS-an.patch

Review of attachment 8538756 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/system/linux/nsNotifyAddrListener_Linux.cpp
@@ +178,5 @@
> +    nsresult rv = NS_OK;
> +    bool shutdown = false;
> +    while (!shutdown) {
> +        // Loop every second to avoid emulator problem bug 1112499.
> +        int rc = EINTR_RETRY(poll(fds, 2, 1000));

I remember when looking an nsSocketTransportService that I suggested lowering the timeout there. We do a similar thing there, i.e. there's a socket pair for interrupts rather than timing out. I seem to remember Pat having a concern that it would keep the CPU awake, draining the battery. Now, I'm not an expert on battery life, and I may have picked up on this wrong, so I'd like Pat to take a quick look and see what he thinks.
Flags: needinfo?(mcmanus)
steve's right here. This has to be event based in the real world.

I'm fine with polling on the emulator to work around an emulator bug, but not on the phone in your pocket.
Flags: needinfo?(mcmanus)
I left the loop at once per second just to make sure the emulator never actually gets stuck, the loop time is not used on real targets and I made it  this long just to make the impact on a real device really minimal. It could probably be set to 5 or 10 seconds just as well - the only time it is needed is when the emulator bug hits. A single loop by the CPU very rarely has an absolutely minimal impact on battery. It doesn't keep the CPU awake but it wakes it up when the loop needs to run.

However, I respect that if we should have absolutely zero hacks that uses CPU when in fact it shouldn't, then this is such a one. I'm just not sure on how to proceed to get rid of it while we have this bug in the emulator.

There are already several places in Firefox code that detects if running in the emulator and I copied the most commonly used logic when I tried this patch: https://hg.mozilla.org/try/rev/235e4a51fb42 but the end try-run result still made the timeouts occur (https://treeherder.mozilla.org/#/jobs?repo=try&revision=9237daa986b5) which makes me conclude that for some reason that detection doesn't work for me.
Here's the incremental patch on top of the first one that makes the poll() only timeout if running in the emulator. This runs green so far in my emulator-only try-run:  https://treeherder.mozilla.org/#/jobs?repo=try&revision=22025bacd662

I'll do a squashed patch too in a second, just figured it might be easier to review this as a diff on top of the previous change.
Attachment #8540124 - Flags: review?(sworkman)
This is the v9 patch + the emulator-detect-at-run-time patches, squashed into one. v10 replaces the two former ones, but I still leave them around until review just in case someone prefers reading them separately.
See Also: → 1113533
Comment on attachment 8540124 [details] [diff] [review]
0001-emulator-detect-at-run-time-and-only-timeout-poll-if.patch

Review of attachment 8540124 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/system/linux/nsNotifyAddrListener_Linux.cpp
@@ +183,5 @@
> +    int pollTimeout = -1;
> +#ifdef MOZ_WIDGET_GONK
> +    char propQemu[PROPERTY_VALUE_MAX];
> +    property_get("ro.kernel.qemu", propQemu, "");
> +    pollTimeout = !strncmp(propQemu, "1", 1)?100:-1;

Spaces around "?" and ":", you barbarian! :)
Attachment #8540124 - Flags: review?(sworkman) → review+
Hah, I got so excited I made it work so I forgot to keep up the style! =)
Fixed the style nit.
Attachment #8538756 - Attachment is obsolete: true
Attachment #8540124 - Attachment is obsolete: true
Attachment #8540125 - Attachment is obsolete: true
Attachment #8538756 - Flags: review?(sworkman)
Attachment #8540673 - Flags: review+
I'll be off for ~2 weeks now, so I won't move for a commit of this until I get back.
Blocks: 1106220
Comment on attachment 8540673 [details] [diff] [review]
v11-0001-bug-1008091-send-network-change-events-on-FxOS-a.patch

Review of attachment 8540673 [details] [diff] [review]:
-----------------------------------------------------------------

Sorry to jump in. 
Just provide some feedback here.

::: netwerk/system/linux/nsNotifyAddrListener_Linux.cpp
@@ +95,5 @@
> +void nsNotifyAddrListener::OnNetlinkMessage(int aNetlinkSocket)
> +{
> +    struct  nlmsghdr *nlh;
> +    struct  rtmsg *route_entry;
> +    char buffer[4095];

Should we define a constant value or a variable for this, rather than a magic number?
And, should we initialize it with something like memset?

@@ +293,5 @@
> +{
> +    if (!aEventID)
> +        return NS_ERROR_NULL_POINTER;
> +
> +    nsresult rv;

Should we give it an initial value?
(In reply to Kevin Hu [:khu] from comment #49)

> Sorry to jump in. 

No worries at all, I appreciate all the feedback!

> ::: netwerk/system/linux/nsNotifyAddrListener_Linux.cpp
> @@ +95,5 @@
> > +void nsNotifyAddrListener::OnNetlinkMessage(int aNetlinkSocket)
> > +{
> > +    struct  nlmsghdr *nlh;
> > +    struct  rtmsg *route_entry;
> > +    char buffer[4095];
> 
> Should we define a constant value or a variable for this, rather than a
> magic number?

Perhaps, but it is only ever used on this single place and the purpose of the buffer is only to have it big enough to hold a full netlink message from the kernel. This particular size was chosen based on testing and that I've seen existing source code use this. It is not very scientifically chosen, but I've not found any docs indicating what a suitable size should be,

I'll add a comment there, saying basically this.

> And, should we initialize it with something like memset?

No, the buffer gets filled with data by the kernel.

> @@ +293,5 @@
> > +{
> > +    if (!aEventID)
> > +        return NS_ERROR_NULL_POINTER;
> > +
> > +    nsresult rv;
> 
> Should we give it an initial value?

Yeah, that will probably reduce the risk of a future mistake. Thanks!
Addressed two additional review comments
Attachment #8540673 - Attachment is obsolete: true
Attachment #8545188 - Flags: review+
Keywords: checkin-needed
Sorry to be a PITA, but you can you please push this on top of a fresh m-c rev? There are failures in your push that I suspect were from a bad parent and it would be good to confirm. Just emulator builds and tests (opt + debug) is fine.
Keywords: checkin-needed
No problems, coming right up...
Here's the latest patch try-run, b2g emulator + xpcshell tests: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d8add4f3d291
Blocks: 1116695
Keywords: checkin-needed
Backed out for B2G mochitest failures. The same ones I was worried about in the last Try push (that caused me to ask for another run). Not sure why you only ran xpcshell in the new one, but I guess I should have insisted on the full B2G run I asked for.
https://hg.mozilla.org/integration/mozilla-inbound/rev/78f910177388

https://treeherder.mozilla.org/ui/logviewer.html#?job_id=5265625&repo=mozilla-inbound
(In reply to Daniel Stenberg [:bagder] from comment #54)
> Here's the latest patch try-run, b2g emulator + xpcshell tests:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=d8add4f3d291

Hi Daniel,

   I try to apply your patch,but I failed. Could you help me? 

   My steps:
   1.copy your patch to 'B2G/gecko'
   2.git apply patch
   3.build.sh gecko
   4.it will build failed!  
   
   Could you give me some gudiences?
   Thanks a lot!
(In reply to zhaoyang from comment #57)

>    I try to apply your patch,but I failed. Could you help me? 

>    4.it will build failed!  
>    
>    Could you give me some gudiences?

Well, as the above try-run shows it does apply against a recent mozilla-central repo at least. Are you using a recent gecko? Can you show me some details from the build failure? Did the git apply succeed without complaints?
Hi Daniel,

   I used a recent gecko and git apply succeed. But build failed.
   so I did 'rm -rf objdir-gecko out' and './build.sh', build failed again.

   I am very concerned about Bug 1008091. Please give me some gudiences.

   Thanks a lot 

Details error:

../../../../gecko/netwerk/system/linux/nsNotifyAddrListener_Linux.cpp: In member function 'void nsNotifyAddrListener::OnNetlinkMessage(int)':
../../../../gecko/netwerk/system/linux/nsNotifyAddrListener_Linux.cpp:146:19: error: 'NS_NETWORK_LINK_DATA_CHANGED' was not declared in this scope
../../../../gecko/netwerk/system/linux/nsNotifyAddrListener_Linux.cpp: In member function 'nsresult nsNotifyAddrListener::Shutdown()':
../../../../gecko/netwerk/system/linux/nsNotifyAddrListener_Linux.cpp:277:13: warning: unused variable 'rc' [-Wunused-variable]

In the directory  /home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/objdir-gecko/netwerk/system/linux
The following command failed to execute properly:
/usr/bin/ccache /home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/prebuilts/gcc/linux-x86/arm/arm-linux-androideabi-4.7/bin/arm-linux-androideabi-g++ -o nsNotifyAddrListener_Linux.o -c -I../../../dist/system_wrappers -include /home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/gecko/config/gcc_hidden.h -DMOZILLA_INTERNAL_API -DIMPL_LIBXUL -DSTATIC_EXPORTABLE_JS_API -DAB_CD=en-US -DNO_NSPR_10_SUPPORT -I/home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/gecko/netwerk/system/linux -I. -I../../../dist/include -I/home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/objdir-gecko/dist/include/nspr -I/home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/objdir-gecko/dist/include/nss -fPIC -DANDROID -isystem /home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/bionic/libc/arch-arm/include -isystem /home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/bionic/libc/include/ -isystem /home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/bionic/libc/kernel/common -isystem /home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/bionic/libc/kernel/arch-arm -isystem /home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/bionic/libm/include -I/home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/system -I/home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/system/core/include -isystem /home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/bionic -I/home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/hardware/libhardware/include -I/home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/external/valgrind/fxos-include -I/home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/frameworks/native/include -I/home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/frameworks/av/include -I/home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/frameworks/av/include/media -I/home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/frameworks/av/include/camera -I/home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/frameworks/native/include/media/openmax -I/home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/frameworks/av/media/libstagefright/include -DMOZILLA_CLIENT -include ../../../mozilla-config.h -MD -MP -MF .deps/nsNotifyAddrListener_Linux.o.pp -DANDROID -isystem /home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/bionic/libc/arch-arm/include -isystem /home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/bionic/libc/include/ -isystem /home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/bionic/libc/kernel/common -isystem /home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/bionic/libc/kernel/arch-arm -isystem /home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/bionic/libm/include -I/home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/system -I/home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/system/core/include -isystem /home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/bionic -I/home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/hardware/libhardware/include -I/home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/external/valgrind/fxos-include -I/home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/frameworks/native/include -I/home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/frameworks/av/include -I/home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/frameworks/av/include/media -I/home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/frameworks/av/include/camera -I/home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/frameworks/native/include/media/openmax -I/home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/frameworks/av/media/libstagefright/include -Wall -Wpointer-arith -Woverloaded-virtual -Werror=return-type -Werror=int-to-pointer-cast -Werror=type-limits -Wempty-body -Wsign-compare -Wno-invalid-offsetof -mandroid -fno-short-enums -fno-exceptions -Wno-psabi -DMOZ_ENABLE_JS_DUMP -include /home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/gonk-misc/Unicode.h -I/home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/gecko/build/stlport/stlport -I/home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/ndk/sources/cxx-stl/system/include -march=armv7-a -mthumb -mfpu=neon -mfloat-abi=softfp -mno-unaligned-access -fno-exceptions -fno-strict-aliasing -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -std=gnu++0x -pipe -DNDEBUG -DTRIMMED -g -freorder-blocks -Os -fno-reorder-functions -fomit-frame-pointer /home/likewise-open/SPREADTRUM/david.zhao/code/ea_v2.1/B2G/gecko/netwerk/system/linux/nsNotifyAddrListener_Linux.cpp
make[6]: *** [nsNotifyAddrListener_Linux.o] Error 1
make[5]: *** [netwerk/system/linux/target] Error 2
make[4]: *** [compile] Error 2
make[3]: *** [default] Error 2
make[2]: *** [realbuild] Error 2
make[1]: *** [build] Error 2
make: *** [out/target/product/scx15_sp7715ea/obj/DATA/gecko_intermediates/gecko] Error 2

real	43m50.469s
user	45m3.137s
sys	4m7.187s

> Build failed! <
(In reply to zhaoyang from comment #59)

>    I used a recent gecko and git apply succeed. But build failed.
>    so I did 'rm -rf objdir-gecko out' and './build.sh', build failed again.
> 
>    I am very concerned about Bug 1008091. Please give me some gudiences.
> 
>    Thanks a lot 
> 
> Details error:
> 

> ../../../../gecko/netwerk/system/linux/nsNotifyAddrListener_Linux.cpp:146:19:
> error: 'NS_NETWORK_LINK_DATA_CHANGED' was not declared in this scope

Hi, I'm afraid that is not a recent gecko at all. The NS_NETWORK_LINK_DATA_CHANGED symbol was added as part of bug 939318 (landed in Firefox 35), the initial fix was also subsequently bugfixed (I believe once or twice) so there are a whole slew of patches if we're considering backporting.
Here's the additional delta "nuwa" fix required on top ov the v12 patch to get all the emulator tests run green:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f7e1579002f0
Here's the v12+nuwa squashed into a single patch.
Attachment #8549441 - Flags: review+
And here's the same change (the squashed version) on a try-run on linux and linux64: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c18777148044
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/9ac556188108
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
See Also: → 1116366
(In reply to Daniel Stenberg [:bagder] from comment #63)
> And here's the same change (the squashed version) on a try-run on linux and
> linux64: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c18777148044

Hi Daniel,
   
   2.1 need this feature,please fix it on 2.1
   Thanks a lot
Flags: needinfo?(daniel)
Both 2.1s/2.0m need this.

ni? Seinlin/Vincnet for this.
Flags: needinfo?(vliu)
Flags: needinfo?(kli)
2.1 need it
please help
blocking-b2g: --- → 2.0M?
blocking-b2g: 2.0M? → 2.0M+
blocking-b2g: 2.0M+ → 2.1S?
blocking-b2g: 2.1S? → 2.1S+
Comment on attachment 8549441 [details] [diff] [review]
v13-0001-bug-1008091-send-network-change-events-on-FxOS-a.patch

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
User impact if declined: Download failed when switching data connection to wifi. 
Testing completed: 
Risk to taking this patch (and alternatives if risky): low
String or UUID changes made by this patch:
Attachment #8549441 - Flags: approval-mozilla-b2g34?
Hi  shawn,
   
    2.1 need it
    Could you help?
    Thanks a lot
Flags: needinfo?(sku)
(In reply to zhaoyang from comment #71)
> Hi  shawn,
>    
>     2.1 need it
>     Could you help?
>     Thanks a lot

Hi zhaoyang:
 Comment 70 is the request for 2.1.
Besides, I believe SPRD will take 2.1s, not 2.1.
Flags: needinfo?(sku)
(In reply to shawn ku [:sku] from comment #72)
> (In reply to zhaoyang from comment #71)
> > Hi  shawn,
> >    
> >     2.1 need it
> >     Could you help?
> >     Thanks a lot
> 
> Hi zhaoyang:
>  Comment 70 is the request for 2.1.
> Besides, I believe SPRD will take 2.1s, not 2.1.

Hi shawn,

   Thank you for your request
   Maybe SPRD will take 2.1s,but it is block issure in SPRD 2.1 now
   so please help 
    
Hi Vincent,
   
   as comment70, 2.1 need this.
   please help,thanks a lot
Flags: needinfo?(vliu)
(In reply to zhaoyang from comment #74)
> (In reply to shawn ku [:sku] from comment #72)
> > (In reply to zhaoyang from comment #71)
> > > Hi  shawn,
> > >    
> > >     2.1 need it
> > >     Could you help?
> > >     Thanks a lot
> > 
> > Hi zhaoyang:
> >  Comment 70 is the request for 2.1.
> > Besides, I believe SPRD will take 2.1s, not 2.1.
> 
> Hi shawn,
> 
>    Thank you for your request
>    Maybe SPRD will take 2.1s,but it is block issure in SPRD 2.1 now
>    so please help 
>     
> Hi Vincent,
>    
>    as comment70, 2.1 need this.
>    please help,thanks a lot

Don't mess thing up.
SPRD should get 2.1s branch, not 2.1.

your 2.1 should indicate to 2.1s branch.
Besides,
2.1 is waiting for approval now.
Flags: needinfo?(vliu)
I got build break when I tried to merge the patch into v2.1s.

../../../../gecko/netwerk/system/linux/nsNotifyAddrListener_Linux.cpp: In member function 'void nsNotifyAddrListener::OnNetlinkMessage(int)':
../../../../gecko/netwerk/system/linux/nsNotifyAddrListener_Linux.cpp:150:19: error: 'NS_NETWORK_LINK_DATA_CHANGED' was not declared in this scope
../../../../gecko/netwerk/system/linux/nsNotifyAddrListener_Linux.cpp: In member function 'nsresult nsNotifyAddrListener::Shutdown()':
../../../../gecko/netwerk/system/linux/nsNotifyAddrListener_Linux.cpp:297:13: warning: unused variable 'rc' [-Wunused-variable]

I found that v2.1s lacks of below definition to define NS_NETWORK_LINK_DATA_CHANGED. Could you please help me to integrate this part into v2.1s? Thanks

https://dxr.mozilla.org/mozilla-central/source/netwerk/base/public/nsINetworkLinkService.idl#
There is a dependency on bug 939318 and its follow-up fixes, as mentioned already in comment 60.
Flags: needinfo?(daniel)
Patch was backed out on device branch, will land it again after bug 939318 is resolved fixed.
Bug 1077084 is also required when backporting.
Comment on attachment 8549441 [details] [diff] [review]
v13-0001-bug-1008091-send-network-change-events-on-FxOS-a.patch

Given the amount of backport/fallouts and nature of the issue I am not sure I would want to pick this up on 2.1. It looks looks like partners are getting this on the branch at their own risk, so I am removing the nom given this is not a new regression as well.
Attachment #8549441 - Flags: approval-mozilla-b2g34? → approval-mozilla-b2g34-
Here's a tarball with a backport to the b2g-2.1 brancg that builds for me. Included are the adjusted 939318 series, the adjusted 1077084 fix and 1008091.
I suspect this is needed for B2G v2.2 as well (b2g37) since it landed after the uplift?
(In reply to Kai-Zhen Li [:seinlin] from comment #78)
> Patch was backed out on device branch, will land it again after bug 939318
> is resolved fixed.

Marked to affected.
Honestly, these b2g branches make me all dizzy. I don't know which that are affected.
Flags: needinfo?(daniel)
(In reply to Daniel Stenberg [:bagder] from comment #85)
> Honestly, these b2g branches make me all dizzy. I don't know which that are
> affected.

Master (trunk/m-c) = 3.0
b2g37 = v2.2
b2g34 = v2.1
b2g32 = v2.0

My point is that at this point it's on v3.0 (m-c) and is intended for a couple partner-specific branches of v2.0 and v2.1. So my question is whether we want it on v2.2 as well. Bhavana, do you know who's driving the uplift requests for this and whether it'd be wanted on v2.2 or not?
Flags: needinfo?(bbajaj)
(In reply to Ryan VanderMeulen [:RyanVM UTC-5] from comment #86)
> (In reply to Daniel Stenberg [:bagder] from comment #85)
> > Honestly, these b2g branches make me all dizzy. I don't know which that are
> > affected.
> 
> Master (trunk/m-c) = 3.0
> b2g37 = v2.2
> b2g34 = v2.1
> b2g32 = v2.0
> 
> My point is that at this point it's on v3.0 (m-c) and is intended for a
> couple partner-specific branches of v2.0 and v2.1. So my question is whether
> we want it on v2.2 as well. Bhavana, do you know who's driving the uplift
> requests for this and whether it'd be wanted on v2.2 or not?

I'd recommend uplifting given we have a while to ship 2.2
Flags: needinfo?(bbajaj)
Josh, Patch on v2.0m and v2.1s were backed out in comment 78. And it depends on bug 939318, I think it is too risk to pick all patches. How do you think?
Flags: needinfo?(jocheng)
Thanks Kai-Zhen,
Let's mark 2.0M as wontfix here.
Flags: needinfo?(jocheng)
Steven, As per comment 88, how do you think?
Flags: needinfo?(styang)
Fine for me not to fix it in 2.1S.
blocking-b2g: 2.1S+ → ---
Flags: needinfo?(styang)
Is this still wanted for v2.2 or should we wontfix?
Flags: needinfo?(bbajaj)
Flags: needinfo?(bbajaj)
You need to log in before you can comment on or make changes to this bug.