Open Bug 1296836 Opened 8 years ago Updated 9 months ago

Start-up speed slow caused by local network (wireless router)

Categories

(Toolkit :: Startup and Profile System, defect, P3)

48 Branch
defect

Tracking

()

People

(Reporter: i, Unassigned)

References

Details

Attachments

(1 file, 1 obsolete file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:48.0) Gecko/20100101 Firefox/48.0
Build ID: 20160730000000

Steps to reproduce:

I have bought a new wireless router (ZyXEL NBG-418N v2) and use WiFi to connect to the internect.




Actual results:

When I am using this network, Firefox takes 15 ~ 20 seconds to start. But on other network or disconnected (offline), it only takes 2 ~ 3 seconds.

On this network, other browsers like Chrome, are not affected.

Tor Broswer, which is build on Firefox 45.x, is not affected either.


Expected results:

I expect Firefox have the same start speed compared with other network. Network should not affect application start.
Does the issue happen with safe mode and clean profile?
Flags: needinfo?(guoyunhebrave)
Yes, it happen with safe mode and reset firefox and a totally new profile.
Flags: needinfo?(guoyunhebrave)
If you're using distribution package of Firefox, can you try Mozilla's build instead?
If the issue still happens, can you check what kind of network access/connection is done from Firefox process?
Also, can you check if the issue happens on older versions of Firefox?
If the issue doesn't happen on older versions, you could find out what change introduced the issue by mozregression.
http://mozilla.github.io/mozregression/
Flags: needinfo?(guoyunhebrave)
Mozilla build 48.0: Yes.

Old versions:

47.0: Yes.

46.0: Yes.

45.0: Yes.

Do not know how to check network access/connection. Can you give me some instruction?
Flags: needinfo?(guoyunhebrave)
Here is the firefox performance report when I restart Firefox 51 Nightly, which takes about 20 seconds.

https://cleopatra.io/#report=aff9f82acdd25c7a7345a6a1706e88da32a60e85&selection=0,1,2,3,2,4,4,5,6,2,7,8,2,9,2,10,11,12,13,2,4,1662,1663,2,1675,1676,1679,1680,1706,2,3,2,4,4,5,6,2,1994,1995,2,2290,25,26,2,2296,2,22,24,25,26,2

But probably it doesn't contain useful information.
you could use Wireshark, to monitor network access.

https://www.wireshark.org/

it might be available as a package.
Attached file Wireshark capture file (obsolete) (deleted) —
Attached image Wireshark screenshot
In the wireshark results, here are several host names:

linux-onr7.site - My computer, the host name is automatically generated when I install operating system.

zyxel.com - ZyXEL is the brand of my wireless router.

Look like Firefox tried to connect all the following DNS:

----------------------------

linux-onr7.site.site

www.gstatic.com

www.gstatic.com (AAAA)

linux-onr7.site

linux-onr7.site.zyxel.com

linux-onr7.zyxel.com

linux-onr7.site.site

linux-onr7.site

linux-onr7.zyxel.com

----------------------------

Every request takes 1-2 seconds. But these hosts, except gstatic.com, are not existing.
The content of attachment 8784064 [details] has been deleted for the following reason:

Sensitive data
thanks dylan :)

sorry, I should've mentioned it at first.
network capture may contain private information, so please do not upload the raw capture.

(In reply to Guo Yunhe from comment #10)
> Look like Firefox tried to connect all the following DNS:
> Every request takes 1-2 seconds. But these hosts, except gstatic.com, are
> not existing.

thanks.
moving to Networking: DNS, for now.
Component: Untriaged → Networking: DNS
Product: Firefox → Core
Can you make a http log:
https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging

you can also sent it directly to me per e-mail.
Flags: needinfo?(guoyunhebrave)
I did it before but after Firefox start up it is still empty. So I think HTTP log doesn't give useful information.
Flags: needinfo?(guoyunhebrave)
Tentatively assigning this unconfirmed to Dragana, feel free to bounce or reassign.
Assignee: nobody → dd.mozilla
Whiteboard: [necko-active]
(In reply to Guo Yunhe from comment #14)
> I did it before but after Firefox start up it is still empty. So I think
> HTTP log doesn't give useful information.

export MOZ_LOG=timestamp,nsHttp:5,nsSocketTransport:5,nsStreamPump:5,nsHostResolver:5
export MOZ_LOG_FILE=/tmp/log.txt
cd /path/to/firefox
./firefox

thanks!
Flags: needinfo?(guoyunhebrave)
I did what you told me to do again:

> export MOZ_LOG=timestamp,nsHttp:5,nsSocketTransport:5,nsStreamPump:5,nsHostResolver:5
> export MOZ_LOG_FILE=/tmp/log.txt
> cd /path/to/firefox
> ./firefox

But the /tmp/log.txt is empty.
Gue, please use NSPR_LOG_MODULES instead of MOZ_LOG and NSPR_LOG_FILE instead of MOZ_LOG_FILE.
Sorry.. Guo! :)
I got a log from him.

He was saying that NSPR_LOG_MODULES(from https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging) does not work, so I asked him to try with MOZ_LOG.

I do not know which one worked in the end.
Both MOZ_LOG_MODULES and MOZ_LOG works on Mozilla build Firefox 51.0 Nightly. I send you the log of MOZ_LOG. I also have a log of MOZ_LOG_MODULES if you need it.
Flags: needinfo?(guoyunhebrave)
I will need more help here.
From the log I do not see anything strange.

dns resolution for hosts linux-onr7... are not coming from firefox. They are not in the log, I would see them in the log.

Looking at the log, the networking part of firefox is working just normal. The log starts at 2016-08-24 09:41:34.477083 UTC and 1s later there is data from network coming.

So can you please explain what is going on. When you start firefox you do not see anything for around 20s? or it starts but it does not render any page?

I am trying to figure out where is the delay. Thanks.
Flags: needinfo?(guoyunhebrave)
When I start Firefox, in the first 15 ~ 20 seconds, here is nothing I can see. No window, no dialog, no warning. After 15~20 seconds, the window appears and loads web pages.
Flags: needinfo?(guoyunhebrave)
(In reply to Guo Yunhe from comment #23)
> When I start Firefox, in the first 15 ~ 20 seconds, here is nothing I can
> see. No window, no dialog, no warning. After 15~20 seconds, the window
> appears and loads web pages.

Thanks! I will changed the status to untriaged so that other people are notified and we can find a proper place for the bug.

So this is not a networking bug, and from log I see that socketThread is started only after this delay.
Assignee: dd.mozilla → nobody
Component: Networking: DNS → Untriaged
Whiteboard: [necko-active]
Another interesting finding: NetBeans IDE has the same delay of 15 ~ 20 seconds.

Maybe NetBeans has something same with Mozilla applications?
Are you using anti-virus or some security related software?
If so, can you try disabling it to see if it's related to the delay?
I am using SuSEfirewall2. When I disabled it, I still have the delay.

I do not have other security software.
thanks.

Is firefox process taking much CPU while the delay?
No. Firefox just use 0% CPU and 8 MB RAM and no disk IO.
I removed my delay by doing following steps:

1. Modify "/etc/hostname"
------------------------------
guo-kudo
------------------------------


2. Modify "/etc/hosts"
------------------------------
127.0.0.1    localhost guo-kudo guo-kudo.guoyunhe.me
------------------------------

3. Restart computer.

Now Firefox starts in 2 ~ 3 seconds, the same with Thunderbird and NetBeans.

Before:

$ hostname -f
Timeout and cannot find

After:

$ hostname -f
localhost

So I think this problem is related to hostname/FQDN lookup.
so far I see only one call for gethostbyname with hostname here, on main thread, before opening window

https://dxr.mozilla.org/mozilla-release/rev/440936afbdbcc443de50100956a446f7702f118d/toolkit/profile/nsProfileLock.cpp#346-355
>     char hostname[256];
>     PRStatus status = PR_GetSystemInfo(PR_SI_HOSTNAME, hostname, sizeof hostname);
>     if (status == PR_SUCCESS)
>     {
>         char netdbbuf[PR_NETDB_BUF_SIZE];
>         PRHostEnt hostent;
>         status = PR_GetHostByName(hostname, netdbbuf, sizeof netdbbuf, &hostent);
>         if (status == PR_SUCCESS)
>             memcpy(&inaddr, hostent.h_addr, sizeof inaddr);
>     }

here's backtrace:
#0  __gethostbyname_r (name=0x7fffffffcd10, resbuf=0x7fffffffc770, buffer=0x7fffffffc790, buflen=1024, 
    result=0x7fffffffc760, h_errnop=0x7fffffffc75c) at ../nss/getXXbyYY_r.c:150
#1  0x00007ffff66d7db4 in PR_GetHostByName () from /usr/firefox/nightly/libnspr4.so
#2  0x00007fffeb21cd6d in nsProfileLock::LockWithSymlink(nsIFile*, bool) () from /usr/firefox/nightly/libxul.so
#3  0x00007fffeb21d32c in nsProfileLock::Lock(nsIFile*, nsIProfileUnlocker**) () from /usr/firefox/nightly/libxul.so
#4  0x00007fffeb21d586 in nsToolkitProfileLock::Init(nsIFile*, nsIFile*, nsIProfileUnlocker**) ()
   from /usr/firefox/nightly/libxul.so
#5  0x00007fffeb21d970 in nsToolkitProfileLock::Init(nsToolkitProfile*, nsIProfileUnlocker**) ()
   from /usr/firefox/nightly/libxul.so
#6  0x00007fffeb21da0e in nsToolkitProfile::Lock(nsIProfileUnlocker**, nsIProfileLock**) ()
   from /usr/firefox/nightly/libxul.so
#7  0x00007fffeb223f5e in XREMain::XRE_mainStartup(bool*) () from /usr/firefox/nightly/libxul.so
#8  0x00007fffeb2252bd in XREMain::XRE_main(int, char**, nsXREAppData const*) () from /usr/firefox/nightly/libxul.so
#9  0x00007fffeb2255e5 in XRE_main () from /usr/firefox/nightly/libxul.so
#10 0x000000000040fdf1 in do_main(int, char**, char**, nsIFile*) ()
#11 0x000000000040c11a in main ()
actually gethostbyname_r
this is interesting. We can probably fix it. It is not good idea to do gethostbyname on the main thread.

Is toolkit the right place for this bug?
some bugs in "Toolkit -  Startup and Profile System" touched the file.
moving there.
(original code was from "Core Graveyard - Profile: BackEnd")
Component: Untriaged → Startup and Profile System
Product: Core → Toolkit
This is the right component. I don't know what to do about it: I need a posix expert. It seems that currently the profile lock symlink is keyed off the IP address of the host, which is what this delicate dance is supposed to accomplish. We don't actually use the IP address: we just use it as a kind of unique key to determine if the lock/PID is from this machine or a different one.

I'm wondering if we could instead use uname.nodename or something similar. I don't know what the consequences of that might be for cross-version compatibility.

It's also a little strange: this codepath should only be hit if the normal locking method (using LockWithFcntl) fails. Is your profile mounted on a filesystem (NFS?) that doesn't support fcntl/flock?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(guoyunhebrave)
Priority: -- → P3
I have two parts: system root uses Btrfs, user directory (home) uses XFS. Profile is in a XFS part.
Flags: needinfo?(guoyunhebrave)
Not sure if this helps but sometimes i get this message when pages fail to load and instead of displaying the "Server not found":   

XML Parsing Error: unexpected parser state Location: jar:file:///home/adheeth/firefoxN/browser/omni.ja!/chrome/browser/content/browser/aboutNetError.xhtml Line Number 536, Column 68:
        <div id="ed_connectionFailure">&connectionFailure.longDesc;</div>
-------------------------------------------------------------------^

What can we do to move forward? I can still reproduce this issue in Firefox 72 on openSUSE Tumbleweed.

Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: