Closed Bug 1380896 Opened 7 years ago Closed 7 years ago

gmail is slow or stops responding

Categories

(Core :: Networking: HTTP, defect, P1)

Tracking

RESOLVED FIXED
mozilla57
Tracking Status
platform-rel --- ?
firefox-esr52 55+ fixed
firefox54 --- wontfix
firefox55 + fixed
firefox56 + fixed
firefox57 + fixed

People

(Reporter: bkelly, Assigned: u408661, NeedInfo)

Details

(Keywords: regression, Whiteboard: [platform-rel-Google] [platform-rel-Gmail])

For the last day or two I've noticed gmail has gotten really slow.  Sometimes it just seems to stop working.  Typically I noticed when I try to click from one folder to another.  It says "loading", but just doesn't switch.

I just noticed in the browser console these messages:

https://mail.google.com/mail/u/0/#label/dev-platform/15d33665bfd84ae3 : Unable to run script because scripts are blocked internally.  (unknown)
https://mail.google.com/mail/u/0/#label/-BZ-Watch : Unable to run script because scripts are blocked internally.  (unknown)
https://mail.google.com/mail/u/0/#label/-BZ-Watch/15d3e738df3b8a55 : Unable to run script because scripts are blocked internally.  (unknown)
https://mail.google.com/mail/u/0/#label/-BZ-Watch/15d3e7b6b5832fff : Unable to run script because scripts are blocked internally.  (unknown)
https://mail.google.com/mail/u/0/#label/-BZ-Watch/15d3e8cc01f109dd : Unable to run script because scripts are blocked internally.  (unknown)
https://mail.google.com/mail/u/0/#label/-BZ-Watch/15d3e94560d3a95d : Unable to run script because scripts are blocked internally.

I don't know if these messages are new or not.  Maybe it's always done this?

I've been noticing this on windows 10 running 56.0a1 (2017-07-12) (64-bit).

Filing this in DOM for now, but really no idea where it belongs.
Oh, a ctrl+r refresh seems to fix the problem.
I think I'm experiencing something similar in nightly. When I hit ctrl+r, I get the "This page is asking you to confirm that you want to leave - data you have entered may not be saved." dialog.
I do get those same script warnings while gmail is functional though.
Gecko profiler profiles please?
I wonder if one of the add-ons (Classic Theme Restorer, uBlock Origin) may be to blame since the gmail performance does improve once you disable them.
I've heard several complaints recently about uBlock Origin.
Chris, by any chance, do you have some addons?
Flags: needinfo?(catlee)
Only Stylish, test pilot and containers.
Flags: needinfo?(catlee)
I get this with zero addons or extensions beyond the defaults for nightly.
Some profile, pretty please. I'm not a gmail user so I have no idea how to reproduce.
The problem is that it's very intermittent and I don't normally keep the profiler running all the time.

I don't think the os monitor showed much cpu in use.  It felt more like a networking stall to me in some way.

I'm away from the computer where I saw this, but can try to capture something when I get back.
I'm seeing this on nightly only starting today (but I might not have updated yesterday). I have only the profiler extension installed but not actively collecting profiles.

It definitely feels like gmail just stopped doing something related to networking or event delivery. Sometimes switching to another tab and back fixes things, but a couple times I've had to completely refresh the tab to get it to recover. I don't think a profile would help in this case, given that the site isn't frozen, it's just not working.

Is it possible this is related to any recent quantum DOM throttling work?
When this happens to me, I can switch between the categories at the top of the gmail page, but I can't open new mails or look at the different categories on the left. I get the same message about navigating away as comment 2 when I try to switch. I have no extensions installed.
Does the site use requestIdleCallback now?  Maybe a compat issue there?  I know the new youtube is trying to debug some issues when our rIC is enabled.
I sometimes get a "still working..." message on the top of the page after clicking a link. But it never finishes working.
Tracking 56+ for investigation, high profile site that must work well with Firefox.
Priority: -- → P1
Flags: needinfo?(mstange)
I've been experiencing this, too, with no addons beyond the profiler. I'll try to capture a profile.
I got a log of this happening from about:networking. I don't really want to post it here in case it contains any GMail credentials. Jason, who can I send this to?
Flags: needinfo?(jduell.mcbugs)
I got this today and some time later saw a 400 failure in the network monitor.  Not sure if it's related, though.  I have the request headers if someone wants to look at them.
Oh, the response body for the failure in comment 21 and comment 22 has this:

<H1>Unknown SID</H1>

Which suggests some kind of session id failure.  Again, I'm not sure if this is related to the comment 0 issue or not.  I have not seen it in a working tab, though.
Not seeing the problem in 20170721 Nightly. Woohoo!
(In reply to Ajay Gupta from comment #24)
> Not seeing the problem in 20170721 Nightly. Woohoo!

I'm seeing it very frequently with this version, unfortunately.  56.0a1 (2017-07-21) (64-bit)
I'm working through my PTO backlog.  Whenever I try to "goto next message" such that it crosses from one set of 100 messages to the next I seem to hit this problem.  It's triggering nearly 100% of the time in this use case for me.

Whenever I have hit it before it also seems like I was trying to load a message list.
I spoke too soon. I am seeing the problem more often now with the current build 20170723-64bit.

I am surprised that this high-visibility error has not received much attention so far!
(In reply to Ajay Gupta from comment #27)
> I am surprised that this high-visibility error has not received much
> attention so far!

Well, it's not because it's low priority.  It's because we don't have good steps-to-reproduce yet, which makes it difficult to investigate.
That being said, I think I have pretty decent STR now:

1. Log into gmail on nightly
2. Go to a folder/tag that has a lot of existing mail.  I used my "try build" folder.
3. Select all mail in the folder and mark as unread.
4. Page back through the folder until you are 500 or 600 messages deep.
5. Open a message.
6. Use "next message" arrow key or keyboard shortcut if you have it enabled ("k" key).
7. Repeat step 6 many, many times.  When it crosses from one page of messages to the next it will likely get stuck.

I was able to reproduce this on the first try using these steps.  All message list pages in my gmail session now seem broken.

This takes a bit of time, but maybe it's enough to narrow down a regression window.
Chris, Benjamin, do the steps in comment 29 reproduce for either of you?  Just want to confirm these steps are not unique to my profile.
Flags: needinfo?(catlee)
Flags: needinfo?(benjamin)
(In reply to Ben Kelly [:bkelly] from comment #29)
> That being said, I think I have pretty decent STR now:
> 
> 1. Log into gmail on nightly
> 2. Go to a folder/tag that has a lot of existing mail.  I used my "try
> build" folder.
> 3. Select all mail in the folder and mark as unread.
> 4. Page back through the folder until you are 500 or 600 messages deep.
> 5. Open a message.
> 6. Use "next message" arrow key or keyboard shortcut if you have it enabled
> ("k" key).
> 7. Repeat step 6 many, many times.  When it crosses from one page of
> messages to the next it will likely get stuck.
> 
> I was able to reproduce this on the first try using these steps.  All
> message list pages in my gmail session now seem broken.
> 
> This takes a bit of time, but maybe it's enough to narrow down a regression
> window.

Tried this STR in my Gmail Trash folder (lots of items - made 400-600 unread). Cannot reproduce the issue. I just upgraded to 20170724-64bit nightly.
I wasn't able to use those STR directly, no.

My suggestion would be, if you can reproduce this on Linux, to collect a network log and an RR trace of the problem.
Flags: needinfo?(benjamin)
I'm on windows unfortunately.
No, STR don't work for me.
Flags: needinfo?(catlee)
platform-rel: --- → ?
Whiteboard: [platform-rel-Google] [platform-rel-Gmail]
This issue is not limited to Nightly. I can reproduce on 55.0b11 (64-bit) OSX. Same conditions as reported above plus I also have issues sending email where the send button greys out, no loading indicator shows, and then GMail becomes unresponsive. Same scripts are blocked internally message as well.
I see this on Firefox 54 on Mac and Windows as well. Buttons stop working, navigating says that Gmail is doing something when it is not and various things just stop working.
I randomly experience it on Linux (Nightly). Can't isolate STR.

The symptoms are similar to what others described. Gmail UI pieces stop responding: back, forward, reply etc.
Summary: gmail is slow or stops responding in nightly → gmail is slow or stops responding
I'm experiencing this too. I *think* it goes away when I set network.http.throttle.enable to false. I'm not 100% sure, but it seems like it stopped happening since I toggled that pref earlier today.
Honza: http log from :billm should be headed your way soon.
Flags: needinfo?(jduell.mcbugs) → needinfo?(honzab.moz)
I've been experiencing this in Nightly for a couple of weeks too.

I'm on OS X. I have TestPilot with MinVid and Screenshots enabled.

Gmail stops responding a few times per day when I try to Archive an email after having read it. It also happens when I use the "Send + Archive" button after composing a message.

CMD + R resolves the issue.

I haven't figured out a way to reproduce at will.
I've also been seeing this for a little while now. Gmail just stops responding and actions don't immediately execute - such as deleting a message or whatever.

I did attempt to pull a profile on this but I've never been able to get Gecko Profiler to work on this install :(
Tom, can you reproduce this?
Flags: needinfo?(twisniewski)
Hi, 

We have the same (or very similar) problem. I can confirm that at least versions 50 to 54 have the problem, on Windows 7 and Windows 10. The performance monitors in Windows show no CPU or RAM problem, and the other tabs in Mozilla work fine. In our case, the following issues appear randomly, a few times every day:

- I sometimes get a "still working..." message on the top of the page after clicking a link. But it never finishes working.
- Cannot navigate to any link label in the left on the screen
- Gmail buttons in main screen (archive, delete...) and inside a mail (back, archive, delete) don't work. 
- If you click on the 'Send' button in a new mail the window closes, but the mail is not sent.

In all cases, if I wait a long time (5-10 minutes) the action finishes, but this response time is unacceptable. Also, in all cases refreshing the page with F5 or Ctrl+R solves the problem, except for a newly sent mail, because if you refresh the page the mail is lost.

If I can do anything (start a monitor or debugger, I don't mind having it on all day) please tell me how to do it. 

Thanks!
(In reply to Jason Duell [:jduell] (needinfo me) from comment #39)
> Honza: http log from :billm should be headed your way soon.

I think this might be similar to the youtube problem.

Looking into the log, there are two streams that send requests but never receive a response.  The socket under the session those streams belong to has not received any data since the log capture started.  Hence, it's very hard to find out what could trigger this.

Can we somehow reach out to Google and ask if they have updated their h2 servers recently?
Flags: needinfo?(honzab.moz)
(In reply to Honza Bambas (:mayhemer) from comment #44)
> Can we somehow reach to google and ask if they have updated their h2 servers
> recently?

We have a thread with them going on this issue.  I'll direct them to your question here.
(In reply to Till Schneidereit [:till] from comment #38)
> I'm experiencing this too. I *think* it goes away when I set
> network.http.throttle.enable to false. I'm not 100% sure, but it seems like
> it stopped happening since I toggled that pref earlier today.

:till pinged me on IRC and led me to this comment!

I think we have a duplicate of bug 1367861 here.  Looks like there is a code path that incorrectly marks a transaction as h1 and allows throttling on it.
(In reply to Honza Bambas (:mayhemer) from comment #46)
> (In reply to Till Schneidereit [:till] from comment #38)
> > I'm experiencing this too. I *think* it goes away when I set
> > network.http.throttle.enable to false. I'm not 100% sure, but it seems like
> > it stopped happening since I toggled that pref earlier today.
> 
> :till pinged me on IRC and led me to this comment!
> 
> I think we have a duplicate of bug 1367861 here.  Looks like there is a code
> path that incorrectly marks a transaction as h1 and allows throttling on it.

Gmail still hangs up for me occasionally even with throttling disabled.
I have been having this problem with Firefox 55 (Beta) for the past week or so. Gmail will frequently take forever to send mail, or it will not save drafts. Sometimes the UI in general just stops appearing. I have lost a lot of emails I've composed in the past few days and had to go back to stable.

This is a huge problem. Please do not release Firefox 55 until this is fixed.
(In reply to Josh Aas from comment #48)
> I have been having this problem with Firefox 55 (Beta) for the past week or
> so. Gmail will frequently take forever to send mail, or it will not save
> drafts. Sometimes the UI in general just stops appearing. I have lost a lot
> of emails I've composed in the past few days and had to go back to stable.
> 
> This is a huge problem. Please do not release Firefox 55 until this is fixed.

Does this not happen for you in 54?  Others are finding it affects 54 and even earlier (which perhaps points to a site change triggering a latent bug in Firefox)

If this *was* triggered by a site change, we can get them to change it (probably) once we identify the problem.  If it is affecting back to 50, we may not want to block release on it... but OTOH if a fix is pending shortly, we may want to hold the release for the fix if we don't think the site will fix it quickly (and avoid a point release).  It's a balancing act...  But we need to have a fix for this ASAP in some way (unless it's simply a huge gmail bug we can't fix, in which case we need to lean on them).
I reproduced this problem on my personal gmail under nightly and with a network log.  The high level problem seems to be that one of the many POST requests performed while navigating between pages hangs with all the data received, but the HTTP2 stream does not close and so the channel does not close and so the response is not processed.  In fact, the HTTP2 stream only was closed 1hr 3mins later when the server sends us a GOAWAY command.

The network log shows that we have both sent our request and received our response in a timely fashion.  The data payload is properly processed in the parent and relayed to the child.  However, the expected sequence of Http2Session::CleanupStream, Http2Session::CloseStream, and nsHttpTransaction::Close does not occur on the Socket Thread.  Accordingly, the nsStreamPump doesn't see an OnStateStop, so no stop is relayed to the HttpChannelChild, etc.

I'm going to attach the relevant log excerpts as a hidden security bug attachment momentarily.  (Because I'm not redacting contents and while I don't think there's anything particularly embarrassing in the RPC payloads, I definitely don't particularly want my gmail account stolen if it turns out there are usable tokens embedded there, although I'll try and invalidate them now.)
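
To illustrate the kind of check described above (a stream that sends its request and receives data but never reaches the close step), here is a minimal Python sketch. It assumes the log has already been reduced to `(stream_id, event)` pairs; the event names `request_sent`, `data_received` and `stream_closed` are hypothetical labels for this example, not strings that appear in a real Firefox network log.

```python
from typing import Iterable, List, Tuple

# Hypothetical event labels for illustration only; a real log would need
# its own parsing step to map log lines onto labels like these.
OPEN_EVENT = "request_sent"
CLOSE_EVENT = "stream_closed"


def find_unclosed_streams(events: Iterable[Tuple[int, str]]) -> List[int]:
    """Return IDs of streams that were opened but never closed.

    IDs are returned in the order the streams were first opened.
    """
    still_open = {}  # dict preserves insertion order
    for stream_id, event in events:
        if event == OPEN_EVENT:
            still_open[stream_id] = True
        elif event == CLOSE_EVENT:
            # Ignore close events for streams we never saw open.
            still_open.pop(stream_id, None)
    return list(still_open)


sample_events = [
    (1, "request_sent"),
    (3, "request_sent"),
    (1, "data_received"),
    (1, "stream_closed"),
    (3, "data_received"),  # stream 3 gets its data but is never closed
]

print(find_unclosed_streams(sample_events))
```

In this sample, stream 3 plays the role of the hung POST: its data arrives, but no close event follows, so it is the only ID reported.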
To mitigate the problem here, is it possible to use a system addon to disable HTTP2 (fall back to HTTP 1.1) for specific domains/URL patterns?
Flags: needinfo?(miket)
It's possible that multiple issues are obscuring things, but since downgrading from the 55 beta to 54 stable I have not experienced any problems with gmail.
(In reply to Benjamin Smedberg [:bsmedberg] from comment #52)
> To mitigate the problem here, is it possible to use a system addon to
> disable HTTP2 (fall back to HTTP 1.1) for specific domains/URL patterns?

If that can be done via flipping a pref, then yeah, we should be able to do that with the webcompat gofaster/system addon (which can filter on domains or specific URIs). 

It's already the weekend for Dennis, but I can ask him to cook something up on Monday.

Is `network.http.throttle.enable` the pref we want to toggle?
Flags: needinfo?(miket) → needinfo?(dschubert)
Depends on: 1385358
Filed Bug 1385358 (which Dennis will go to on Monday). 

Honza, which prefs would we want to flip to disable h2, assuming that's what we want to do?
Flags: needinfo?(twisniewski)
Flags: needinfo?(honzab.moz)
Flags: needinfo?(dschubert)
(In reply to Mike Taylor [:miketaylr] (55 Regression Engineering Owner) from comment #55)
> Filed Bug 1385358 (which Dennis will go to on Monday). 
> 
> Honza, which prefs would we want to flip to disable h2, assuming that's what
> we want to do?

network.http.spdy.enabled.http2
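
For anyone who wants to flip this pref in a local test profile, a small Python sketch follows that renders the corresponding `user.js` line. The helper name `render_user_pref` is made up for this example. Note that setting the pref this way turns off HTTP/2 for every site in that profile, which is not the per-site switch discussed for the system addon, so this is only an assumption about how one might test locally.

```python
import json


def render_user_pref(name: str, value) -> str:
    """Render one user.js line for a boolean, integer or string pref.

    json.dumps yields JavaScript-compatible literals for these types
    (False -> false, "text" -> "text").
    """
    return f"user_pref({json.dumps(name)}, {json.dumps(value)});"


# Pref names taken from this bug; the values are what a tester might try.
lines = [
    render_user_pref("network.http.spdy.enabled.http2", False),
    render_user_pref("network.http.throttle.enable", False),
]

for line in lines:
    print(line)
```

The printed lines can be appended to a `user.js` file in a throwaway profile directory; removing them and restarting restores the defaults for prefs that were not otherwise changed.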
Flags: needinfo?(honzab.moz)
Note also this chromium-dev thread, where people also mention Gmail loads stalling on Chromium:

  https://groups.google.com/a/chromium.org/forum/?utm_medium=email&utm_source=footer#!msg/chromium-dev/gwYttMaF_lY/L_yceitaBAAJ
Hi everybody,
At our company we are currently getting a lot of user reports about problems with Gmail that seem to be the same as yours.
We use Firefox 52.2.1 ESR.
I'm currently testing a downgrade to 52.1.2 ESR because the problems seem to have occurred since we updated from 52.1.2 to 52.2.2.
Update: code-wise most of the work here is happening in bug 1381016 (we suspect a common fix).  But we're not 100% certain of that approach, so we're also trying to land bug 1385358 in 56 so we can disable H/2 per site as needed.

Finally we suspect this may be a google server side issue--:ekr has contacted google and they're "on it".
Component: DOM → Networking: HTTP
Can you please confirm whether 52.x ESR is affected and whether you plan to ship the bug 1385358 improvement on this branch?
Yes, it affects ESR52 as well. The tentative fix for this is in bug 1381016 (local testing is looking optimistic so far). Here are some links to trunk-based Try builds if anybody else is interested in trying that patch out (usual caveats about testing Try builds apply).

Linux64: https://queue.taskcluster.net/v1/task/Lvqo1SlgRkG-nDRQJ5aj-g/runs/0/artifacts/public/build/target.tar.bz2
macOS: https://queue.taskcluster.net/v1/task/ATKzmtBaTZSDYbN6ctgUMg/runs/0/artifacts/public/build/target.dmg (may require tweaking Gatekeeper settings to launch due to signing)
Win32: https://queue.taskcluster.net/v1/task/VHuUasPiQb27-uqZN-Rbbg/runs/0/artifacts/public/build/target.zip
Win64: https://queue.taskcluster.net/v1/task/E4QuhVyQRfi_c1IDJ4Duaw/runs/0/artifacts/public/build/target.zip
Status: we've landed a patch in bug 1381016 which we think fixes both that bug and this one.  Anecdotal evidence from testers is promising so far (see bug 1381016 comment 37 for instance).
The fix for this is landing in bug 1381016 and will be in Fx55 RC2 and ESR 52.3.0.
Depends on: 1381016
No longer depends on: 1385358
I'm still seeing an issue while running 20170802100302. It's actually been worse than previous days, so I'll try to get a profile.
Bug 1381016 hasn't been merged to m-c yet (and therefore isn't in a nightly build). There were infra issues blocking merges earlier today.
aaaaand I think I have too many bugzilla tabs open because I thought this had landed already. Disregard #c64. #tabstozero
Bug 1381016 has been merged to mozilla-central and will be in tomorrow's nightlies. The fix has also been included in the latest release candidate builds of Firefox 55 and ESR52.3 due for release next week. We still need to backport the fix to mozilla-beta in time for next week's build of Firefox 56b1, but it's very much on the radar for that too :).
Assignee: nobody → hurley
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
I've been testing the latest release candidate build for 52.3.0 ESR since yesterday and the problem no longer occurs.
Thank you!
I just started noticing this again since I updated my nightly. Might be a different problem, but the symptoms are pretty similar. Has anyone else seen this?
I had similar symptoms in Bug 1384957 (which I'm having trouble reproducing even using the old nightly), although I'm not clear if it dupes this bug or not. I was looking down the path of about:config/network.tcp.tcp_fastopen_enable since the necko folks have been experimenting with enabling TCP FastOpen for a few weeks at a time. It has currently been on for the last week (Bug 1389079), and we last turned it off around when this bug was fixed.
(In reply to Bill McCloskey (:billm) from comment #69)
> I just started noticing this again since I updated my nightly. Might be a
> different problem, but the symptoms are pretty similar. Has anyone else seen
> this?

I've just seen the hang symptoms again twice on an 8/21 nightly within ~2 hours.  It'll be a day or two before I can investigate and try and capture logs.
If anyone sees the hang, please turn on HTTP logging and attach the log, or send it to me via e-mail for privacy reasons (cookies may be in there).

Use about:networking to turn logging on.

https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging

Thank you!!!
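
As an alternative to about:networking, logging can also be enabled from the environment when starting the browser. The Python sketch below builds such an environment using the `MOZ_LOG` and `MOZ_LOG_FILE` variables. The module list and the paths are assumptions for illustration and should be checked against the HTTP logging page linked above before use.

```python
import os
import subprocess
from typing import Dict, Optional

# Assumed module list; verify against the HTTP logging documentation.
DEFAULT_MODULES = "timestamp,nsHttp:5,nsSocketTransport:5,nsHostResolver:5"


def build_logging_env(
    log_path: str,
    base_env: Optional[Dict[str, str]] = None,
    modules: str = DEFAULT_MODULES,
) -> Dict[str, str]:
    """Return a copy of the environment with HTTP logging variables set."""
    env = dict(os.environ if base_env is None else base_env)
    env["MOZ_LOG"] = modules
    env["MOZ_LOG_FILE"] = log_path
    return env


def launch_with_logging(firefox_binary: str, log_path: str) -> subprocess.Popen:
    """Start the given browser binary with logging enabled (not called here)."""
    return subprocess.Popen([firefox_binary], env=build_logging_env(log_path))


if __name__ == "__main__":
    # Hypothetical log path; nothing is launched by this demonstration.
    env = build_logging_env("/tmp/gmail-hang.log", base_env={})
    print(env["MOZ_LOG"])
    print(env["MOZ_LOG_FILE"])
```

As noted above, the resulting log may contain cookies, so it is better sent privately than attached publicly.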
I captured one set of logs while this hang was happening for me. Will email you Dragana. This was on wifi at Toronto office, but I've not really been having any packet loss issues other than this hang.
Bill, Andrew, have you seen this issue again? Any update?
Flags: needinfo?(wmccloskey)
Flags: needinfo?(bugmail)
(In reply to Dragana Damjanovic [:dragana] from comment #74)
> Bill, Andrew, have you seen this issue again? Any update?

I have not experienced any recurrences with recent nightlies, and I'm not sure my statement in comment 71 about the nightly build in use was accurate.  I may have had too many sessions open and reported the wrong Firefox instance.  It's very possible I was actually experiencing the pause in an Aug 1st build that pre-dates the fix.  (I think I was running my main profile on more modern locally built versions for a while to help investigate this bug, then switched back to my downloaded firefox install which had not auto-updated because I turn that off in favor of manual update triggering because of weirdness around using a single install for multiple profiles, plus who likes restart pop-ups?)
Flags: needinfo?(bugmail)
I'm still seeing the issue with my 8/21 nightly. I'll restart and see what happens.
Flags: needinfo?(wmccloskey)
Flags: needinfo?(wmccloskey)
(In reply to Bill McCloskey (:billm) from comment #76)
> I'm still seeing the issue with my 8/21 nightly. I'll restart and see what
> happens.

Can you try to make a http log?

https://developer.mozilla.org/en-US/docs/Mozilla/Debugging/HTTP_logging

You can also send it to me directly via e-mail.
So far this isn't happening with today's nightly. I'll post again if it stops working.
Flags: needinfo?(wmccloskey)
I have opened a new bug - Bug 1395494.

It seems that there is a problem with TFO+TLS1.3.
Gmail being really slow & unresponsive is definitely still a problem for me on today's nightly.
Same here in 57.0a1 (2017-09-03) (64-bit).

It is interesting to note, though, that the problem occurs almost daily after 3pm UTC+2, which is 9am EST.
Michael, Christos, do you still see this problem?

We found out that there is a problem with the combination of TLS1.3 and TCP FastOpen in some Mozilla offices (it is probably not limited to Mozilla offices; it is likely down to some network configuration, e.g. a firewall configuration).
TCP FastOpen was turned off by default on 9th September. Therefore, if you no longer experience this problem after 9th September, it was bug 1395494. If you still experience the problem, please open a new bug so we can investigate.
Flags: needinfo?(michael)
Flags: needinfo?(christos)
I haven't noticed this problem over the last week or so, so I imagine I'm experiencing bug 1395494.

Thanks!
Flags: needinfo?(michael)