Open Bug 1199214 Opened 9 years ago Updated 2 years ago

Clicked Link Does Not Activate Until Mouse Moved

Categories

(Core :: DOM: UI Events & Focus Handling, defect)

40 Branch
x86_64
Linux
defect

UNCONFIRMED

People

(Reporter: drichard, Unassigned, NeedInfo)

Attachments

(7 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:41.0) Gecko/20100101 Firefox/41.0
Build ID: 20150824144923

Steps to reproduce:

We have about 50 beta testers on FF 41, and I am getting similar feedback from several of them.  Sometimes when you click on a link, it will not actually navigate to the new site until you move your mouse slightly from the click position.  I have been able to replicate this with great frequency.  We're using FF41b4 on 64-bit Linux.  This also happened on FF41b2.

To replicate, do a Yahoo search (or similar) and then click on a link without moving the mouse after the click.


Actual results:

When it fails, Firefox will just hang and make no attempt to go to the page.  If you move your mouse even slightly, it then begins to work correctly.


Expected results:

For those people that do not move their mouse after a click, the page should just load without the requirement to move the mouse.  Those people that naturally move their mouse after a click will not see this.
OS: Unspecified → Linux
Hardware: Unspecified → x86_64
Still broken in FF41b5.  Very often links will freeze when clicked until you move your mouse.
Still happening in FF41b6.
Users have also been reporting that when you select a bookmark, it hangs until the mouse is moved.  I have confirmed this on B6.

I can easily take a video of this happening.
Ok, I just spent a few hours looking this over.  Firefox 39 does not have this issue, but I just found that it started in Firefox 40.  From the best that I can tell, it happened here:

Firefox 40 - Mozilla Central 

2015-04-01  (WORKS)

https://hg.mozilla.org/mozilla-central/rev/0b88606f8fe7

--------------------------------------------------------

2015-04-02   (FAILS)

https://hg.mozilla.org/mozilla-central/rev/d222742756c4



For my test, I added a few bookmarks to a new account. I then selected Bookmark > <saved bookmark> from the pulldown menus.  On the April 1st build, I have not been able to get it to hang/lock.  But I did on the April 2nd build.  When it hangs, it says Waiting On <site> on the status bar and just sits until you move the mouse which seems to trigger the UI to refresh.  We're getting a good amount of complaints about this issue.
Severity: normal → major
Version: 41 Branch → 40 Branch
It also should be noted that I installed these builds on two totally different servers, and they are both exhibiting the same symptom.
No luck reproducing on my Ubuntu VM. Can you give me some idea of what flavour of Linux you're running? The users you mention... do they all run the same distro?
Flags: needinfo?(drichard)
Keywords: verifyme
I have been trying to figure this one out for weeks and trying to find a pattern. It's there for sure and we are getting support calls.  All of our users are on OpenSuse 11.4.  It just happened on the Mozilla site. I clicked on the button "My Bugs" and the animation came up with the monster and it just sat there until I moved my mouse and then the search completed.  Sometimes moving the mouse once will load the whole page.  Sometimes moving the mouse slightly will cause part of the page to load, and it will stop loading until you move it again and then it will finish.  Very probably this is related to having to fetch content from different sites.   It does seem like once you have visited the site once and it's in cache, that it's less pronounced.  I can easily replicate it on first browser launch.  If I launch the browser, select Bookmarks >>  and then pick a bookmark, I can almost consistently get it to hang until I move my mouse.  I have two OpenSuse 11.4 machines and they are both doing the same thing.  I can make a movie if that helps.

The other idea I had was to trip the bug and then using another workstation get an strace of exactly what it's doing when it's hung up...would that help?

I am currently on 41.0 Beta 8.
Flags: needinfo?(drichard)
(In reply to drichard from comment #7)
> I have been trying to figure this one out for weeks and trying to find a
> pattern. It's there for sure and we are getting support calls.  All of our
> users are on OpenSuse 11.4.  It just happened on the Mozilla site. I clicked
> on the button "My Bugs" and the animation came up with the monster and it
> just sat there until I moved my mouse and then the search completed. 
> Sometimes moving the mouse once will load the whole page.  Sometimes moving
> the mouse slightly will cause part of the page to load, and it will stop
> loading until you move it again and then it will finish.  Very probably this
> is related to having to fetch content from different sites.   It does seem
> like once you have visited the site once and it's in cache, that it's less
> pronounced.  I can easily replicate it on first browser launch.  If I launch
> the browser, select Bookmarks >>  and then pick a bookmark, I can almost
> consistently get it to hang until I move my mouse.  I have two OpenSuse 11.4
> machines and they are both doing the same thing.  I can make a movie if that
> helps.
> 
> The other idea I had was to trip the bug and then using another workstation
> get an strace of exactly what it's doing when it's hung up...would that help?
> 
> I am currently on 41.0 Beta 8.

Sounds like the mouse movement is causing a refresh driver kick, which is otherwise not occurring. That's my guess based on what you're saying.

The regression range you gave in comment 4... are you absolutely sure about it? The WebVR stuff doesn't really strike me as a likely culprit (though I've been wrong before!). How did you get that regression range? Have you tried mozregression[1]?

[1]: http://mozilla.github.io/mozregression/
Flags: needinfo?(drichard)
Mozregression won't run on OpenSuse, so I did it by hand using the daily builds.  After I posted that regression date, I looked at the change log for that date and did see it was unrelated.  But it does seem to be right around that time as best as I can tell.  The problem is that it's not always perfectly easy to replicate.  There were no issues with FF 39...I have been able to make it happen with FF 40.
Flags: needinfo?(drichard)
Why will mozregression not run on OpenSUSE?
Flags: needinfo?(drichard)
I have a bit more information.  I was able to test this issue again, and wanted to see how long it would just lock when I did not move my mouse.

Test #1: I picked a URL from my bookmarks and it sat for a few minutes without loading the page. I then received an email message, a popup came up from my lower panel, and the page then loaded correctly.  I did not move the mouse in this case, but the popup triggered the page load.

Test #2: I picked a URL from my bookmarks and it sat for a few minutes. I then logged in over ssh to the server (from another workstation) and attached strace to the process. While it was locked, there was absolutely no spewage at all; it was doing 'nothing'.  I then moved the mouse and got strace to dump data, which I am attaching.  I don't think this will be useful, however, because at that point FF is working correctly.

When it was deadlocked, I took two pictures of the last UI strings that appear in the browser before it hangs.  Whatever is happening is happening after this code.
Flags: needinfo?(drichard)
Here is the error that happens on OpenSuse 11.4 with mozregression.  I was able to move through the daily builds by hand without this program, but it's hard to pinpoint the exact failure date.

(none):/u # mozregression
Traceback (most recent call last):
  File "/usr/local/bin/mozregression", line 8, in <module>
    load_entry_point('mozregression==0.38', 'console_scripts', 'mozregression')()
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 318, in load_entry_point
    return get_distribution(dist).load_entry_point(group, name)
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 2221, in load_entry_point
    return ep.load()
  File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 1954, in load
    entry = __import__(self.module_name, globals(),globals(), ['__name__'])
  File "/usr/local/lib/python2.7/site-packages/mozregression/main.py", line 18, in <module>
    import mozprofile
  File "/usr/local/lib/python2.7/site-packages/mozprofile/__init__.py", line 13, in <module>
    from addons import *
  File "/usr/local/lib/python2.7/site-packages/mozprofile/addons.py", line 21, in <module>
    module_logger = mozlog.getLogger(__name__)
AttributeError: 'module' object has no attribute 'getLogger'
Is there anything that I can do about this bug?  We're being flooded with support calls from users who think their web pages are locked up.  Our internal payroll software runs in Firefox, and they are having to continually move their mouse to get the pages to load.

I can make videos, it's very simple to replicate and see.
Hey wlach, not sure if you're still the one maintaining mozregression, but do you recognize the error that drichard is getting in comment 15 while attempting to run it on OpenSUSE?
Flags: needinfo?(wlachance)
(In reply to drichard from comment #15)
> Here is the error that happens on OpenSuse 11.4 with mozgression.  I was
> able to move through the daily builds by hand without this program, but it's
> hard to pinpoint exactly the failure date. 
> 
> (none):/u # mozregression
> Traceback (most recent call last):
>   File "/usr/local/bin/mozregression", line 8, in <module>
>     load_entry_point('mozregression==0.38', 'console_scripts',
> 'mozregression')()
>   File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 318, in
> load_entry_point
>     return get_distribution(dist).load_entry_point(group, name)
>   File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 2221, in
> load_entry_point
>     return ep.load()
>   File "/usr/lib/python2.7/site-packages/pkg_resources.py", line 1954, in
> load
>     entry = __import__(self.module_name, globals(),globals(), ['__name__'])
>   File "/usr/local/lib/python2.7/site-packages/mozregression/main.py", line
> 18, in <module>
>     import mozprofile
>   File "/usr/local/lib/python2.7/site-packages/mozprofile/__init__.py", line
> 13, in <module>
>     from addons import *
>   File "/usr/local/lib/python2.7/site-packages/mozprofile/addons.py", line
> 21, in <module>
>     module_logger = mozlog.getLogger(__name__)
> AttributeError: 'module' object has no attribute 'getLogger'

Yeah, I think that's an old bug in mozregression (http://mozilla.github.io/mozregression/news.html#0.38-0.39-release). Try upgrading to the latest one (via pip2 install -U mozregression) and you should be good.
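
In case it helps confirm that the traceback comes from a library mismatch rather than from OpenSUSE itself, here is a minimal sketch of a check. It assumes the problem is simply that the installed mozlog module does not expose getLogger, as the AttributeError suggests; the helper name module_has_attr is hypothetical and not part of mozregression.

```python
import importlib


def module_has_attr(module_name, attr):
    """Return True/False depending on whether the importable module
    exposes `attr`, or None if the module cannot be imported at all."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None
    return hasattr(module, attr)


if __name__ == "__main__":
    # On the affected machine this would be expected to print False,
    # matching the AttributeError in the traceback (an assumption).
    print(module_has_attr("mozlog", "getLogger"))
```

If this reports False before the upgrade and the traceback disappears afterwards, that would be consistent with the version mismatch described in the release notes linked above.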
Flags: needinfo?(wlachance)
Excellent, thanks wlach. drichard - I think we're going to have far more success tracking this down once we have a proper regression range, and mozregression is the tool for the job. Would you mind attempting to update to the latest version of mozregression and seeing if it works?
Flags: needinfo?(drichard)
I have mozregression working and am trying to get an exact regression date.
We have lots of users on our circuit, so conditions vary during the day and it is difficult to get exactly the same internet speed on each test. That said, the regression tester indicates:

199:44.52 LOG: MainThread mozversion INFO platform_version: 39.0a1
Was this inbound build good, bad, or broken? (type 'good', 'bad', 'skip', 'retry', 'back' or 'exit' and press Enter): bad
202:00.39 LOG: MainThread Bisector INFO Narrowed inbound regression window from [d65328c6, 92b51483] (3 revisions) to [d65328c6, 2bf41497] (2 revisions) (~1 steps left)
202:00.39 LOG: MainThread main INFO Oh noes, no (more) inbound revisions :(
202:00.39 LOG: MainThread Bisector INFO Last good revision: d65328c6afe95c977f6670d55c08a8587ca487b9
202:00.39 LOG: MainThread Bisector INFO First bad revision: 2bf4149711c980470a8081cbd71c3da10fe90069
202:00.39 LOG: MainThread Bisector INFO Pushlog:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=d65328c6afe95c977f6670d55c08a8587ca487b9&tochange=2bf4149711c980470a8081cbd71c3da10fe90069

If these changes seem impossible to be causing this issue, I'll do another run at it and see if we can find it.
Flags: needinfo?(drichard)
(In reply to drichard from comment #22)
> We have lots of users on our circuit, so there are variables during the day
> that make this a little more difficult to replicate exactly the same
> internet speed on each test, but using the regression tester, it indicates:
> 
> 199:44.52 LOG: MainThread mozversion INFO platform_version: 39.0a1
> Was this inbound build good, bad, or broken? (type 'good', 'bad', 'skip',
> 'retry', 'back' or 'exit' and press Enter): bad
> 202:00.39 LOG: MainThread Bisector INFO Narrowed inbound regression window
> from [d65328c6, 92b51483] (3 revisions) to [d65328c6, 2bf41497] (2
> revisions) (~1 steps left)
> 202:00.39 LOG: MainThread main INFO Oh noes, no (more) inbound revisions :(
> 202:00.39 LOG: MainThread Bisector INFO Last good revision:
> d65328c6afe95c977f6670d55c08a8587ca487b9
> 202:00.39 LOG: MainThread Bisector INFO First bad revision:
> 2bf4149711c980470a8081cbd71c3da10fe90069
> 202:00.39 LOG: MainThread Bisector INFO Pushlog:
> https://hg.mozilla.org/integration/mozilla-inbound/
> pushloghtml?fromchange=d65328c6afe95c977f6670d55c08a8587ca487b9&tochange=2bf4
> 149711c980470a8081cbd71c3da10fe90069
> 
> If these changes seem impossible to be causing this issue, I'll do another
> run at it and see if we can find it.

I'm afraid neither of the commits in that range (https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=d65328c6afe95c977f6670d55c08a8587ca487b9&tochange=2bf4149711c980470a8081cbd71c3da10fe90069) seems related. :/

Please keep trying.
Doing another regression test, but I did find something interesting.  In this one test of CNN, it always hangs in the area of the white canvas that displays HTML5 video at the top. Flash is neither installed nor used in the regression tester.  The first shot shows the white area that blocks the browser; it sits until I move the mouse, and then the video appears.
OK, here is another regression test, as best I could manage.  There were lots of changes over these days. Interestingly, it happened to pick the day the GTK printing regression bug was introduced, which you fixed with the recent patch.  Maybe something else is going on here with GTK?


Was this nightly build good, bad, or broken? (type 'good', 'bad', 'skip', 'retry', 'back' or 'exit' and press Enter): bad
26:02.19 LOG: MainThread Bisector INFO Narrowed nightly regression window from [2015-03-09, 2015-03-11] (2 days) to [2015-03-09, 2015-03-10] (1 days) (~0 steps left)
26:02.19 LOG: MainThread main INFO Got as far as we can go bisecting nightlies...
26:02.19 LOG: MainThread Bisector INFO Last good revision: eab4a81e4457
26:02.19 LOG: MainThread Bisector INFO First bad revision: 6686aacf006f
26:02.19 LOG: MainThread Bisector INFO Pushlog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=eab4a81e4457&tochange=6686aacf006f
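
For anyone re-checking these ranges by hand, a minimal sketch of how the pushlog URL relates to the "Last good" and "First bad" revisions in the log above. The URL layout is copied from the links in this bug; the helper names pushlog_url and parse_pushlog_url are hypothetical and not part of mozregression.

```python
from urllib.parse import parse_qs, urlencode, urlparse


def pushlog_url(repo_path, last_good, first_bad):
    """Build a pushloghtml URL for the range last_good..first_bad.

    repo_path is the repository path on hg.mozilla.org, e.g.
    "mozilla-central" or "integration/mozilla-inbound".
    """
    base = "https://hg.mozilla.org/{}/pushloghtml".format(repo_path.strip("/"))
    query = urlencode({"fromchange": last_good, "tochange": first_bad})
    return base + "?" + query


def parse_pushlog_url(url):
    """Return (last_good, first_bad) extracted from a pushloghtml URL."""
    query = parse_qs(urlparse(url).query)
    return query["fromchange"][0], query["tochange"][0]


if __name__ == "__main__":
    # Revisions taken from the nightly bisection output above.
    print(pushlog_url("mozilla-central", "eab4a81e4457", "6686aacf006f"))
```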
Bug 1128934 looks potentially suspect.

drichard - would you mind setting layers.offmainthreadcomposition.enabled to false, restarting, and seeing if you can still reproduce the bug? If so, that bolsters the case that some OMTC code is at fault here.
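
If flipping the pref by hand in about:config is awkward (for example when it has to be applied to many profiles), a minimal sketch of doing it through a user.js file follows. It assumes the pref is picked up from user.js in the profile directory on the next start; the helper names and the profile path are hypothetical.

```python
from pathlib import Path

PREF_NAME = "layers.offmainthreadcomposition.enabled"


def pref_line(name, value):
    """Format a boolean pref in the user_pref() syntax used by user.js."""
    return 'user_pref("{}", {});'.format(name, "true" if value else "false")


def set_omtc_pref(profile_dir, enabled=False):
    """Write the OMTC pref into user.js inside profile_dir.

    Any earlier line mentioning the same pref is dropped first, so the
    file ends up with exactly one setting for it. Returns the file path.
    """
    user_js = Path(profile_dir) / "user.js"
    lines = []
    if user_js.exists():
        lines = [
            line for line in user_js.read_text().splitlines()
            if PREF_NAME not in line
        ]
    lines.append(pref_line(PREF_NAME, enabled))
    user_js.write_text("\n".join(lines) + "\n")
    return user_js


if __name__ == "__main__":
    # Hypothetical profile path; replace with a real profile directory.
    print(pref_line(PREF_NAME, False))
```

Firefox needs a restart after the file is written, the same as when the pref is changed by hand.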
Flags: needinfo?(drichard)
Ok, I changed layers.offmainthreadcomposition.enabled to false in our global settings and so far after about 30 minutes, I have not been able to get Firefox to lock or halt when navigating between pages.  I am finding that page loading is crisper and faster now and has a different feel.   Mike you are awesome, I feel 90% sure this is it.  

We do have a global prefs.js file which we push to users when they launch FF, but this bug was happening on the regression tester which wipes all settings and resets the session each time it's run.

We are running FF over NX technology which allows it to run on thin clients.  Our users have no local video cards in the traditional workstation sense where FF is running locally.  

So is the best approach to re-open 1128934 and alert them of this issue?
Flags: needinfo?(drichard)
Cc'ing kats and nical from bug 1128934.

nical - any idea what might be going wrong here with drichard's setup?
Flags: needinfo?(nical.bugzilla)
No problems now having this setting changed for 3 hours.  I would have seen many lockups by now for sure.
CC'ing :karlt (who I think you were looking for, instead of me) and :Bas. This does sound vaguely familiar, I remember a bug when we turned on OMTC on windows where stuff wouldn't happen unless the mouse was moving. Bas/nical would know more.
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #32)
> CC'ing :karlt (who I think you were looking for, instead of me)

Hah, yep - thanks, and sorry for the noise.
Ah, it was bug 1105386 (thanks markus for finding it).
See Also: → 1105386
Here is the xdpyinfo for the Xserver used with NX.  NX is heavily used in enterprise deployments.  It creates a stateless Xserver that can be resumed from remote devices.  It's similar to VNC but much faster.
(In reply to Mike Conley (:mconley) - Needinfo me! from comment #8)
> Sounds like the mouse movement is causing a refresh driver kick, which is
> otherwise not occurring. That's my guess based on what you're saying.

Yes, my suspicions would be in line with that or an invalidation problem.
OMTC doesn't use GTK for invalidation, so I don't know how to log that.
Nical may have some suggestions.
See Also: → 1193520
Lowering "importance" because we now have a work around with the setting change.  100% for sure, this setting has made this issue go away.  I'd still love to figure it out and be able to get it fixed long term.
Severity: major → normal
If I remember correctly, there were two workarounds (either of which would have the same effect):
1. Set layers.offmainthreadcomposition.enabled to false in about:config (as mentioned in a comment above).
2. Recompile Firefox with --disable-system-cairo (and keep layers.offmainthreadcomposition.enabled set to true, the default, or set it to false; both work the same).

Both workarounds worked the same (for me) for https://bugzilla.mozilla.org/show_bug.cgi?id=1193520 (which I believe is an issue with the same cause).
I am clearing my overdue needinfos. If I haven't answered within a few days, it means I didn't find any useful information to bring but thought I'd find the time to come back and investigate, which usually never happens. Transferring this one to Lee in case it rings a bell, since this looks like it could be related to racy interactions with the X server.
Flags: needinfo?(nical.bugzilla) → needinfo?(lsalzman)
Could you check if the issue still happens in a nightly build?

Also, as hinted at by Emanuel, if we can confirm that this is indeed related to builds using system Cairo, then that would be good. We are instead encouraging using our in-tree Cairo (which means building with --disable-system-cairo, the default for builds now), so official builds hypothetically wouldn't have this issue anymore were this the culprit.
Flags: needinfo?(lsalzman) → needinfo?(drichard)
Build ID: 20151124030553
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0

Hi, 

I have tested this on Nightly 45.0a1 (2015-11-24) with the steps from the reporter's description and I can't reproduce the problem. Maybe my comment will help you with this bug.
(In reply to ovidiu boca from comment #41)
> Build ID: 20151124030553
> User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
> Firefox/45.0
> 
> Hi, 
> 
> I have tested this on Nightly 45.0a1(2015-11-24) with the steps from the
> reporter description and I can't reproduce the problem. Maybe my comment
> will help you with this bug.

Please post the contents of about:buildconfig to see if --disable-system-cairo was used (most likely?), in which case it's normal that you cannot reproduce (I'd say). Thanks.
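
A minimal sketch of the check being asked for, given the "Configure arguments" string copied out of about:buildconfig. The --disable-system-cairo flag is the one named in this bug; treating --enable-system-cairo as its counterpart is an assumption, and the function names are hypothetical.

```python
import shlex


def configure_flags(configure_args):
    """Split a configure-arguments string into a set of individual flags."""
    return set(shlex.split(configure_args))


def system_cairo_setting(configure_args):
    """Classify a build by the cairo flag found in its configure arguments.

    Returns "system" if --enable-system-cairo is present (assumed flag),
    "in-tree" if --disable-system-cairo is present, and "unspecified" if
    neither appears, in which case the build's default applies.
    """
    flags = configure_flags(configure_args)
    if "--enable-system-cairo" in flags:
        return "system"
    if "--disable-system-cairo" in flags:
        return "in-tree"
    return "unspecified"


if __name__ == "__main__":
    # Paste the real configure arguments from about:buildconfig here.
    print(system_cairo_setting("--enable-release --disable-system-cairo"))
```

An "unspecified" result only says that neither flag was passed explicitly; which cairo that build actually uses then depends on the default for that version.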
Hi, 

Here are the contents of "about:buildconfig":



about:buildconfig
Source

Built from https://hg.mozilla.org/mozilla-central/rev/45273bbed8efaface6f5ec56d984cb9faf4fbb6a
Build platform
target
x86_64-unknown-linux-gnu
Build tools
Compiler 	Version 	Compiler flags
/usr/bin/ccache /builds/slave/m-cen-l64-ntly-000000000000000/build/src/gcc/bin/gcc 	4.7.3 	-Wall -Wempty-body -Wpointer-to-int-cast -Wsign-compare -Wtype-limits -Werror=char-subscripts -Werror=comment -Werror=endif-labels -Werror=enum-compare -Werror=ignored-qualifiers -Werror=int-to-pointer-cast -Werror=multichar -Werror=nonnull -Werror=pointer-arith -Werror=pointer-sign -Werror=return-type -Werror=sequence-point -Werror=trigraphs -Werror=uninitialized -Werror=unknown-pragmas -Wno-unused -Wcast-align -Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=coverage-mismatch -Wno-error=free-nonheap-object -std=gnu99 -fgnu89-inline -fno-strict-aliasing -ffunction-sections -fdata-sections -fno-math-errno -pthread -pipe
/usr/bin/ccache /builds/slave/m-cen-l64-ntly-000000000000000/build/src/gcc/bin/g++ 	4.7.3 	-Wall -Wempty-body -Woverloaded-virtual -Wsign-compare -Wwrite-strings -Werror=endif-labels -Werror=int-to-pointer-cast -Werror=missing-braces -Werror=parentheses -Werror=pointer-arith -Werror=return-type -Werror=sequence-point -Werror=switch -Werror=trigraphs -Werror=type-limits -Werror=uninitialized -Werror=unused-label -Wno-invalid-offsetof -Wcast-align -Wno-error=maybe-uninitialized -Wno-error=deprecated-declarations -Wno-error=array-bounds -Wno-error=coverage-mismatch -Wno-error=free-nonheap-object -fno-exceptions -fno-strict-aliasing -fno-rtti -ffunction-sections -fdata-sections -fno-exceptions -fno-math-errno -std=gnu++0x -pthread -D_GLIBCXX_USE_CXX11_ABI=0 -pipe -DNDEBUG -DTRIMMED -g -fprofile-use -fprofile-correction -Wcoverage-mismatch -O3 -fno-omit-frame-pointer -Werror
Configure arguments

--enable-update-channel=nightly --enable-update-packaging --with-google-api-keyfile=/builds/gapi.data --with-google-oauth-api-keyfile=/builds/google-oauth-api.key --with-mozilla-api-keyfile=/builds/mozilla-desktop-geoloc-api.key --enable-crashreporter --enable-release --enable-elf-hack --enable-stdcxx-compat --enable-default-toolkit=cairo-gtk3 --enable-warnings-as-errors --enable-profiling --enable-verify-mar --with-branding=browser/branding/nightly --with-ccache
 Version	40.0
Build ID 	20150807085045
Update History 	
Update Channel 	release
User Agent 	Mozilla/5.0 (X11; Linux i686; rv:40.0) Gecko/20100101 Firefox/40.0

 Version 	43.0.4
Build ID 	20160105164030
Update History 	
Update Channel 	release
User Agent 	Mozilla/5.0 (X11; Linux i686; rv:43.0) Gecko/20100101 Firefox/43.0

I tried to reproduce this issue with these versions, but I couldn't.
Any updates on this problem?
Component: Untriaged → Event Handling
Product: Firefox → Core
Component: Event Handling → User events and focus handling
Severity: normal → S3