1274597 - Clicking on links sometimes doesn't work, long-pressing results in text selection instead of context menu

Reporter

Description

•

8 years ago

It seems like sometimes Firefox gets into a situation where clicks on links (both short and long) aren't recognised any more:

(In reply to Ryan VanderMeulen [:RyanVM] from bug 1231570 comment #29)
> I'm guessing this is why I'm finding it nearly impossible to click links
> with today's Aurora nightly?

(In reply to Ryan VanderMeulen [:RyanVM] from bug 1231570 comment #31)
> Yes. HTC M8 w/ Marshmallow. I see it most easily on Reddit. When I try to
> long-click a link, I get selection handles (and the associated menu) instead
> of the context menu letting me open the link in a new tab. Oddly, I was
> unable to even short-click links at that point either.

(In reply to Jan Henning [:JanH] from bug 1231570 comment #32)
> I've seen something similar (short-clicking not working and long-clicking
> only displaying the selection handles) very recently on a local build,
> although in my case it was on a plain HTTP server directory listing page. A
> restart fixed it, although I can't quite remember whether restarting Fennec
> was enough, or if I had to actually reboot my phone... apart from that
> incident, I haven't encountered any difficulties clicking links.

Ryan VanderMeulen [:RyanVM]

Comment 1

•

8 years ago

Like Jan, I force-killed and restarted Fennec and I could no longer reproduce. We'll see if it returns I guess.

Kevin Brosnan [Ex-Mozilla]

Comment 2

•

8 years ago

This is really bad and I've seen this and did not realize we closed bug 1265251 which may be the first report of this. Might be APZ?

tracking-fennec: --- → ?

status-firefox47: --- → ?

status-firefox48: --- → affected

tracking-firefox48: --- → ?

tracking-firefox49: --- → ?

Flags: needinfo?(snorp)

Marcia Knous [:marcia]

Comment 3

•

8 years ago

Sounds reasonable to track this for 48/49 until we figure out what is going on.

tracking-firefox48: ? → +

tracking-firefox49: ? → +

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 4

•

8 years ago

-> rbarker

Assignee: nobody → rbarker

tracking-fennec: ? → 48+

Flags: needinfo?(snorp)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 5

•

8 years ago

I ran into it this morning again, on Nightly. Switching tabs to about:config, checking the value of some prefs, and then switching back seemed to fix it. I'm going to start running local builds so that if it happens again I can try to gdb it.

Ryan VanderMeulen [:RyanVM]

Updated

•

8 years ago

Updated

•

8 years ago

Blocks: fennec-aboard-apz

status-firefox47: ? → unaffected

status-firefox50: --- → affected

Component: General → Graphics, Panning and Zooming

Priority: -- → P1

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

8 years ago

Keywords: regression

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

8 years ago

Summary: Clicking on links sometimes doesn't work → Clicking on links sometimes doesn't work, long-pressing results in text selection instead of context menu

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 8

•

8 years ago

I have my local Fennec build in this state now. From the logging I added and gdb investigation everything *appears* to be working correctly in terms of coordinate flow, but I think the hit test is failing for some reason. In particular:

- The page is loaded and zoomed in to a resolution of ~2.2x
- If I tap on a particular link and inspect the coordinates the widget creates and puts into the MultiTouchInput as a ScreenIntPoint, they are in the neighbourhood of (100, 480) depending on exactly where my finger lands.
- The APZ untransform is a no-op, so the touch events and single-tap message sent to gecko all have the same LayoutDevice coordinates of (100, 480)
- The APZ callback transform is empty, so that's not affecting this at all.
- The division at [1] is doing nothing (which I believe is correct on Fennec - that was intended for B2G IIRC)
- During hit-testing the PET code at [2] is getting what appears to be the correct aPointRelativeToRootFrame of (3000, 14400) app units (which corresponds to 100,480 with a 2x devicePixelRatio).
- During the hit-test I see that point getting RemoveResolution'd at [3]
- HOWEVER, the result of the hit-test doesn't appear to be correct. It seems to find the body element as the target, rather than the link that I'm actually tapping on.

- I also inspected the coordinates at various other points and I think they all seem to be correct based on my understanding of how the resolution removal works. For example at [4] the coordinates that get logged are (50,240) which is correct for resolution-applied CSS coordinates.
- In order to trace through the hit-testing code, I need to enable display-list dumping or ideally turn on the dumping at [5]. Unfortunately I did an opt build without paint dumping, so I can't make that dumping work. And I can't seem to attach the devtools (filed a bug for that) which would allow me to turn on display-list dumping via JS APIs. If I open about:config or switch tabs at all I know this state will get cleared so I won't be able to debug further. For now I'm leaving it in this state, if anybody has any ideas on what I can do let me know. Otherwise I might just have to do another build with paint-dumping enabled and wait until the next time I can reproduce this.

Also I feel like the results I have seen so far are consistent with the observable symptoms, which gives me a little bit of confidence that I'm on the right track. The bad hit-test means that the _highlightElement in browser.js gets set to the body, which is why long-pressing doesn't give a context menu. I think the mouse events that constitute the click are also probably getting the wrong target (the body) and so clicks have no effect. Text selection still works because it uses the eMouseLongTap event point rather than an event target (I think).

[1] http://searchfox.org/mozilla-central/rev/bfcc10319e4e3ce78367fa9bba9316f7eb5248b6/gfx/layers/apz/util/APZCCallbackHelper.cpp#404
[2] http://searchfox.org/mozilla-central/source/layout/base/PositionedEventTargeting.cpp#547
[3] http://searchfox.org/mozilla-central/rev/bfcc10319e4e3ce78367fa9bba9316f7eb5248b6/layout/base/nsDisplayList.cpp#4959
[4] http://searchfox.org/mozilla-central/rev/bfcc10319e4e3ce78367fa9bba9316f7eb5248b6/gfx/layers/apz/util/APZEventState.cpp#172
[5] http://searchfox.org/mozilla-central/rev/bfcc10319e4e3ce78367fa9bba9316f7eb5248b6/layout/base/nsLayoutUtils.cpp#3176

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 9

•

8 years ago

so my understanding of how things were supposed to work was wrong. The call to GetEventCoordinatesRelativeTo is supposed to do a RemoveResolution on the point, so that when it goes through the display list build and hit-test the coordinates already have the resolution removed. At least that's what I'm seeing now on a "working normally" Fennec build. So for some reason we seem to get into a state where after having called GetEventCoordinatesRelativeTo, the coordinates still have the resolution applied.

I'm still unable to find a good way to reproduce this behaviour, so it's hard to debug. I did find a bug in the touch-action support while investigating though, will file another bug for that.

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Updated

•

8 years ago

status-firefox48: affected → wontfix

Kevin Brosnan [Ex-Mozilla]

Comment 10

•

8 years ago

I am really scared that we are shipping this in 48. I know that investigation has been fruitless so far :-(

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 11

•

8 years ago

I have my device in this state again. The logging seems to indicate that during the course of a long-press, there are many calls to GetEventCoordinatesRelativeTo, some of which go through the code path at [1] and some of which go through [2]. When it goes through [1], the resolution is 1.0 and at [2] the resolution is correctly reflecting my zoom level. The main-thread hit-testing seems to occur mostly with the coordinates returned by [1], and so the display items that get built are for something not directly under the finger.

[1] http://searchfox.org/mozilla-central/rev/ff0e3782b1c44ffd1a68ef8e9e4f8be3866741e2/layout/base/nsLayoutUtils.cpp#2234
[2] http://searchfox.org/mozilla-central/rev/ff0e3782b1c44ffd1a68ef8e9e4f8be3866741e2/layout/base/nsLayoutUtils.cpp#2273

DrBill

Comment 12

•

8 years ago

I've found that the scrolling behavior with nightly on marketwatch.com mobile site has been suspect since they revamped it a few months ago.  Specifically, it seems like when you attempt to scroll down the page, you can "miss" the page or something, and the content won't scroll at all or it will scroll 1-3px and then just stop scrolling even though you keep moving your finger.  After this "miss-scroll" occurs, bug 1274597 seems to occur (more?) and it seems to select the text of the link rather than try to hit the link itself.  Just some behavior I've been seeing for the last 4+ weeks on nightly w/apz.  Further, you can try to scroll again and scrolling will work on the 2nd+.  Bug 1274597 then does not seem to occur as much(?), but it still does sometimes.  

Try the following:
1. marketwatch.com
2. Scroll down the page about 1-inch with your finger.
   Note how the page will scroll just a few pixels.
3. Long tap a link.  It doesn't go, text selected instead.

Refresh the page, rinse, repeat to do it again.  I hope this helps; this bug is aggravating.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 13

•

8 years ago

Thanks for the suggestions, but unfortunately even with that I'm not able to reproduce the problem.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

8 years ago

Blocks: apz-fennec
No longer blocks: fennec-aboard-apz

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 14

•

8 years ago

Attached file Logging — Details

I ran into this again. I don't have much time to look into this now since I'm going on PTO but I'm attaching a log which covers me long-pressing on a link and it getting selected instead of showing the context menu. The patch which I used to generating the logging is at [1].

It looks like nsLayoutUtils::GetEventCoordinatesRelativeTo is getting called with a frame 0x91b2d740 that is the "reference frame" (it shows up as such in the display list dump). The code at [2] gets the presShell from the frame and then gets the APZ resolution scale from that, but that's coming out as 1.0. This results in the event not being transformed properly and so the hit-test returns the wrong thing.

In the log you can also see other calls to nsLayoutUtils::GetEventCoordinatesRelativeTo where the frame is 0x8b421620 (this is the ViewportFrame, according to the display list dump). In that case the frame's view's widget is 0x0 and it ends up falling through to [3] where it correctly gets 1.547295 as the resolution scale and unapplies it. However it looks like those calls are not the ones that are part of a hit-test (because there's no display list dump immediately after).

I *think* the upshot of all this is that we're using the wrong presShell at [2], at least in some cases, where the frame being passed in is the reference frame.

[1] https://github.com/staktrace/gecko-dev/commit/e58c4fe58d173ad606d7fb2d47cf36fcfb11ded4
[2] http://searchfox.org/mozilla-central/rev/d9a04f71630ce4203ff0a5e26722723045d035b5/layout/base/nsLayoutUtils.cpp#2230
[3] http://searchfox.org/mozilla-central/rev/d9a04f71630ce4203ff0a5e26722723045d035b5/layout/base/nsLayoutUtils.cpp#2273

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 15

•

8 years ago

Attached file Logging (working version, for comparison) — Details

Also, I did a tab switch to clear the bad state, and repeated the long-press. This is the resulting log. Note how in this one GECRT never gets called with the frame 0x91b2d740.

Botond Ballo [:botond]

Comment 16

•

8 years ago

(In reply to (away Aug5-14) Kartikaya Gupta (email:kats@mozilla.com) from comment #14)
> I *think* the upshot of all this is that we're using the wrong presShell at
> [2], at least in some cases, where the frame being passed in is the
> reference frame.
> 
> [2]
> http://searchfox.org/mozilla-central/rev/
> d9a04f71630ce4203ff0a5e26722723045d035b5/layout/base/nsLayoutUtils.cpp#2230

cc'ing Timothy; I have a recollection of us talking about this early exit from GECRT in London.

Comment hidden (offtopic)

I started noticing similar bugged behavior on desktop(!) nightly sometime earlier this week.  It seems to be related to the enabling of some a11y text-selection functionality.

I tried long-tapping a link in desktop nightly and I can get this to fire pretty often, almost every time on certain links.  The behavior is slightly different on desktop, however.  It seems that first the text under the finger is selected, followed by the appearance of the context menu.  I didn't check to see if there was a bug filed for this issue specifically, but I was assuming this desktop nightly behavior is related to this bug and apz overall.

Comment hidden (offtopic)

(In reply to DrBill from comment #17)

Sorry for the late response, I was away for a bit. I think the behaviour you are seeing on desktop is expected - UX requested [1] that when long-pressing on desktop we do both the text selection and the context menu. Regardless, I think it is unrelated to this bug, because based on my understanding of this bug it is specific to platforms on which you can pinch-zoom and we don't support zooming on desktop. If you are encountering behaviour on desktop that you think is defective please file a new bug for it.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1195722#c7

Milan Sreckovic [:milan] (needinfo for best results)

Updated

•

8 years ago

Version: Trunk → 48 Branch

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 19

•

8 years ago

I looked at the logs I posed in comment 14 and comment 15 again, and noticed that in the bad case, where the 0x91b2d740 is used as the GECRT frame, the hit-testing display list dump is actually dumping the root XUL document rather than the HTML document. So 0x91b2d740 is actually the reference frame for the XUL document while 0x8b421620 is the reference frame for the HTML document, and what we want to be targeting. So the root cause is somewhere earlier from that point, probably in PresShell::HandleEvent, where we pick the wrong document as the target. Will need to add more logging and run this again.

Liz Henry (:lizzard) (relman/hg->git project)

Comment 20

•

8 years ago

kats, would it be useful to have more attempts to reproduce and log or find STR?
ioana, can your team try to reproduce this issue?

status-firefox49: affected → wontfix

status-firefox51: --- → affected

Flags: needinfo?(ioana.chiorean)

Flags: needinfo?(bugmail)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 21

•

8 years ago

Yes, finding reliable STR would be quite helpful. I've only been able to hit randomly (and not at all in the last couple of weeks) so debugging it is very slow.

Flags: needinfo?(bugmail)

Ioana Chiorean

Comment 22

•

8 years ago

Attached image Long_tap.png — Details

I am able to reproduce it both on Reddit and using steps from Comment #12 aroudn 1 out of 4/5 times. 
I can reproduce easily using end of string or even near proximity ( this last case might be intended)

see screens

Flags: needinfo?(ioana.chiorean)

Ioana Chiorean

Comment 23

•

8 years ago

Attached file v2_nexus.txt — Details

Logs

Ioana Chiorean

Comment 24

•

8 years ago

Attached file v1_Nexus.txt — Details

Ioana Chiorean

Comment 25

•

8 years ago

Hope it helps - but again I am not sure I'm reproducing it clean all times. 
I can have a look tomorrow morning too.

Flags: needinfo?(bugmail)

DrBill

Comment 26

•

8 years ago

This is exactly what I was able to do in Comment 12. Notice in the screenshot, however, that the selected text is specifically at the end of the string in both cases. I can accomplish the same thing to any string of text, whether or not it is a hyperlink, simply by long-tapping somewhere in the small amount of white space just after the last character in the string, and it happens 100% of the time, but I don't think it actually reproduces the bug being described here. I believe it is correct behavior to highlight the text of a link if the browser believes the text is being selected for text manipulation rather than being invoked to perform the link action. But I could be very wrong here too.

I dropped off the radar because ever since Comment 12, the thing I've been trying to reproduce is to get a token which isn't the first or last token in the hyperlink string to be selected, proving this bug's existence and sidestepping any issues from my above description, whether or not it is true. And again, maybe this is wrongful thinking.

After a seemingly week+ hiatus, this behavior just happened again to me today using some email unsubscribe page (I was on yesterday's Nightly at the time). A word in the center of the hyperlink string was selected when I attempted to long-tap the link, which was surely an occurrence of this bug. I wasn't really paying attention at the time because it was early morning, but the things I remember were that:
1) I did a single zoom-in gesture on the page immediately after it finished loading and before I selected the link.
2) I began selecting the link almost immediately after finishing the zoom gesture, i.e. I was navigating quickly.
3) I don't remember exactly, but I'm pretty sure the dynamic toolbar was displayed in its entirety the whole time I was on the page, and I do have full screen browsing turned on.
4) All of this happened in about 1 second between the time the page finished loading and the time the bug occurred/hyperlink text was selected.

Try and see if you can reproduce where some token in the middle of the hyperlink string is selected. I think that's what we're looking for. It doesn't even have to be text! I've seen picture elements and radio buttons get selected as well.

Other things I've noticed: It seems this bug likes to occur when you start to see some fps drop or see other kinds of jank while panning or zooming, or the behavior I described above where you can "miss" scrolling the page. I called out marketwatch.com because that site was a pig at the time. I'll sometimes have to select a link for a very long time before the context menu appears as well (2+ seconds?). With marketwatch.com, I was usually lucky in reproducing this bug because the browser showed both of these symptoms in a significant way and just so happened to notice that I was able to reproduce this bug more frequently on that site at the time. The weird scrolling behavior no longer occurs there and I haven't been able to reproduce this bug on that site in a few weeks. I'd be happy to file bugs for any behavior I've called out above.

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Updated

•

8 years ago

tracking-fennec: 48+ → 50+

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 27

•

8 years ago

(In reply to Ioana Chiorean from comment #25)
> Hope it helps - but again I am not sure I'm reproducing it clean all times. 
> I can have a look tomorrow morning too.

What DrBill said in comment 26 is basically correct - long-pressing outside a link will generally do text-selection. It's when you tap directly on a link and it doesn't work, or if you long-press directly on a link and it does text selection instead of showing the context menu, that you're seeing the same problem.

At this point logs probably won't help much because the logs won't contain any necessary information, but reliable STR would be great.

Flags: needinfo?(bugmail)

Sotaro Ikeda [:sotaro]

Comment 28

•

8 years ago

This month, I saw the problem several times in the following STR with firefox 48 on my nexus 9.

[Prerequisite]
- Open http://syosetu.com/ and log in to an account.
    I already registered some novels to read. Updated novels appear left of the page when I log in.
- I always keep Firefox and the tab open.

[STR]   
[1] Touch page update icon in the toolbar. Then latest updated novel appear.
[2] Long press a link of a latest story. And select "Open Link in New Tab".
    I did it for all new updated stories.
[3] Move to a Tab and read a story. When I end to read it, close the tab

Continue [1]-[3] several times in a day. There was a case that long push did not work for a link in [2].

Sotaro Ikeda [:sotaro]

Comment 29

•

8 years ago

(In reply to Sotaro Ikeda [:sotaro] from comment #28)
> Continue [1]-[3] several times in a day. There was a case that long push did
> not work for a link in [2].

In this case text selection just happened.

Ioana Chiorean

Comment 30

•

8 years ago

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #27)
>  but reliable STR would be great.

For STR I mentioned same as DrBill's in comment #12:
1. marketwatch.com
2. Scroll down the page about 1-inch with your finger.
   Note how the page will scroll just a few pixels.
3. Long tap a link.  It doesn't go, text selected instead
or simply go to Reddit and long press any link from the page at a end or where some white space s ( when a title is on 2 lines etc)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 31

•

8 years ago

I wasn't able to reproduce this on marketwatch.com, although I just ran into it after loading https://bugzilla.mozilla.org/show_bug.cgi?id=1296887 in a local Fennec build. Unfortunately the local build didn't have my logging, but I was able to use gdb and I discovered that during hit-testing, the call at [1] is returning null. That's why the events are getting sent to the root XUL document instead of the content document. And that function in turn is returning null because the primaryContentShell is null at [2]. According to the comment just above that, that would indicate there is no "active tab". Not sure how we can get into that state but at least it's another step in the investigation.

[1] http://searchfox.org/mozilla-central/rev/3611a7607db7554b9d7065ebfef3d5111f7171b8/layout/base/nsPresShell.cpp#7688
[2] http://searchfox.org/mozilla-central/rev/3611a7607db7554b9d7065ebfef3d5111f7171b8/layout/base/nsPresShell.cpp#8246

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 32

•

8 years ago

I forgot to mention - the last time I saw this, I also checked BrowserApp.selectedTab and BrowserApp.selectedBrowser using the JS debugger. And it looked like selectedBrowser had type="content-targetable" instead of the type="content-primary" that I would expect based on [1].

[1] http://searchfox.org/mozilla-central/rev/3582398bc9db5a83e25c2a8299cc780a52ad7ca8/mobile/android/chrome/content/browser.js#3689

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 33

•

8 years ago

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #32)
> I forgot to mention - the last time I saw this, I also checked
> BrowserApp.selectedTab and BrowserApp.selectedBrowser using the JS debugger.
> And it looked like selectedBrowser had type="content-targetable" instead of
> the type="content-primary" that I would expect based on [1].
> 
> [1]
> http://searchfox.org/mozilla-central/rev/
> 3582398bc9db5a83e25c2a8299cc780a52ad7ca8/mobile/android/chrome/content/
> browser.js#3689

The first line there causes the function to bail out if the docShell is null. Is that possible? My guess is that it is possible if someone decided to guard against it. The result would be as you observed, an 'active' tab that isn't really active.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 34

•

8 years ago

I have it in this state again. I started Fennec, which restored the one tab I had from last time (my bugmail dashboard). I reloaded it, and then after some panning around, long-pressed on a link to open it in a new background tab. After that point it was in the bad state. Using the JS debugger I see that both tabs are returning false from getActive(). This is on a local build where I added logging into Tab.setActive, and I don't see that logging anywhere in my logcat, which seems to indicate Tab.setActive was never called at all. That's bizarre - it should have at least gotten called from the Tab.create function.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 35

•

8 years ago

Ugh. It looks like my logcat is incomplete, because of http://stackoverflow.com/questions/34587273/android-logcat-logs-chatty-module-line-expire-message

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 36

•

8 years ago

Perhaps unrelated, but while I was running android reftests I saw this in the logcat:

E/GeckoConsole(11112): [JavaScript Error: "TypeError: this.browser.docShell is undefined" {file: "chrome://browser/content/browser.js" line: 3481}]
E/GeckoConsole(11112): Tab.prototype.create@chrome://browser/content/browser.js:3481:5
E/GeckoConsole(11112): Tab@chrome://browser/content/browser.js:3403:3
E/GeckoConsole(11112): addTab@chrome://browser/content/browser.js:1191:18
E/GeckoConsole(11112): BrowserApp.observe@chrome://browser/content/browser.js:1719:11
E/GeckoConsole(11112): [JavaScript Error: "TypeError: aTab is null" {file: "chrome://browser/content/browser.js" line: 1366}]
E/GeckoConsole(11112): _handleTabSelected@chrome://browser/content/browser.js:1366:5
E/GeckoConsole(11112): BrowserApp.observe@chrome://browser/content/browser.js:1733:9
E/GeckoConsole(11112): [JavaScript Error: "TypeError: aBrowser is null" {file: "resource://gre/modules/PrivateBrowsingUtils.jsm" line: 37}]
I/Gecko   (11112):

The first error indicates that browser.docShell can be undefined at http://searchfox.org/mozilla-central/rev/124c120ce9d7e8407c9ad330a9d6b1c4ad432637/mobile/android/chrome/content/browser.js#3481 during tab creation. I don't know if this is an artifact of the reftest environment or something that can happen in production as well.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 37

•

8 years ago

Hit it again. My logging is still incomplete, but I see that the tab is in fact deactivated by browser.js. I see this in the log:

09-11 06:05:14.637 D/GeckoAppShell(27012): Notification client already set
09-11 06:05:14.652 D/GeckoBrowser(27012): staktrace: setting tab active true
09-11 06:05:14.652 D/GeckoBrowser(27012): setActive@chrome://browser/content/browser.js:3718:11
09-11 06:05:14.652 D/GeckoBrowser(27012): ao_observe@chrome://browser/content/browser.js:6606:7
09-11 06:05:14.654 D/GeckoScreenOrientation(27012): unlocking
09-11 06:05:14.679 D/GeckoBrowser(27012): staktrace: setting tab active false
09-11 06:05:14.679 D/GeckoBrowser(27012): setActive@chrome://browser/content/browser.js:3718:11
09-11 06:05:14.679 D/GeckoBrowser(27012): ao_observe@chrome://browser/content/browser.js:6606:7

which is the last thing to change a tab's state - so something is dispatching a application-foreground quickly followed by a application-background notification, and the selectedtab is activated and then deactivated. This happened while Fennec was in the foreground and I was interacting with it (although I don't recall if maybe I got some system notification popup at this time).

The "Notification client already set" logging is coming from GeckoService.onCreate(), which does in fact dispatch the application-foreground message via GeckoThread.onResume(). It seems logical to assume that the application-background message is coming from GeckoService.onDestroy() via GeckoThread.onPause(). I've added more logging to catch this if happens. snorp, do you know what this GeckoService is, and what causes it to get created/destroyed? And in particular, if it might get created/destroyed while in the middle of interacting with Fennec?

Flags: needinfo?(snorp)

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 38

•

8 years ago

Interesting. GeckoService is used to handle some background stuff like updating addons or showing a persistent notification while Fennec isn't running.

The onResume/onPause stuff in GeckoThread ends up in nsAppShell[0] which keeps a count. Apparently we have an unbalanced pause/resume, leading to nsAppShell thinking we are 'paused' when we are not. A quick glance suggests that we are missing an initial GeckoThread.onResume() call when Fennec is started. Without this, GeckoService starting and stopping yields a 'paused' state. Jim does this sound right to you?

[0] https://dxr.mozilla.org/mozilla-central/source/widget/android/nsAppShell.cpp?q=android%2FnsAppShell.cpp&redirect_type=direct#141

Flags: needinfo?(snorp) → needinfo?(nchen)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 39

•

8 years ago

So it looks like the problem is that the sPauseCounter in nsAppShell isn't set up to track multiple resumes (i.e. it's a one-way counter, rather than two way).

So on startup, sPauseCount is 0.
Then GeckoThread.onResume() is called by GeckoService.onCreate, and we go through [1]. Since sPauseCount is 0 we return immediately without doing anything.
Then GeckoThread.onPause() is called by GeckoService.onDestroy, and we go through [2]. Since sPauseCount is 0, it gets incremented to 1 and we send the application-background message.

As snorp pointed out on IRC, this can be reproduced by sticking in calls to GeckoThread.onResume(); GeckoThread.onPause(); somewhere in a code path that can be executed on-demand. Doing so reproduces the problem.

I think it's a relatively simple fix to make sPauseCount signed instead of unsigned and update the conditions so that it can track multiple resumes as well as multiple pauses.

[1] http://searchfox.org/mozilla-central/rev/591354c5e5521cf845bf6b6ef44e8b3b0aeda17d/widget/android/nsAppShell.cpp#173
[2] http://searchfox.org/mozilla-central/rev/591354c5e5521cf845bf6b6ef44e8b3b0aeda17d/widget/android/nsAppShell.cpp#141

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/78240/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/78240/

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

8 years ago

Assignee: rbarker → bugmail

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

8 years ago

Flags: needinfo?(nchen)

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 41

•

8 years ago

mozreview-review

Comment on attachment 8790371 [details]
Bug 1274597 - Update the pause counter in nsAppShell to track multiple resumes as well multiple pauses.

https://reviewboard.mozilla.org/r/78240/#review77254

The patch looks correct, but I don't think we should actually be allowing multiple pause/resume calls. The app is either in the foreground or background, and there should be one place managing that. We (or just myself) can rework that in a followup.

Attachment #8790371 - Flags: review?(snorp) → review+

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 42

•

8 years ago

(In reply to James Willcox (:snorp) (jwillcox@mozilla.com) from comment #41)
> I don't think we should actually be allowing
> multiple pause/resume calls. The app is either in the foreground or
> background, and there should be one place managing that. We (or just myself)
> can rework that in a followup.

Sounds good.

Pulsebot

Comment 43

•

8 years ago

Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/109281cadd42
Update the pause counter in nsAppShell to track multiple resumes as well multiple pauses. r=snorp

Carsten Book [:Tomcat]

Comment 44

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/109281cadd42

Status: NEW → RESOLVED

Closed: 8 years ago

status-firefox51: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → Firefox 51

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 45

•

8 years ago

Comment on attachment 8790371 [details]
Bug 1274597 - Update the pause counter in nsAppShell to track multiple resumes as well multiple pauses.

Approval Request Comment
[Feature/regressing bug #]: unsure, but probably is an interaction between bug 1260243 and APZ
[User impact if declined]: randomly fennec will get into a state where tapping on links doesn't work. a tab switch is required to get it working again properly. it's very annoying. There might be other bugs that result from the same root cause, not sure.
[Describe test coverage new/current, TreeHerder]: no automated coverage. the bug is intermittent so it's hard to verify
[Risks and why]: fairly low risk. even if this ends up not fixing the bug the patch is pretty safe and shouldn't break anything
[String/UUID change made/needed]: none

Attachment #8790371 - Flags: approval-mozilla-aurora?

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

8 years ago

Blocks: 1260243

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

8 years ago

status-firefox49: wontfix → fix-optional

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 46

•

8 years ago

Comment on attachment 8790371 [details]
Bug 1274597 - Update the pause counter in nsAppShell to track multiple resumes as well multiple pauses.

Approval Request Comment - see above comment

I'm not sure if we're still taking patches for 49 beta but if we are this is probably a good one to consider.

Attachment #8790371 - Flags: approval-mozilla-beta?

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Comment 47

•

8 years ago

Hi Jan, Ryan, could you please verify this issue is fixed as expected on a latest Nightly build? Thanks!

Flags: needinfo?(ryanvm)

Flags: needinfo?(jh+bugzilla)

Ryan VanderMeulen [:RyanVM]

Comment 48

•

8 years ago

I wasn't ever able to consistently reproduce it on my device. Not sure if Ioana's team has had better luck in doing so.

Flags: needinfo?(ryanvm) → needinfo?(ioana.chiorean)

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Comment 49

•

8 years ago

Comment on attachment 8790371 [details]
Bug 1274597 - Update the pause counter in nsAppShell to track multiple resumes as well multiple pauses.

Fixes a recent regression, Aurora50+

Attachment #8790371 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 50

•

8 years ago

http://hg.mozilla.org/releases/mozilla-aurora/rev/657bba96bda0

status-firefox50: affected → fixed

Jan Henning [:JanH]

Reporter

Comment 51

•

8 years ago

What Ryan said - from a user perspective it's essentially random.

Flags: needinfo?(jh+bugzilla)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 52

•

8 years ago

I guess it's too late for 49.

status-firefox49: fix-optional → wontfix

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

8 years ago

Attachment #8790371 - Flags: approval-mozilla-beta?

Jan Henning [:JanH]

Reporter

Comment 53

•

8 years ago

Maybe a potential candidate to ride along on a dot release, should there be any?

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 54

•

8 years ago

That's a good point. Liz, is there a process for flagging patches as potential ride-alongs for dot releases? This one would be good on 49 if there's a dot release for Fennec.

Flags: needinfo?(lhenry)

Liz Henry (:lizzard) (relman/hg->git project)

Comment 55

•

8 years ago

Yes, I can mark it affected and track it. My list of tracked bugs is pretty small at this point, mostly things that are also candidates for dot release. I'm pretty sure we will end up with one, but there isn't a clear "driver" for it yet.

status-firefox49: wontfix → affected

Flags: needinfo?(lhenry)

Ryan VanderMeulen [:RyanVM]

Updated

•

8 years ago

Flags: needinfo?(ioana.chiorean)

Carmen Fat [:cfat] - Ecosystem QA, PTO until May 6 - NI to: ecosystemqa@mozilla.com

Comment 57

•

8 years ago

Hello,

This fix was verified on latest Fennec Beta 51.0b2 - build1, using the next devices:
- Huawei Honor 5x (Android 5.1.1);
- Nexus 9 (Android 7.0).

status-firefox51: fixed → verified

neil

Comment 59

•

8 years ago

I've seen a resurgence of this problem under aurora 52 on Galaxy S4 w/ Cyanogenmod-13/Android-6.0.1

DrBill

Comment 60

•

8 years ago

I also started seeing this again this week.

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 61

•

8 years ago

(In reply to neil from comment #59)
> I've seen a resurgence of this problem under aurora 52 on Galaxy S4 w/
> Cyanogenmod-13/Android-6.0.1

(In reply to DrBill from comment #60)
> I also started seeing this again this week.

Could you please file a new bug with as much detail as possible? Given the amount of trouble we had tracking this down last time, the more information we have the better. Thanks!

icedt

Comment 62

•

7 years ago

[Tracking Requested - why for this release]:

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #61)

Bug resurging in Firefox 53 aurora as of February 2017, on Galaxy S4 i9515 w/ up-to-date official stock ROM.

What kind of extra detail can I provide for you on an unrooted device? Any log-files might be provided if I know how to create them. This really needs to be addressed, since opening links in new tabs is an absolute must for newspapers, etc. Browser history to go back is painful.

And thanks for the friendly reply on IRC :)

tracking-firefox53: --- → ?

Flags: needinfo?(bugmail)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Updated

•

7 years ago

Blocks: 1336209

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Assignee

Comment 63

•

7 years ago

I filed bug 1336209 for this. Please continue any discussion over on that bug.

tracking-firefox53: ? → ---

Flags: needinfo?(bugmail)

Jan Henning [:JanH]

Reporter

Updated

•

7 years ago

Updated

•

7 years ago

status-firefox49: affected → wontfix

BMO Automation

Updated

•

3 years ago

Product: Firefox for Android → Firefox for Android Graveyard

Logging 8 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 44.58 KB, text/plain		Details
Logging (working version, for comparison) 8 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 64.96 KB, text/plain		Details
Long_tap.png 8 years ago Ioana Chiorean 1009.91 KB, image/png		Details
v2_nexus.txt 8 years ago Ioana Chiorean 52.00 KB, text/plain		Details
v1_Nexus.txt 8 years ago Ioana Chiorean 6.53 KB, text/plain		Details
Bug 1274597 - Update the pause counter in nsAppShell to track multiple resumes as well multiple pauses. 8 years ago Kartikaya Gupta (email:kats@mozilla.staktrace.com) 58 bytes, text/x-review-board-request	snorp : review+ ritu : approval-mozilla-aurora+	Details