Open Bug 1698037 Opened 4 years ago Updated 1 year ago

Drag-and-drop and context menus break after 49.7 days of system uptime on Linux

Categories

(Core :: Widget: Gtk, defect, P3)

78 Branch
defect

UNCONFIRMED

People

(Reporter: from_bugzilla3, Unassigned)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0

Steps to reproduce:

  1. Leave a Linux system running for between 49.7 and 99.4 days (Tested under Kubuntu Linux 16.04 LTS)
  2. Launch Firefox (Or have it already running when the uptime counter hits 49.7 days. It doesn't matter which.)
  3. Attempt to drag-and-drop or right-click in Firefox

Actual results:

Drag-and-drop refuses to initiate, and context menus flicker into existence just long enough to show that they were triggered.

As far as I can guess, based on the time interval, this has something to do with an improper integer conversion (49.7 days is 2**32 milliseconds and 99.4 days is double that).
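The arithmetic behind that guess can be checked with a short sketch. It only converts the wrap points of a 32-bit millisecond counter into days; the names `MS_PER_DAY` and `ms_to_days` are illustrative and do not come from any Firefox or GTK code.

```python
# Convert the wraparound points of a 32-bit millisecond counter into days.
MS_PER_DAY = 24 * 60 * 60 * 1000  # 86,400,000 ms in one day


def ms_to_days(ms: int) -> float:
    """Convert a millisecond count to (fractional) days."""
    return ms / MS_PER_DAY


first_wrap_days = ms_to_days(2**32)       # counter overflows once
second_wrap_days = ms_to_days(2 * 2**32)  # counter overflows twice

print(f"{first_wrap_days:.1f} days, {second_wrap_days:.1f} days")
```

Both values match the uptimes at which the behavior was seen to change.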

Once the uptime counter hits 99.4 days, the problem goes away. (Yes, I discovered this because I was stubborn enough to switch to a more heavily keyboard-driven workflow rather than just rebooting my system.)

Expected results:

It should function normally during that interval, like native GTK+ applications do.

The Bugbug bot thinks this bug should belong to the 'Core::DOM: Drag & Drop' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → DOM: Drag & Drop
Product: Firefox → Core

It's significantly more fundamental than DOM, but I don't know whether I'm supposed to revert it to Untriaged or some other category. Can someone weigh in on that?

(I wouldn't know where to begin to look in the code, but my first hypothesis is that it affects all code which uses X11 event timestamps in a certain way.)

I honestly have no idea. :smaug, do you have any?

Flags: needinfo?(bugs)

Didn't find anything obvious in dom/*.

In widget/gtk
https://searchfox.org/mozilla-central/rev/aa9a7136835deb0eeba00c62bb50a4a0e2cdea2d/widget/gtk/nsWindow.cpp#2167
https://searchfox.org/mozilla-central/rev/aa9a7136835deb0eeba00c62bb50a4a0e2cdea2d/widget/gtk/nsWindow.cpp#417
https://searchfox.org/mozilla-central/rev/aa9a7136835deb0eeba00c62bb50a4a0e2cdea2d/widget/gtk/nsWindow.cpp#423
could have something to do with this.
sLastUserInputTime is guint32 after all.
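To illustrate why a guint32 timestamp is a plausible suspect, here is a minimal sketch of what happens to a millisecond counter stored in 32 bits. It is not the nsWindow.cpp logic; `to_guint32` and `elapsed_ms` are hypothetical helpers that only model the integer behavior.

```python
MASK32 = 0xFFFFFFFF


def to_guint32(ms: int) -> int:
    """Truncate a millisecond count to 32 bits, as storing it in a guint32 would."""
    return ms & MASK32


def elapsed_ms(earlier: int, later: int) -> int:
    """Wraparound-safe difference between two 32-bit timestamps."""
    return (later - earlier) & MASK32


before_wrap = to_guint32(2**32 - 1000)  # one second before the counter wraps
after_wrap = to_guint32(2**32 + 500)    # half a second after the wrap

# A naive ordering check gets the answer backwards across the wrap...
naive_is_newer = after_wrap > before_wrap

# ...while modular subtraction still yields the true 1500 ms gap.
gap = elapsed_ms(before_wrap, after_wrap)
```

Any code that compares such timestamps with a plain `>` would misorder events around the 49.7-day mark, whereas modular subtraction keeps working.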

Perhaps karlt recalls that code. Or maybe that code is totally unrelated to this issue after all.

Flags: needinfo?(bugs)
Flags: needinfo?(karlt)
Component: DOM: Drag & Drop → Widget: Gtk
Priority: -- → P3

Intriguing that the full period is 2^33 ms. I would have expected bugs like this to have a period of 2^32 ms.
Given Kubuntu, I assume this is not running under Wayland.

We have had these kinds of issues with the XIM input method.
https://developer.gnome.org/gtk3/stable/GtkIMContext.html#GtkIMContext.description has some hints about how the input method might be selected. lsof might be the most reliable way to determine which, if any, input method module is loaded in the process.
I expect GTK_IM_MODULE=gtk-im-context-simple would override whatever the system has selected.

Flags: needinfo?(karlt)

(In reply to Karl Tomlinson (:karlt) from comment #5)

> Intriguing that the full period is 2^33 ms. I would have expected bugs like this to have a period of 2^32 ms.

Turns out that's not the full period. I just noticed my context menus and DnD are out again, and my uptime is just over 2^32 ms * 2.5.

  • 2^32 ms * 1.0: Context menus and DnD go out
  • 2^32 ms * 2.0: Everything normal again
  • 2^32 ms * 2.5: Context menus and DnD go out

...which, given the puzzle games I've been playing lately, makes me think the observed symptoms are the result of interaction between two variables with different ranges.

(In reply to Stephan Sokolow from comment #6)

> ...which, given the puzzle games I've been playing lately, makes me think the observed symptoms are the result of interaction between two variables with different ranges.

...that is, beyond the overflow aspect of things.

I just noticed that my copy of Thunderbird 68.10.0 took several uses of the menus just now before it had its context menus and DnD go out, despite it being hours after hitting 2^32 ms * 2.5, which lines up with something else I forgot about.

If I restart Firefox or Thunderbird, the context menus and DnD work for a short period (I never checked whether it was time or number of uses) before they stop working again. It's just so inconvenient to restart them for how long it lasts that I never bothered and started learning more keyboard shortcuts instead.

See Also: → 787943
See Also: → 1719175

Original reporter,

You may want to check the content of ~/.xsession-errors.
It may reveal strange timestamp issues reported by a program or two.
https://bugzilla.mozilla.org/show_bug.cgi?id=787943#c12

Flags: needinfo?(from_bugzilla2)

I delete my xsession-errors on startup to allow my SSD as much room for wear levelling as possible. My uptime is currently only 33 days, and I'll probably be taking the system down to install another 16GB of RAM in the next few days, so I won't be able to answer that for at least another 49.7 days. I'll leave the needinfo up for when I can check it.

I noticed something similar when the X server time was 0x804bbbac (from xev), about 82 minutes after the high bit was set.

GTK drags have changed so that the event time is no longer used to grab the pointer device but the time is used to own the selection, so drags were starting but could not deliver their payloads to other apps. Menus would flicker.

"WidgetPopup:5" can be included in "Log Modules" via about:networking:

[Parent 170931: Main Thread]: D/WidgetPopup GrabPointer time=0x00118111 retry=1
[Parent 170931: Main Thread]: D/WidgetPopup GrabPointer: pointer grab failed: 2

0x00118111 came from gdk_x11_display_get_user_time():

(gdb) p ((void*(*)())gdk_display_get_default)()
$2 = (void *) 0x7fda83ce7aa0
(gdb) p ((unsigned(*)())gdk_x11_display_get_user_time)($2)
$3 = 1147153
(gdb) p /x $3
$4 = 0x118111
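One possible reading of these numbers, sketched below, relies on the X11 convention that the server treats half of the 32-bit timestamp space as earlier than its current time and half as later. Under that rule a very old user time can look like a future time once the server clock has moved more than 2^31 ms past it, and a grab with a future time is rejected (status 2 is the invalid-time result for pointer grabs). The helper `seen_as_future` is hypothetical, and the sketch assumes that the number after `_TIME` in a startup ID ends up as GDK's user time.

```python
MASK32 = 0xFFFFFFFF
HALF_RANGE = 2**31


def seen_as_future(timestamp: int, server_time: int) -> bool:
    """True if `timestamp` falls in the half of the 32-bit range that
    counts as later than `server_time` (X11 half-range convention)."""
    diff = (timestamp - server_time) & MASK32
    return 0 < diff < HALF_RANGE


server_time = 0x804BBBAC      # X server time reported by xev
stale_user_time = 0x00118111  # value returned by gdk_x11_display_get_user_time()
bumped_user_time = 114715300  # assumed user time after a _TIME114715300 startup ID

# How far the server clock is past the point where the high bit was set.
ms_past_high_bit = server_time - 0x80000000

stale_in_future = seen_as_future(stale_user_time, server_time)
bumped_in_future = seen_as_future(bumped_user_time, server_time)
```

With these inputs the stale value lands in the "later" half while the bumped value lands in the "earlier" half, and `ms_past_high_bit` works out to roughly 82 minutes, matching the observation above.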

Running DESKTOP_STARTUP_ID=bump-time_TIME114715300 firefox modified GDK's display user time to work around the issue.

grep 1147153 /proc/*/environ identifies processes having this environment variable value. In this case, the problematic app is emacs. It is built with GTK, which would usually unset DESKTOP_STARTUP_ID from the app's environment, so I don't know why that is not happening in emacs.
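The grep can be made a little more targeted with a small script. This is a sketch only: it assumes the timestamp appears as a trailing `_TIME<digits>` in DESKTOP_STARTUP_ID, the sample ID `example_TIME1147153` is made up for illustration, and the function names are hypothetical.

```python
import glob
import re
from typing import Iterator, Optional, Tuple

_TIME_SUFFIX = re.compile(rb"_TIME(\d+)$")


def startup_time_from_environ(environ: bytes) -> Optional[int]:
    """Return the timestamp embedded in DESKTOP_STARTUP_ID, or None.

    `environ` is the raw content of /proc/<pid>/environ, which holds
    NUL-separated KEY=VALUE entries.
    """
    for entry in environ.split(b"\0"):
        key, sep, value = entry.partition(b"=")
        if sep and key == b"DESKTOP_STARTUP_ID":
            match = _TIME_SUFFIX.search(value)
            return int(match.group(1)) if match else None
    return None


def processes_with_startup_time() -> Iterator[Tuple[str, int]]:
    """Yield (environ path, timestamp) for every readable process that
    still carries a DESKTOP_STARTUP_ID with a _TIME suffix."""
    for path in glob.glob("/proc/[0-9]*/environ"):
        try:
            with open(path, "rb") as handle:
                data = handle.read()
        except OSError:
            continue  # process exited or belongs to another user
        timestamp = startup_time_from_environ(data)
        if timestamp is not None:
            yield path, timestamp


# Hypothetical environment block, for illustration only.
sample = b"HOME=/home/user\0DESKTOP_STARTUP_ID=example_TIME1147153\0TERM=xterm\0"
sample_time = startup_time_from_environ(sample)
```

Iterating over `processes_with_startup_time()` on a Linux system lists the processes that could pass a stale ID on to their children.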

/usr/share/applications/emacs.desktop had StartupWMClass=Emacs, but StartupNotify was not set.

If true, it is KNOWN that the application will send a "remove"
message when started with the DESKTOP_LAUNCH_ID environment variable
set.
If false, it is KNOWN that the application does not work
with startup notification at all (does not shown any window, breaks
even when using StartupWMClass, etc.).
If absent, a reasonable handling is up to implementations (assuming false,
using StartupWMClass, etc.).

Unchecking "Enable launch feedback" in KDE's config sets StartupNotify=false, but KDE still sets DESKTOP_STARTUP_ID. Even manually removing StartupWMClass from the desktop file does not prevent KDE from setting DESKTOP_STARTUP_ID.

This is effective in the emacs startup file:

;;; Work around emacs passing old timestamps on to child processes.
(setenv "DESKTOP_STARTUP_ID" nil)

Flags: needinfo?(from_bugzilla3)