Last Comment Bug 787943 - Using XIM with Firefox will cause firefox menu cannot popup correctly.
: Using XIM with Firefox will cause firefox menu cannot popup correctly.
Status: NEW
:
Product: Core
Classification: Components
Component: Widget: Gtk (show other bugs)
: 15 Branch
: x86_64 Linux
: -- normal with 6 votes (vote)
: ---
Assigned To: Nobody; OK to take it and work on it
:
Mentors:
Depends on:
Blocks: MenuBug 500081
  Show dependency treegraph
 
Reported: 2012-09-03 07:51 PDT by Weng Xuetian
Modified: 2013-07-08 19:02 PDT (History)
8 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
my mozconfig (1.19 KB, text/plain)
2012-09-03 22:05 PDT, Weng Xuetian
no flags Details
Uninitialised value messages from valgrind running TB 15.0.1 (146.81 KB, text/plain)
2012-10-12 09:50 PDT, ISHIKAWA, Chiaki
no flags Details
Uninitialised value messages from valgrind running TB 16.0b2 (39.16 KB, text/x-log)
2012-10-12 09:54 PDT, ISHIKAWA, Chiaki
no flags Details
Uninitialised value messages from valgrind running TB 17.0a2 (40.14 KB, text/plain)
2012-10-12 09:56 PDT, ISHIKAWA, Chiaki
no flags Details
Uninitialised value messages from valgrind running TB 18.0a2 (208.97 KB, text/plain)
2012-10-12 09:58 PDT, ISHIKAWA, Chiaki
no flags Details
Uninitialized byte(s) message from valgrind running kinput2-wnn v3.1 (Debian) (3.95 KB, text/plain)
2012-10-12 10:04 PDT, ISHIKAWA, Chiaki
no flags Details
XNextEvent shows up in an uninitialized byte(s) message from valgrind running kinput2-wnn v3.1 (Debian) (4.09 KB, text/plain)
2012-10-12 10:11 PDT, ISHIKAWA, Chiaki
no flags Details
Valgrind session shows the uninitialized value messages from kinput2 are gone (35.51 KB, text/plain)
2012-10-12 20:20 PDT, ISHIKAWA, Chiaki
no flags Details
patch to kinput2-wnn v3.1 (3-1-10.3) source in Debian. (6.09 KB, patch)
2012-10-12 20:26 PDT, ISHIKAWA, Chiaki
no flags Details | Diff | Splinter Review
Kinput2.conf I used to create knput2-v3.1 (sorry missing from the patch above.) (7.71 KB, text/plain)
2012-10-12 20:27 PDT, ISHIKAWA, Chiaki
no flags Details
diff to libX11 source files (7.08 KB, text/plain)
2012-10-16 06:27 PDT, ISHIKAWA, Chiaki
no flags Details
log to show bogus(?) serial after time field is tentatively fixed (set to 0). (50.90 KB, text/plain)
2012-10-16 06:59 PDT, ISHIKAWA, Chiaki
no flags Details
warning about 0 timestamp from the window manager (25.50 KB, text/plain)
2012-11-12 07:12 PST, ISHIKAWA, Chiaki
no flags Details

Description Weng Xuetian 2012-09-03 07:51:19 PDT
User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20100101 Firefox/15.0
Build ID: 20120827204808

Steps to reproduce:

run:

export GTK_IM_MODULE=xim
firefox

Using any input method type something, and then use right button to trigger popup menu.


Actual results:

Menu cannot popup correctly. It hide immediately.


Expected results:

Menu appear corretly.
Comment 1 Masayuki Nakano [:masayuki] (Mozilla Japan) 2012-09-03 20:00:42 PDT
Is it a regression starting from Firefox 15?
Comment 2 Weng Xuetian 2012-09-03 20:16:14 PDT
Seems not, some of people tell me that it happens since firefox 10..
Comment 3 Masayuki Nakano [:masayuki] (Mozilla Japan) 2012-09-03 20:21:00 PDT
Hmm, we didn't touch to IM code at Fx10, though...
Comment 4 Weng Xuetian 2012-09-03 20:26:05 PDT
I feel since this only happen to XIM, it maybe related to window timestamp issue.. only a guess though.
Comment 5 Karl Tomlinson (:karlt) 2012-09-03 20:39:24 PDT
Would you be able to run a debug build with NSPR_LOG_MODULES=Widget:5,WidgetFocus:5 in the environment, reproduce, and attach the log, please?

Debug builds are available from http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012-09-03-mozilla-central-debug/ for example.
It may be best to use a separate profile for the nightly build.
Comment 6 Karl Tomlinson (:karlt) 2012-09-03 20:41:57 PDT
Could you be seeing this bug?
https://bugs.launchpad.net/ubuntu/+source/unity-2d/+bug/1010466
Comment 7 Weng Xuetian 2012-09-03 21:53:03 PDT
(In reply to Karl Tomlinson (:karlt) from comment #6)
> Could you be seeing this bug?
> https://bugs.launchpad.net/ubuntu/+source/unity-2d/+bug/1010466

No, I'm using KDE.

(In reply to Karl Tomlinson (:karlt) from comment #5)
> Would you be able to run a debug build with
> NSPR_LOG_MODULES=Widget:5,WidgetFocus:5 in the environment, reproduce, and
> attach the log, please?
> 
> Debug builds are available from
> http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012-09-03-mozilla-
> central-debug/ for example.
> It may be best to use a separate profile for the nightly build.

I cannot reproduce this bug with nightly/beta/aurora/15 firefox on mozilla ftp. But same profile can still reproduce this easily with my firefox 15.
Comment 8 Karl Tomlinson (:karlt) 2012-09-03 21:56:16 PDT
Where did your firefox 15 come from?
Comment 9 Weng Xuetian 2012-09-03 22:05:49 PDT
Created attachment 657994 [details]
my mozconfig

Build by myself. But well.. I'm not the only one can reproduce this bug. They are not using the same build flag with me. I'm using chakra.

I know some other people can reproduce this with offical archlinux package since 13.0.
Comment 10 Karl Tomlinson (:karlt) 2012-09-03 22:42:37 PDT
That's hard to explain, unless perhaps there's an uninitialized variable.
valgrind if you are keen.
Comment 11 ISHIKAWA, Chiaki 2012-09-25 21:28:56 PDT
I am seeing this problem sporadically since early summer in thunderbird under linux (Debian GNU/Linux, and it uses Gnome, if I am not mistaken.)
Once this mis-behavior occurs, I have to kill thunderbird and restart.
[So I thought I may be seeing the problem reported in ubuntu URL in # 6 
But I am not using unity-2d mentioned as the culprit of the program. ]

I have reported this in a few message to mozilla.support.thunderbird
and mozilla.dev.apps.thunderbird in early August,
Re: How to get rid if non-click mouse responses
and as a follow up:
Smoking gun about dropdown box closing immediately (Re: How to get rid if non-click mouse responses)
in mozilla.support.thunderbird


( You can read the original post from someone who mentions a mouse problem and my posts in the thread. I initially thought it was a strange mouse event problem.
https://groups.google.com/forum/?fromgroups=#!topic/mozilla.support.thunderbird/2QyKaUtNXmo )

I have been using XIM front end for Japanese input.
Maybe some interactions with XIM front end of various types are triggering this problem?
(I have XMODIFIERS=@im=kinput2 in my startup to enable XIM input for Japanese using XIM front end called kinput2.)

It seems to me, correct me if I am wrong, people with language front end (i.e., more complex X event handling than otherwise) are likely to be affected.
Comment 12 ISHIKAWA, Chiaki 2012-10-01 21:26:00 PDT
After I experience the same problem again,
I tinkered with a lot with xev, and xwininfo commands to trace X11
events. 

I did not have much success until I came across a few interesting web
pages after searching "x11 pulldown notifyUngrab" using google
and decided to look at Xorg.log.0 and .xsession-errors.

FILE: Xorg.log.0: At the end of the log, I found the following line.

[1711707.925] [mi] Increasing EQ size to 512 to prevent dropped events.

This did not mean much to me at this time since the symptom began
appearing a while ago, and I am not sure if this particular
log line is related to the problem at hand.

FILE: ~/.xsession-errors:

This turned out to contain crucial information.

I found the following and similar lines!  (The Japanese string at the
beginning basically means "Warning from the Window Manager". )

Quote: 
ウィンドウ・マネージャーの警告: last_user_time (3214223008) is greater than comparison timestamp (1117867237).  This most likely represents a buggy client sending inaccurate timestamps in messages such as _NET_ACTIVE_WINDOW.  Trying to work around...
ウィンドウ・マネージャーの警告: 0x3e88569 (カウン\343) appears to be one of the offending windows with a timestamp of 3214223008.  Working around...


The above is exactly the problem symptom and analysis found in
the launchpad URL (!)
Unfortunately, by the time I notice this log line in .xsession-errors, 
the window object with ID 0x3e88569 is no where to be found.
The string inside the parenthesis is "カウン\343" (and actually \343
is raw binary char with that code value. I edited it to avoid strange
char code here. Looks to me that it could mean a window title string
which is broken a little bit. This could be a small window that
appears when certain Japanese input operation may be invoked... I have
no idea. But this does look suspicious with the incorrect string for
title. It ends with an incomplete first byte of two bytes wide char.)

I ran tail -f ~/.xsession-errors in a terminal window to catch
such interesting lines and see if I can generate it by certain operations.

I did not see anything interesting in the log when I tinkered with xev
to try to figure out what is going wrong.

But when I hit [NEW] to create the composing window in thunderbird,
the following appears.

ウィンドウ・マネージャーの警告: last_user_time (3215021160) is greater than comparison timestamp (1803951205).  This most likely represents a buggy client sending inaccurate timestamps in messages such as _NET_ACTIVE_WINDOW.  Trying to work around...
ウィンドウ・マネージャーの警告: 0x400c42d (作成: (�) appears to be one
of the offending windows with a timestamp of 3215021160.  Working
around.

The Japanese string at the beginning basically means 
"Warning from the Window Manager". (There are two instances of them)

Now, the second warning says, that the window id 0x400c42d is the
offending window.

So I dumped the information about
xwininfo -id 0x400c42d

It turns out to be the already hosed thunderbird creating a bogus
timestamp.

output of xwininfo -id 0x400c42d:
The Japanese string next to the windows id means "New: (No subject)".

xwininfo: Window id: 0x400c42d "作成: (件名なし)"

  Absolute upper-left X:  588
  Absolute upper-left Y:  118
  Relative upper-left X:  6
  Relative upper-left Y:  28
  Width: 1503
  Height: 810
  Depth: 24
  Visual: 0x21
  Visual Class: TrueColor
  Border width: 0
  Class: InputOutput
  Colormap: 0x20 (installed)
  Bit Gravity State: NorthWestGravity
  Window Gravity State: NorthWestGravity
  Backing Store State: NotUseful
  Save Under State: no
  Map State: IsViewable
  Override Redirect State: no
  Corners:  +588+118  --261+118  --261-44  +588-44
  -geometry 1503x810+582-37

ishikawa@debian-vbox-ci:~$ 

Tentative summary:

Just like mentioned in 

https://bugs.launchpad.net/ubuntu/+source/unity-2d/+bug/1010466/comments/22

once thunderbird (and presumably Firefox) got hosed by some offending
client (?) which sends out bogus very large timestamp [it could be
there is a problem in firefox, too], TB (and FF) seems to retain the
incorrect time stamp inside and it creates a new window with this
incorrect timestamp of the last event afterward.  (And the incorrect
timestamp well into the future causes the pulldown menu not working
anymore as the comment in the launchpad URL mentions.)

Proposed solution:

I wonder if thunderbird (and firefox) can be hardened (with the
attitude of defensive programming) to avoid remembering this type of
bogus timestamp, which may be sent by bogus client. (Essentially such
timestamp acts as Denial of Service attack.)  There is a chance that
TB (or FF) is generating such a timestamp within, also, though.

But I do not know exactly where to look right now.

Obviously, this is only X11-related.

I will restart the X11 session to figure out when this problem first
manifests in a X11 session and see if I can catch the culprit.

But, as I mention above, such misbehaving clients may happen from time
to time, and hardening TB (and FF) is desirable here.

And I think it strange that only FF (in the few web pages I found) is
mentioned to be affected by such bugs...

TIA
Comment 13 Karl Tomlinson (:karlt) 2012-10-01 22:48:15 PDT
Thanks for all the helpful analysis.

The program in attachment 519832 [details] may be helpful in identifying the kind of event that may contain the bad timestamp and the client that sent it.
Run with no arguments.  Timestamps are output only for ClientMessage events, but other kinds of events sent are output without timestamps.  The SendEvents recorded before the bogus Timestamp in the WM_TAKE_FOCUS may be the source of the bad timestamp.  Compare client ids / with the res_base from "xrestop -b -m 1" or window ids.

GDK tracks the most recent timestamp of some events, while Gecko tracks other kinds of events.
http://hg.mozilla.org/mozilla-central/annotate/fb076a446870/widget/gtk2/nsWindow.cpp#l1347
Gecko uses the most recent event from these two sources, but always overwrites its most recent record with a subsequent event, so effects of a bogus timestamp should be temporary, but GDK only updates its timestamp if the later event's timestamp is newer.  That makes this hard to fix in Gecko.
Comment 14 ISHIKAWA, Chiaki 2012-10-02 21:18:27 PDT
Thank you for the program.
I am collecting the event and hopefully can get down to the cause(s) of the problem soon.
I think there are multiple causes here.

TIA
Comment 15 ISHIKAWA, Chiaki 2012-10-08 21:08:40 PDT
I am still investigating.

In a short exchange with Karl Tomlinson earlier, he suggested that there may be library bug since in my trace and log from ~/.xsession-errors, I noticed both thunderbird and Emacs created strange time stamps. Seeing two very unrelated programs show similar problems may be a sufficient doubt about the library bug.

Today, I noticed a strange log after the problem symptom re-appeared.
(And please recall, ONLY thunderbird shows this sympton while all other programs easily invokable from the Gnome menu on the linux desktop seem to work OK. So it suggests something went wrong in the run-time data structure of thunderbird after some interactions with XIM input OR other causes... I wonder if there is any way to clear/reset the problematic data.)
Anyway, here is the strange log from ~/.xsession-errors:

search "ldap" not found - skipping.
ウィンドウ・マネージャーの警告: last_user_time (3067544512) is greater than comparison timestamp (2402997145).  This most likely represents a buggy client sending inaccurate timestamps in messages such as _NET_ACTIVE_WINDOW.  Trying to work around...
ウィンドウ・マネージャーの警告: 0x3001261 (GNOME 端�) appears to be one of the offending windows with a timestamp of 3067544512.  Working around...
search "ldap" not found - skipping.
ウィンドウ・マネージャーの警告: last_user_time (7471209) is greater than comparison timestamp (2403096456).  This most likely represents a buggy client sending inaccurate timestamps in messages such as _NET_ACTIVE_WINDOW.  Trying to work around...
ウィンドウ・マネージャーの警告: 0x3e735e0 (メッセ�) appears to be one of the offending windows with a timestamp of 7471209.  Working around...

(7471209 is the time stamp that thunderbird some got stuck with. )

Note the value of timestamps.
3067544512     value 1
2402997145     value 2
7471209        value 3

The first warning states value 1 is larger than value 2, which is fine.
But the second warning states that value 3 is larger than value 2 (!), which is patently WRONG.

I wonder if this is simply a problem of output routine mixing up signed and/or unsigned value, or something. (Is there 64 bits and 32 bits issues as well???)

Further, I could not find the value for 7471209 in the log captured by the posted program. Strange. Another clue that the warning output is somehow incorrect.

Presumably, the following events are related to the event warning in the .xsession-errors log:

2403096456 is 0x8F3C5788

and so the relevant lines are searched using 
time=0x8f3c5867

In the following excerpt from the captured events, 

0x600000, 0x600006c are related to kinput2-wnn (WIM input engine).

>12 - kinput2-wnn ( PID:  ?   ):
>	res_base      : 0x600000
>	res_mask      : 0x1fffff
>	windows       : 25

0x3e0000 and friends are related to Thundebird's windows.
>1 - FolderName - ishikawa@example.com - Mozilla Thunderbird ( PID: 7271 ):
>	res_base      : 0x3e00000
>	res_mask      : 0x1fffff
>	windows       : 111
>	GCs           : 53
>	fonts         : 1


0x00600000 SendEvent 33 to 0x03e000f6 propagate=0 mask=0x00000000
  _XIM_PROTOCOL window=0x03e000f6 format=8 0x0001004f 0x013e0009
0x00600000 SendEvent 33 to 0x03e000f6 propagate=0 mask=0x00000000
  _XIM_PROTOCOL window=0x03e000f6 format=8 0x00010051 0x013e0009
0x01c00000 SendEvent 18 to 0x00000138 propagate=0 mask=0x00180000
0x03e00000 SendEvent 18 to 0x00000138 propagate=0 mask=0x00180000
	   two more times repeated
0x01a00000 SendEvent 22 to 0x03e735e0 propagate=0 mask=0x00020000
0x01a00000 SetInputFocus 0x03e735e0 revert-to=PointerRoot time=0x8f3c5788
0x01a00000 SendEvent 33 to 0x03e735e0 propagate=0 mask=0x00000000
  WM_PROTOCOLS protocol=WM_TAKE_FOCUS time=0x8f3c5788
0x01a00000 SendEvent 18 to 0x00000138 propagate=0 mask=0x00180000
0x01a00000 SendEvent 18 to 0x00000138 propagate=0 mask=0x00180000
0x03e00000 SendEvent 33 to 0x0060006c propagate=0 mask=0x00000000
  _XIM_MOREDATA window=0x0060006c format=8 0x00050032 0x00100009
  <--- the first time I noticed _XIM_MOREDATA (!)
0x03e00000 SendEvent 33 to 0x0060006c propagate=0 mask=0x00000000
  _XIM_PROTOCOL window=0x0060006c format=8 0x03e73603 0x00000000
0x00600000 SendEvent 33 to 0x03e000f6 propagate=0 mask=0x00000000
  _XIM_PROTOCOL window=0x03e000f6 format=8 0x00010033 0x01fc0009
0x03e00000 SendEvent 33 to 0x0060006c propagate=0 mask=0x00000000
  _XIM_PROTOCOL window=0x0060006c format=8 0x00020038 0x01fc0009

  ... flurry of events between 0x0060000 and 0x03e00000 ...

0x03e00000 SendEvent 33 to 0x0060006c propagate=0 mask=0x00000000
  _XIM_PROTOCOL window=0x0060006c format=8 0x0001003b 0x01fd0009
0x03e00000 SetInputFocus 0x03e735e1 revert-to=Parent time=0x8f3c5788
0x00600000 SendEvent 33 to 0x03e000f6 propagate=0 mask=0x00000000
  _XIM_PROTOCOL window=0x03e000f6 format=8 0x00010051 0x01fd0009
0x03e00000 SendEvent 18 to 0x00000138 propagate=0 mask=0x00180000
0x03e00000 SendEvent 33 to 0x0060006c propagate=0 mask=0x00000000
  _XIM_PROTOCOL window=0x0060006c format=8 0x00010034 0x01fc0009
0x01a00000 SetInputFocus 0x03e0d4f7 revert-to=PointerRoot time=0x8f3c5867
0x01a00000 SendEvent 33 to 0x03e0d4f7 propagate=0 mask=0x00000000
  WM_PROTOCOLS protocol=WM_TAKE_FOCUS time=0x8f3c5867
0x01a00000 SendEvent 22 to 0x03e735e0 propagate=0 mask=0x00020000
0x00600000 SendEvent 33 to 0x03e000f6 propagate=0 mask=0x00000000
  _XIM_PROTOCOL window=0x03e000f6 format=8 0x00010035 0x01fc0009
0x03e00000 SendEvent 33 to 0x0060006c propagate=0 mask=0x00000000
  _XIM_PROTOCOL window=0x0060006c format=8 0x00010034 0x01fd0009

Grepping with time=, I found the following events near where the timestamp appears:
0x1a000000 is the Window manager: metacity.

0x03e00000 SendEvent 18 to 0x00000138 propagate=0 mask=0x00180000
0x03e00000 SendEvent 18 to 0x00000138 propagate=0 mask=0x00180000
0x03e00000 SendEvent 18 to 0x00000138 propagate=0 mask=0x00180000
0x01a00000 SetInputFocus 0x04400047 revert-to=PointerRoot time=0x8f3c6777
0x01a00000 SendEvent 33 to 0x04400047 propagate=0 mask=0x00000000
  WM_PROTOCOLS protocol=WM_TAKE_FOCUS time=0x8f3c6777
0x04400000 SetInputFocus 0x04400048 revert-to=Parent time=0x8f3c6777
0x01a00000 SetInputFocus 0x04400047 revert-to=PointerRoot time=0x8f3c6777
0x01a00000 SendEvent 33 to 0x04400047 propagate=0 mask=0x00000000
  WM_PROTOCOLS protocol=WM_TAKE_FOCUS time=0x8f3c6777
0x01a00000 GrabKeyboard 0x01a00c72 time=0x8f3c6777 owner-events=1 pointer-mode=Async keyboard-mode=Async
0x01a00000 UngrabKeyboard time=0x01a00c72
0x04400000 SetInputFocus 0x04400048 revert-to=Parent time=0x8f3c6777
0x01a00000 UngrabKeyboard time=0x8f3c6815
0x01c00000 SendEvent 33 to 0x00000138 propagate=0 mask=0x00180000
  _NET_WM_STATE window=0x01c3103a format=32 0x00000000 0x00000183
0x01c00000 SendEvent 18 to 0x00000138 propagate=0 mask=0x00180000
0x01c00000 SendEvent 33 to 0x00000138 propagate=0 mask=0x00180000


I will keep on looking. First, I will figure out why
the warning message that value 3 is larger than value 2 is generated.
Comment 16 ISHIKAWA, Chiaki 2012-10-09 09:08:23 PDT
(In reply to Karl Tomlinson (:karlt) from comment #5)

> Debug builds are available from
> http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2012-09-03-mozilla-
> central-debug/ for example.

It seems that there is no similar debug builds of thunderbird :-(
(well, not to be found easily as in the case of mozilla. I don't see any directory name with "debug" in an easy to find place...)

I tried to run thunderbird with valgrind.
There seems to be
a few uninitialized variable, an invalid read and write or something in 15.0.1. But because no debug symbols are available,
I can not see what exactly are the problems...

Also, only TB 15.0.1 and TB 18a2 showed windows after invoked under valgrind.
(Maybe some timing issues, but TB 16 and 17a2 failed to show the main window before valgrind printed error message below (I quote only the relevant part and omitted the lines preceding and following the excerpted part.)

==30744== Unsupported clone() flags: 0x800600
==30744== 
==30744== The only supported clone() uses are:
==30744==  - via a threads library (LinuxThreads or NPTL)
==30744==  - via the implementation of fork or vfork
==30744==  - for the Quadrics Elan3 user-space driver
==30744== 
==30744== Valgrind detected that your program requires
==30744== the following unimplemented functionality:
==30744==    Valgrind does not support general clone().

I will upload the valgrind session log(s) tomorrow.
Comment 17 Karl Tomlinson (:karlt) 2012-10-09 21:10:09 PDT
(In reply to ISHIKAWA, chiaki from comment #15)
> Note the value of timestamps.
> 3067544512     value 1
> 2402997145     value 2
> 7471209        value 3
> 
> The first warning states value 1 is larger than value 2, which is fine.
> But the second warning states that value 3 is larger than value 2 (!)

This is because the timestamp wraps back to 1 at 0xffffffff.  7471209 seems closer to 2402997145 if the timestamp has wrapped.
Comment 18 Karl Tomlinson (:karlt) 2012-10-09 21:11:29 PDT
There are a few callers of XPutBackEvent from XIM files in http://cgit.freedesktop.org/xorg/lib/libX11/tree/modules/im/ximcp
Some of them look like they don't initialize the time field of XKeyEvents.

There is also a suspicious uninitialized event in freewnn:
http://sourceforge.jp/cvs/view/freewnn/FreeWnn/Xwnmo/xwnmo/callback.c?view=markup#l638

I would suggest setting the time file of the XKeyEvent to 0 wherever it is not initialized, and checking whether that helps.
Comment 19 ISHIKAWA, Chiaki 2012-10-10 07:11:39 PDT
(In reply to Karl Tomlinson (:karlt) from comment #18)
> There are a few callers of XPutBackEvent from XIM files in
> http://cgit.freedesktop.org/xorg/lib/libX11/tree/modules/im/ximcp
> Some of them look like they don't initialize the time field of XKeyEvents.
> 
> There is also a suspicious uninitialized event in freewnn:
> http://sourceforge.jp/cvs/view/freewnn/FreeWnn/Xwnmo/xwnmo/callback.
> c?view=markup#l638
> 
> I would suggest setting the time file of the XKeyEvent to 0 wherever it is
> not initialized, and checking whether that helps.

Will do. Thanks!
Comment 20 ISHIKAWA, Chiaki 2012-10-12 09:50:06 PDT
Created attachment 670837 [details]
Uninitialised value messages from valgrind running TB 15.0.1

This is the valgrind session log when thunderbird 15.0.1 is run under it.

[This was run on a different PC with 2+GB of memory, but the uninitialised value message should be of use.]

Note the use of run-mozilla.sh script.
This is necessary to run thunderbird under valgrind.

(It used to be that thunderbird/thunderbid was a shell script to
run thudnerbird/thunderbird-bin. But thunderbird/thunderbird is a binary program now. Instead run-mozilla.sh can be used to run thunderbird under valgrind as in below.]

Please notice many messages of the types:
==18567== Use of uninitialised value of size 4
==18567== Conditional jump or move depends on uninitialised value(s)
Comment 21 ISHIKAWA, Chiaki 2012-10-12 09:54:26 PDT
Created attachment 670840 [details]
Uninitialised value messages from valgrind running TB 16.0b2

Funny, in this case of 16b2 (not the released one), TB failed to show
the window.  valgrind quits saying something about unsupported use of
clone().

But we still saw the uninitialised value messages.

[v 15.0.1 could run to show the window, and we quit TB using the quit button.]
Comment 22 ISHIKAWA, Chiaki 2012-10-12 09:56:13 PDT
Created attachment 670842 [details]
Uninitialised value messages from valgrind running TB 17.0a2

Funny, in this case of 17.0a2.en-US, TB failed to show the window.
valgrind quits saying something about unsupported use of clone().

[v 15.0.1 could run to show the window, and we quit TB using the quit button.]
Comment 23 ISHIKAWA, Chiaki 2012-10-12 09:58:27 PDT
Created attachment 670843 [details]
Uninitialised value messages from valgrind running TB 18.0a2

Only in the cases of TB18.0a2 and TB 15.0.1, I saw the main window
came up.

With other versions (16 and 17.0a2), the unsupported clone() message
appeared and thunderbird run stopped before such windows appeared...

But still we can see there are many uninitialised value messages.
Comment 24 ISHIKAWA, Chiaki 2012-10-12 10:04:42 PDT
Created attachment 670847 [details]
Uninitialized byte(s) message from valgrind running kinput2-wnn v3.1 (Debian)

First attempt to run kinput2-wnn (system supplied) under valgrind.

Notice the few messages about 
 param writev(vector[...]) points to uninitialised byte(s)

kinput2-wnn  was run initially like this after the image invoked at the
start of X11 session (/etc/X11/xinit.d/xinput/...) was stopped.

---- file : bin/run-kinput2-wnn.sh  ---
:
#
# run kinput2-wnn 
# under valgrind
#

KINPUT2PROG=kinput2-wnn
valgrind $KINPUT2PROG -jserver localhost \
	     -wnnenvrc4 /usr/share/wnn/ja_JP/wnnenvrc \
	     -ccdef /etc/kinput2/ccdef.kinput2.egg.NEW \
     -xim 

----
Comment 25 ISHIKAWA, Chiaki 2012-10-12 10:11:47 PDT
Created attachment 670849 [details]
XNextEvent shows up in an uninitialized byte(s) message from valgrind running kinput2-wnn v3.1 (Debian)

This session log shows XNextEvent in the incomplete stacktrace.
(Note that this was run under valgrind 3.7.0. kinput2-wnn is light weight and could be run under valgrind 3.7.0. TB could not.)
Comment 26 ISHIKAWA, Chiaki 2012-10-12 10:44:55 PDT
(Sorry, I am running out of time, and so just add this message.
Some uploads [patches] have to wait until the weekend is over.]

1. Status: the problem still persists.

2. What I have done: 

0-a Installed a fresh valgrind from its development web, and installed
it since the current package on Debian could not grok thunderbird binary.
(It choked on a few libraries: searching the error message suggested
that the latest version of valgrind may have  fixed the problem, and
it seems so indeed.)

   $ valgrind --version
   valgrind-3.8.1

0-b Installed a few tools for libxcb generation and build libX11 and libxcb so that I could insert some code to dump messages at crucial locations.
Without the tools installed, I could not build libxcb (see below). The names of the tools are self-evident due to the friendly configure / build failure messages.

i) kinput2-wnn (XIM engine): Fixed a few places that caused
valgrind to print uninitialized memory value messages.
For the warning messages (see 3-ii-a below).

NOTE: that I use the name kinput2-wnn and kinput2 interchangeably
below.  This is because, under my setup, where only an input
conversion method called Wnn (with the associated server, jserver) is
used, kinput2 and kinput2-wnn are synonymous: usually the kinput2 with
built-in support for only Wnn is built and then renamed as kinput2-wnn
and installed as such.

ii) libX11 imcp portion: inserted a few warning messages/log messages.

After I noticed the uninitialized variable messages from kinput2 under
valgrind, I inserted a few dump messages into libX11 and libxcb.  This
is because initially I thought they could be in ximcp portion of the
library code.  (Now it looks as though the the cause of the warnings
are more or less in kinput2.c code and has been taken care of.)

Actually, I had to modify libxcb source since, from what I understand, 
XCB interface is now the preferred interface used for X programs (?),
and libxcb-x-modulenames are the ones that really handle the data
transfer on the wire, so to speak. Anyway, libxcb is where the
warning was generated about uninitialized memory area.

The Xrecord extension and such would have been used usually in these
cases, but I suspected that the underlying X11 library had a problem
or two on its own, and thus added the dump just before writev() in
libxcb where kinput2 caused valgrind to print uninitialized value
messages.

iii) Ran the modified programs and tried to find the smoking gun.
Not yet found a smoking gun.

Still to find a clear cause. And trying...

3. Attachments:

i). Session log to show The uninitialized variable warnings produced
when thunderbird (various versions) ran under valgrind.

     Already uploaded.
     
I am afraid that 16.0.1 which I used at the office caused the problem
to appear again. So I am not sure what part program/library is to blame...

In any case, this might tempt a few developers of thunderbird to
create a debug version (with full debug symbols) so that people
trying to narrow down the cause of a problem in thunderbird can focus 
on the source line if they get lucky (using gdb, valgrind, etc.)
To be honest, creating a full debug build on a memory-tight machine is
very difficult.

Also, I see the following bugzilla entry:
      UUUU (Hmm,  I can't find it today, it mentioned something about
disable running thunderbird under valgrind because it is not quite
valgrind ready or to that effect.)

I wonder if this means that running thunderbird under valgrind is no
longer attempted(?), and wonder also this is why the thunderbird build
with debug symbols is not provided.

ii) a. Valgrind session log that shows the uninitialized variable
   messages from kinput2.

   kinput2-valgrind-10-10.txt
   https://bugzilla.mozilla.org/attachment.cgi?id=670847
   
   kinput2-wnn-valgrind-log-10-09.txt
   https://bugzilla.mozilla.org/attachment.cgi?id=670849

ii) b Valgrind session log that shows the problematic uninitialized
   variable messages from kinput2 are gone after my fix.

   [I think I will upload the problematic log with symbol names
   in the stack trace AND the log after the clean up to show the comparison.]

iii) Just FYI, the patch to kinput2 to suppress the warnings seen above.

     Will do so after creating the diff.

iv) Just FYI, the patch to libX11 and libxcb which I applied for
debugging purposes.

     Will do so after creating the diff.

[Do not install this in your ordinary LD_LIBRARY_PATH.  EVERY X
programs started from your command terminal will start to generate
dump messages, and this will overwhelm the CPU and I/O bandwidth of
your console. Install the library into a different directory not in
LD_LIBRARY_PATH and add this directory to the search PATH only before
you want to run a debug target such as kinput2.

v) X libraries used by emacs.

Since I suspect that emacs is also to blame, I attach the relevant
lines related to X libraries used by emacs. I checked this by using
the lsof output and picked up libX... lines which are related to
emacs.

    To be uploaded.

4. BIG QUESTIONS. 

I began seeing this problem since maybe around March of this year.  I wonder what
change in thunderbird code caused this problem to manifest itself.

And I am curious why only the particular instance of THUNDERBIRD
begins to misbehave and other programs had no problem showing drop
down/pull down menus. (I checked a few Gnome desktop accessories, and
even started firefox to see how other programs would behave when the
problem manifested itself again. The other programs had no problem
with pull down menus.)

From the post in 

https://bugs.launchpad.net/ubuntu/+source/unity-2d/+bug/1010466/comments/10

I see that seeing the fixed value (of incorrect timestamp, far into
the future?) appearing again and again suggests that there is a
problematic timestamp stored in the data of running thunderbird (in
GDK library data structure)?

I wonder how it got into thunderbird, though. From the current
understanding, it could be it sneaked into it from the interaction of
kinput2(-wnn) and jserver, etc.

BUT, there is a possibility that this is NOT in thunderbird itself.
The day before and today, the problem was so severe that the strange
time value got stuck EVEN IF I restarted thunderbird(!). I have to
restart the X11 session...  But this means then that the problem is
NOT in the running image of thunderbird at all. The incorrect
timestamp value must be stored somewhere OUTSIDE thunderbird, too.

CAN THIS MEAN THAT THE PROBLEM POSSIBLY IS IN THE WINDOW MANAGER
METACITY (or whatever is responsible for the Gnome desktop?)
Or, of course, kinput2(-wnn) may hold such strange value external to
thunderbird. *BUT*, if I start thunderbird afresh, does TB have a
chance to interact with kinput2 (even before I type anything) to
receive this bogus timestamp (?) from kinput2. I have no clue here.
Gnome desktop and its component is more likely culprit.
 
(I have a feeling that the value of 0 for timestamp, which is to be
interpreted as the current time [in GDK library, at least?] may not be
interpreted correctly in all the server(s) and client side(s). I am
not sure if all the programs use the same routine in a library to
handle these timestamps...)

Also emacs is also a suspect since, today, after an absence of
for a day, the problem reappeared again after I try to select and copy
a region in Emacs, and then when I tried to paste it into thunderbird
composition window (for a new message), I noticed that I could no
longer make the pull down menu from [Edit] button on the upper part of
the window. The problem returned after a full day!

My work style mixes thunderbird for mail reading and writing,
emacs for writing documents, and program editing and debugging.
[And seeing thunderbird misbehaving erratically just when I am composing
some important messages is quite annoying. If I am opening other
programs such as emacs, it is a bother to end the X session and
restart :-( ]

But the usage of emacs just before the problem appeared might be just
a fluke.

I noticed, though, the problem seems to be triggered OR NOTICED often
when I try to use right-button to show the context-sensitive menu. (It
will not show, or rather I see a flicker and then pull-down menus and
context-sensitive menus will not show.)
Hmm... This may be just an impossible to verify claim. But this *IS*
my first impression about when the problem is often noticed.

5. What I plan to do.

In comment 18
there is a problem mentioned for FreeWnn jserver code earlier.
So I will look into the XIM protocol implementation / handler in it.

Also, since there is a chance that Desktop itself is storing
incorrect timestamp somewhere, I will look into metacity source code
as well.
(I am afraid that I am going a fishing expedition...)

In the meantime, if someone has any idea about the areas I might want
to look, please let me know.
I am all ears.
Maybe I should try to print timestamps for more message types in
xrecord-focus.cpp or capture more events(?) to approach to the cause
of the bugs. Right now, I have no idea where the problematic timestamp
was inserted yet.

Just to be sure, and collect some data points, here is my environment.
Debian GNU/Linux   (within VirtualBox. I don't know if VBox may be to blame?)

libX11 : Agah. The problem appeared just when I was typing this information
         on another computer and I don't have the information handy.
         Will upload next Monday.

libxcb :

kinput2-wnn v3.1 (from Debian. Patched as above, but still the problem
persists.)

I wonder if the other reporters run the version of Debian GNU/Linux.
It may be that Debian's packages for Gnome changed early Spring and
this was when people started seeing these problems???
(Ubuntu is months or years ahead of package renewal and thus the
problems in Ubuntu were noticed earlier.)

TIA
Comment 27 ISHIKAWA, Chiaki 2012-10-12 10:47:57 PDT
I should have checked the behavior of firefox when even restarting thunderbird didn't cure the pull-down menu problem. Then probably I should be fairly certain that the external environment (gnome desktop programs or kinput2-wnn retains the
strange timestamp.) Hmm... so many variables to deal with :-(
Comment 28 ISHIKAWA, Chiaki 2012-10-12 20:20:50 PDT
Created attachment 671018 [details]
Valgrind session shows the uninitialized value messages from kinput2 are gone

[1] kinput2-v3.1 (original debian binary without symbol)
    under valgrind.
    libX11 and libxcb are system original (without specific dump messages)

    We know there is a problem. (actually a few: later ones are omitted.)

[2] kinput2-v3.1 with debug symbol (compiled from the debian source) 
under valgrind.
libX11 and libxcb are rebuilt with my own dump messages just before
writev() mentioned in the previous valgrind message.

Note that in this log with the symbolic names, the place where the
uninitialized value came from is very clear. Thus I could begin fixing
kinput2-v3.1 sources. There were a few places I needed to initialize
variables.

[3] After the fix is applied kinput2 source.
No more uninitialized value messages.

But the original problem still appears :-(
Comment 29 ISHIKAWA, Chiaki 2012-10-12 20:26:38 PDT
Created attachment 671019 [details] [diff] [review]
patch to kinput2-wnn v3.1 (3-1-10.3) source in Debian.

Kinput2 patch.

You need to install libwnn6-dev under Debian.  (A header file,
/usr/include/wnn/jllib.h in libwnn-dev, has issues with prototypes of
a few functions, and results in build failure.)

After copying Kinput2.conf to the one from the patch (for wnn
support only ),

run under the top-most directory
  xmkmf -a
  make

This should create the kinput2 binary under cmd directory.

When you run it, though, you may get the following error.  

/home/ishikawa/repos/kinput2-v3.1-CI-mods/cmd/kinput2: symbol lookup error: /usr/lib/i386-linux-gnu/libXrender.so.1: undefined symbol: _XGetRequest

This happens if you fail to have proper libX11 in one of the path of
LD_LIBRARY_PATH. (I have tried to install my own build of libX11 and
libxcb under /usr/local/X11/lib/ by specifiying ./configure
--prefix=/usr/local/X11 during configure.)

In my case, I had forgotten to install my own build of libX11 to
/usr/local/X11/lib and yet specified
LD_LIBRARY_PATH=/usr/local/X11/lib::/usr/local/lib during running of
kinput2 under valgrind.

After the installation of libX11 (and libxcb, of course) in
/usr/local/X11/lib, now kinput2 ran under valgrind and spewed out dump
messages at crucial positions.
Comment 30 ISHIKAWA, Chiaki 2012-10-12 20:27:42 PDT
Created attachment 671020 [details]
Kinput2.conf I used to create knput2-v3.1 (sorry missing from the patch above.)

Removed from the patch above by mistake.
Comment 31 Karl Tomlinson (:karlt) 2012-10-14 13:20:55 PDT
Can you log the time fields of KeyPress/KeyRelease events passed to XPutBackEvent?

This could be done with a conditional break point in gdb if debug symbols are installed, but it may be easier in your own build of libX11.
Comment 32 Karl Tomlinson (:karlt) 2012-10-14 13:22:36 PDT
Actually, only KeyPress is important.  I don't think GDK keeps timestamps from KeyRelease events.
Comment 33 ISHIKAWA, Chiaki 2012-10-14 16:18:16 PDT
(In reply to Karl Tomlinson (:karlt) from comment #31)
> Can you log the time fields of KeyPress/KeyRelease events passed to
> XPutBackEvent?

Will do. Give me a couple of days so that I can get a log when the problem re-appears.
Comment 34 ISHIKAWA, Chiaki 2012-10-14 16:22:09 PDT
By the way, I wonder if others who reported the problem use the same kinput2 XIM front end. My suspects now include:
 kinput2(-wnn), VirtualBox addon for linux (it is invoked when X copy is involved. It add-on can act as a buffer between host OS and guest OS to allow passing of copied text. So very OS specific),  metacity, emacs,  itself...
Comment 35 Weng Xuetian 2012-10-15 02:13:27 PDT
(In reply to ISHIKAWA, chiaki from comment #34)
> By the way, I wonder if others who reported the problem use the same kinput2
> XIM front end. My suspects now include:
>  kinput2(-wnn), VirtualBox addon for linux (it is invoked when X copy is
> involved. It add-on can act as a buffer between host OS and guest OS to
> allow passing of copied text. So very OS specific),  metacity, emacs, 
> itself...

Thank you for your work, I can reproduce with this bug under any other input method I know who using XIM. And I'm using kwin, without virtualbox.

For me I'd suspect this bug with gtk, libX11 or IMDKit used by every IM itself.

I know one of IM (fcitx) well since I develop it, the problem is it doesn't use home made X event, it uses the original event that send from xserver if it doesn't need to process it.

Some might related report:
https://bugs.freedesktop.org/show_bug.cgi?id=39367
https://bugzilla.gnome.org/show_bug.cgi?id=645993

I think you'd try do memset against here.
http://git.gnome.org/browse/gtk+/tree/modules/input/gtkimcontextxim.c?h=gtk-2-24#n710
Comment 36 ISHIKAWA, Chiaki 2012-10-15 22:34:31 PDT
(In reply to Weng Xuetian from comment #35)
> (In reply to ISHIKAWA, chiaki from comment #34)
> Thank you for your work, I can reproduce with this bug under any other input
> method I know who using XIM. And I'm using kwin, without virtualbox.
> 

You are welcome, and your reply helps me to focus on suspect(s).

> For me I'd suspect this bug with gtk, libX11 or IMDKit used by every IM
> itself.
> 
> I know one of IM (fcitx) well since I develop it, the problem is it doesn't
> use home made X event, it uses the original event that send from xserver if
> it doesn't need to process it.
> 
> Some might related report:
> https://bugs.freedesktop.org/show_bug.cgi?id=39367
> https://bugzilla.gnome.org/show_bug.cgi?id=645993
> 
> I think you'd try do memset against here.
> http://git.gnome.org/browse/gtk+/tree/modules/input/gtkimcontextxim.c?h=gtk-
> 2-24#n710

Thank you for your pointers. Coupled with another reply from Karl, [the bug 39367 at freedesktop seems very interesting.] I am inserting some probes to see if I can find the culprit(s).

But, then this problem started to appear more than 12 months ago. It is an interesting topic for a grad sutdent to learn how a problem in base library is noticed by application users after the new buggy library is packaged in a popular distribution (unstable, testing, and then stable to hit the large community) and then noticed by users who use specific features (XIM in this case) not used by every user. For those of us, this bug is very nasty, but for those who don't use XIM, thunderbird and firefox work just fine, I suppose.

Stay tuned.
Comment 37 ISHIKAWA, Chiaki 2012-10-16 06:27:13 PDT
Created attachment 671818 [details]
diff to libX11 source files

Attached is the patch I have applied to libX11 source (debian
distribution. libx11 (2:1.5.0-1) unstable; urgency=low

(I disabled the writev() dump in libxcb so that the output is reduced.)

It now seems that there are two causes of the problems.

 - easy one that goes away if I restart thunderbird.
   (I can show pull down menus if I restart thunderbird.)

 - difficult one that persists if I restart thunderbird.
   (I can not show pull down menus even if I restart thunderbird.
    I had to restart the whole X11 session...)
   This has happened only on a few occasions so far. 

It seems to me that the patch I applied to libX11
modules/im/ximcp/imDefLkup.c  (around line 710-720)
that sets time field to 0L 
fixed the easy-to-reproduce TB problems.
At least, I could not reproduce the problem after I have run TB,
kinput2(-wnn) and emacs for a day using the library.
I am not ruling out that the even the easier problem still persists somehow,
but the frequency has dropped significantly at least.

I will explain the case(s) where this fix certainly avoided introducing strange
time values in the event queue(!) in another post.

I recorded the timestamp of XPutBackEvent in a separate file (/var/tmp/X11-bug.log) so that
I can analyze it without cluttering up other files such as ~/.xsession-errors

Funny thing is that, or it may be quite simple when you think about it, only TB generates putback trace: kinput2-wnn does not.
Comment 38 ISHIKAWA, Chiaki 2012-10-16 06:53:27 PDT
After being unable to re-produce the problem for a while, it occurred to me that
running other programs and doing something there (typing, copying, etc.) seemed to trigger the problem. 

So I invoked another gnome terminal emulator and typed a few Japanese strings there and returned to TB to continue typing a new Japanese text there.

Voila, I see very suspicious timestamps recorded as follows.

I don't understand the logic behind this, but obviously the different history of 
function calls depending on the situation caused different stale data to 
be left on the stack and uninitialized event variable picked up the bogus timestamp from there.
This could be compiler-dependent, layout and optimization level, etc. That *MAY* explain why this problem was not noticed before. Also, the change in the GDK lib and maybe mozilla library change [I have no idea if pull down menu is now
drawn by mozilla gui tool that may have gone through change in the last 18 months or so.] 

Suspicious log: dumped into ~/.xsession-errors after gnome terminal was invoked and input was attempted using XIM there, and again when I typed some Japanese text
into TB compose window (for new message).

Notice some strange and sometimes very large values (marked with *) for ev.time.

Actually, I noticed that ev.time value of 44,48, etc. even before today and
thus inserted ev.time = 0L after dumping the time field.
I was surprised to find the big values such as 3220769152 which would certainly
be a cause of the strange time value reversal issue if this is picked up somewhere within TB. 

imDefLkup.c,385: putback k press   ev.xkey.time=531665848
imDefLkup.c,385: putback k press   ev.xkey.time=531666299
*imDefLkup.c,733: putback k press   ev.time=48 (bogus?) set to 0 after this.
imDefLkup.c,733: putback k press   ev.time=0 (bogus?) set to 0 after this.
imDefLkup.c,385: putback k press   ev.xkey.time=531670994
imDefLkup.c,733: putback k press   ev.time=48 (bogus?) set to 0 after this.
imDefLkup.c,733: putback k press   ev.time=0 (bogus?) set to 0 after this.
*imDefLkup.c,733: putback k press   ev.time=7077987 (bogus?) set to 0 after this.
*imDefLkup.c,733: putback k press   ev.time=3220769152 (bogus?) set to 0
after this.
imDefLkup.c,385: putback k press   ev.xkey.time=531686660
imDefLkup.c,385: putback k press   ev.xkey.time=531687240
imDefLkup.c,385: putback k press   ev.xkey.time=531688116
imDefLkup.c,385: putback k press   ev.xkey.time=531688259
imDefLkup.c,385: putback k press   ev.xkey.time=531688389
imDefLkup.c,385: putback k press   ev.xkey.time=531688536
imDefLkup.c,385: putback k press   ev.xkey.time=531688672
imDefLkup.c,385: putback k press   ev.xkey.time=531688802
imDefLkup.c,385: putback k press   ev.xkey.time=531689134
imDefLkup.c,385: putback k press   ev.xkey.time=531689310
imDefLkup.c,385: putback k press   ev.xkey.time=531689640


imDefLkup.c,385: putback k press   ev.xkey.time=531665848
imDefLkup.c,385: putback k press   ev.xkey.time=531666299
*imDefLkup.c,733: putback k press   ev.time=48 (bogus?) set to 0 after this.
imDefLkup.c,733: putback k press   ev.time=0 (bogus?) set to 0 after this.
imDefLkup.c,385: putback k press   ev.xkey.time=531670994
imDefLkup.c,733: putback k press   ev.time=48 (bogus?) set to 0 after this.
imDefLkup.c,733: putback k press   ev.time=0 (bogus?) set to 0 after this.
*imDefLkup.c,733: putback k press   ev.time=7077987 (bogus?) set to 0 after this.
*imDefLkup.c,733: putback k press   ev.time=3220769152 (bogus?) set to 0
after this.
imDefLkup.c,385: putback k press   ev.xkey.time=531686660
imDefLkup.c,385: putback k press   ev.xkey.time=531687240
imDefLkup.c,385: putback k press   ev.xkey.time=531688116
imDefLkup.c,385: putback k press   ev.xkey.time=531688259
imDefLkup.c,385: putback k press   ev.xkey.time=531688389
imDefLkup.c,385: putback k press   ev.xkey.time=531688536
imDefLkup.c,385: putback k press   ev.xkey.time=531688672
imDefLkup.c,385: putback k press   ev.xkey.time=531688802
imDefLkup.c,385: putback k press   ev.xkey.time=531689134
imDefLkup.c,385: putback k press   ev.xkey.time=531689310
imDefLkup.c,385: putback k press   ev.xkey.time=531689640
imDefLkup.c,385: putback k press   ev.xkey.time=531707221
imDefLkup.c,385: putback k press   ev.xkey.time=531718353
imDefLkup.c,385: putback k press   ev.xkey.time=531718479
imDefLkup.c,733: putback k press   ev.time=0 (bogus?) set to 0 after this.
imDefLkup.c,385: putback k press   ev.xkey.time=531722754
imDefLkup.c,385: putback k press   ev.xkey.time=531722873
imDefLkup.c,733: putback k press   ev.time=0 (bogus?) set to 0 after this.
imDefLkup.c,385: putback k press   ev.xkey.time=531726275
*imDefLkup.c,733: putback k press   ev.time=3220766856 (bogus?) set to 0
after this.
imDefLkup.c,733: putback k press   ev.time=0 (bogus?) set to 0 after this.
imDefLkup.c,733: putback k press   ev.time=0 (bogus?) set to 0 after this.
*imDefLkup.c,733: putback k press   ev.time=48 (bogus?) set to 0 after this.
imDefLkup.c,733: putback k press   ev.time=0 (bogus?) set to 0 after this.
imDefLkup.c,385: putback k press   ev.xkey.time=531733598
imDefLkup.c,385: putback k press   ev.xkey.time=531734043
imDefLkup.c,385: putback k press   ev.xkey.time=531734544
imDefLkup.c,385: putback k press   ev.xkey.time=531734574
imDefLkup.c,385: putback k press   ev.xkey.time=531734628
imDefLkup.c,385: putback k press   ev.xkey.time=531734658
imDefLkup.c,385: putback k press   ev.xkey.time=531734721
imDefLkup.c,385: putback k press   ev.xkey.time=531734751
imDefLkup.c,385: putback k press   ev.xkey.time=531734781
imDefLkup.c,385: putback k press   ev.xkey.time=531735141
imDefLkup.c,385: putback k press   ev.xkey.time=531735390
*imDefLkup.c,733: putback k press   ev.time=48 (bogus?) set to 0 after this.
*imDefLkup.c,733: putback k press   ev.time=48 (bogus?) set to 0 after this.
imDefLkup.c,733: putback k press   ev.time=0 (bogus?) set to 0 after this.
imDefLkup.c,385: putback k press   ev.xkey.time=531743298
imDefLkup.c,385: putback k press   ev.xkey.time=531743798
imDefLkup.c,385: putback k press   ev.xkey.time=531743830
imDefLkup.c,385: putback k press   ev.xkey.time=531743860
imDefLkup.c,385: putback k press   ev.xkey.time=531743897
imDefLkup.c,385: putback k press   ev.xkey.time=531743928
imDefLkup.c,385: putback k press   ev.xkey.time=531743959
imDefLkup.c,385: putback k press   ev.xkey.time=531744017
imDefLkup.c,385: putback k press   ev.xkey.time=531744047
imDefLkup.c,385: putback k press   ev.xkey.time=531744084
imDefLkup.c,385: putback k press   ev.xkey.time=531744125
imDefLkup.c,385: putback k press   ev.xkey.time=531744173
imDefLkup.c,385: putback k press   ev.xkey.time=531744204
imDefLkup.c,385: putback k press   ev.xkey.time=531744247
imDefLkup.c,385: putback k press   ev.xkey.time=531744283
imDefLkup.c,385: putback k press   ev.xkey.time=531744320
imDefLkup.c,385: putback k press   ev.xkey.time=531744385
imDefLkup.c,385: putback k press   ev.xkey.time=531744516
imDefLkup.c,385: putback k press   ev.xkey.time=531744565
imDefLkup.c,385: putback k press   ev.xkey.time=531744595
imDefLkup.c,385: putback k press   ev.xkey.time=531744681
imDefLkup.c,385: putback k press   ev.xkey.time=531744720
imDefLkup.c,385: putback k press   ev.xkey.time=531744750
imDefLkup.c,385: putback k press   ev.xkey.time=531744780
imDefLkup.c,385: putback k press   ev.xkey.time=531744824
imDefLkup.c,385: putback k press   ev.xkey.time=531744862
imDefLkup.c,385: putback k press   ev.xkey.time=531744892
imDefLkup.c,385: putback k press   ev.xkey.time=531744936
imDefLkup.c,385: putback k press   ev.xkey.time=531744968
imDefLkup.c,385: putback k press   ev.xkey.time=531745084
imDefLkup.c,385: putback k press   ev.xkey.time=531745162
imDefLkup.c,385: putback k press   ev.xkey.time=531745195
imDefLkup.c,385: putback k press   ev.xkey.time=531745228
imDefLkup.c,385: putback k press   ev.xkey.time=531745263
imDefLkup.c,385: putback k press   ev.xkey.time=531745297
imDefLkup.c,385: putback k press   ev.xkey.time=531745331
imDefLkup.c,385: putback k press   ev.xkey.time=531745591
imDefLkup.c,385: putback k press   ev.xkey.time=531745774
imDefLkup.c,385: putback k press   ev.xkey.time=531745935
imDefLkup.c,385: putback k press   ev.xkey.time=531746070
imDefLkup.c,385: putback k press   ev.xkey.time=531746206
imDefLkup.c,385: putback k press   ev.xkey.time=531746346
imDefLkup.c,385: putback k press   ev.xkey.time=531746482
imDefLkup.c,385: putback k press   ev.xkey.time=531746628
imDefLkup.c,385: putback k press   ev.xkey.time=531747341
imDefLkup.c,733: putback k press   ev.time=0 (bogus?) set to 0 after this.
imDefLkup.c,385: putback k press   ev.xkey.time=531748836
imDefLkup.c,385: putback k press   ev.xkey.time=531749076
imDefLkup.c,733: putback k press   ev.time=0 (bogus?) set to 0 after this.
imDefLkup.c,385: putback k press   ev.xkey.time=531752480
imDefLkup.c,733: putback k press   ev.time=0 (bogus?) set to 0 after this.
imDefLkup.c,733: putback k press   ev.time=0 (bogus?) set to 0 after this.
imDefLkup.c,385: putback k press   ev.xkey.time=531756443
imDefLkup.c,385: putback k press   ev.xkey.time=531756576
*imDefLkup.c,733: putback k press   ev.time=44 (bogus?) set to 0 after this.
imDefLkup.c,733: putback k press   ev.time=0 (bogus?) set to 0 after this.
imDefLkup.c,733: putback k press   ev.time=0 (bogus?) set to 0 after this.
imDefLkup.c,385: putback k press   ev.xkey.time=531759281
imDefLkup.c,733: putback k press   ev.time=0 (bogus?) set to 0 after this.
*imDefLkup.c,733: putback k press   ev.time=44 (bogus?) set to 0 after this.
*imDefLkup.c,733: putback k press   ev.time=3220769152 (bogus?) set to 0
after this.
imDefLkup.c,385: putback k press   ev.xkey.time=531763025
imDefLkup.c,733: putback k press   ev.time=0 (bogus?) set to 0 after this.
*imDefLkup.c,733: putback k press   ev.time=44 (bogus?) set to 0 after this.
imDefLkup.c,733: putback k press   ev.time=44 (bogus?) set to 0 after this.
imDefLkup.c,733: putback k press   ev.time=0 (bogus?) set to 0 after this.
imDefLkup.c,385: putback k press   ev.xkey.time=531772056


Now, though, although my patch fixed the particular problem I am facing, there may be other causes for strange timestamps.
In some patched places, I only observe the value to see if there is a suspicious
value recorded during execution. My setup does not seem to invoke the execution of the patched positions so far. (For example, one place seems to be related to
Thai input. I do not use Thai (I only use the default English and Japanese input using XIM.).
So there may be other fixed needed to complete the patch to libX11.

PS: I found out that unless the system is in such a state that
TB started to show the suspicious timestamp deteced by the
metacity window manager, simply invoking the [HELP]->[INFO] does not cause
the strange timestamp issue from gnome terminal emulator program.

Somehow, we need to corrupt the X event queue and some data structure shared(?)
by gnome programs before gnome terminal program shows the reversed timestamp issue.
   

PPS: I think one major bug is erased, but I suspect there are at least a few more to go.

I will check FreeWnn, and metacity issues.

I wonder if Weng Xuetian  and others can test if my patch mainly the
zeroing of time field in the mentioned place above in imDefLkup.c
make the problem disappear for them, or if they would see log lines
from OTHER fprintf statements.

PPPS: After seeing time field, I have begun to notice BOGUS SERIAL number.
How should I fix it? Can I simply set it also to zero and hope nothing breaks ?
In the following URL kindly provided by Weng
http://git.gnome.org/browse/gtk+/tree/modules/input/gtkimcontextxim.c?h=gtk-2-24#n710
it says also
>  xevent.serial = 0;		/* hope it doesn't matter */

Well, "hope it doesn't matter", indeed. Any comment? :-)
Comment 39 ISHIKAWA, Chiaki 2012-10-16 06:59:05 PDT
Created attachment 671829 [details]
log to show bogus(?) serial after time field is tentatively fixed (set to 0).

Note the bogus serial number.
This serial obviously creeps in where previously bogus time stamp was introduced.
(I could have simply zeroed the whole event variable. But I am not sure if zeroing the serial is OK or not. Of course, right now, it is a random junk.)
Comment 40 Karl Tomlinson (:karlt) 2012-10-16 13:32:41 PDT
(In reply to ISHIKAWA, chiaki from comment #38)
> PPPS: After seeing time field, I have begun to notice BOGUS SERIAL number.
> How should I fix it? Can I simply set it also to zero and hope nothing
> breaks ?

> Well, "hope it doesn't matter", indeed. Any comment? :-)

Looks like GDK uses that here
http://git.gnome.org/browse/gtk+/tree/gdk/x11/gdkevents-x11.c?h=gtk-2-24#n2339

I expect setting the serial to LastKnownRequestProcessed(display) should work fine.
Comment 41 ISHIKAWA, Chiaki 2012-10-16 20:19:48 PDT
(In reply to Karl Tomlinson (:karlt) from comment #40)
> I expect setting the serial to LastKnownRequestProcessed(display) should
> work fine.

Thank you. I will set serial to this value and see what happens.
Comment 42 ISHIKAWA, Chiaki 2012-10-17 02:10:41 PDT
Now I am fairly confident that the problem seems to have gone away for me using
kinput2-wnn as XIM front end (as far as Japanese input across a few applications including thunderbird, and gnome terminal is concerned.)

Use of LastKnownRequestProcessed(display) seems to have no ill-effect (and should not, I suppose).

In any case, I think the patch I posted to the URL at freedesktop
quoted in the following

> Karl Tomlinson (:karlt) 2012-10-15 02:24:39 PDT
> See Also: https://bugs.freedesktop.org/show_bug.cgi?id=39367

should fix the strange time value noticed in XNextEvent in the original reporter there.

Let us hope that patch in a modified version will make it into official X11 distribution
and picked up by distributions such as Debian, Ubuntu, etc.

But still I am wondering: why it is only (?) thunderbird and firefox and other
clients seemed to OK then? (Or was it likely that other clients, if we screw up THEIR internal timestamp because of XIM input uninitialized variable bug, would start to trigger this problem?
It is a little strange that I noticed only firefox related bug reports using google. Or maybe I was just biased during the search :-)
And, of course, firefox is one of the major applications used all over the world by so many, it may trigger the bug before users of other software packages noticed the bug.

TIA
Comment 43 ISHIKAWA, Chiaki 2012-10-24 08:11:27 PDT
>In any case, I think the patch I posted to the URL at freedesktop
>quoted in the following
>
>> Karl Tomlinson (:karlt) 2012-10-15 02:24:39 PDT
>> See Also: https://bugs.freedesktop.org/show_bug.cgi?id=39367

Hi, I updated the patch I posted in the above URL to fix
bogus timestamp issue in XIM support in libX11. 
The new patch I posted there now includes one extra initialization of a buffer which was noticed by valgrind last week.

Ever since I applied the patch and testing using the modified libX11, I have not seen the problem of pull down menu not showing correctly for a full week.

A great improvement(!) 

I wonder if someone using XIM input protocol for Chinese input
can test this out also.

For this, I can make the libX11 with the patch available, but in this day and age, you don't want to install a binary from a stranger, a random Joe, on your linux machine. So I prefer the distros to prepare the testing version of the libX11 instead.

Or I can help someone to install their modified libX11 step by step off-line by e-mail or in the above URL, etc.

TIA
Comment 44 Hunter Tao 2012-10-24 14:33:00 PDT
hi, Karl and Ishikawa san, I'm a user from china who is tracking this issue since Sep. Thank you very much for your great efforts to resolve this problem. Maybe I can test the patch for Chinese input. I have the same problem under linux mint 12(kernel 3.0.0-12) with both ibus and fcitx IMs. I have both firefox/thunderbird 16.0.1 installed. If necessary, I can test other versions of firefox and thunderbird. Would you pls suggest the detailed steps of how to test the patch? I have basic skills of compiling firefox or other C/C++ programs. As I know, many chinese linux firefox users are waiting for this patch.
Comment 45 Weng Xuetian 2012-10-24 15:14:47 PDT
I have some difficulty reproduce this bug right now (well.. I know this bug is kinds of "random", so it doesn't means this bug is gone without patch).

So I'm not sure whether I can give out some useful info recently. But I will try to find some of my friend to test your patch.

@Hunter Tao:
Compile library is kinds of hard if you know about nothing of this.

But I think you can use:
apt-get build-dep libx11 # get compile dependency
apt-get source libx11 # get dpkg source
apt-get install dpkg-dev # dpkg

and now you will get libx11-x.y.z directory.

Download the patch, and put it inside debian/patches/.
And edit debian/patches/series, add the downloaded patch file name.

And cd back to the libx11-x.y.z, execute:

 dpkg-buildpackage

Then you will get new patched package in parent directory.

BTW, fcitx and ibus all support non-xim way, so I don't think this bug affect any fcitx or ibus user if they setup correctly. So if you want to test, you need to make sure you run firefox with:

GTK_IM_MODULE=xim firefox
Comment 46 stlifey 2012-10-24 19:59:12 PDT
(In reply to ISHIKAWA, chiaki from comment #43)
> >In any case, I think the patch I posted to the URL at freedesktop
> >quoted in the following
> >
> >> Karl Tomlinson (:karlt) 2012-10-15 02:24:39 PDT
> >> See Also: https://bugs.freedesktop.org/show_bug.cgi?id=39367
> 
> Hi, I updated the patch I posted in the above URL to fix
> bogus timestamp issue in XIM support in libX11. 
> The new patch I posted there now includes one extra initialization of a
> buffer which was noticed by valgrind last week.
> 
> Ever since I applied the patch and testing using the modified libX11, I have
> not seen the problem of pull down menu not showing correctly for a full week.
> 
> A great improvement(!) 
> 
> I wonder if someone using XIM input protocol for Chinese input
> can test this out also.
> 
> For this, I can make the libX11 with the patch available, but in this day
> and age, you don't want to install a binary from a stranger, a random Joe,
> on your linux machine. So I prefer the distros to prepare the testing
> version of the libX11 instead.
> 
> Or I can help someone to install their modified libX11 step by step off-line
> by e-mail or in the above URL, etc.
> 
> TIA

Thanks,it fixed!in nightly with xsunpinyin
Comment 47 Hunter Tao 2012-10-24 20:46:47 PDT
(In reply to Weng Xuetian from comment #45)
> I have some difficulty reproduce this bug right now (well.. I know this bug
> is kinds of "random", so it doesn't means this bug is gone without patch).
> 
> So I'm not sure whether I can give out some useful info recently. But I will
> try to find some of my friend to test your patch.
> 
> @Hunter Tao:
> Compile library is kinds of hard if you know about nothing of this.
> 
> But I think you can use:
> apt-get build-dep libx11 # get compile dependency
> apt-get source libx11 # get dpkg source
> apt-get install dpkg-dev # dpkg
> 
> and now you will get libx11-x.y.z directory.
> 
> Download the patch, and put it inside debian/patches/.
> And edit debian/patches/series, add the downloaded patch file name.
> 
> And cd back to the libx11-x.y.z, execute:
> 
>  dpkg-buildpackage
> 
> Then you will get new patched package in parent directory.
> 
> BTW, fcitx and ibus all support non-xim way, so I don't think this bug
> affect any fcitx or ibus user if they setup correctly. So if you want to
> test, you need to make sure you run firefox with:
> 
> GTK_IM_MODULE=xim firefox

Wen, thank you for your advice. I'll follow your instructions to try it.
Comment 48 ISHIKAWA, Chiaki 2012-10-25 00:35:28 PDT
[I have a suggestion to Karl to think about in the latter half of this post.]

(In reply to Hunter Tao from comment #47)
> (In reply to Weng Xuetian from comment #45)
> > I have some difficulty reproduce this bug right now (well.. I know this bug
> > is kinds of "random", so it doesn't means this bug is gone without patch).
> > 
> > So I'm not sure whether I can give out some useful info recently. But I will
> > try to find some of my friend to test your patch.

Are the affected users of Chinese input mostly Debian or Ubuntu users?
[I use Debian myself.]

I am not sure if I can help users of other distributions, but
given the patch I placed in freedesktop URL, 
https://bugs.freedesktop.org/show_bug.cgi?id=39367
it would not be difficult for
someone familiar with the building of packages of a particular linux
distribution to create a local copy for testing.
Again, maybe you can post your requirements here then someone familiar with your
linux distribution can help (hopefully).

TIA

PS: I am encouraged to learn that there seem to be many users who have problems with this
incorrect timestamp. And if my fix to libx11 fixes their problems, I would be glad to thear that.
Strangely, I don't see many blogs about this problem in Japan. Maybe the ratio of users of XIM input method is in decline in Japan, but there are definitely users of 
old XIM input methods such as one used by kinput2 and I am surprised at the relative lack of
reporting of this particular problem from Japanese users. Once this problem occurs,
I can't even choose "cc:" or "bcc:" in the addresses in thunderbird, and this made me
to think "Enough is enough" and I began investigating the problem, which led to the discovery of the problems in libx11 (not in thunderbird itself.)

Now, though, I noticed that people who are not using XIM input mode are also affected (?)
Then there MUST be some other causes.
I wonder what thunderbird / firefox can do about it.

Karl, I noticed that you have heavily fixed the event handling code and you must be familar with it. Does the following sound impractical? (Technically feasible. I suspect
the inclusion of mods to GTK may have some issues.)

A:
- Let the gecko code to record the latest timestamp received and the 
window id (or whatever unique ID) whenever the internally ketp timestamp is replaced.

B:
- Also, let GTK code to record the window id (or whatever unique id) when it tries to replace the recorded timestamp with the new one (so that it knows who produced the new timestamp.)
This requires change to GTK library code. But not that difficult technically in principle.

C:
- I don't know if this is even possible [dont know if the variable is exported (visible) or there is an externally visible method to change the timestamp value], but at the timing of replacing timestamp recorded within Gecko library within thunderbird / or firefox, *IF* the timestamp recorded WITHIN GTK library seems to be way off into the future, *REPLACE* it with the timestamp (that is about to replace the one inside Gecko and to be used in the future) so that the bogus timestamp is removed from GTK internal, AND report the previous bogus timestamp value and the original owner (window ID or whatever) that passed it to the GTK library [recorded in Step B above] to error console (to be printed in ~/.xsession-error
in the case of GNOME desktop.)

I am afraid that GTK library today seems to pursue an idealistic goal in an imperfect world.
XIM bug was not noticed more than 10 years if I am not mistaken. XIM related routines never bothered to check the timestamp and serial even but they seem to work more or less.
In a perfect world, every event has the correct time and serial number.
But this is an imperfect world.

Pursuing ideal is fine. But the change in GTK has made it very difficult for non-related third-party software to track down the problems caused by bogus timestamp. 
We can at least try to make the analysis of the problem easier.
At least the change suggested in the step (B) and (C) above, would make it possible for other software packages to find out who is generating the bogus timestamp.

What do you think, Karl?

TIA
Comment 49 ISHIKAWA, Chiaki 2012-10-25 01:33:55 PDT
>*REPLACE* it with the timestamp (that is about to replace the one inside Gecko and to be used >in the future) so that the bogus timestamp is removed from GTK internal, AND report the >previous bogus timestamp value and the original owner (window ID or whatever) that passed it >to the GTK library [recorded in Step B above] to error console (to be printed in >~/.xsession->error
>in the case of GNOME desktop.)

Come to think of it, I can live with only "AND report" part if maitainer of GTK library is not
happy about the external meddling of the "*REPLACE*" part. sure thunderbird/firefox may
behave funnily, but at least we would know who passed the bogus timestamp.

TIA
Comment 50 Weng Xuetian 2012-10-25 06:28:27 PDT
(In reply to ISHIKAWA, chiaki from comment #48)
> [I have a suggestion to Karl to think about in the latter half of this post.]
> 
> (In reply to Hunter Tao from comment #47)
> > (In reply to Weng Xuetian from comment #45)
> > > I have some difficulty reproduce this bug right now (well.. I know this bug
> > > is kinds of "random", so it doesn't means this bug is gone without patch).
> > > 
> > > So I'm not sure whether I can give out some useful info recently. But I will
> > > try to find some of my friend to test your patch.
> 
> Are the affected users of Chinese input mostly Debian or Ubuntu users?
> [I use Debian myself.]
> 
> I am not sure if I can help users of other distributions, but
> given the patch I placed in freedesktop URL, 
> https://bugs.freedesktop.org/show_bug.cgi?id=39367
> it would not be difficult for
> someone familiar with the building of packages of a particular linux
> distribution to create a local copy for testing.
> Again, maybe you can post your requirements here then someone familiar with
> your
> linux distribution can help (hopefully).
> 
Well I don't use debian/ubuntu (Chakra), and I'm sure I reproduced this sometimes when some user complain about this.

BTW, I modified fcitx code to make it send a bogus event on purpose, and now I reproduce this and it works with your patch.

As for problem itself, the bogus timestamp may have more than one source, but XIM seems to be the most common one to trigger this.

And I'm also wondering that why this bug appear recently.. firefox still using gtk2, and gtk2 is not changed for a long time.
Comment 51 ISHIKAWA, Chiaki 2012-10-25 07:31:30 PDT
(In reply to Weng Xuetian from comment #50)
> (In reply to ISHIKAWA, chiaki from comment #48)
> > [I have a suggestion to Karl to think about in the latter half of this post.]
> > 
> > (In reply to Hunter Tao from comment #47)
> > > (In reply to Weng Xuetian from comment #45)
> > > > I have some difficulty reproduce this bug right now (well.. I know this bug
> > > > is kinds of "random", so it doesn't means this bug is gone without patch).
> > > > 
> > > > So I'm not sure whether I can give out some useful info recently. But I will
> > > > try to find some of my friend to test your patch.
> > 
> > Are the affected users of Chinese input mostly Debian or Ubuntu users?
> > [I use Debian myself.]
> > 
> > I am not sure if I can help users of other distributions, but
> > given the patch I placed in freedesktop URL, 
> > https://bugs.freedesktop.org/show_bug.cgi?id=39367
> > it would not be difficult for
> > someone familiar with the building of packages of a particular linux
> > distribution to create a local copy for testing.
> > Again, maybe you can post your requirements here then someone familiar with
> > your
> > linux distribution can help (hopefully).
> > 
> Well I don't use debian/ubuntu (Chakra), and I'm sure I reproduced this
> sometimes when some user complain about this.
> 
> BTW, I modified fcitx code to make it send a bogus event on purpose, and now
> I reproduce this and it works with your patch.
> 

Sounds good !

> As for problem itself, the bogus timestamp may have more than one source,
> but XIM seems to be the most common one to trigger this.

It seems to be the case.
 
> And I'm also wondering that why this bug appear recently.. firefox still
> using gtk2, and gtk2 is not changed for a long time.

This also puzzles me a lot. I have noticed this problem recently since Spring of this year. Wonder why I didn't notice this before.
Comment 52 Hunter Tao 2012-10-25 15:50:13 PDT
Ishikawa san, following Mr Wen's instruction, I succussfully compiled and installed libx11 with the new patch on my pc. I started ibus-daemon --xim, and tested chinese input in firefox16.0.1 with IM ibus-pinyin, lasting for about 10 minuts, all original problems were gone, all pull down and right-click popup menus worked normally. Pls help to double confirm if I did the test correctly. I'll install new compiled libx11 and test it again under another mint 12 system with the same problem, and update later if there is any progress. Thank you very much.

To Wen, although I met problem by using the commands you suggested, but you did help me to learn the steps of how to set up the compiling environment of libx11 with a new patch. Many thanks.
Comment 53 ISHIKAWA, Chiaki 2012-10-25 20:02:12 PDT
(In reply to Hunter Tao from comment #52)
> Ishikawa san, following Mr Wen's instruction, I succussfully compiled and
> installed libx11 with the new patch on my pc. I started ibus-daemon --xim,
> and tested chinese input in firefox16.0.1 with IM ibus-pinyin, lasting for
> about 10 minuts, all original problems were gone, all pull down and
> right-click popup menus worked normally. 

Sounds good to me.

> Pls help to double confirm if I did the test correctly.

I think other Chinese users can confirm this. Anyone?
 
> I'll install new compiled libx11 and test it again under
> another mint 12 system with the same problem, and update later if there is
> any progress. Thank you very much.

You are very welcome.

But just in case, you are still using the patch with fprintf statement to print
the warning about fixed timestamp, I posted a renewed patch to

https://bugs.freedesktop.org/show_bug.cgi?id=39367

to eliminate the warning.

Before installing the modified libX11, please make sure the extra debug warning is not printed.
(Maybe you can remove the old patch from debian, and then put the new patch and try again?
If you have removed the fprintf statements already, you don't have to do anything, but
if you have not, please use the latest patch to re-create libx11.
Otherwise, the debug warning may show up unexpectedly in normal program output.)

TIA
Comment 54 Hunter Tao 2012-10-25 21:28:18 PDT
(In reply to ISHIKAWA, chiaki from comment #53)

> But just in case, you are still using the patch with fprintf statement to
> print
> the warning about fixed timestamp, I posted a renewed patch to
> 
> https://bugs.freedesktop.org/show_bug.cgi?id=39367
> 
> to eliminate the warning.
> 
> Before installing the modified libX11, please make sure the extra debug
> warning is not printed.
> (Maybe you can remove the old patch from debian, and then put the new patch
> and try again?
> If you have removed the fprintf statements already, you don't have to do
> anything, but
> if you have not, please use the latest patch to re-create libx11.
> Otherwise, the debug warning may show up unexpectedly in normal program
> output.)
> 
> TIA

Ishikawa san, thank you for your upadte. I didn't remove the fprintf statements. I'll download the latest patch in the url you provided, test again after re-compiling libx11 with the new patch.
Comment 55 Hunter Tao 2012-10-26 14:16:36 PDT
Hi, Ishikawa san, I've completed the test of the new compiled libx11 with latest patch on my 2nd linux mint 12 system. Below is the detailed test steps I performed, and the results.

1. Re-compile libx11 with the patch eliminate-bogus-timestamp-less-fprintf.diff
   a. remove previous patch t.diff, and remove t.diff entry in debian/patches/series
   b. copy new patch eliminate-bogus-timestamp-less-fprintf.diff into debian/patches, and add new diff entry in debian/patches/series
   c. re-compile, created new libx11 deb packages successfully.

2. Confirm if the bug can be triggered out on my 2nd linux mint 12 system
   a. start ibus-daemon --xim with ibus-pinyin or ibus-sunpinyin
   b. perform test to trigger out the problem on firefox 16.0.1/14.0.1/17.0b1, and thunderbird 16.0.1, one at a time.
   c. every time the problem can be successfully triggered out.
   d. after the problem is triggered out, some defunct ibus processes are found in 'ps -ef|grep ibus'. to make a clear process table for the next test, sign off gnome session and sign in again, go back to step 2.a for the next test.

   Test result: the bug can be triggered out on all versions listed above, and the symptom is the same.

3. Install new compiled libx11 debian package with the latest patch eliminate-bogus-timestamp-less-fprintf.diff, reboot system.

4. Perform the same test steps in step 2 for all versions listed above. The patch works! The problem can not be found anymore!

Your latest patch fixed this XIM bug on both of my linux mint 12 systems. Great job! Thanks again for your hard work!
Comment 56 Karl Tomlinson (:karlt) 2012-10-28 21:33:16 PDT
(In reply to ISHIKAWA, chiaki from comment #48)
> Does the following sound impractical? (Technically
> feasible. I suspect
> the inclusion of mods to GTK may have some issues.)
> 
> A:
> - Let the gecko code to record the latest timestamp received and the 
> window id (or whatever unique ID) whenever the internally ketp timestamp is
> replaced.
> 
> B:
> - Also, let GTK code to record the window id (or whatever unique id) when it
> tries to replace the recorded timestamp with the new one (so that it knows
> who produced the new timestamp.)
> This requires change to GTK library code. But not that difficult technically
> in principle.
> 
> C:
> - I don't know if this is even possible [dont know if the variable is
> exported (visible) or there is an externally visible method to change the
> timestamp value], but at the timing of replacing timestamp recorded within
> Gecko library within thunderbird / or firefox, *IF* the timestamp recorded
> WITHIN GTK library seems to be way off into the future, *REPLACE* it with
> the timestamp (that is about to replace the one inside Gecko and to be used
> in the future) so that the bogus timestamp is removed from GTK internal, AND
> report the previous bogus timestamp value and the original owner (window ID
> or whatever) that passed it to the GTK library [recorded in Step B above] to
> error console (to be printed in ~/.xsession-error
> in the case of GNOME desktop.)

I suspect the GTK2 maintainers are unlikely to accept significant changes at
this stage as development has now moved to GTK3.

The window id is not really so helpful in this situation where a library is
adding events to the Xlib event loop.  Any window ID would be an ID of the
application having the problem, but this wouldn't do much to help identify the
library causing the problem, because that library didn't create the window.
Reporting the window id makes more sense for the window manager because it
communicates with separate processes and the window ID helps identify the
process involved.

Detecting bad times is perhaps a possibility, but it is not immediately clear
just from looking at the timestamp.  These timestamps wrap around every ~50
days.  GDK currently considers timestamps in the next 25 days to be in the
future and those in the last 25 days to be in the past.  It is quite
reasonable for an application to remain unused for a week or two, in which
case the next timestamp could look bad.  However, when noticing a significant
jump in time, the app or GDK could query the server for the current time, and
only accept the timestamp if it seems to be recent.  My guess though is that
people will suggest the problem be fixed at the root cause rather than adding
code to work around bugs.

The other possible improvement that perhaps could be made to GDK (version 3)
is that it always consider some events as having the most recent timestamp,
instead of always comparing with the last timestamp.  When these events are
received they always reset any possibly buggy timestamp.  Events from the
server should always have accurate timestamps.  Events from the window manager
or other apps may be a little in the past and so should still be compared with
the last timestamp.  Events from XPutBackEvent would often (or perhaps
usually) look like server events, but at least any buggy timestamp would soon
be overwritten.

This last option may be a sensible approach even if there were no
buggy timestamps, because it would mean that an application would still get
its timestamp(s) updated even if it hasn't been used for 25 days.
Comment 57 Karl Tomlinson (:karlt) 2012-10-28 21:34:56 PDT
(In reply to Weng Xuetian from comment #50)
> And I'm also wondering that why this bug appear recently.. firefox still
> using gtk2, and gtk2 is not changed for a long time.

I suspect the menu problems here started when Gecko started using
gdk_x11_display_get_user_time to get a timestamp in bug 500081.  Before then
the incorrect timestamps would have only caused more subtle problems, such as
the metacity messages and the problems with focus stealing prevention.

It's not trivial for Gecko to get all the same information in another way
because GDK doesn't provide the timestamp in all the events that it sends to
the client.
http://hg.mozilla.org/mozilla-central/annotate/fb076a446870/widget/gtk2/nsWindow.cpp#l1347

However, Gecko probably doesn't need to consider the timestamp of
WM_DELETE_WINDOW events.  It is unlikely, that any app would initialize a drag
or pointer grab on a request to close a window.

Gecko could record the timestamp of drop events through the drag-drop signal
on GtkWidgets.

If Gecko kept track of enough timestamps then it would only need to fall back
to gdk_x11_display_get_user_time to get the info from DESKTOP_STARTUP_ID when
it has received no other interesting events.

The advantage then is that, if Gecko does get a bad timestamp, then at least
the next event (button press, or similar) would contain a good timestamp and
the problem would not remain for long.

The main disadvantage would be that it could end up with an old timestamp from
a late drop event, but any problem should still be short lived.
Comment 58 ISHIKAWA, Chiaki 2012-10-28 22:57:26 PDT
(In reply to Karl Tomlinson (:karlt) from comment #56)
> 
> This last option may be a sensible approach even if there were no
> buggy timestamps, because it would mean that an application would still get
> its timestamp(s) updated even if it hasn't been used for 25 days.

(In reply to Karl Tomlinson (:karlt) from comment #57)
> (In reply to Weng Xuetian from comment #50)

> The advantage then is that, if Gecko does get a bad timestamp, then at least
> the next event (button press, or similar) would contain a good timestamp and
> the problem would not remain for long.
> 
> The main disadvantage would be that it could end up with an old timestamp
> from
> a late drop event, but any problem should still be short lived.

Karl, thank you for the explanation.

I can only hope that your last approaches may make it to GDK3 (so GKD2 is abaondoned for new fixes and such?). 
Even if it is indeed difficult to figure out where an event originates, let the users 
give a clue. 

The manner GDK3 handle events seem to open a great hole for DoS-like annoyances.
What I would like the library to let the affected programs (and its users) to detect
something is amiss and to use a built-in hook to get a clue about it. (The clue may be
obscure according to your explanation, though.)
Having to run gdb or valgrind and wait for something to happen is 
not a favorite cup of of tee many users, I believe. [And they have no clues where to set the
break points and what to look for.]
If thunderbird/firefox/your-favorite-application-here can report even an obscure warning about the pending problem, that at least gets the user reporting to knowledgeable users like you.

[I had a hard time figuring out the problem of drop-down menu not showing is related to
this timestamp problem until I hit this bugzilla entry. 
And it took me lietrally hours (on and off) spread over days to reach this bugzilla.

Last week, I could count at least half a dozen web pages where people afflicted with the same problem have discussed the issues without any clue at all: many are still open. This is because they have  no clues about what are causing it and not sure what the problems are indeed. [Initially I searched for "pull down menus", but some prefer to use "drop down menus" and I found more bug discussions when I use the phrase "drop down menu" for searching. ]
So correcting this problem completely may not be possible, but giving clue that something is wrong from the afflicted client is more important. 
That is why I call for some feature WITHIN client that the server/window manager (or whatever) may be behaving undesirably due to the warped time data within the library that the client itself can't easily change, etc. Yes, the problem is compounded because it is not entirely clear who owns the data and have the authority to tweak the value if necessary.

(Not that I claim the underlying X11 architecture is free from DoS opportunities. It is full of them, I am afraid. But at least, let us make it easy to spot the occurring problems first.)

Thank you again for your explanation, I now can fathom how difficult it seems to be.
Comment 59 ISHIKAWA, Chiaki 2012-11-12 07:12:06 PST
Created attachment 680626 [details]
warning about 0 timestamp from the window manager

FYI:

I was checking the use of uninitialized variable within thunderbird just in case using valgrind. 

(See  Bug 803816 Valgrind warnings about uninitialised memory use (Thunderbird))

Each test runs about 500+ tests (due to the slowdown by valgrind, some tests timeout and the number of tests that run to completion varies from one run to the other.
For example, one run showed this line at the end.
INFO | (runtestlist.py) | Directories Run: 33, Passed: 507, Failed: 32
)

While I ran maybe three such runs, I noticed that there are many 0 time stamp warnings from the Window manager (under gnome desktop of linux). These warnings are recorded only maybe a couple of times a week on my office computer, so I was quite surprised. 

I ran 

  sort ~/.xsessoin-errors | uniq -c

and removed non-related lines and sanitized some characters to summaraized the errors.
(I also replaced the Japanese message "Warning from the Window Manager" into 
English equivalent manually in the editor.)

Attached is the result of the above processing produeced from ~/.xsession-errors.

The test (make mozmill) seems to show many dialog for thunderbird configuration set up and other usage,  and I wonder if they are related to it. However, "(Write: (no)" indicates the window title when a new composition window is opened, it seems. So I am not entirely sure where the timestamp 0 crept into the event for thunderbird, but there are enough events that caused the window manager during the "make mozmill" run to generate these warnings.
(Well, at least, libx11 has been modified to remove the XIM bug.)

I am not sure if this many warnings are related to the particular setup configured for "make mozmill", or these warnings are to be expected but
my office usage does not invoke enough dailog display and such to produce this
many warnings. (I do write many e-mails on my office PC, but I don't think I see so many warnings in my ~/.xsession-errors. Maybe under certain circumstances, TB does pickup this 0-timestamp somehow on startup and uses to it when it creates a new window (In my office PC, it picks up non-zero timestamp, and sticks to it as long as I run the same copy?)

Just FYI to think about subtle issues that may arise in the future.

TIA
Comment 60 Karl Tomlinson (:karlt) 2012-11-12 10:47:44 PST
(In reply to ISHIKAWA, chiaki from comment #59)
> I am not sure if this many warnings are related to the particular setup
> configured for "make mozmill",

A zero timestamp is probably expected when running mozmill because there are no user events to get another timestamp and zero is the right timestamp to use when there is no known user event timestamp.
Comment 61 ISHIKAWA, Chiaki 2012-11-12 17:49:47 PST
(In reply to Karl Tomlinson (:karlt) from comment #60)
> A zero timestamp is probably expected when running mozmill because there are
> no user events to get another timestamp and zero is the right timestamp to
> use when there is no known user event timestamp.

I see. Unless there are serious issues, I won't be worried about this too much, then.
Comment 62 ISHIKAWA, Chiaki 2013-01-10 21:16:33 PST
Should we change the status to CONFIRMED or INVALID?
(The symptoms were confirmed, but the causes turn out to be
outside of TB / FF: the vulnerability caused by bogus timestamps exists in the
library, though.)

Leaving the status "UNCONFIRMED" is not quite right IMHO.
So any appropriate status designation is OK but not UNCONFIRMED.

TIA
Comment 63 ISHIKAWA, Chiaki 2013-07-08 19:02:17 PDT
I think there are still many people affected by bogus timestamp issues.

https://bugzilla.mozilla.org/show_bug.cgi?id=779900#c39

Your suggestion in Comment 56
--- snip
The other possible improvement that perhaps could be made to GDK (version 3)
is that it always consider some events as having the most recent timestamp,
instead of always comparing with the last timestamp.  When these events are
received they always reset any possibly buggy timestamp.  Events from the
server should always have accurate timestamps.  Events from the window manager
or other apps may be a little in the past and so should still be compared with
the last timestamp.  Events from XPutBackEvent would often (or perhaps
usually) look like server events, but at least any buggy timestamp would soon
be overwritten.

This last option may be a sensible approach even if there were no
buggy timestamps, because it would mean that an application would still get
its timestamp(s) updated even if it hasn't been used for 25 days.
--- end snip

or the follow-up in comment 57 is really necessary for the
root cause of the problem (robustness against incorrect timestamps from
buggy clients) IMHO.

Note You need to log in before you can comment on or make changes to this bug.