MUC display names not properly UTF-8 decoded (double decoded?)
Categories
(Chat Core :: XMPP, defect)
Tracking
(thunderbird_esr78 unaffected, thunderbird89 fixed)
Tracking | Status | |
---|---|---|
thunderbird_esr78 | --- | unaffected |
thunderbird89 | --- | fixed |
People
(Reporter: freaktechnik, Assigned: freaktechnik)
References
(Regression)
Details
(Keywords: regression)
Attachments
(3 files)
5.91 KB, image/png
5.70 KB, image/png
48 bytes, text/x-phabricator-request (wsmwk: approval-comm-beta+)
STR:
- Join an XMPP MUC where people have display names with UTF-8 characters
Expected result:
The participants list and tooltips correctly show the UTF-8 characters.
Actual result:
The UTF-8 characters appear double encoded or similar.
This might also be an issue on the UI side and not in the protocol. But because it affects both the participants list and tooltips I'm assuming it's on the protocol layer.
Assignee | ||
Comment 1•3 years ago
Comment 2•3 years ago
What's different about those two images? Are they different rooms or does this only happen sometimes? (Or is this a regression?)
Assignee | ||
Comment 3•3 years ago
The broken one is on c-c, the ok one is on 78 ESR. According to mozregression the range is https://hg.mozilla.org/comm-central/pushloghtml?fromchange=0bc379dfb5184277046debb04b0b7aec4796f841&tochange=01b42daa8a2675fd108dcfd4873dd75970425e2c so I suspect Bug 1647252
Comment 4•3 years ago
Bug 1647252 was going to be my guess unfortunately. :(
It looks like the SAX parser in there does some Unicode stuff manually, maybe it is interfering.
Any chance you could grab some protocol logs to get the raw XML?
Assignee | ||
Comment 5•3 years ago
c-c:
[30.04.21, 16:12:36 MESZ] LOG (@ prpl-jabber: _logReceivedData resource:///modules/xmpp-xml.jsm:402)
received:
<presence xmlns="jabber:client" from="tatoeba@chat.tatoeba.org/Ãtienne" to="freaktechnik@jabber.lugs.ch/30547725431619791947512145" xml:lang="en" id="5edeae8670514a6189037da4e2a1ded6">
<c xmlns="http://jabber.org/protocol/caps" ver="ZyB1liM9c9GvKOnvl61+5ScWcqw=" node="https://poez.io" hash="sha-1"/>
<x xmlns="vcard-temp:x:update">
<photo xmlns="vcard-temp:x:update"/>
</x>
<idle xmlns="urn:xmpp:idle:1" since="2021-04-13T10:09:14.786178+00:00"/>
<occupant-id xmlns="urn:xmpp:occupant-id:0" id="wNZPCZIVQ51D/heZQpOHi0ZgHXAEQonNPaLdyzLxHWs="/>
<x xmlns="http://jabber.org/protocol/muc#user">
<item xmlns="http://jabber.org/protocol/muc#user" jid="ejls@ejls.fr/poezio-8kIy" affiliation="member" role="participant"/>
</x>
</presence>
[30.04.21, 16:12:36 MESZ] DEBUG (@ prpl-jabber: onPresenceStanza resource:///modules/xmpp-base.jsm:2192)
Received presence stanza for tatoeba@chat.tatoeba.org/Ãtienne
release:
[4/30/21, 4:10:19 PM GMT+2] DEBUG (@ prpl-jabber: _logReceivedData resource:///modules/xmpp-xml.jsm:390)
received:
<presence xmlns="jabber:client" from="tatoeba@chat.tatoeba.org/Étienne" to="freaktechnik@jabber.lugs.ch/Thunderbird" xml:lang="en" id="5edeae8670514a6189037da4e2a1ded6">
<c xmlns="http://jabber.org/protocol/caps" ver="ZyB1liM9c9GvKOnvl61+5ScWcqw=" node="https://poez.io" hash="sha-1"/>
<x xmlns="vcard-temp:x:update">
<photo xmlns="vcard-temp:x:update"/>
</x>
<idle xmlns="urn:xmpp:idle:1" since="2021-04-13T10:09:14.786178+00:00"/>
<occupant-id xmlns="urn:xmpp:occupant-id:0" id="wNZPCZIVQ51D/heZQpOHi0ZgHXAEQonNPaLdyzLxHWs="/>
<x xmlns="http://jabber.org/protocol/muc#user">
<item xmlns="http://jabber.org/protocol/muc#user" jid="ejls@ejls.fr/poezio-8kIy" affiliation="member" role="participant"/>
</x>
</presence>
[4/30/21, 4:10:19 PM GMT+2] DEBUG (@ prpl-jabber: onPresenceStanza resource:///modules/xmpp-base.jsm:2196)
Received presence stanza for tatoeba@chat.tatoeba.org/Étienne
Comment 6•3 years ago
Ping -- any idea what could be going on here? I suspect adding a unit test for this might help debug it.
Assignee | ||
Comment 7•3 years ago
Adding this decode step before writing to the parser fixes it:
new TextDecoder().decode(new Uint8Array(Array.from(data, c => c.charCodeAt(0))))
IRC uses the scriptable unicode converter (see irc.jsm#805)
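For illustration, the decode step above can be wrapped in a small standalone helper. This is only a sketch: the function name `decodeByteString` is hypothetical and not taken from the patch, and it assumes `data` is a byte string, meaning every character carries a single byte value between 0 and 255.

```javascript
// Hypothetical helper; the name is illustrative and does not come from the patch.
// Assumes `data` is a byte string: one character per byte, each code unit <= 0xFF.
function decodeByteString(data) {
  // Recover the raw bytes from the character codes.
  const bytes = Uint8Array.from(data, c => c.charCodeAt(0));
  // Interpret those bytes as UTF-8 and return a normal JavaScript string.
  return new TextDecoder("utf-8").decode(bytes);
}

// "É" (U+00C9) is encoded in UTF-8 as the two bytes 0xC3 0x89.
const rawNick = "\u00C3\u0089tienne";
console.log(decodeByteString(rawNick)); // "Étienne"
```

Pure ASCII input passes through unchanged, so stanzas without non-ASCII characters should be unaffected by such a step.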
Comment 8•3 years ago
We seem to do something similar in the IRC code: https://searchfox.org/comm-central/source/chat/protocols/irc/irc.jsm#825-837 (I thought this happened in the socket code). So it seems the old SAX parser handled going from raw bytes (as UTF-8) -> Unicode, while the new one assumes it is getting Unicode-decoded text already?
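A minimal sketch of how the symptom could arise, assuming the socket hands over one character per received byte and the parser then treats those characters as already-decoded text. The variable names are illustrative only.

```javascript
// The nick as the server intends it; "\u00C9" is "É".
const nick = "\u00C9tienne";

// The bytes that travel over the wire when the nick is sent as UTF-8.
const utf8Bytes = new TextEncoder().encode(nick);

// Simulate a byte string: every byte becomes its own character.
const byteString = String.fromCharCode(...utf8Bytes);

console.log(nick.length);       // 7
console.log(byteString.length); // 8, because "É" occupies two bytes
console.log(byteString === "\u00C3\u0089tienne"); // true
```

The second character, U+0089, is a control character that is usually not rendered, which would be consistent with the c-c log in comment 5 showing the nick as "Ãtienne".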
Comment 9•3 years ago
(In reply to Patrick Cloke [:clokep] from comment #8)
> So it seems the old SAX parser handled going from raw bytes (as UTF-8) -> Unicode, while the new one assumes it is getting Unicode-decoded text already?
Right, the old SAX parser gets a BinaryString (string in IDL) in OnDataAvailable and emits a UTF8 string (AString in IDL) in startElement/endElement.
(In reply to Martin Giger [:freaktechnik] from comment #7)
> Adding this decode step before writing to the parser fixes it:
> new TextDecoder().decode(new Uint8Array(Array.from(data, c => c.charCodeAt(0))))
I think this is the right fix, please send a patch, thanks. Can put it in onDataReceived of xmpp-session.jsm.
Assignee | ||
Comment 10•3 years ago
Assignee | ||
Comment 11•3 years ago
(In reply to Ping Chen (:rnons) from comment #9)
> I think this is the right fix, please send a patch, thanks. Can put it in onDataReceived of xmpp-session.jsm
I decided to put it in xmpp-xml.jsm, since that is actually tested and I didn't want to dig into xmpp-session to figure out how to test it for this patch.
Comment 12•3 years ago
Pushed by thunderbird@calypsoblue.org:
https://hg.mozilla.org/comm-central/rev/6e3047fcecc7
Decode byte string to UTF-8 for XMPP parser. r=clokep
Comment 13•3 years ago
Comment on attachment 9219770 [details]
Bug 1708695 - Decode byte string to UTF-8 for XMPP parser. r=clokep
[Approval Request Comment]
Regression caused by (bug #): 1647252
User impact if declined: unreadable user display names
Testing completed (on c-c, etc.): tests included in patch
Risk to taking this patch (and alternatives if risky):
Comment 14•3 years ago
Comment on attachment 9219770 [details]
Bug 1708695 - Decode byte string to UTF-8 for XMPP parser. r=clokep
[Triage Comment]
Approved for beta
Comment 15•3 years ago
bugherder uplift
Thunderbird 89.0b4:
https://hg.mozilla.org/releases/comm-beta/rev/e228e9d7d3f5