Closed Bug 955420 Opened 10 years ago Closed 10 years ago

Default normalizedName normalization is too aggressive

Categories

(Chat Core :: General, defect)

defect
Not set
trivial

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: aleth, Assigned: aleth)

References

Details

Attachments

(1 file, 2 obsolete files)

*** Original post on bio 1983 at 2013-05-30 09:36:00 UTC ***

The current normalize() default in jsProtoHelper used for the various normalizedNames is

function normalize(aString) aString.replace(/[^a-z0-9]/gi, "").toLowerCase()

This will cause problems for contacts and conversations with names that don't contain any such characters, e.g. помощь, but also in less dramatic cases.

Currently this default is overridden by js-xmpp and js-irc, but the latter will change with bug 954887 (bio 1454). Twitter doesn't, but the only allowed character (for twitter usernames) that is stripped is the underscore, so that should be fairly safe (though I don't know why we remove it).

One solution would be to mark this bug WONTFIX and insist that protocols which allow for unicode names provide their own normalization implementation. Another would be to change the default, to be more tolerant and/or to do some escaping.
Blocks: 954887
Attached patch Patch (obsolete) — Splinter Review
*** Original post on bio 1983 as attmnt 3013 at 2013-11-04 14:19:00 UTC ***

Since we now encode characters which are problematic for log file names in encodeName() in logger.js, we can make the default normalize() in jsProtohelper more lenient. As twitter currently uses the jsProtoHelper normalize(), this patch retains the current normalize() for twitter for backwards compatibility.
Attachment #8354794 - Flags: review?(florian)
Assignee: nobody → aleth
Status: NEW → ASSIGNED
*** Original post on bio 1983 at 2013-11-04 14:39:09 UTC ***

Marking as trivial as this patch in effect only changes the default example, which (after this patch) no protocol is actually using.
Severity: normal → trivial
Comment on attachment 8354794 [details] [diff] [review]
Patch

*** Original change on bio 1983 attmnt 3013 at 2013-11-04 14:56:11 UTC ***

Twitter can't even have special characters. I believe it should just be toLowerCase in that instance.
Attachment #8354794 - Flags: review-
*** Original post on bio 1983 at 2013-11-04 14:57:49 UTC ***

(In reply to comment #3)
> Comment on attachment 8354794 [details] [diff] [review] (bio-attmnt 3013) [details]
> Patch
> 
> Twitter can't even have special characters. I believe it should just be
> toLowerCase in that instance.

As I said in the comment, changing the current normalization for twitter would mean *nobody* will find their twitter logs anymore.
Attached patch Patch (obsolete) — Splinter Review
*** Original post on bio 1983 as attmnt 3016 at 2013-11-04 15:51:00 UTC ***

Inlines normalize().
Attachment #8354797 - Flags: review?(florian)
Comment on attachment 8354794 [details] [diff] [review]
Patch

*** Original change on bio 1983 attmnt 3013 at 2013-11-04 15:51:49 UTC was without comment, so any subsequent comment numbers will be shifted ***
Attachment #8354794 - Attachment is obsolete: true
Attachment #8354794 - Flags: review?(florian)
Comment on attachment 8354797 [details] [diff] [review]
Patch

*** Original change on bio 1983 attmnt 3016 at 2013-11-04 17:50:40 UTC was without comment, so any subsequent comment numbers will be shifted ***
Attachment #8354797 - Flags: review?(clokep)
*** Original post on bio 1983 at 2013-11-04 18:54:24 UTC ***

(In reply to comment #4)
> As I said in the comment, changing the current normalization for twitter would
> mean *nobody* will find their twitter logs anymore.

A half-way house for twitter would be to strip spaces and then lowercase. This would however still break things for everyone with a special charcacter in their username. Imho that's more likely to happen than the edge cases of the current normalization (the ones I could think of were 1) someone having two usernames which with the present normalization give the same normalizedName (e.g. "a" and "a_"), and 2) someone having a username containing only special characters like https://twitter.com/____).
*** Original post on bio 1983 at 2013-11-05 02:47:17 UTC ***

(In reply to comment #6)
> (In reply to comment #4)
> > As I said in the comment, changing the current normalization for twitter would
> > mean *nobody* will find their twitter logs anymore.
> 
> A half-way house for twitter would be to strip spaces and then lowercase.
Spaces are invalid in Twitter usernames. Only [a-zA-Z0-9_] is valid, I believe.
*** Original post on bio 1983 at 2013-11-05 10:06:00 UTC ***

(In reply to comment #7)
> > > mean *nobody* will find their twitter logs anymore.
> > 
> > A half-way house for twitter would be to strip spaces and then lowercase.
> Spaces are invalid in Twitter usernames. Only [a-zA-Z0-9_] is valid, I believe.

The conversation name includes a space before "timeline".
Comment on attachment 8354797 [details] [diff] [review]
Patch

*** Original change on bio 1983 attmnt 3016 at 2013-11-06 17:31:05 UTC ***

>diff --git a/chat/protocols/twitter/twitter.js b/chat/protocols/twitter/twitter.js
>@@ -365,16 +366,18 @@ function Account(aProtocol, aImAccount)
>+  normalize: function(aString) aString.replace(/[^a-z0-9]/gi, "").toLowerCase(),
Isn't normalization for twitter just toLowerCase()? Is this for backwards compatibility?
Attachment #8354797 - Flags: review?(clokep) → review-
*** Original post on bio 1983 at 2013-11-06 17:36:45 UTC ***

(In reply to comment #9)
> >diff --git a/chat/protocols/twitter/twitter.js b/chat/protocols/twitter/twitter.js
> >@@ -365,16 +366,18 @@ function Account(aProtocol, aImAccount)
> >+  normalize: function(aString) aString.replace(/[^a-z0-9]/gi, "").toLowerCase(),
> Isn't normalization for twitter just toLowerCase()? Is this for backwards
> compatibility?

The correct normalization for twitter would be toLowerCase. This patch doesn't change the normalization actually used by twitter (currently inherited from jsProtoHelper). If we feel we should change the normalization for twitter we need to worry about backwards compatibility as explained with examples in earlier comments.
Attached patch PatchSplinter Review
*** Original post on bio 1983 as attmnt 3019 at 2013-11-06 17:42:00 UTC ***

Add explanatory comment to normalize () in twitter.
Attachment #8354800 - Flags: review?(clokep)
Comment on attachment 8354797 [details] [diff] [review]
Patch

*** Original change on bio 1983 attmnt 3016 at 2013-11-06 17:42:05 UTC was without comment, so any subsequent comment numbers will be shifted ***
Attachment #8354797 - Attachment is obsolete: true
Attachment #8354797 - Flags: review?(florian)
Comment on attachment 8354800 [details] [diff] [review]
Patch

*** Original change on bio 1983 attmnt 3019 at 2013-11-08 11:28:38 UTC ***

I'm OK with the changes here, but for the record I'd rather do it the "right" way and not care about the old logs OR migrate the old logs.
Attachment #8354800 - Flags: review?(clokep) → review+
Whiteboard: [checkin-needed]
*** Original post on bio 1983 at 2013-11-10 03:19:55 UTC ***

I think all the issues we've had are fixed with this now, is that correct aleth?

Thanks for dealing with this gross bug. :)

http://hg.instantbird.org/instantbird/rev/a69666e461e9
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whiteboard: [checkin-needed]
Target Milestone: --- → 1.5
Depends on: 1058360
You need to log in before you can comment on or make changes to this bug.