Open Bug 954572 Opened 11 years ago Updated 2 years ago

fewer false positives for emoticon detection

Categories

(Thunderbird :: Instant Messaging, defect)

defect

Tracking

(Not tracked)

People

(Reporter: bugzilla, Unassigned)

References

Details

(Whiteboard: [1.6-wanted])

Attachments

(2 files)

*** Original post on bio 1139 by Eric Kow <eric.kow AT gmail.com> at 2011-11-03 02:52:00 UTC ***

I wonder if there is some sort of ultimate-finessed emoticon detection algorithm InstantBird could use that wouldn't do silly things like translate "wxDisplay" into "w[xD]isplay"...

Barring that, any chance InstantBird could err a bit more on the side of being overly conservative about what it considers to be an emoticon vs regular text?  I hazard it's less annoying to the user if InstantBird misses one than if it oversteps...

Thanks!  I'm loving the careful attention to UI detail IB has shown so far
*** Original post on bio 1139 at 2011-11-03 09:32:41 UTC ***

Hi,
thanks for reporting this. So you finally created a Bugzilla account with us? I saw you pondering this on Twitter if I'm not mistaken.


Here's where the regexps for smiley-detection are created:

http://hg.instantbird.org/instantbird/file/351bf781388d/chat/modules/imSmileys.jsm#l131


I wonder how sophisticated this should become? We could either change the regexps to match less (or at least differently [1]) or double check[2] that a detected smiley is really one before we replace it.


[1] e.g. if the smiley is made of letters and in between other letters (could be regexp-ified)
[2] if it is part of a word (we could extract the letters around it and use the spellchecker to see if it's a correct word: in my opinion it's unlikely to find something and probably not worth doing the work) or whatever is possible to decide if the surrounding part of the message suggests that this is a smiley or not...
Status: UNCONFIRMED → NEW
Ever confirmed: true
*** Original post on bio 1139 at 2011-11-03 09:49:53 UTC ***

See bug 953653 (bio 207) for a somewhat related discussion, especially around comment 3.
*** Original post on bio 1139 at 2011-11-03 10:33:48 UTC ***

(In reply to comment #1)
> Here's where the regexps for smiley-detection are created:
> 
> http://hg.instantbird.org/instantbird/file/351bf781388d/chat/modules/imSmileys.jsm#l131
The code could use a few more comments. :P

> I wonder how sophisticated this should become? We could either change the
> regexps to match less (or at least differently [1]) or double check[2] that a
> detected smiley is really one before we replace it.
The easy / dumb solution is to just look for a word break (\B) on either side of each emoticon, but this does miss the case of something like "XDXD" (from bug 953653 (bio 207) comment 3). But perhaps something like:
\B(...the list of emoticon expressions...)+\B could be used so it will only match a standalone emoticon OR a set of ONLY emoticons. (This would still miss the case of an emoticon hanging off the last word:) <-- like that.)
*** Original post on bio 1139 by Eric Kow <eric.kow AT gmail.com> at 2011-11-03 11:01:35 UTC ***

Benedikt, indeed! It's such a tiny amount of friction, I wonder what makes people like me resist.  :-)

Patrick's idea of detection emoticon sequences as a single block surrounded by mandatory whitespace seems like a helpful approach.  It could be softened by looking for whitespace or ^/$
*** Original post on bio 1139 at 2011-11-03 11:21:29 UTC ***

I don't think it's possible to find a solution that works all the time, so it would probably be helpful to have concrete examples of where false positive are annoying so that we are sure to fix the cases that really matter.

The cases I know about:
 - when pasting some code. I saw yesterday something like this: SomeClass::ShowSomething (C++ code fragment that can match the :S emoticon)
 - when discussing maths, and writing equations. f(x)=3x+5 If we have MSN style emoticons in the theme, (x) will be matched. We have usually avoided that problem by not putting MSN style emoticons in our default themes.
 - at boundaries of system messages (bug 953752 (bio 309) is about this specific case).
 - the : symbol near a closing parenthesis, for example:
     blah blah blah (or did you mean blah blah?): blah blah
   This will match ):. This case seems silly, but it frequently annoys me.

Are there other cases?
*** Original post on bio 1139 at 2011-11-03 12:03:29 UTC ***

(In reply to comment #4)
> Patrick's idea of detection emoticon sequences as a single block surrounded by
> mandatory whitespace seems like a helpful approach.  It could be softened by
> looking for whitespace or ^/$
Just a heads up, a word boundary (\b) "Matches a word boundary, such as a space, a newline character, punctuation character or end of string. (Not to be confused with [\b].)" which is actually softer than whitespace (\s), you could also use a non-word character (\W).

Anyway, I agree we should make some test cases about what to match, what not to match, fix our expressions + add some tests.
Attached patch Tests WIP v1Splinter Review
*** Original post on bio 1139 as attmnt 1590 at 2012-06-12 02:49:00 UTC ***

This is some test cases I began working on. Note that this test fails right now! I'd appreciate some feedback, ideas for other edge cases that annoy us, etc. before looking at using a more complicate algorithm.
Attachment #8353347 - Flags: feedback?
Comment on attachment 8353347 [details] [diff] [review]
Tests WIP v1

*** Original change on bio 1139 attmnt 1590 at 2012-06-12 08:42:09 UTC ***

The test looks good.

More cases:
(comment 0) "wxDisplay": "wxDisplay"
(comment 3) "xDxD": createSmileyTag("xD") + createSmileyTag("xD")

I'm wondering if you would want to add a way to mark cases that aren't supported yet as TODO so that we can check-in the test sooner, and avoid regressing cases that already work with any other patch :).
Attachment #8353347 - Flags: feedback? → feedback+
*** Original post on bio 1139 at 2012-06-12 11:46:23 UTC ***

(In reply to comment #8)
> Comment on attachment 8353347 [details] [diff] [review] (bio-attmnt 1590) [details]
> Tests WIP v1
> I'm wondering if you would want to add a way to mark cases that aren't
> supported yet as TODO so that we can check-in the test sooner, and avoid
> regressing cases that already work with any other patch :).

We can comment them out and put a TODO tag above them, would that work? :)
*** Original post on bio 1139 at 2012-06-12 12:14:21 UTC ***

(In reply to comment #9)

> We can comment them out and put a TODO tag above them, would that work? :)

What about todo_check_eq? ;)
*** Original post on bio 1139 at 2012-06-12 12:23:39 UTC ***

Another case: "A sentence (and some text in parenthesis:)"
i.e. I think if possible we should separate out opening/closing brackets and not match smileys in that case.
*** Original post on bio 1139 at 2012-06-12 12:43:12 UTC ***

I think:
(?:^|\W)(:\)|\^\^|xD|XD|:S|\):|\(x\)|o_o)+(?=\W|$)

in combination with the URL check from bug 953653 (bio 207) should work pretty well.

Tried with:
:)
http://instantbird.org
:)
^^^^
^^:)
SomeClass::ShowSomething
f(x)=3x+5
blah blah blah (or did you mean blah blah?): blah blah
blah blah blah (or did you mean blah blah): blah blah
wxDisplay
xDxD
blah some text :) some more text?
Are you sure about that :)?
Are you sure about that :).
A sentence (and some text in parenthesis:)
http://lxde.org/wiki/Planet_LXDE
http://foo.com/XD/bar
surrounding http://lxde.org/wiki/Planet_LXDE text
surrounding http://foo.com/XD/bar test
surrounding XD:) text
surrounding :) text
surrounding XD text XDXD
aaa teXDst text
http://foo.com/fr/bar :)
XD test
o_o
This is some ^^:):):) text!

I'll update the algorithm to use this.
Assignee: nobody → clokep
Status: NEW → ASSIGNED
Whiteboard: [1.3-wanted]
Blocks: 953752
*** Original post on bio 1139 at 2012-11-02 00:12:35 UTC ***

Too late to work on this for 1.3, replacing 1.3-wanted by 1.4-wanted in the whiteboard.
Whiteboard: [1.3-wanted] → [1.4-wanted]
Attached image Screenshot
*** Original post on bio 1139 as attmnt 2061 at 2012-11-06 14:00:00 UTC ***

For people using some of the larger emoticon sets, this will be very helpful ;)
*** Original post on bio 1139 at 2013-03-24 13:53:51 UTC ***

I'm not actively working on this.
Status: ASSIGNED → NEW
*** Original post on bio 1139 by qlum <qlumreg AT gmail.com> at 2013-03-31 20:22:31 UTC ***

Another situation where I personally find a lot of false positive emotes is when you have something between ( ) and after that use :
For example in a minecraft server I am on when someone checks on its irc how many players are online the server would return something like this: 20:38:46 - MystiaTest: Online (2/20): NikkoTheNeko qlum
If it would check if something is between ( ) it would fix the problem.
*** Original post on bio 1139 at 2013-03-31 21:37:57 UTC ***

(In reply to comment #17)

I've sometimes been annoyed by this case too. I think just removing ): from the default theme would be a sane "solution".
*** Original post on bio 1139 at 2013-04-09 14:49:50 UTC ***

s/1.4-wanted/1.5-wanted/ in the whiteboard for bugs that probably won't happen before 1.4 is released.
Whiteboard: [1.4-wanted] → [1.5-wanted]
*** Original post on bio 1139 at 2013-11-02 17:41:54 UTC ***

Mass changing bugs from 1.5-wanted -> 1.6-wanted.
Whiteboard: [1.5-wanted] → [1.6-wanted]
Assignee: clokep → nobody
Component: Conversation → Instant Messaging
Product: Instantbird → Thunderbird
Version: trunk → unspecified
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: