I have been working on a review in reviewboard today: https://reviewboard.mozilla.org/r/20507/#comment26171 Some of the code I have commented about contains emojis. When I published the review and went to the bug in bugzilla, I noticed that all my comments that came after the emoji were missing. The attached screenshot shows my comment in reviewboard The bug is bug 835896 comment 84
I'm a huge fan of emoji and personally consider busted Emoji support a P1 bug :) But the real justification for P1 is that apparently Unicode isn't working properly and this can lead to silent loss of updates in Bugzilla. This needs to get fixed. FWIW, we do have https://hg.mozilla.org/hgcustom/version-control-tools/file/783405fe84cb/hgext/reviewboard/tests/test-unicode.t. I wonder if the encoding issue is at the browser level. We should be able to reproduce this in a Selenium test.
Priority: -- → P1
Oh, so the code itself (not the Review Board comment) contains emoji. That's likely what the issue is here. I just fixed another Unicode issue (bug 1213884), so I might as well do this one too.
Assignee: nobody → gps
Status: NEW → ASSIGNED
I can reproduce this. Surprisingly it doesn't reproduce when using Chinese or Japanese code points. Only after I plugged in your exact code did I get it to repro.
This is almost certainly an issue with rbbz.diffs.render_comment_plain(). The suspect part is where we feed each diff chunk into HTMLParser.HTMLParser.unescape(). wat. I have a feeling smacleod can shed some light onto the weirdness that is rbbz.diffs.
Note that BMO itself does not yet support emojis. They were implemented upstream (in apparently a somewhat hacky but functional way) in bug 405011. It hasn't been backported specifically to BMO, but we're working on merging in all upstream changes this quarter. Also see bug 1162122, which is the same issue, although it sounds like at least we don't get an ISE anymore. I'll dupe that one over to here, despite it being older, since there's more investigation here.
There /might/ be 2 separate issues here: 1) We're not properly creating a Unicode/UTF-8 string to post to Bugzilla 2) Bugzilla isn't recording/rendering said string properly After looking at our code, I'm not convinced #1 isn't a problem.
#1 could indeed be a problem, but I'm positive that #2 will prevent this from working properly even if #1 is resolved. But worth investigating so that it'll all work properly once #2 is fixed.
(In reply to Gregory Szorc [:gps] from comment #4) > This is almost certainly an issue with rbbz.diffs.render_comment_plain(). > The suspect part is where we feed each diff chunk into > HTMLParser.HTMLParser.unescape(). wat. > > I have a feeling smacleod can shed some light onto the weirdness that is > rbbz.diffs. That whole file is kind of a hack tbh. I believe we call unescape on the diff information as it has been escaped to be displayed as part of the html. I don't remember there being a way to just pull the plain unescaped diff data out nicely to do the plain text formatting without duplicating even more of Review Board's code.
Product: Developer Services → MozReview
I feel like we fixed this as part of another bug. Either way, I'm not actively working on it.
Assignee: gps → nobody
Status: ASSIGNED → NEW
Status: NEW → RESOLVED
Last Resolved: 8 months ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.