Closed
Bug 495185
Opened 15 years ago
Closed 15 years ago
contentWhittle doesn't cope with messages that include unicode non-breaking spaces
Categories
(MailNews Core :: Database, defect)
MailNews Core
Database
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: davida, Assigned: davida)
Details
Attachments
(2 files)
1.22 KB,
patch
|
asuth
:
review+
Bienvenu
:
superreview+
|
Details | Diff | Splinter Review |
4.82 KB,
text/plain
|
Details |
I was noticing that some messages end up not de-quoted correctly by the contentWhittle code.
The following patch has code that works in my test email, which I'll attach. The test was made to work with the code, which isn't the right way to do things, but the relationship between the test format and the original email is puzzling. Specifically, the key lines in the test email are:
"Mark Banner <bugzilla@standard8.plus.com> wrote:\n\xc2\xa0\n> We haven't nailed anything down in detail yet, depending on how we are \n> when we get to beta 3, my proposal would be to do the final freeze two"
note that the second line is \xc2\xa0, which is (so i learn) a unicode sequence meaning non-breaking space.
When dealing with that mail, the main for each loop in contentWhittle gets a line that is just "\xa0", not the "\xc2\xa0" that I'd expect.
So, the code seems to work in practice, the test matches the code, but I don't understand why the test shouldn't have the two-character sequence.
Attachment #380049 -
Flags: review?(bugmail)
Assignee | ||
Comment 1•15 years ago
|
||
Comment 2•15 years ago
|
||
(In reply to comment #0)
> When dealing with that mail, the main for each loop in contentWhittle gets a
> line that is just "\xa0", not the "\xc2\xa0" that I'd expect.
0xc2 0xa0 is the UTF-8 encoding of 0xa0, the nbsp character. Once your real underlying character needs the high order bit set, it needs the second byte.
ipython:
In [30]: unichr(0xa0).encode('utf-8')
Out[30]: '\xc2\xa0'
Comment 3•15 years ago
|
||
Comment on attachment 380049 [details] [diff] [review]
patch with test
>diff --git a/mailnews/db/gloda/modules/fundattr.js b/mailnews/db/gloda/modules/fundattr.js
>- if (!line)
>+ if (!line || (line == "\xa0")) /* unicode non breaking space */
as someone who just thought about this code, any thoughts on just doing:
if (!line.trim())
then any empty or all-whitespace line is treated as empty:
js> z = "\xa0"
�
js> !z.trim()
true
Please make that change unless you have a reason not to.
>+ bode: [[false, "Mark Banner <bugzilla@standard8.plus.com> wrote:"],
>+ [false, "\xa0"],
>+ [false, "> We haven't nailed anything down in detail yet, depending on
You are a lucky man... messageGenerator doesn't perform any type of escaping/encoding on its output, and our default charset in use for these tests is ISO-8859-1, and 0xa0 is NBSP in ISO-8859-1, so this works. Of course, it's probably no accident that 0xa0 is nbsp in unicode, but the lucky thing was that you didn't have to deal with making messageGenerator be cool.
Please add a comment before that test-case or that line indicating that the \xa0 is fine as long as the charset in use is ISO-8859-1 or the 0xa0 otherwise doesn't need to be encoded. That way when that test inevitably explodes because of some change, the person will have a better chance of knowing why it broke.
r+ with those changes.
Attachment #380049 -
Flags: superreview?(bienvenu)
Attachment #380049 -
Flags: review?(bugmail)
Attachment #380049 -
Flags: review+
Updated•15 years ago
|
Attachment #380049 -
Flags: superreview?(bienvenu) → superreview+
Comment 4•15 years ago
|
||
Comment on attachment 380049 [details] [diff] [review]
patch with test
sr=me, modulo asuth's comments.
Comment 5•15 years ago
|
||
pushed: http://hg.mozilla.org/comm-central/rev/3016655dbf53
This got lost for a bit because of general bugzilla workflow issues. davida, when possible, please mark things checkin-needed once reviews are complete.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Updated•15 years ago
|
Assignee: bugmail → david.ascher
You need to log in
before you can comment on or make changes to this bug.
Description
•