Closed Bug 417899 Opened 16 years ago Closed 16 years ago

Bonsai doesn't handle UTF-8 data

Categories

(Webtools Graveyard :: Bonsai, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 395003

People

(Reporter: zwnj, Assigned: tara)

References

(Blocks 1 open bug)

Details

As you can see on the page at the URL, the line-break algorithm doesn't count Unicode characters; it just counts raw bytes. This causes:
- Fewer non-ASCII characters on each line, and
- Broken UTF-8 sequences (like the last line break on the URL).
Also, the encoding of the page content is not set to UTF-8.
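
To illustrate the report above, here is a minimal Python sketch (not Bonsai's actual code, which is not shown in this bug) contrasting byte-based and character-based splitting. The function names `split_bytes`, `split_chars`, and `is_valid_utf8` are hypothetical and exist only for this example.

```python
def split_bytes(data: bytes, width: int) -> list[bytes]:
    """Split encoded data every `width` bytes, ignoring character boundaries."""
    return [data[i:i + width] for i in range(0, len(data), width)]


def split_chars(text: str, width: int) -> list[str]:
    """Split decoded text every `width` code points."""
    return [text[i:i + width] for i in range(0, len(text), width)]


def is_valid_utf8(chunk: bytes) -> bool:
    """Return True if `chunk` decodes cleanly as UTF-8."""
    try:
        chunk.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    # Four Arabic-script letters, two bytes each in UTF-8 (8 bytes total).
    text = "\u0633\u0644\u0627\u0645"
    for chunk in split_bytes(text.encode("utf-8"), 3):
        print(chunk, is_valid_utf8(chunk))
    for piece in split_chars(text, 3):
        print(piece, is_valid_utf8(piece.encode("utf-8")))
```

Splitting on bytes both fits fewer characters per line (each letter here costs two bytes) and can cut a multi-byte sequence in half, which matches the two symptoms listed above. Splitting on code points avoids the broken sequences but, as the next comment points out, still says nothing about displayed width.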
Before anyone is crazy enough to try to "fix" behnam's bug, please keep in mind the *goal* of this function, which is to get a fixed-length string.

1. 80 Chinese "characters" are generally twice as wide (physical width, not byte encoding) as 80 ASCII characters.
2. Some characters contribute no width (e.g. ZWNJ).
3. RTL markings (and pops) can't safely be split anyway.
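
The first two points above can be sketched roughly in Python using the standard `unicodedata` module. This is only an approximation under stated assumptions: wide and fullwidth East Asian characters are counted as two columns, combining marks and format characters (which include ZWNJ and the bidi controls) as zero, and everything else as one. Real rendered width depends on the font and the browser, and the helper name `display_width` is hypothetical.

```python
import unicodedata


def display_width(text: str) -> int:
    """Estimate how many fixed-width columns `text` occupies.

    Assumption: East Asian Wide/Fullwidth characters take two columns;
    combining marks (Mn, Me) and format characters (Cf) take none.
    """
    width = 0
    for ch in text:
        if unicodedata.category(ch) in ("Mn", "Me", "Cf"):
            continue  # zero-width: combining marks, ZWNJ, bidi controls
        if unicodedata.east_asian_width(ch) in ("W", "F"):
            width += 2
        else:
            width += 1
    return width


if __name__ == "__main__":
    print(display_width("abc"))            # three ASCII letters
    print(display_width("\u4e2d\u6587"))   # two CJK ideographs
    print(display_width("a\u200cb"))       # ZWNJ between two letters
```

Even with an estimate like this, point 3 still stands: a splitter would also have to avoid separating a directional embedding or override from its matching pop, which this sketch does not attempt.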

I'd propose to only do splitting if all characters in a line are in the Latin-1 range of UTF-8.
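
A minimal Python sketch of that proposal, assuming "Latin-1" means code points U+0000 through U+00FF and that the line has already been decoded from UTF-8. The names `is_latin1` and `split_if_latin1` and the default width of 80 are illustrative, not taken from Bonsai.

```python
def is_latin1(line: str) -> bool:
    """True if every character is in the Latin-1 range (U+0000..U+00FF)."""
    return all(ord(ch) <= 0xFF for ch in line)


def split_if_latin1(line: str, width: int = 80) -> list[str]:
    """Split `line` into fixed-length pieces only when it is pure Latin-1.

    Any line containing other characters is returned whole, so wide,
    zero-width, and bidi-controlled text is never cut.
    """
    if len(line) <= width or not is_latin1(line):
        return [line]
    return [line[i:i + width] for i in range(0, len(line), width)]


if __name__ == "__main__":
    print(split_if_latin1("x" * 200))
    print(split_if_latin1("\u4e2d" * 200))
```

The trade-off is that long non-Latin-1 lines stay long, which is why the browser-side wrapping alternative below may be preferable.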

An alternative is to replace line breaking with browser-requested wrapping, with each table row containing two table cells, each holding a single line.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → DUPLICATE
Product: Webtools → Webtools Graveyard