Created attachment 274020
testcase

The recent changes in bug 255990 have created line break opportunities in a lot of places where we don't want them. Some of the worst problems can be fixed by taking in two characters of context on either side and not breaking when there's a space there.

a) We should never break at punctuation when there's an adjacent space.
b) We should not break at punctuation when there's a space one character over unless the intervening character is CJK.
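Rules a) and b) can be read as a small filter over candidate break positions. The following is only a minimal sketch of that reading, not the Gecko implementation: `keep_break_at` and `is_cjk` are hypothetical names, the CJK test covers only a few Unicode blocks, and the start or end of the text is assumed not to count as a space.

```python
def is_cjk(ch):
    """Rough CJK test (assumption: only a few common blocks are checked)."""
    cp = ord(ch)
    return (0x3040 <= cp <= 0x30FF      # Hiragana and Katakana
            or 0x4E00 <= cp <= 0x9FFF)  # CJK Unified Ideographs


def keep_break_at(text, i):
    """Decide whether a break opportunity at the punctuation text[i] survives.

    Looks at two characters of context on either side of index i.
    """
    def at(k):
        # Characters outside the text are treated as "no character".
        return text[k] if 0 <= k < len(text) else ""

    for step in (-1, 1):
        near, far = at(i + step), at(i + 2 * step)
        # Rule a): never break when a space is adjacent to the punctuation.
        if near == " ":
            return False
        # Rule b): no break when a space is one character over,
        # unless the intervening character is CJK.
        if far == " " and near and not is_cjk(near):
            return False
    return True
```

With this reading, the slash in "x a/b y" loses its break opportunity because a space sits one character over on each side, while the slash in "a in/on b" keeps it.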
Did you check bug 389056 (and bug 389595)? I think that SLASH should be breakable for file paths and URLs. For some (but not all) other punctuation characters, I'm working on this in bug 389056.
Also, your test cases have only one SLASH per word. In English (and other Western languages), are there cases where a single word contains two or more SLASH characters? If not, we could fix the testcase by relying on that difference.
Generally, I would prefer not to allow linebreaking after a slash even in strings such as "in/on". There just _may_ exist cases where it is significant to know whether there is a space after the slash or not.

Two or even more slashes in one string may occur in forms, when you are supposed to pick one of the options given, for example:

Marital status: single/cohabiting/married/widowed

However, that should be rare (and rather bad style if used in normal text). As much as I dislike the idea of allowing line-breaks after slashes in general, I suppose there is no practical way to differentiate between the example and a URL or file-path. So probably breaks should be allowed in a string that includes two or more slashes, or better yet, after the second slash (and any following slashes) but not after the first. Seeing that there is no space after the first slash should give the reader a hint that there were probably no spaces after the other slashes either (although that would be deceiving in URLs that end in a slash, but I guess that's something we just have to live with).

On the other hand, in short file-paths, such as /etc/apt, a line-break after the second slash would seem rather pointless. Perhaps a break could be allowed only after the third slash (or any later one) in strings where the first slash is preceded by a space.
Actually, one _could_ reduce the deceptiveness of URLs that end in a slash by specifically prohibiting line-breaks both after the last slash and after the following space(s). That would mean that if there was a slash at the end of a line, the reader could always expect the same string to continue on the next line. It would be unconventional, but as long as the behavior was consistent, it might be justifiable. However, changing the default break behavior of space might trigger some new problems. I'm not sure which option would be the lesser evil in this case.
I just realized that it could also be confusing if a line-break was allowed after the last slash in file-paths such as "/etc/local/bin". When the last part of the string is also a regular word in the context language (as "bin" is a word in English), it may not always be clear whether the part separated by a line-break belongs to the string or to the context. This kind of confusion may be more likely with some URLs if there is neither a filename extension nor a slash at the end of the URL.

In the example path, the least harmful break point would perhaps be after the second slash:

/etc/
local/bin

This way, the presence of slashes on both lines would give the reader a hint that the parts probably belong to the same string even though they are separated by an unconventional line-break. Thus, line-breaks should not be allowed after the first slash nor after the last, but any other slash might be considered to offer a break opportunity.
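The rule proposed above (no break after the first slash or after the last, a possible break after any slash in between) is easy to state as code. This is only an illustrative sketch; `slash_break_offsets` is a hypothetical helper that works on a single space-free string, and it does not try to tell file paths from other text.

```python
def slash_break_offsets(token):
    """Return the offsets after which a break would be allowed.

    Every slash except the first and the last one offers an opportunity,
    so strings with fewer than three slashes get none.
    """
    slashes = [i for i, ch in enumerate(token) if ch == "/"]
    # Drop the first and last slash; a break goes just after each remaining one.
    return [i + 1 for i in slashes[1:-1]]


path = "/etc/local/bin"
for offset in slash_break_offsets(path):
    # Show the two halves the break would produce.
    print(path[:offset], "|", path[offset:])
```

Under this rule "in/on" and "/etc/apt" never break, and "/etc/local/bin" can break only into "/etc/" and "local/bin".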
Prioritized breaking seems like a much simpler solution to these problems than complicated heuristics to determine whether things are file paths.
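To illustrate what prioritized breaking could look like, here is a minimal sketch of a greedy wrapper that prefers high-priority opportunities and falls back to low-priority ones only when nothing else fits. Everything in it is an assumption made for illustration: widths are counted in characters, a break after a space is treated as high priority and a break after a slash as low priority, and `find_breaks` and `wrap` are hypothetical names.

```python
HIGH, LOW = 0, 1


def find_breaks(text):
    """Hypothetical classifier: after a space is HIGH, after a slash is LOW."""
    breaks = []
    for i, ch in enumerate(text[:-1]):
        if ch == " ":
            breaks.append((i + 1, HIGH))
        elif ch == "/":
            breaks.append((i + 1, LOW))
    return breaks


def wrap(text, width):
    """Greedy wrapping that uses LOW breaks only when no HIGH break fits."""
    breaks = find_breaks(text)
    lines, start = [], 0
    while len(text) - start > width:
        ahead = [(off, pri) for off, pri in breaks if off > start]
        if not ahead:
            break  # no opportunity left, the rest overflows
        # An opportunity fits if the line up to it, minus trailing space, fits.
        fitting = [(off, pri) for off, pri in ahead
                   if len(text[start:off].rstrip()) <= width]
        chosen = None
        for level in (HIGH, LOW):
            offsets = [off for off, pri in fitting if pri == level]
            if offsets:
                chosen = max(offsets)  # farthest fitting break at this level
                break
        if chosen is None:
            chosen = ahead[0][0]  # nothing fits: overflow to the nearest break
        lines.append(text[start:chosen].rstrip())
        start = chosen
    lines.append(text[start:])
    return lines
```

In this sketch "see in/on here" wrapped at width 6 keeps "in/on" together, and only at width 3 is the break after the slash used.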
See http://www.unicode.org/reports/tr14/ for line breaking rules that should be applied, in particular LB13. The current behavior is particularly annoying with the French "thin space" (U+2009). FYI, WebKit-based browsers do not have this problem.
(In reply to comment #7)
> The current behavior is particularly annoying with
> the French "thin space" (U+2009). FYI, WebKit-based browsers do not have this
> problem.

Please file a new bug for a line breaking problem with particular characters. This bug isn't for such problems.
Hmm, currently, line break opportunities are marked in each CompressedGlyph: https://dxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxFont.h#726-729,732-733 However, unfortunately, there is no room to store whether each opportunity is high or low priority. So it may be impossible to implement a prioritized line breaker that uses the low priority opportunities only when a box is too narrow. But I guess that we can improve GetJISx4051Breaks with some additional checks so that it does not mark some of the opportunities inside every word as breakable.
Ah, FLAG_CHAR_IS_SPACE is set only when the character is an ASCII space. So, then, we can use 3 bits for the flags, e.g.,

0x0: non-breakable
0x1: breakable (high priority)
0x2: space, non-breakable
0x3: space, breakable (high priority)
0x4: hyphen
0x5: breakable (low priority)
0x6: (reserved)
0x7: (reserved)
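As a rough illustration of how such a 3-bit encoding could be decoded, here is a small sketch. It is not Gecko code: the names `BreakFlag`, `FLAG_MASK`, `break_priority` and `is_space` are hypothetical, and the hyphen value is left without a priority because the proposal does not say how it would rank.

```python
from enum import IntEnum


class BreakFlag(IntEnum):
    """Hypothetical names for the proposed 3-bit values."""
    NON_BREAKABLE = 0x0
    BREAKABLE_HIGH = 0x1
    SPACE_NON_BREAKABLE = 0x2
    SPACE_BREAKABLE_HIGH = 0x3
    HYPHEN = 0x4
    BREAKABLE_LOW = 0x5
    # 0x6 and 0x7 are reserved in the proposal and have no member here.


FLAG_MASK = 0x7  # the three low bits hold the flag value


def break_priority(bits):
    """Return "high", "low", or None for the flag stored in the low 3 bits."""
    value = bits & FLAG_MASK
    if value in (BreakFlag.BREAKABLE_HIGH, BreakFlag.SPACE_BREAKABLE_HIGH):
        return "high"
    if value == BreakFlag.BREAKABLE_LOW:
        return "low"
    return None  # non-breakable, hyphen, or reserved


def is_space(bits):
    """True for the two values that describe an ASCII space."""
    value = bits & FLAG_MASK
    return value in (BreakFlag.SPACE_NON_BREAKABLE,
                     BreakFlag.SPACE_BREAKABLE_HIGH)
```

A prioritized breaker could then consult `break_priority` first for "high" opportunities and look at "low" ones only when a line would otherwise overflow.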