95067 - (uax14) UAX14: line-break should be allowed after hyphens (unless followed by number)

Reporter

Comment 20

•

23 years ago

Attached patch patch v2 (obsolete)

Mr. timeless, thank you for your advice. The patch was updated.

Hideo Saito

Comment 21

•

23 years ago

As mentioned in an earlier comment, the patch also handles the following fault: when an <xx>...</xx> element follows a run of Kanji characters that contains no white space, the "..." portion is not wrapped correctly. The following is a portion of the patch.

layout/html/base/src/nsTextFrame.cpp:
    } else {
      firstChar = *bp2;
+     if (IS_CJK_CHAR(firstChar))
+       aTextData.mIsBreakable = PR_TRUE;
    }

Refer to the patch of Bug 135323 for the macro IS_CJK_CHAR used in this patch.

timeless

•

23 years ago

Saito san,

This is again a very difficult problem to handle. We have problems when breaking certain symbol sequences, like :-). Such commonly used sequences should be kept together, and that's why we only break ASCII words at spaces. From this point of view, your current approach is unacceptable.

My proposal is (logically) to do this in 2 steps. First, we apply all the old logic to wrap words, as we do now. Second, if and only if a single word is too long to fit in the current cell, we try to break the word.

In the line breaker, we need to implement a new API to break a word into word segments. That should be rather easy to do; basically you can move your new code into this function. In layout code, we want to call this API and break a word only when such a situation arises. That is difficult because the layout code is complicated, but it is doable.

Let me know if you agree with this approach and if you have time to do it.
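The two-step proposal above can be illustrated with a small, self-contained sketch. It is not Gecko code: `WrapText` and `BreakWordIntoSegments` are hypothetical names, widths are counted in characters rather than measured in pixels, and "prefer to cut after a hyphen, otherwise cut hard" is an assumed segmentation policy chosen only for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper (not an existing line-breaker API): split one over-long
// word into segments of at most maxWidth characters. It prefers to cut right
// after a hyphen and falls back to a hard cut when no hyphen fits.
std::vector<std::string> BreakWordIntoSegments(const std::string& word,
                                               std::size_t maxWidth) {
  assert(maxWidth > 0);
  std::vector<std::string> segments;
  std::size_t start = 0;
  while (word.size() - start > maxWidth) {
    std::size_t cut = start + maxWidth;  // hard-cut fallback
    for (std::size_t i = start + maxWidth; i > start; --i) {
      if (word[i - 1] == '-') {  // last hyphen that still fits on this line
        cut = i;
        break;
      }
    }
    segments.push_back(word.substr(start, cut - start));
    start = cut;
  }
  segments.push_back(word.substr(start));
  return segments;
}

// Step 1: wrap at white space only. Step 2: segment a word only when it cannot
// fit on a line by itself. Widths are counted in characters for simplicity.
std::vector<std::string> WrapText(const std::string& text,
                                  std::size_t maxWidth) {
  assert(maxWidth > 0);
  std::vector<std::string> lines;
  std::string current;
  std::istringstream in(text);
  std::string word;
  while (in >> word) {
    if (word.size() > maxWidth) {
      if (!current.empty()) lines.push_back(current);
      std::vector<std::string> segments = BreakWordIntoSegments(word, maxWidth);
      for (std::size_t k = 0; k + 1 < segments.size(); ++k) {
        lines.push_back(segments[k]);
      }
      current = segments.back();  // later words may still join this line
    } else if (current.empty()) {
      current = word;
    } else if (current.size() + 1 + word.size() <= maxWidth) {
      current += ' ';
      current += word;
    } else {
      lines.push_back(current);
      current = word;
    }
  }
  if (!current.empty()) lines.push_back(current);
  return lines;
}
```

With a width of 14, `WrapText(":-) is kept but 2-bromo-4,4-dichlorophenol wraps", 14)` leaves `:-)` intact because it fits, and segments only the chemical name, which is the one word that cannot fit on a line by itself.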

Hideo Saito

•

23 years ago

Attached file testcase for patch v6

Hideo Saito

•

23 years ago

*** Bug 154541 has been marked as a duplicate of this bug. ***

Hideo Saito

Comment 37

•

23 years ago

Attached patch patch v8 (obsolete)

I tried to fix the soft-hyphen problem; the change affects only the display part. This patch also includes the fix for word-wrapping, a fix for a problem with the CSS nowrap property, and a fix for a problem with the <nobr></nobr> tag.

Jesse Ruderman

Comment 38

•

23 years ago

*** Bug 160852 has been marked as a duplicate of this bug. ***

Mats Palmgren (inactive)

Comment 39

•

23 years ago

*** Bug 149137 has been marked as a duplicate of this bug. ***

Boris Zbarsky [:bzbarsky]

Comment 40

•

•

22 years ago

Attinasi is gone. Reassigning to patch author.

Assignee: attinasi → saito

Alfonso Martinez

Comment 48

•

22 years ago

*** Bug 192757 has been marked as a duplicate of this bug. ***

Koike Kazuhiko

Comment 49

•

22 years ago

Comment on attachment 108709: patch v9 for mozilla-1.2.1

Saito-san, please post your new patch.

Attachment #108709 - Attachment is obsolete: true

Hideo Saito

Comment 50

•

22 years ago

Attached patch patch for mozilla-1.3b (obsolete)

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 51

•

22 years ago

I don't think we should break on hyphens. So why should we add this much additional code complexity to fix something that isn't even a bug? (Adding support for soft hyphen, etc., is definitely a good thing, but I imagine the changes to do that would be much simpler.)

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 52

•

22 years ago

OK, I'll partially retract that statement (in response to email sent to me that should have been a comment on this bug). I don't think we should break on hyphens due to the complexity of the current linebreaking code. (I recall a discussion of this issue in much more detail in another bug, though, that led me to think we shouldn't break on hyphens at all.)

David G King

Comment 53

•

Comment 65

•

22 years ago

Attached patch patch (obsolete)

I checked only ASCII codes. Please refer to the function nsTextTransformer::GetNextDividedWord. This patch also includes the bug fix shown below, because the text style should be updated when connecting fragmentary text.

nsTextFrame::ComputeWordFragmentDimensions
+ nsIStyleContext* aStyleContext;
+ aTextFrame->GetStyleContext(&aStyleContext);
+ const nsStyleText* textStyle = (const nsStyleText*)
+   aStyleContext->GetStyleData(eStyleStruct_Text);
+ aCanBreakBefore = (NS_STYLE_WHITESPACE_NORMAL == textStyle->mWhiteSpace) ||
+   (NS_STYLE_WHITESPACE_MOZ_PRE_WRAP == textStyle->mWhiteSpace);

Attachment #114308 - Attachment is obsolete: true

rbs

Comment 66

•

22 years ago

Care to file a separate bug for that? Do well to also include a testcase to show the problem. It is not clear why |aCanBreakBefore| is out-of-sync when it is passed to ComputeWordFragmentDimensions().

Jo Hermans

Comment 67

•

22 years ago

*** Bug 204233 has been marked as a duplicate of this bug. ***

Hideo Saito

Comment 68

•

22 years ago

Updated

•

22 years ago

Component: Layout: Tables → Layout: Fonts and Text

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 75

•

22 years ago

Comment on attachment 123515: patch

I think one reason this patch introduces so much additional complexity is that it's trying to modify line breaking from a level of the code other than the one at which line breaking happens. I would expect the fix to this bug to be closer to our current line breaking code, i.e., nsJISX4501LineBreaker (sic) and nsTextTransformer::Scan*. I don't see why a fix for this bug would need to modify other code.

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Updated

•

22 years ago

Blocks: 218580

andré

Comment 76

•

22 years ago

My problem (originally 217520, but transferred to 95067) concerned very_long_words (long strings of text without white space) that exceeded cell width, and often window width. This has little to do with splitting a word on hyphens, etc.

What is needed at a minimum is to ensure that a very_long_word will NEVER cause the cell width to exceed the window width. This is in itself a relatively simple problem, unlike splitting the very_long_word in an esthetically pleasing manner. It also solves a serious problem. When a very_long_word causes a text cell to exceed the window width in a long text, the text is rendered unreadable, because every line requires scrolling right and left, with no visual cues to maintain one's place in the text. As this is often important information (e.g. re security patches), this poses a SERIOUS PROBLEM.

Agreed, it would be NICE to have a more esthetically pleasing presentation, but at A MINIMUM, THIS PARTICULAR PROBLEM should be solved as A PRIORITY.

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 77

•

22 years ago

No. As I said in comment 74, we're not changing the basic table algorithm. The web depends on it, and your proposal really won't help for all but the simplest case. (Why does breaking at the *window width* help for a table that has multiple columns?) But I reopened your bug and marked it a duplicate of a different bug. Please do not discuss the issue further on *this* bug. It's off-topic.

Alan Wood

Reporter

Comment 78

•

22 years ago

I am not completely happy with the end of the new title "unless followed by number". I can see the validity of this for dates, as in 2002-12-31 (comment 43), but I feel that some breaking of dates can be avoided by a widow/orphan setting of 3 characters, i.e. don't break if it would result in 1, 2 or 3 characters at the start or end of a line. Perhaps a widow/orphan setting could be included in Preferences?

Not breaking after a hyphen that is followed by a number does not work so well for chemical names, which started this bug. For example:

2-bromo-4,4-dichlorophenol

could happily be broken as

2-bromo-
4,4-dichlorophenol

but breaking as

2-bromo-4,4-
dichlorophenol

is much less satisfactory.

However, breaking after some hyphens would be MUCH better than never breaking after hyphens. We cannot have manual checking of each break, and so we will have to accept some imperfections.

Alan Wood
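The two rules compared in the comment above can be put side by side in a short sketch. It is only an illustration: `HyphenBreakOffsets` and its parameters are hypothetical names, only U+002D is treated as a hyphen, and the widow/orphan limit is applied to the fragments of a single word rather than to real line contents, which is a simplifying assumption.

```cpp
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// Returns the offsets at which a line may break inside `word`. Each offset is
// the index of the first character that would start the next line.
//   breakBeforeDigit: false models "unless followed by number" from the title.
//   orphanLimit:      fragments of this many characters or fewer are refused
//                     (3 models the suggested widow/orphan setting, 0 disables it).
std::vector<std::size_t> HyphenBreakOffsets(const std::string& word,
                                            bool breakBeforeDigit,
                                            std::size_t orphanLimit) {
  std::vector<std::size_t> offsets;
  for (std::size_t i = 0; i + 1 < word.size(); ++i) {
    if (word[i] != '-') continue;
    std::size_t next = i + 1;
    assert(next < word.size());  // guaranteed by the loop condition
    bool digitFollows =
        std::isdigit(static_cast<unsigned char>(word[next])) != 0;
    if (digitFollows && !breakBeforeDigit) continue;
    // Widow/orphan guard: refuse breaks that leave a very short fragment
    // before or after the break.
    if (next <= orphanLimit || word.size() - next <= orphanLimit) continue;
    offsets.push_back(next);
  }
  return offsets;
}
```

For `2-bromo-4,4-dichlorophenol`, the "unless followed by number" rule with no orphan limit allows breaks only before `bromo` and before `dichlorophenol`, so the preferred break before `4,4-` is unavailable. Allowing breaks before digits with a limit of 3 instead permits the break before `4,4-` and rejects the one that would strand `2-` at the end of a line.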

Mike Cowperthwaite

Comment 110

•

20 years ago

(In reply to R.K.Aa., comment #61)
> *** Bug 193360 has been marked as a duplicate of this bug. ***

If this bug is indeed related to textareas, please change the summary to make it easier to find.

Thanks, Prog.

williamw@sagnasty.org

Comment 111

•

•

•

18 years ago

OK, I am marking this FIXED. We don't use UAX#14, but we fixed the actual bug.
-> FIXED

(In reply to comment #135)
> Does Mozilla break at U+2010 Hyphen now or just U+002D Hyphen‐Minus?

Currently, U+2010 does not break the line, but we can fix that easily. Please file a new bug and CC me.

•

18 years ago

Depends on: 389595

Alan Wood

Reporter

Comment 150

•

18 years ago

I have updated the URL, because the original one no longer exists. The page is exactly the same.

URL: http://www.hclrss.demon.co.uk/abamect... → http://www.alanwood.net/pesticides/ab...

Takanori MATSUURA

Comment 151

•

18 years ago

Really? I can access the original one.

Takanori MATSUURA

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 158

•

17 years ago

I only just realized that none of the patches here ever got checked in, so reusing this bug does make sense, sorry about that.

Status: REOPENED → RESOLVED

Closed: 18 years ago → 17 years ago

Resolution: --- → FIXED

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 159

•

17 years ago

I'm afraid that I don't have the cycles to work on this for Gecko 1.9. A reduced testcase would help us get it fixed in the next release.

Flags: wanted-next+

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 160

•

17 years ago

oops

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Alan Wood

Reporter

Comment 161

•

17 years ago

(In reply to comment #157)
> There are 162 hyphens in the IUPAC cell of the example page, but the "FIXED"
> version only seems able to break after 5 of them.

Please accept my apologies for the incorrect information in #157. Robert O'Callahan is absolutely correct, we do need a simpler test case. Try this file:

http://www.alanwood.net/demos/bug-95067-systematic.html

and the long systematic names DO wrap. My thanks to everyone who has worked on this bug.

However, the data sheets in my pesticide website still don't display properly in Firefox 3.04b. This is now because of problems with wrapping InChIs, which did not exist when I first filed the bug. Here is a test file for InChIs:

http://www.alanwood.net/demos/bug-95067-inchi.html

Lines are not being broken after hyphens (ASCII decimal 45 or hexadecimal 0x2D) that separate 2 numbers:

52-37-25-16-26-38-52,47-59(9,10)53-39-27-17-28-40-53

resulting in some very long lines. Internet Explorer, Safari and Opera do break after these hyphens. Would allowing breaks after these hyphens in Firefox cause problems for any other data? If not, would it be simple to amend the new wrapping code?

Firefox is also not breaking after hyphens like these:

33+,34-,35-,36-,37-,

Internet Explorer and Safari do break after these hyphens, but although this allows wrapping, it puts a comma as the first character in a line, which does not look good. Would it cause any problems to allow breaks after these commas?

Simo Kaupinmäki

Comment 162

•

17 years ago

> Firefox is also not breaking after hyphens like these:
> 33+,34-,35-,36-,37-,

I have no idea what that string might be about, but I recognize that this can be a genuine issue for chemists. Nevertheless, I don't like the idea of allowing breaks after commas. If I see a break after a comma, I generally assume that there is a whitespace after the comma -- but I suppose it would be a (big?) mistake in this case. Seeing a line beginning with a comma could be distracting, but at least it would give me a hint that there is something exceptional going on.

Unfortunately, allowing breaks between a hyphen and a comma may cause other problems. For example, in Finnish it is possible for a (compound) word to end with an elliptical hyphen (indicating that the last part of the compound has been omitted), and sometimes the hyphen may be followed by a comma. Now, as the comma would normally be followed by a whitespace, I grant that usually the odd comma is likely to fit in the same line as the preceding word with hyphen. But occasionally there would not be enough space and the comma would have to be moved to the next line. I think this would be rather unfortunate in a natural language context, where the reader expects the text to flow according to the general orthographic conventions.

There may be other problematic cases too. In many languages, comma is used as the decimal marker (instead of the decimal point, as in English). If, in addition, a leading zero is replaced with a hyphen, you may see strings such as "-,50" (e.g., in a price tag). Here it would clearly be undesirable to break after the hyphen (and even more so after the comma).

Perhaps some kind of an emergency break rule could be composed, though. The rule could allow exceptional breaks between a hyphen and a comma, but only in very long strings and if there was no better break opportunity within 10 (or even more?) characters.
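The emergency break rule suggested above could look roughly like the following sketch. Everything here is an assumption made for illustration: the name `EmergencyBreakOffsets`, the idea that "better" break opportunities are supplied by the caller as a list of offsets, and the reading of "within 10 characters" as a distance in either direction. The length threshold for "very long strings" is left as a parameter because the comment does not fix a value.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Returns offsets where an exceptional break between '-' and ',' is allowed.
// Each offset is the index of the comma that would start the next line.
//   ordinaryBreaks: break opportunities found by the normal rules (assumed input)
//   minLength:      only strings at least this long qualify as "very long"
//   window:         no ordinary break may lie within this many characters
std::vector<std::size_t> EmergencyBreakOffsets(
    const std::string& text, const std::vector<std::size_t>& ordinaryBreaks,
    std::size_t minLength, std::size_t window) {
  std::vector<std::size_t> offsets;
  if (text.size() < minLength) return offsets;  // only very long strings
  for (std::size_t i = 0; i + 1 < text.size(); ++i) {
    if (text[i] != '-' || text[i + 1] != ',') continue;
    std::size_t candidate = i + 1;
    bool betterNearby = false;
    for (std::size_t o : ordinaryBreaks) {
      assert(o <= text.size());  // ordinary breaks must lie inside the text
      std::size_t distance = o > candidate ? o - candidate : candidate - o;
      if (distance <= window) {
        betterNearby = true;
        break;
      }
    }
    if (!betterNearby) offsets.push_back(candidate);
  }
  return offsets;
}
```

Under these assumptions a short string such as `-,50` never receives an emergency break, while a long run like `33+,34-,35-,36-,37-,` does once no ordinary opportunity is close enough.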

Patrick Dark

Comment 163

•

Comment 177

•

17 years ago

(In reply to comment #164)
> Sorry, but this is not a solution for InChIs. They are specified by IUPAC as
> containing only ASCII characters, and so the horizontal line has to be the
> hyphen-minus.

Firefox 3.1a2 has introduced support for the CSS3 property word-wrap: break-word. This now makes it possible to break InChIs nicely in Firefox, with an appropriate style applied to them. See my updated test file:

http://www.alanwood.net/demos/bug-95067-inchi.html

As far as I am concerned, this bug can now be closed. My thanks to the Firefox developers.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 178

•

17 years ago

OK, thanks Alan!

Status: REOPENED → RESOLVED

Closed: 17 years ago → 17 years ago

Resolution: --- → WORKSFORME

Thomas Wisniewski

Updated

•

8 years ago

See Also: → https://github.com/webcompat/web-bugs/issues/4443

Whiteboard: [webcompat]

Attachment | Added | Author | Size, type
patch to wrap long word by key-characters | 23 years ago | Hideo Saito | 10.16 KB, patch
Test view of wrapped long words | 23 years ago | Hideo Saito | 4.48 KB, text/html
testcase of a word not wrapped after Kanji | 23 years ago | Hideo Saito | 485 bytes, text/html
patch v2 | 23 years ago | Hideo Saito | 9.21 KB, patch
patch v3 | 23 years ago | Hideo Saito | 9.26 KB, patch
patch v4 | 23 years ago | Hideo Saito | 9.51 KB, patch
patch v5 | 23 years ago | Hideo Saito | 23.13 KB, patch
patch v6 | 23 years ago | Hideo Saito | 39.70 KB, patch
testcase for patch v6 | 23 years ago | Hideo Saito | 1.95 KB, text/html
patch v7 | 23 years ago | Hideo Saito | 43.10 KB, patch
testcase for patch v7 | 23 years ago | Hideo Saito | 3.04 KB, text/html
Soft-hyphen + table test case | 23 years ago | Aidas Kasparas | 319 bytes, text/html
patch v8 | 23 years ago | Hideo Saito | 43.71 KB, patch
patch v9 for mozilla-1.2.1 | 22 years ago | Hideo Saito | 45.10 KB, patch
patch for mozilla-1.3b | 22 years ago | Hideo Saito | 44.85 KB, patch
screen shot of testcase for patch v7 | 22 years ago | Hideo Saito | 41.63 KB, image/png
patch | 22 years ago | Hideo Saito | 9.91 KB, patch
patch | 22 years ago | Hideo Saito | 22.80 KB, patch
Example Screenshot | 22 years ago | Thomas Iversen | 19.27 KB, image/gif
Possibly due to HR tag aswell as hyphens | 20 years ago | williamw@sagnasty.org | 90.85 KB, image/gif