Closed Bug 54467 Opened 24 years ago Closed 22 years ago

punctuation mark in the :first-letter pseudo-element does not cover all the Unicode punctuation marks

Categories

(Core :: Layout, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla1.2alpha

People

(Reporter: ekrock, Assigned: shanjian)

References

Details

(Keywords: css1, relnote, testcase, Whiteboard: [CSS1-2.4][FIX](py8ieh: find testcase) renote-devel, patch available, need r/sr)

Attachments

(2 files)

The "5.12.2 :first-letter pseudo-element" section of CSS2 (
http://www.w3.org/TR/REC-CSS2/selector.html#first-letter) said
The :first-letter pseudo-element may be used for "initial caps" and "drop caps",
which are common typographical effects. This type of initial letter is similar
to an inline-level element if its 'float' property is 'none', otherwise it is
similar to a floated element.
...
Punctuation (i.e, characters defined in Unicode [UNICODE] in the "open" (Ps),
"close" (Pe), and "other" (Po) punctuation classes), that precedes the first
letter should be included

Howerver, currently, we only include two character - "  and '



------- Additional Comments From Frank Tang 2000-09-25 17:41 -------

We should include all the Ps Pe and Po characters in the following file
http://www.unicode.org/Public/3.0-Update1/UnicodeData-3.0.1.txt


------- Additional Comments From Frank Tang 2000-09-25 17:42 -------

Here is the command I got all the Ps Pe and Po characters
egrep "Ps|Pe|Po" UnicodeData-3.0.1.txt |cut -f1 -d; > punc.txt



------- Additional Comments From Frank Tang 2000-09-25 17:44 -------

0021
0022
0023
0025
0026
0027
0028
0029
002A
002C
002E
002F
003A
003B
003F
0040
005B
005C
005D
007B
007D
00A1
00B7
00BF
0313
037E
0387
055A
055B
055C
055D
055E
055F
0589
05BE
05C0
05C3
05F3
05F4
060C
061B
061F
066A
066B
066C
066D
06D4
0700
0701
0702
0703
0704
0705
0706
0707
0708
0709
070A
070B
070C
070D
0964
0965
0970
0DF4
0E4F
0E5A
0E5B
0F04
0F05
0F06
0F07
0F08
0F09
0F0A
0F0B
0F0C
0F0D
0F0E
0F0F
0F10
0F11
0F12
0F3A
0F3B
0F3C
0F3D
0F85
104A
104B
104C
104D
104E
104F
10FB
1361
1362
1363
1364
1365
1366
1367
1368
166D
166E
169B
169C
16EB
16EC
16ED
17D4
17D5
17D6
17D7
17D8
17D9
17DA
17DC
1800
1801
1802
1803
1804
1805
1807
1808
1809
180A
2016
2017
201A
201E
2020
2021
2022
2023
2024
2025
2026
2027
2030
2031
2032
2033
2034
2035
2036
2037
2038
203B
203C
203D
203E
2041
2042
2043
2045
2046
2048
2049
204A
204B
204C
204D
207D
207E
208D
208E
2329
232A
3001
3002
3003
3008
3009
300A
300B
300C
300D
300E
300F
3010
3011
3014
3015
3016
3017
3018
3019
301A
301B
301D
301E
301F
FD3E
FD3F
FE30
FE35
FE36
FE37
FE38
FE39
FE3A
FE3B
FE3C
FE3D
FE3E
FE3F
FE40
FE41
FE42
FE43
FE44
FE49
FE4A
FE4B
FE4C
FE50
FE51
FE52
FE54
FE55
FE56
FE57
FE59
FE5A
FE5B
FE5C
FE5D
FE5E
FE5F
FE60
FE61
FE68
FE6A
FE6B
FF01
FF02
FF03
FF05
FF06
FF07
FF08
FF09
FF0A
FF0C
FF0E
FF0F
FF1A
FF1B
FF1F
FF20
FF3B
FF3C
FF3D
FF5B
FF5D
FF61
FF62
FF63
FF64

you can also find it from http://warp/u/ftang/tmp/punc.txt



------- Additional Comments From Frank Tang 2000-09-25 17:46 -------

Created an attachment (id=15481)
test case show punctuation mark problem in :first-letter



------- Additional Comments From Frank Tang 2000-09-25 17:47 -------

The position of the beginning of the line problem is file as 54099



------- Additional Comments From Frank Tang 2000-09-25 17:54 -------

The problem is in nsTextFrame.cpp
3749 
3750       if (aTextData.mFirstLetterOK) {
3751         // XXX need a lookup function here; plus look ahead using the
3752         // text-runs
3753         if ((firstChar == '\'') || (firstChar == '\"')) {
3754           if (contentLen > 1)

It only check '\'' and '\"' . It should be replace by
if(IsPunct(firstChar)) and we need to implement IsPunct(PRUnichar) to include 
all the Ps Pe Po characters.



------- Additional Comments From Frank Tang 2000-09-26 10:17 -------

please consdier for april release
Reassigning to erik. Future.
Assignee: clayton → erik
Keywords: css1, relnote
Target Milestone: --- → Future
:first-letter, taking QA.

I have a *thorough* testcase for this, when you need it give me a shout and 
I'll look for it.
QA Contact: petersen → py8ieh=bugzilla
Whiteboard: (py8ieh: find testcase)
Status: NEW → ASSIGNED
Attached file test cases
Hixie: What's the relnote status here?

Gerv
Whiteboard: (py8ieh: find testcase) → (py8ieh: find testcase) renote-devel
Target Milestone: Future → ---
reassign to ftang.
Mark this as moz1.0
Assignee: erik → ftang
Status: ASSIGNED → NEW
Target Milestone: --- → mozilla1.0
Status: NEW → ASSIGNED
nhotta- can you help on this one ? should be an easy one for you to start hacking 
nsTextFrame.cpp
Assignee: ftang → nhotta
Status: ASSIGNED → NEW
Target Milestone: mozilla1.0 → Future
Reassign to ftang.
Assignee: nhotta → ftang
give to shanjian
Assignee: ftang → shanjian
Target Milestone: Future → mozilla1.0
Attached patch patch v1.0Splinter Review
The fix might not be as simple as we thought. I proposed a fix that implementing 
the existing idea proposed by frank in this email. There are several issues need  
to be discussed:
1, Should we include other characters in our punctuation set? CSS2 specs does 
say Ps, Pe, Po should be included, but it does not say other characters should 
not be included. How about "$" as shown in frank's test case? It is not a 
punctuation mark as defined by unicode. 
2, As CSS specs 5.12.2 metioned :
   "Some languages may have specific rules about how to treat certain letter 
combinations. In Dutch, for example, if the letter combination "ij" apprears at 
the beginning of a word, both letters should be considered within the 
:first-letter pseudo-element."
   Should we include those situations? I doubt if anyone will list a lot of such 
examples. Even to include above mentioned example is rather difficult, because 
we have to know what language it is. (Encoding normally is easier to get. )

Status: NEW → ASSIGNED
Well since we shouldn't be doing anything based merely on the encoding...
Well lets just limit this fix to handle punctuation mark. Bug 92176 has been 
filed for dutch "ij" and possible other problem. 

Can somebody review this one? ian? pierre?
Whiteboard: (py8ieh: find testcase) renote-devel → (py8ieh: find testcase) renote-devel, patch available, need r/sr
I don't think we are supposed to include all the "Po" punctuation characters but 
only some of them, like INVERTED EXCLAMATION MARK or INVERTED QUESTION MARK.  It 
can't really hurt, though, to list them all, so you have my blessing. r=pierre
chris, can you sr this one? 
Please check jrgm's page load tests and iBench. If they are unaffected, then
sr=waterson.
Chris, for most the webpages, my code is never executed. For those web pages 
that do have such pseudo element, the function is only applied to the first 
character. I doubt it will have any impact on performance. So can you waive the 
requirement? (I am asking this is because I don't know how to do such testing, 
and I have to ask someone else do it for me.) 
What a great time to learn! ;-)

jrgm's tests are at the following URL, and are self-explanatory.

  <http://cowtools.mcom.com/>

An internal version of iBench is at the following URL, and you should run the
``HTML load speed test''. (Click ``Start'', then ``I accept'', then ``Select
Tests'' at the very bottom of the page, then the ``HTML Load Speed'' link.)

  <http://dogspit.mcom.com/ibench/i-bench.htm>
I tried the 2 test, but it seems I have to prepare 2 optimized build for the 
test. My dialup connection is really intolerable. So I will leave this bug for 
future. 
Keywords: mozilla1.0
Whiteboard: (py8ieh: find testcase) renote-devel, patch available, need r/sr → [CSS1-2.4][FIX](py8ieh: find testcase) renote-devel, patch available, need r/sr
Moving Netscape owned 0.9.9 and 1.0 bugs that don't have an nsbeta1, nsbeta1+,
topembed, topembed+, Mozilla0.9.9+ or Mozilla1.0+ keyword.  Please send any
questions or feedback about this to adt@netscape.com.  You can search for
"Moving bugs not scheduled for a project" to quickly delete this bugmail.
Target Milestone: mozilla1.0 → mozilla1.2
If you can test with a printf that the code isn't executed, then that seems good
enough to know that it won't affect performance.  It would be good to see this
in.  That said, it's technically a duplicate of bug 23605, which I filed, but
feel free to dupe the other way around, especially if you actually get the fix
in. :-)

That said, I'm somewhat surprised that you didn't need to modify any frame
construction code.  Does this still work correctly if you use styles such as:

p:first-letter {
  border: thin solid black;
  background: yellow;
  font-size: 200%;
}

(Actually, some of those properties may not be working right now for
:first-letter due to bug 103189.)
I put in the printf and ran a lot of pages (using browser buster test cases), I
simple did not record a single one call instance. I guess this patch should be
OK in terms of performance. 
Attachment #42563 - Flags: superreview+
Attachment #42563 - Flags: review+
Comment on attachment 42563 [details] [diff] [review]
patch v1.0

Mark r/sr (done by pierre/waterson a while ago.)
fix checked in. 
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Blocks: 23605
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: