Closed Bug 534136 Opened 15 years ago Closed 14 years ago

Switch atoms back to being UTF-16; have them hold string buffers (dromaeo)

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla1.9.3a3

Tracking Flags:

Tracking

Status

blocking2.0

---

People

(Reporter: bzbarsky, Assigned: sicking)

References

Details

Attachments

(8 files, 8 obsolete files)

Make the attribute qualified name stuff work with UTF-16 15 years ago Boris Zbarsky [:bzbarsky] 11.68 KB, patch		Details \| Diff \| Splinter Review
Some more attribute stuff moving to UTF-16 15 years ago Boris Zbarsky [:bzbarsky] 3.14 KB, patch		Details \| Diff \| Splinter Review
Part1: Use nsStringBuffers 15 years ago Jonas Sicking (:sicking) No longer reading bugmail consistently 79.72 KB, patch	benjamin : review+	Details \| Diff \| Splinter Review
Part2: Use 16bit chars in buffers 15 years ago Jonas Sicking (:sicking) No longer reading bugmail consistently 76.79 KB, patch	bzbarsky : review-	Details \| Diff \| Splinter Review
Part3: Take advantage of new API 15 years ago Jonas Sicking (:sicking) No longer reading bugmail consistently 64.98 KB, patch	bzbarsky : review-	Details \| Diff \| Splinter Review
Part4: Inline accessors into nsIAtom 15 years ago Jonas Sicking (:sicking) No longer reading bugmail consistently 11.75 KB, patch	peterv : review+ jst : superreview+	Details \| Diff \| Splinter Review
Part5: Clean up atomtable hash entries 15 years ago Jonas Sicking (:sicking) No longer reading bugmail consistently 13.89 KB, patch	jst : review+	Details \| Diff \| Splinter Review
Part6: Use an PLArenaPool to store atoms 15 years ago Jonas Sicking (:sicking) No longer reading bugmail consistently 8.55 KB, patch		Details \| Diff \| Splinter Review
Part6b: Use an pool to store first X atoms 15 years ago Jonas Sicking (:sicking) No longer reading bugmail consistently 4.22 KB, patch		Details \| Diff \| Splinter Review
Part6: Fix handling of invalid UTF8 strings passed to NS_NewAtom 14 years ago Jonas Sicking (:sicking) No longer reading bugmail consistently 3.47 KB, patch	jst : review+	Details \| Diff \| Splinter Review
Part7: Atom tests 14 years ago Jonas Sicking (:sicking) No longer reading bugmail consistently 9.82 KB, patch	jst : review+	Details \| Diff \| Splinter Review
Part2 review comment fixes 14 years ago Jonas Sicking (:sicking) No longer reading bugmail consistently 18.75 KB, patch		Details \| Diff \| Splinter Review
Part2: Use 16bit chars in buffers (updated) 14 years ago Jonas Sicking (:sicking) No longer reading bugmail consistently 77.38 KB, patch	bzbarsky : review+ mrbkap : superreview+	Details \| Diff \| Splinter Review
Fixes for Part3 14 years ago Jonas Sicking (:sicking) No longer reading bugmail consistently 790 bytes, patch		Details \| Diff \| Splinter Review
Part3: Take advantage of new API 14 years ago Jonas Sicking (:sicking) No longer reading bugmail consistently 64.95 KB, patch	bzbarsky : review+ hsivonen : review+	Details \| Diff \| Splinter Review
patch for winmo debug build 14 years ago Brad Lassey [:blassey] (use needinfo?) 1.66 KB, patch	sicking : review+	Details \| Diff \| Splinter Review

Boris Zbarsky [:bzbarsky]

Reporter

Description

•

15 years ago

I suspect that a lot more consumers get them as UTF-16, and the string copies involved in the DOM's access to atoms (for attribute names, values, etc) show up in DOM benchmark profiles quite noticeably.

I guess I should measure to see how many AtomImpls we create with UTF-16 as opposed to UTF-8.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 1

•

15 years ago

I ran some numbers a long time ago, which I think I put somewhere in bugzilla, on how often we have UTF16 data and want to get or compare to an atom, vs. how often we have UTF8 data. It was a bit tricky since many places, especially performance critical ones, are aware of the fact that atoms are UTF8 based and adjust the code to this fact. For example by converting UTF16->UTF8 before comparing to an atom.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 2

•

15 years ago

Found it:

https://bugzilla.mozilla.org/show_bug.cgi?id=277479#c8

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 3

•

15 years ago

See also bug 277479 comment 9

Boris Zbarsky [:bzbarsky]

Reporter

Comment 4

•

15 years ago

Ah.  Sounds like we should do it (per bug 277479 comment 9).  Do you want me to dup this bug there, or vice versa?  Doing this, and fixing GetExistingAttrNameFromQName to not use ACString, would speed up DOM attribute perf a good bit...

Again, I think we should make the atoms hold shareable string buffers, not just PRUnichar* strings.

Summary: Switch atoms back to being UTF-16; have them hold string buffers → Switch atoms back to being UTF-16; have them hold string buffers (dromaeo)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 5

•

15 years ago

Lets keep this bug open, the other covers enough topics as it is. Don't really care if we dupe the other one or just mark it dependent.

Totally agree on the holding sharable buffers part. Not sure how to make that happen for static atoms though? Can we create static "fake" buffers which we give an extra addref to ensure that they are freed.

Boris Zbarsky [:bzbarsky]

Reporter

Updated

•

15 years ago

Blocks: 534148

Boris Zbarsky [:bzbarsky]

Reporter

Comment 6

•

15 years ago

OK.  Let's mark dependent for now.

As far as how to make static atoms happy, that sounds like a good plan.  Basically create an nsStringBuffer (not nsStringBuffer*) member and pre-generate not only the PRUnichars but the mRefCnt (1) and the mStorageSize (whatever length is plus 1 and then times 2).  Might have to stick the nsStringBuffer in a union with a struct that has the relevant members and use an initializer on the struct, or something....

Jonas, want to just do this?

Blocks: 277479

blocking2.0: --- → ?

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 7

•

15 years ago

Yup, I can take this

Assignee: nobody → jonas

Status: NEW → ASSIGNED

Boris Zbarsky [:bzbarsky]

Reporter

Comment 8

•

15 years ago

Attached patch Make the attribute qualified name stuff work with UTF-16 (obsolete) — Details — Splinter Review

Boris Zbarsky [:bzbarsky]

Reporter

Comment 9

•

15 years ago

Attached patch Some more attribute stuff moving to UTF-16 (obsolete) — Details — Splinter Review

Maybe we should have a UTF16** getter on atoms, instead of refcounting the string buffer in some of these cases?

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 10

•

15 years ago

Attached patch Part1: Use nsStringBuffers — Details — Splinter Review

Use nsStringBuffers rather than inline-allocated buffers. Still uses 8-bit chars so that the same nsIAtom API can be maintained. Mostly changes the macro-magic around at the places where we use static atoms.

This also allows getting rid of the static-atom-wrapped implementation of nsIAtom. So static atoms are now simply permanent atoms that use a buffer that's statically allocated on startup.

Attachment #417283 - Attachment is obsolete: true

Attachment #417284 - Attachment is obsolete: true

Attachment #419641 - Flags: review?(bzbarsky)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 11

•

15 years ago

Attached patch Part2: Use 16bit chars in buffers (obsolete) — Details — Splinter Review

Changes to 16-bit PRUnichars to store string data inside the atoms, rather than 8bit chars. This requires changing nsIAtom::getUTF8String to nsIAtom::getUTF16String, and thus all callers of getUTF8String.

Note that this doesn't change any callers more than necessary, later patches will change code to be more performant.

Attachment #419642 - Flags: review?(mrbkap)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 12

•

15 years ago

Attached patch Part3: Take advantage of new API (obsolete) — Details — Splinter Review

Change callers to take advantage of the fact that we now use 16bit data in atoms. And use the new nsDependentAtomString/nsAtomString/nsAtomCString classes where it makes sense.

I went through all callsites of nsIAtom::EqualsUTF8 and nsIAtom::ToString (not sure if I did nsIAtom::Equals). Should also go through NS_NewAtom/do_GetAtom, but haven't done so yet.

Attachment #419643 - Flags: review?(bzbarsky)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

15 years ago

Attachment #419643 - Attachment is patch: true

Attachment #419643 - Attachment mime type: application/octet-stream → text/plain

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 13

•

15 years ago

Attached patch Part4: Inline accessors into nsIAtom — Details — Splinter Review

This is sort of ugly, but seems to work. Allows inlining the most perf critical methods on nsIAtom, but means that nsIAtom can't be implemented from script. Although that's already impossible since double-wrapping kills pointer identity.

Attachment #419645 - Flags: superreview?(jst)

Attachment #419645 - Flags: review?(benjamin)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

15 years ago

Attachment #419641 - Flags: review?(bzbarsky) → review?(benjamin)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

15 years ago

Attachment #419642 - Flags: review?(mrbkap) → review?(bzbarsky)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 14

•

15 years ago

Attached patch Part5: Clean up atomtable hash entries — Details — Splinter Review

There's really no reason to to use the hashkey trickeries. We don't use "real" hashentries as keys ever anyway.

Attachment #419646 - Flags: review?(jst)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 15

•

15 years ago

Attached patch Part6: Use an PLArenaPool to store atoms (obsolete) — Details — Splinter Review

The downside here is that we'll never be able to reclaim memory used by atoms (though we do reclaim the memory used by the string data inside the atom). So if a page creates 100000 different atom names, we'll forever use 1200000 bytes of memory for atom pools.

Attachment #419647 - Flags: review?(bzbarsky)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 16

•

15 years ago

Attached patch Part6b: Use an pool to store first X atoms (obsolete) — Details — Splinter Review

Alternative solution to "Part6" patch. Instead of using pools for all atoms, just use a pool for the first X atoms. This applies on top of the Part6 patch.

I do think we should use some type of pool since we currently do on trunk for static atoms, which probably saves us some cycles during startup.

Attachment #419649 - Flags: review?(bzbarsky)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

15 years ago

Attachment #419649 - Attachment description: Part6b: Use an PLArenaPool to store atoms → Part6b: Use an pool to store first X atoms

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 17

•

15 years ago

Note for posterity: Why does nsSVGDataParser use 8-bit strings rather than 16-bit?

Boris Zbarsky [:bzbarsky]

Reporter

Comment 18

•

15 years ago

How many static atom allocations do we do from that arena on startup right now?  I would rather avoid the arena complexity, for the reason described in comment 15 and because picking a good cap is hard, unless there's actually measurable Ts impact...

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 19

•

15 years ago

We allocate 2158 "static" atoms during startup. This does not include duplicates, i.e. if the same string appears in multiple atom tables passed to NS_RegisterStaticAtoms, that atom is only counted once.

During a run when I opened a bunch of different websites, including youtube, gmail, and lots of bbc news tabs, I ended up with a maximum of 10266 atoms existing at once. Including static atoms.

Simply opening and closing the browser, only displaying the default minefield startup page, I got a maximum of 3604 atoms.

We never seem to create any permanent atoms (other than through calls to NS_RegisterStaticAtoms). Nor do we ever promote a non-permanent atom to be permanent, i.e. NS_RegisterStaticAtoms never results in any atoms being promoted. So we should probably remove the ability to create permanent atoms other than through NS_RegisterStaticAtoms.

Dunno how long calling malloc 2158 times takes, but that is how much Ts would be hit by not using an arena.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 20

•

15 years ago

> Dunno how long calling malloc 2158 times takes

I just checked on mac (where we complain all the time about the slow allocator) and it looks like a test program that does |malloc(20)| 1e7 times (allocating a total of 200MB or so) and stores all the results in an array takes about 0.7 seconds to run on my machine.  So 2158 mallocs would take about 0.2ms.

Things would get a bit worse if we had to free() as well, since it looks like free() takes 2-3x as long as malloc() on Mac.

My gut feeling is that we should try without the arena, and if that shows up in Ts add a capped arena sized to somewhat above the static atom count.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

15 years ago

Attachment #419647 - Flags: review?(bzbarsky)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

15 years ago

Attachment #419649 - Flags: review?(bzbarsky)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 21

•

15 years ago

Sounds good

Benjamin Smedberg

Comment 22

•

15 years ago

Comment on attachment 419645 [details] [diff] [review]
Part4: Inline accessors into nsIAtom

Ugh. The reason this is an interface at all is that there is still some script code which needs to manipulate atoms? Do we know what that code is, and whether we can remove it short-term?

Absent that, we should really have some xpconnect magic for nsIAtom so that script can't pretend to implement it.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 23

•

15 years ago

nsITreeView hands back arrays of atoms (getRow/Cell/ColumnProperties).  There are JS implementations of nsITreeView.  Then again, they just need to be able to create atoms from a given string but not DO anything with them.

There are also editor APIs that might be using atoms.  On the other hand, XPConnect will convert strings to atoms for in params, so those should be usable from JS without nsIAtom as-is, right?

Full list of idl that uses atoms at http://mxr.mozilla.org/mozilla-central/search?string=nsIAtom&find=idl&findi=&filter=^[^\0]*%24&hitlimit=&tree=mozilla-central

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 24

•

15 years ago

For what it's worth, script has never been able to properly implement nsIAtom, since that would produce objects with the wrong pointer identity.

I think medium to long term we could maybe get rid of all places in script that use nsIAtom, but we would have to look out for performance.

OTOH I don't think this hack has particularly bad consequences. The uglyness is mostly confined to the nsIAtom.idl and nsAtomTable.cpp files.

Benjamin Smedberg

Comment 25

•

15 years ago

Sure, it couldn't *properly* implement it, but if it wanted to crash or cause other problems it could try to improperly implement. If we're absolutely certain that content script can never implement nsITreeView, maybe that's ok, but I'd really like a followup bug for that in particular.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 26

•

15 years ago

Untrusted script is not allowed to implement nsITreeView.

Zack Weinberg (:zwol)

Comment 27

•

15 years ago

FWIW, I was pretty seriously considering converting the CSS parser to work entirely in UTF-8, in order to avoid the conversion overhead (nearly all CSS is pure ASCII and is served over the network as such).  It has to look up lots of atoms, though, and I'm not sure where the tradeoff would lie if atom lookup has to convert to UTF-16.

I thought the long-term goal was to stop using UTF-16 altogether.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 28

•

15 years ago

How are you planning on dealing with the fact that all of our charset converters output UTF16? Are you going to have a fast-path UTF8 parser, and a second UTF16 parser? Or were you planning on taking the output from the charset converters and convert to UTF8?

While I agree that it would be better if our entire codebase was UTF8 based, I think converting code here and there does not make sense. The atom experiment was a good example of that where using UTF8 in just a part of the code has lead to slower performance and more complex code.

Zack Weinberg (:zwol)

Comment 29

•

15 years ago

(In reply to comment #28)
> How are you planning on dealing with the fact that all of our charset
> converters output UTF16? Are you going to have a fast-path UTF8 parser, and a
> second UTF16 parser? Or were you planning on taking the output from the charset
> converters and convert to UTF8?

I hadn't got that far yet, but I probably would have added a second set of charset converters that target UTF8; even more important from my perspective is to not have to do conversion at all for input labeled as UTF8 (note that this is the default for CSS).

> While I agree that it would be better if our entire codebase was UTF8 based, I
> think converting code here and there does not make sense. The atom experiment
> was a good example of that where using UTF8 in just a part of the code has lead
> to slower performance and more complex code.

I don't think we will ever make the switch if we have to do it all at once.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 30

•

15 years ago

The other thing I should note is that atom-lookup using a UTF8 string is not much slower than atom-lookup using a UTF16 string, if the atom already exists. So unless you're creating a lot of atoms based on author provided random strings, you shouldn't see much of a difference.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 31

•

15 years ago

> I don't think we will ever make the switch if we have to do it all at once.

If you want to work towards switching our codebase to UTF8 in a progressive manner I'm all for that. But I think we need a better plan than just converting code here and there and hope that it doesn't regress performance.

Brendan Eich [:brendan]

Comment 32

•

15 years ago

DOM and JS standards married UTF-16 (via UCS-2) in the '90s. This is hard to change. The trade-off between transcoding up front to UTF-16 vs. mapping the UCS-2 semantics onto UTF-8 strings under the JS and DOM hoods is not clear to me, but if sicking's right that piecewise UTF-8 regressed perf recently, that seems meaningful.

/be

Zack Weinberg (:zwol)

Comment 33

•

15 years ago

(In reply to comment #30)
> The other thing I should note is that atom-lookup using a UTF8 string is not
> much slower than atom-lookup using a UTF16 string, if the atom already exists.
> So unless you're creating a lot of atoms based on author provided random
> strings, you shouldn't see much of a difference.

Good to know -- mostly these are atoms for element names and the like, so they should exist.

(In reply to comment #31)
> > I don't think we will ever make the switch if we have to do it all at once.
> 
> If you want to work towards switching our codebase to UTF8 in a progressive
> manner I'm all for that. But I think we need a better plan than just converting
> code here and there and hope that it doesn't regress performance.

It seems to me that if switching to UTF8 is on the table *at all*, there should be no conversions in the other direction.  Granted that nsIAtom lookups from JS/DOM currently hurt because of conversion, my choice would therefore have been to tackle switching JS/DOM over to UTF8 rather than to go backward.

(In reply to comment #32)
> DOM and JS standards married UTF-16 (via UCS-2) in the '90s. This is hard to
> change. The trade-off between transcoding up front to UTF-16 vs. mapping the
> UCS-2 semantics onto UTF-8 strings under the JS and DOM hoods is not clear to
> me, but if sicking's right that piecewise UTF-8 regressed perf recently, that
> seems meaningful.

I'm prepared to bet that the mapping is the way to go, because the only thing that loses is array indexing on string objects when those strings actually contain non-ASCII characters, ne?

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 34

•

15 years ago

(In reply to comment #33)
> > > I don't think we will ever make the switch if we have to do it all at once.
> > 
> > If you want to work towards switching our codebase to UTF8 in a progressive
> > manner I'm all for that. But I think we need a better plan than just converting
> > code here and there and hope that it doesn't regress performance.
> 
> It seems to me that if switching to UTF8 is on the table *at all*, there should
> be no conversions in the other direction.  Granted that nsIAtom lookups from
> JS/DOM currently hurt because of conversion, my choice would therefore have
> been to tackle switching JS/DOM over to UTF8 rather than to go backward.

We need to be pragmatic. Switching to UTF8 isn't a goal in and of itself. It's a means towards getting better performance and lower footprint. Having atoms in UTF8 does not currently fulfill that goal and so we should not do it at this time.

Brendan Eich [:brendan]

Comment 35

•

15 years ago

(In reply to comment #33)
> I'm prepared to bet that the mapping is the way to go, because the only thing
> that loses is array indexing on string objects when those strings actually
> contain non-ASCII characters, ne?

Not just indexing, regular expression matching, and don't discount other parts of the world where 7 bits ain't enough. The asymptotic cost could be bad compared to just sucking up more memory and using "UTF-16".

/be

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

14 years ago

Depends on: 541245

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

14 years ago

Attachment #419647 - Attachment is obsolete: true

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

14 years ago

Attachment #419649 - Attachment is obsolete: true

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 36

•

14 years ago

Attached patch Part6: Fix handling of invalid UTF8 strings passed to NS_NewAtom — Details — Splinter Review

So I tested what happens if invalid UTF8 strings are passed to NS_NewAtom(char*). For strings that contain "recoverable" errors, such as overlong surrogates, the fixes in bug 541245 will handle that.

However for errors like broken surrogates, where the UTF8->UTF16 conversion code simply returns an empty UTF16 string, we need special code.

So what this patch does is if it detects that a string is "too" broken while hashing the UTF8 string, we do an upfront conversion to UTF16 (which means simply taking the empty string) and use that to look up an atom.

Attachment #422899 - Flags: review?(jst)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 37

•

14 years ago

Attached patch Part7: Atom tests — Details — Splinter Review

This contains tests for all patches in this bug. Applies on top of tests in bug 541245.

Attachment #422901 - Flags: review?(jst)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

14 years ago

Attachment #422901 - Attachment description: Atom tests → Part7: Atom tests

Boris Zbarsky [:bzbarsky]

Reporter

Comment 38

•

14 years ago

Comment on attachment 419642 [details] [diff] [review]
Part2: Use 16bit chars in buffers

>+++ b/xpcom/tests/TestAtoms.cpp
>@@ -76,19 +76,19 @@ int main(int argc, char** argv)
>     ids[i] = NS_NewAtom(strings[i]);
>   }
>   PRUnichar qqs[1]; qqs[0] = 0;
>   nsIAtom* qq = NS_NewAtom(qqs);
>   PRTime end1 = PR_Now();
> 
>   // Now make sure we can find all the idents we just made
>   for (i = 0; i < count; i++) {
>-    const char *utf8String;
>-    ids[i]->GetUTF8String(&utf8String);
>-    nsIAtom* id = NS_NewAtom(utf8String);
>+    const PRUnichar *utf16String;
>+    ids[i]->GetUTF16String(&utf16String);
>+    nsIAtom* id = NS_NewAtom(utf16String);
>     if (id != ids[i]) {
>       id->ToString(s1);
>       ids[i]->ToString(s2);
>       printf("find failed: id='%s' ids[%d]='%s'\n",
>              NS_LossyConvertUTF16toASCII(s1).get(), i, NS_LossyConvertUTF16toASCII(s2).get());
>       return -1;
>     }
>     NS_RELEASE(id);
>diff --git a/xpcom/tests/TestPermanentAtoms.cpp b/xpcom/tests/TestPermanentAtoms.cpp
>--- a/xpcom/tests/TestPermanentAtoms.cpp
>+++ b/xpcom/tests/TestPermanentAtoms.cpp
>@@ -46,40 +46,38 @@
> 
> static void Assert(PRBool aCondition, const char* aStatement)
> {
>     printf("%s: %s\n", aCondition?"PASS":"FAIL", aStatement);
> }
> 
> static void AssertString(nsIAtom *aAtom, const nsACString& aString)
> {
>-    const char *str;
>-    static_cast<AtomImpl*>(aAtom)->GetUTF8String(&str);
>-    Assert(nsDependentCString(str) == aString, "string is correct");
>+    Assert(aAtom->EqualsUTF8(aString), "string is correct");
> }
> 
> static void AssertPermanence(nsIAtom *aAtom, PRBool aPermanence)
> {
>     Assert(static_cast<AtomImpl*>(aAtom)->IsPermanent() == aPermanence,
>            aPermanence ? "atom is permanent" : "atom is not permanent");
> }
> 
> int main()
> {
>     nsCOMPtr<nsIAtom> foo = do_GetAtom("foo");
>     AssertString(foo, NS_LITERAL_CSTRING("foo"));
>     AssertPermanence(foo, PR_FALSE);
> 
>-    nsCOMPtr<nsIAtom> foop = do_GetPermanentAtom("foo");
>+    nsCOMPtr<nsIAtom> foop = NS_NewPermanentAtom(NS_LITERAL_STRING("foo"));
>     AssertString(foop, NS_LITERAL_CSTRING("foo"));
>     AssertPermanence(foop, PR_TRUE);
>     
>     Assert(foo == foop, "atoms are equal");
>     
>-    nsCOMPtr<nsIAtom> barp = do_GetPermanentAtom("bar");
>+    nsCOMPtr<nsIAtom> barp = NS_NewPermanentAtom(NS_LITERAL_STRING("bar"));
>     AssertString(barp, NS_LITERAL_CSTRING("bar"));
>     AssertPermanence(barp, PR_TRUE);
>     
>     nsCOMPtr<nsIAtom> bar = do_GetAtom("bar");
>     AssertString(bar, NS_LITERAL_CSTRING("bar"));
>     AssertPermanence(bar, PR_TRUE);
> 
>     Assert(bar == barp, "atoms are equal");
>

Attachment #419642 - Flags: review?(bzbarsky) → review-

Boris Zbarsky [:bzbarsky]

Reporter

Comment 39

•

14 years ago

Er... those review comments were _supposed_ to be:

>+++ b/content/base/src/nsAttrValue.cpp
>@@ -803,36 +804,34 @@ nsAttrValue::Contains(nsIAtom* aValue, n
>+      return nsLayoutUtils::CheatingEqualsIgnoreCase(val1, val2);

I would prefer that:

1) We call the function EqualsIgnoreASCIICase or some such.
2) We document at all callsites why we're using it instead of the usual
   unicode-aware case-insensitive compare.

Is there a reason this is in nsLayoutUtils and not nsContentUtils?
   
>+++ b/content/smil/nsSMILCompositor.cpp
>+  // XXX This used to hash on the string value rather than the atom.
>+  //     Check that there wasn't a reason for that.

Check with dholbert, but I would guess not.

That said, is just adding two pointers going to give good keys?  Would it make sense to at least shift one of them or something?

>+++ b/layout/base/nsLayoutUtils.cpp
>+nsLayoutUtils::CheatingEqualsIgnoreCase(const PRUnichar* aStr1, const PRUnichar* aStr2)
>+      // the same ascii char, but just differeing in case

"differing"

>+      // They are the same. Check if we're at the end of the strings.

This assumes that the two strings don't contain embedded nulls, right?  In particular, if they do, this could return true for strings that aren't actually equal.

This somewhat worries me, but the old behavior was no different at the callsites where this happens...  Add some warnings in the comments, though?  If caller doesn't control at least one of the strings, this comparison can lie.

And honestly, I think at least some callers don't control these strings sufficiently.  Maybe a followup patch to just do the comparison on nsStrings?

>+++ b/layout/style/nsComputedDOMStyle.cpp
>@@ -4278,19 +4278,18 @@ nsComputedDOMStyle::GetTransitionPropert
>+      property->SetString(nsAtomCString(transition->GetUnknownProperty()));

This can use nsAtomString.  Save two conversions.

>+++ b/parser/htmlparser/src/nsHTMLTags.cpp
>+#ifdef DEBUG && NS_STATIC_ATOM_USE_WIDE_STRINGS

Shouldn't that be:

  #if defined(DEBUG) && defined(NS_STATIC_ATOM_USE_WIDE_STRINGS)

?

>+++ b/xpcom/ds/nsAtomTable.cpp
>@@ -552,16 +559,32 @@ NS_RegisterStaticAtoms(const nsStaticAto
>       *aAtoms[i].mAtom = atom;
> 
>       if (!gStaticAtomTableSealed) {
>         nsAutoString key;
>         atom->ToString(key);
>         gStaticAtomTable->Put(key, atom);

You can use nsAtomString or nsDependentAtomString there to avoid the extra copy, right?

>+++ b/xpcom/ds/nsCRT.cpp
>+PRUint32 nsCRT::HashCodeAsUTF16(const char* start, PRUint32 length)
>+      else if ((c & 0xF0) == 0xE0) { // 4-byte
>+        ucs4 = (PRUint32(c) << 18) & 0x001F0000L;

Those both look wrong to me.  Where did they come from?  I would expect something like:

  else if ((c & 0xF8) == 0xF0) { // 4-byte
    ucs4 = (PRUint32(c) << 18) & 0x001C0000L;

Where did your numbers come from?

>+      else if ((c & 0xF0) == 0xE0) { // 5-byte

  (c & 0xFC) == 0xF8

right?

>+      else if ((c & 0xF0) == 0xE0) { // 6-byte

  (c & 0xFE) == 0xFC

>+      while (l--) {

I would almost prefer we named this |len| instead of l.  |l*6| looked too much like |1*6| over here.

>+        char c = *s++;

This is shadowing the |char c| in the outer loop.  Call it something else?

I have to ask... is there a reason we're reinventing UTF8CharEnumerator here?  Especially in a buggy way?

If we really do need this separate copy of the code, please get smontagu or someone else intl-competent to look at it?  And please write some tests for it!

>+      if (ucs4 < minUcs4) {
>+        ADD_TO_HASHVAL(h, UCS2_REPLACEMENT_CHAR);

Comment that this is an overlong sequence?

>+      else if ( ucs4 <= 0xD7FF ) {
>+        ADD_TO_HASHVAL(h, ucs4);
>+      }
>+      else if ( ucs4 <= 0xDFFF || ucs4 == 0xFFFE || ucs4 == 0xFFFF ) {
>+        // Surrogates and prohibited characters
>+        ADD_TO_HASHVAL(h, UCS2_REPLACEMENT_CHAR);
>+      }

This is fine, but I assume there was a reason to not just test IS_SURROGATE() instead of the 0xD7FF and 0xDFFF business?  This looks like it was more or less copy/pasted from ConvertUTF8toUTF16, right?

>+  // Computes a hashcode for a length number of UTF8
>   // characters. Returns the same hash code as the HashCode method
>-  // taking a |char*| would if the string were converted to UTF8. This
>-  // hash function treats invalid UTF16 data as 0xFFFD (0xEFBFBD in
>+  // taking a |char*| would if the string were converted to UTF16. This

|PRUnichar*|, right?

>+class nsDependentAtomString : public nsDependentString
>+{
>+public:
>+  nsDependentAtomString(nsIAtom* aAtom)
>+    : nsDependentString(aAtom->GetUTF16String(), aAtom->GetLength())

Do we know whether this is cheaper than nsAtomString in practice?  I guess it doesn't have to refcount, but might have to copy depending on what the consumer uses it for...  OK for now.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 40

•

14 years ago

(In reply to comment #39)
> Is there a reason this is in nsLayoutUtils and not nsContentUtils?

IIRC it's mostly (or only) used by layout related code. I don't care at all though, I'm totally fine with putting it in nsContentUtils if you prefer.

> That said, is just adding two pointers going to give good keys?  Would it make
> sense to at least shift one of them or something?

I suspect that shifting away the two/three (or so) lowest bits on at least one pointer would be a good idea.

> >+++ b/layout/base/nsLayoutUtils.cpp
> >+      // They are the same. Check if we're at the end of the strings.
> 
> This assumes that the two strings don't contain embedded nulls, right?  In
> particular, if they do, this could return true for strings that aren't actually
> equal.
> 
> This somewhat worries me, but the old behavior was no different at the
> callsites where this happens...  Add some warnings in the comments, though?  If
> caller doesn't control at least one of the strings, this comparison can lie.
> 
> And honestly, I think at least some callers don't control these strings
> sufficiently.  Maybe a followup patch to just do the comparison on nsStrings?

I think you're right. I'm fine with fixing this in this patch even?

> >+++ b/parser/htmlparser/src/nsHTMLTags.cpp
> >+#ifdef DEBUG && NS_STATIC_ATOM_USE_WIDE_STRINGS
> 
> Shouldn't that be:
> 
>   #if defined(DEBUG) && defined(NS_STATIC_ATOM_USE_WIDE_STRINGS)

Yes! Thanks.

> >+++ b/xpcom/ds/nsAtomTable.cpp
> >@@ -552,16 +559,32 @@ NS_RegisterStaticAtoms(const nsStaticAto
> >       *aAtoms[i].mAtom = atom;
> > 
> >       if (!gStaticAtomTableSealed) {
> >         nsAutoString key;
> >         atom->ToString(key);
> >         gStaticAtomTable->Put(key, atom);
> 
> You can use nsAtomString or nsDependentAtomString there to avoid the extra
> copy, right?

This doesn't actually copy, but it does addref/release the buffer. Will use an nsDependentAtomString.

> >+++ b/xpcom/ds/nsCRT.cpp
> >+PRUint32 nsCRT::HashCodeAsUTF16(const char* start, PRUint32 length)
> >+      else if ((c & 0xF0) == 0xE0) { // 4-byte
> >+        ucs4 = (PRUint32(c) << 18) & 0x001F0000L;
> 
> Those both look wrong to me.  Where did they come from?  I would expect
> something like:
> 
>   else if ((c & 0xF8) == 0xF0) { // 4-byte
>     ucs4 = (PRUint32(c) << 18) & 0x001C0000L;
> 
> Where did your numbers come from?

Indeed. There's some copy'n'paste errors in this. Found and fixed when I wrote tests!

> I have to ask... is there a reason we're reinventing UTF8CharEnumerator here? 
> Especially in a buggy way?

I tried #including nsUTF8Utils.h, but I think I ran into macro magic errors. Though I can't see now what those errors would be. I'll have to re-try.

> If we really do need this separate copy of the code, please get smontagu or
> someone else intl-competent to look at it?  And please write some tests for
> it!

Tests already attached.

> >+class nsDependentAtomString : public nsDependentString
> >+{
> >+public:
> >+  nsDependentAtomString(nsIAtom* aAtom)
> >+    : nsDependentString(aAtom->GetUTF16String(), aAtom->GetLength())
> 
> Do we know whether this is cheaper than nsAtomString in practice?  I guess it
> doesn't have to refcount, but might have to copy depending on what the
> consumer uses it for...  OK for now.

Yeah, this depends entirely on what the consumer does. I guessed that in most cases the string isn't further assigned elsewhere, which is when having a buffer available would be faster.

Ideally nsString would support pointing to an "unowned" buffer. Should be pretty easy to implement. I'll investigate in a follow-up bug.


Anything I didn't reply to, just assume a "Agreed, Will do" comment :)

Mike Shaver (:shaver -- probably not reading bugmail closely)

Comment 41

•

14 years ago

The low 3 bits on most of our pointers are 0 all the time -- probably not a useful contribution to the hash, so shifting them out as a first step seems virtuous.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 42

•

14 years ago

Attached patch Part2 review comment fixes (obsolete) — Details — Splinter Review

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 43

•

14 years ago

Above attachment is an interdiff to address bzs review comments.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 44

•

14 years ago

Attached patch Part2: Use 16bit chars in buffers (updated) — Details — Splinter Review

The resulting Part 2

Attachment #419642 - Attachment is obsolete: true

Attachment #423584 - Flags: review?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Reporter

Comment 45

•

14 years ago

Are you sure the nsAttrValue::Contains stuff only happens in quirks mode?  Note that doing ASCII-only folding may be required for correctness in some cases (e.g. the turkish i/I cases).

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 46

•

14 years ago

(In reply to comment #45)
> Are you sure the nsAttrValue::Contains stuff only happens in quirks mode?  Note
> that doing ASCII-only folding may be required for correctness in some cases
> (e.g. the turkish i/I cases).

No, I don't know that. Do you have a suggestion for what comment to put there instead? I'm fine with simply saying "I don't know why we only do ascii-only folding, but that's what the old code did. Perf maybe?" sine that is case :)

Boris Zbarsky [:bzbarsky]

Reporter

Comment 47

•

14 years ago

> No, I don't know that.

Can you check?  Do we have any callsites other than getElementsByClassName?

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 48

•

14 years ago

It appears that the only callers we have are getElementsByClassName and DOMTokenList

http://scotland.proximity.on.ca/dxr/search.cgi?tree=mozilla-central&amp;callers=nsAttrValue%3A%3AContains

(dxr is the roxxors)

So it is in fact true that this is only used in quirks mode.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 49

•

14 years ago

Comment on attachment 423584 [details] [diff] [review]
Part2: Use 16bit chars in buffers (updated)

Lovely.

Attachment #423584 - Flags: review?(bzbarsky) → review+

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 50

•

14 years ago

Comment on attachment 423584 [details] [diff] [review]
Part2: Use 16bit chars in buffers (updated)

Given the interface change, this should get an sr too.

FWIW I forgot to change the interface IID. Fixed locally.

Attachment #423584 - Flags: superreview?(mrbkap)

Johnny Stenback (:jst)

Comment 51

•

14 years ago

Comment on attachment 419645 [details] [diff] [review]
Part4: Inline accessors into nsIAtom

- In AtomImpl::AtomImpl(const nsAString& aString):

+  NS_ASSERTION(buf && buf->StorageSize() >= (mLength+1) * 2, "enough storage");

Maybe replace the magic '2' there with sizeof(PRUnichar)...

And given that this patch changes nsIAtom, the uuid needs to be bumped, which I know you've done locally, so just a reminder to make sure it makes it in :)

And also, this patch makes it so that bad things happen if someone is passing a random object from JS as an argument to a method that takes an nsIAtom argument. Granted, that should only ever be doable from chrome, but I'd still argue we should throw in that case. If you file the bug on that, I'll help with the XPConnect details to make that happen.

sr=jst

Attachment #419645 - Flags: superreview?(jst) → superreview+

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

14 years ago

Depends on: 543613

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 52

•

14 years ago

Filed bug 543613 on the XPConnect feature.

Johnny Stenback (:jst)

Updated

•

14 years ago

Attachment #419646 - Flags: review?(jst) → review+

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

14 years ago

Attachment #423583 - Attachment is obsolete: true

Johnny Stenback (:jst)

Updated

•

14 years ago

Attachment #422899 - Flags: review?(jst) → review+

Johnny Stenback (:jst)

Comment 53

•

14 years ago

Comment on attachment 422901 [details] [diff] [review]
Part7: Atom tests

Looks good.

Attachment #422901 - Flags: review?(jst) → review+

Boris Zbarsky [:bzbarsky]

Reporter

Comment 54

•

14 years ago

Comment on attachment 419643 [details] [diff] [review]
Part3: Take advantage of new API

In nsHTMLContentSink, don't we need to ASCII-lowercase the attr name instead of Unicode-lowercasing?  It should be possible to write a testcase for this (using turkish Is and such).  r- until this is resolved.

Check with hsivonen about the html5 bits?

Attachment #419643 - Flags: review?(bzbarsky) → review-

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 55

•

14 years ago

(In reply to comment #54)
> (From update of attachment 419643 [details] [diff] [review])
> In nsHTMLContentSink, don't we need to ASCII-lowercase the attr name instead of
> Unicode-lowercasing?  It should be possible to write a testcase for this (using
> turkish Is and such).  r- until this is resolved.

Unicode case conversion could be language-dependent or language-independent; I think our ToLowerCase should be language-independent.  However, either of those is different from doing case transformation only within the ASCII range, which may well be what the old code did.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 56

•

14 years ago

I believe ours is language-independent, but it language-independently lowercases İ to i, as I recall.  So for example something like this:

  <span DİR="rtl" dir="ltr">

will set the "dir" attribute to "rtl" with the patch in question applied, but should be setting it to "ltr".  The old code only performed case conversion in the ASCII range, yes.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 57

•

14 years ago

Attached patch Fixes for Part3 (obsolete) — Details — Splinter Review

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 58

•

14 years ago

Attached patch Part3: Take advantage of new API — Details — Splinter Review

Same as before, but with review comment fixed. Interdiff is in attachment 428409 [details] [diff] [review]

Attachment #428410 - Flags: review?(bzbarsky)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 59

•

14 years ago

I should add that this now depends on bug 547850, where nsContentUtils::ASCIIToLower is added.

Depends on: 547850

Boris Zbarsky [:bzbarsky]

Reporter

Comment 60

•

14 years ago

Comment on attachment 428410 [details] [diff] [review]
Part3: Take advantage of new API

r=bzbarsky

Attachment #428410 - Flags: review?(bzbarsky) → review+

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

14 years ago

Attachment #428409 - Attachment is obsolete: true

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

14 years ago

Attachment #419643 - Attachment is obsolete: true

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

14 years ago

Attachment #419645 - Flags: review?(benjamin) → review?(peterv)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 61

•

14 years ago

Comment on attachment 428410 [details] [diff] [review]
Part3: Take advantage of new API

Henri, can you take a look at the HTML5-parser related changes here.

Attachment #428410 - Flags: review?(hsivonen)

Henri Sivonen (:hsivonen)

Comment 62

•

14 years ago

(In reply to comment #61)
> (From update of attachment 428410 [details] [diff] [review])
> Henri, can you take a look at the HTML5-parser related changes here.

>diff --git a/parser/html/nsHtml5Portability.cpp b/parser/html/nsHtml5Portability.cpp
>--- a/parser/html/nsHtml5Portability.cpp
>+++ b/parser/html/nsHtml5Portability.cpp
>@@ -74,16 +74,17 @@ nsHtml5Portability::newStringFromString(
>   nsString* newStr = new nsString();
>   newStr->Assign(*string);
>   return newStr;
> }
> 
> jArray<PRUnichar,PRInt32>
> nsHtml5Portability::newCharArrayFromLocal(nsIAtom* local)
> {
>+  // XXX what type of atom is this? Is this a real one or a fake HTML5 one?

This always happens to be a real, static one.

>   nsAutoString temp;
>   local->ToString(temp);
>   PRInt32 len = temp.Length();
>   jArray<PRUnichar,PRInt32> arr = jArray<PRUnichar,PRInt32>(len);
>   memcpy(arr, temp.BeginReading(), len * sizeof(PRUnichar));
>   return arr;
> }

>@@ -97,16 +98,17 @@ nsHtml5Portability::newCharArrayFromStri
> }
> 
> nsIAtom*
> nsHtml5Portability::newLocalFromLocal(nsIAtom* local, nsHtml5AtomTable* interner)
> {
>   NS_PRECONDITION(local, "Atom was null.");
>   NS_PRECONDITION(interner, "Atom table was null");
>   if (local->IsStaticAtom()) {
>+    // XXX what type of atom is this? Is this a real one or a fake HTML5 one?

Inside the if, the atom is supposed to be fake, but the if condition is the wrong way round (and apparently copied and pasted the wrong way twice below). It should be if(!local->IsStaticAtom()). I'm very surprised to find I've accidentally omitted the negation here and even more surprised that there have been no crashers filed about it.

>     nsAutoString str;
>     local->ToString(str);
>     local = interner->GetAtom(str);
>   }
>   return local;
> }
> 
> void
>diff --git a/parser/html/nsHtml5ReleasableAttributeName.cpp b/parser/html/nsHtml5ReleasableAttributeName.cpp
>--- a/parser/html/nsHtml5ReleasableAttributeName.cpp
>+++ b/parser/html/nsHtml5ReleasableAttributeName.cpp
>@@ -45,19 +45,17 @@ nsHtml5ReleasableAttributeName::nsHtml5R
> }
> 
> nsHtml5AttributeName*
> nsHtml5ReleasableAttributeName::cloneAttributeName(nsHtml5AtomTable* aInterner)
> {
>   nsIAtom* l = getLocal(0);
>   if (aInterner) {
>     if (l->IsStaticAtom()) {
>-      nsAutoString str;
>-      l->ToString(str);
>-      l = aInterner->GetAtom(str);
>+      l = aInterner->GetAtom(nsDependentAtomString(l));

This can't use nsDependentAtomString, because if (l->IsStaticAtom()) is bogus and should be if (!l->IsStaticAtom()).

>     }
>   }
>   return new nsHtml5ReleasableAttributeName(nsHtml5AttributeName::ALL_NO_NS, 
>                                             nsHtml5AttributeName::SAME_LOCAL(l), 
>                                             nsHtml5AttributeName::ALL_NO_PREFIX);
> }
> 
> void
>diff --git a/parser/html/nsHtml5ReleasableElementName.cpp b/parser/html/nsHtml5ReleasableElementName.cpp
>--- a/parser/html/nsHtml5ReleasableElementName.cpp
>+++ b/parser/html/nsHtml5ReleasableElementName.cpp
>@@ -49,15 +49,13 @@ nsHtml5ReleasableElementName::release()
> }
> 
> nsHtml5ElementName* 
> nsHtml5ReleasableElementName::cloneElementName(nsHtml5AtomTable* aInterner)
> {
>   nsIAtom* l = name;
>   if (aInterner) {
>     if (l->IsStaticAtom()) {
>-      nsAutoString str;
>-      l->ToString(str);
>-      l = aInterner->GetAtom(str);
>+      l = aInterner->GetAtom(nsDependentAtomString(l));

This can't use nsDependentAtomString, either, because if (l->IsStaticAtom()) is bogus and should be if (!l->IsStaticAtom()). 

>     }
>   }
>   return new nsHtml5ReleasableElementName(l);
> }
>diff --git a/parser/html/nsHtml5TreeOperation.h b/parser/html/nsHtml5TreeOperation.h
>--- a/parser/html/nsHtml5TreeOperation.h
>+++ b/parser/html/nsHtml5TreeOperation.h
>@@ -297,16 +297,17 @@ class nsHtml5TreeOperation {
>     }
> 
>     nsresult Perform(nsHtml5TreeOpExecutor* aBuilder, nsIContent** aScriptElement);
> 
>     inline already_AddRefed<nsIAtom> Reget(nsIAtom* aAtom) {
>       if (!aAtom || aAtom->IsStaticAtom()) {
>         return aAtom;
>       }
>+      // XXX This appears to be a fake HTML5 atom, so can't use nsAtomString

Correct.

>       nsAutoString str;
>       aAtom->ToString(str);
>       return do_GetAtom(str);
>     }
> 
>   private:
> 
>     nsresult AppendTextToTextNode(PRUnichar* aBuffer,

Henri Sivonen (:hsivonen)

Comment 63

•

14 years ago

(In reply to comment #62)
> >+      // XXX This appears to be a fake HTML5 atom, so can't use nsAtomString

Actually, the fake atoms could use nsDependentAtomString, too, if the fake atoms implemented GetUTF16String() and GetLength().

Henri Sivonen (:hsivonen)

Updated

•

14 years ago

Blocks: 547928

Henri Sivonen (:hsivonen)

Updated

•

14 years ago

No longer blocks: 547928

Peter Van der Beken [:peterv]

Comment 64

•

14 years ago

Comment on attachment 419645 [details] [diff] [review]
Part4: Inline accessors into nsIAtom

> diff --git a/parser/html/nsHtml5Atom.cpp b/parser/html/nsHtml5Atom.cpp

>  nsHtml5Atom::nsHtml5Atom(const nsAString& aString)

> +  }

Do we want the same assertions here as the ones you added in AtomImpl?

>  }

> diff --git a/xpcom/ds/nsAtomTable.cpp b/xpcom/ds/nsAtomTable.cpp

> +  NS_ASSERTION(buf && buf->StorageSize() >= (mLength+1) * 2, "enough storage");

Spaces around +
s/2/sizeof(PRUnichar)/

> +  NS_ASSERTION(aStringBuffer && aStringBuffer->StorageSize() == (mLength+1) * 2, "correct storage");

See above, also long line.

Attachment #419645 - Flags: review?(peterv) → review+

Blake Kaplan (:mrbkap) (inactive)

Comment 65

•

14 years ago

Comment on attachment 423584 [details] [diff] [review]
Part2: Use 16bit chars in buffers (updated)

sr=mrbkap with the missing iid bump fixed.

Attachment #423584 - Flags: superreview?(mrbkap) → superreview+

Benjamin Smedberg

Comment 66

•

14 years ago

Is there a reason nsFakeStringBuffer couldn't include nsStringBuffer as the first member, with a little friendship, e.g.

template<size N>
struct nsStaticAtomStringBuffer
{
  nsStringBuffer mStringBuffer;
  char mStringData[N];
};

The initialization would then be, I think:

static nsStaticAtomStringBuffer<sizeof(str_data)> buffer_name = {{1, sizeof(str_data)}, str_data};

If this really isn't possible, you should use static assertions that the sizeof and offsetof the members of nsFakeStringBuffer match nsStringBuffer.

Perhaps more importantly, though, I thought I understood that when NS_RegisterStaticAtoms is called, the buffer was going to be ASCII but have room to store a null-expanded version of the string, and that NS_RegisterStaticAtoms would do the null-expansion. But it appears that length=sizeof("literal string") for nsFakeStringBuffer<length>, which doesn't leave enough room for the expansion.

I may be totally confused about that point, in which case I need more/better comments in nsStaticAtom.h explaining what actually goes on.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 67

•

14 years ago

(In reply to comment #66)
> Is there a reason nsFakeStringBuffer couldn't include nsStringBuffer as the
> first member, with a little friendship, e.g.
> 
> template<size N>
> struct nsStaticAtomStringBuffer
> {
>   nsStringBuffer mStringBuffer;
>   char mStringData[N];
> };
> 
> The initialization would then be, I think:
> 
> static nsStaticAtomStringBuffer<sizeof(str_data)> buffer_name = {{1,
> sizeof(str_data)}, str_data};

I believe that this requires the members in nsStringBuffer to be public. I don't think there is a way to friend a global initializer. Suggestions welcome.

> If this really isn't possible, you should use static assertions that the
> sizeof and offsetof the members of nsFakeStringBuffer match nsStringBuffer.

That is a good idea.

> Perhaps more importantly, though, I thought I understood that when
> NS_RegisterStaticAtoms is called, the buffer was going to be ASCII but have
> room to store a null-expanded version of the string, and that
> NS_RegisterStaticAtoms would do the null-expansion. But it appears that
> length=sizeof("literal string") for nsFakeStringBuffer<length>, which doesn't
> leave enough room for the expansion.
> 
> I may be totally confused about that point, in which case I need more/better
> comments in nsStaticAtom.h explaining what actually goes on.

The "Part 1" patch here does not make us use 16bit atoms. That happens in the "Part 2" patch. There I use the NS_L macro to expand the string at compile time, when that is available. When it's not available we fall back to a slow path where we allocate string buffers on startup. Shouldn't be a big deal since all our primary platforms support expanding NS_L.

See the nsStaticAtom.h changes in attachment 423584 [details] [diff] [review].

Benjamin Smedberg

Comment 68

•

14 years ago

Comment on attachment 419641 [details] [diff] [review]
Part1: Use nsStringBuffers 

r=me with static assertions

Attachment #419641 - Flags: review?(benjamin) → review+

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 69

•

14 years ago

Benjamin: Just to be clear, does your r+ cover all of the "Part 1" spec, or just the nsStaticAtom.h changes? I.e. would you like me to get someone to review the rest of the patch?

Benjamin Smedberg

Comment 70

•

14 years ago

I did not do a line-by-line review of the entire patch, but I did skim it and it looked like a pretty mechanical sort of change. I don't think another review is necessary unless you do.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 71

•

14 years ago

So question....  Is it ok that we're assuming nsStringBuffer::Alloc will never return null?

Henri Sivonen (:hsivonen)

Updated

•

14 years ago

Attachment #428410 - Flags: review?(hsivonen) → review+

Henri Sivonen (:hsivonen)

Comment 72

•

14 years ago

Comment on attachment 428410 [details] [diff] [review]
Part3: Take advantage of new API

Per Tuesday's telecon, marking r+ in the sense that r+ from me for the nsHtml5 parts provided that comment 62 or comment 63 is addressed.

Mark Banner (:standard8)

Updated

•

14 years ago

Depends on: 550980

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 73

•

14 years ago

Checked in!!!

I ended up removing the HTML5-parser changes, I wasn't 100% sure what to do everywhere, and there's no harm in waiting to make those changes until later. Might even be easier then given the changes in part 4.

So here are the parts, in order:

1: http://hg.mozilla.org/mozilla-central/rev/5b152f233117
2: http://hg.mozilla.org/mozilla-central/rev/cb17f4d92942
3: http://hg.mozilla.org/mozilla-central/rev/1995edaefd3f
4: http://hg.mozilla.org/mozilla-central/rev/5901c6b98f83
5: http://hg.mozilla.org/mozilla-central/rev/62bc77046ac9
6: http://hg.mozilla.org/mozilla-central/rev/9cb12f3ff156
7: http://hg.mozilla.org/mozilla-central/rev/f8dd7b5b02ef

Additionally, I forgot to actually fix petervs review comments before checking in. So landed those separately in
http://hg.mozilla.org/mozilla-central/rev/a5de049fc2fb

Thanks everyone for reviewing!

Status: ASSIGNED → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 74

•

14 years ago

(for what it's worth, i don't think this is a blocker, so marking minus)

blocking2.0: ? → -

Mark Banner (:standard8)

Comment 75

•

14 years ago

I checked in a bustage fix for the TestStaticAtoms test (which only runs on non-libxul builds), with r=sicking:

http://hg.mozilla.org/mozilla-central/rev/dcfcfca09b45

Serge Gautherie (:sgautherie)

Comment 76

•

14 years ago

TestUTF fails too on SeaMonkey: see bug 550980 comment 9.

Depends on: 549100

Keywords: assertion

Mark Banner (:standard8)

Updated

•

14 years ago

Depends on: 551153

Mark Banner (:standard8)

Comment 77

•

14 years ago

(In reply to comment #76)
> TestUTF fails too on SeaMonkey: see bug 550980 comment 9.

Filed bug 551153 as originally suggested.

No longer depends on: 549100

Keywords: assertion

Serge Gautherie (:sgautherie)

Updated

•

14 years ago

Flags: in-testsuite+

Target Milestone: --- → mozilla1.9.3a3

Daniel Holbert [:dholbert]

Updated

•

14 years ago

Depends on: 551397

Wolfgang Rosenauer [:wolfiR]

Comment 78

•

14 years ago

Build fails in
http://mxr.mozilla.org/mozilla-central/source/gfx/thebes/src/gfxFT2Fonts.cpp#667
if MOZ_PANGO isn't enabled.
That's probably a rather unsupported build configuration and I only ran into it by accident but anyway something is wrong if it's possible to trigger a build error in that case.

Serge Gautherie (:sgautherie)

Updated

•

14 years ago

Depends on: 551129

Brad Lassey [:blassey] (use needinfo?)

Comment 79

•

14 years ago

Attached patch patch for winmo debug build — Details — Splinter Review

Attachment #431659 - Flags: review?

Brad Lassey [:blassey] (use needinfo?)

Updated

•

14 years ago

Attachment #431659 - Flags: review? → review?(jonas)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 80

•

14 years ago

Comment on attachment 431659 [details] [diff] [review]
patch for winmo debug build

Please use

nsAtomCString(langGroup).get()

instead That will remove the need for the #ifdef.

r=me with that.

Attachment #431659 - Flags: review?(jonas) → review+

Daniel Holbert [:dholbert]

Updated

•

14 years ago

Depends on: 551736

Daniel Holbert [:dholbert]

Updated

•

14 years ago

Depends on: 556005

Boris Zbarsky [:bzbarsky]

Reporter

Updated

•

14 years ago

Blocks: 307832

You need to log in before you can comment on or make changes to this bug.