Closed Bug 192139 Opened 22 years ago Closed 20 years ago

Integrate latest Expat

Categories

(Core :: XML, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla1.7alpha

People

(Reporter: hjtoi-bugzilla, Assigned: peterv)

References

()

Details

Attachments

(10 files, 3 obsolete files)

56.15 KB, patch
Details | Diff | Splinter Review
3.72 KB, patch
axel
: review+
Details | Diff | Splinter Review
1.71 KB, patch
axel
: review+
Details | Diff | Splinter Review
2.16 KB, patch
sicking
: review+
peterv
: superreview+
Details | Diff | Splinter Review
54.20 KB, patch
Details | Diff | Splinter Review
37.94 KB, patch
axel
: review+
Details | Diff | Splinter Review
488.93 KB, patch
Details | Diff | Splinter Review
35.72 KB, patch
Details | Diff | Splinter Review
19.65 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
97.04 KB, patch
bzbarsky
: review+
Details | Diff | Splinter Review
Our current Expat is very old. We should take advantage of the work the Expat maintainers have done over the years and switch to the latest version of it. It includes many correctness, stability and performance fixes. While doing the integration, we should pay attention to what we need to change in Expat, and if possible make the changes in such a way that they can be integrated with the mainline Expat. The hope is that in the future we could just upgrade our Expat whenever we wanted without any changes in it.
The new expat should be placed in mozilla/modules/expat I think, to follow the convention. We could then get rid of the toplevel mozilla/expat dir...
If the expat functions will be placed as a module, will it be possible to have severall XML parsers ? It seems LibXML (under GNU/Linux) is much more powerful than expat, according to this site : http://xmlsoft.org/ and to many people I spoke with. Could we imagine Mozilla using it one day, or giving the possibility to chose our own XML parser ?
We have thought about libxml but it seems too hard to integrate to Mozilla for now.
libxml doesn't seem to be outperforming expat in parsing to SAX. Which is close to what mozilla needs. Given that expat is now maintained differently that it was back in the days, (Clark Cooper (still?) et al. on sourceforge.net vs. James Clark), I wonder if it's reasonable to get expat relicensed again. If it's not, should a new version of expat go into other-licenses or would the expat license be ok for it to go to modules/expat? (see http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/expat/expat/COPYING?rev=1.2&content-type=text/vnd.viewcvs-markup) (One of the changes that I'd like to see would be return values for start and end element handlers so that we can handle errors gracefully. Comments on the expat mailing list on this mention "just use a longjmp", that's evil.)
bug 119934 mentions link problems with expat and asks for namespaced/prefixed symbols for the shipped expat.
One person on the Expat team that has been very responsive to my emails has been Karl Waclawek. I think he would be a good person to initiate contact with once there is a specific plan(s) on our end. http://sourceforge.net/project/memberlist.php?group_id=10127
The Expat license linked to by Axel is fine. We can keep it in the main part of the tree. Problems only arise if we have a license (e.g. straight LGPL, like libart) which has terms over and above those of the MPL. This BSD-like license has no such terms, and so is fine in the tree - like libjpeg, libpng and so on. I've CCed myself in case there's licensing discussion; please de-CC me if there isn't. :-) Gerv
Taking, I have this mostly working.
Assignee: hjtoi-bugzilla → peterv
Priority: -- → P3
Target Milestone: --- → mozilla1.7alpha
This contains a bunch of changes and cleanup in nsExpatDriver that I'll probably separate into several patches. It does not include changes to mozilla/allmakefiles.sh and mozilla/build/unix/modules.mk (patch would be too big). Still to be done: fix Transformiix standalone and the activex plugin.
Attached patch Patch 1 v1Splinter Review
This removes some changes in API that we made but that have no effect inside expat since nothing checks the return value of the handlers. If we ever want to make expat care about return values of the handlers we can add this back.
Attachment #134513 - Flags: superreview?(hjtoi-bugzilla)
Attachment #134513 - Flags: review?(axel)
Comment on attachment 134513 [details] [diff] [review] Patch 1 v1 I'd prefer to see you fixing txXMLParser.cpp as well. fixup the indenting in xmlparse.h?
The indenting in xmlparse.h is the same as in the standard distribution that it was based on so I'd rather not touch it.
Attachment #134699 - Flags: review?(axel)
Attachment #134699 - Flags: review?(axel) → review+
Attachment #134513 - Flags: review?(axel) → review+
In the initial comment of bug 178806, darin says ... but the plan is ultimately to move it out of modules/ since that is a deprecated part of the tree. Not sure since when and why and if still, but I thought I'd add this here. (The comment was a year ago.)
peterv: XML 1.1 is currently a W3C Proposed Recommendation. Our parser currently chokes on <?xml version='1.1' ?>. This isn't too surprising, but would support for XML 1.1 be possible with the latest expat?
I don't know, you'd have to check with the Expat folks. But I won't be changing the version check in this bug anyway, this bug is just about switching to the new version of Expat.
Attachment #134513 - Flags: superreview?(hjtoi-bugzilla) → superreview?(jst)
Comment on attachment 134513 [details] [diff] [review] Patch 1 v1 sr=jst
Attachment #134513 - Flags: superreview?(jst) → superreview+
Comment on attachment 134699 [details] [diff] [review] Patch 1 v1 (transformiix standalone changes) the NS_ENSURE_TRUE(aUserData, XML_ERROR_NONE); still left in the tree are busting standalone, patch coming up
Attachment #139210 - Flags: superreview?(peterv)
Attachment #139210 - Flags: review?(bugmail)
Comment on attachment 139210 [details] [diff] [review] create own macro to check and warn about aUserData r=me another way to do it is to #define TX_EMPTY and NS_ENSURE_TRUE(aUserData, TX_EMPTY);
Attachment #139210 - Flags: review?(bugmail) → review+
Attachment #139210 - Flags: superreview?(peterv) → superreview+
Comment on attachment 139210 [details] [diff] [review] create own macro to check and warn about aUserData checked in, thanx for the reviews.
Summary: Intergrate latest Expat → Integrate latest Expat
Depends on: 235747
Did this fix bug 177780?
Attachment #147802 - Flags: superreview?(jst)
Attachment #147802 - Flags: review?(axel)
Comment on attachment 147802 [details] [diff] [review] htmlparser cleanup (diff -w) it's a girl. and it's a cutie. The comment in LookupCatalogData still has "aDTD is an in/out parameter." which is no more, right? other than that, r=me
Attachment #147802 - Flags: review?(axel) → review+
Comment on attachment 147802 [details] [diff] [review] htmlparser cleanup (diff -w) - In nsExpatDriver::HandleEndDoctypeDecl(): + mInternalState = mSink->HandleDoctypeDecl(nsString(), name, systemId, + publicId, data); EmptyString()? sr=jst
Attachment #147802 - Flags: superreview?(jst) → superreview+
Attachment #134200 - Attachment is obsolete: true
The bad news is that the new version of Expat adds about 9k to the binary size on my optimized OS X build. Not sure what we want to do, I've trimmed some of the public APIs that we don't use. Beyond that, it seems there's code in Expat that we don't need, but it's all very intertwined with the rest of the code so removing it is a) going to be a lot of work and b) will hamper future upgrades.
IMHO it's a nobrainer, another 9k is a small price to pay for keeping up to date with the expat tip.
Attached patch v1 (sink changes) (obsolete) — Splinter Review
Attachment #148301 - Attachment is obsolete: true
Comment on attachment 160666 [details] [diff] [review] v1 (sink changes) - In nsContentUtils::SplitExpatName(): + nsDependentString expatName(aExpatName); + nsReadingIterator<PRUnichar> start, end; + expatName.BeginReading(start); + expatName.EndReading(end); + + nsReadingIterator<PRUnichar> uriEnd(start); + if (FindCharInReadable('!', uriEnd, end)) { + sNameSpaceManager->RegisterNameSpace(nsDependentSubstring(start, uriEnd), + *aNameSpaceID); + + nsReadingIterator<PRUnichar> localEnd(++uriEnd); + PRBool havePrefix = FindCharInReadable('!', localEnd, end); + + *aLocalName = NS_NewAtom(nsDependentSubstring(uriEnd, localEnd)); + if (havePrefix) { + *aPrefix = NS_NewAtom(nsDependentSubstring(++localEnd, end)); + } + else { + *aPrefix = nsnull; + } + } + else { + *aNameSpaceID = kNameSpaceID_None; + *aPrefix = nsnull; + *aLocalName = NS_NewAtom(expatName); + } +} That could probably be made a bit faster (at the cost of a bit more code, maybe) if you didn't use nsDependentString there since that ends up looking up the end of the string, which you don't need to do there since you can do that while you're looping through the string looking for the '!'.
Attachment #160666 - Attachment is obsolete: true
Attachment #161917 - Flags: superreview?(jst)
Attachment #161917 - Flags: review?(bzbarsky)
Comment on attachment 160670 [details] [diff] [review] Diff with stock Expat Not really sure how to do this. These are the changes that I ported from the version of Expat in the Mozilla tree to a newer version of the stock Expat. Attachment 160668 [details] [diff] contains a patch to the Mozilla trunk.
Attachment #160670 - Flags: superreview?(jst)
Attachment #160670 - Flags: review?(bzbarsky)
Comment on attachment 161917 [details] [diff] [review] v1.1 (sink changes) >Index: content/base/src/nsContentUtils.cpp >+ /** >+ * Expat can send the following: >+ * localName >+ * namespaceURI!localName >+ * namespaceURI!localName!prefix >+ */ This bothers me. A lot. URI's are allowed to contain the '!' char. Does expat do anything about the case when namespaceURI contains that char already? If not, we have a problem here.... Can we make expat use a different char here? >Index: content/base/src/nsNameSpaceManager.cpp > NameSpaceImpl::CreateChildNameSpace(nsIAtom* aPrefix, const nsAString& aURI, >+ nsCOMPtr<nsINameSpace> child = new NameSpaceImpl(this, aPrefix, id); >+ NS_ENSURE_TRUE(child, NS_ERROR_OUT_OF_MEMORY); >+ child.swap(*aChildNameSpace); Is there a need for the nsCOMPtr? You're not passing it anywhere... I'd think that: *aChildNameSpace = new NameSpaceImpl(this, aPrefix, id); NS_ENSURE_TRUE(*aChildNameSpace, NS_ERROR_OUT_OF_MEMORY); NS_ADDREF(*aChildNameSpace); would generate smaller and faster code, no? I'm up through the beginning of XMLUtils.cpp changes.
I didn't think we used expats namespace parsing so that should be a non-issue?
(In reply to comment #40) > I didn't think we used expats namespace parsing so that should be a non-issue? IIRC, that's exactly what this patch changes.
Comment on attachment 161917 [details] [diff] [review] v1.1 (sink changes) >Index: extensions/transformiix/source/xml/XMLUtils.cpp >+XMLUtils::splitExpatName(const PRUnichar *aExpatName, nsIAtom **aPrefix, Is there a reason this uses iterators while nsContentUtils::SplitExpatName uses raw ptrs? The two algorithms should really be the same, I would think. >Index: parser/htmlparser/src/nsExpatDriver.cpp >+ static const PRUnichar kExcl[] = { '!', '\0' }; Ah, I see. This is where we tell expat what char to use for the separator, eh? How about using something nice and invalid in URIs like 0x1 or something instead of '!'? I'm still not sure what expat will do if someone sticks that in a namespace URI (via &0x1;, say), but that would be someone _really_ trying to be malicious.... >+ mExpatParser = XML_ParserCreate_MM(kUTF16, &memsuite, kExcl); Reading the documentation on this function, I don't quite like the sound of this: When a namespace is not declared, the name and prefix will be passed through without expansion. Does that apply only to the case when the separator is '\0', or in general? That is, have you tested this case with your patch? Hopefully, it's the former... >Index: parser/htmlparser/src/xmlparser.properties >+27 = unbound prefix How about "prefix not bound to a namespace"? >Index: rdf/base/src/nsRDFContentSink.cpp >+ const nsDependentSubstring SplitExpatName(const PRUnichar *aExpatName, >+ nsIAtom **aLocalName, >+ const PRUnichar* aValue = nsnull); Document this function, please? r=bzbarsky with those issues addressed.
Attachment #161917 - Flags: review?(bzbarsky) → review+
Comment on attachment 160670 [details] [diff] [review] Diff with stock Expat >+++ parser/expat/lib/expat.h Wed Sep 29 21:23:31 2004 >+/* BEGIN MOZILLA CHANGE (typedef XML_Char to PRUnichar) */ So I'm not seeing where that typedef actually happens... did I miss something? > +/* BEGIN MOZILLA CHANGE (blocking parser) */ > +XMLPARSEAPI(enum XML_Status) > +MOZ_XML_StopParser(XML_Parser parser, XML_Bool resumable); Second arg is unused; why is it there? > +/* BEGIN MOZILLA CHANGE (unused API) */ > +#if 0 > XML_Bool XMLCALL > XML_ParserReset(XML_Parser parser, const XML_Char *encodingName) I'd be curious to see the performance effect of actually recycling parsers, but not this patch, I guess. ;) >+/* BEGIN MOZILLA CHANGE (Include xmlns attributes in attributes array) */ >+ s += 5; Maybe |sizeof(xmlnsPrefix)-1| or something? That would be less magical... At least, comment where the 5 comes from. The rest looks ok, as far as I can tell... r=bzbarsky with the above comments addressed.
Attachment #160670 - Flags: review?(bzbarsky) → review+
Comment on attachment 160670 [details] [diff] [review] Diff with stock Expat sr=jst
Attachment #160670 - Flags: superreview?(jst) → superreview+
Comment on attachment 161917 [details] [diff] [review] v1.1 (sink changes) sr=jst with bz's comments addressed.
Attachment #161917 - Flags: superreview?(jst) → superreview+
(In reply to comment #39) > (From update of attachment 161917 [details] [diff] [review]) > >Index: content/base/src/nsContentUtils.cpp > >+ /** > >+ * Expat can send the following: > >+ * localName > >+ * namespaceURI!localName > >+ * namespaceURI!localName!prefix > >+ */ > > This bothers me. A lot. URI's are allowed to contain the '!' char. Does > expat do anything about the case when namespaceURI contains that char already? > If not, we have a problem here.... Can we make expat use a different char here? Yes, I've chosen 0xFFFF as the separator. > >Index: content/base/src/nsNameSpaceManager.cpp > > > NameSpaceImpl::CreateChildNameSpace(nsIAtom* aPrefix, const nsAString& aURI, > Is there a need for the nsCOMPtr? You're not passing it anywhere... I'd think > that: > > *aChildNameSpace = new NameSpaceImpl(this, aPrefix, id); > NS_ENSURE_TRUE(*aChildNameSpace, NS_ERROR_OUT_OF_MEMORY); > NS_ADDREF(*aChildNameSpace); > > would generate smaller and faster code, no? Done. (In reply to comment #42) > (From update of attachment 161917 [details] [diff] [review]) > >Index: extensions/transformiix/source/xml/XMLUtils.cpp > > >+XMLUtils::splitExpatName(const PRUnichar *aExpatName, nsIAtom **aPrefix, > > Is there a reason this uses iterators while nsContentUtils::SplitExpatName uses > raw ptrs? The two algorithms should really be the same, I would think. Done. > >+ mExpatParser = XML_ParserCreate_MM(kUTF16, &memsuite, kExcl); > > Reading the documentation on this function, I don't quite like the sound of > this: > > When a namespace is not declared, the name and prefix will be > passed through without expansion. > > Does that apply only to the case when the separator is '\0', or in general? > That is, have you tested this case with your patch? Hopefully, it's the > former... The documentation is outdated (I asked on the expat list). Unmapped prefixes will return an error and we'll bail out. > >Index: parser/htmlparser/src/xmlparser.properties > > >+27 = unbound prefix > > How about "prefix not bound to a namespace"? Done. > >Index: rdf/base/src/nsRDFContentSink.cpp > >+ const nsDependentSubstring SplitExpatName(const PRUnichar *aExpatName, > >+ nsIAtom **aLocalName, > >+ const PRUnichar* aValue = nsnull); > > Document this function, please? Done. /** * Extracts the localname from aExpatName, the name that the Expat parser * passes us. aValue should contain a namespace URI if we need to register * a new prefix-to-namespace-URI mapping when aExpatName declares a * namespace (namespace URI in aExpatName being * http://www.w3.org/2000/xmlns/). aValue is typically the value of an * xmlns attribute. * aLocalName will contain the localname in aExpatName and the returned * value will be the namespace URI. */ (In reply to comment #43) > (From update of attachment 160670 [details] [diff] [review]) > >+++ parser/expat/lib/expat.h Wed Sep 29 21:23:31 2004 > > >+/* BEGIN MOZILLA CHANGE (typedef XML_Char to PRUnichar) */ > > So I'm not seeing where that typedef actually happens... did I miss something? In parser/expat/expat_config.h: +#define XML_UNICODE +typedef PRUnichar XML_Char; +typedef char XML_LChar; +#define XML_T(x) (PRUnichar)x +#define XML_L(x) x > > +/* BEGIN MOZILLA CHANGE (blocking parser) */ > > +XMLPARSEAPI(enum XML_Status) > > +MOZ_XML_StopParser(XML_Parser parser, XML_Bool resumable); > > Second arg is unused; why is it there? Preparing for Expat 1.95.8, which has a blocking API with this signature. > >+/* BEGIN MOZILLA CHANGE (Include xmlns attributes in attributes array) */ > >+ s += 5; > > Maybe |sizeof(xmlnsPrefix)-1| or something? That would be less magical... At > least, comment where the 5 comes from. I used + s += sizeof(xmlnsPrefix) / sizeof(xmlnsPrefix[0]) - 1;
> + s += sizeof(xmlnsPrefix) / sizeof(xmlnsPrefix[0]) - 1; would NS_ARRAY_LENGTH be clearer?
(In reply to comment #47) > > + s += sizeof(xmlnsPrefix) / sizeof(xmlnsPrefix[0]) - 1; > > would NS_ARRAY_LENGTH be clearer? If it were Mozilla code, yes. I'm not going to use that macro in Expat code.
Checked in. This did increase codesize by about 4-5k. I've trimmed off any trivial bloat, so we'll have to live with that for now. I will try to reduce Expat's size further in the future, but it's not going to be easy.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
I think that the patch for this bug introduced bug 276434, can one of you have a look at it, thank you.
Depends on: 313278
Depends on: 322570
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: