Closed
Bug 192139
Opened 22 years ago
Closed 20 years ago
Integrate latest Expat
Categories
(Core :: XML, defect, P3)
Core
XML
Tracking
()
RESOLVED
FIXED
mozilla1.7alpha
People
(Reporter: hjtoi-bugzilla, Assigned: peterv)
References
()
Details
Attachments
(10 files, 3 obsolete files)
56.15 KB,
patch
|
Details | Diff | Splinter Review | |
3.72 KB,
patch
|
axel
:
review+
jst
:
superreview+
|
Details | Diff | Splinter Review |
1.71 KB,
patch
|
axel
:
review+
|
Details | Diff | Splinter Review |
2.16 KB,
patch
|
sicking
:
review+
peterv
:
superreview+
|
Details | Diff | Splinter Review |
54.20 KB,
patch
|
Details | Diff | Splinter Review | |
37.94 KB,
patch
|
axel
:
review+
jst
:
superreview+
|
Details | Diff | Splinter Review |
488.93 KB,
patch
|
Details | Diff | Splinter Review | |
35.72 KB,
patch
|
Details | Diff | Splinter Review | |
19.65 KB,
patch
|
bzbarsky
:
review+
jst
:
superreview+
|
Details | Diff | Splinter Review |
97.04 KB,
patch
|
bzbarsky
:
review+
jst
:
superreview+
|
Details | Diff | Splinter Review |
Our current Expat is very old. We should take advantage of the work the Expat
maintainers have done over the years and switch to the latest version of it. It
includes many correctness, stability and performance fixes.
While doing the integration, we should pay attention to what we need to change
in Expat, and if possible make the changes in such a way that they can be
integrated with the mainline Expat. The hope is that in the future we could just
upgrade our Expat whenever we wanted without any changes in it.
Reporter | ||
Comment 1•22 years ago
|
||
The new expat should be placed in mozilla/modules/expat I think, to follow the
convention. We could then get rid of the toplevel mozilla/expat dir...
If the expat functions will be placed as a module, will it be possible to have
severall XML parsers ?
It seems LibXML (under GNU/Linux) is much more powerful than expat, according to
this site : http://xmlsoft.org/ and to many people I spoke with. Could we
imagine Mozilla using it one day, or giving the possibility to chose our own XML
parser ?
Reporter | ||
Comment 3•22 years ago
|
||
We have thought about libxml but it seems too hard to integrate to Mozilla for now.
Comment 4•21 years ago
|
||
libxml doesn't seem to be outperforming expat in parsing to SAX. Which is close
to what mozilla needs.
Given that expat is now maintained differently that it was back in the days,
(Clark Cooper (still?) et al. on sourceforge.net vs. James Clark), I wonder if
it's reasonable to get expat relicensed again. If it's not, should a new version
of expat go into other-licenses or would the expat license be ok for it to go
to modules/expat?
(see
http://cvs.sourceforge.net/cgi-bin/viewcvs.cgi/expat/expat/COPYING?rev=1.2&content-type=text/vnd.viewcvs-markup)
(One of the changes that I'd like to see would be return values for start and
end element handlers so that we can handle errors gracefully. Comments on the
expat mailing list on this mention "just use a longjmp", that's evil.)
Comment 5•21 years ago
|
||
bug 119934 mentions link problems with expat and asks for namespaced/prefixed
symbols for the shipped expat.
Reporter | ||
Comment 6•21 years ago
|
||
One person on the Expat team that has been very responsive to my emails has been
Karl Waclawek. I think he would be a good person to initiate contact with once
there is a specific plan(s) on our end.
http://sourceforge.net/project/memberlist.php?group_id=10127
Comment 7•21 years ago
|
||
The Expat license linked to by Axel is fine. We can keep it in the main part of
the tree.
Problems only arise if we have a license (e.g. straight LGPL, like libart) which
has terms over and above those of the MPL. This BSD-like license has no such
terms, and so is fine in the tree - like libjpeg, libpng and so on.
I've CCed myself in case there's licensing discussion; please de-CC me if there
isn't. :-)
Gerv
Assignee | ||
Comment 8•21 years ago
|
||
Taking, I have this mostly working.
Assignee: hjtoi-bugzilla → peterv
Priority: -- → P3
Target Milestone: --- → mozilla1.7alpha
Assignee | ||
Comment 9•21 years ago
|
||
Assignee | ||
Comment 10•21 years ago
|
||
This contains a bunch of changes and cleanup in nsExpatDriver that I'll
probably separate into several patches. It does not include changes to
mozilla/allmakefiles.sh and mozilla/build/unix/modules.mk (patch would be too
big). Still to be done: fix Transformiix standalone and the activex plugin.
Assignee | ||
Comment 11•21 years ago
|
||
This removes some changes in API that we made but that have no effect inside
expat since nothing checks the return value of the handlers. If we ever want to
make expat care about return values of the handlers we can add this back.
Assignee | ||
Updated•21 years ago
|
Attachment #134513 -
Flags: superreview?(hjtoi-bugzilla)
Attachment #134513 -
Flags: review?(axel)
Comment 12•21 years ago
|
||
Comment on attachment 134513 [details] [diff] [review]
Patch 1 v1
I'd prefer to see you fixing txXMLParser.cpp as well.
fixup the indenting in xmlparse.h?
Assignee | ||
Comment 13•21 years ago
|
||
The indenting in xmlparse.h is the same as in the standard distribution that it
was based on so I'd rather not touch it.
Assignee | ||
Comment 14•21 years ago
|
||
Assignee | ||
Updated•21 years ago
|
Attachment #134699 -
Flags: review?(axel)
Updated•21 years ago
|
Attachment #134699 -
Flags: review?(axel) → review+
Updated•21 years ago
|
Attachment #134513 -
Flags: review?(axel) → review+
Comment 15•21 years ago
|
||
In the initial comment of bug 178806, darin says
... but the plan is ultimately to move it out of modules/ since that is
a deprecated part of the tree.
Not sure since when and why and if still, but I thought I'd add this here.
(The comment was a year ago.)
Comment 16•21 years ago
|
||
peterv: XML 1.1 is currently a W3C Proposed Recommendation. Our parser
currently chokes on <?xml version='1.1' ?>. This isn't too surprising, but
would support for XML 1.1 be possible with the latest expat?
Assignee | ||
Comment 17•21 years ago
|
||
I don't know, you'd have to check with the Expat folks. But I won't be changing
the version check in this bug anyway, this bug is just about switching to the
new version of Expat.
Assignee | ||
Updated•21 years ago
|
Attachment #134513 -
Flags: superreview?(hjtoi-bugzilla) → superreview?(jst)
Comment 18•21 years ago
|
||
Comment on attachment 134513 [details] [diff] [review]
Patch 1 v1
sr=jst
Attachment #134513 -
Flags: superreview?(jst) → superreview+
Assignee | ||
Comment 19•21 years ago
|
||
Comment 20•21 years ago
|
||
Comment on attachment 134699 [details] [diff] [review]
Patch 1 v1 (transformiix standalone changes)
the
NS_ENSURE_TRUE(aUserData, XML_ERROR_NONE);
still left in the tree are busting standalone, patch coming up
Comment 21•21 years ago
|
||
Updated•21 years ago
|
Attachment #139210 -
Flags: superreview?(peterv)
Attachment #139210 -
Flags: review?(bugmail)
Comment on attachment 139210 [details] [diff] [review]
create own macro to check and warn about aUserData
r=me
another way to do it is to
#define TX_EMPTY
and
NS_ENSURE_TRUE(aUserData, TX_EMPTY);
Attachment #139210 -
Flags: review?(bugmail) → review+
Assignee | ||
Updated•21 years ago
|
Attachment #139210 -
Flags: superreview?(peterv) → superreview+
Comment 23•21 years ago
|
||
Comment on attachment 139210 [details] [diff] [review]
create own macro to check and warn about aUserData
checked in, thanx for the reviews.
Updated•21 years ago
|
Summary: Intergrate latest Expat → Integrate latest Expat
Comment 24•21 years ago
|
||
Did this fix bug 177780?
Assignee | ||
Comment 25•21 years ago
|
||
Assignee | ||
Comment 26•21 years ago
|
||
Assignee | ||
Updated•21 years ago
|
Attachment #147802 -
Flags: superreview?(jst)
Attachment #147802 -
Flags: review?(axel)
Comment 27•21 years ago
|
||
Comment on attachment 147802 [details] [diff] [review]
htmlparser cleanup (diff -w)
it's a girl. and it's a cutie.
The comment in LookupCatalogData still has
"aDTD is an in/out parameter." which is no more, right?
other than that, r=me
Attachment #147802 -
Flags: review?(axel) → review+
Comment 28•21 years ago
|
||
Comment on attachment 147802 [details] [diff] [review]
htmlparser cleanup (diff -w)
- In nsExpatDriver::HandleEndDoctypeDecl():
+ mInternalState = mSink->HandleDoctypeDecl(nsString(), name, systemId,
+ publicId, data);
EmptyString()?
sr=jst
Attachment #147802 -
Flags: superreview?(jst) → superreview+
Assignee | ||
Comment 29•21 years ago
|
||
Attachment #134200 -
Attachment is obsolete: true
Assignee | ||
Comment 30•21 years ago
|
||
The bad news is that the new version of Expat adds about 9k to the binary size
on my optimized OS X build. Not sure what we want to do, I've trimmed some of
the public APIs that we don't use. Beyond that, it seems there's code in Expat
that we don't need, but it's all very intertwined with the rest of the code so
removing it is a) going to be a lot of work and b) will hamper future upgrades.
IMHO it's a nobrainer, another 9k is a small price to pay for keeping up to
date with the expat tip.
Assignee | ||
Comment 32•20 years ago
|
||
Attachment #148301 -
Attachment is obsolete: true
Assignee | ||
Comment 33•20 years ago
|
||
Assignee | ||
Comment 34•20 years ago
|
||
Assignee | ||
Comment 35•20 years ago
|
||
Comment 36•20 years ago
|
||
Comment on attachment 160666 [details] [diff] [review]
v1 (sink changes)
- In nsContentUtils::SplitExpatName():
+ nsDependentString expatName(aExpatName);
+ nsReadingIterator<PRUnichar> start, end;
+ expatName.BeginReading(start);
+ expatName.EndReading(end);
+
+ nsReadingIterator<PRUnichar> uriEnd(start);
+ if (FindCharInReadable('!', uriEnd, end)) {
+ sNameSpaceManager->RegisterNameSpace(nsDependentSubstring(start, uriEnd),
+ *aNameSpaceID);
+
+ nsReadingIterator<PRUnichar> localEnd(++uriEnd);
+ PRBool havePrefix = FindCharInReadable('!', localEnd, end);
+
+ *aLocalName = NS_NewAtom(nsDependentSubstring(uriEnd, localEnd));
+ if (havePrefix) {
+ *aPrefix = NS_NewAtom(nsDependentSubstring(++localEnd, end));
+ }
+ else {
+ *aPrefix = nsnull;
+ }
+ }
+ else {
+ *aNameSpaceID = kNameSpaceID_None;
+ *aPrefix = nsnull;
+ *aLocalName = NS_NewAtom(expatName);
+ }
+}
That could probably be made a bit faster (at the cost of a bit more code,
maybe) if you didn't use nsDependentString there since that ends up looking up
the end of the string, which you don't need to do there since you can do that
while you're looping through the string looking for the '!'.
Assignee | ||
Comment 37•20 years ago
|
||
Attachment #160666 -
Attachment is obsolete: true
Assignee | ||
Updated•20 years ago
|
Attachment #161917 -
Flags: superreview?(jst)
Attachment #161917 -
Flags: review?(bzbarsky)
Assignee | ||
Comment 38•20 years ago
|
||
Comment on attachment 160670 [details] [diff] [review]
Diff with stock Expat
Not really sure how to do this. These are the changes that I ported from the
version of Expat in the Mozilla tree to a newer version of the stock Expat.
Attachment 160668 [details] [diff] contains a patch to the Mozilla trunk.
Attachment #160670 -
Flags: superreview?(jst)
Attachment #160670 -
Flags: review?(bzbarsky)
Comment 39•20 years ago
|
||
Comment on attachment 161917 [details] [diff] [review]
v1.1 (sink changes)
>Index: content/base/src/nsContentUtils.cpp
>+ /**
>+ * Expat can send the following:
>+ * localName
>+ * namespaceURI!localName
>+ * namespaceURI!localName!prefix
>+ */
This bothers me. A lot. URI's are allowed to contain the '!' char. Does
expat do anything about the case when namespaceURI contains that char already?
If not, we have a problem here.... Can we make expat use a different char here?
>Index: content/base/src/nsNameSpaceManager.cpp
> NameSpaceImpl::CreateChildNameSpace(nsIAtom* aPrefix, const nsAString& aURI,
>+ nsCOMPtr<nsINameSpace> child = new NameSpaceImpl(this, aPrefix, id);
>+ NS_ENSURE_TRUE(child, NS_ERROR_OUT_OF_MEMORY);
>+ child.swap(*aChildNameSpace);
Is there a need for the nsCOMPtr? You're not passing it anywhere... I'd think
that:
*aChildNameSpace = new NameSpaceImpl(this, aPrefix, id);
NS_ENSURE_TRUE(*aChildNameSpace, NS_ERROR_OUT_OF_MEMORY);
NS_ADDREF(*aChildNameSpace);
would generate smaller and faster code, no?
I'm up through the beginning of XMLUtils.cpp changes.
I didn't think we used expats namespace parsing so that should be a non-issue?
Comment 41•20 years ago
|
||
(In reply to comment #40)
> I didn't think we used expats namespace parsing so that should be a non-issue?
IIRC, that's exactly what this patch changes.
Comment 42•20 years ago
|
||
Comment on attachment 161917 [details] [diff] [review]
v1.1 (sink changes)
>Index: extensions/transformiix/source/xml/XMLUtils.cpp
>+XMLUtils::splitExpatName(const PRUnichar *aExpatName, nsIAtom **aPrefix,
Is there a reason this uses iterators while nsContentUtils::SplitExpatName uses
raw ptrs? The two algorithms should really be the same, I would think.
>Index: parser/htmlparser/src/nsExpatDriver.cpp
>+ static const PRUnichar kExcl[] = { '!', '\0' };
Ah, I see. This is where we tell expat what char to use for the separator, eh?
How about using something nice and invalid in URIs like 0x1 or something
instead of '!'? I'm still not sure what expat will do if someone sticks that
in a namespace URI (via &0x1;, say), but that would be someone _really_ trying
to be malicious....
>+ mExpatParser = XML_ParserCreate_MM(kUTF16, &memsuite, kExcl);
Reading the documentation on this function, I don't quite like the sound of
this:
When a namespace is not declared, the name and prefix will be
passed through without expansion.
Does that apply only to the case when the separator is '\0', or in general?
That is, have you tested this case with your patch? Hopefully, it's the
former...
>Index: parser/htmlparser/src/xmlparser.properties
>+27 = unbound prefix
How about "prefix not bound to a namespace"?
>Index: rdf/base/src/nsRDFContentSink.cpp
>+ const nsDependentSubstring SplitExpatName(const PRUnichar *aExpatName,
>+ nsIAtom **aLocalName,
>+ const PRUnichar* aValue = nsnull);
Document this function, please?
r=bzbarsky with those issues addressed.
Attachment #161917 -
Flags: review?(bzbarsky) → review+
Comment 43•20 years ago
|
||
Comment on attachment 160670 [details] [diff] [review]
Diff with stock Expat
>+++ parser/expat/lib/expat.h Wed Sep 29 21:23:31 2004
>+/* BEGIN MOZILLA CHANGE (typedef XML_Char to PRUnichar) */
So I'm not seeing where that typedef actually happens... did I miss something?
> +/* BEGIN MOZILLA CHANGE (blocking parser) */
> +XMLPARSEAPI(enum XML_Status)
> +MOZ_XML_StopParser(XML_Parser parser, XML_Bool resumable);
Second arg is unused; why is it there?
> +/* BEGIN MOZILLA CHANGE (unused API) */
> +#if 0
> XML_Bool XMLCALL
> XML_ParserReset(XML_Parser parser, const XML_Char *encodingName)
I'd be curious to see the performance effect of actually recycling parsers, but
not this patch, I guess. ;)
>+/* BEGIN MOZILLA CHANGE (Include xmlns attributes in attributes array) */
>+ s += 5;
Maybe |sizeof(xmlnsPrefix)-1| or something? That would be less magical... At
least, comment where the 5 comes from.
The rest looks ok, as far as I can tell...
r=bzbarsky with the above comments addressed.
Attachment #160670 -
Flags: review?(bzbarsky) → review+
Comment 44•20 years ago
|
||
Comment on attachment 160670 [details] [diff] [review]
Diff with stock Expat
sr=jst
Attachment #160670 -
Flags: superreview?(jst) → superreview+
Comment 45•20 years ago
|
||
Comment on attachment 161917 [details] [diff] [review]
v1.1 (sink changes)
sr=jst with bz's comments addressed.
Attachment #161917 -
Flags: superreview?(jst) → superreview+
Assignee | ||
Comment 46•20 years ago
|
||
(In reply to comment #39)
> (From update of attachment 161917 [details] [diff] [review])
> >Index: content/base/src/nsContentUtils.cpp
> >+ /**
> >+ * Expat can send the following:
> >+ * localName
> >+ * namespaceURI!localName
> >+ * namespaceURI!localName!prefix
> >+ */
>
> This bothers me. A lot. URI's are allowed to contain the '!' char. Does
> expat do anything about the case when namespaceURI contains that char already?
> If not, we have a problem here.... Can we make expat use a different char here?
Yes, I've chosen 0xFFFF as the separator.
> >Index: content/base/src/nsNameSpaceManager.cpp
>
> > NameSpaceImpl::CreateChildNameSpace(nsIAtom* aPrefix, const nsAString& aURI,
> Is there a need for the nsCOMPtr? You're not passing it anywhere... I'd think
> that:
>
> *aChildNameSpace = new NameSpaceImpl(this, aPrefix, id);
> NS_ENSURE_TRUE(*aChildNameSpace, NS_ERROR_OUT_OF_MEMORY);
> NS_ADDREF(*aChildNameSpace);
>
> would generate smaller and faster code, no?
Done.
(In reply to comment #42)
> (From update of attachment 161917 [details] [diff] [review])
> >Index: extensions/transformiix/source/xml/XMLUtils.cpp
>
> >+XMLUtils::splitExpatName(const PRUnichar *aExpatName, nsIAtom **aPrefix,
>
> Is there a reason this uses iterators while nsContentUtils::SplitExpatName uses
> raw ptrs? The two algorithms should really be the same, I would think.
Done.
> >+ mExpatParser = XML_ParserCreate_MM(kUTF16, &memsuite, kExcl);
>
> Reading the documentation on this function, I don't quite like the sound of
> this:
>
> When a namespace is not declared, the name and prefix will be
> passed through without expansion.
>
> Does that apply only to the case when the separator is '\0', or in general?
> That is, have you tested this case with your patch? Hopefully, it's the
> former...
The documentation is outdated (I asked on the expat list). Unmapped prefixes
will return an error and we'll bail out.
> >Index: parser/htmlparser/src/xmlparser.properties
>
> >+27 = unbound prefix
>
> How about "prefix not bound to a namespace"?
Done.
> >Index: rdf/base/src/nsRDFContentSink.cpp
> >+ const nsDependentSubstring SplitExpatName(const PRUnichar *aExpatName,
> >+ nsIAtom **aLocalName,
> >+ const PRUnichar* aValue = nsnull);
>
> Document this function, please?
Done.
/**
* Extracts the localname from aExpatName, the name that the Expat parser
* passes us. aValue should contain a namespace URI if we need to register
* a new prefix-to-namespace-URI mapping when aExpatName declares a
* namespace (namespace URI in aExpatName being
* http://www.w3.org/2000/xmlns/). aValue is typically the value of an
* xmlns attribute.
* aLocalName will contain the localname in aExpatName and the returned
* value will be the namespace URI.
*/
(In reply to comment #43)
> (From update of attachment 160670 [details] [diff] [review])
> >+++ parser/expat/lib/expat.h Wed Sep 29 21:23:31 2004
>
> >+/* BEGIN MOZILLA CHANGE (typedef XML_Char to PRUnichar) */
>
> So I'm not seeing where that typedef actually happens... did I miss something?
In parser/expat/expat_config.h:
+#define XML_UNICODE
+typedef PRUnichar XML_Char;
+typedef char XML_LChar;
+#define XML_T(x) (PRUnichar)x
+#define XML_L(x) x
> > +/* BEGIN MOZILLA CHANGE (blocking parser) */
> > +XMLPARSEAPI(enum XML_Status)
> > +MOZ_XML_StopParser(XML_Parser parser, XML_Bool resumable);
>
> Second arg is unused; why is it there?
Preparing for Expat 1.95.8, which has a blocking API with this signature.
> >+/* BEGIN MOZILLA CHANGE (Include xmlns attributes in attributes array) */
> >+ s += 5;
>
> Maybe |sizeof(xmlnsPrefix)-1| or something? That would be less magical... At
> least, comment where the 5 comes from.
I used
+ s += sizeof(xmlnsPrefix) / sizeof(xmlnsPrefix[0]) - 1;
Comment 47•20 years ago
|
||
> + s += sizeof(xmlnsPrefix) / sizeof(xmlnsPrefix[0]) - 1;
would NS_ARRAY_LENGTH be clearer?
Assignee | ||
Comment 48•20 years ago
|
||
(In reply to comment #47)
> > + s += sizeof(xmlnsPrefix) / sizeof(xmlnsPrefix[0]) - 1;
>
> would NS_ARRAY_LENGTH be clearer?
If it were Mozilla code, yes. I'm not going to use that macro in Expat code.
Assignee | ||
Comment 49•20 years ago
|
||
Checked in. This did increase codesize by about 4-5k. I've trimmed off any
trivial bloat, so we'll have to live with that for now. I will try to reduce
Expat's size further in the future, but it's not going to be easy.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Comment 50•20 years ago
|
||
I think that the patch for this bug introduced bug 276434, can one of you have a
look at it, thank you.
You need to log in
before you can comment on or make changes to this bug.
Description
•