Closed Bug 103255 Opened 23 years ago Closed 20 years ago

We should throw an error on non-namespace-conforming XML

Categories

(Core :: XML, defect, P3)

Other
Other
defect

Tracking

()

RESOLVED FIXED

People

(Reporter: sicking, Assigned: sicking)

Details

Attachments

(3 files)

We should treat non-namespace-conforming xml like we treat non-valid xml. 
Basically there are two things we should look out for:

1. Prefixes without a corresponding xmlns attributes. For example the following
   two documents uses the undeclared 'foo' prefix
   <doc><foo:bar/></doc>
   and
   <doc foo:attr=""/>

This is easy to detect since nsXMLContentSink::GetNameSpaceId returns 
kNameSpaceID_Unknown when the prefixlookup fails.

2. More then one attribute with the same namespaceURI and localName. For
   example The following element has two attributes with localName 'a' and
   namespaceURI 'ns'
   <elem p1:a="foo" p2:a="bar" xmlns:p1="ns" xmlns:p1="ns"/>

This should be fairly easy to detect and throw and error in the xml-
contentsink. The idea I had to do this without performance hit is to 
have ::SetAttr return some other successvalue then NS_OK if there allready 
existed an attribute with that localName+nsURI (Such a value could probobly 
also be usefull for other clients).


However I need some hints on how to report the errors the way that xml-
validness errors are reported. I also need to abort documentloading done 
through document.load() and friends.
Status: NEW → ASSIGNED
Summary: We should thrown an error on non-namespace-conforming XML → We should throw an error on non-namespace-conforming XML
I don't think this is as easy as you think, given our current architecture.
Mozilla code does not detect the error, it is done by Expat. You should look
into htmlparser/src and how we display the well-formedness errors there. The
cleanest solution in my opinion would be to make Expat report namespace errors
to Mozilla (if you can figure out how Expat works).
Priority: -- → P3
Target Milestone: --- → mozilla0.9.8
Target Milestone: mozilla0.9.8 → mozilla0.9.9
i doubt i'll have time with this before 1.0 :(
Target Milestone: mozilla0.9.9 → mozilla1.1
With the new architecture this has become easier. We now report errors in the
XML content sink, so the sink can also do namespace checking. But I agree, we
can fix this later...
don't know what to target this for. I'll try to get to it once things settle
down a bit
Target Milestone: mozilla1.1alpha → ---
Note -- s/valid/wellformed/ above. Namespace correctness is a wellformedness
issue not a validity issue.
QA Contact: petersen → ian
marking new since i'm not working on these
Status: ASSIGNED → NEW
sicking: Any ETA on this? This is something we really should fix ASAP, before
people start relying on our lack of error handling, because then we would not be
able to fix it at all without breaking sites.
So.. detecting those first two errors and calling ReportError is easy, indeed. 
The problem is that we also need to stop the parse.  Perhaps an error code that
will cause nsExpatDriver can to call Terminate() or something along those lines?

Also, will transformiix need its own changes to deal?  It seems to use expat
directly....
Yes other places then the contentsinks will most likly need to be changed. This
includes the XSLT sink in transformiix and the RDF sink, and there might be
others too, like XBL and/or XUL
Woah, woah, can't all those go through the same layer? We don't want to be 
replicating core well-formedness checking all over the tree! Ideally this should 
in fact be in expat itself.
absolutly! I'm not sure that we want to use expats namespace support, it might
be a performance hit compared to how things are done now. However, it'd really
be nice to have the nsExpatDriver or whatever it's called to do namespace
resolution which would give us a single point of namespace-error-checking as
well as remove some codeduplication that already exists.
(In reply to comment #14)
> absolutly! I'm not sure that we want to use expats namespace support

We don't, as far as I can tell.  I just hacked nsExpatDriver to create the
parser using XML_ParserCreateNS instead of XML_ParserCreate for the XHTML MIME
type, and its error handler wasn't called for any of the three testcases in this
bug, when loaded from a .xhtml file.  So expat doesn't report those errors, it
would seem (as the documentation at
http://lxr.mozilla.org/seamonkey/source/parser/expat/lib/xmlparse.h#102 suggests...)

> it might be a performance hit compared to how things are done now

It would be, almost certinly, since it'd just convert the prefixes to namespace
strings; we'd still need to handle the nsINameSpace stuff, just with longer strings.

> However, it'd really be nice to have the nsExpatDriver or whatever it's called
> to do namespace resolution which would give us a single point of
> namespace-error-checking as well as remove some codeduplication that already
> exists.

Agreed.  The namespace manager is per-document, right?  We'd need to have a way
for the driver to do the following:

1)  Register a new xmlns on the document (ideally that would return to it the
    nsINameSpace it wants to use from that point on?).
2)  Get the namespace id for a prefix (if it has an nsINameSpace, this is a
    given).

Then it can enforce the validity constraints we want, right?

That's about it, no?  We'd need to change HandleStartElement/HandleEndElement to
pass in namespace IDs / prefixes / localnames instead of the wstrings it uses now...

One other issue -- nsIExpatDriver is idl/scriptable, while nsINameSpace is _so_
not.  How do we want to handle that?  Do we hack together yet another interface
(inheriting from nsIExpatDriver) or something?
It's even simpler, there is only one namespace manager so the current code can
more or less be moved directly to nsExpatDriver and
nsIExpatSink::HandleStartElement can be made to take atoms and namespace ids.

The scriptable part is trickier though...
Frankly, I have no clue why nsIExpatSink (not driver, my apologies) is
scriptable.  Do we have any impls that can be created from script?  I don't
think we do...  Anyway, nsIAtom is scriptable (which is bullshit, of course),
namespace ids are just longs, so there is no reason not to add methods using
those to this api.  As long as nsINameSpace is not in the api (and it wouldn't
be, since that would be handled inside the driver), it's not an issue.
Ok, so it looks like the CSS3 Selectors suite has exactly the sort of invalid
markup this bug is about... ;)

With that in mind, I was looking into fixing this.  The first question I ran
into is how to pass the strings/atoms/namespaces efficiently...  I guess I can
stack-allocate an nsIAtom[] and fill it with the atoms and pass it in; same for
namespaces.  For strings, it sounds like copying all over.  :(

Ideally, we'd just pass an array of nodeinfos for all the attribute names, I
think.  But nsINodeInfo is most definitely not scriptable....

Thoughts?
I wouldn't mind making nsIExpatSink non-scriptable.
If we do, I'd probably go with passing nsCOMArray<nsINodeInfo> for the attr
names _and_ passing through the original PRUnichar* array with notes in the api
that the array should be walked by 2s to get the attr values... Seem reasonable?

(Also, this will mean moving nsIExpatSink out of idl or tossing in %{C++ stuff...)
Er, scratch that last part.  We should be able to get by with [native] if we go
the non-scriptable route.
I think so. Note that I just discovered that newer versions of Expat seem to
throw errors for this.
Hmm.. So maybe we want to just upgrade expat and use that?

Note that I just discovered that I can't use NS_GetNameSpaceManager in
nsExpatSink, since it's not part of gklayout... :(
You can still get it as a service, no?
Ah, yes, indeed.  Good catch.
Does the new expat really catch this when used in non-namespace-mode? See
comment 14 and comment 15.
Actually, I suspect perf may not be that bad... but would we be able to preseve
namespace prefixes like we do now?  (And do we care?)
Of course it needs to be in namespace-mode but we can remove the nsINameSpace
stuff, Expat passes you the prefixes if you ask it to. The only thing I had to
do is make it pass the xmlns attributes, it removes them from the attributes
array in namespace-mode. I have a build where all of this works, but putting
Expat in namespace-mode increases codesize even more :-/ and I haven't yet
tested whether Expat is much slower in namespace-mode.
This was fixed by the fix for bug 274964.
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: