Tokenizer_HandleExternalEntityRef() gets called on standalone docs

RESOLVED FIXED in mozilla0.9.9

Status

()

RESOLVED FIXED
17 years ago
16 years ago

People

(Reporter: hsivonen, Assigned: hsivonen)

Tracking

({perf})

Trunk
mozilla0.9.9
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(URL)

Attachments

(1 attachment, 1 obsolete attachment)

FizzillaMach CVS build pulled 2001-11-15

Steps to reproduce:
1) add a printf or something like that to Tokenizer_HandleExternalEntityRef() in
nsExpatTokenizer to get feedback when the function is called.
2) Load a document that has been decalraed standalone, eg.
http://www.hut.fi/~hsivonen/test/standalone-yes.xml

Actual results:
Tokenizer_HandleExternalEntityRef() gets called even when the doc has been
declared standalone.

Expected results:
Expected Tokenizer_HandleExternalEntityRef() not to be called on standalone
docs. When Mozilla gets more elaborate external entity handling, the code will
be needlessly called for standalone docs. Even now, omitting the call to
Expected Tokenizer_HandleExternalEntityRef() would save a little time.
Keywords: perf
Created attachment 58102 [details] [diff] [review]
Change expat initialization to ignore external refs on standalone docs
lxr tells me that there are no chrome files marked standalone="yes". Also, chome
appears to work just fine with this patch applied. I'm not seeing ill effects.

Looking for r/sr.
Status: NEW → ASSIGNED
The patch sets XML_SetParamEntityParsing callback function to be called when not
standalone. But my reading of the documentation for that callback is that this
should only affect PARAMETER entities, not all entities. Also, I would like to
see a testcase that uses parameter entities and make sure that it still works.
Specifically, I would like to know that we do the proper sequence of new parser
creation etc. as the documentation states in xmlparse.h:

http://lxr.mozilla.org/mozilla/source/expat/xmlparse/xmlparse.h#434

Comment 4

17 years ago
henris: do you need this standalone="yes" functionality towards your intention 
of special-casing the XHTML+MathML DTD? If so, how to you intend to use that?

heikki: the comments mention that "including the external DTD subset" as well 
(perhaps a better name for |enum XML_ParamEntityParsing| would have simply be
|enum XML_EntityParsing|). This may explain why henris sees his |printf| in 
HandleExternalEntityRef() when his patch isn't there, and doesn't see it with 
his patch applied.
Sorry for the delay.

> But my reading of the documentation for that callback is that this
> should only affect PARAMETER entities, not all entities.

I think that is suitable for this purpose. Am I missing something?

>Also, I would like to see a testcase that uses parameter entities and 
>make sure that it still works.

Do you mean test cases such as these:
http://www.hut.fi/~hsivonen/test/standalone-undef-param.xml
http://www.hut.fi/~hsivonen/test/standalone-yes-param.xml
http://www.hut.fi/~hsivonen/test/standalone-no-param.xml
?

They work as expected with the patch. (That is
Tokenizer_HandleExternalEntityRef() gets called for non-standalone docs but not
for the standalone doc.)

> Specifically, I would like to know that we do the proper sequence of new 
> parser creation etc.

For non-standalone docs, everything works the same way as before. For standalone
docs, the external entity parser is not created/freed at all.

> do you need this standalone="yes" functionality towards your intention 
> of special-casing the XHTML+MathML DTD? If so, how to you intend to use that?

This is not absolutely required, but I think it makes sense to take the
performance gain. Handling the DTD doesn't come for free. If the document
doesn't need any declarations from the DTD but the author wishes to include the
doctype declaration for validation purposes, (s)he can declare the doc
standalone in order to instruct Mozilla (or another browser) not to waste cycles
processing declarations unnecessarily.

>the comments mention that "including the external DTD subset" as well 
>(perhaps a better name for |enum XML_ParamEntityParsing| would have simply 
>be |enum XML_EntityParsing|).

The usual external DTD subset reference is syntactic sugar for a parameter
entity reference, so the setting affects the usual DTD reference, too.
Depends on: 105137
Created attachment 65901 [details] [diff] [review]
Migrated to nsExpatDriver: change expat init to ignore external refs on standalone docs

Migrating the patch from nsExpatTokenizer to nsExpatDriver.

Requsting r/sr.
Attachment #58102 - Attachment is obsolete: true
Comment on attachment 65901 [details] [diff] [review]
Migrated to nsExpatDriver: change expat init to ignore external refs on standalone docs

r=heikki
Attachment #65901 - Flags: review+
Comment on attachment 65901 [details] [diff] [review]
Migrated to nsExpatDriver: change expat init to ignore external refs on standalone docs

sr=jst
Attachment #65901 - Flags: superreview+
Thanks for the r/sr. I don't have CVS write access. Who may I bother with the
check-in?
Target Milestone: --- → mozilla0.9.9
heikki, could you check this in?
Checked in.
Status: ASSIGNED → RESOLVED
Last Resolved: 17 years ago
Resolution: --- → FIXED

Comment 12

16 years ago
FYI, just saw this comment of James Clark at bug 11538 comment 7, and it adds
value to this fix:

"it's important not to load DTDs unless you really need to (use
XML_PARAM_ENTITY_PARSING_UNLESS_STANDALONE not XML_PARAM_ENTITY_PARSING_ALWAYS)"

which is what the patch in attachment 65901 [details] [diff] [review] did.

Updated

16 years ago
QA Contact: petersen → rakeshmishra
You need to log in before you can comment on or make changes to this bug.