Closed Bug 174713 Opened 23 years ago Closed 23 years ago

Support XPath on HTML

Categories

(Core :: XSLT, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED

People

(Reporter: sicking, Assigned: sicking)

Details

Attachments

(1 file, 4 obsolete files)

Right now applying XPath (and XSLT) on a html-document is a bit flaky. You have to use nametests in lowercase, name() returns an uppercase name whereas local-name() returns a lowercased one. I talked with Ray about what we want to happen and came to the conclusion that nametests should be case-insensitive and name()/local-name() should return uppercased names. So I have a patch that does that :) I decided to not touch nametests that have a prefix, they are left unchanged. The reason was mostly that i didn't want to touch the prefix, and changing the case of half of the nametest seemed wierd. Those nametests should in any case never match anything since HTML doesn't have namespaces. As an added bonus it also makes XSLT work with HTML as source document. The main thing i don't like with the patch is that i had to add Node::getDOMLocalName function. Ideally we should rename Node::getLocalName to Node::getXPathLocalName or Node::getNormalizedLocalName. But i didn't want to touch too much of the code since we have a big landing comming up and eventually the Node interface should be replaced by the treewalker.
Attached patch patch to implement (obsolete) — Splinter Review
So i decided to go with the name |getLocalName| even for the new function since it doesn't conform to the DOM spec (it returns a name for processing instructions)
hmm.. maybe i should explain what this patch does. It fixes two problems 1. nametests need to be lowercased when used on html This is fixed by adding a txIParseContext::caseInsensitiveNameTests function. The parser checks with this function before createing a nametest and if it returns true the nametest is lowercased before being parsed into a txNameTest This function is implemented in dom-xpath by giving the document to the XPathEvaluator (through a new nsIXPathEvaluatorInternal interface). That can then inspect that document to decide if it is used in a html-context or not. In XSLT this function is implemented by asking the source document if it is a html-document, through a new Document::isHTML function 2. local-name() returns name in lowercase for HTML elements. this is fixed by implementing a new function, |String getLocalName()| on Nodes. The function is simply forwared to the mozilla implementation which will return the correct casing.
Attachment #110273 - Flags: superreview?(peterv)
Attachment #110273 - Flags: review?(axel)
I don't like the getLocalName addition to the transformiix DOM, please move that code into an ifdef in the code for local-name() and comment that it should be in the walker. I don't like the isHTML stuff in the transformiix DOM either, we should just set a flag in the evaluator/processor (which can pass it to their processorstate) as soon as we have the Mozilla document. Also, I think we might have an evaluator without a document, you should probably get the document using nsContentUtils::GetDocumentFromCaller when the evaluator gets created.
+ if (aContext->caseInsensitiveNameTests() && + tok->value.indexOf(':') == kNotFound) { + String qname = tok->value; + qname.toLowerCase(); + rv = resolveQName(qname, prefix, aContext, lName, nspace); should probably just get the atom of the lowercased name instead of going through resolveQName.
Attached patch patch v2 (obsolete) — Splinter Review
This removes the XSLT-on-HTML stuff since it wasn't very liked and Pike was worried that it might block future optimizations. I also remove the additions to the DOM and used an #ifdef TX_EXE on request from peterv. I didn't remove the call to resolveQName since that ended up resulting in more code.
Attachment #110273 - Attachment is obsolete: true
Attached patch patch v2 (obsolete) — Splinter Review
ugha, forgot to cvs-add the interface-file. This is otherwise the same as the previous one.
Attachment #111978 - Attachment is obsolete: true
Attachment #110273 - Flags: superreview?(peterv)
Attachment #110273 - Flags: review?(axel)
Sicking has the patches.
Assignee: peterv → bugmail
Status: ASSIGNED → NEW
peewee wrote > Also, I think we might have an evaluator without a document, you should probably > get the document using nsContentUtils::GetDocumentFromCaller when the evaluator > gets created. Jonas? I don't see a reply nor code. And attachement 111985 still changes txPatternParser, though those are pure XSLT. I still think that using nsIContent::GetContentType for XPath would be the real way to go for this, sadly enough, that is too expensive to incorporate into the xpath engine used by xslt.
> > Also, I think we might have an evaluator without a document, you should > > probably get the document using nsContentUtils::GetDocumentFromCaller when > > the evaluator gets created. > Jonas? I don't see a reply nor code. Sorry, forgot to reply. I'm not sure what to do for this. The problem is that if someone calls .createExpression we won't know which document this will end up being executed on since the evaluator is not bound to a document. The best solution I can see is to simply not support using standalone XPathEvalutors on HTML documents. Which is what the patch does. This is not a problem when using a document as XPathEvaluator. The spec says on the contextNode argument: "If the XPathEvaluator was obtained by casting the Document then this must be owned by the same document" We don't enforce this yet, but this patch makes that possible since we now have the "ownerdocument" in the evaluator-object. If you want i can add code to enforce this to the patch. > And attachement 111985 still changes txPatternParser, though those are pure > XSLT. I talked briefly with peterv about this. I've had an idea for quite some time to create a class that implements nsIDOMNodeFilter and is based on an XSLT-Pattern. That would be very usefull together with the treewalker which is quite often used on HTML-documents. If you want this part of the patch can wait until i implement that other class.
Status: NEW → ASSIGNED
Attachment #111985 - Flags: superreview?(jst)
Attachment #111985 - Flags: review?(peterv)
Comment on attachment 111985 [details] [diff] [review] patch v2 sr=jst
Attachment #111985 - Flags: superreview?(jst) → superreview+
Comment on attachment 111985 [details] [diff] [review] patch v2 >Index: content/base/public/nsIXPathEvaluatorInternal.h >=================================================================== >+ /** >+ * Sets the document this evaluator is corresponds to Drop the "is" >+ /** >+ * Returns the document this evaluator is corresponds to Drop the "is" Do we need GetDocument? I think we should make a lowercasing version of resolveQName.
I didn't create a new resolveQName function but added a flag to the existing one instead
Attachment #111985 - Attachment is obsolete: true
Attachment #113040 - Flags: review?(peterv) → review+
Comment on attachment 113040 [details] [diff] [review] fix petervs comments sr=jst
Attachment #113040 - Flags: superreview?(jst) → superreview+
Comment on attachment 113040 [details] [diff] [review] fix petervs comments This is not really a new feature (contrary to the summary in the bug) but rather making XPath-on-HTML behave more consistently and HTML-ish. The patch is lowrisk and only affects pages that uses DOM-XPath. I've tested it on a few testcases and everything behaves properly.
Attachment #113040 - Flags: approval1.3?
Comment on attachment 113040 [details] [diff] [review] fix petervs comments a=asa (on behalf of drivers) for checkin to 1.3 final.
Attachment #113040 - Flags: approval1.3? → approval1.3+
checked in. Thanks for reviews
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
(And, BTW, bug 193586 suggests that comment 16 isn't entirely true, since I think this managed to cause a leak even on pages that don't use DOM XPath.)
mass verifying
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: