Closed Bug 174713 Opened 23 years ago Closed 23 years ago

Support XPath on HTML

Tracking

()

Status:

VERIFIED FIXED

People

(Reporter: sicking, Assigned: sicking)

Details

Attachments

(1 file, 4 obsolete files)

patch to implement 23 years ago Jonas Sicking (:sicking) No longer reading bugmail consistently 23.69 KB, patch		Details \| Diff \| Splinter Review
sync to tip and fix the pattern-parser 23 years ago Jonas Sicking (:sicking) No longer reading bugmail consistently 24.86 KB, patch		Details \| Diff \| Splinter Review
patch v2 23 years ago Jonas Sicking (:sicking) No longer reading bugmail consistently 13.98 KB, patch		Details \| Diff \| Splinter Review
patch v2 23 years ago Jonas Sicking (:sicking) No longer reading bugmail consistently 16.90 KB, patch	jst : superreview+	Details \| Diff \| Splinter Review
fix petervs comments 23 years ago Jonas Sicking (:sicking) No longer reading bugmail consistently 17.60 KB, patch	peterv : review+ jst : superreview+ asa : approval1.3+	Details \| Diff \| Splinter Review

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Description

•

23 years ago

Right now applying XPath (and XSLT) on a html-document is a bit flaky. You have to use nametests in lowercase, name() returns an uppercase name whereas local-name() returns a lowercased one. I talked with Ray about what we want to happen and came to the conclusion that nametests should be case-insensitive and name()/local-name() should return uppercased names. So I have a patch that does that :) I decided to not touch nametests that have a prefix, they are left unchanged. The reason was mostly that i didn't want to touch the prefix, and changing the case of half of the nametest seemed wierd. Those nametests should in any case never match anything since HTML doesn't have namespaces. As an added bonus it also makes XSLT work with HTML as source document. The main thing i don't like with the patch is that i had to add Node::getDOMLocalName function. Ideally we should rename Node::getLocalName to Node::getXPathLocalName or Node::getNormalizedLocalName. But i didn't want to touch too much of the code since we have a big landing comming up and eventually the Node interface should be replaced by the treewalker.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 1

•

23 years ago

Attached patch patch to implement (obsolete) — Details — Splinter Review

So i decided to go with the name |getLocalName| even for the new function since it doesn't conform to the DOM spec (it returns a name for processing instructions)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 2

•

23 years ago

hmm.. maybe i should explain what this patch does. It fixes two problems 1. nametests need to be lowercased when used on html This is fixed by adding a txIParseContext::caseInsensitiveNameTests function. The parser checks with this function before createing a nametest and if it returns true the nametest is lowercased before being parsed into a txNameTest This function is implemented in dom-xpath by giving the document to the XPathEvaluator (through a new nsIXPathEvaluatorInternal interface). That can then inspect that document to decide if it is used in a html-context or not. In XSLT this function is implemented by asking the source document if it is a html-document, through a new Document::isHTML function 2. local-name() returns name in lowercase for HTML elements. this is fixed by implementing a new function, |String getLocalName()| on Nodes. The function is simply forwared to the mozilla implementation which will return the correct casing.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 3

•

23 years ago

note to self, need to fix the pattern-parser as well: http://lxr.mozilla.org/mozilla/source/extensions/transformiix/source/xslt/txPatternParser.cpp#308

Status: NEW → ASSIGNED

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 4

•

23 years ago

Attached patch sync to tip and fix the pattern-parser (obsolete) — Details — Splinter Review

Attachment #103034 - Attachment is obsolete: true

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

23 years ago

Attachment #110273 - Flags: superreview?(peterv)

Attachment #110273 - Flags: review?(axel)

Peter Van der Beken [:peterv]

Comment 5

•

23 years ago

I don't like the getLocalName addition to the transformiix DOM, please move that code into an ifdef in the code for local-name() and comment that it should be in the walker. I don't like the isHTML stuff in the transformiix DOM either, we should just set a flag in the evaluator/processor (which can pass it to their processorstate) as soon as we have the Mozilla document. Also, I think we might have an evaluator without a document, you should probably get the document using nsContentUtils::GetDocumentFromCaller when the evaluator gets created.

Peter Van der Beken [:peterv]

Comment 6

•

23 years ago

+ if (aContext->caseInsensitiveNameTests() && + tok->value.indexOf(':') == kNotFound) { + String qname = tok->value; + qname.toLowerCase(); + rv = resolveQName(qname, prefix, aContext, lName, nspace); should probably just get the atom of the lowercased name instead of going through resolveQName.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 7

•

23 years ago

Attached patch patch v2 (obsolete) — Details — Splinter Review

This removes the XSLT-on-HTML stuff since it wasn't very liked and Pike was worried that it might block future optimizations. I also remove the additions to the DOM and used an #ifdef TX_EXE on request from peterv. I didn't remove the call to resolveQName since that ended up resulting in more code.

Attachment #110273 - Attachment is obsolete: true

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 8

•

23 years ago

Attached patch patch v2 (obsolete) — Details — Splinter Review

ugha, forgot to cvs-add the interface-file. This is otherwise the same as the previous one.

Attachment #111978 - Attachment is obsolete: true

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

23 years ago

Attachment #110273 - Flags: superreview?(peterv)

Attachment #110273 - Flags: review?(axel)

Peter Van der Beken [:peterv]

Comment 9

•

23 years ago

Sicking has the patches.

Assignee: peterv → bugmail

Status: ASSIGNED → NEW

Axel Hecht

Comment 10

•

23 years ago

peewee wrote > Also, I think we might have an evaluator without a document, you should probably > get the document using nsContentUtils::GetDocumentFromCaller when the evaluator > gets created. Jonas? I don't see a reply nor code. And attachement 111985 still changes txPatternParser, though those are pure XSLT. I still think that using nsIContent::GetContentType for XPath would be the real way to go for this, sadly enough, that is too expensive to incorporate into the xpath engine used by xslt.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 11

•

23 years ago

> > Also, I think we might have an evaluator without a document, you should > > probably get the document using nsContentUtils::GetDocumentFromCaller when > > the evaluator gets created. > Jonas? I don't see a reply nor code. Sorry, forgot to reply. I'm not sure what to do for this. The problem is that if someone calls .createExpression we won't know which document this will end up being executed on since the evaluator is not bound to a document. The best solution I can see is to simply not support using standalone XPathEvalutors on HTML documents. Which is what the patch does. This is not a problem when using a document as XPathEvaluator. The spec says on the contextNode argument: "If the XPathEvaluator was obtained by casting the Document then this must be owned by the same document" We don't enforce this yet, but this patch makes that possible since we now have the "ownerdocument" in the evaluator-object. If you want i can add code to enforce this to the patch. > And attachement 111985 still changes txPatternParser, though those are pure > XSLT. I talked briefly with peterv about this. I've had an idea for quite some time to create a class that implements nsIDOMNodeFilter and is based on an XSLT-Pattern. That would be very usefull together with the treewalker which is quite often used on HTML-documents. If you want this part of the patch can wait until i implement that other class.

Status: NEW → ASSIGNED

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

23 years ago

Attachment #111985 - Flags: superreview?(jst)

Attachment #111985 - Flags: review?(peterv)

Johnny Stenback (:jst)

Comment 12

•

23 years ago

Comment on attachment 111985 [details] [diff] [review] patch v2 sr=jst

Attachment #111985 - Flags: superreview?(jst) → superreview+

Peter Van der Beken [:peterv]

Comment 13

•

23 years ago

Comment on attachment 111985 [details] [diff] [review] patch v2 >Index: content/base/public/nsIXPathEvaluatorInternal.h >=================================================================== >+ /** >+ * Sets the document this evaluator is corresponds to Drop the "is" >+ /** >+ * Returns the document this evaluator is corresponds to Drop the "is" Do we need GetDocument? I think we should make a lowercasing version of resolveQName.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 14

•

23 years ago

Attached patch fix petervs comments — Details — Splinter Review

I didn't create a new resolveQName function but added a flag to the existing one instead

Attachment #111985 - Attachment is obsolete: true

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

23 years ago

Attachment #113040 - Flags: review?(peterv)

Peter Van der Beken [:peterv]

Updated

•

23 years ago

Attachment #113040 - Flags: review?(peterv) → review+

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

23 years ago

Attachment #111985 - Flags: review?(peterv)

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Updated

•

23 years ago

Attachment #113040 - Flags: superreview?(jst)

Johnny Stenback (:jst)

Comment 15

•

23 years ago

Comment on attachment 113040 [details] [diff] [review] fix petervs comments sr=jst

Attachment #113040 - Flags: superreview?(jst) → superreview+

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 16

•

23 years ago

Comment on attachment 113040 [details] [diff] [review] fix petervs comments This is not really a new feature (contrary to the summary in the bug) but rather making XPath-on-HTML behave more consistently and HTML-ish. The patch is lowrisk and only affects pages that uses DOM-XPath. I've tested it on a few testcases and everything behaves properly.

Attachment #113040 - Flags: approval1.3?

Asa Dotzler [:asa]

Comment 17

•

23 years ago

Comment on attachment 113040 [details] [diff] [review] fix petervs comments a=asa (on behalf of drivers) for checkin to 1.3 final.

Attachment #113040 - Flags: approval1.3? → approval1.3+

Jonas Sicking (:sicking) No longer reading bugmail consistently

Assignee

Comment 18

•

23 years ago

checked in. Thanks for reviews

Status: ASSIGNED → RESOLVED

Closed: 23 years ago

Resolution: --- → FIXED

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 19

•

23 years ago

I suspect this caused bug 193586.

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 20

•

23 years ago

(And, BTW, bug 193586 suggests that comment 16 isn't entirely true, since I think this managed to cause a leak even on pages that don't use DOM XPath.)

Axel Hecht

Comment 21

•

22 years ago

mass verifying

Status: RESOLVED → VERIFIED

You need to log in before you can comment on or make changes to this bug.