Closed Bug 319768 (XPathGenerator) Opened 19 years ago Closed 17 years ago

Implement XPath generator

Categories

(Core :: DOM: Core & HTML, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.9alpha8

People

(Reporter: WeirdAl, Assigned: WeirdAl)

References

(Depends on 1 open bug, )

Details

(Whiteboard: [landed, but not part of the regular build])

Attachments

(1 file, 26 obsolete files)

47.44 KB, patch
WeirdAl
: review+
WeirdAl
: superreview+
Details | Diff | Splinter Review
Per the DOM XPath specification, we support evaluate() as a method of XPathEvaluator (Document, basically).  This bug is to propose implementing a counterpart, for returning a XPath from the document node to a given node as the first argument.

It doesn't have to be fancy or optimized.  I'm willing to do a simple implementation.  Thoughts?
One problem is that there are many possible expressions that will result in the same node. If we want something that is resistent to DOM mutations then walking up the DOM until we hit a node with an id and building something like

id('foo')/bar[1]/baz[4]

is probably a good idea.

Alternativly something like

id('foo)/descendant::baz[17]
So we should figure out what the use cases are and see what would work best for them, perhaps.  There's some code at the end of GenerateStateKey and some code in view selection source that both do something like this.
http://lxr.mozilla.org/seamonkey/source/content/base/src/nsContentUtils.cpp#1567 for GenerateStateKey
http://lxr.mozilla.org/seamonkey/source/xpfe/browser/resources/content/viewPartialSource.js#215  for "View Selection Source"

The latter case returns an array of descendant indexes (that's what I call it) of nodes to determine where to place the selection in the view-source window.  We can definitely make that more efficient with XPath.  Even a special array.join() call would speed up things there, to generate a XPath which could then be used immediately.

The former appears to be for forms & frames, to maintain (I assume, since I haven't looked at the code) conditions such as scrollTop, checked, value, etc.

I'm still having trouble understanding what the key looks like that GenerateStateKey returns.  (It also seems that GenerateStateKey should assert that aDocument == aContent->GetCurrentDoc().)  The only part that makes sense to me is when we're not in a HTML form, and then the code appears to generate "o>5>13>12...", where the numbers following > characters are again a list of descendant indexes.

In neither case above do the indexes care about node names or types.

Interestingly, both applications use a reverse order, going from node to ancestor, in what they return.  XPath doesn't allow that.  ;)  XPath does allow you to use a chain of descending indexes (//[5][3][12]... I think).

Consolidating these two into a single XPath-generating function means adjusting code in a few places to use the new code.  I'm not so comfortable doing that.  So I'd propose that any changes to their code wait until after we have a common implementation for GetXPathForNode already in the tree.

Other use cases:

* DOM Inspector might have use for it in selecting nodes.  (This is a big maybe; the assumption being we might junk the current DOM Nodes viewer, the left-hand panel, tree construction for a proper XUL tree with descendants.)

* If I'm constructing testcases against Mozilla (or any particular application), being able to get a XPath to a particular node and store it in a test script would be nice.

* Concurrently, the evaluate() method and GetXPathForNode() could be used to test against each other and create testcases for both.

I'm sorry I can't come up with more solid use cases than this at present.  I've always been rather poor at articulating them for what I perceive as an incredibly useful feature... ;)
> The only part that makes sense to me is when we're not in a HTML form

That's the only part of that function relevant to this bug.  Note that it doesn't matter exactly what string you generate here as long as it starts with "o>" and encodes the position in the DOM.
Okay, I'm starting to get one idea about how GetXPathForNode should work.  Specifically, with TIMTOWTDI applying by principle, perhaps this should have two arguments passed in.  The first is the node we're trying to find a path to.

The second would be an unsigned long reflecting bitwise flags as a sum.  0x01 could indicate, for example, to ignore ID attributes and go all the way to the document node.  0x02 could indicate to ignore node names/types and use child indexes only. (instead of /bar[1], use [12]).

Likewise, for consumers that like to have attribute information included in the returned XPath, we could reserve a block of flag numbers (0x00010000 through 0x80000000) that represent specific attributes to include at each step, if they're available.  We wouldn't have to define all these attributes now; just reserving them for future use would be good enough.  Possible values include:

* id
* href
* class
* target
* name
* src
* xml:id
* xlink:href
* xmlns

We should start a wiki page to document a proposed implementation for GetXPathForNode.  If we do go with a bitwise flags combination such as above, it will require defining an interface similar to nsIDOMNodeFilter.
//[5][7][11] will not do what you want (in fact, it's not even a valid expression). What you want is  /node()[5]/node()[7]/node()[11]. Or, if you're just counting elements, /*[5]/*[7]/*[11].

Come to think of it. You probably do want to just count elements to avoid running into a bug in the current xpath implementation. XPath is supposed to merge adjecent textnodes which we currently do not do (for performance reasons). If you count all nodes you will be bitten once we fix that bug.
Bug 208500 is another use case.
Blocks: 208500
I've posted a first-draft API and explanation at:
http://wiki.mozilla.org/DOM:XPath_Generator

Feedback welcome; I would like to provide a good, useful API and implementation for this, which everyone can agree on at a basic level.  Please comment to the wiki and not to this bug on API/explanation issues.
Summary: nsDocument::GetXPathForNode? → Implement XPath generator
I'm not so for the include/excludeAttribute logic, I'd rather have a scalar
"fuzziness" and have a defined order of stuff to take at which level.

include/exclude would make sense on each level, and that is more inspection 
than is needed to actually generate the xpath by hand.
Thus I'd propose that the xpath generator to be just "greedy". Take all attrs in
the most strict mode, and all positions then. In the least greedy mode, do no 
predicates at all, potentially generating ambiguous results on evaluating the 
resulting expression.

On the return type, I'd rather have that generate an XPath Expression object,
we could add a toString() for that, and do something usefull to get the 
namespace resolver. Note that we don't have a serialization/parsing construct
for namespace resolvers at all yet, which I guess should be part of this bug.

This kind of thing would be useful for 
http://www.melez.com/mykzilla/2006/01/son-of-live-bookmarks.html, too.
(Oops, ok, pornzilla, too.)
Both of which only make sense if we serialize both the expression and the
namespace resolver (assuming that we need to handle XML to be future proof).
(In reply to comment #9)
> I'm not so for the include/excludeAttribute logic, I'd rather have a scalar
> "fuzziness" and have a defined order of stuff to take at which level.
...
> Thus I'd propose that the xpath generator to be just "greedy".
...
> On the return type, I'd rather have that generate an XPath Expression object

Axel, please feel free to post your improved API on the wiki page, provided that you do not replace or edit the one that's already there.
peterv has granted moa to implement this new feature, provided he sees the code before check-in.  Therefore, I will specifically seek his review on any patches for this bug.
Assignee: general → ajvincent
clarification:  peterv wants this under mozilla/content/xslt/src/xpath.

Accepting bug; I am now committed to implementation.
Status: NEW → ASSIGNED
Hm.  I'm concerned I won't be able to get the resolver from a XPathGenerator instance per bug 209701...
Depends on: 209701
Peterv, can you explain your reasoning for wanting this in xpath instead of an extension? Are we planning on using this code in gecko itself, or in Firefox? This seems (at least at this stage of development) like it is feature bloat and ought to live in an extension unless that is impossible for technical reasons I'm not aware of.
If we take the interface proposed on the wiki, it could be simply js, IMHO.
If we take an interface that returns an xpath expression, as I suggested in #9,
we'd need to add it to our internal engine, unless we expose writable scriptable
xpath expressions. Not sure what xforms needs here.

When it comes down to add "get xpath" to DOMI, of course, we need that in the 
tree. I'm not sure about the contentkey function-usecase, nor the 
viewpartialsource one, though. I'd need to see how those get simpler, or even
smaller, if we talk about bloat.

Alex, would you be able to make impls for both, so that we can compare?
Giving a sample patch for one of the internal use cases would probably be a good
argument, too.
(In reply to comment #14)
> Peterv, can you explain your reasoning for wanting this in xpath instead of an
> extension? Are we planning on using this code in gecko itself, or in Firefox?
> This seems (at least at this stage of development) like it is feature bloat and
> ought to live in an extension unless that is impossible for technical reasons
> I'm not aware of.

Comment 3 talks about using this for GenerateStateKey, I certainly don't want Gecko depending on an extension. Apart from that, I don't have strong feelings about where this should live. It does seem that there are already a bunch of usecases which makes this look like a more core feature imho, but if we want to make all of them depend on an extension that's fine with me too.
Depends on: 288513
While GenerateStateKey could use this, i'm not sure there is much of an advantage to it. The strings generated by GenerateStateKey are never used to go from string to node, so using a valid xpath expression has little value. But it might save a few lines of code.

Viewpartialsource would not be able to use this, it needs to do exactly what it is doing right now.
All things considered, if GenerateStateKey is the only reason to put it in content, this makes sense only under extensions.
Not neccesarily, if we think this is something that extensions would have use for then I think it might be usefull to include in the platform.

It all comes down to the usefullness-to-size ratio. So the smaller you can make the implementation (both source lines and binary size matters) the better.
In that case, I'll write a patch, and we can start with that at the appropriate time.
This doesn't actually do anything yet, other than compile and expose the XPathGenerator constructor.  Still, I'm posting this "vanilla" patch for reference (in case my own patch code gets messed up, and in case someone wants to create their own constructor in the future).
Attached file basic functionality test (obsolete) —
This testcase runs through a large set of nodes and tries to generate XPaths as strings from each node to each node.  None of the searchFlags functionality or adding of custom namespaces is tested here.
I've successfully tested nsXPathGeneratorResolver::LookupNamespaceURI and ::AddNamespace.  I've not yet tested the (internal so far) ::LookupPrefix, so although it compiles, it may be crash-happy.
Attachment #210657 - Attachment is obsolete: true
Attached file basic functionality test (fixed) (obsolete) —
The prior testcase was broken.  Currently, the XPathGenerator fails 98.1% of this test (it passes on the identity check, basically).
Attachment #210663 - Attachment is obsolete: true
I got lucky this time.  Relatively few compile errors, and no crashes.  I still haven't adequately tested the attribute methods, but that testing will happen as I start to implement the actual generateXPath() method.

Concerns:
* OOM.  I'm just not properly handling that.
* Splitting the attribute map records into namespace URI, local name and includeAttrValue fields again.  That should be trivial with hashtable enumerators and nsString::FindChar(NS_LITERAL_STRING(" ")).
* Verifying that the data is getting stored properly.  I'll probably write a debugging enumerator for that.
Attachment #211043 - Attachment is obsolete: true
Attached patch bug fixes (work in progress) (obsolete) — Splinter Review
The previous patch was nicely broken.  This code adds a few assertions for sanity checking and fixes them.

I'm trying to be considerate and use console messages for when the user does something wrong, and assertions for internal errors.  That's the way it should be, naturally.
Attachment #211238 - Attachment is obsolete: true
No longer depends on: 209701
FYI, if XPathGenerator ever supports crossing content boundaries (anonymous-to-real, or contentDocument-to-frameElement), it will need functions for that implemented natively in our XPath.  Reference bug 326745 and http://wiki.mozilla.org/DOM:XPath_Generator#XPath:_Content_Boundaries .
Depends on: 326745
Attached patch First draft patch for feedback (obsolete) — Splinter Review
Ladies and gentlemen, this patch is, generally speaking, a complete implementation of XPathGenerator (minus the generateXPointer functionality, for which I need a testcase).

It is very unstable.

Running it against attachment 211049 [details], it takes little time to crash (in XPCNativeWrapper.cpp).  There are also assertion failures taking place while running the test.

If anyone's interested in bleeding-edge patches, please offer whatever you can to help me fix it.
Attachment #211364 - Attachment is obsolete: true
FYI, you're not always AddReffing |out| parameters, 
for example in nsXPathGenerator::GetCommonAncestor.
Depends on: 327880
Attachment #211564 - Attachment is obsolete: true
No longer depends on: 327880
Attached file functionality test (obsolete) —
Attachment #211049 - Attachment is obsolete: true
Attached patch First mostly-usable patch (obsolete) — Splinter Review
Run against the functionality test, I get an approximate failure rate of 36.6%.  This rate is unacceptable, of course, but there is a reason.  This patch does not generate xpath's for text nodes, CDATA section nodes, or document type nodes.

(From comment 6)
XPath is supposed to merge adjecent textnodes which we currently do not do (for performance reasons).

Jonas, I can't quite find a bug filed for that; could you file one?

Should I wait for a fix for merging adjacent nodes?  Also, how should I handle doc type nodes?

For everyone else:  please feel free to bang on this patch.  Unofficial review comments are also welcome.
Attachment #212449 - Attachment is obsolete: true
Attached patch patch, v1 (obsolete) — Splinter Review
By e-mail, Jonas said:
"I would suggest not supporting expressions that depend on if nodes are
combined or not. It should be very few since most of the time you're
just dealing with elements.

The only case I can think of where adjacent nodes matter is if the
target node is a text node where any of the nodes siblings is a textnode
that should be 'merged'. You could even narrow that down to when the
target node is a textnode with an ajdacent textnode."

This patch has been updated to reflect that, and with this patch, only 4.29% of the tests in the testcase fail.  Most of those are indeed situations where two text or CDATA nodes are immediate siblings, and the few exceptions are in nodes not attached to the document (document fragment, disconnected Attr node) where our XPath code is not as generous as the generator.  I even get an OUT_OF_MEMORY error message from XPath on a disconnected Attr node and a path of ".".

peterv:  I fully expect this patch will be r-'d, but please review this patch seriously and completely.  I want to nail all the flaws in this patch and get r+ on the second run through.  Of particular concern is the generateXPointer() method, which you suggested and I have not implemented yet.  For all I know, the IDL is not what you intended.
Attachment #212772 - Flags: review?
Attachment #212772 - Flags: review? → review?(peterv)
Comment on attachment 212772 [details] [diff] [review]
patch, v1

>+    !(aSearchFlags > 0 && (aSearchFlags ^ SUPPORTED_FLAGS)),

    (aSearchFlags == 0) || (aSearchFlags & SUPPORTED_FLAGS),

A little typo.
Comment on attachment 212772 [details] [diff] [review]
patch, v1

Blah, the namespace resolver doesn't correctly work with XPath.

A secondary testcase for the searchFlags and includeAttributeNS() functionality, for this page:

var generator = new XPathGenerator();
generator.includeAttributeNS(null, "id", true);
var selectNode = document.getElementById("flag_type-192");
var targetNode = selectNode.lastChild.previousSibling;
generator.includeAttributeNS(null, "name", true);
generator.excludeAttributeNS(null, "name");
var path = ""
var xpathExpression = null;
var xpathResult = null;
var checkNode = null;
var UNORDERED_NODE = Components.interfaces.nsIDOMXPathResult.ANY_UNORDERED_NODE_TYPE;
path = generator.generateXPath(targetNode, document);
xpathExpression = document.createExpression(path, generator.resolver);
xpathResult = xpathExpression.evaluate(contextNode, UNORDERED_NODE, null);
checkNode = xpathResult.singleNodeValue;
checkNode == targetNode
generator.searchFlags = 1;
generator.generateXPath(targetNode, document);
xpathExpression = document.createExpression(path, generator.resolver);
xpathResult = xpathExpression.evaluate(contextNode, UNORDERED_NODE, null);
checkNode = xpathResult.singleNodeValue;
checkNode == targetNode
generator.searchFlags = 2;
generator.generateXPath(targetNode, document);
xpathExpression = document.createExpression(path, generator.resolver);
xpathResult = xpathExpression.evaluate(contextNode, UNORDERED_NODE, null);
checkNode = xpathResult.singleNodeValue;
checkNode == targetNode
generator.searchFlags = 3;
generator.generateXPath(targetNode, document);
xpathExpression = document.createExpression(path, generator.resolver);
xpathResult = xpathExpression.evaluate(contextNode, UNORDERED_NODE, null);
checkNode = xpathResult.singleNodeValue;
checkNode == targetNode
Attachment #212772 - Flags: review?(peterv)
The functionality test didn't cover the includeAttributeNS() or excludeAttributeNS tests.  This one does for HTML documents.
Attachment #212515 - Attachment is obsolete: true
Attached patch patch, v1.1 (obsolete) — Splinter Review
Let's try that again.  This patch generates the same statistics on the functionality test, passes the includeAttribute test (which also tests searchFlags), and also generates correct XPath results with includeAttributeNS() applied to a modification of the functionality test designed to check for namespaced attributes.

In short, I'm passing every reasonable test I can think of, save the XPointer tests, which I'm not sure how to write.
Attachment #212772 - Attachment is obsolete: true
Attachment #212848 - Flags: review?(peterv)
Comment on attachment 212848 [details] [diff] [review]
patch, v1.1

Argh, my functionality test was giving false positives.  Actual error rate is 38%.
Attachment #212848 - Flags: review?(peterv)
Attached file functionality test fixed again (obsolete) —
Attachment #212512 - Attachment is obsolete: true
Attached patch patch, v1.2 (obsolete) — Splinter Review
Fixed the patch again.
Attachment #212848 - Attachment is obsolete: true
Attachment #212854 - Flags: review?(peterv)
FWIW, I am (now) aware of the problems the renewed jst-review simulacrum lists.  However, I want to give peterv a real chance to review before I go through and start fixing them.
Comment on attachment 212854 [details] [diff] [review]
patch, v1.2

>Index: extensions/xpath-generator/nsXPathGenerator.cpp
>+nsXPathGenerator::nsXPathGenerator() :
>+  mSearchFlags(0)
>+{
  }

  nsresult nsXPathGenerator::Init()
  {
>+  mResolver = new nsXPathGeneratorResolver();
if (!mResolver)
  return NS_ERROR_OUT_OF_MEMORY;

if (!
>+  mIncludeAttrsMap.Init();
)
  return NS_ERROR_OUT_OF_MEMORY;

if (!
>+  mStepAttrsMap.Init();
)
  return NS_ERROR_OUT_OF_MEMORY;

  return NS_OK;
>+}

>Index: extensions/xpath-generator/nsXPathGenerator.h

>+class nsXPathGenerator : public nsIXPathGenerator
>+{
>+  public:
>+    nsXPathGenerator();
>+    virtual ~nsXPathGenerator();
>+
>+    NS_DECL_ISUPPORTS
>+
>+    // nsIXPathGenerator
>+    NS_DECL_NSIXPATHGENERATOR
...
      nsresult Init();
...
>+};

>Index: extensions/xpath-generator/nsXPathGeneratorModule.cpp

instead of:
>+NS_GENERIC_FACTORY_CONSTRUCTOR(nsXPathGenerator)
you need to use:
NS_GENERIC_FACTORY_CONSTRUCTOR_INIT(nsXPathGenerator, Init)

>Index: extensions/xpath-generator/nsXPathGeneratorResolver.cpp
>+nsXPathGeneratorResolver::nsXPathGeneratorResolver()
see other creature.
Attached patch patch, v1.3 (obsolete) — Splinter Review
This patch has been cleaned up per beaufour's jst-review simulacrum.  It also includes an optimization to refer to the document node by /, and the document element node by /* .
Attachment #212854 - Attachment is obsolete: true
Attachment #218124 - Flags: review?
Attachment #212854 - Flags: review?(peterv)
Attachment #218124 - Flags: review? → review?(peterv)
A bunch of comments in somewhat random order as I saw them coming.

From a source structure POV, I don't like all those files being in one dir.

General overview comments, there are quite a few external things that shouldn't
really be in there, like public header files, or extern symbols.

The comments in nsIXPathGenerator.idl are not good enough. searchFlags needs
a documented default value, and my expectation would be that it'd turn out to be
'1'.
Is including ID attributes really the inverse of going to the root of the 
document? (Did you test this for orphaned trees?)

The ID checking code should be based on nsIContent::GetIDAttributeName.
Unless you really want to be a good extension, but then you should restrict
your linking to glue and use nsStringAPI etc.

currentNode == targetNode should not return '.' independent of flags.

include/excludeAttributeNS don't specify defaults in the idl.

The resolver is not the resolver for all generated xpath expressions, I guess,
but rather the latest. At least the IDL doesn't say that you can't overwrite
namespace mappings (the code actually catches that and errors, though).
Talking of the resolver, don't |new nsAutoString| ever. That class is to be
used on the stack only.

I'm not convinced about the requirement for nsISecurityCheckedComponent, I
recall that we were fine with just nsIClassInfo in the extensions/transformiix 
days.

The hashtable iteration stuff probably only needs a friend statement. See that
XXX comment.

Is mStepCheckDuplicateID really a member?
(In reply to comment #44)

I'll look at incorporating some of these changes into the patch post-review.  I want peterv to review what I've got before submitting a new patch.

> From a source structure POV, I don't like all those files being in one dir.

*shrug* This shouldn't be too hard to fix.

> The comments in nsIXPathGenerator.idl are not good enough. searchFlags needs
> a documented default value, and my expectation would be that it'd turn out to
> be
> '1'.

The default is 0.  The idea is that you turn on specific bitwise flags to change the default behaviors of the generator.

> Is including ID attributes really the inverse of going to the root of the 
> document? (Did you test this for orphaned trees?)

I don't understand the first question.  As for what I tested it on, the functionality test does include a document fragment with a child element and two attributes.

> currentNode == targetNode should not return '.' independent of flags.

Can you point to the portion of the patch you are referring to?

> include/excludeAttributeNS don't specify defaults in the idl.

What defaults???  If you don't provide the arguments, you're pretty much guaranteed not to get anything to work.

> The resolver is not the resolver for all generated xpath expressions, I guess,
> but rather the latest. At least the IDL doesn't say that you can't overwrite
> namespace mappings (the code actually catches that and errors, though).

Each generator has a resolver associated with it.  You can use the generator and/or its resolver as many times as you want.

> Talking of the resolver, don't |new nsAutoString| ever. That class is to be
> used on the stack only.

Whoops.

> I'm not convinced about the requirement for nsISecurityCheckedComponent, I
> recall that we were fine with just nsIClassInfo in the extensions/transformiix 
> days.

When I was testing this code early on, I discovered QI calls to nsISecurityCheckedComponent for the resolver.  Without support for that, it just didn't work.

> Is mStepCheckDuplicateID really a member?

It needs to be available in more than one function, so I felt it was best not to pass it around as an extra argument.
+  if (targetNode == contextNode)
+  {
+    _retval.Assign(NS_LITERAL_STRING("."));
+    return NS_OK;
+  }

is the code piece.

Re ID vs /, I don't think I get the logic of mStepCheckDuplicateID.

For orphaned nodes, something like 
foo = someElement.removeElement(someElement.firstChild);
and then something further down should not throw UNEXPECTED. Which I think
you do if you don't get the parent of an element, which is perfectly fine
in this case.

Regarding those member variables, we went through some tough lessons to separate
state from non-state, just because you can put stuff somewhere without the
need to pass it as argument doesn't necessarily mean it's the right thing to do.
Member variables is for state for the object. Other information needed by memberfunctions should be passed as arguments. It'll lead to more misery than anything else to temporarily store arguments in members. For example, are you sure that your object will never re-enter?
(In reply to comment #46)
> Re ID vs /, I don't think I get the logic of mStepCheckDuplicateID.

Re-reading my patch, it's apparent that should not have been a member.  I had defined a bunch of temporary members to use in the PLDHashOperator functions.  This isn't one of them.  Whoops again.

> For orphaned nodes, something like 
> foo = someElement.removeElement(someElement.firstChild);
> and then something further down should not throw UNEXPECTED. Which I think
> you do if you don't get the parent of an element, which is perfectly fine
> in this case.

Can you give me a XML+JS testcase which demonstrates the intended behavior?  I still don't quite grok this.

> Regarding those member variables, we went through some tough lessons to
> separate
> state from non-state, just because you can put stuff somewhere without the
> need to pass it as argument doesn't necessarily mean it's the right thing to
> do.

As I mentioned above, the need to use hash operator functions made things difficult.  I would welcome a better approach.
(In reply to comment #46)
> +  if (targetNode == contextNode)
> +  {
> +    _retval.Assign(NS_LITERAL_STRING("."));
> +    return NS_OK;
> +  }
> 
> is the code piece.

I don't see the problem.  This is basically the fourth thing we do in the GenerateXPath() function.  The first one checks the pointers.  The second and third check for doctype nodes being passed in (which XPath doesn't support).

As I said, the search flags are to modify default behavior.  But when the target node and the context node are the same, what's the point of returning anything but "."?

> Re ID vs /, I don't think I get the logic of mStepCheckDuplicateID.

mStepCheckDuplicateID has nothing to do with / .

For ID-type attributes, the goal is to check to see if the user was stupid and gave the document two ID-type attributes with the same value.  It's used as part of nsXPathBaseNodeFilter, ultimately.
Comment on attachment 218124 [details] [diff] [review]
patch, v1.3

Your macros make the code hard to read, personally I'd just expand them all. Moreover, you're often double-checking things:

+    rv = attrNode->GetOwnerElement(getter_AddRefs(element));
+    NS_SUCCEED_OR_CONSOLE_RETURN(
+      element,
+      "GenerateXPath: Attributes must have an owner element.",
+      NS_ERROR_DOM_NOT_FOUND_ERR);
+    NS_ENSURE_SUCCESS(rv, rv);

NS_ENSURE_SUCCESS will never return here, since you already checked element. Please go through the patch and remove the redundant checks.

Please just make a function that logs to the console, and replace NS_SUCCEED_OR_CONSOLE_RETURN and MSTEPRESULT_SUCCEED_OR_CONSOLE_RETURN (which is used only once) with their expansion, calling that function.
I also think you're sending way too much to the console. You should define some clear error codes and throw those on certain conditions.

MSTEPRESULT_ASSERT_OR_RETURN depends on _this being declared, which means it either misses an argument or it needs to be defined locally where it's used.

Index: extensions/xpath-generator/nsXPathGenerator.cpp
===================================================================

+nsXPathGenerator::GetHashKeyData(const nsAString & hashKey,
+                                 nsAString & namespaceURI,
+                                 nsAString & localName)
+{
+  PRInt32 spaceIndex = hashKey.FindChar(' ');
+
+  NS_ASSERT_OR_RETURN(spaceIndex != 0,
+                      "What happened to our local name?",
+                      NS_ERROR_UNEXPECTED);

Why not just assert?

+nsXPathGenerator::IncludeAttributeNS(const nsAString & namespaceURI,
+                                     const nsAString & localName,
+                                     PRBool includeAttrValue)
+{
+  nsAutoString hashKey;
+  nsresult rv = GetAttrsHashKey(namespaceURI, localName, hashKey);
+  if (NS_FAILED(rv))
+    return rv;
+
+  nsAutoString checkNSURI;
+  nsAutoString checkName;
+  rv = GetHashKeyData(hashKey, checkNSURI, checkName);
+  if (NS_FAILED(rv))
+    return rv;
+
+  NS_ASSERT_OR_RETURN(namespaceURI.Equals(checkNSURI),
+                      "Hash key data failed sanity check!",
+                      NS_ERROR_UNEXPECTED);
+
+  NS_ASSERT_OR_RETURN(localName.Equals(checkName),
+                      "Hash key data failed sanity check!",
+                      NS_ERROR_UNEXPECTED);

I tried expanding all the function calls and this is what I came up with:

CheckLocalName(localName)
Concatenate localname, ' ', namespace URI
Split hashKey localname, namespace URI
CheckLocalName(localName)
Concatenate localname, ' ', namespace URI

And along the way you check that the concatenations match and that the split strings match. Please drop all this and just call GetAttrsHashKey once.

+nsXPathGenerator::ExcludeAttributeNS(const nsAString & namespaceURI,
+                                     const nsAString & localName)
+{
+  nsAutoString hashKey;
+  nsresult rv = GetAttrsHashKey(namespaceURI, localName, hashKey);
+  if (NS_FAILED(rv))
+    return rv;
+
+  nsAutoString checkNSURI;
+  nsAutoString checkName;
+  rv = GetHashKeyData(hashKey, checkNSURI, checkName);
+  if (NS_FAILED(rv))
+    return rv;
+
+  NS_ASSERT_OR_RETURN(namespaceURI.Equals(checkNSURI),
+                      "Hash key data failed sanity check!",
+                      NS_ERROR_UNEXPECTED);
+
+  NS_ASSERT_OR_RETURN(localName.Equals(checkName),
+                      "Hash key data failed sanity check!",
+                      NS_ERROR_UNEXPECTED);

See above.

+nsXPathGenerator::GetResolver(nsIDOMXPathNSResolver **aResolver)
+{
+  NS_ENSURE_ARG_POINTER(aResolver);
+  *aResolver = mResolver;
+  NS_ADDREF(*aResolver);

NS_ADDREF(*aResolver = mResolver);
There's plenty more of these in the patch.

+nsXPathGenerator::AddParentStep(nsAString & path)
+{
+  if (!path.IsEmpty())
+    path.Append(NS_LITERAL_STRING("/"));

Append('/')
There's plenty more of these in the patch.

+  path.Append(NS_LITERAL_STRING(".."));

AppendLiteral
There's plenty more of these in the patch.

+nsXPathGenerator::GetRootAncestor(nsIDOMNode* aNode,
+                                  nsIDOMNode** aRangeNode)

+  rv = root->GetNodeType(&nodeType);
+  NS_ENSURE_SUCCESS(rv, rv);
+
+  if (nodeType != nsIDOMNode::DOCUMENT_NODE)
+  {
+    nsCOMPtr<nsIDOMNode> parent;
+    do {
+      rv = root->GetParentNode(getter_AddRefs(parent));
+      NS_ENSURE_SUCCESS(rv, rv);
+      if (parent)
+        root = parent;
+    } while (parent);
+    // We can't assert for nsIContent on root; aNode may be an unattached Attr.
+  }
+
+  *aRangeNode = root;
+  NS_IF_ADDREF(*aRangeNode);

How can root be null here?

+nsXPathGenerator::GetCommonAncestor(nsIDOMNode * aTargetNode,
+                                    nsIDOMNode * aContextNode,
+                                    nsIDOMNode** aCommonAncestor,
+                                    nsIDOMNode** aIterator,
+                                    nsAString & pathToAncestor)

+  else if (mSearchFlags & 0)

Huh? This will never be true.


+  rv = GetRootAncestor(aTargetNode, getter_AddRefs(targetRoot));

+    targetAncestor3 = do_QueryInterface(aTargetNode, &rv);

+  targetAncestor  = do_QueryInterface(targetAncestor3,  &rv);

+  rv = GetRootAncestor(targetAncestor, getter_AddRefs(targetRootAncestor));

targetRootAncestor == targetRoot, same for context* Why are you doing all this useless work?

+  NS_ASSERT_OR_RETURN(
+    contextRangeNode,
+    "What's going on here?",
+    NS_ERROR_UNEXPECTED);
+
+  if (contextRangeNode == targetRangeNode)
+  {
+    /* If one is an attribute the other owns, or if both are attributes on
+     * the same element.
+     */
+
+    *aCommonAncestor = contextRangeNode;
+    NS_IF_ADDREF(*aCommonAncestor);

How can contextRangeNode be null?


+    NS_ASSERT_OR_RETURN(
+      positionFlags & (
+        nsIDOM3Node::DOCUMENT_POSITION_PRECEDING |
+        nsIDOM3Node::DOCUMENT_POSITION_FOLLOWING),
+      "Target range node must precede or follow the context range node!",
+      NS_ERROR_UNEXPECTED);
+
+    // How nice for CreateRange to require we set the end of a range first...
+    if (positionFlags & nsIDOM3Node::DOCUMENT_POSITION_PRECEDING)

+    else if (positionFlags & nsIDOM3Node::DOCUMENT_POSITION_FOLLOWING)

This can just be else.


+    {
+      rv = ancestorRange->SetEndAfter(contextRangeNode);
+      NS_ASSERT_OR_RETURN(
+        NS_SUCCEEDED(rv),
+        "What happened to the range call?",
+        rv);
+      rv = ancestorRange->SetStartBefore(targetRangeNode);
+      NS_ASSERT_OR_RETURN(
+        NS_SUCCEEDED(rv),
+        "What happened to the range call?",
+        rv);
+    }
+
+    // Set our return values.
+    rv = ancestorRange->GetCommonAncestorContainer(aCommonAncestor);
+

+    // Make sure aCommonAncestor really is an ancestor!
+    nsCOMPtr<nsIDOM3Node> commonAncestor3;
+    commonAncestor3 = do_QueryInterface(*aCommonAncestor, &rv);
+    NS_ENSURE_SUCCESS(rv, rv);
+
+    NS_IF_ADDREF(*aCommonAncestor);
+    NS_ASSERT_OR_RETURN(
+      aCommonAncestor,
+      "What happened with our range?",
+      rv);
+
+    rv = commonAncestor3->CompareDocumentPosition(targetAncestor,
+                                                  &positionFlags);
+    NS_ASSERT_OR_RETURN(
+      (positionFlags & nsIDOM3Node::DOCUMENT_POSITION_CONTAINED_BY) ||
+        (positionFlags == 0),
+      "Range didn't get a container of the target node, or the target node!",
+      NS_ERROR_UNEXPECTED);
+
+    rv = commonAncestor3->CompareDocumentPosition(contextAncestor,
+                                                  &positionFlags);
+    NS_ASSERT_OR_RETURN(
+      (positionFlags & nsIDOM3Node::DOCUMENT_POSITION_CONTAINED_BY) ||
+        (positionFlags == 0),
+      "Range didn't get a container of the context node, or the context node!",
+      NS_ERROR_UNEXPECTED);

No need for these checks, if GetCommonAncestorContainer is broken it should just be fixed.


+  if (contextAncestor != contextRangeNode)
+  {
+    AddParentStep(pathToAncestor);
+  }
+
+  *aIterator = targetAncestor;
+  NS_IF_ADDREF(*aIterator);
+
+  // Finish off the path. "../../../"
+  // Start to iterate from context node to ancestor.
+
+  // Special cases:  commmon ancestor is a document or document element.
+  PRBool empty = pathToAncestor.IsEmpty();
+  if (empty)

Why not make this the else for the |if (contextAncestor != contextRangeNode)| above?


+static PLDHashOperator PR_CALLBACK
+GenAttrsMapEnum(const nsAString & aKey,
+                PRBool aIncludeValue,
+                void* aClosure)

+  nsAutoString namespaceURI, localName, key;
+  key = aKey;
+  rv = _this->GetHashKeyData(key,

Why can't you use key directly?

+  nsAutoString prefix;
+  prefix.Truncate();

Useless Truncate.
There's plenty more of these in the patch.

+    attrStep.Append(prefix + NS_LITERAL_STRING(":"));

Append(prefix)
Append(':')


+nsresult
+nsXPathGenerator::GetStepWithoutIndex(nsAString & currentStep)
+{
+  /* In addition to generating the step, this also sets the parameters by which
+   * nodeListFilter will match nodes to this step.  So keep this closely
+   * synchronized with CheckNodeAgainstStep.
+   */
+  mStepNamespaceURI.Truncate();
+  mStepLocalName.Truncate();
+  mStepValue.Truncate();
+  mStepAttrsMap.Clear();
+  mStepCheckDuplicateID = PR_FALSE;
+  mStepResult = NS_OK;

As mentioned, these must not be members.


+      if (!(mSearchFlags & GO_TO_DOCUMENT_ROOT))
+      {
+        // ID-type check.  We'll use the mStepValue to store the ID attr value.

Don't you need to check if the node is in the document?


+      generatorWrapper data;
+      data.gen = this;
+      nsAutoString currentStepCached;
+      currentStepCached = currentStep;
+      data.currentStepPtr = &currentStepCached;
+
+      rv = mIncludeAttrsMap.EnumerateRead(GenAttrsMapEnum, &data);
+      currentStep = *(data.currentStepPtr);

Why can't you just pass currentStep into data? You append to currentStepPtr.

+      NS_ENSURE_SUCCESS(rv, rv);
+      NS_ENSURE_SUCCESS(mStepResult, rv);

This returns success?


+NS_IMETHODIMP
+nsXPathGenerator::GenerateXPointer(nsIDOMNode *targetNode,
+                                   nsIDOMNode *contextNode,
+                                   nsAString & _retval)
+{

The way to implement this would be to call GenerateXPath. Then you generate the xpointer by enumerating the mappings in mResolver, for each mapping you append |xmlns(prefix=uri) |, and then at the end you append |xpath1(expression)|. You will be outputting unused mappings, which isn't the end of the world. To fix that you'd have to be able to accumulate the mappings as you use them.

To test the result you can use evaluateXPointer(expression) on an XML document.

It's fairly easy to implement, but I'd be fine with not doing this in a first pass.

Index: extensions/xpath-generator/nsXPathGenerator.h
===================================================================

+struct generatorWrapper {

+  nsAutoString* currentStepPtr;

I'd make this a nsString reference.

+class nsXPathGenerator : public nsIXPathGenerator

A bunch of functions in this class can be made static.

+    // Public for nsXPathBaseNodeFilter to use as well

But it doesn't?

+    nsresult GetAttrsHashKey(const nsAString & namespaceURI,
+                             const nsAString & localName,
+                             nsAString & hashKey);
+
+    // Public for nsXPathBaseNodeFilter to use as well

Ditto.

+    nsresult GetHashKeyData(const nsAString & hashKey,
+                            nsAString & namespaceURI,
+                            nsAString & localName);

+    /* XXX ajvincent These probably shouldn't be public, but they were necessary
+     * to make (static PLDHashOperator PR_CALLBACK GenAttrsMapEnum) compile.
+     * Come back and see what's necessary to move these back into the private
+     * sector, probably by adding them to struct generatorWrapper.
+     */

Make GenAttrsMapEnum a friend of nsXPathGenerator.

+    nsCOMPtr<nsIDOMNode> mStepCheckNode;
+    nsString mStepNamespaceURI;
+    nsString mStepLocalName;
+    nsString mStepValue;
+    PRBool mStepCheckDuplicateID;
+    // This substitutes for rv in functions that can't return a nsresult.
+    nsresult mStepResult;

Index: extensions/xpath-generator/nsXPathGeneratorModule.cpp
===================================================================

+RegisterXPathGenerator(nsIComponentManager *aCompMgr,
+                  nsIFile *aPath,
+                  const char *registryLocation,
+                  const char *componentType,
+                  const nsModuleComponentInfo *info)
+{
+  nsresult rv = NS_OK;
+
+  nsCOMPtr<nsICategoryManager> catman =
+    do_GetService(NS_CATEGORYMANAGER_CONTRACTID, &rv);
+
+  if (NS_FAILED(rv))
+    return rv;
+
+  nsXPIDLCString previous;
+  rv = catman->AddCategoryEntry(JAVASCRIPT_DOM_CLASS,
+                                "XPathGenerator",
+                                XPATH_GENERATOR_DOMCI_EXTENSION_CONTRACTID,
+                                PR_TRUE, PR_TRUE, getter_Copies(previous));

You need to add nsIXPathGenerator to the JAVASCRIPT_DOM_INTERFACE category.

Index: extensions/xpath-generator/nsXPathGeneratorResolver.cpp
===================================================================

+NS_IMPL_ISUPPORTS2(nsXPathGeneratorResolver,
+                   nsIDOMXPathNSResolver,
+                   nsISecurityCheckedComponent)

You need to give this DOMCI instead of making it nsISecurityCheckedComponent, expand the macro and use NS_INTERFACE_MAP_ENTRY_DOM_CLASSINFO(XPathNSResolver).
(In reply to comment #50)
First, thank you for the once-over.  Stuff I don't repeat here I plan on fixing without question.

> (From update of attachment 218124 [details] [diff] [review] [edit])
> Index: extensions/xpath-generator/nsXPathGenerator.cpp
> ===================================================================
> 
> +nsXPathGenerator::GetHashKeyData(const nsAString & hashKey,
> +                                 nsAString & namespaceURI,
> +                                 nsAString & localName)
> +{
> +  PRInt32 spaceIndex = hashKey.FindChar(' ');
> +
> +  NS_ASSERT_OR_RETURN(spaceIndex != 0,
> +                      "What happened to our local name?",
> +                      NS_ERROR_UNEXPECTED);
> 
> Why not just assert?

The idea was to bail out in a non-debug environment immediately once we've hit a failure condition - especially here where it's really important to get it right.  Though I can probably just assert as you suggest.

> +nsXPathGenerator::GetCommonAncestor(nsIDOMNode * aTargetNode,
> +                                    nsIDOMNode * aContextNode,
> +                                    nsIDOMNode** aCommonAncestor,
> +                                    nsIDOMNode** aIterator,
> +                                    nsAString & pathToAncestor)
> 
> +  else if (mSearchFlags & 0)
> 
> Huh? This will never be true.

Actually, this is deliberate:

> +  rv = GetRootAncestor(aTargetNode, getter_AddRefs(targetRoot));
> 
> +    targetAncestor3 = do_QueryInterface(aTargetNode, &rv);
> 
> +  targetAncestor  = do_QueryInterface(targetAncestor3,  &rv);
> 
> +  rv = GetRootAncestor(targetAncestor, getter_AddRefs(targetRootAncestor));
> 
> targetRootAncestor == targetRoot, same for context* Why are you doing all this
> useless work?

The way I wrote this code, I wanted a spot for future code crossing anonymous-content and framed-document boundaries.  In other words, the code is already written to support situations that XPath doesn't currently (but may in the future - ref bug 326745) support.

> +      if (!(mSearchFlags & GO_TO_DOCUMENT_ROOT))
> +      {
> +        // ID-type check.  We'll use the mStepValue to store the ID attr
> value.
> 
> Don't you need to check if the node is in the document?

Not at this stage.  I actually did include this possibility in the functionality test.  Also, GetCommonAncestor has already taken care of that work, well before GetStepWithoutIndex() has been called.

I can add an assertion if you want.

> +      NS_ENSURE_SUCCESS(rv, rv);
> +      NS_ENSURE_SUCCESS(mStepResult, rv);
> 
> This returns success?

Oops.  That should've been NS_ENSURE_SUCCESS(mStepResult, mStepResult);.  (I'm noting this because I didn't see what you meant until now, probably the fourth time I looked at it.)

> +NS_IMETHODIMP
> +nsXPathGenerator::GenerateXPointer(nsIDOMNode *targetNode,
> +                                   nsIDOMNode *contextNode,
> +                                   nsAString & _retval)
> +{
> 
> It's fairly easy to implement, but I'd be fine with not doing this in a first
> pass.

Good.  Because I'm not even sure the API is correct.  It was a wild guess.
Please don't write code "for future use". Stuff like that tend to linger around for way too long without getting used and just cause confusion or worse.

Feel free to attach patches in this (or some other) bug with the code that you intend to use in the future though.
Escaping in xpath basically sucks.

If you have a string that contains ' you can do "He can't"
If you have a string that contains " you can do 'It is a "feature"'
If you have a string that contains both ' and " you are in pain. You'll have to do:

concat("He can't, it is a ", '"feature"')

Fortunatly the last case is pretty rare so no need to optimize for it. Just put |concat("| before and |")| after and then replace every " with |", '"', "|
(In reply to comment #50)
> I tried expanding all the function calls and this is what I came up with:
> 
> CheckLocalName(localName)
> Concatenate localname, ' ', namespace URI
> Split hashKey localname, namespace URI
> CheckLocalName(localName)
> Concatenate localname, ' ', namespace URI
> 
> And along the way you check that the concatenations match and that the split
> strings match. Please drop all this and just call GetAttrsHashKey once.

Would it be okay to leave these additional checks in #ifdef DEBUG ?
If you want to leave specific debug information, use a specific debug define.
#ifdef DEBUG_ajvincent or so.

Unspecific debug info should only be used for coping with user data, and then is
likely better off in prlog stuff.
Attached patch work-in-progress (pre-v2) (obsolete) — Splinter Review
I'm aware there are still a large number of problems with this patch.  I figure I'm about halfway to fixing the issues raised before and during peterv's initial review pass.

On the error reporting part, I filed bug 338614, to see if we can get an error module for extensions defined.
Attachment #218124 - Attachment is obsolete: true
Attachment #218124 - Flags: review?(peterv)
Comment on attachment 222712 [details] [diff] [review]
work-in-progress (pre-v2)

>+nsXPathGenerator::CheckNodeAgainstStep(nsIDOMNode *aNode,
>+                                       generatorWrapper * aGenWrapper,
>+                                       PRInt16 & _retval)
// ... to line 931
>+      nodeFilterWrapper data;
>+      data.gen = this;
>+      data.checkNode = aElement;
        data.stepResult = NS_OK;
>+      data._retval = nsIDOMNodeFilter::FILTER_ACCEPT;

Seems I left that one line off, and it caused nice horkage on my Linux build.
Alias: XPathGenerator
Attached patch patch, v2 (obsolete) — Splinter Review
To the best of my knowledge, I've fixed 98% of the nits pointed out by pike, sicking, peterv, and myself.  The remaining 2%:

* All the files still live in one directory.  (comment 44)
* Generating XPath's for attributes with USE_DESCENDANT_AXIS violates the XPathGenerator spec slightly.  (See nsXPathGenerator::GenerateXPath, near the end of the function, for a detailed comment.)
* I have not tested the proper escaping of quotes for XPath (comment 53).
Attachment #222712 - Attachment is obsolete: true
Attachment #226472 - Flags: review?(peterv)
Recently, I've been thinking I should move the AddNamespace() method to a new interface just for the namespace resolver.  That would make it more useful as a standalone object.  Thoughts?
If it's just one method I don't see a big use for it. If code-reuse is all you're aiming for that can be done in the implementation either way. Keep things simple
I might be missunderstanding your comment in the patch, but I think you've missunderstood the xpath syntax.

//@foo[5] does not refer to the fifth foo-attribute in the document. It refers to the fifth foo-attribute in all elements. Of course, any element can have at maximum one foo-attribute (since we're talking about foo in the null namespace) so the expression will never return any nodes.
Comment on attachment 226472 [details] [diff] [review]
patch, v2

I gotta say I find the code hard to follow. There's not nearly enough comments, the variable names often don't make sense and you seem to lose track of them yourselve (you often init them multiple times or QI back and forth or store the same value in multiple variables).

Replace all Append(NS_LITERAL_STRING("...")) with AppendLiteral.
Replace Append/Assign(NS_LITERAL_STRING("x")) with Append/Assign('x')
There's still useless Truncate calls, even though I already commented on that.

Sometimes you do

  if (NS_FAILED(rv))
    return rv;

and sometimes

  NS_ENSURE_SUCCESS(rv, rv);

Choose one.

Don't use aFoo to name local variables. Do use aFoo to name arguments to functions.

> Index: extensions/xpath-generator/nsXPathGenerator.cpp
> ===================================================================

> +/**
> + * Get a properly escaped value (mainly for handling quote characters)
> + *
> + * @param rawString The original value string.
> + *
> + * @return XPath string for the value.
> + */
> +static nsString
> +EscapeValueForXPath(const nsAString & rawString)
> +{
> +  const char singleQuote = "'"[0];
> +  if (rawString.FindChar(singleQuote) == -1)
> +  {
> +    return NS_LITERAL_STRING("'") + rawString + NS_LITERAL_STRING("'");
> +  }
> +
> +  if (rawString.FindChar('"') == -1)
> +  {
> +    return NS_LITERAL_STRING('"') + rawString + NS_LITERAL_STRING('"');
> +  }
> +
> +  nsString value = NS_LITERAL_STRING("concat(");
> +  value.Append(singleQuote);
> +  nsString data;
> +  data.Assign(rawString);
> +  PRInt32 index = data.Find(NS_LITERAL_STRING("'"));
> +
> +  while (index != -1)
> +  {
> +    // foo'bar'baz -> concat('foo', "'", 'bar', "'", 'baz');
> +    nsString subValue;
> +    data.Left(subValue, index);
> +    value.Append(NS_LITERAL_STRING("', "));
> +    value.Append('"');
> +    value.Append(singleQuote);
> +    value.Append('"');
> +    value.Append(NS_LITERAL_STRING(", '"));
> +    data.Cut(0, index + 1);
> +    index = data.Find(NS_LITERAL_STRING("'"));
> +  }
> +  value.Append(data);
> +  value.Append(singleQuote);
> +  value.Append(')');
> +
> +  return value;
> +}

This function uses the string API in a very inefficient way. I didn't test it but I think something like this should do the trick:

static void
EscapeValueForXPath(const nsString & aString, nsString & aResult)
{
  aResult.Truncate();

  PRInt32 quote = aString.FindChar('"');
  PRBool noDoubleQuote = quote == kNotFound;
  if (noDoubleQuote || aString.FindChar('\'') == kNotFound) {
    PRUnichar quote = noDoubleQuote ? '"' : '\'';

    aResult.Append(quote);
    aResult.Append(aString);
    aResult.Append(quote);

    return;
  }

  // aString has both single and double quotes.

  aResult.AppendLiteral("concat(\"");

  PRInt32 start = 0;

  while (quote != kNotFound) {
    aResult.Append(Substring(aString, start, quote));
    aResult.AppendLiteral("\", '\"', \"");

    start = quote + 1;
    quote = aString.FindChar('"', start)
  }

  aResult.Append(Substring(aString, start));
  aResult.AppendLiteral("\")");
}


> +NS_IMETHODIMP
> +nsXPathGenerator::SetSearchFlags(PRUint32 aSearchFlags)
> +{
> +  /* Filter out for unsupported flags. Remember, 0x01000000 and higher are
> +   * reserved for custom implementations, and must be left undefined in the
> +   * official mozilla.org definition.  So keep this value under 0x01000000
> +   * in all patches to mozilla.org's native XPathGenerator.
> +   */
> +  const PRUint32 SUPPORTED_FLAGS = 0x00000003;
> +
> +  if ((aSearchFlags != 0) && !(aSearchFlags & SUPPORTED_FLAGS))
> +    return NS_ERROR_NOT_IMPLEMENTED;

Can't you just check for aSearchFlags < SUPPORTED_FLAGS

> +nsresult
> +nsXPathGenerator::GetHashKeyData(const nsString & hashKey,
> +                                 nsString & namespaceURI,
> +                                 nsString & localName)
> +{
> +  PRInt32 spaceIndex = hashKey.FindChar(' ');
> +
> +  NS_ASSERTION(spaceIndex != 0, "What happened to our local name?");
> +
> +  namespaceURI.Truncate();
> +  localName.Truncate();
> +
> +  if (spaceIndex > 0)
> +  {
> +    hashKey.Left(localName, spaceIndex);
> +    hashKey.Right(namespaceURI, hashKey.Length() - (spaceIndex + 1));
> +  }
> +  else
> +  {
> +    localName = hashKey;
> +    SetDOMStringToNull(namespaceURI);
> +  }
> +
> +  return NS_OK;

This should be a void function if you only return NS_OK.

> +nsresult
> +nsXPathGenerator::GetRootAncestor(nsIDOMNode* aNode,
> +                                  nsIDOMNode** aRangeNode)

aRangeNode seems like a bad name.

> +  nsCOMPtr<nsIContent> rootAsContent = do_QueryInterface(root);
> +  // We don't care about rv here:
> +  if (rootAsContent && rootAsContent->IsInDoc())
> +  {
> +    nsCOMPtr<nsIDOMDocument> doc;
> +    rv = root->GetOwnerDocument(getter_AddRefs(doc));
> +    NS_ASSERTION(doc, "Failed to get a document when root said we had one!");
> +    root = doc;
> +  }

  if (rootAsContent)
  {
    nsIDocument *doc = rootAsContent->GetCurrentDoc();
    if (doc) {
      root = do_QueryInterface(doc);
    }
  }

> +  NS_ADDREF(*aRangeNode = root);

root.swap(*aRangeNode);

> +  return NS_OK;
> +}
> +
> +nsresult
> +nsXPathGenerator::GetRangeNode(nsIDOMNode* aNode,
> +                               nsIDOMNode** aRangeNode)
> +{
> +  PRUint16 nodeType = 0;
> +  aNode->GetNodeType(&nodeType);
> +  if (nodeType == nsIDOMNode::ATTRIBUTE_NODE)
> +  {
> +    nsresult rv;
> +    nsCOMPtr<nsIDOMAttr> attrNode = do_QueryInterface(aNode, &rv);
> +    NS_ENSURE_SUCCESS(rv, rv);
> +    nsCOMPtr<nsIDOMElement> element;
> +    rv = attrNode->GetOwnerElement(getter_AddRefs(element));
> +    NS_ENSURE_SUCCESS(rv, rv);
> +    *aRangeNode = element;
> +  }
> +  else
> +    *aRangeNode = aNode;
> +
> +  NS_ADDREF(*aRangeNode);
> +  return NS_OK;
> +}
> +
> +nsresult
> +nsXPathGenerator::GetCommonAncestor(nsIDOMNode * aTargetNode,
> +                                    nsIDOMNode * aContextNode,
> +                                    nsINode** aCommonAncestor,
> +                                    nsString & pathToAncestor)

> +    // How nice for CreateRange to require we set the end of a range first...

Couldn't you do SelectNode, SetEndAfter?

> +  nsCOMPtr<nsIDOMNode> ancestorAsNode = do_QueryInterface(*aCommonAncestor, &rv);

You have aCommonAncestor as nsIDOMNode when you set it. You should store it in ancestorAsNode there and set aCommonAncestor at the end where you return NS_OK.

> +static PLDHashOperator PR_CALLBACK
> +GenAttrsMapEnum(const nsAString & aKey,
> +                PRBool aIncludeValue,
> +                void* aClosure)

> +  nsString* currentStepPtr = data->currentStepPtr;
> +  currentStepPtr->Append(attrStep);

No need for currentStepPtr

> +nsresult
> +nsXPathGenerator::GetStepWithoutIndex(nsINode * aIterator,
> +                                      nsString & currentStep,
> +                                      generatorWrapper & data)

> +    // XXX ajvincent This can't be right... too many lines for something simple
> +    nsString currentStepCached;
> +    currentStepCached = currentStep;
> +    data.currentStepPtr = &currentStepCached;

currentStepPtr should just be a reference instead, so you can set it to currentStep.

> +      rv = aGenWrapper->stepAttrsMap.EnumerateRead(FilterAttrsMapEnum, &data);
> +      NS_ENSURE_SUCCESS(rv, rv);
> +      NS_ENSURE_SUCCESS(data.stepResult, data.stepResult);

Since you return PL_DHASH_STOP if NS_FAILED(data.stepResult) this will never be reached, as EnumerateRead will return an error. I think you should just drop stepResult and return the error from EnumerateRead.

> +nsresult
> +nsXPathGenerator::GetStep(nsINode* iterator,
> +                          nsINode* aRoot,
> +                          nsXPathBaseNodeFilter * nodeListFilter,
> +                          nsString & currentStep,
> +                          generatorWrapper & data)

> +  nsCOMPtr<nsIDOMNode> assertCheckNode = do_QueryInterface(iterator);
> +  if (assertCheckNode != data.stepCheckNode)
> +  {
> +    // We tend to crash much later QI'ing data.stepCheckNode.
> +    NS_ERROR("Iterator and stepCheckNode aren't the same?  Prepare to crash!");
> +    data.stepCheckNode = assertCheckNode;

This should be a DEBUG-only check, you added all the callers.

> +  }
> +
> +  nsresult rv;
> +  nsCOMPtr<nsIDOMNode> iteratorAsNode = do_QueryInterface(iterator, &rv);

This is the same as stepCheckNode.

> +  nsCOMPtr<nsIDOMNodeFilter> nodeFilter;
> +  nodeFilter = do_QueryInterface(nodeListFilter, &rv);

No need for this variable.

> +  while (currentNode)
> +  {
> +    total++;
> +    if (currentNode == iteratorAsNode)
> +      index = total;

Why not break here?

> +/**
> + * Prepend the current step onto a common string.
> + */
> +static void
> +PrependStep(nsAString & currentStep,
> +            nsAString & prepend)
> +{
> +  if (prepend.IsEmpty())
> +    prepend = currentStep;
> +  else
> +    prepend = currentStep + NS_LITERAL_STRING("/") + prepend;

Use Insert.

> +NS_IMETHODIMP
> +nsXPathGenerator::GenerateXPath(nsIDOMNode *targetNode,
> +                                nsIDOMNode *contextNode,
> +                                nsAString & _retval)

> +  generatorWrapper* data = new generatorWrapper;

Who delete's this? I bet this should be |generatorWrapper data;|

> +  data->stepCheckDuplicateID = PR_FALSE;
> +  nsCOMPtr<nsIDOMNode> iteratorAsDOM;
> +  if (checkAtStep) {
> +    PRUint32 aModifierFlags = nodeListFilter->GetModifierFlags();
> +    aModifierFlags |= nsXPathBaseNodeFilter::ONLY_CHILD_NODES;

You never called SetModifierFlags, so why GetModifierFlags?

> +    data->stepCheckDuplicateID = PR_FALSE;

You already did that.

> +    // Attribute?
> +    nsCOMPtr<nsIAttribute> iteratorAsAttr = do_QueryInterface(iterator);
> +    if (iteratorAsAttr)
> +    {
> +      /* XXX ajvincent Here, we violate the XPath Generator specification.
> +       *     Attributes are a special case for the DOM, and react badly to
> +       *     tree walkers (which GetStep uses).
> +       *
> +       *     In theory, a XPath of //@foo[5] can be accurately identified,
> +       *     as the attribute would have an owner element, and that owner
> +       *     element can be located in a particular order.  Attributes not
> +       *     on the same element do have a defined implementation order.
> +       *
> +       *     For this case, we'll return //element[5]/@foo.  The following code
> +       *     is copied from the while (!stop) block above.

As sicking said, something's wrong here. //@foo[5] doesn't select anything.

> +      data->stepCheckDuplicateID = PR_FALSE;

You already did that.

> +      currentStep = NS_LITERAL_STRING("descendant::") + currentStep;

Use Insert.

> Index: extensions/xpath-generator/nsXPathGenerator.h
> ===================================================================

> +struct generatorWrapper {
> +  nsXPathGenerator* gen;
> +  nsString stepValue;
> +  nsIDOMNode* stepCheckNode;
> +  nsString stepNamespaceURI;
> +  nsString stepLocalName;
> +  nsDataHashtable<nsStringHashKey, nsString> stepAttrsMap;
> +  PRBool stepCheckDuplicateID;
> +  nsresult stepResult;
> +  nsString* currentStepPtr;
> +};
> +
> +struct nodeFilterWrapper {
> +  nsXPathGenerator* gen;
> +  nsIDOMElement* checkNode;
> +  nsresult stepResult;
> +  PRInt16 _retval;
> +};

I find it very confusing to keep track of who changes values in these structs and what the values mean. Any way this could go away?

> Index: extensions/xpath-generator/nsXPathGeneratorModule.cpp
> ===================================================================

> +    NS_DOMCI_EXTENSION_ENTRY_BEGIN(XPathGeneratorResolver)
> +        NS_DOMCI_EXTENSION_ENTRY_INTERFACE(nsIDOMXPathNSResolver)
> +    NS_DOMCI_EXTENSION_ENTRY_END_NO_PRIMARY_IF(XPathGeneratorResolver,
> +                                               PR_TRUE,
> +                                               &kXPathGeneratorCID)

kXPathGeneratorCID seems very wrong here. I'd appreciate if you'd test your code before attaching a patch.

> +  { "XPathGeneratorResolver",
> +    NS_XPATHGENERATOR_CID,
> +    NS_XPATHGENERATOR_CONTRACTID,
> +    nsXPathGeneratorConstructor,

Wrong CID, CONTRACTID and Constructor.

> +    RegisterXPathGenerator

AFAIK you need this only once.

> Index: extensions/xpath-generator/nsXPathGeneratorResolver.cpp
> ===================================================================

> +nsresult
> +nsXPathGeneratorResolver::Init()
> +{
> +  /* member initializers and constructor code */
> +  mPrefixesUndeclared = 0;

Should be in an initializer list in the constructor.

> +nsresult
> +nsXPathGeneratorResolver::LookupPrefix(const nsAString & namespaceURI,
> +                                       nsString & _retval)

> +    while (mPrefixesUndeclared < PR_UINT32_MAX)
> +    {
> +      aPrefix.AssignLiteral("a");
> +      aPrefix.AppendInt(mPrefixesUndeclared++);
> +      LookupNamespaceURI(aPrefix, aNamespaceURI);
> +      if (aNamespaceURI.IsEmpty())
> +      {
> +        break;
> +      }
> +    };

Stray semicolon.

> Index: extensions/xpath-generator/nsXPathGeneratorResolver.h
> ===================================================================

> +class nsXPathGeneratorResolver : public nsIDOMXPathNSResolver

> +    nsClassHashtable<nsStringHashKey, nsString> mNamespaceURIMap;
> +    nsClassHashtable<nsStringHashKey, nsString> mPrefixMap;

Shouldn't these be nsDataHashTable's?
> +
> +  protected:
> +    /* additional members */

Useless block.

> Index: extensions/xpath-generator/nsXPathNodeFilter.cpp
> ===================================================================

> +nsXPathBaseNodeFilter::nsXPathBaseNodeFilter() :
> +  mModifierFlags(0)

mRoot and mData need to be initialized to nsnull.

> Index: extensions/xpath-generator/nsXPathNodeFilter.h
> ===================================================================

> +// Needed for the base node filter.
> +class nsXPathGenerator;

Unused.
Attachment #226472 - Flags: review?(peterv) → review-
(In reply to comment #62)
> (From update of attachment 226472 [details] [diff] [review] [edit])
> I gotta say I find the code hard to follow. There's not nearly enough comments,
> the variable names often don't make sense and you seem to lose track of them
> yourselve (you often init them multiple times or QI back and forth or store the
> same value in multiple variables).

Fixing all the issues in this patch will clearly not be easy.  I expect it will take me 2-4 days to wade through the review comments and track down the issues you've raised.

The hardest part to explain, I suppose, is how it all flows together.  I may just end up writing a big block comment near the beginning of nsXPathGenerator.cpp to summarize the flow.  Would that help, or did you have something else in mind?

> Sometimes you do
> 
>   if (NS_FAILED(rv))
>     return rv;
> 
> and sometimes
> 
>   NS_ENSURE_SUCCESS(rv, rv);
> 
> Choose one.

I had figured on using NS_ENSURE_SUCCESS only at the deepest level, and NS_FAILED at higher levels.  I'll change them all to NS_ENSURE_SUCCESS, even though that could mean significantly more noise in the debug console.

> > +  data->stepCheckDuplicateID = PR_FALSE;
> > +  nsCOMPtr<nsIDOMNode> iteratorAsDOM;
> > +  if (checkAtStep) {
> > +    PRUint32 aModifierFlags = nodeListFilter->GetModifierFlags();
> > +    aModifierFlags |= nsXPathBaseNodeFilter::ONLY_CHILD_NODES;
> 
> You never called SetModifierFlags, so why GetModifierFlags?

Actually, I do call on it.  Twice.  (Look two lines down.)  But I see your point; when I'm calling on GetModifierFlags, it's effectively zero at this time.

> > +struct generatorWrapper {
> > +  nsXPathGenerator* gen;
> > +  nsString stepValue;
> > +  nsIDOMNode* stepCheckNode;
> > +  nsString stepNamespaceURI;
> > +  nsString stepLocalName;
> > +  nsDataHashtable<nsStringHashKey, nsString> stepAttrsMap;
> > +  PRBool stepCheckDuplicateID;
> > +  nsresult stepResult;
> > +  nsString* currentStepPtr;
> > +};
> > +
> > +struct nodeFilterWrapper {
> > +  nsXPathGenerator* gen;
> > +  nsIDOMElement* checkNode;
> > +  nsresult stepResult;
> > +  PRInt16 _retval;
> > +};
> 
> I find it very confusing to keep track of who changes values in these structs
> and what the values mean. Any way this could go away?

I really don't think I can get rid of the structs altogether.  Unfortunately they are needed when I go into EnumerateRead (I think - I'll have to reread that part of my patch to be absolutely sure).

I can JavaDoc them to make them clearer, if that helps.

> > +    NS_DOMCI_EXTENSION_ENTRY_BEGIN(XPathGeneratorResolver)
> > +        NS_DOMCI_EXTENSION_ENTRY_INTERFACE(nsIDOMXPathNSResolver)
> > +    NS_DOMCI_EXTENSION_ENTRY_END_NO_PRIMARY_IF(XPathGeneratorResolver,
> > +                                               PR_TRUE,
> > +                                               &kXPathGeneratorCID)
> 
> kXPathGeneratorCID seems very wrong here. I'd appreciate if you'd test your
> code before attaching a patch.

I hope and pray you mean testing the construction of a XPathGeneratorResolver from JavaScript land.  Because I do heavily test XPathGenerator using the testcases I wrote for this bug before posting a patch.

> > +nsresult
> > +nsXPathGeneratorResolver::Init()
> > +{
> > +  /* member initializers and constructor code */
> > +  mPrefixesUndeclared = 0;
> 
> Should be in an initializer list in the constructor.

I don't quite follow.  Init() was introduced to handle OOM.

> > +nsresult
> > +nsXPathGeneratorResolver::LookupPrefix(const nsAString & namespaceURI,
> > +                                       nsString & _retval)
> 
> > +    while (mPrefixesUndeclared < PR_UINT32_MAX)
> > +    {
> > +      aPrefix.AssignLiteral("a");
> > +      aPrefix.AppendInt(mPrefixesUndeclared++);
> > +      LookupNamespaceURI(aPrefix, aNamespaceURI);
> > +      if (aNamespaceURI.IsEmpty())
> > +      {
> > +        break;
> > +      }
> > +    };
> 
> Stray semicolon.
> 
> > Index: extensions/xpath-generator/nsXPathGeneratorResolver.h
> > ===================================================================
> 
> > +class nsXPathGeneratorResolver : public nsIDOMXPathNSResolver
> 
> > +    nsClassHashtable<nsStringHashKey, nsString> mNamespaceURIMap;
> > +    nsClassHashtable<nsStringHashKey, nsString> mPrefixMap;
> 
> Shouldn't these be nsDataHashTable's?

I thought about that, but per http://developer.mozilla.org/en/docs/XPCOM:Hashtables#Which_Hashtable_Should_I_Use.3F it appeared they should be nsClassHashTables.
Re:

"Sometimes you do

 if (NS_FAILED(rv))
   return rv;

and sometimes

 NS_ENSURE_SUCCESS(rv, rv);"

Those are not equivalent, so it's perfectly reasonable to use both in the same codebase.  The former simply propagates a failure, while the latter fires an assertion/warning if an unexpected failure is detected.  Not all failure-code returns warrant a warning or assertion.
(In reply to comment #63)
> The hardest part to explain, I suppose, is how it all flows together.  I may
> just end up writing a big block comment near the beginning of
> nsXPathGenerator.cpp to summarize the flow.  Would that help, or did you have
> something else in mind?

A summary about how you build up expressions would be helpful, yeah. Especially the use of the treewalker, the attr hash, etc.

> I had figured on using NS_ENSURE_SUCCESS only at the deepest level, and
> NS_FAILED at higher levels.  I'll change them all to NS_ENSURE_SUCCESS, even
> though that could mean significantly more noise in the debug console.

> I hope and pray you mean testing the construction of a XPathGeneratorResolver
> from JavaScript land.  Because I do heavily test XPathGenerator using the
> testcases I wrote for this bug before posting a patch.

Well, clearly the XPathGeneratorResolver part wasn't tested, it constructs the wrong object.

> > > +nsXPathGeneratorResolver::Init()
> > > +{
> > > +  /* member initializers and constructor code */
> > > +  mPrefixesUndeclared = 0;
> > 
> > Should be in an initializer list in the constructor.
> 
> I don't quite follow.  Init() was introduced to handle OOM.

You don't need to handle OOM when setting mPrefixUndeclared.

> > > +    nsClassHashtable<nsStringHashKey, nsString> mNamespaceURIMap;
> > > +    nsClassHashtable<nsStringHashKey, nsString> mPrefixMap;
> > 
> > Shouldn't these be nsDataHashTable's?
> 
> I thought about that, but per
> http://developer.mozilla.org/en/docs/XPCOM:Hashtables#Which_Hashtable_Should_I_Use.3F
> it appeared they should be nsClassHashTables.

Then you should change

+  nsDataHashtable<nsStringHashKey, nsString> stepAttrsMap;

Please check with bsmedberg.

(In reply to comment #64)
> Those are not equivalent, so it's perfectly reasonable to use both in the same
> codebase.  The former simply propagates a failure, while the latter fires an
> assertion/warning if an unexpected failure is detected.  Not all failure-code
> returns warrant a warning or assertion.

Sure.
I strongly encourage people to use NS_ENSURE_SUCCESS rather then |if (NS_FAILED())|. The latter clutters the code a lot more and makes it harder to follow. The result is often that people check rv less than they should.

The one exception I can think of is if we bump into some warning a lot during normal operation. Then it is a good idea to convert to an |if| to keep debug output readable.

But I think it's fine to warn multiple times for a real error. Even if that error originates from the web developer or from OOM.

Of course, this is just the guidelines i follow when writing and reviewing code. I know other people disagree with me on some aspects of this. But please have good reasons for not following this.

And always consider code clarity and maintainability when writing code.
peterv, sicking: I've just written the following into my local nsXPathGenerator.cpp to summarize what goes on "under the hood" in the XPath Generator, as a comment just below the license.  Could you please read this and offer suggestions on what to change or clarify?

---

/* ajvincent Here's how the XPath Generator works.  A XPath looks like:

../../../../html:form/foo:section/html:p/html:textarea/text()

Each segment (before or after a slash) is considered a step.  Each step can be
simply a node name, or depending on the settings of the generator, could be a
compound step with several attributes and/or an index number indicating which
element among several matches for the rest of the step the XPath refers to.

A sample step could be "html:p" or it could be:

html:p[@class="foo"][@bar:baz][3]

An attribute is included by name and value via the IncludeAttributeNS() API
call if the third argument as true.  The name, but not the value, is included
if the third argument of the IncludeAttributeNS() API call is false.  Any
attribute name included may be removed via the ExcludeAttributeNS() API call.
If an attribute is not explicitly included, then for the purposes of the XPath
Generator it is ignored.

(01) User calls GenerateXPath().
(02) GenerateXPath() calls GetCommonAncestor() in order to determine which node
     is the closest ancestor of both the target node and the context node.  It
     also retrieves from GetCommonAncestor() the number of steps from the
     context node to the common ancestor node.
(03) GenerateXPath() initializes a nsIDOMNodeFilter for GetStep(), and passes in
     the node we are currently examining.
(04) GetStep() calls on GetStepWithoutIndex() to determine what the step matches
     in the generator's settings.
(05) For elements, the following sequence is observed within 
     GetStepWithoutIndex():
(05a) If the generator's settings permit the generator to stop when it hits an
      element with an ID-type attribute, the generator searches for the current
      node's ID-type attribute.  If it finds one, it returns "id('elementId')"
      for that step.
(05b) Otherwise, the first part of the step is the local name and possibly a
      namespace prefix.
(05c) Next, the attribute names stored in the generator via IncludeAttributeNS()
      are searched for in the element.  These attribute names are stored by
      namespace URI, local name, and an includeAttrValue flag in 
      mIncludeAttrsMap, and processed via the GenAttrsMapEnum function.
(05d) GenAttrsMapEnum receives the namespace URI, local name, and
      includeAttrValue flag for one attribute, and the current node (now known
      as an element).  If the element doesn't have an attribute matching the
      namespace URI and local name, GenAttrsMapEnum returns OK and will
      be called for the next attribute entry in mIncludeAttrsMap.
(05e) If the element does have a matching attribute, GenAttrsMapEnum gets a
      prefix from the namespace resolver if needed, and combines the prefix with
      the local name for the attribute match.
(05f) If the attribute entry in mIncludeAttrsMap requests a value,
      GenAttrsMapEnum includes that as well.
(05g) GenAttrsMapEnum also adds the namespace URI, local name and (if requested)
      the value to a stepAttrsMap.  This map is for attributes on the given step
      that XPathGenerator successfully matched.
(05h) By this point, a sample step in GetStepWithoutIndex looks somewhat like:

html:p[@class="foo"][@bar:baz]

(06) For attributes, GetStepWithoutIndex returns "@prefix:localName", not
     mentioning the attribute value.  (Since elements can only have one
     attribute matching a namespace URI and local name, this is okay.)
(07) For other node types, GetStepWithoutIndex returns the following:
     * Text, CharacterData: "text()"
     * Processing instruction: "processing-instruction(localName)"
     * Comment: "comment()"
     * Document, DocumentFragment: "/"
     * Other node types:  GetStepWithoutIndex "throws" a XPCOM error.
(08) GetStep takes over again, and tries to determine how many nodes match the
     step already given.  For attributes, it just returns OK, as elements can't
     have multiple attributes with the same namespace URI and local name.
(09) GetStep creates a TreeWalker using the XPathBaseNodeFilter GetStep
     received from GenerateXPath (see (3) above).  The TreeWalker filters for
     only nodes matching the type of the generator's current node, except in
     the case of character data nodes and text nodes; if the generator is
     checking for a text or CDATA node, it must check for matches among text
     and CDATA nodes combined.  Based on the generator's settings, the node
     filter will reject nodes which are not the current node nor its siblings.
(10) As the final check of nsXPathBaseNodeFilter::AcceptNode(),
     CheckNodeAgainstStep is called.
(11) For elements, the following sequence is observed:
(11a) Optionally check for the ID attribute to match the current step's ID.
(11b) Check the namespace URI and local name against the current step.
(11c) If we've passed the above criteria, we need to check the attributes
      on the tree walker's current node against the attributes stored in
      the current step.  These attributes were determined in (05g) above,
      and are now listed in stepAttrsMap.  This map then enumerates through
      for FilterAttrsMapEnum().
(11d) FilterAttrsMapEnum receives the namespace URI, local name, and
      includeAttrValue flag for one attribute, and the tree walker's current
      node (now known as an element).  If the element doesn't have an
      attribute matching the namespace URI and local name, FilterAttrsMapEnum
      tells CheckNodeAgainstStep to reject the tree walker's current node.
(11e) FilterAttrsMapEnum may also have a required attribute value from (05g).
      If it does, it checks the matching attribute's value for an exact match.
      If the two attribute values do not match, FilterAttrsMapEnum tells
      CheckNodeAgainstStep to reject the tree walker's current node.
(11f) Otherwise, FilterAttrsMapEnum tells CheckNodeAgainstStep to accept the
      node.
(12) For attributes, CheckNodeAgainstStep checks the namespace URI and the
     local name of the tree walker's current node against the current step.
(13) For other node types, CheckNodeAgainstStep accepts the node if the
     following is true:
     * Text, CharacterData: Always
     * Processing instruction: If the local names match.
     * Comment: Always
     * Document, DocumentFragment: Always
     * Other node types:  CheckNodeAgainstStep "throws" a XPCOM error.
(14) nsXPathBaseNodeFilter::AcceptNode accepts the results of
     CheckNodeAgainstStep and passes them on to the tree walker.
(15) Back in GetStep(), for every node which passed
     nsXPathBaseNodeFilter::AcceptNode, an index is incremented (starting at
     zero, and ultimately including the step's current node, so it must be at
     least one).  The list of accepted nodes indicates an ordered set for the
     step.  If the ordered set has more than one member, the index of the
     current step within the ordered set is recorded at the end of the step
     within brackets ([3]).
(16) By this point, a sample step in GetStepWithoutIndex looks somewhat like:

html:p[@class="foo"][@bar:baz][3]

(17) GetStep() returns this step to GenerateXPath().
(18) GenerateXPath inserts the step into the generated XPath at the right spot,
     then moves to the current step node's parent.
(19) If the new current step node is not the common ancestor retrieved from
     (02) above, GenerateXPath then passes the current step node into GetStep()
     again, basically starting over at (03) above.  The cycle ends when the
     current step node is the common ancestor.
(20) GenerateXPath() combines the path from the context node to the common
     ancestor node (02), with the path from the common ancestor node to the
     target node (18), and returns the final generated XPath to the user as a
     string.
 */
It's very hard to follow a recursive routine when written as a long list like that. For example it took me some time to understand that 11 was done inside 10. I would instead suggest that you split up the comment into the functions that it's documenting. So for GenerateXPath you'd say that it call GetStep for each step down the "child chain" to reach the desired node, and then document on GetStep how it builds the step-string.

Also, it seemed like petervs comment was regarding that some members in some datastructures were hard to understand how they were used. First of all in general that seems to me like the code has been optimized too much for something other then clarity when that's the case. However if that's not the case then please document those members in the actual struct itself.
Attached patch patch, v3 (obsolete) — Splinter Review
My apologies for the extended delay.  Too many things on my plate.  Many, many more comments in this code than before, plus some clean-up (answering review nits, plus a couple subtle bugs I noticed on my own).

(In reply to comment #62)
> There's still useless Truncate calls, even though I already commented on that.

I tried.  Unfortunately, I don't know which ones are useless and which aren't.

> > Index: extensions/xpath-generator/nsXPathGenerator.h
> > ===================================================================
> 
> > +struct generatorWrapper {
> > +  nsXPathGenerator* gen;
> > +  nsString stepValue;
> > +  nsIDOMNode* stepCheckNode;
> > +  nsString stepNamespaceURI;
> > +  nsString stepLocalName;
> > +  nsDataHashtable<nsStringHashKey, nsString> stepAttrsMap;
> > +  PRBool stepCheckDuplicateID;
> > +  nsresult stepResult;
> > +  nsString* currentStepPtr;
> > +};
> > +
> > +struct nodeFilterWrapper {
> > +  nsXPathGenerator* gen;
> > +  nsIDOMElement* checkNode;
> > +  nsresult stepResult;
> > +  PRInt16 _retval;
> > +};
> 
> I find it very confusing to keep track of who changes values in these structs
> and what the values mean. Any way this could go away?

I don't quite see how to get rid of the structs.  Because I have to go through EnumerateRead (in two separate functions), the wrappers do come in handy.
Attachment #226472 - Attachment is obsolete: true
Attachment #241805 - Flags: review?(peterv)
requesting blocking1.9 for the following reason:  I believe this component will be fairly light-weight and useful for developers.  The size of the compiled dll's in opt builds, I believe, is under 30KB.  Note I have not yet decided whether to request XPathGenerator be on by default or not.  This request is being made to see if it should be on the Gecko 1.9 Feature List.
Flags: blocking1.9?
Maybe this has already been covered, but https://addons.mozilla.org/firefox/1095/ seems to have an XPath generator and inspector, including UI, for 4KB packaged as an XPI.  Have we asked if we can just repurpose his code, or maybe even get him to do it for us?
(In reply to comment #71)
> Maybe this has already been covered, but
> https://addons.mozilla.org/firefox/1095/ seems to have an XPath generator and
> inspector, including UI, for 4KB packaged as an XPI.  Have we asked if we can
> just repurpose his code, or maybe even get him to do it for us?

*sigh* No, no one has asked him about it; by and large I wasn't aware it existed.

His approach is actually a little bit clever, but far from complete:
* He relies on DOM Level 1 methods, no real consideration given to namespaces at all.
* He gives no consideration to attributes (a very large part of my patch is in dealing with attributes, under the proposed API).
* His code will only match text and element nodes; other node types are effectively ignored.  Feed in a comment node; it'll give you a wrong answer.

I feel like the gent from IBM who is pushing XForms at the expense of Web Forms 2.0 - just looking at the code he's written, it's not enough, and I feel bad for saying so.  I grant that it shows potential; people could work on it to extend its capabilities significantly.

I actually tried writing my XPathGenerator code in JS first - and I would have loved to do it entirely in JS as a XPCOM component.  I nixed it because I couldn't create a constructor that could be applied to the window object from XPCOM (reference http://weblogs.mozillazine.org/weirdal/archives/009684.html for details).  This fatal flaw is why I went the C++ route.

It's a very good try for a start, but if I had to review that code for the tree, I'd say there'd be a lot of work to do on it.

In all fairness, I'll e-mail him and let him know about this bug.  It's not right for me to bash his code without him being aware of it and able to respond.
I guess I should ask what the use cases are that we're trying to support here.  The XPath Checker has been able to do pretty much everything I've needed so far for building microsummaries and most XSLT stuff, and if namespaces are a must then I suspect that it's not hard to get those added.

Is there a bug filed on the inability to create the appropriate constructor?  (And is creating a constructor a requirement?  I thought it was possible to register at least functions via the namespace stuff, and if not then we should probably fix that to address this use case -- if, indeed, a constructor is required to make this work.  For prototyping and experimentation it seems like even window.external.XPathGenerator would be sufficient to get us going, and I'm pretty sure the sidebar code exposes itself to content from a JS component.)

It's only bashing his code if his code doesn't meet the targets set out, I think.  I'm not sure what the requirements are here, or which ones we'd be willing to give on in order to have safer code sooner, for possible 1.9 turn-on.  Also, I think that even just providing this capability to chrome in the near term would have a fair bit of value, and the 4K-with-UI number makes me pretty happy with his work!
(In reply to comment #73)
> I guess I should ask what the use cases are that we're trying to support here.

Reference http://wiki.mozilla.org/DOM:XPath_Generator - I actually asked quite a few people for use cases.  Note it's slightly out of date:  the GenerateStateKey case doesn't apply.
 
> Is there a bug filed on the inability to create the appropriate constructor? 

Not that I'm aware of.  Basically it's a matter of documentation, of figuring out how to transform the C++ macros into JS code.  Some things are just too hard for me.  ;)
 
>Also, I think that even just providing this capability to chrome in
> the near term would have a fair bit of value, and the 4K-with-UI number makes
> me pretty happy with his work!

You won't get any complaints from me on the size, although it's probably closer to 10 KB than 4 KB.  Again, I prefer JS, but I felt I had no choice this time.
Does any of the usecases on that page require the attributes API? It wasn't explicit one way or the other.

Given that you say that your code is a fair bit bigger than his due to dealing with attributes, I think we should check if the attributes stuff is really needed.
I personally would like to see the attribute support, but for the use case I was working on (a dom event recorder/playback system), I'm not certain if I need it.  I could see using this in a dom inspector, and potentially for some other tools, such as generating xpaths for use in xslt.  These would be use cases in Komodo, I can't come up with any use cases in Firefox.
Sometimes the most robust XPath to a given node is a relative path from another node containing its label.

For example, the table cell containing the end time on eBay auction item pages lives in one of five different table rows, depending on various factors, but it is always in the same place relative to the cell containing its label, and the label itself ("End Time:") is always the same (for en-US anyway, and only until the auction ends, after which it becomes "Ended:"), so one of the more robust XPaths to that value is something like:

//tr[td='End time:']/td[2]

Or perhaps:

//td[contains(., 'End time:')]/following-sibling::td[1]

Or possibly a version rooted to the nearest node with a unique identifier:

id('FastVIPDetails')/table/tbody/tr[td='End time:']/td[2]
id('FastVIPDetails')/table/tbody/tr/td[contains(., 'End time:')]
  /following-sibling::td[1]

It would be useful for the generator to support generating such XPaths, perhaps when the caller indicates a preference for them.
The problem I see both with something like that and with attribute selectors is that I wonder who will decide what label/attributes to use? So far it seems like most use cases involve a node being picked and then code automatically generating an expression that selects that node.

But how is this code supposed to know which attributes to filter on, or which labels to choose? Seems like that would take a lot of human involvement. What UI would be used for that?

It seems like a nice idea, but I can't think of a way to really make it work. Additionally, if you build a lot of UI to let users somehow configure labels and attributes and such, you might as well build the expression too. You can always call into the existing XPath generator to generate subexpressions and append custom filters to that.
(In reply to comment #78)
>I wonder who will decide what label/attributes to use? So far it seems
> like most use cases involve a node being picked and then code automatically
> generating an expression that selects that node.
> 
> But how is this code supposed to know which attributes to filter on, or which
> labels to choose?

The way xpathgen's API is set up, the consumer calls on includeAttributeNS() or excludeAttributeNS() to instruct the generator what they expect.  The default is that all attributes are excluded from the xpath.  (The default is also that a xpath doesn't have to go to the root node; it stops when it finds an ID-type attribute on an element.)

My patch doesn't include any UI, and deliberately so.  It's up to users to determine how they use it; I'm trying to provide a baseline.  (This of course brings back the question of "do we need support for looking up attributes" - and I believe it would be useful enough to have.)

>You can always call into the existing XPath generator to generate subexpressions and append custom filters to that.

That's actually somewhat of the point; this generator is intended to be generic enough that it meets common uses.  As time goes on and people request additional features, we amend the interfaces and the code to adapt.
Yes, I know your patch only supplies the API. My question is, who is going to write UI that uses that API? And what would that UI look like? In other words, is the API really going to get any users?

This question is directed to the people that would be the users of this API. Myk? Shane? Is this something you've thought about?

> >You can always call into the existing XPath generator to generate
> > subexpressions and append custom filters to that.
> 
> That's actually somewhat of the point; this generator is intended to be 
> generic enough that it meets common uses.  As time goes on and people request
> additional features, we amend the interfaces and the code to adapt.

The thing is that you're never going to be able to cover all possible expressions people want. Myk already added a request for something that it can't do, and I could come up with plenty more.

What we should aim for is something that'll cover what most people will use.
What I need is, given a node in DOM, give me an xpath that I can serialize to a file, and at any later point, load and use to find the same node again.  I use my own home brew generator to do this, but would like the api to be in mozilla.  I'm very rusty on what I did before, but I recall having to do some *ugly* hack...I think it was to evaluate the xpath into anonymous nodes (eg. xbl).  Unfortunately I'm in crazy work mode and cannot put much thought into this at the moment.

I think a good UI to consider for this is Selenium.  The project I worked on, Casper (unreleased) is similar to Selenium, but specifically for testing XUL applications.
If you just provide a node then the attribute related APIs in this patch would be of no use to you.

The XBL thing I suspect is something different based on hazy recollections on my part. For nodes in XBL bindings you probably wouldn't be able to use the current API at all.
so first off, I'm really very hazy on the api right now, but my impression of what the attribute part of the api handles is....


<foo>
 <bar id="1"/>
 <bar id="2">
   <baz/>
 </bar>
</foo>

if I want to generate an xpath to element baz, it needs to look like

/foo/bar[id="2"]/baz

or something along that line.  You can always do /foo/bar[2]/baz, but then if I later insert a new bar element between 1 & 2, that xpath now fails.

What if you have

<foo hello="hi">
 <bar id="1"/>
 <bar id="2" color="green">
   <baz/>
 </bar>
</foo>

Would you want an expression like
/foo[hello="hi"]/bar[id="2"][color="green"]/baz

?

And what if the DOM is

<foo>
 <bar id="2" color="red"/>
 <bar id="2">
   <baz/>
 </bar>
</foo>

?

With the current API you have to specify which attributes you want to take into account and which you want to ignore, so it probably doesn't fit what you need.
(In reply to comment #80)
> Yes, I know your patch only supplies the API. My question is, who is going to
> write UI that uses that API? And what would that UI look like? In other words,
> is the API really going to get any users?
> 
> This question is directed to the people that would be the users of this API.
> Myk? Shane? Is this something you've thought about?

Yes, I've done some thinking about this.  In the microsummaries prototype I put together earlier this year, users were able to create microsummaries by dragging and dropping nodes from the page being summarized to a textbox, where the nodes appear as non-editable strings of their content.

In this simple case, the prototype created an XPath expression (based on the XPath Checker code, incidentally, which Brian Slesinsky gave me permission to reuse) starting from the nearest uniquely-identified ancestor of the selected node (or starting from the document root, if there is no uniquely-identified ancestor).

This may work ok for one-page generators made by individual users, but it's prone to bustage, especially for generators which should work across multiple similar pages (like all eBay auction item pages).  In those cases, a better approach would be to locate the node by its relationship to some nearby static information (like the label for the information in the node).

Imagine an extension that made it easier for developers to create microsummary generators.  Such an extension might have similar functionality to the earlier user-focused prototype, i.e. simple drag-and-drop or click-select to identify a node by its nearest uniquely-identified ancestor.

But it might also enable developers to create more robust expressions by clicking on two nodes and then generating an XPath expression from one to the other.  And it might include a toggle switch (f.e. a checkbox or a sticky button) for enabling "start by content" mode, which causes the XPath generator to locate the starting node by the content it contains.
I don't think there's a fully automatic way to generate an XPath that survives all page changes.  Even using id() can break - believe it or not, there are invalid pages out there that use the same id twice.

But you could certainly do better than what I did for XPath Checker.  When I wrote it I had various ideas for improving it, such as:

- add refactoring tools where you could modify the XPath in various ways (such as by adding or removing attributes) while still pointing to the same node.

- store the XPath along with an expected result in a local repository and have a way to check whether a website has changed.

- have a shared repository of XPath expressions where you could upload XPaths for various sites, and XPath Checker would use those as hints when generating XPaths.  Each website is going to be a little different and they're going to change over time as they redesign their pages.

There are lots of interesting things that could be done, but it's hard to tell without experience which ones are actually useful.  I've moved on to other things and I'm not likely to do more than the bare minimum to maintain XPath Checker, but if there's interest, I could upload the source code to a source repository somewhere and people could use it as a testbed of sorts for trying out various ways of generating XPaths.
There are lots and lots of things that we could do. My question is really, what would get used if we did? Would anyone use the attribute API in the current patch? If anyone would, have you thought about how you'd use it?
Speaking as an extension writer, I'd prefer a pure JavaScript library.  XPath Checker supports back to FireFox 1.0.2 for no particular reason except that I saw no reason to drop support for earlier versions.  If I got a better XPath generator written in JavaScript, I could include it without breaking backward compatibility.  Also, I'd be able to fix bugs and adapt it to my code.
I've posted some new thoughts on design to mozilla.dev.tech.dom:

news://news.mozilla.org:119/V62dnQuupM7twtLYnZ2dnUVZ_tGdnZ2d@mozilla.org

Replies there, please - it will help me determine what to do next.

wiki.m.o seems down right now, so I can't update that.
Comment on attachment 241805 [details] [diff] [review]
patch, v3

canceling review request; I'm going to rewrite in JS with some C++ bootstrapping, and with reduced API (eliminating attribute support).
Attachment #241805 - Flags: review?(peterv)
Surprisingly, I didn't need any C++ bootstrapping at all.  The category manager, given a category of "JavaScript global constructor", was all too happy to oblige me.  We don't need any stinking DOMCI macros!

The patch is also, to my great relief, much smaller.
Attachment #241805 - Attachment is obsolete: true
Attachment #245173 - Flags: review?(peterv)
There are a bunch of commented-out sections in the patch. Please fix that before asking for review.
sicking:  most of the commented-out sections are JavaDoc, comments to explain what's going on, etc.  There are only three very short sections which are really reminders to myself about loopholes (parser service checking for local names are two of them, and the third is an error console logging routine I'm not sure I want).

I figured more comments would be better than less.
When reviewing or looking at the code after-the-fact I don't know what this means:

+/*
+  // prefix must be a valid local name (xmlns:foo, foo is a prefix)
+  nsresult rv = nsXPathGenerator::CheckLocalName(prefix);
+ */

Is that something that is accidentally commented out? Is that something that you forgot to implement? Or something you forgot to remove? If they are as reminders to yourself about missing code, shouldn't those things be fixed before we check this in?

Basically, it is meaningless to everyone but you and I don't think we should have things like that in the common codebase.

If you want logging stuff for yourself you can put that in #ifdef DEBUG_ajvincent or some such.

Alternatively properly comment these things out (rather than the current error-looking nested comment) and add a proper comment describing why it is there.
Comment on attachment 245173 [details] [diff] [review]
patch, v4 (with JS-based constructor)

My allmakefiles.sh changes broke the patch anyway.  :(
Attachment #245173 - Attachment is obsolete: true
Attachment #245173 - Flags: review?(peterv)
Depends on: 360207
Attached patch patch, v4.1 (obsolete) — Splinter Review
nits fixed, and workarounds for bug 360207 included.  Also obsoleting the includeAttribute testcase, as that's no longer part of the API for now.
Attachment #212846 - Attachment is obsolete: true
Attachment #245318 - Flags: review?(peterv)
Not a blocker. Would still be nice to have though.
Flags: blocking1.9? → blocking1.9-
Started reviewing again, could you explain why you need to implement nsISecurityCheckedComponent?
re comment 99: please see http://weblogs.mozillazine.org/weirdal/archives/017211.html, which is my attempt to explain nsISecurityCheckedComponent.  Simply put, I did not declare DOM_OBJECT on my classes.  In retrospect, I probably should have.
Comment on attachment 245318 [details] [diff] [review]
patch, v4.1

(In reply to comment #100)
> Simply put, I did not declare
> DOM_OBJECT on my classes.  In retrospect, I probably should have.

>Index: extensions/xpath-generator/src/nsXPathGenerator.js
>===================================================================

>+  flags: nsIClassInfo.DOM_OBJECT,

>+  flags: nsIClassInfo.DOM_OBJECT,

So you don't need nsISecurityCheckedComponent?
I'd have to run the tests again, but I don't think I need it, no.
Attached file functionality test fixed once again (obsolete) —
Confirmed, DOM_OBJECT means I can remove the nsISecurityCheckedComponent implementations.

The previous functionality test was written before we supported document.adoptNode().  Thus it no longer works on trunk.  This testcase does.

The 2% error rate is due to adjacent data nodes, and cannot currently be worked around.
Attachment #212851 - Attachment is obsolete: true
Comment on attachment 245318 [details] [diff] [review]
patch, v4.1

>Index: extensions/xpath-generator/Makefile.in
>===================================================================

>+DIRS		= public src

I don't think we need separate directories for public and src, just put everything in extensions/xpath-generator

>Index: extensions/xpath-generator/public/nsIXPathGenerator.idl
>===================================================================

>+  /* A collection of bitwise flags which modify behavior.

  /**
   * A collection...

>Index: extensions/xpath-generator/src/nsXPathGenerator.js
>===================================================================

>+function ERROR_FAILURE() {
>+  throw Components.results.NS_ERROR_FAILURE;
>+}
>+
>+function NULL_POINTER() {
>+  throw Components.results.NS_ERROR_NULL_POINTER;
>+}
>+
>+function NOT_IMPLEMENTED() {
>+  throw Components.results.NS_ERROR_NOT_IMPLEMENTED;
>+}
>+
>+function DOM_NOT_SUPPORTED() {
>+  // Components.results.NS_ERROR_DOM_NOT_SUPPORTED_ERR;
>+  throw ( (((1)*Math.pow(2,31)) | ((14+0x45)*Math.pow(2,16)) | ((9))) )

Why not use << like in the C++ code? And lose some of the braces.

>+}

Don't really understand why you need these functions? I'd define these errorcodes as a const and just throw them:

const NS_ERROR_FAILURE = Components.results.NS_ERROR_FAILURE;

and then further down

    throw NS_ERROR_FAILURE;

>+  while (quoteIndex != -1) {
>+    retval += str.substr(0, quoteIndex);
>+    retval += "\", '\"', \"";
>+    str = str.substr(quoteIndex + 1);
>+    quoteIndex = str.indexOf('"');
>+  }

Maybe:

  var previousQuoteIndex = 0;
  while (quoteIndex != -1) {
    retval += str.substring(previousQuoteIndex, quoteIndex);
    retval += "\", '\"', \"";
    previousQuoteIndex = quoteIndex;
    quoteIndex = str.indexOf('"');
  }

>+XPathResolver.prototype = {

>+  QueryInterface: function QueryInterface(aIID) {

>+    // Discover what interfaces we need
>+    var i = Components.interfaces;
>+    var found = false;
>+    for (var prop in i) {
>+      if (aIID.equals(i[prop])) {
>+        found = true;
>+        dump("QUERYINTERFACE XPathGeneratorResolver: " + prop + "\n");
>+        break;
>+      }
>+    }

Drop this.

>+    if (!found) {
>+      dump("QUERYINTERFACE XPathGeneratorResolver: Unknown\n");

s/dump/debug/?

>+  getStepWithoutIndex: function getStepWithoutIndex(aNode, hasId) {

>+        debugger;

Drop this.

>+  compareNodes: function compareNodes(aTargetIterator, aComparison, hasId) {

>+    if (aComparison.localName != aTargetIterator.localName) {
>+      return nsIDOMNodeFilter.FILTER_SKIP;
>+    }
>+
>+    if (aComparison.namespaceURI != aTargetIterator.namespaceURI) {
>+      return nsIDOMNodeFilter.FILTER_SKIP;
>+    }

Combine these two.

>+  generateXPath: function generateXPath(targetNode, contextNode) {

>+    // Start establishing ancestor and first half of path.

Wouldn't it make sense to do this after constructing fromAncestorToTarget? That way you could skip it if you find an ID attribute.

>+        hasId = {value: null};
>+        var step = this.getStepWithoutIndex(targetIterator, hasId);

Could you use destructuring assignment instead of the hasId uglyness? (see http://developer.mozilla.org/en/docs/New_in_JavaScript_1.7#Multiple-value_returns)

>+
>+        // Find the subscript index ([1], [2], etc.) if needed.
>+        var indexIterator = targetIterator.parentNode.firstChild;
>+        total = 0;
>+        index = 0;
>+
>+        while (indexIterator) {
>+          var filterValue = this.compareNodes(targetIterator,
>+                                              indexIterator,
>+                                              hasId);
>+          if (filterValue == nsIDOMNodeFilter.FILTER_ACCEPT) {
>+            total++;
>+
>+            var prevNode = indexIterator.previousSibling;
>+            if (checkPreviousNode && prevNode && !index &&
>+                ((prevNode.nodeType == nsIDOMNode.TEXT_NODE) ||
>+                 (prevNode.nodeType == nsIDOMNode.CDATA_SECTION_NODE))) {
>+              // We don't support two adjacent character data nodes.

I don't understand what this is supposed to do. I think what should be happening is: if indexIterator and prevNode are textnodes and indexIterator == targetIterator then fail, else don't increment total. I think it would also make more sense to set prevNode to null before the while loop, and set it to indexIterator before changing indexIterator at the end of the while loop.

>+          indexIterator = indexIterator.nextSibling;
>+        }

>+      var filter = {
>+        target: targetIterator,
>+        hasId: hasId,
>+        compareNodes: this.compareNodes,
>+        acceptNode: function acceptNode(aNode) {
>+          return this.compareNodes(this.target, aNode, this.hasId);

Do you really need this.hasId (and the hasId property)? I thought this would automatically work with closures if you just pass hasId here.

>+  // ajvincent It's not very clear which flags, if any, should be set.
>+  // nsIClassInfo
>+  flags: nsIClassInfo.DOM_OBJECT,

Probably want MAIN_THREAD_ONLY.

>+    // Discover what interfaces we need
>+    var i = Components.interfaces;
>+    var found = false;
>+    for (var prop in i) {
>+      if (aIID.equals(i[prop])) {
>+        found = true;
>+        dump("QUERYINTERFACE XPathGenerator: " + prop + "\n");
>+        break;
>+      }
>+    }
>+    if (!found) {
>+      dump("QUERYINTERFACE XPathGenerator: Unknown (" + aIID + ")\n");
>+    }

Drop this.

>+    catman.addCategoryEntry("JavaScript global constructor",
>+                            "XPathGenerator",
>+                            XPathGenerator_contractId,
>+                            true,  /* Persist this entry */
>+                            true); /* Replace existing entry */

I'm not convinced we want to expose this to webcontent. We haven't even tried to standardize this.

I wonder if GO_TO_DOCUMENT_ROOT should be USE_ID_IF_AVAILABLE or something like that.
Drop nsISecurityCheckedComponent.
There's a bunch of long lines which you should fix.
Attachment #245318 - Flags: review?(peterv) → review-
(In reply to comment #104)
>   var previousQuoteIndex = 0;
>   while (quoteIndex != -1) {
>     retval += str.substring(previousQuoteIndex, quoteIndex);
>     retval += "\", '\"', \"";
>     previousQuoteIndex = quoteIndex;

previousQuoteIndex = quoteIndex + 1;

>     quoteIndex = str.indexOf('"');
>   }

Thanks for the review, peterv.  I'll get back to you later this week with a revised patch.
Now that we have fairly good test harnesses, I need to write some automated testcases into the next patch.  I'm thinking the easiest route might be to simply make my functionality test into a Mochikit test, but the only part that really needs a browser window would be verifying that XPathGenerator() is available as a global constructor.

With xpcshell testing, I'd have to take the test XML file and parse it.  This would likely mean more additions to the head.js file for xpcshell ("do_parse_document()") - and considering the patch is already hard to review, I might just do a spinoff bug for that.
I'd recommend doing it as mochikit tests, I don't really see a reason not to?
(In reply to comment #108)
> I'd recommend doing it as mochikit tests, I don't really see a reason not to?

I just realized something that may make this impossible.  peterv said:
> >+    catman.addCategoryEntry("JavaScript global constructor",
> >+                            "XPathGenerator",
> >+                            XPathGenerator_contractId,
> >+                            true,  /* Persist this entry */
> >+                            true); /* Replace existing entry */
> 
> I'm not convinced we want to expose this to webcontent. We haven't even tried
> to standardize this.

If it's not publicly exposed, then I can't write it as a mochitest.
Never mind, I got it to work.

So from the mochitest, I removed the adjacent data nodes part of the test (since there's only a snowball's chance in hell of passing that for 1.9 anyway).  After that, my test code (which probably has a lot of redundant tests) passes 8,831 ok() calls and fails 5.  (The testing matrix is still based on a list of nodes compared against itself.)

Of the five failures, all are edge cases.  Four are when the target node is a document fragment and the context node is the fragment or a descendant thereof.  The fifth is an attribute, not attached to any element, tested against itself.

I'm still revising the patch for the generator.  Is 8K successful calls to ok() excessive for a single test?  :)
sicking: I revised my mochitest to cover a few more cases, and I'm getting some bizarre results, even after my debugging.

The following expressions were evaluated using the document node as the context node.  Each pair below, I think, should have evaluated to the same node.

Basically, because I'm not 100% certain that my code is generating correct XPath's, I can't file a new bug for this yet.  I'm hoping you can tell me if I'm doing something wrong, or if this is a XPath evaluation bug I caught.

---

"a1:window/a0:bindings/text()[1]" resolves to the 2nd text node in the document: [object Text @ 0xb0700c0 (native @ 0xbc35890)]: (bindings)

"//text()[2]" resolves to the 3rd text node in the document: [object Text @ 0xb040070 (native @ 0xd0258f8)]: (bindings)

---

"a1:window/a0:bindings/text()[2]" resolves to the 3rd text node in the document: [object Text @ 0xb040070 (native @ 0xd0258f8)]: (bindings)

"//text()[3]" resolves to the 7th text node in the document: [object Text @ 0xc18f7e0 (native @ 0xb9063d0)]: (content)

---
> "//text()[2]" resolves to the 3rd text node in the document: [object Text @
> 0xb040070 (native @ 0xd0258f8)]: (bindings)

This is of course wrong if that is really the case. Have you looked at the DOM using the DOM Inspector? Note that the XUL contentsink is whacky when it comes to textnodes.

If we are indeed producing the wrong results for expressions please file a separate bug on that.
There's only one problem with your statement, sicking:  I'm loading the testcase as XML, not XUL.  javascript:alert(document instanceof Components.interfaces.nsIDOMXULDocument)) returns false, but checking it against nsIDOMXMLDocument returns true.
Attachment #263632 - Attachment is patch: false
Attachment #263632 - Attachment mime type: text/plain → application/xml
To quote http://www.w3.org/TR/xpath#predicates, 

NOTE: The location path //para[1] does not mean the same as the location path /descendant::para[1]. The latter selects the first descendant para element; the former selects all descendant para elements that are the first para children of their parents.

This is expected behaviour.
Doh, right. Or you could do (//para)[1]
descendant::text()[2] works.  (//text())[2] didn't.  So now I know what to do.
If (//text())[2] ever returns something different than /descendant::text()[2], please file a bug and attach a testcase.
Hm, now it is working.  Sorry for the bugspam.
Attached patch patch, v4.2 (obsolete) — Splinter Review
This patch addresses all review comments.  As a matter of fact, from peterv's comment 104:

"I don't understand what this is supposed to do. I think what should be
happening is: if indexIterator and prevNode are textnodes and indexIterator ==
targetIterator then fail, else don't increment total. I think it would also
make more sense to set prevNode to null before the while loop, and set it to
indexIterator before changing indexIterator at the end of the while loop."

The suggested fix actually eliminated the 2% error rate I was still seeing, something I didn't expect.  Very cool!  (I've reinserted the adjacent nodes test.)

I rewrote the functionality test first as a mochitest, but that had serious performance problems.  As a xpcshell test, it only takes a couple minutes to run.

I also ended up going with descendant::text() instead of (//text()), because it worked better with a prefixed path (../../descendant::text()[2]).
Attachment #245318 - Attachment is obsolete: true
Attachment #263632 - Attachment is obsolete: true
Attachment #265224 - Flags: review?(peterv)
Note that technically speaking, the xpcshell test currently fails, due to an assertion which I haven't traced yet and I believe is not my fault:

###!!! ASSERTION: Expected only one call!: '!gCallCount++', file /home/ajvincent/trunk/mozilla/xpcom/reflect/xptinfo/src/xptiInterfaceInfoManager.cpp, line 229
(In reply to comment #108)
> I'd recommend doing it as mochikit tests, I don't really see a reason not to?

xpcshell-based tests are easier to run, just do "make check"

(In reply to comment #104)
> >+    catman.addCategoryEntry("JavaScript global constructor",
> >+                            "XPathGenerator",
> >+                            XPathGenerator_contractId,
> >+                            true,  /* Persist this entry */
> >+                            true); /* Replace existing entry */
> 
> I'm not convinced we want to expose this to webcontent. We haven't even tried
> to standardize this.

peterv, bug 379140 looks to add a "JavaScript global privileged property" category; would this be acceptable?
(In reply to comment #122)
> peterv, bug 379140 looks to add a "JavaScript global privileged property"
> category; would this be acceptable?

I think so. You did a bunch of changes in the latest patch adding a lot of instanceof checks. Could you undo those, I think the code was fine without them (testing nodeType should be ok).
Actually, I added the instanceof checks because the xpcshell tests required them.
peterv: I just realized the instanceof checks were necessary because of bug 319968 (no, that's not a typo).
Comment on attachment 265224 [details] [diff] [review]
patch, v4.2

>Index: extensions/xpath-generator/nsXPathGenerator.js
>===================================================================

>+   * Get a XPath step, without index, based on the current iterator node.

an XPath

>+    if ((aComparison.localName != aTargetIterator.localName) ||
>+        (aComparison.namespaceURI != aTargetIterator.namespaceURI)) {

No need for the inner set of braces.

>+    if ((targetNode.nodeType == C_i.nsIDOMNode.DOCUMENT_TYPE_NODE) ||
>+        (contextNode.nodeType == C_i.nsIDOMNode.DOCUMENT_TYPE_NODE)) {

No need for the inner set of braces.

>+      else if (compare & C_i.nsIDOM3Node.DOCUMENT_POSITION_PRECEDING) {

No need for else after throw I think.

Could you replace the instanceof's with QueryInterface (maybe have a helper function that does it with a comment to remove it once bug 319968 is fixed)? And do they really need to throw on failure?

I think sicking should sr.
Attachment #265224 - Flags: review?(peterv) → review+
Comment on attachment 265224 [details] [diff] [review]
patch, v4.2

peterv: thanks for the review!
Attachment #265224 - Flags: superreview?(jonas)
Comment on attachment 265224 [details] [diff] [review]
patch, v4.2

I'd say lets turn this on for extension authors. I.e. make it part of the default extension set.

>+XPathResolver.prototype = {
>+  /**
>+   * Add a namespace URI and prefix.
>+   *
>+   * @param namespaceURI     The namespace URI of the namespace.
>+   * @param prefix           The prefix of the namespace.
>+   */
>+  addNamespace: function addNamespace(namespaceURI, prefix) {
>+    if (!namespaceURI || !prefix) {
>+      throw INVALID_ARG;
>+    }
>+
>+    // Can we validate the prefix somehow via C_i.nsIParserService?
>+
>+    if (prefix in this.mNamespaceURIMap) {
>+      // The prefix has already been assigned a namespace.
>+      throw ERROR_FAILURE;
>+    }
>+
>+    this.mPrefixMap[namespaceURI] = prefix;
>+    this.mNamespaceURIMap[prefix] = namespaceURI;

So in theory '__proto__' is a valid prefix, is as 'length', but you don't want to set that in either map. Don't know if there are any directly harmful properties that could get set.

You could potentially fix this by prefixing the values you're indexing using. Alternatively you could just stick the prefixes and namespaceURIs into two arrays. Would be a bit more code, but it'd be safer.

>+  // nsIClassInfo
>+  getHelperForLanguage: function getHelperForLanguage(aLanguage) {
>+    return null;
>+  },
>+
>+  // nsIClassInfo
>+  contractID: "@mozilla.org/xpath-generator-resolver;1",
>+
>+  // nsIClassInfo
>+  classDescription: "XPathGeneratorResolver",
>+
>+  // nsIClassInfo
>+  classID: Components.ID("{e74ada91-4c42-4493-b6e0-fa0fe02449e0}"),

No need to comment every function from an interface. Just mark off a section like:

// nsIClassInfo
getHelperForLanguage: function getHelperForLanguage(aLanguage) {
  return null;
},
contractID: "@mozilla.org/xpath-generator-resolver;1",
classDescription: "XPathGeneratorResolver",
classID: Components.ID("{e74ada91-4c42-4493-b6e0-fa0fe02449e0}"),
...

>+XPathGenerator.prototype = {
>+  /**
>+   * Get a XPath step, without index, based on the current iterator node.
>+   *
>+   * @param aNode Node to generate a step from.
>+   *
>+   * @return step  XPath step retrieved.
>+   * @return idval id attribute value.
>+   */
>+  getStepWithoutIndex: function getStepWithoutIndex(aNode) {
>+    if (!(aNode instanceof C_i.nsIDOMNode)) {
>+      throw ERROR_UNEXPECTED;
>+    }

Is this really needed? Won't XPConnect catch this before you even get here? To test, remove the code and try to pass in document.documentElement.style or some such? You will probably be able to pass in random js-objects, but I think even the test above won't catch that.

Please test removing the above check and pass in both a .style object and as well as an empty js-object to see if either works.


>+    switch (aNode.nodeType) {
>+      case C_i.nsIDOMNode.ELEMENT_NODE:
>+        if (!(aNode instanceof C_i.nsIDOMElement)) {
>+          throw ERROR_UNEXPECTED;
>+        }

Again, this shouldn't be needed. Feel free to create some sort of assertion function though and use that.

>+      case C_i.nsIDOMNode.PROCESSING_INSTRUCTION_NODE:
>+        if (!(aNode instanceof C_i.nsIDOMProcessingInstruction)) {
>+          throw ERROR_UNEXPECTED;
>+        }

Same here.

>+
>+    return [step, idval];

You never seem to set idval to anything but nsnull, except in one place, but there you return early. Just remove the variable and use null or attrNode.nodeValue directly.

>+  /**
>+   * Compare two nodes to see if a step might match them.
>+   *
>+   * @param aTargetIterator First node.
>+   * @param aComparison     Second node.
>+   * @param idval           id attribute value.

I don't get this at all. What does 'a step' mean here? An XPath-expression step? If so, what type of step? I would have guessed that the function would take an expression, but that doesn't seem to be the case. The id-attribute-value of what? Please explain the function and the arguments more in detail. Too much documentation above a function is rarely a problem.

more to come...
Oh, and you could add some global consts so you won't have to put C_i everywhere:

const nsIDOMNode = C_i.nsIDOMNode

Up to you
Attachment #265224 - Flags: superreview?(jonas) → superreview-
Comment on attachment 265224 [details] [diff] [review]
patch, v4.2

Ok, so i guess you'll have to leave the instanceof checks in for now :(

Looks good otherwise, please attach a new patch with the above fixed.
Attached patch patch, v4.3 (obsolete) — Splinter Review
answering all review nits
Attachment #273928 - Flags: superreview?(jonas)
Attachment #273928 - Flags: review+
Attached patch patch 4.3.1 (obsolete) — Splinter Review
Forgot a file.  oops
Attachment #265224 - Attachment is obsolete: true
Attachment #273928 - Attachment is obsolete: true
Attachment #273928 - Flags: superreview?(jonas)
Attachment #273929 - Flags: superreview?(jonas)
Attached patch patch 4.3.2, final (obsolete) — Splinter Review
sr=sicking over irc
Attachment #273933 - Flags: superreview+
Attachment #273933 - Flags: review+
Attachment #273929 - Attachment is obsolete: true
Attachment #273929 - Flags: superreview?(jonas)
We've decided to hold XPath Generator until after 1.9a7, and then to have it off by default.

Rationale:  We didn't want this on for content pages, because this hasn't been standardized and we don't have enough experience using this new feature.

Since it's not needed for content, it doesn't need to be on by default at this time.

Since it's not on by default, it's not appropriate to go in during the 1.9a7 freeze.

If you are an extension author that wants this on by default in Firefox or another Mozilla-based product, send me an e-mail (not on this bug!).  However, be prepared to justify it; your experience with this feature will be most valuable.
We are go for launch with attachment 273933 [details] [diff] [review].  Whoever checks this in should check in only the extensions/xpath-generator part of the patch (the rest of the patch turns xpathgen on by default for various apps, which per comment 134 we do not want right now).
Whiteboard: [checkin needed]
Don't you need approval1.9?
<WeirdAl>   hey, where's the approval1.9 flag for bug 319768?
<bz>        WeirdAl: approval flags are always per-patch
<bsmedberg> WeirdAl: if it's NPOB, it doesn't matter
<WeirdAl>   bsmedberg: oh. Well, technically speaking this one is NPODB *for now*

(per comment 135)
Correction to comment 135:  We need tools/test-harness/xpcshell-simple/head.js and extensions/xpath-generator to go in.
Please attach a final patch that is ready to land as-is, without changes needed by whoever's checking in. That reduces the risks of mistakes a lot.
Attached patch final patchSplinter Review
Attachment #273933 - Attachment is obsolete: true
Attachment #275080 - Flags: superreview+
Attachment #275080 - Flags: review+
Checked in, not part of the build.

Checking in tools/test-harness/xpcshell-simple/head.js;
new revision: 1.14; previous revision: 1.13
Checking in extensions/xpath-generator/Makefile.in;
initial revision: 1.1
Checking in extensions/xpath-generator/nsIXPathGenerator.idl;
initial revision: 1.1
Checking in extensions/xpath-generator/nsXPathGenerator.js;
initial revision: 1.1
Checking in extensions/xpath-generator/test/Makefile.in;
initial revision: 1.1
Checking in extensions/xpath-generator/test/unit/test_bug319768.js;
initial revision: 1.1
Checking in extensions/xpath-generator/test/unit/test_xpathgen.xml;
initial revision: 1.1
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Flags: in-testsuite+
Flags: in-litmus?
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9 M8
Whiteboard: [checkin needed]
Hmm.. I realized something. You don't need to have nsIClassInfo implemented since you're only being used by chrome code. This'll likely be the case for the foreseeable future too, so IMHO we should remove the nsIClassInfo implementation. This is partially for security reasons too.
Depends on: 402129
Depends on: 402130
Component: DOM: Core → DOM: Core & HTML
QA Contact: ian → general
Keywords: dev-doc-needed
I don't know why "dev-doc-needed" is set, this code never made it to the hg repo, so it's not in 1.9.1+.
Ah thanks, wasn't obvious from a cursory glance!
Keywords: dev-doc-needed
Whiteboard: [landed, but not part of the regular build]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: