Closed Bug 384186 Opened 17 years ago Closed 17 years ago

Microformats support for Firefox 3

Categories

(Firefox :: General, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
Firefox 3 beta1

People

(Reporter: mkaply, Assigned: mkaply)

References

()

Details

(Keywords: dev-doc-complete, Whiteboard: [microformats])

Attachments

(7 files, 5 obsolete files)

Attached file First pass at Microformats API (obsolete) —
This is the primary bug for Microformats in Firefox 3.

Step one is to get a microformats backend into Firefox for parsing microformats.

Microformats uses the new script loader. To use it in an extension, import the module:

Components.utils.import("resource://gre/modules/Microformats.js");

You then have access to the microformats API.

This JS file goes in the bin/modules directory.

IT still has some Operator specific stuff in it which I am working on.
Depends on: 238324, 377450
Attached file Latest Microformats.js (obsolete) —
geo is now included (needed by hCard/hCalendar)

Continued testing/fixes for using this in the trunk (vs in operator)
API documenentation is in the bug URL
the Components.utils.import("rel:Microformats.js"); line on the wiki seems to be outdated... Also, shouldn't the file have the .jsm extension, like XPCOMUtils does?
Keywords: dev-doc-needed
I've updated the Wiki.

As far as the .jsm goes, I'm not sure that is confirmed yet as the "proper" extension.
Attached patch Latest version (obsolete) — Splinter Review
This mainly adds a specific getter in the parser for telephone numbers.

It also fixes geo naming and improves tag parsing.
Comment on attachment 269673 [details] [diff] [review]
Latest version

Can you guys take a look? I'm not sure who to get to evaluate what I've done. You guys make sense, since this API is for extension developers.
Attachment #269673 - Flags: review?(shaver)
Attachment #269673 - Flags: review?(mark.finkle)
Attached file Mochitest unit test
Here's a first pass at a mochitest test harness.

It tests existence of microformats, the basic APIs, and hCard
Comment on attachment 269673 [details] [diff] [review]
Latest version

As per today's Firefox 3 status meeting, reassigning review to sayrer. Rob, can you take a look over this API and let mkaply know what you think? There's also a test harness attached to this bug, which I think will please you!
Attachment #269673 - Flags: review?(shaver)
Attachment #269673 - Flags: review?(sayrer)
Attachment #269673 - Flags: review?(mark.finkle)
Comment on attachment 269673 [details] [diff] [review]
Latest version

In general, this is ok, but denying review due to the monolithic implementation of getMicroformatProperty (see below), and lack of tests for each supported format. I know that microformats.org has a large test suite available, can't we use that?

---------

>  count: function(name, rootElement, recurseFrames, count) {
>    var i;

Declare bookkeeping stuff like this with let, as close as possible to the first usage. Top of the function is a C-ism.

>    if (!count) {
>      count = 0;
>    }

count = count || 0;

>    if (!rootElement) {
>      rootElement = content.document;
>    }

ditto.

>
>    /* If recurseFrames is undefined or true, look through all child frames for microformats */
>    if ((recurseFrames == undefined) || (recurseFrames == true)) {

Seems like you want 

  if (recurseFrames || recurseFrames === undefined)


>
>    /* Get the microformat nodes for the document */
>    var microformatNodes = [];
>    if (Microformats[name]) {

Should you check for this before recursing into frames?

>      if (Microformats[name].className) {
>        microformatNodes = Microformats.getElementsByClassName(rootElement,
>                                          Microformats[name].className);
>        /* alternateClassName is for cases where a parent microformat is inferred by the children */
>        /* If we find alternateClassName, the entire document becomes the microformat */
>        if ((microformatNodes.length == 0) && (Microformats[name].alternateClassName)) {

nit: This is over-parenthesized, like many other parts of the file. Please eliminate unless absolutely required for readability.

>          var temp = Microformats.getElementsByClassName(rootElement, Microformats[name].alternateClassName);
>          if (temp.length > 0) {
>            microformatNodes.push(rootElement); 
>          }

nit: replace "temp" with something descriptive.

>        }
>      } else if (Microformats[name].attributeValues) {
>        microformatNodes = Microformats.getElementsByAttribute(rootElement,
>                                          Microformats[name].attributeName,
>                                          Microformats[name].attributeValues);
>        
>      }

nit: indentation issues?


>  /**
>   * If the passed in node is contained in a microformat, this function returns
>   * the microformat that contains it. If the passed in node is a microformat,
>   * it still returns the parent.
>   *

I'm not quite sure what this means.


>   * @param  node          DOM node to check
>   * @return If the node is contained in a microformat, it returns the parent
>   *         DOM node, otherwise returns nothing
>   */
>  getParent: function(node) {
>    var xpathExpression;
>    var xpathResult;
>    var mfname;
>    var i, j;
>    for (i in Microformats)
>    {
>      mfname = i;
>      if (Microformats[mfname]) {
>        if (Microformats[mfname].className) {
>          xpathExpression = "ancestor::*[contains(concat(' ', @class, ' '), ' " + Microformats[mfname].className + " ')]";
>        } else if (Microformats[mfname].attributeValues) {
>          xpathExpression = "ancestor::*[";
>          var attributeList = Microformats[i].attributeValues.split(" ");
>          for (var j=0; j < attributeList.length; j++) {
>            if (j != 0) {
>              xpathExpression += " or ";
>            }
>            xpathExpression += "contains(concat(' ', @" + Microformats[mfname].attributeName + ", ' '), ' " + attributeList[j] + " ')";
>          }
>          xpathExpression += "]"; 
>        } else {
>          continue;
>        }
>        xpathResult = (node.ownerDocument || node).evaluate(xpathExpression, node, null,  Components.interfaces.nsIDOMXPathResult.FIRST_ORDERED_NODE_TYPE, null);
>        if (xpathResult.singleNodeValue) {
>          xpathResult.singleNodeValue.microformat = mfname;
>          return xpathResult.singleNodeValue;
>        }
>      }
>    }
>    return;
>  },
>  /**
>   * If the passed in node is a microformat, this function returns a space 
>   * separated list of the microformat names that correspond to this node
>   *
>   * @param  node          DOM node to check
>   * @return If the node is a microformat, a space separated list of microformat
>   *         names, otherwise returns nothing
>   */
>  getNamesFromNode: function(node) {
>    var i;
>    var microformatNames = [];
>    var xpathExpression;
>    var xpathResult;
>    var i, j;

"var i" appears twice here. 

>    for (i in Microformats)
>    {
>      if (Microformats[i]) {
>        if (Microformats[i].className) {
>          if (node.getAttribute('class')) {

 if (Microformats[i].className && node.getAttribute('class')) {

>            if (node.getAttribute('class').match("(^|\\s)" + Microformats[i].className + "(\\s|$)")) {
>              microformatNames.push(i);
>              continue;
>            }
>          }
>        } else if (Microformats[i].attributeValues) {
>          var attribute;
>          if (attribute = node.getAttribute(Microformats[i].attributeName)) {
>            var attributeList = Microformats[i].attributeValues.split(" ");
>            for (var j=0; j < attributeList.length; j++) {
>              if (attribute.match("(^|\\s)" + attributeList[j] + "(\\s|$)")) {
>                microformatNames.push(i);
>                continue;
>              }
>            }

Did you want to break here?

>          }
>        } else {
>          continue;
>        }
>      }

That continue statement looks superfluous. 

>    }
>    return microformatNames.join(" ");
>  },

>  add: function add(microformat, microformatDefinition) {
>    /* We always replace an existing definition with the new one */
>    if (microformatDefinition.mfVersion == Microformats.version) {
>      if (!Microformats[microformat]) {
>        Microformats.list.push(microformat);
>      }
>      Microformats[microformat] = microformatDefinition;
>      microformatDefinition.mfObject.prototype.debug = function(microformatObject) {return Microformats.debug(microformatObject)};
>    }

72 chars, please.

>  },
>  /* All parser specific function are contained in this object */

function[s]


>    /**
>     * Used to specifically retrieve an email address in a microformat node.
>     * This includes at an href, as well as removing subject if specified and
>     * the mailto prefix.
>     *
>     * @param  propnode   The DOMNode to check
>     * @param  parentnode The parent node of the property. If it is a subproperty,
>     *                    this is the parent property node. If it is not, this is the
>     *                    microformat node.
>     * @return A string with the email address.
>     */
>    emailGetter: function(propnode, parentnode) {
>      if ((propnode.nodeName.toLowerCase() == "a") || (propnode.nodeName.toLowerCase() == "area")) {
>        var mailto = propnode.href;
>        if (mailto.indexOf('?') > 0) {
>          return unescape(mailto.substring("mailto:".length, mailto.indexOf('?')));
>        } else {
>          return unescape(mailto.substring("mailto:".length));
>        }

Use the IO service to parse this correctly.


>      } else {
>        /* Special case - if this node is a value, use the parent node to get all the values */
>        if (propnode.getAttribute('class').match("(^|\\s)" + "value" + "(\\s|$)")) {
>          return Microformats.parser.defaultGetter(parentnode, parentnode);
>        } else {
>          return Microformats.parser.defaultGetter(propnode, parentnode);
>        }
>      }
>    },

What sort of data is returned from the special case? How come it isn't cleaned up like the first part?

>    /**
>     * Used to specifically retrieve HTML in a microformat node.
>     *
>     * @param  propnode   The DOMNode to check
>     * @param  parentnode The parent node of the property. If it is a subproperty,
>     *                    this is the parent property node. If it is not, this is the
>     *                    microformat node.
>     * @return A string with the HTML including all tags.
>     */
>    HTMLGetter: function(propnode, parentnode) {
>      return propnode.innerHTML;
>    },

How is this function used by callers?


>    newMicroformat: function(object, in_node, microformat) {
>      /* check to see if we are even valid */
>      if (!Microformats[microformat]) {
>        throw("Invalid microformat - " + microformat);
>      }
>      if (in_node.ownerDocument) {
>        if (Microformats[microformat].attributeName) {
>          if (!(in_node.getAttribute(Microformats[microformat].attributeName))) {
>            throw("Node is not a microformat (" + microformat + ")");
>          }

more nested if blocks

>        } else {
>          if (!(in_node.getAttribute('class').match("(^|\\s)" + Microformats[microformat].className + "(\\s|$)"))) {

This regex is repeated a lot. Define a function to do it in one place.


>    getMicroformatProperty: function getMicroformatProperty(in_mfnode, mfname, propname) {
>      var i, j, k;
>      var result;
>      var mfnode = in_mfnode;

More bookkeeping at the top of the function.

>      if (!in_mfnode.origNode && Microformats[mfname].className && in_mfnode.ownerDocument) {
>        mfnode = Microformats.parser.preProcessMicroformat(in_mfnode);
>      }
>      var foundProps = false;
>      var tp;
>      if (Microformats[mfname].properties[propname]) {
>        tp = Microformats[mfname].properties[propname];
>      }
>  /* OK this is strange so let me explain it. If we didn't find the property
>     that was passed in, look for it among the subproperties. If you find it, 
>     AND the classname of the passed in mfnode matches the parent property,
>     use it. This allows us to pass in an hentry and get it's entry-title
>     for instance. This only works one level deep. */
>      if (!tp) {
>        for (i in Microformats[mfname].properties) {
>          if (Microformats[mfname].properties[i].subproperties) {
>            if (Microformats[mfname].properties[i].subproperties[propname]) {
>              if (mfnode.getAttribute('class')) {
>                if (mfnode.getAttribute('class').match("(^|\\s)" + i + "(\\s|$)")) {

Nested ifs and the repetitious regex.

>                  tp = Microformats[mfname].properties[i].subproperties[propname];
>                  break;
>                }
>              }
>            }
>          }
>        }
>      }
>      var propnodes;
>      if (!tp) {
>        /* propname contains a . - dereference it */
>        if (propname.indexOf(".") != -1) {
>          var props = propname.split(".");
>          tp = Microformats[mfname].properties[props[0]];
>          if (tp && tp.rel == true) {
>            propnodes = Microformats.getElementsByAttribute(mfnode, "rel", props[0]);
>          } else {
>            propnodes = Microformats.getElementsByClassName(mfnode, props[0]);
>          }
>        }
>      } else {
>        if (tp && tp.rel == true) {
>          propnodes = Microformats.getElementsByAttribute(mfnode, "rel", propname);
>        } else {
>          propnodes = Microformats.getElementsByClassName(mfnode, propname);
>        }
>      }

This is repetitious. Only write the 

"if (tp && tp.rel == true) {" 

conditional once.


>      var curprop = 0;
>      var validType, type;
>      for (j = 0; j < propnodes.length; j++) {
>        foundProps = true;
>        var callPropertyGetter = true;
>        if (tp.subproperties) {
>          var subprop;
>          for (subprop in tp.subproperties) {
>            var foundSubProperties = false;
>            var subpropnodes = Microformats.getElementsByClassName(propnodes[j], subprop);
>            var cursubprop = 0;
>            for (k = 0; k < subpropnodes.length; k++) {
>              if (tp.subproperties[subprop].datatype) {
>                result = Microformats.parser.datatypeHelper(tp.subproperties[subprop], subpropnodes[k], propnodes[j]);
>              } else {
>                result = Microformats.parser.defaultGetter(subpropnodes[k], propnodes[j]);
>                if ((tp.subproperties[subprop].implied) && (result)) {
>                  var temp = result;
>                  result = {};
>                  result[tp.subproperties[subprop].implied] = temp;
>                }
>              }
>              if (tp.subproperties[subprop].types) {
>                validType = false;
>                for (type in tp.subproperties[subprop].types) {
>                  if (result.toLowerCase() == tp.subproperties[subprop].types[type]) {
>                    validType = true;
>                    break;
>                  }
>                }
>                if (!validType) {
>                  continue;
>                }
>              }
>              if (result) {
>                foundSubProperties = true;
>                callPropertyGetter = false;
>                if (tp.plural) {
>                  if (!property) {
>                    property = [];
>                  }
>                  if (!property[curprop]) {
>                    property[curprop] = [];
>                  }
>                  if (tp.subproperties[subprop].plural) {
>                    if (!property[curprop][subprop]) {
>                      property[curprop][subprop] = [];
>                    }
>                    property[curprop][subprop][cursubprop] = result;
>                  } else {
>                    property[curprop][subprop] = result;
>                  }
>                } else {
>                  if (!property) {
>                    property = {};
>                  }
>                  if (tp.subproperties[subprop].plural) {
>                    if (!property[subprop]) {
>                      property[subprop] = [];
>                    }
>                    property[subprop][cursubprop] = result;
>                  } else {
>                    property[subprop] = result;
>                  }
>                }
>              }
>              
>              if (!tp.subproperties[subprop].plural) {
>                break;
>              }
>              cursubprop++;
>            }
>            if (!foundSubProperties) {
>              if (tp.subproperties[subprop].virtual) {
>                if (tp.subproperties[subprop].virtualGetter) {
>                  result = tp.subproperties[subprop].virtualGetter(propnodes[j], propnodes[j]);
>                } else {
>                  result = Microformats.parser.datatypeHelper(tp.subproperties[subprop], propnodes[j]);
>                }
>                if (result) {
>                  callPropertyGetter = false;
>                  if (tp.plural) {
>                    if (!property) {
>                      property = [];
>                    }
>                    if (!property[curprop]) {
>                      property[curprop] = [];
>                    }
>                    if (tp.subproperties[subprop].plural) {
>                      if (!property[curprop][subprop]) {
>                        property[curprop][subprop] = [];
>                      }
>                      property[curprop][subprop][cursubprop] = result;
>                    } else {
>                      property[curprop][subprop] = result;
>                    }
>                  } else {
>                    if (!property) {
>                      property = {};
>                    }
>                    if (tp.subproperties[subprop].plural) {
>                      if (!property[subprop]) {
>                        property[subprop] = [];
>                      }
>                      property[subprop][cursubprop] = result;
>                    } else {
>                      property[subprop] = result;
>                    }
>                  }
>                }
>              }
>            }
>          }
>        }

This block is completely unreadable :) It might be doing the right thing, but we're going to have to simplify it and comment it. This function is 265 lines long, and that is too long for JS of this sort.

>        /* If we didn't find any sub properties for the property, check
>           to see if the property has a getter. If it does, call it. This
>           is to handle cases where the property sets up the subproperty
>           (org->organization-name) */
>        if (callPropertyGetter) {
>          if (tp.datatype) {
>            result = Microformats.parser.datatypeHelper(tp, propnodes[j]);
>          } else {
>            result = Microformats.parser.defaultGetter(propnodes[j]);
>            if ((tp.implied) && (result)) {
>              var temp = result;
>              result = {};
>              result[tp.implied] = temp;
>            }
>          }
>          if (tp.types) {
>            validType = false;
>            for (type in tp.types) {
>              if (result.toLowerCase() == tp.types[type]) {
>                validType = true;
>                break;
>              }
>            }
>            if (!validType) {
>              continue;
>            }
>          }
>          if (result) {
>            if (tp.plural) {
>              if (!property) {
>                property = [];
>              }
>              property[curprop] = result;
>            } else {
>              if (!property) {
>                property = {};
>              }
>              property = result;
>            }
>          }
>          if (!tp.plural) {
>            break;
>          }
>        }
>        curprop++;
>      }
>      /* If we didn't find the property, check to see if it is a virtual
>         property and if so, call it's getter. */
>      if (!foundProps) {
>        if (tp.virtual) {
>          if (tp.virtualGetter) {
>            result = tp.virtualGetter(mfnode, mfnode);
>          } else {
>            result = Microformats.parser.datatypeHelper(tp, mfnode);
>          }
>          if (tp.cardinality == "plural") {
>            if (result) {
>              if (!property) {
>                property = [];
>              }
>              property[0] = result;
>            } else {
>              return;
>            }
>          } else {
>            if (result) {
>              property = result;
>            } else {
>              return;
>            }
>          }
>        } else {
>          return;
>        }
>      }
>      /* propname contains a . - dereference it */
>      /* this needs to be way more robust */
>      /* check if property is an array and if so only return 1st element */
>      /* handle multuiple nestings */
>      if (propname.indexOf(".") != -1) {
>        var props = propname.split(".");
>        return property[props[1]];
>      }
>      return property;
>    },

>    /**
>     * Internal parser API used to resolve includes and headers. Includes are
>     * resolved by simply cloning the node and replacing it in a clone of the
>     * original DOM node. Headers are resolved by creating a span and then copying
>     * the innerHTML and the class name.
>     *
>     * @param  in_mfnode The node to preProcess.
>     * @return If the node had includes or headers, a cloned node otherwise
>     *         the original node. You can check to see if the node was cloned
>     *         by looking for .origNode in the new node.
>     */
>    preProcessMicroformat: function(in_mfnode) {
>      var mfnode;
>      var i;
>      var includes = Microformats.getElementsByClassName(in_mfnode, "include");
>      if ((includes.length > 0) || ((in_mfnode.nodeName.toLowerCase() == "td") && (in_mfnode.getAttribute("headers")))) {
>        mfnode = in_mfnode.cloneNode(true);
>        mfnode.origNode = in_mfnode;
>        if (includes.length > 0) {
>          includes = Microformats.getElementsByClassName(mfnode, "include");
>          var includeId;
>          var include_length = includes.length;
>          for (i = include_length -1; i >= 0; i--) {
>            if (includes[i].nodeName.toLowerCase() == "a") {
>              includeId = includes[i].getAttribute("href").substr(1);
>            }
>            if (includes[i].nodeName.toLowerCase() == "object") {
>              includeId = includes[i].getAttribute("data").substr(1);
>            }
>            try {
>              includes[i].parentNode.replaceChild(in_mfnode.ownerDocument.getElementById(includeId).cloneNode(true), includes[i]);
>            } catch(ex) {}
>          }

Why are we swallowing this exception? How can we make it occur?

>        } else {
>          var headers = in_mfnode.getAttribute("headers").split(" ");
>          for (i = 0; i < headers.length; i++) {
>            try {
>              var tempNode = in_mfnode.ownerDocument.createElement("span");
>              var headerNode = in_mfnode.ownerDocument.getElementById(headers[i]);
>              tempNode.innerHTML = headerNode.innerHTML;
>              tempNode.className = headerNode.className;
>              mfnode.appendChild(tempNode);
>            } catch(ex) {}
>          }

Same here question here.

>        }
>      } else {
>        mfnode = in_mfnode;
>      }
>      return mfnode;
>    },


>
>/* MICROFORMAT DEFINITIONS BEGIN HERE */
>

So, it looks like we have reasonably extensible architecture here. That's good. We also have a bug open on making sure that we don't hunt for microformats needlessly, by listening to the parser. How do new microformats plug into that? Or do they?
Attachment #269673 - Flags: review?(sayrer) → review-
Working on review comments. Addressing questions here:

> In general, this is ok, but denying review due to the monolithic implementation
> of getMicroformatProperty (see below), and lack of tests for each supported
> format. I know that microformats.org has a large test suite available, can't we
> use that?

There's no easy way to integrate the test suite into our framework. Basically the test suite is an hCard and it's corresponding VCARD. What we're doing here is comparing against the data, not the actual VCARD. And even if we had VCARD export, we couldn't necessarily compare directly because some of the VCARD info is user agent specific. 

So I can certainly add more testcases, but they will all be manual "create this microformat, do string compares to see if things came out right"

> This block is completely unreadable :) It might be doing the right thing, but
> we're going to have to simplify it and comment it. This function is 265 lines
> long, and that is too long for JS of this sort.

Yeah. It did come out pretty painful. It used to be longer :) I'll see if there is some way to use recursion to clean it up.

> So, it looks like we have reasonably extensible architecture here. That's good.
> We also have a bug open on making sure that we don't hunt for microformats
> needlessly, by listening to the parser. How do new microformats plug into that?
> Or do they?

That's an open question based on the parser changes. I don't think there is a way to access the Microformats array from C code, is there? So we might end up having to have a small Component if only to keep the list of microformat classnames that the parser accesses.

Per mscott:

toolkit/components/microformats
This is the hCard test suite:

http://microformats.org/tests/hcard

In mochitest
I've addressed all of your comments.

The GetMicroformatProperty function was completely rewritten to be recursive and is much less complex. 

Also as noted above, I wrote a huge unit test that verified things are working the  same as before.

Robert: THANKS for the push on the unit test. Saved me a lot of by hand testing. I guess that's what unit tests are for :)
Attachment #268123 - Attachment is obsolete: true
Attachment #268231 - Attachment is obsolete: true
Attachment #269673 - Attachment is obsolete: true
Attachment #274778 - Flags: review?(sayrer)
Attached patch Latest patch (obsolete) — Splinter Review
This version fixes some issues with HTML getters. It allows access to the HTML data of certain attributes in the microformat.
Attachment #274778 - Attachment is obsolete: true
Attachment #274778 - Flags: review?(sayrer)
Attached file hCalender mochitest
Attached file geo mochitest
Looks like review got canceled here. What's up?
Comment on attachment 276639 [details] [diff] [review]
Latest patch

It autocancelled when I put the new patch up and obsoleted the old. Maybe bugzilla should transfer the review request...
Attachment #276639 - Flags: review?(sayrer)
Attachment #276639 - Attachment is patch: true
Attachment #276639 - Attachment mime type: application/x-javascript → text/plain
Comment on attachment 276639 [details] [diff] [review]
Latest patch

This is much, much better code on the whole.

It still iterates over things using for-loops and switches in a way that I personally dislike, but I don't think that should prevent check in. I can patch it later if it really bothers me. ;)

Pleases make the changes below, and we should be all set.

>var EXPORTED_SYMBOLS = ["Microformats", "adr", "tag", "hCard", "hCalendar", "geo"];
>
>var Microformats = {
>  version: 0.8,
>  /* When a microformat is added, the name is placed in this list */
>  list: [],
>  /* Custom iterator so that microformats can be enumerated as */
>  /* for (i in Microformats */

typo: for (var i in Micromats)

>  __iterator__: function () {
>    for (let i=0; i < this.list.length; i++) {
>      yield this.list[i];
>    }
>  },
>  /**
>   * Retrieves microformats objects of the given type from a document
>   * 
>   * @param  name          The name of the microformat (required)
>   * @param  rootElement   The DOM element at which to start searching (required)
>   * @param  recurseFrames Whether or not to search child frames for microformats (optional - defaults to true)
>   * @param  microformats  An array of microformat objects to which is added the results (optional)

This parameter name is a little obtuse. Change to targetArray or formatArray or something.
In place array concatenation is awkard in JS, so I can buy this param.

>  /**
>   * Counts microformats objects of the given type from a document
>   * 
>   * @param  name          The name of the microformat (required)
>   * @param  rootElement   The DOM element at which to start searching (required)
>   * @param  recurseFrames Whether or not to search child frames for microformats (optional - defaults to true)
>   * @param  count         The current count

The count parameter seems to have little value to me. How hard is it for the caller to to perform the addition themselves?


>  /**
>   * This function searches a given nodes ancestors looking for a microformat
>   * and if it finds it, returns it. It does NOT include self, so if the passed
>   * in node is a microformat, it will still search ancestors for a microformat.
>   *
>   * @param  node          DOM node to check
>   * @return If the node is contained in a microformat, it returns the parent
>   *         DOM node, otherwise returns nothing
>   */
>  getParent: function(node) {

"Nothing" is imprecise. Should it return null rather than undefined? Either way, be specfic.

We still have this regex showing up a bunch:

>            if (node.getAttribute('class').match("(^|\\s)" + Microformats[i].className + "(\\s|$)")) {
>              microformatNames.push(i);
>              continue;
>            }
>          }
...
>          /* If we are processing a value node, don't remove whitespace */
>          if (propnode.getAttribute('class') && !propnode.getAttribute('class').match("(^|\\s)" + "value" + "(\\s|$)")) {
...
>    telGetter: function(propnode, parentnode) {
>      /* Special case - if this node is a value, use the parent node to get all the values */
>      if (propnode.getAttribute('class').match("(^|\\s)" + "value" + "(\\s|$)")) {
...
>        /* Special case - if this node is a value, use the parent node to get all the values */
>        /* If this case gets executed, per the value design pattern, the result */
>        /* will be the EXACT email address with no extra parsing required */
>        if (propnode.getAttribute('class').match("(^|\\s)" + "value" + "(\\s|$)")) {

Let's do something like this:

function matchClass(node, className) {
  classValue = node.getAttribute('class');
  return (classValue && classValue.match("(^|\\s)" + className + "(\\s|$)"));
}

that should eliminate a bunch of null checks and noisy inline regex stuff.

>    },
>    /**
>     * Used to specifically retrieve a URI in a microformat node. This includes
>     * looking at an href/img/object/area to get the fully qualified URI.
>     *
>     * @param  propnode   The DOMNode to check
>     * @param  parentnode The parent node of the property. If it is a subproperty,
>     *                    this is the parent property node. If it is not, this is the
>     *                    microformat node.
>     * @return A string with the fully qualified URI.
>     */
>    uriGetter: function(propnode, parentnode) {
>      if (propnode.nodeName.toLowerCase() == "a") {
>        return propnode.href;
>      } else if (propnode.nodeName.toLowerCase() == "img") {
>        return propnode.src;
>      } else if (propnode.nodeName.toLowerCase() == "object") {
>        return propnode.data;
>      } else if (propnode.nodeName.toLowerCase() == "area") {
>        return propnode.href;
>      } else {
>        return Microformats.parser.textGetter(propnode, parentnode);
>      }

var pairs = {"a":"href", "img":"src", "object":"data", "area":"href"};
var name = propnode.nodeName.toLowerCase();
if (pairs.hasOwnProperty(name)) {
  return propnode[pairs[name]];
} else {
  return Microformats.parser.textGetter(propnode, parentnode);
}


>    },
>    /**
>     * Used to specifically retrieve a telephone number in a microformat node.
>     * Basically this is to handle weird value excerpting
>
>     * @param  propnode   The DOMNode to check
>     * @param  parentnode The parent node of the property. If it is a subproperty,
>     *                    this is the parent property node. If it is not, this is the
>     *                    microformat node.
>     * @return A string with the telephone number
>     */
>    telGetter: function(propnode, parentnode) {

Please expand on "weird value excerpting". Reference to an external spec is fine.


>    /**
>     * Used when a caller needs the text inside a particular DOM node.
>     *
>     * @param  propnode   The DOMNode to check
>     * @param  parentnode The parent node of the property. If it is a subproperty,
>     *                    this is the parent property node. If it is not, this is the
>     *                    microformat node.
>     * @return A string with just the text including all tags.
>     */
>    textGetter: function(propnode, parentnode) {
>      var s = Microformats.parser.defaultGetter(propnode, parentnode, "text");
>      /* Remove any HTML comments */
>//      s	= s.replace(/\<!--.*?\-->/gi, '');
>      /* Remove any HTML tags and their contents */
>      /* This used to be replacing with a space - I don't like that */
>//      s	= s.replace(/\<.*?\>/gi, '');
>      return s;
>    },

http://developer.mozilla.org/en/docs/DOM:element.textContent ??

Or maybe I am missing something.


>    /**
>     * Used when a caller needs the HTML inside a particular DOM node.
>     *
>     * @param  propnode   The DOMNode to check
>     * @param  parentnode The parent node of the property. If it is a subproperty,
>     *                    this is the parent property node. If it is not, this is the
>     *                    microformat node.
>     * @return An object with function to access the string and the HTML
>     *         Note that because this is an object, you can't do string functions
>     *         so i faked a couple string functions that might be useful.
>     */
>    HTMLGetter: function(propnode, parentnode) {
>      return {
>        toString: function () {
>          return Microformats.parser.defaultGetter(propnode, parentnode, "text");
>        },
>        toHTML: function () {
>          return Microformats.parser.defaultGetter(propnode, parentnode, "HTML"); 
>          },
>          replace: function (a, b) {
>            return this.toString().replace(a,b);
>          },
>          match: function (a) {
>            return this.toString().match(a);
>          } 
>           
>      };
>    },

Indentation problem.

>    newMicroformat: function(object, in_node, microformat) {
>      /* check to see if we are even valid */
>      if (!Microformats[microformat]) {
>        throw("Invalid microformat - " + microformat);
>      }
>      if (in_node.ownerDocument) {
>        if (Microformats[microformat].attributeName) {
>          if (!(in_node.getAttribute(Microformats[microformat].attributeName))) {
>            throw("Node is not a microformat (" + microformat + ")");
>          }
>        } else {
>          if (!(in_node.getAttribute('class').match("(^|\\s)" + Microformats[microformat].className + "(\\s|$)"))) {
>            throw("Node is not a microformat (" + microformat + ")");
>          }
>        }
>      }
>      var node = in_node;
>      if ((Microformats[microformat].className) && in_node.ownerDocument) {
>        node = Microformats.parser.preProcessMicroformat(in_node);
>      }
>
>      for (let i in Microformats[microformat].properties) {
>        object.__defineGetter__(i, Microformats.parser.getMicroformatPropertyGenerator(node, microformat, i, object));
>      }
>      
>      /* The node in the object should be the original node */
>      object.node = in_node;
>      /* we also store the nodes that has been "resolved" */

Grammar problem.
(In reply to comment #19)
> 
> var pairs = {"a":"href", "img":"src", "object":"data", "area":"href"};
> var name = propnode.nodeName.toLowerCase();
> if (pairs.hasOwnProperty(name)) {
>   return propnode[pairs[name]];
> } else {
>   return Microformats.parser.textGetter(propnode, parentnode);
> }

Er, actually, don't use an else block here.

  if (pairs.hasOwnProperty(name)) {
    return propnode[pairs[name]];
  } 
   
  return Microformats.parser.textGetter(propnode, parentnode);
I've addressed all your comments.

Thanks for the feedback!
Attachment #276639 - Attachment is obsolete: true
Attachment #278647 - Flags: review?(sayrer)
Attachment #276639 - Flags: review?(sayrer)
Attachment #278647 - Flags: review?(sayrer) → review+
Whiteboard: [microformats]
Attachment #278647 - Flags: superreview?(benjamin)
Attachment #278647 - Flags: approval1.9?
I am going on vacation. Tomorrow. I've attached a diff so that anyone can check in this patch once approvals are had.

i'd appreciate it if someone could checkin the patch since I haven't gotten approvals.
Attachment #279849 - Attachment mime type: application/octet-stream → text/plain
(In reply to comment #22)
> 
> i'd appreciate it if someone could checkin the patch since I haven't gotten
> approvals.

I'll deal with any comments from benjamin and check it in.

Comment on attachment 278647 [details] [diff] [review]
Patch taht addresses all review comments

please request approval only when all reviews are done
Attachment #278647 - Flags: approval1.9?
The Target milestone here is outdated.  can someone please update it?
Done
Target Milestone: Firefox 3 alpha6 → Firefox 3 M9
Comment on attachment 278647 [details] [diff] [review]
Patch taht addresses all review comments

I don't know microformats or most of this stuff well... can you please find somebody else who does?
Attachment #278647 - Flags: superreview?(benjamin)
Comment on attachment 279849 [details] [diff] [review]
Full diff ready for checkin

moa=bsmedberg
a=mconnor on behalf of drivers for checkin
Attachment #279849 - Flags: approval1.9+
The patch appears to have landed. Should this bug be closed?
Sorry, forgot.

This has been checked in. Woohoo!
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
The mochitest tries to load some files from external servers and currently for
example http://theryanking.com/contact/ doesn't return anything (not even 404), so
mochitest is failing because of timeout.
Where can I find some examples of sites with microformats so I can take a look at those?
There are lists of sites with hcards here:

http://microformats.org/wiki/hcard-examples-in-wild

and hcalendars here:

http://microformats.org/wiki/hcalendar-examples-in-wild

My favorite sources are Yahoo local for hcard and upcoming for hcalendar.

Google maps also has embedded hcards now as well.
Is there a specification for the fields and format of the JavaScript objects that define a microformat somewhere?  I'm trying to document that and when documenting from examples, it's hard to be sure you're covering all the bases.
(In reply to comment #34)
> Is there a specification for the fields and format of the JavaScript objects
> that define a microformat somewhere?  I'm trying to document that and when
> documenting from examples, it's hard to be sure you're covering all the bases.
> 

Unfortunately no. Things have changed since the last time I talked about this:

http://www.kaply.com/weblog/2006/12/23/adding-new-microformats-to-operator/


http://www.kaply.com/weblog/wp-content/uploads/2007/05/hResume.js

Shows how to do it with no docs.

Give me a day to try to throw something together that explains things.

I guess my main question is: are there specific attributes that are standard on properties?  I see that for example, "type" in "adr" has attributes "plural" and "types" -- are these standard attributes that may be used in various places, or are attributes defined on a per-property basis?

The documentation is coming together, slowly, but I don't want to make stuff up off the top of my head and be badly wrong. :)
I sent you a quick pass at how to define a microformat.

plural is a standard type that indicates if a a microformat property can be in the microformat multiple times. If so, it is always returned as an array.

types refers to a case where a microformat property has specific values that it can be an non other.

"type" is actually the name of the property itself.

The "type"/"types" thing is a little confusing unfortunately.

I should have named that better...
Am I correct in guessing that everything from newMicroformat downward in the Microformats.js file is internal implementation detail type stuff and of low importance for (or should not be) documented?  If not, let me know which if any of this stuff you'd like to see written up and I'll take care of it.
Eric:

Basically everything in .parser is really the stuff only for developers of new microformats.

In my mind, the public API is mostly what is documented here:

http://wiki.mozilla.org/User:Mkaply:Fx-Docs:Microformats/Architecture

Although that's a little dated. I am making a change to the "get" and "coumt" APIs to pass in an options object instead of recurseFrames.

We should probably talk outside the bug...
If you could ping me when the changes to get and count are done, that would be a big help, so I can be sure I get the doc updated.
Patch for unit tests to remove external dependency
Attachment #292682 - Flags: review?
Since nobody has complained about the quality of this documentation in the past couple-three months, I'm tagging this as documentation complete.
Attachment #292682 - Flags: review? → review?(dietrich)
Comment on attachment 292682 [details] [diff] [review]
Fix for unit tests

looks like sayre's done all previous reviews of this code, so routing to him.
Attachment #292682 - Flags: review?(dietrich) → review?(sayrer)
Comment on attachment 292682 [details] [diff] [review]
Fix for unit tests

tests look fine if we still support this.
Attachment #292682 - Flags: review?(sayrer) → review+
Blocks: 772981
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: