Closed Bug 357276 Opened 15 years ago Closed 4 years ago

Add an RDF compatibility parser

Categories

(Core Graveyard :: RDF, defect, P3)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: benjamin, Unassigned)

References

Details

Attachments

(2 files)

To remove the "heavy" mozilla/rdf code from the platform, we need a couple things:

1) a way to import data that's currently in RDF files
2) a simple graph-based data model for XUL templating

This bug covers part 1. I have an RDF parser using SAX which can read the RDF produced by the current RDF serializer, as well as "standard" RDF with the following exceptions:

* no handling of relative URIs, xml:base, or rdf:ID on nodes
* no handling of reification (rdf:ID on properties)
* no handling of rdf:parseType (though parseType="Resource" would be pretty easy to add)

It does do lang-typed literals, which should make it possible to solve the "localized extension descriptions" bug.

The code is shipped as a JS file which can be loaded using the subscript loader. I will be attaching two patches: one containing the parser itself and xpcshell unit tests; the other changes the chrome registry to use the new parser.
The unit test depends on the RDF testsuite living in rdf/compat/test/testcases. I have not included the testcases in this patch, but I will land them separately, from this archive: http://www.w3.org/2000/10/rdf-tests/rdfcore/all_20031114.zip
Attachment #242766 - Flags: review?(sayrer)
I just noticed that rdf-parser.js in the first patch didn't have a license header. I have added one in my tree.
Attachment #242769 - Flags: review?(sayrer)
I think that a proposal like this should be discussed in .planning.

Moving away from what we currently have may be a good idea, but I'd like to see a better case being made on where to go to and why.
And I'm honestly more interested to see a plan for localstore.rdf than for contents.rdf in toolkit, which, hrm, is even more out of fashion than RDF or RDF/XML, even.
(In reply to comment #0)
> * no handling of rdf:parseType (though parseType="Resource" would be pretty
> easy to add)
> 

I filed bug 357289 to implement a SAX recorder, since I should have one for feeds too. That should make XML literals easy. 
Comment on attachment 242766 [details] [diff] [review]
RDF-compat parser, rev. 1

If this interface had IDL, this patch would get rejected for lack of documentation. I grok enough to continue reviewing, though.

I don't think docstrings on each function are necessary, but I do think explanations of what each object does would be good, in addition to an introductory comment detailing what the file does at a very high level.
Comment on attachment 242766 [details] [diff] [review]
RDF-compat parser, rev. 1

general nits: 

Pick one brace style for one-line conditionals.
JS code I'm familiar with generally doesn't use the "a" prefix for args. There is a mix in this file.

>+
>+function getRDFAttribute(aAttr, atts)
>+{
>+  var rv = atts.getValueFromName(RDF_NAMESPACE, aAttr);

nit: the rv name clashes weirdly with nsresult usage. use "text" or "value"


>+
>+XML_NAME = /^[A-Za-z_][0-9A-Za-z\.\-_]*$/;
>+

This seems to disallow elements with unicode names. Other regexes in the file might have this issue as well.

js> myTextVar = "\u0046\u004f\u004f"
FOO
js> XML_NAME.test(myTextVar)
true
js> myTextVar = "\u7000\u5000\u5122";
瀀倀儢
js> XML_NAME.test(myTextVar)
false

I think this is fine for a follow up if it's not a regression.
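
For that follow-up, the name pattern could be widened to the BMP ranges of XML 1.0's NameStartChar/NameChar productions. A sketch (not spec-complete: the astral range U+10000-U+EFFFF and the colon are omitted; the names here are illustrative, not from the patch):

```javascript
// The ASCII-only pattern from the patch, for comparison:
var XML_NAME_ASCII = /^[A-Za-z_][0-9A-Za-z.\-_]*$/;

// BMP ranges of XML 1.0 NameStartChar, built as a string so the
// resulting character class stays readable.
var NAME_START = "A-Za-z_" +
  "\\u00C0-\\u00D6\\u00D8-\\u00F6\\u00F8-\\u02FF\\u0370-\\u037D" +
  "\\u037F-\\u1FFF\\u200C-\\u200D\\u2070-\\u218F\\u2C00-\\u2FEF" +
  "\\u3001-\\uD7FF\\uF900-\\uFDCF\\uFDF0-\\uFFFD";
// NameChar adds digits, ".", "-", middle dot, and combining ranges.
var NAME_CHAR = NAME_START + "0-9.\\-\\u00B7\\u0300-\\u036F\\u203F-\\u2040";
var XML_NAME_UNICODE = new RegExp("^[" + NAME_START + "][" + NAME_CHAR + "]*$");

console.log(XML_NAME_ASCII.test("\u7000\u5000\u5122"));   // false
console.log(XML_NAME_UNICODE.test("\u7000\u5000\u5122")); // true
```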


>+
>+  r = LiteralCRP.exec(aObject);
>+  if (r != null) {
>+    if (r[2] == "^^") {
>+      return new RDFTypedLiteral(untripleString(r[1]), untripleString(r[3]));
>+    }
>+    return new RDFLiteral(untripleString(r[1]), untripleString(r[4]));
>+  }

This stuff is hard to understand. How about descriptive names for the matches?

  if (r != null) {
    var [,foo,bar,baz,qux] = r; //maybe we can use JS 1.7?
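
To make the suggestion concrete, here is a runnable sketch of destructuring the match groups. The regex is a simplified stand-in for the patch's LiteralCRP (the real one handles N-Triples escape sequences), and the group names are illustrative:

```javascript
// Simplified stand-in for LiteralCRP: "lexical"^^<datatype> or "lexical"@lang.
var LITERAL_RE = /^"([^"]*)"(?:(\^\^)<([^>]*)>|@([A-Za-z][A-Za-z0-9-]*))?$/;

function parseLiteral(token) {
  var r = LITERAL_RE.exec(token);
  if (r === null)
    return null;
  // Skip the full match; name the groups instead of indexing r[1]..r[4].
  var [, lexicalForm, typeMarker, datatypeURI, langTag] = r;
  if (typeMarker === "^^")
    return { value: lexicalForm, datatype: datatypeURI };
  return { value: lexicalForm, lang: langTag };
}
```

With descriptive names, the `r[2] == "^^"` branch in the original reads as "a datatype marker was present" at a glance.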


>+  toNTriples: function rdfs_toNTriples()
>+  {
>+    var lines = [];
>+    for (var subj in this._forward) {
>+      var subjobj = this._forward[subj];
>+      for (var pred in subjobj) {
>+        var predarray = subjobj[pred];
>+        for each (var obj in predarray) {
>+          lines.push(subj + "\t" + pred + "\t" + obj + " .");
>+        }
>+      }
>+    }
>+    lines.sort();
>+    return lines.join("\n") + "\n";
>+  },

nit: Nested for-loops like this are suspect in JS1.7. You could possibly tighten 
this up by getting rid of the key/value assignment stuff and then perhaps look at 
transforming to one or two array comprehensions.

// not tested
for (var [subj, subobj] in Iterator(this._forward))
  for(var [predicate,predarray] in Iterator(subobj))
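
For reference, the same traversal is straightforward in standard JavaScript without JS1.7 `Iterator`; this sketch assumes `_forward` has the `{ subject: { predicate: [objects] } }` shape used in the patch:

```javascript
// Standard-JS sketch of toNTriples over a { subj: { pred: [obj, ...] } } map.
function toNTriples(forward) {
  var lines = [];
  for (var subj of Object.keys(forward)) {
    for (var pred of Object.keys(forward[subj])) {
      for (var obj of forward[subj][pred]) {
        lines.push(subj + "\t" + pred + "\t" + obj + " .");
      }
    }
  }
  // N-Triples output is sorted so serializations compare stably in tests.
  lines.sort();
  return lines.join("\n") + "\n";
}
```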


>+
>+  // @param aParam either a string to be parsed directly, or a nsIFile
>+  //               which is read and then parsed
>+  parseNTriples: function rdfs_parseNTriples(aParam)
>+  {
>+    if (aParam instanceof Ci.nsIFile) {
>+      var fis = Cc["@mozilla.org/network/file-input-stream;1"].
>+        createInstance(Ci.nsIFileInputStream);
>+      fis.init(aParam, 0x01, 0664, 0);

nit: these flags should be descriptive consts.
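
For instance (assuming the 0x01 here is NSPR's PR_RDONLY open flag, as in other Mozilla JS, and 0664 is a Unix permission mode):

```javascript
// Named equivalents of the magic numbers in fis.init(aParam, 0x01, 0664, 0).
const PR_RDONLY  = 0x01;   // NSPR open flag: read-only
const PERMS_FILE = 0o664;  // rw-rw-r--; the patch uses the older 0664 syntax

// fis.init(aParam, PR_RDONLY, PERMS_FILE, 0);
```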

>+      var sis = Cc["@mozilla.org/scriptableinputstream;1"].
>+        createInstance(Ci.nsIScriptableInputStream);
>+      sis.init(fis);
>+
>+      var data = sis.read(sis.available());
>+      this._parseNTriplesFromString(data);
>+    }
>+    else {
>+      this._parseNTriplesFromString(aParam.toString());
>+    }
>+  },

Shouldn't this close the stream in a finally block? 

Will this always read the whole file? (I don't know)

>+
>+  _parseNTriplesFromString: function rdfs_parsentriplesfromstring(aString)
>+  {
>+    this.clear();
>+
>+    var lines = aString.split(/[\n\r]+/);
>+    for each (var line in lines) {
>+      if (COMMENT_OR_EMPTY.test(line))
>+        continue;
>+
>+      var parts = splitNTriple(line, this);
>+      this.addTriple(parts[0], parts[1], parts[2]);
>+    }
>+  },

nit: this stuff can be made easier to follow by using JS features

// not tested
var lines = aString.split(/[\n\r]+/)
                   .filter(function(line) { return !COMMENT_OR_EMPTY.test(line); });
var partslist = [splitNTriple(line, this) for each (line in lines)];

// use loops when there are side-effects
for each (var parts in partslist) { 
  this.addTriple(parts[0], parts[1], parts[2]);
}
 
>+
>+  setSAXHandlers: function(aSAXReader)

This is a judgement call, and I think whichever you prefer is ok. SAX handlers
usually aren't swapped out like this unless they are intended to be composed
in different ways (like one handler calls the next). So, startElement would
usually have something like this:

if (this.state == IN_PROLOG) {
 ...
} else if (this.state == IN_DOC)
 ...
 
etc. You are holding a lot of state in the handlers though, so I can see why 
the current code might be better.

xml:lang and xml:base are easier to handle with the switch-style handlers,
because you can keep sparse arrays keyed to depth for the context of each.
push() on startElement and pop() on closeElement()
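
A minimal sketch of that technique, with a hypothetical helper (the name and shape are not from the patch); the same structure works for xml:base:

```javascript
// Tracks xml:lang context per element depth: push on startElement,
// pop on endElement, so current() always reflects the nearest ancestor.
function LangTracker() {
  this._stack = [];
}
LangTracker.prototype = {
  // xmlLang is the element's xml:lang attribute value, or null if absent.
  startElement: function (xmlLang) {
    var inherited = this.current();
    this._stack.push(xmlLang !== null ? xmlLang : inherited);
  },
  endElement: function () {
    this._stack.pop();
  },
  current: function () {
    return this._stack.length ? this._stack[this._stack.length - 1] : null;
  }
};
```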

r=sayrer with style nits fixed. the rest is up to you.
Attachment #242766 - Flags: review?(sayrer) → review+
(In reply to comment #6)
> Shouldn't this close the stream in a finally block? 

In fact this never closes the file; you have to read beyond the end of the file to get auto-closure, as I assume the constant arguments specify but am too lazy to double-check.

> Will this always read the whole file? (I don't know)

According to the contract of the API, no; it would be entirely free to return data in 5-byte chunks until it hit the end of the file if it wanted.  There's also the rather-meaningless problem of the file being greater than 2**32 bytes in size.  The in-tree implementation returns the number of bytes available to be read beyond the current file pointer location; I doubt that's guaranteed to be the entire file, but it typically would be where this would be used, I expect.
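
Given that contract, the robust pattern is to loop until available() reports no more data and close in a finally block. A sketch against a mock with the same read/available/close shape as nsIScriptableInputStream (the helper names are illustrative):

```javascript
// Reads the whole stream even if available() only reports a partial count,
// and guarantees the stream is closed on error.
function readAll(stream) {
  var data = "";
  try {
    var n;
    while ((n = stream.available()) > 0) {
      data += stream.read(n);
    }
  } finally {
    stream.close();
  }
  return data;
}

// Mock stream that hands out data in 5-byte chunks, as the API contract
// would permit a real implementation to do.
function makeChunkedStream(text) {
  var pos = 0, closed = false;
  return {
    available: function () { return Math.min(5, text.length - pos); },
    read: function (n) { var s = text.substr(pos, n); pos += n; return s; },
    close: function () { closed = true; },
    get closed() { return closed; }
  };
}
```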
Comment on attachment 242769 [details] [diff] [review]
Make the chrome registry use the rdf-compat parser, rev. 1

>? chrome/src/importer-test.js

There are probably a few tests to write for this, especially getResourceName.

>+
>+function getResourceName(resource)
>+{
>+  if (!resource instanceof RDFParser.RDFNode)
>+    throw Error("Unexpected value passed as resource.");
>+
>+  var urispec = resource.uri();
>+  return /:([^:]+)$/.exec(urispec)[1];
>+}

Use nsIURI to get the pieces you need.
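
Note also that the patch's guard `!resource instanceof RDFParser.RDFNode` parses as `(!resource) instanceof RDFParser.RDFNode`, which is always false, so the check never fires. A sketch with the precedence fixed and the regex result checked before use (RDFNode here stands in for RDFParser.RDFNode):

```javascript
// Extracts the final segment after the last colon, e.g.
// "urn:mozilla:package:browser" -> "browser".
function getResourceName(resource, RDFNode) {
  // Parenthesize: "!resource instanceof X" would negate first.
  if (!(resource instanceof RDFNode))
    throw Error("Unexpected value passed as resource.");

  var urispec = resource.uri();
  var match = /:([^:]+)$/.exec(urispec);
  if (match === null)
    throw Error("URI has no name segment: " + urispec);
  return match[1];
}
```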

>+
>+    var fis = Cc["@mozilla.org/network/file-input-stream;1"].
>+      createInstance(Ci.nsIFileInputStream);
>+    fis.init(aInstalledFile, 0x01, 0664, 0);

nit: name the flags

>+    var sis = Cc["@mozilla.org/scriptableinputstream;1"].
>+      createInstance(Ci.nsIScriptableInputStream);
>+    sis.init(fis);
>+
>+    var data = sis.read(sis.available());

close the stream, make sure to read the whole stream

r=sayrer with that
Attachment #242769 - Flags: review?(sayrer) → review+
Priority: -- → P3
Assignee: benjamin → nobody
Blocks: 833098
Duplicate of this bug: 857454
Is this WONTFIX? Given the remaining usages of RDF, the solutions are either to convert them to JSON or to remove the usages.

I don't think adding a compat parser is the way to go though.
Flags: needinfo?(mak77)
(In reply to Tim Nguyen :ntim from comment #10)
> Is this WONTFIX? Given the remaining usages of RDF, the solutions are
> either to convert them to JSON or to remove the usages.

I agree, but the last call should be by the add-ons team since the only remaining usages are install.rdf and update.rdf, so forwarding to jorgev. The last thing I heard was that we planned to completely move to json in 2018, if so a reader would not be useful.
Flags: needinfo?(mak77) → needinfo?(jorge)
install.rdf is going away with legacy add-ons, so we can drop support for it later this year. update.rdf already has a JSON alternative (https://developer.mozilla.org/en-US/Add-ons/Updates), but we need to move AMO to start using it (https://github.com/mozilla/addons-server/issues/7223) and then give non-AMO developers time to migrate to the new format.

Regardless, we're moving away from RDF and at this point I don't think it's worth working on a new parser.
Status: NEW → RESOLVED
Closed: 4 years ago
Flags: needinfo?(jorge)
Resolution: --- → WONTFIX
No longer blocks: 274068
Product: Core → Core Graveyard