rdf/xml serializer needs to be less memory consuming

RESOLVED FIXED

Status

()

Core
RDF
RESOLVED FIXED
13 years ago
12 years ago

People

(Reporter: Axel Hecht, Assigned: Axel Hecht)

Tracking

Trunk
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 2 obsolete attachments)

(Assignee)

Description

13 years ago
While trying to whack rdf/xml parser, I found out that rdfcat runs out of stack
on my testcases. That's evil.
But there are quite a few things we can do to the serializer to get it less
abusive.
The first thing is to get rid of the prefix atoms. We want to have nsC(A)Strings
in AddNamespace, too.
And then I want a hashtable that maps property and type nsIRDFResources to
qnames. Note that we call MakeQName with a aNamespaceURI all the time, though
we never ever use it.
(Assignee)

Comment 1

13 years ago
Created attachment 161142 [details] [diff] [review]
add rdfIDataSource and rdfITripleVisitor, start of a test to serialize NTriples

One of the big hogs in rdf serializing is GetAllResources, we call that twice, 

and each time we allocate a complete array of all Source/Subject resources.
Thus I'd like to prepend some parts of our new API to the actual serializer
patch.

To check that it actually does something usefull, I started an NTriples 
serializer. Thinking about it, this might be the easiest thing to incorporate 
w3c tests, as a sort/diff on NTriples should give better comparison data. And 
the diff can then be checked for anonymous resources by human.
(Assignee)

Comment 2

13 years ago
Comment on attachment 161142 [details] [diff] [review]
add rdfIDataSource and rdfITripleVisitor, start of a test to serialize NTriples

Asking bsmedberg for review first, these are his interfaces for the most part.
Attachment #161142 - Flags: review?(bsmedberg)

Comment 3

13 years ago
Comment on attachment 161142 [details] [diff] [review]
add rdfIDataSource and rdfITripleVisitor, start of a test to serialize NTriples

>Index: base/idl/rdfIDataSource.idl
>+     * @note Implementations are not required to implement this method,
>+     * though this method is used for RDF/XML serialisation and may throw

How about "Implementations may throw NS_ERROR_NOT_IMPLEMENTED for this method,
but in this case RDF serializations of this datasource will not be possible."

>Index: base/idl/rdfITripleVisitor.idl
>+    void visit(in nsIRDFResource aSubject, in nsIRDFResource aPredicate,
>+	       in nsIRDFNode aObject, in boolean aTruthValue);

nit, whitespace

>Index: base/src/nsInMemoryDataSource.cpp

>@@ -1017,6 +1023,9 @@
>     else if (aIID.Equals(NS_GET_IID(nsIRDFPurgeableDataSource))) {
>         *aResult = NS_STATIC_CAST(nsIRDFPurgeableDataSource*, this);
>     }
>+    else if (aIID.Equals(NS_GET_IID(rdfIDataSource))) {
>+        *aResult = NS_STATIC_CAST(rdfIDataSource*, this);
>+    }
>     else {
>         *aResult = nsnull;
>         return NS_NOINTERFACE;

Ook, while we're touching this code can't we switch to NS_IMPL_QUERYINTERACEx?

>+NS_IMETHODIMP
>+InMemoryDataSource::GetAllSubjects(rdfITripleVisitor *aVisitor)
>+{
>+    NS_AUTOLOCK(mLock);
>+
>+    // Enumerate all of our entries into an nsISupportsArray.
>+    PL_DHashTableEnumerate(&mForwardArcs, SubjectEnumerator, aVisitor);
>+
>+    return NS_OK;
>+} 

1) I'm hazy on the locking semantic of the in-memory datasource, since
resources and literal are flagrantly not thread-safe, but it seems like a very
bad idea to be holding onto a lock while calling out to other XPCOM modules. Is
there any code that actually relies on this stuff being threadsafe, or can we
get rid of the locking stuff altogether?

2) You are eating exceptions thrown during visitation. It wouldn't be hard to
make a closure structure:

struct VisitorClosure {
  rdfITripleVisitor* visitor;
  nsresult	     rv;
};

and throw exceptions back to the original caller.

>Index: tests/triplescat/Makefile.in

did you copy this license from somewhere? I'm a bit confused what this code is.

>Index: tests/triplescat/triplescat.cpp

>+NS_IMETHODIMP
>+SubjectVisitor::Visit(nsIRDFResource *aSubject, nsIRDFResource *aPredicate,
>+                      nsIRDFNode *aObject, PRBool aTruthValue)
>+{
>+    const char* ref;
>+    aSubject->GetValueConst(&ref);
>+    printf("%s\n", ref);
>+    return NS_OK;
>+}

So, this doesn't actually produce n-triples, just a list of subjects. Is that
what you intended?

>+    // And this should write it back out. The do_QI() on the pointer
>+    // is a hack to make sure that the new object gets AddRef()-ed.
>+    nsCOMPtr<nsIOutputStream> out =
>+        do_QueryInterface(new ConsoleOutputStreamImpl, &rv);

That is totally bogus. nsCOMPtr<nsIOutputStream> out = new
ConsoleOutputStreamImpl(); should work just fine.
Attachment #161142 - Flags: review?(bsmedberg) → review+
(Assignee)

Updated

13 years ago
Blocks: 278560
(Assignee)

Updated

13 years ago
No longer blocks: 278560
(Assignee)

Updated

13 years ago
Depends on: 291123
(Assignee)

Comment 4

13 years ago
Created attachment 186045 [details] [diff] [review]
pretty final, I guess

Attaching patch, so I don't need to put on the USB stick.

This patch whacks the serializer to 
- use a hash resource->QName instead of resolving them over and over
- use the new APIs for the hashing phase, getting rid of one GetAllSubjects
- nuke default namespaces, that way I can use QNames for both elements and 
  attributes alike
- be twice as fast for my localstore.rdf

I didn't fix all the other bugs, but I did some regression tests on the W3C RDF

testcases (scary results, but no regressions).
Gonna do a trunk build and see if I can do a nice localstore goodness, smoking
some test.
(Assignee)

Comment 5

13 years ago
Comment on attachment 186045 [details] [diff] [review]
pretty final, I guess

Did some localstore.rdf smoketesting, everything works just fine.

(Tested persistance of window sizes and collapse/expand state in help viewer.)
Attachment #186045 - Flags: review?(benjamin)

Comment 6

13 years ago
Comment on attachment 186045 [details] [diff] [review]
pretty final, I guess

Need to document whether rdfISerializer.serialize is sync or async.

I'm intrigued why you're using a success code for NS_RDF_STOP_VISIT instead of
NS_ERROR_ABORT. And you need to declare what haapens in case of other
exceptions.
Attachment #186045 - Flags: review?(benjamin) → review-
(Assignee)

Comment 7

13 years ago
I updated the http://wiki.mozilla.org/RDF:Interfaces page to reflect my current
state in the tree. Benjamin, do those look good?
(Assignee)

Comment 8

13 years ago
Created attachment 186459 [details] [diff] [review]
fixed doxygen docs in idls, same patch as 186045 otherwise

Fixed comments to match the wiki version benjamin and I created.
Attachment #161142 - Attachment is obsolete: true
Attachment #186045 - Attachment is obsolete: true
Attachment #186459 - Flags: review?(benjamin)

Updated

13 years ago
Attachment #186459 - Flags: review?(benjamin) → review+
(Assignee)

Updated

13 years ago
Attachment #186459 - Flags: superreview?(shaver)
(Assignee)

Comment 9

13 years ago
nit to self, I just adjusted NS_RDF_STOP_VISIT to use 4, 3 is 
NS_RDF_ASSERTION_REJECTED.
Comment on attachment 186459 [details] [diff] [review]
fixed doxygen docs in idls, same patch as 186045 otherwise

I'd like to see us use [function] on the visitor interfaces, if we can, so that
it's cleaner to call the visitation methods from JS.

sr=shaver contingent on a summary of a few interesting test scenarios for the
new code.
Attachment #186459 - Flags: superreview?(shaver) → superreview+
(Assignee)

Comment 11

13 years ago
(In reply to comment #10)
> (From update of attachment 186459 [details] [diff] [review] [edit])
> I'd like to see us use [function] on the visitor interfaces, if we can, so that
> it's cleaner to call the visitation methods from JS.

It already is, so this one is fine.

> sr=shaver contingent on a summary of a few interesting test scenarios for the
> new code.

Testplan:
- make sure that VisitAllTriples reports all assertions
 (done for in-memory-ds, thus rdfxml-ds, triple-serializer is the testbed here)
- make sure that VisitAllSubjects reports all subjects
 (done, as far as that is a subset of VisitAllTriples, though the code is not
  shared)
- make sure that we clean up correctly if a visitor call fails and report
  exceptions back to caller
 (this is pldhash code and nsCOMPtrs, return values are retained)
- make sure that we handle assertions being added/removed during the visit callback
 (handled, those are blocked)
- control memory and time usage during calls to VisitAllFoo, these should scale
  well
 (they do, O(N), as they just call into each hash once)

Note to self, this testplan should be on devmo, hopefully just a few days until
I'll joyfully play there.
(Assignee)

Comment 12

13 years ago
Comment on attachment 186459 [details] [diff] [review]
fixed doxygen docs in idls, same patch as 186045 otherwise

I'd like to land this for an aviary alpha, though it should be little risk.

Extensions may do funky things like serializing a self create nsIRDFDataSource
impl to RDF/XML, which would break, but that is all far from frozen and should
be just OK.
Attachment #186459 - Flags: approval1.8b3?

Updated

13 years ago
Attachment #186459 - Flags: approval1.8b3? → approval1.8b3+
(Assignee)

Comment 13

13 years ago
checked in.
Status: NEW → RESOLVED
Last Resolved: 13 years ago
Resolution: --- → FIXED
(Assignee)

Comment 14

13 years ago
I created http://wiki.mozilla.org/RDF:rdfIDataSource_tests until I start pulling
up devmo rdf docs.

Comment 15

12 years ago
Using Serialize() of nsIRDFXMLSource to serialize rdf:history is now broken in
Deer Park Alpha 2 for Mac (Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O;
en-US; rv:1.8b3) Gecko/20050712 Firefox/1.0+). It used to work in Alpha 1 or
Firefox 1.0.x.

var rdfhis
=Components.classes["@mozilla.org/rdf/rdf-service;1"].getService(Components.interfaces.nsIRDFService).GetDataSource("rdf:history");
var serializer=Components.classes["@mozilla.org/rdf/xml-serializer;1"]
.createInstance(Components.interfaces.nsIRDFXMLSerializer);
serializer.init(rdfhis);
var outputStream={ data: "", close : function(){}, flush : function(){},write:
function (buffer,count){this.data += buffer; return count; }, writeFrom :
function (stream,count){}, isNonBlocking: false};
serializer.QueryInterface(Components.interfaces.nsIRDFXMLSource);
// Just as describe in http://www.xulplanet.com/tutorials/mozsdk/rdfsave.php
serializer.Serialize(outputStream);

Error: uncaught exception: [Exception... "Component returned failure code:
0x80004005 (NS_ERROR_FAILURE) [nsIRDFXMLSource.Serialize]"  nsresult:
"0x80004005 (NS_ERROR_FAILURE)"  location: "JS frame :: javascript: var rdfhis = ...

Or was this a kind of the "funky" serialization that was supposed to break?
(Assignee)

Comment 16

12 years ago
I'd call that one of the funky serialization questions that are not guaranteed to
work at all times. Unless I get a hard usecase, I don't intend to keep history
on the bleeding edge of things, but rather fix it once the new API has settled
on the core RDF datasources.

Comment 17

12 years ago
Hey, 

Is there some other call that I can make? I'm desperate to convert my history because it happens to contain some important data that got accidently wiped - about 3 months of ongoing hours spent on a project, which is very helpful.

Thanks,

Mark

(In reply to comment #16)
> I'd call that one of the funky serialization questions that are not guaranteed
> to
> work at all times. Unless I get a hard usecase, I don't intend to keep history
> on the bleeding edge of things, but rather fix it once the new API has settled
> on the core RDF datasources.

Comment 18

12 years ago
There is no feature work going on in RDF land right now, and history is moving
away from RDF anyway. Thus this issue will get fixed eventually without RDF being
involved.

In the meantime, you could just create an RDF/XML DataSource and copy all arcs 
from the history datasource over.
You need to log in before you can comment on or make changes to this bug.