Closed
Bug 259119
Opened 20 years ago
Closed 19 years ago
rdf/xml serializer needs to be less memory consuming
Categories
(Core Graveyard :: RDF, defect)
Core Graveyard
RDF
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: axel, Assigned: axel)
References
Details
Attachments
(1 file, 2 obsolete files)
57.48 KB,
patch
|
benjamin
:
review+
shaver
:
superreview+
benjamin
:
approval1.8b3+
|
Details | Diff | Splinter Review |
While trying to whack rdf/xml parser, I found out that rdfcat runs out of stack
on my testcases. That's evil.
But there are quite a few things we can do to the serializer to get it less
abusive.
The first thing is to get rid of the prefix atoms. We want to have nsC(A)Strings
in AddNamespace, too.
And then I want a hashtable that maps property and type nsIRDFResources to
qnames. Note that we call MakeQName with a aNamespaceURI all the time, though
we never ever use it.
Assignee | ||
Comment 1•20 years ago
|
||
One of the big hogs in rdf serializing is GetAllResources, we call that twice,
and each time we allocate a complete array of all Source/Subject resources.
Thus I'd like to prepend some parts of our new API to the actual serializer
patch.
To check that it actually does something usefull, I started an NTriples
serializer. Thinking about it, this might be the easiest thing to incorporate
w3c tests, as a sort/diff on NTriples should give better comparison data. And
the diff can then be checked for anonymous resources by human.
Assignee | ||
Comment 2•20 years ago
|
||
Comment on attachment 161142 [details] [diff] [review]
add rdfIDataSource and rdfITripleVisitor, start of a test to serialize NTriples
Asking bsmedberg for review first, these are his interfaces for the most part.
Attachment #161142 -
Flags: review?(bsmedberg)
Comment 3•20 years ago
|
||
Comment on attachment 161142 [details] [diff] [review]
add rdfIDataSource and rdfITripleVisitor, start of a test to serialize NTriples
>Index: base/idl/rdfIDataSource.idl
>+ * @note Implementations are not required to implement this method,
>+ * though this method is used for RDF/XML serialisation and may throw
How about "Implementations may throw NS_ERROR_NOT_IMPLEMENTED for this method,
but in this case RDF serializations of this datasource will not be possible."
>Index: base/idl/rdfITripleVisitor.idl
>+ void visit(in nsIRDFResource aSubject, in nsIRDFResource aPredicate,
>+ in nsIRDFNode aObject, in boolean aTruthValue);
nit, whitespace
>Index: base/src/nsInMemoryDataSource.cpp
>@@ -1017,6 +1023,9 @@
> else if (aIID.Equals(NS_GET_IID(nsIRDFPurgeableDataSource))) {
> *aResult = NS_STATIC_CAST(nsIRDFPurgeableDataSource*, this);
> }
>+ else if (aIID.Equals(NS_GET_IID(rdfIDataSource))) {
>+ *aResult = NS_STATIC_CAST(rdfIDataSource*, this);
>+ }
> else {
> *aResult = nsnull;
> return NS_NOINTERFACE;
Ook, while we're touching this code can't we switch to NS_IMPL_QUERYINTERACEx?
>+NS_IMETHODIMP
>+InMemoryDataSource::GetAllSubjects(rdfITripleVisitor *aVisitor)
>+{
>+ NS_AUTOLOCK(mLock);
>+
>+ // Enumerate all of our entries into an nsISupportsArray.
>+ PL_DHashTableEnumerate(&mForwardArcs, SubjectEnumerator, aVisitor);
>+
>+ return NS_OK;
>+}
1) I'm hazy on the locking semantic of the in-memory datasource, since
resources and literal are flagrantly not thread-safe, but it seems like a very
bad idea to be holding onto a lock while calling out to other XPCOM modules. Is
there any code that actually relies on this stuff being threadsafe, or can we
get rid of the locking stuff altogether?
2) You are eating exceptions thrown during visitation. It wouldn't be hard to
make a closure structure:
struct VisitorClosure {
rdfITripleVisitor* visitor;
nsresult rv;
};
and throw exceptions back to the original caller.
>Index: tests/triplescat/Makefile.in
did you copy this license from somewhere? I'm a bit confused what this code is.
>Index: tests/triplescat/triplescat.cpp
>+NS_IMETHODIMP
>+SubjectVisitor::Visit(nsIRDFResource *aSubject, nsIRDFResource *aPredicate,
>+ nsIRDFNode *aObject, PRBool aTruthValue)
>+{
>+ const char* ref;
>+ aSubject->GetValueConst(&ref);
>+ printf("%s\n", ref);
>+ return NS_OK;
>+}
So, this doesn't actually produce n-triples, just a list of subjects. Is that
what you intended?
>+ // And this should write it back out. The do_QI() on the pointer
>+ // is a hack to make sure that the new object gets AddRef()-ed.
>+ nsCOMPtr<nsIOutputStream> out =
>+ do_QueryInterface(new ConsoleOutputStreamImpl, &rv);
That is totally bogus. nsCOMPtr<nsIOutputStream> out = new
ConsoleOutputStreamImpl(); should work just fine.
Attachment #161142 -
Flags: review?(bsmedberg) → review+
Assignee | ||
Comment 4•19 years ago
|
||
Attaching patch, so I don't need to put on the USB stick.
This patch whacks the serializer to
- use a hash resource->QName instead of resolving them over and over
- use the new APIs for the hashing phase, getting rid of one GetAllSubjects
- nuke default namespaces, that way I can use QNames for both elements and
attributes alike
- be twice as fast for my localstore.rdf
I didn't fix all the other bugs, but I did some regression tests on the W3C RDF
testcases (scary results, but no regressions).
Gonna do a trunk build and see if I can do a nice localstore goodness, smoking
some test.
Assignee | ||
Comment 5•19 years ago
|
||
Comment on attachment 186045 [details] [diff] [review]
pretty final, I guess
Did some localstore.rdf smoketesting, everything works just fine.
(Tested persistance of window sizes and collapse/expand state in help viewer.)
Attachment #186045 -
Flags: review?(benjamin)
Comment 6•19 years ago
|
||
Comment on attachment 186045 [details] [diff] [review]
pretty final, I guess
Need to document whether rdfISerializer.serialize is sync or async.
I'm intrigued why you're using a success code for NS_RDF_STOP_VISIT instead of
NS_ERROR_ABORT. And you need to declare what haapens in case of other
exceptions.
Attachment #186045 -
Flags: review?(benjamin) → review-
Assignee | ||
Comment 7•19 years ago
|
||
I updated the http://wiki.mozilla.org/RDF:Interfaces page to reflect my current
state in the tree. Benjamin, do those look good?
Assignee | ||
Comment 8•19 years ago
|
||
Fixed comments to match the wiki version benjamin and I created.
Attachment #161142 -
Attachment is obsolete: true
Attachment #186045 -
Attachment is obsolete: true
Attachment #186459 -
Flags: review?(benjamin)
Updated•19 years ago
|
Attachment #186459 -
Flags: review?(benjamin) → review+
Assignee | ||
Updated•19 years ago
|
Attachment #186459 -
Flags: superreview?(shaver)
Assignee | ||
Comment 9•19 years ago
|
||
nit to self, I just adjusted NS_RDF_STOP_VISIT to use 4, 3 is
NS_RDF_ASSERTION_REJECTED.
Comment 10•19 years ago
|
||
Comment on attachment 186459 [details] [diff] [review]
fixed doxygen docs in idls, same patch as 186045 otherwise
I'd like to see us use [function] on the visitor interfaces, if we can, so that
it's cleaner to call the visitation methods from JS.
sr=shaver contingent on a summary of a few interesting test scenarios for the
new code.
Attachment #186459 -
Flags: superreview?(shaver) → superreview+
Assignee | ||
Comment 11•19 years ago
|
||
(In reply to comment #10)
> (From update of attachment 186459 [details] [diff] [review] [edit])
> I'd like to see us use [function] on the visitor interfaces, if we can, so that
> it's cleaner to call the visitation methods from JS.
It already is, so this one is fine.
> sr=shaver contingent on a summary of a few interesting test scenarios for the
> new code.
Testplan:
- make sure that VisitAllTriples reports all assertions
(done for in-memory-ds, thus rdfxml-ds, triple-serializer is the testbed here)
- make sure that VisitAllSubjects reports all subjects
(done, as far as that is a subset of VisitAllTriples, though the code is not
shared)
- make sure that we clean up correctly if a visitor call fails and report
exceptions back to caller
(this is pldhash code and nsCOMPtrs, return values are retained)
- make sure that we handle assertions being added/removed during the visit callback
(handled, those are blocked)
- control memory and time usage during calls to VisitAllFoo, these should scale
well
(they do, O(N), as they just call into each hash once)
Note to self, this testplan should be on devmo, hopefully just a few days until
I'll joyfully play there.
Assignee | ||
Comment 12•19 years ago
|
||
Comment on attachment 186459 [details] [diff] [review]
fixed doxygen docs in idls, same patch as 186045 otherwise
I'd like to land this for an aviary alpha, though it should be little risk.
Extensions may do funky things like serializing a self create nsIRDFDataSource
impl to RDF/XML, which would break, but that is all far from frozen and should
be just OK.
Attachment #186459 -
Flags: approval1.8b3?
Updated•19 years ago
|
Attachment #186459 -
Flags: approval1.8b3? → approval1.8b3+
Assignee | ||
Comment 13•19 years ago
|
||
checked in.
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 14•19 years ago
|
||
I created http://wiki.mozilla.org/RDF:rdfIDataSource_tests until I start pulling
up devmo rdf docs.
Comment 15•19 years ago
|
||
Using Serialize() of nsIRDFXMLSource to serialize rdf:history is now broken in
Deer Park Alpha 2 for Mac (Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O;
en-US; rv:1.8b3) Gecko/20050712 Firefox/1.0+). It used to work in Alpha 1 or
Firefox 1.0.x.
var rdfhis
=Components.classes["@mozilla.org/rdf/rdf-service;1"].getService(Components.interfaces.nsIRDFService).GetDataSource("rdf:history");
var serializer=Components.classes["@mozilla.org/rdf/xml-serializer;1"]
.createInstance(Components.interfaces.nsIRDFXMLSerializer);
serializer.init(rdfhis);
var outputStream={ data: "", close : function(){}, flush : function(){},write:
function (buffer,count){this.data += buffer; return count; }, writeFrom :
function (stream,count){}, isNonBlocking: false};
serializer.QueryInterface(Components.interfaces.nsIRDFXMLSource);
// Just as describe in http://www.xulplanet.com/tutorials/mozsdk/rdfsave.php
serializer.Serialize(outputStream);
Error: uncaught exception: [Exception... "Component returned failure code:
0x80004005 (NS_ERROR_FAILURE) [nsIRDFXMLSource.Serialize]" nsresult:
"0x80004005 (NS_ERROR_FAILURE)" location: "JS frame :: javascript: var rdfhis = ...
Or was this a kind of the "funky" serialization that was supposed to break?
Assignee | ||
Comment 16•19 years ago
|
||
I'd call that one of the funky serialization questions that are not guaranteed to
work at all times. Unless I get a hard usecase, I don't intend to keep history
on the bleeding edge of things, but rather fix it once the new API has settled
on the core RDF datasources.
Comment 17•19 years ago
|
||
Hey,
Is there some other call that I can make? I'm desperate to convert my history because it happens to contain some important data that got accidently wiped - about 3 months of ongoing hours spent on a project, which is very helpful.
Thanks,
Mark
(In reply to comment #16)
> I'd call that one of the funky serialization questions that are not guaranteed
> to
> work at all times. Unless I get a hard usecase, I don't intend to keep history
> on the bleeding edge of things, but rather fix it once the new API has settled
> on the core RDF datasources.
Comment 18•19 years ago
|
||
There is no feature work going on in RDF land right now, and history is moving
away from RDF anyway. Thus this issue will get fixed eventually without RDF being
involved.
In the meantime, you could just create an RDF/XML DataSource and copy all arcs
from the history datasource over.
Updated•6 years ago
|
Product: Core → Core Graveyard
You need to log in
before you can comment on or make changes to this bug.
Description
•