Closed Bug 265334 Opened 21 years ago Closed 21 years ago

Add API for observing all data coming through the parser.

Categories

(Core :: DOM: HTML Parser, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: jst, Assigned: jst)

Details

(Keywords: fixed-aviary1.0, fixed1.7.5)

Attachments

(2 files, 3 obsolete files)

Extension developers have shown interest in getting an API for observing all data that comes through the parser, i.e. all markup that's loaded in a browser. Adding such an API is fairly trivial and it'll go a long way for those who need it, and it seems worth the trouble to push in as a last minit API addition to Firefox 1.0.
Assignee: parser → jst
Comment on attachment 162809 [details] [diff] [review] Add nsIUnicharStreamListener, and make the parser notify registerd listeners through the new interface. A lot of this is adding nsIContentSink::GetDocument() since that's spread out all over, the interesting changes are in nsParser.cpp and nsScanner.cpp.
Attachment #162809 - Flags: superreview?(brendan)
Attachment #162809 - Flags: review?(darin)
Comment on attachment 162809 [details] [diff] [review] Add nsIUnicharStreamListener, and make the parser notify registerd listeners through the new interface. >Index: htmlparser/public/nsIContentSink.h >+ virtual nsIDocument *GetDocument()=0; maybe use NS_IMETHOD_ instead? (that way the same calling convention is used -- stdcall under Win32 is a good thing :-) NS_IMETHOD_(nsIDocument *) GetDocument()=0; >Index: htmlparser/src/nsParser.cpp >+static nsCOMArray<nsIUnicharStreamListener> *gParserDataObservers; >+ rv = cm->GetCategoryEntry("Parser data listener", categoryEntry.get(), >+ getter_Copies(contractId)); >+ NS_ENSURE_SUCCESS(rv, rv); maybe we should add a #define for this category name in a header file somewhere? >+ nsCOMPtr<nsIUnicharStreamListener> listener = >+ do_CreateInstance(contractId.get()); hmm... or, do_GetService? >+// static >+void nsParser::Shutdown() >+{ >+ FreeSharedObjects(); >+ >+ delete gParserDataObservers; > } it seems to be safe to call FreeSharedObjects more than once, do we want to null out gParserDataObservers here in case Shutdown (or Init) is ever called again? there was talk of restarting xpcom in process, so nulling out this var would probably be good. >+ // needs the parser is to call DataAvailable() on it, and that's >+ // only ever wanted when parsing from an URI. >+ theScanner->SetParser(this); /DataAvailable/DataAdded/ ? >Index: htmlparser/src/nsParser.h >+ void DataAdded(const nsASingleFragmentString& aData); use |nsSubstring| instead of |nsASingleFragmentString| since the latter is deprecated (though i don't think it is properly documented as such). >Index: netwerk/base/public/nsIUnicharStreamListener.idl >+ * The Initial Developer of the Original Code is >+ * Netscape Communications Corporation. >+ * Portions created by the Initial Developer are Copyright (C) 1998 >+ * the Initial Developer. All Rights Reserved. fixme :) >+/** >+ * nsIUnicharStreamListener >+ * >+ * @status FROZEN >+ */ Probably a good idea to give a brief description of this interface. r=me with nits picked
Attachment #162809 - Flags: review?(darin) → review+
> maybe use NS_IMETHOD_ instead? (that way the same calling > convention is used -- stdcall under Win32 is a good thing :-) oh.. hmm.. nevermind.. you'd be using thiscall under win32 which is better than stdcall because it passes the |this| pointer in a register.. nevermind me. perhaps all of nsIContentSink should be changed.. eventually :-/
Comment on attachment 162809 [details] [diff] [review] Add nsIUnicharStreamListener, and make the parser notify registerd listeners through the new interface. + * @status FROZEN that's a rather extreme case of a flash freeze, don't you think? nsIStreamListener uses a stream for the data, should this pass nsIUnicharInputStream? (which ought to be made scriptable) nsIUnicharStreamListener.idl should use MPL rather than NPL I'm not sure that making htmlparser depend on content is such a good idea... can you avoid that dependency by making GetDocument return an nsISupports*? Will this code be called for XML document as well? (like XHTML, for example)
(In reply to comment #5) > (From update of attachment 162809 [details] [diff] [review]) > + * @status FROZEN > > that's a rather extreme case of a flash freeze, don't you think? No, that's the whole point here, we want to expose a frozen interface that embedders/extensions can use, landing something thta's not frozen wouldn't cut it here. > nsIStreamListener uses a stream for the data, should this pass > nsIUnicharInputStream? (which ought to be made scriptable) From talking this over with Darin, we decided to use AString here instead. > nsIUnicharStreamListener.idl should use MPL rather than NPL Good point. Fixed. > I'm not sure that making htmlparser depend on content is such a good idea... > can you avoid that dependency by making GetDocument return an nsISupports*? I could, but why? I don't see a problem with the added dependency, it's not like the parser is independent of the content code in any way. And it depends on layout too, for that matter :) > Will this code be called for XML document as well? (like XHTML, for example) Yes, XML, XHTML, RDF, you name it. All that comes through necko to the parser will go to these observers, if any...
1) You need to document what document it is GetDocument() returns. For some sinks it's non-obvious. 2) On trunk you need to patch the XML fragment sink 3) I agree with biesi that freezing an interface before anyone has had a chance to try using it and uncovering issues seems hasty. 4) The documentation in the interface doesn't indicate whether the context can be null (it can). 5) + * An exception thrown from onUnicharDataAvailable has the + * side-effect of causing the request to be canceled. is not true in the patch as it stands. In general the API looks ok, I guess...
we get the biggest benefit on this if it gets on the aviary and 1.7 branches. need to convince ourselves that its ready for prime time tho....
Flags: blocking1.7.x?
Flags: blocking-aviary1.0?
> All that comes through necko to the parser will go to these observers, if any... ah, that it was in an "htmlparser" directory threw me off... >No, that's the whole point here, we want to expose a frozen interface that >embedders/extensions can use, landing something thta's not frozen wouldn't cut >it here. embeddors (and _much more_ extension authors, which you cite as the use case for this) are using lots of unfrozen interfaces today. I don't see why you need to freeze this one this fast.
Flags: blocking1.7.x?
Flags: blocking1.7.x+
Flags: blocking-aviary1.0?
Flags: blocking-aviary1.0+
Attachment #162940 - Flags: superreview?(brendan)
Attachment #162809 - Flags: superreview?(brendan)
Attachment #162940 - Flags: superreview?(brendan)
Attachment #162940 - Flags: superreview?(brendan)
Attachment #162940 - Flags: superreview?(brendan)
Attachment #162809 - Attachment is obsolete: true
Attachment #162927 - Attachment is obsolete: true
Attachment #162940 - Attachment is obsolete: true
Comment on attachment 162951 [details] [diff] [review] Even less string fu, and bail properly on errors from listeners. jst: thanks, looks good. Noting darin's r= here. biesi: You're right, but I would turn it around: we *should* have frozen faster, esp. small, well-factored, easily tested interfaces. See the asgard plugin bug. We're starting here, now. /be
Attachment #162951 - Flags: superreview+
Attachment #162951 - Flags: review+
Attachment #162951 - Flags: approval1.7.x+
Attachment #162951 - Flags: approval-aviary+
So where does the user get to define this context? For streamlistener, the context is typically the context the channel was opened with, which _is_ user-defined. The context here is not.
The user doesn't get to define the context, the context is the document, when available, or null if not. We just need to document that somewhere (not in the API though).
> The user doesn't get to define the context, the context is the document, when > available, or null if not. We just need to document that somewhere (not in the > API though). how about in a header file with a #define for the category name.
> The user doesn't get to define the context The api documentation in that patch says: + * @param aContext user defined context I realize there are cases when it _could_ be user-defined, but maybe we can say this better?
Attached patch Trunk version.Splinter Review
This is the same as the above diff, but for the trunk. This patch also includes parser/htmlparser/public/nsParserDataListener.h, an unused header that describes this API a bit. Feel free to comment. The header already landed on the branches too.
Status: NEW → ASSIGNED
shouldn't the new header file have an include guard? can you merge these two comments, for the benefit of tools like doxygen: +/** + * nsIUnicharStreamListener is very similar to nsIStreamListener with + * the difference being that this interface gives notifications about + * data being available after the raw data has been converted to + * UTF-16. The primary use for this interface is in the parser when it + * notifies parser data listeners about additional data being + * available. + */ + +/** + * nsIUnicharStreamListener + * + * @status FROZEN + */
oh, and this being a frozen interface, it should be part of SDK_XPIDLSRC, not XPIDLSRC. (netwerk/base/public/Makefile.in)
and finally: I'm not sure that an interface living in necko should mention use cases in parser... (sorry for being annoying)
All fixed, I believe (on branches etc). And landed on the trunk too.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Perhaps nsParserDataListener.h should have an "@status FROZEN" in the comments?
Done.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: