Last Comment Bug 337940 - Microsummary service should allow authors to dictate update frequency
: Microsummary service should allow authors to dictate update frequency
Status: RESOLVED FIXED
: fixed1.8.1
Product: Firefox Graveyard
Classification: Graveyard
Component: Microsummaries (show other bugs)
: 2.0 Branch
: All All
: P3 normal
: Firefox 2 beta2
Assigned To: Simon Bünzli
:
Mentors:
http://wiki.mozilla.org/Talk:Microsum...
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2006-05-14 13:02 PDT by Ben Basson
Modified: 2016-02-12 06:59 PST (History)
9 users (show)
mconnor: blocking‑firefox2+
See Also:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
proposal (11.03 KB, patch)
2006-06-20 19:14 PDT, Simon Bünzli
myk: review-
Details | Diff | Review
proposal (nits fixed) (10.80 KB, patch)
2006-07-10 17:04 PDT, Simon Bünzli
myk: review-
Details | Diff | Review
more nits fixed (10.81 KB, patch)
2006-07-14 16:52 PDT, Simon Bünzli
myk: review-
Details | Diff | Review
even more nits fixed (11.52 KB, patch)
2006-07-15 02:02 PDT, Simon Bünzli
myk: review-
Details | Diff | Review
still more nits fixed (and unbitrotted) (11.50 KB, patch)
2006-07-18 15:31 PDT, Simon Bünzli
myk: review+
mconnor: approval1.8.1+
Details | Diff | Review
patch checked into branch (interface GUIDs revved) (7.78 KB, patch)
2006-07-21 15:38 PDT, Myk Melez [:myk] [@mykmelez]
no flags Details | Diff | Review

Description Ben Basson 2006-05-14 13:02:15 PDT
Discussion: http://wiki.mozilla.org/Talk:Microsummaries

Some microsummaries will only need updating once per day, some will need updating a lot more frequently. 

Having the same update period for all microsummaries doesn't make much sense as some checks will occur too often (the page hasn't changed) and some may be too late (the page has changed several times but the summary did not reflect this).

At worst, it would be nice to have a single <update/> tag or similar, with a single integer value for seconds between updates.

Example:

  <update interval="14400"/>

At best, it would be great to have a conditional check, where (by way of using XPath value retrievals and XSLT boolean rules) a sliding scale can be implemented. Abstractly, this would be in something like:

Example: 

  <update interval="14400">
    <if (XPath boolean condition) interval="3600"/>
    <if (XPath boolean condition) interval="60"/>
  </update>

In this case, the update interval is 4 hours by default, then 1 hour and finally  1 minute, depending on which XPath conditions are satisfied.
Comment 1 Jonathan Haas 2006-05-14 13:36:24 PDT
I think we do not need a conditional check. This would probably be overkill.

And there should be a minimum interval. Nearly nobody wants his microsummary to refresh every 10 seconds even if it's specified in the file. IMHO the minimum should be something around 10 minutes by default and it should be adjustable with about:config.
Comment 2 Ben Basson 2006-05-14 18:06:08 PDT
(In reply to comment #1)
> I think we do not need a conditional check. This would probably be overkill.

Probably. The main idea of the conditional check is to better support auction-style scenarios and I can't think of a more elegant way of doing this.
Comment 3 Myk Melez [:myk] [@mykmelez] 2006-05-14 20:14:33 PDT
Agreed that we need a way for authors to specify the update frequency.  In addition to the auction use case, this functionality would also benefit stock quote microsummaries, which don't need to be updated at all while the market is closed.
Comment 4 Mike Connor [:mconnor] 2006-06-14 08:17:48 PDT
Ok, let's take this if we can, but I think this is lowest-priority of the list, but we can discuss today.
Comment 5 Simon Bünzli 2006-06-20 19:14:58 PDT
Created attachment 226449 [details] [diff] [review]
proposal

This patch implements the second suggested alternative: authors can either specify one fix interval with

  <update interval="14400"/>

or various intervals depending on the last downloaded version of a page with

  <update interval="14400">
    <if cond="<XPath boolean condition>" interval="3600"/>
    <if cond="<XPath boolean condition>" interval="60"/>
  </update>

All values are rounded to a precision of about CHECK_INTERVAL (currently 15 seconds) due to how the update checks are performed.

This patch adds a getUpdateInterval method to nsIMicrosummaryGenerator and an updateInterval attribute to nsIMicrosummary. Both interface changes could of course also be dropped.
Comment 6 Myk Melez [:myk] [@mykmelez] 2006-06-20 20:42:34 PDT
(In reply to comment #5)
> Created an attachment (id=226449) [edit]
> proposal
> 
> This patch implements the second suggested alternative:

I haven't had a chance to review this in-depth yet (will do so later tonight), but I wanted to mention that my primary concern is that we design this in a way that makes it possible to add time period-based conditions down the road.

For example, a NYSE stock quote page doesn't change when the market is closed, so it might specify that it should only be updated "between 9:30am and 4:30pm Eastern time Monday - Friday".  And we'd want these conditions to be negatable.

We might implement this via an XML version of the values used by cron (minute, hour, day of month, month, and day of week) to specify time points, but modified to specify time periods.  Or perhaps some other way.  I just want to make sure that whatever we implement here is extensible to other conditional types.
Comment 7 Myk Melez [:myk] [@mykmelez] 2006-06-21 05:19:53 PDT
Ok, I've taken a closer look at the proposal here, and it looks like it'll work fine and be extensible to time period-based conditions.  I'd just like to see two changes.  First, a terminology change that lets us expand the concept of the "condition" in the future to encompass time period-based conditions (and perhaps other types):

<update interval="">
  <condition expression="<xpath boolean expression>" interval="">
</update>

Second, we should be measuring intervals in minutes, not seconds, and not refresh microsummaries more than once a minute, because users aren't apt to need more frequent updates badly enough to justify the potential network load.
Comment 8 Myk Melez [:myk] [@mykmelez] 2006-07-10 14:30:30 PDT
Comment on attachment 226449 [details] [diff] [review]
proposal

>+    // allow the generators to set individual update values (even varying
>+    // depending on certain XPath conditions)
>+    var update = xml.getElementsByTagNameNS(MICSUM_NS, "update")[0] || null;
>+    if (update) {
>+      this._updateIntervals = new Array();
>+      this._updateIntervals.push({
>+        interval: parseInt(update.getAttribute("interval")) * 1000
>+      });

This code should account for the possibility that the <update> tag doesn't have an interval attribute.


>+  getUpdateInterval: function MSD_getUpdateInterval(pageContent) {

Nit: perhaps it would make sense to call this calculateUpdateInterval to distinguish it from a static getter.

This code should also incorporate the changes in comment 7.  Otherwise it looks good.  Thanks for this valuable fix!
Comment 9 Simon Bünzli 2006-07-10 17:04:15 PDT
Created attachment 228749 [details] [diff] [review]
proposal (nits fixed)
Comment 10 Myk Melez [:myk] [@mykmelez] 2006-07-14 16:16:22 PDT
Comment on attachment 228749 [details] [diff] [review]
proposal (nits fixed)

>+  _updateInterval: 0, // = use default

Wouldn't it be better to use null rather than zero to signify that the microsummary doesn't have a specified update interval?


>+    // allow the generators to set individual update values (even varying
>+    // depending on certain XPath expressions)
>+    var update = xml.getElementsByTagNameNS(MICSUM_NS, "update")[0] || null;

xml -> generatorNode


>+    if (update) {
>+      this._updateIntervals = new Array();
>+      this._updateIntervals.push({
>+        interval: update.hasAttribute("interval") ?
>+                  parseInt(update.getAttribute("interval")) * 60 * 1000 : 0
>+      });
>+      // collect the <if cond="XPath Expression" interval="time"/> clauses

This comment should be updated to reflect the new syntax.


>+  calculateUpdateInterval: function MSD_calculateUpdateInterval(pageContent) {
>+    if (this.content || !this._updateIntervals || !pageContent)
>+      return 0;
>+
>+    var doc = pageContent;
>+    for (var i = 1; i < this._updateIntervals.length; i++) {
>+      try {
>+        if (doc.evaluate(this._updateIntervals[i].expression, doc, null,
>+                         Ci.nsIDOMXPathResult.BOOLEAN_TYPE, null).booleanValue)
>+          return this._updateIntervals[i].interval;
>+      }

This code assumes all intervals will be accompanied by expressions, so it incorrectly evaluates the interval specified by the "interval" attribute to the <update> tag to false and ignores it.  It should take this interval into account.

Otherwise looks good!  I haven't tested expression-based intervals yet, but I'm whipping up a testcase for that.
Comment 11 Simon Bünzli 2006-07-14 16:52:05 PDT
Created attachment 229319 [details] [diff] [review]
more nits fixed

(In reply to comment #10)
> (From update of attachment 228749 [details] [diff] [review])
> >+  _updateInterval: 0, // = use default
> 
> Wouldn't it be better to use null rather than zero to signify that the
> microsummary doesn't have a specified update interval?

The reason for 0 instead of null was that the interface promises an integer value. For clarity I now convert null to an integer in the getter.

> >+    for (var i = 1; i < this._updateIntervals.length; i++) {
> >+      try {
> >+        if (doc.evaluate(this._updateIntervals[i].expression, doc, null,
> >+                         Ci.nsIDOMXPathResult.BOOLEAN_TYPE, null).booleanValue)
> >+          return this._updateIntervals[i].interval;
> >+      }
> 
> This code assumes all intervals will be accompanied by expressions, so it
> incorrectly evaluates the interval specified by the "interval" attribute to the
> <update> tag to false and ignores it.  It should take this interval into
> account.

The code was correct: it first iterated starting from index 1 and used the entry at index 0 (the interval of the update tag itself) as fall back. This should now be clearer (unless I completely misunderstood you).
Comment 12 Myk Melez [:myk] [@mykmelez] 2006-07-14 17:55:32 PDT
Comment on attachment 229319 [details] [diff] [review]
more nits fixed

> The reason for 0 instead of null was that the interface promises an integer
> value.

That's ok, you can return null for an integer XPCOM property, and using null allows us to differentiate between "the generator didn't specify an update interval" and "the generator specified the update interval 0" (which of course we would ignore because it would mean constantly reloading the page), so it'd be better to use null everywhere and have the calling code branch if it receives null (the branching shouldn't even need to change from the way you're doing it now).


> The code was correct: it first iterated starting from index 1 and used 
> the entry at index 0 (the interval of the update tag itself) as fall back.
> This should now be clearer (unless I completely misunderstood you).

Ah, sorry, I missed that.

Also, the code needs to make sure the interval is greater than or equal to 1.  Otherwise the generator could refresh faster than once per minute just by specifying a fractional refresh rate.

Besides the null and interval-greater-than-or-equal-to-1 issues, this code looks good, so I'm not sure why it's happening, but the microsummary service seems to be ignoring the intervals I specify.  I've been testing with two update sections:

<update interval="1"/>

<update>
  <condition expression="1 = 1" interval="1"/>
</update>

Both sections should produce the same result (an update interval of one minute), but neither does so.  Instead, the service falls back to the default interval.  I added some logging in _updateMicrosummary right before the setField call that updates the expiration time, and microsummary.updateInterval is always zero at that point:

  _updateMicrosummary: function MSS__updateMicrosummary(bookmarkID, microsummary) {
    this._setField(bookmarkID, FIELD_GENERATED_TITLE, microsummary.content);
LOG(microsummary.pageURI.spec + " updateInterval = " + microsummary.updateInterval);
    this._setField(bookmarkID, FIELD_MICSUM_EXPIRATION,
                   Date.now() + (microsummary.updateInterval || UPDATE_INTERVAL));

So there's something wrong, but I can't quite figure out what it is.

Note that there's work going on in bug 342044 to let users specify a global update interval via a preference.  That preference, like generator-specified intervals, takes minutes, and once both patches land, we're going to have to say something like:

 update interval = user pref ? <user value> 
                             : generator spec ? <generator value>
                                              : <default value>;

Given this, since both the user pref and the generator-specified values are in minutes, perhaps we should make the default value also be in minutes and then factor out the code for converting these values to milliseconds, reducing the risk of a mistake in the conversion code.
Comment 13 Simon Bünzli 2006-07-15 02:02:19 PDT
Created attachment 229343 [details] [diff] [review]
even more nits fixed

This patch returns null where no value has been specified, it makes sure that the time interval is at least one minute and - opposed to the previous patches - allows minute fractions (I previously used parseInt).

Then I've found one bug consisting of a pair of missing parentheses after a conditional (which IMO should be compulsory anyway - unfortunately mconnor still thinks otherwise) which not only broke custom updates but also adding text microsummaries.

As for the minutes vs. microseconds question: I'd internally rather use microseconds everywhere. You're usually already used to handle them for timers and intervals and have to convert them sooner or later anyways.

And as for letting the user specify an update interval: make sure not to add an official default value for that pref. Otherwise it might be somewhat counter-intuitive that specifically a pref value of 30 (which would probably be the default) allows authors to specify their own intervals. It might be better to introduce a second pref at this point: browser.microsummary.minimumUpdateInterval (or have that other pref behave like a minimum value).
Comment 14 Myk Melez [:myk] [@mykmelez] 2006-07-18 15:11:26 PDT
Comment on attachment 229343 [details] [diff] [review]
even more nits fixed

There's a conflict when attempting to apply this patch to the trunk, but it's trivial to resolve.

>-    var now = new Date().getTime();
>+    var now = new Date.now();

Date.now is a static method, so this should be:

  var now = Date.now();

Otherwise now will be an object, not an integer.  Also, it might be worth moving this variable declaration outside the loop so we don't reassign it to the current time for every loop cycle.


>   _content: null,
>   get content() {
>     // If we have everything we need to generate the content, generate it.
>     if (this._content == null &&
>         this.generator &&
>-        (this.pageContent || !this.generator.needsPageContent))
>+        (this.pageContent || !this.generator.needsPageContent)) {
>       this._content = this.generator.generateMicrosummary(this.pageContent);
>-  
>+      this.updateInterval = this.generator.calculateUpdateInterval(this.pageContent);
>+    }

I'm a bit concerned about this magical behavior of the content getter, but I don't have a better idea right now.


>+     * Calculate the interval until the microsummary should be updated for
>+     * the next time, depending on the page content. If the generator doesn't
>+     * specify an interval, a false value (null or 0) is returned.

It's not actually a false value, it's the null value, which happens to evaluate to false in a boolean context.  Since the code returns null if no interval was specified, and it returns 1 if an interval < 1 was specified, will it ever return zero?
Comment 15 Simon Bünzli 2006-07-18 15:31:24 PDT
Created attachment 229732 [details] [diff] [review]
still more nits fixed (and unbitrotted)

(In reply to comment #14)
> Date.now is a static method, so this should be:

Of course - just missed the "new". And it was already moved out of the loop.

> I'm a bit concerned about this magical behavior of the content getter, but I
> don't have a better idea right now.

It's not too magical. calculateUpdateInterval just has to be called whenever generateMicrosummary is called. One alternative coming to mind would be to pass the microsummary into generateMicrosummary so that that method can update the update interval. At least currently there are no other consumers of generateMicrosummary yet...

> Since the code returns null if no interval was
> specified, and it returns 1 if an interval < 1 was specified, will it ever
> return zero?

Indeed not. BTW: I adjusted the code so that it never returns NaN, either (could happen when passing a non-numeric string as interval value).
Comment 16 Myk Melez [:myk] [@mykmelez] 2006-07-18 16:46:52 PDT
Comment on attachment 229732 [details] [diff] [review]
still more nits fixed (and unbitrotted)

> Of course - just missed the "new". And it was already moved out of the loop.

Err, right, indeed it was. :-)


> It's not too magical. calculateUpdateInterval just has to be called whenever
> generateMicrosummary is called. One alternative coming to mind would be to 
> pass the microsummary into generateMicrosummary so that that method can 
> update the update interval. At least currently there are no other consumers
> of generateMicrosummary yet...

That might make sense, although the benefit of the current approach is that it's clear that we're both generating the microsummary and calculating the update interval.

We could also keep the page content around after generating the microsummary and then lazily calculate the update interval the moment someone calls its getter.  But perhaps that's risky, as the content could go away in the meantime (f.e. if the user unloaded the page).


> Indeed not. BTW: I adjusted the code so that it never returns NaN, either
> (could happen when passing a non-numeric string as interval value).

Excellent!
Comment 17 Martijn Wargers [:mwargers] (not working for Mozilla) 2006-07-19 05:39:51 PDT
Checking in src/nsMicrosummaryService.js.in;
/cvsroot/mozilla/browser/components/microsummaries/src/nsMicrosummaryService.js.
in,v  <--  nsMicrosummaryService.js.in
new revision: 1.25; previous revision: 1.24
done
Checking in public/nsIMicrosummaryService.idl;
/cvsroot/mozilla/browser/components/microsummaries/public/nsIMicrosummaryService
.idl,v  <--  nsIMicrosummaryService.idl
new revision: 1.3; previous revision: 1.2
done

Checked into trunk.

Simon, your surname is kind of hosed in the contributor's list, btw:
http://lxr.mozilla.org/seamonkey/search?string=Simon+B
Comment 18 Martijn Wargers [:mwargers] (not working for Mozilla) 2006-07-19 16:55:13 PDT
(In reply to comment #17)
> Simon, your surname is kind of hosed in the contributor's list, btw:
> http://lxr.mozilla.org/seamonkey/search?string=Simon+B

Never mind that, I didn't have the right character encoding set.

Comment 19 Myk Melez [:myk] [@mykmelez] 2006-07-21 08:44:25 PDT
Comment on attachment 229732 [details] [diff] [review]
still more nits fixed (and unbitrotted)

Notes for drivers considering this approval request:

This patch enables microsummary generators to specify an update interval different from the default (with a lower limit of one minute, the same as for livemarks).  It presents a moderate risk of regression, although the patch has been on the trunk since Wednesday morning, and no regressions have yet been found.
Comment 20 Mike Connor [:mconnor] 2006-07-21 11:08:34 PDT
Comment on attachment 229732 [details] [diff] [review]
still more nits fixed (and unbitrotted)

a=mconnor on behalf of drivers for checkin to the 1.8 branch

as a note, please rev the GUID for the two changed interfaces
Comment 21 Myk Melez [:myk] [@mykmelez] 2006-07-21 15:38:55 PDT
Created attachment 230205 [details] [diff] [review]
patch checked into branch (interface GUIDs revved)

Note You need to log in before you can comment on or make changes to this bug.