Closed Bug 1116715 Opened 9 years ago Closed 9 years ago

[NetworkStats] DB upgrade pattern of NetworkStats is error-prone

Categories

(Firefox OS Graveyard :: General, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(blocking-b2g:2.2+, firefox36 wontfix, firefox37 wontfix, firefox38 fixed, b2g-v2.2 fixed, b2g-master fixed)

RESOLVED FIXED
2.2 S4 (23jan)
blocking-b2g 2.2+
Tracking Status
firefox36 --- wontfix
firefox37 --- wontfix
firefox38 --- fixed
b2g-v2.2 --- fixed
b2g-master --- fixed

People

(Reporter: ethan, Assigned: ethan)

References

Details

Attachments

(1 file, 10 obsolete files)

The following description is extracted from Ben Turner's comment in
https://bugzilla.mozilla.org/show_bug.cgi?id=1070944#c61.

The existing upgrade code of NetworkStatsDB uses a dangerously broken pattern.
It does async work (e.g. |openCursor()|) within a for-loop.
The for-loop doesn't wait for each async operation to complete before looping,
so we have a nasty problem.
Consider this simplified example code:

  // Pseudo-code, most details omitted...
  function upgradeSchema(aTransaction, aDb, aOldVersion, aNewVersion) {
    for (let currVersion = aOldVersion; currVersion < aNewVersion; currVersion++) {
      if (currVersion == 0) {
        aDb.createObjectStore("foo");
      } else if (currVersion == 1) {
        aTransaction.objectStore("foo").openCursor().onsuccess = function(event) {
          let cursor = event.target.result;
          if (cursor) {
            alert(cursor.key);
            cursor.continue();
          }
        };
      } else if (currVersion == 2) {
        aDB.deleteObjectStore("foo");
      }
    }
  }

This code doesn't actually loop over all the values in the objectStore like you probably think it should in the 1->2 upgrade. Instead the sequence of executed calls is:

  1. createObjectStore("foo"); // upgrade 0->1
  2. objectStore("foo").openCursor(); // upgrade 1->2
  3. deleteObjectStore("foo"); // upgrade 2->3
  4. cursor.continue(); // EXCEPTION!

The |openCursor()| success callback is asynchronous and will happen after the objectStore has been deleted, so an exception will be thrown when calling |cursor.continue()|.
The same broken pattern was fixed in the contacts database in bug 892497, and for the SMS database in bug 887701.
Assignee: nobody → ettseng
Blocks: 1070944
This is essential function for cost control and should be landed to 2.2.
feature-b2g: --- → 2.2?
Attached patch bug-1116715-wip.patch (obsolete) — Splinter Review
A WIP patch.
Attached patch bug-1116715-fix.patch (obsolete) — Splinter Review
A draft patch ready for test.
Attachment #8544415 - Attachment is obsolete: true
Hi Albert,

You are the one that is most familiar with NetworkStatsDB.
Could you help to clarify something for me?

1. There is no function for upgrading version 1 to 2.
   Is there a historical reason? I'd like to document it in code comment.

2. In upgrade4to5(), I assume it's okay to create the object store for alarms before the completion
   of openCursor() callback since they operate on different object stores.
   Can you help to double confirm my assumption is correct?

3. I am not confident of upgrade6to7() because I don't fully understand this part.
   My current patch executes upgradeNextVersion() in the end of this function, which is problematic
   as the original broken pattern. But there are many returns in the nested callback functions
   and I am not sure when should we terminate this upgrade operation and proceed to the next one.
   Could you provide some idea?
Flags: needinfo?(alberto.crespellperez)
(In reply to Ethan Tseng [:ethan] from comment #5)
> Hi Albert,
> 
> You are the one that is most familiar with NetworkStatsDB.
> Could you help to clarify something for me?

Sure

> 1. There is no function for upgrading version 1 to 2.
>    Is there a historical reason? I'd like to document it in code comment.

Yes, as you can see in upgrade to version 2 comment https://mxr.mozilla.org/mozilla-central/source/dom/network/NetworkStatsDB.jsm#74 changes of database schema didn't allow to migrate data from 1.2 to 1.3 (networkstatsdb version 1 to version 2). So, talking with Gene, who was the reviewer at that moment, we decided to remove upgrade upgrade from version 1 to 2 because there was no commercial device device with version 1.

You can see old upgrade function at http://hg.mozilla.org/releases/mozilla-b2g26_v1_2/file/4c359ff76654/dom/network/src/NetworkStatsDB.jsm#l93
 
> 2. In upgrade4to5(), I assume it's okay to create the object store for
> alarms before the completion
>    of openCursor() callback since they operate on different object stores.
>    Can you help to double confirm my assumption is correct?

Yes, here the objectStore for alarms is created for first time, so there is no problem of sync.

> 3. I am not confident of upgrade6to7() because I don't fully understand this
> part.
>    My current patch executes upgradeNextVersion() in the end of this
> function, which is problematic
>    as the original broken pattern. But there are many returns in the nested
> callback functions
>    and I am not sure when should we terminate this upgrade operation and
> proceed to the next one.
>    Could you provide some idea?

Here we are updating both alarms store and stats store, and two update processes are independent so can be run in parallel but you don't know which one will finish before so you should wait until both finished.

The alarms store update is quite simple, just updates the threshold value.

On the other hand, the stats store update search network values in all records and put each one (unique) in  'networks' variable. Then, for each network opens a cursor to go through all records of the given network in order to determine when happened a reset caused by a 'clearInterfaceStats' call. So you have an upgrade process for each 'network' value, you should wait to all openCursor in foreach network, then you can consider that stats store upgrade has finished.
Flags: needinfo?(alberto.crespellperez)
(In reply to Albert [:albert] from comment #6)
> > 1. There is no function for upgrading version 1 to 2.
> >    Is there a historical reason? I'd like to document it in code comment.
> Yes, as you can see in upgrade to version 2 comment
> https://mxr.mozilla.org/mozilla-central/source/dom/network/NetworkStatsDB.
> jsm#74 changes of database schema didn't allow to migrate data from 1.2 to
> 1.3 (networkstatsdb version 1 to version 2). So, talking with Gene, who was
> the reviewer at that moment, we decided to remove upgrade upgrade from
> version 1 to 2 because there was no commercial device device with version 1.

I got it. Thanks!

 
> > 2. In upgrade4to5(), I assume it's okay to create the object store for
> >    alarms before the completion
> >    of openCursor() callback since they operate on different object stores.
> >    Can you help to double confirm my assumption is correct?
> Yes, here the objectStore for alarms is created for first time, so there is
> no problem of sync.

Thanks.
Actually, after some investigation I found we cannot create object store in the onsuccess callback.
I will provide more detail in new comments.

 
> > 3. I am not confident of upgrade6to7() because I don't fully understand this
> >    part.
> >    My current patch executes upgradeNextVersion() in the end of this
> >    function, which is problematic
> >    as the original broken pattern. But there are many returns in the nested
> >    callback functions
> >    and I am not sure when should we terminate this upgrade operation and
> >    proceed to the next one.
> >    Could you provide some idea?
> Here we are updating both alarms store and stats store, and two update
> processes are independent so can be run in parallel but you don't know which
> one will finish before so you should wait until both finished.
> The alarms store update is quite simple, just updates the threshold value.
> On the other hand, the stats store update search network values in all
> records and put each one (unique) in  'networks' variable. Then, for each
> network opens a cursor to go through all records of the given network in
> order to determine when happened a reset caused by a 'clearInterfaceStats'
> call. So you have an upgrade process for each 'network' value, you should
> wait to all openCursor in foreach network, then you can consider that stats
> store upgrade has finished.

Thanks for you explanation. :)
Thus, we have to wait until all the records of all the networks are processed completely.
I think we can either:
1. Add a counter to mark the end condition.
or
2. Make the callbacks of networks.forEach() sequential.
Status: NEW → ASSIGNED
Attached patch bug-1116715.patch (obsolete) — Splinter Review
Attachment #8544445 - Attachment is obsolete: true
Attached patch bug-1116715.patch (obsolete) — Splinter Review
Refine the patch.
Attachment #8548844 - Attachment is obsolete: true
Attached patch bug-1116715.patch (obsolete) — Splinter Review
Minor refinement.
Attachment #8549369 - Attachment is obsolete: true
Attached patch bug-1116715.patch (obsolete) — Splinter Review
Minor refinement.
Attachment #8549370 - Attachment is obsolete: true
Attached patch bug-1116715.patch (obsolete) — Splinter Review
Fixed typo.
Attachment #8549373 - Attachment is obsolete: true
Comment on attachment 8549374 [details] [diff] [review]
bug-1116715.patch

Hi Albert,

I refined the part of upgrade6to7().
Could you help to see if my change does not break the original intention
of that part?
Attachment #8549374 - Flags: feedback?(alberto.crespellperez)
triage: this is must-have for not miscalculating data usage between browser and system.
We had this defect since 2.0, and should get it fixed in 2.2 at least.
blocking-b2g: --- → 2.2+
feature-b2g: 2.2? → ---
Comment on attachment 8549374 [details] [diff] [review]
bug-1116715.patch

Review of attachment 8549374 [details] [diff] [review]:
-----------------------------------------------------------------

See comments

::: dom/network/NetworkStatsDB.jsm
@@ +281,1 @@
>            cursor.continue();

You don't check if alarm index population ofALARMS_STORE_NAME finishes before / after the STATS_STORE_NAME upgrade. Both stores can be upgraded asynchronously but you should wait that both finished before doing next version upgrade.

@@ +386,5 @@
> +        return;
> +      }
> +
> +      try {
> +        var i = index++;

Why do you need var i? just index++

What happens if aNewVersion is 0? you never will call upgradeSteps[0]

@@ +387,5 @@
> +      }
> +
> +      try {
> +        var i = index++;
> +        if (DEBUG) debug("Upgrade step: " + i + "\n");

s/i/index

@@ +388,5 @@
> +
> +      try {
> +        var i = index++;
> +        if (DEBUG) debug("Upgrade step: " + i + "\n");
> +        upgradeSteps[i].call(outer);

s/i/index

@@ +397,4 @@
>        }
>      }
> +
> +    if (aNewVersion > upgradeSteps.length) {

You don't check aOldVersion, maybe you can do some upgrades before abort transaction.

@@ +398,5 @@
>      }
> +
> +    if (aNewVersion > upgradeSteps.length) {
> +      debug("No migration steps for the new version!");
> +      aTransaction.abort();

You abort the transaction but then you call upgradeNextVersion, do you miss return statement?
Attachment #8549374 - Flags: feedback?(alberto.crespellperez) → feedback-
Comment on attachment 8549374 [details] [diff] [review]
bug-1116715.patch

Review of attachment 8549374 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/network/NetworkStatsDB.jsm
@@ +281,1 @@
>            cursor.continue();

You are right.
Then I think it's better to make these operations sequential.
That is, after the completion of populating index to alarm store, upgrade stats store.

@@ +386,5 @@
> +        return;
> +      }
> +
> +      try {
> +        var i = index++;

Using an additional var i here is simply to make code clear.
Or I have to write "if ((index + 1) == aNewVersion)" instead.

aNewVersion should never be zero. Even if it is zero, we should consider it as an error case
and should not upgrade anything.

@@ +397,4 @@
>        }
>      }
> +
> +    if (aNewVersion > upgradeSteps.length) {

Actually, I treat this block as an assertion for checking code logic.
Since aNewVersion is specified by code (const DB_VERSION), this check should not fail
unless we do not synchronize DB_VERSION and upgrade functions.
Making partial upgrade here might just complicate things.

@@ +398,5 @@
>      }
> +
> +    if (aNewVersion > upgradeSteps.length) {
> +      debug("No migration steps for the new version!");
> +      aTransaction.abort();

Yes! I missed a return here.
Thanks for catching this.
Attached patch bug-1116715.patch (obsolete) — Splinter Review
This patch addresses two issues in comment 15.
1. Make asynchronous operations in upgrade6to7() sequential.
2. Adding a missing return statement in error handling code.
Attachment #8549374 - Attachment is obsolete: true
Attachment #8551054 - Flags: review?(alberto.crespellperez)
Attached patch bug-1116715.patch (obsolete) — Splinter Review
Albert, could you help to review this patch?
Attachment #8551054 - Attachment is obsolete: true
Attachment #8551054 - Flags: review?(alberto.crespellperez)
Attachment #8551056 - Flags: review?(alberto.crespellperez)
Attached patch bug-1116715.patch (obsolete) — Splinter Review
Fix a typo in code comment.
Attachment #8551056 - Attachment is obsolete: true
Attachment #8551056 - Flags: review?(alberto.crespellperez)
Attachment #8551057 - Flags: review?(alberto.crespellperez)
Comment on attachment 8551057 [details] [diff] [review]
bug-1116715.patch

Review of attachment 8551057 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me,
Thanks!
Attachment #8551057 - Flags: review?(alberto.crespellperez) → review+
(In reply to Albert [:albert] from comment #20)
> Review of attachment 8551057 [details] [diff] [review]:
> -----------------------------------------------------------------
> Looks good to me,
> Thanks!

Albert, thanks for your review.
Comment on attachment 8551057 [details] [diff] [review]
bug-1116715.patch

Hi Ben,

You helped us recognize this bug.
I also would like to have your help to confirm our solution is sane.
Attachment #8551057 - Flags: review+ → review?(bent.mozilla)
Comment on attachment 8551057 [details] [diff] [review]
bug-1116715.patch

Review of attachment 8551057 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good to me, thanks for fixing!

::: dom/network/NetworkStatsDB.jsm
@@ +395,5 @@
> +        if (DEBUG) debug("Upgrade step: " + i + "\n");
> +        upgradeSteps[i].call(outer);
> +      } catch (ex) {
> +        dump("Caught exception " + ex);
> +        aTransaction.abort();

This is not technically needed since any uncaught exception will automatically abort the transaction. However, since you are catching exceptions for logging purposes then it might be better to just re-throw the exception rather than manually calling abort().
Attachment #8551057 - Flags: review?(bent.mozilla) → feedback+
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #23)
> This looks good to me, thanks for fixing!
> This is not technically needed since any uncaught exception will
> automatically abort the transaction. However, since you are catching
> exceptions for logging purposes then it might be better to just re-throw the
> exception rather than manually calling abort().

Ben, thanks for your time reviewing this.
I will replace the abort by re-throwing exception.
1. Replace transaction abort by exception re-throw according to comment 23.
2. Refresh comment "r=albert, f=bent".
Attachment #8551057 - Attachment is obsolete: true
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/6287bcc6cce3
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → 2.2 S4 (23jan)
Comment on attachment 8552269 [details] [diff] [review]
bug-1116715.patch

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 1070944 (Separation of browsing
                                          data from system data)
User impact if declined: Incorrect usage in cost control app
Testing completed: On m-c
Risk to taking this patch (and alternatives if risky): None
String or UUID changes made by this patch: None
Attachment #8552269 - Flags: approval-mozilla-b2g37?
Attachment #8552269 - Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
You need to log in before you can comment on or make changes to this bug.