Closed Bug 964228 Opened 10 years ago Closed 10 years ago

Race condition in NetworkStatsService.jsm

Categories

(Core :: Networking, defect)

ARM
Gonk (Firefox OS)
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla30
blocking-b2g 1.3+
Tracking Status
firefox28 --- wontfix
firefox29 --- wontfix
firefox30 --- fixed
b2g-v1.3 --- fixed
b2g-v1.3T --- fixed
b2g-v1.4 --- fixed

People

(Reporter: albert, Assigned: johnshih.bugs)

References

Details

Attachments

(7 files, 9 obsolete files)

35.55 KB, text/x-csrc
Details
18.58 KB, patch
johnshih.bugs
: review+
Details | Diff | Splinter Review
14.34 KB, patch
albert
: review+
Details | Diff | Splinter Review
9.99 KB, patch
albert
: review+
Details | Diff | Splinter Review
17.40 KB, patch
johnshih.bugs
: review+
Details | Diff | Splinter Review
18.60 KB, patch
Details | Diff | Splinter Review
27.13 KB, patch
Details | Diff | Splinter Review
As seen in bug 961162 there is a race condition in 'saveStats' and 'updateCachedStats' functions. Is necessary to add a mechanism to ensure single or synchronized access to 'cachedStats' array.
Race condition can be identified because cachedStats array is cleared before calling _db.saveStats and produces the following error:

 E/GeckoConsole( 3649): [JavaScript Error: "aStats is undefined" {file: "resource://gre/modules/NetworkStatsDB.jsm" line: 258}]
I'll work on it
QA Contact: jshih
Assignee: nobody → jshih
QA Contact: jshih
FYI: 
I obtain the same error when I browse with Gecko-c5866e0 and Gaia-bb2e2f3:

E/GeckoConsole(  822): [JavaScript Error: "aStats is undefined" {file: "resource://gre/modules/NetworkStatsDB.jsm" line: 354}]

Regards
Occurs the same on v1.3 branch, after browse, when open datausage, I saw the same error:

E/GeckoConsole(  477): [JavaScript Error: "aStats is undefined" {file: "resource://gre/modules/NetworkStatsDB.jsm" line: 234}]
E/GeckoConsole(  477): [JavaScript Error: "aStats is undefined" {file: "resource://gre/modules/NetworkStatsDB.jsm" line: 234}]
I've been seeing occasionally this bug with the following STR:

1. Browse 
2. Starts the data usage app. 

Actual behaviour:
  After launching the usage app, the data usage chart is not displayed. 
Expected behavior:
  Usage app can display data usage chart

After the error is produced if you reopen the application the behaviour is normal. 
 
I tried to reproduce several times this bug and it doesn't occurs always, approximately the twenty percent of the time with the previous STR.

Regards
Per comment 5 nominating to v1.3?
blocking-b2g: --- → 1.3?
Sorry I was stuck on another emergency blocker this week. Will work on this bug ASAP. (As I said in  Comment 2, I believe this bug will be covered by the work in Bug 928289, which is going to re-factor some logic of NetworkStatsService).
Please do not rely on the feature Bug 928289. 1.3+
blocking-b2g: 1.3? → 1.3+
In the patch I en-queue all the operations related to DB/cached read&write and It works for me.
But since I don't have a good STR to reproduce the race condition, I might need you or QA can help on test this solution.
Attachment #8384437 - Flags: review?(gene.lian)
Attachment #8384437 - Flags: review?(acperez)
Could you help on testing with the patch (or provide a good STR to reproduce the bug)? Actually I can't reproduce the bug on my own v1.3 build. But I still try to refactor the logic in NetworkStatsService and hope can solve the problem.
Flags: needinfo?(mri)
Hi, 
i don't mind testing the patch. The STR to reproduce is explained on Comment 5. 
1. Browse with the navigator.
2.  Minimize the browser, although the page is not completely loaded.
3. Launchs Costcontrol application.

Anyway, I'm going to test now, and i'll give you my feedback ;). 

Regards, 
Mai
Flags: needinfo?(mri)
(In reply to marina rodríguez [:mai] from comment #11)
> Hi, 
> i don't mind testing the patch. The STR to reproduce is explained on Comment
> 5. 
> 1. Browse with the navigator.
> 2.  Minimize the browser, although the page is not completely loaded.
> 3. Launchs Costcontrol application.
> 
> Anyway, I'm going to test now, and i'll give you my feedback ;). 
> 
> Regards, 
> Mai

Thanks for your help!
Comment on attachment 8384437 [details] [diff] [review]
(v1.3) Bug 964228: Fix race condition in NetworkStatsService.jsm. r=gene, albert

Review of attachment 8384437 [details] [diff] [review]:
-----------------------------------------------------------------

I can reproduce the bug in 1.3 and it is fixed with your patch

Thanks John!

::: dom/network/src/NetworkStatsService.jsm
@@ +549,5 @@
>      }
>  
> +    // Prevent from executing processQueue multiple times while queue is already
> +    // under process
> +    if (this.isQueueRunning) {

'if' not needed, same 'if' statement is evaluated in processQueue function.

@@ +702,5 @@
> +                            queueType: QUEUE_TYPE_WRITE_CACHE });
> +
> +    // Prevent from executing processQueue multiple times while queue is already
> +    // undger process.
> +    if (this.isQueueRunning) {

ditto

@@ +779,5 @@
> +    this.updateQueue.push({ callbacks: [aCallback],
> +                            queueType: QUEUE_TYPE_UPDATE_CACHE });
> +    // Prevent from executing processQueue multiple times while queue is already
> +    // undger process.
> +    if (this.isQueueRunning) {

ditto
Attachment #8384437 - Flags: review?(acperez) → review+
(In reply to Albert [:albert] from comment #13)
> Comment on attachment 8384437 [details] [diff] [review]
> (v1.3) Bug 964228: Fix race condition in NetworkStatsService.jsm. r=gene,
> albert
> 
> Review of attachment 8384437 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I can reproduce the bug in 1.3 and it is fixed with your patch
> 
> Thanks John!
> 
> ::: dom/network/src/NetworkStatsService.jsm
> @@ +549,5 @@
> >      }
> >  
> > +    // Prevent from executing processQueue multiple times while queue is already
> > +    // under process
> > +    if (this.isQueueRunning) {
> 
> 'if' not needed, same 'if' statement is evaluated in processQueue function.
> 
> @@ +702,5 @@
> > +                            queueType: QUEUE_TYPE_WRITE_CACHE });
> > +
> > +    // Prevent from executing processQueue multiple times while queue is already
> > +    // undger process.
> > +    if (this.isQueueRunning) {
> 
> ditto
> 
> @@ +779,5 @@
> > +    this.updateQueue.push({ callbacks: [aCallback],
> > +                            queueType: QUEUE_TYPE_UPDATE_CACHE });
> > +    // Prevent from executing processQueue multiple times while queue is already
> > +    // undger process.
> > +    if (this.isQueueRunning) {
> 
> ditto

Yes, I also notice the duplicated logic here. But when I remove these check and test, the successive processQueue call (without callback) could occur, which doesn't happen with these check. I'm guessing there is a timing issue or other problems. Will investigate more on it tomorrow.
Patch update: remove redundant 'if' statement.
Attachment #8384437 - Attachment is obsolete: true
Attachment #8384437 - Flags: review?(gene.lian)
Attachment #8385143 - Flags: review?(gene.lian)
Hi Albert,

After investigating deeper on the patch, I found another timing issue:
the error message is 
E/GeckoConsole(  520): [JavaScript Error: "item is undefined" {file: "resource://gre/modules/NetworkStatsService.jsm" line: 584}]

It is caused by while a queue is under process (with multiple callbacks), another processQueue() is called. In this case, the processQueue() will be called one more time. 

Please see the log below for more clear:
I/Gecko   (  520): jshih 4 processQueue start undefined
I/Gecko   (  520): jshih 4 queueType 0
I/Gecko   (  520): jshih 5 processQueue start true
I/Gecko   (  520): jshih 5 queue [{"netId":"00","callbacks":[null,null,null],"queueType":0}]
* Begins to handle execute callbacks
I/Gecko   (  520): jshih 6 processQueue start undefined
* Another processQueue is called
I/Gecko   (  520): jshih 6 queueType 1
I/Gecko   (  520): jshih 7 processQueue start undefined
I/Gecko   (  520): jshih 7 queue is running (length > 1). return.
I/Gecko   (  520): jshih 8 processQueue start undefined
I/Gecko   (  520): jshih 8 queue is running (length > 1). return.
I/Gecko   (  520): jshih 5 queueType 1
* Previous callbacks handling is ended. Cause executing processQueue one more time.
I/Gecko   (  520): jshih 9 processQueue start true
I/Gecko   (  520): jshih 9 queue [{"callbacks":[null],"queueType":1},{"callbacks":[null],"queueType":1},{"callbacks":[null],"queueType":1}]
I/Gecko   (  520): jshih 9 queueType 1
I/Gecko   (  520): jshih 10 processQueue start true
I/Gecko   (  520): jshih 10 queue [{"callbacks":[null],"queueType":1},{"callbacks":[null],"queueType":1}]
I/Gecko   (  520): jshih 10 queueType 1
I/Gecko   (  520): jshih 11 processQueue start true
I/Gecko   (  520): jshih 11 queue [{"callbacks":[null],"queueType":1}]
I/Gecko   (  520): jshih 11 turn isQueueRunning to false. return.
I/Gecko   (  520): jshih 12 processQueue start true
I/Gecko   (  520): jshih 12 queue []
* Error occurs, since there is no |callbacks| here.

What do you think on this case? Compare to my original method (add more check of isQueueRunning before each processQueue is called), I think the easy way might be adding a check (updateQueue is empty or not) before 
let item = this.updateQueue.shift();
Flags: needinfo?(acperez)
Debug message for processQueue
> 
> What do you think on this case? Compare to my original method (add more
> check of isQueueRunning before each processQueue is called), I think the
> easy way might be adding a check (updateQueue is empty or not) before 
> let item = this.updateQueue.shift();

One more thought! I think we can just change 
if(this.updateQueue.length > 1)
to 
if(this.updateQueue.length > 0)
Attachment #8385143 - Attachment is obsolete: true
Attachment #8385143 - Flags: review?(gene.lian)
Attachment #8385214 - Flags: review?(gene.lian)
Attachment #8385214 - Flags: review?(acperez)
patch for mozilla-central
Attachment #8385218 - Flags: review?(gene.lian)
Attachment #8385218 - Flags: review?(acperez)
The modification here will be affected by Bug 922926, Bug 964270, which were tagged as 1.3T+. Ask for triage.
John how do you reproduce it?
Flags: needinfo?(acperez)
(In reply to Albert [:albert] from comment #22)
> John how do you reproduce it?

Firstly you should remove the all the 'if' statements you mentioned in my previous patch, and then remain
if(this.updateQueue.length > 1)

The STR is simple, just open Usage app (mostly occur in the first launch time)
the alternative way is 
1. Open a browser
2. Navigate to a web (find one needs to load for a while, e.g., 'tw.yahoo.com', which I always test with)
3. During wed loading, press home button
4. Launch Usage app

It's easily reproducible for me.
Comment on attachment 8385214 [details] [diff] [review]
(v1.3) Bug 964228: Fix race condition in NetworkStatsService.jsm. r=gene, albert

Review of attachment 8385214 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good for me.

::: dom/network/src/NetworkStatsService.jsm
@@ +578,5 @@
>        // The caller is a function that has pushed new elements to the queue,
>        // if isQueueRunning is false it means there is no processing currently
>        // being done, so start.
>        if (this.isQueueRunning) {
> +        if(this.updateQueue.length > 0) {

Now that we are checking that length > 0, does make sense to have two if statements? I mean, we can return if isQueueRunning. Are you agree?
Attachment #8385214 - Flags: review?(acperez) → review+
Comment on attachment 8385218 [details] [diff] [review]
Bug 964228: Fix race condition in NetworkStatsService.jsm. r=gene, albert

Review of attachment 8385218 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine.

Same comment as the one for 1.3 patch regarding the if in processQueue.

Can you launch a try to ensure nothing is broken?
Attachment #8385218 - Flags: review?(acperez) → review+
Update based on Comment 24.
Attachment #8385214 - Attachment is obsolete: true
Attachment #8385214 - Flags: review?(gene.lian)
Attachment #8385806 - Flags: review?(gene.lian)
update based on Comment 25
Attachment #8385218 - Attachment is obsolete: true
Attachment #8385218 - Flags: review?(gene.lian)
Attachment #8385807 - Flags: review?(gene.lian)
Fix some errors founded by testing.
Attachment #8385807 - Attachment is obsolete: true
Attachment #8385807 - Flags: review?(gene.lian)
Attachment #8386610 - Flags: review?(gene.lian)
tests update for corresponding change
Attachment #8386613 - Flags: review?(gene.lian)
Attachment #8386613 - Flags: review?(acperez)
fix nits
Attachment #8386613 - Attachment is obsolete: true
Attachment #8386613 - Flags: review?(gene.lian)
Attachment #8386613 - Flags: review?(acperez)
Attachment #8386616 - Flags: review?(gene.lian)
Attachment #8386616 - Flags: review?(acperez)
Attachment #8385806 - Attachment is obsolete: true
Attachment #8385806 - Flags: review?(gene.lian)
Attachment #8386621 - Flags: review?(gene.lian)
Fix nits.
Attachment #8386616 - Attachment is obsolete: true
Attachment #8386616 - Flags: review?(gene.lian)
Attachment #8386616 - Flags: review?(acperez)
Attachment #8386707 - Flags: review?(gene.lian)
Attachment #8386707 - Flags: review?(acperez)
Test cases for v1.3
Attachment #8386709 - Flags: review?(gene.lian)
Attachment #8386709 - Flags: review?(acperez)
I don't really have time to review this until next week because I'm joining the work week with partners. Having Albert's review is enough to land if we're rushing to fix this for v1.3.

Again, please remember to get Fabrice's approval before uplifting.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
User impact if declined: Caused JS Error sometimes when launch CostControl App
Testing completed: yes
Risk to taking this patch (and alternatives if risky): low
String or UUID changes made by this patch: none
Attachment #8386621 - Attachment is obsolete: true
Attachment #8386621 - Flags: review?(gene.lian)
Attachment #8387347 - Flags: review+
Flags: needinfo?(fabrice)
Comment on attachment 8386610 [details] [diff] [review]
Bug 964228 - Part 1: Fix race condition in NetworkStatsService.jsm. r=gene, albert

Review of attachment 8386610 [details] [diff] [review]:
-----------------------------------------------------------------

r=albert.
Attachment #8386610 - Flags: review?(gene.lian) → review+
Comment on attachment 8386707 [details] [diff] [review]
Bug 964228 - Part 2: Tests update. r=gene, albert

Review of attachment 8386707 [details] [diff] [review]:
-----------------------------------------------------------------

Since Gene is not available for review, removing the flag.
Attachment #8386707 - Flags: review?(gene.lian)
Comment on attachment 8386709 [details] [diff] [review]
(v1.3) Bug 964228 - Part 2: Tests update. r=gene, albert

Review of attachment 8386709 [details] [diff] [review]:
-----------------------------------------------------------------

Since Gene is not available for review, removing the flag.
Attachment #8386709 - Flags: review?(gene.lian)
John, just ask for approval when it's ready, I track these.
Flags: needinfo?(fabrice)
Comment on attachment 8386707 [details] [diff] [review]
Bug 964228 - Part 2: Tests update. r=gene, albert

Review of attachment 8386707 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine
Attachment #8386707 - Flags: review?(acperez) → review+
Comment on attachment 8386709 [details] [diff] [review]
(v1.3) Bug 964228 - Part 2: Tests update. r=gene, albert

Review of attachment 8386709 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks John!
Attachment #8386709 - Flags: review?(acperez) → review+
Component: General → Networking
Product: Firefox OS → Core
Fabrice,

I think it is ready for landing now, ask for approval.
Flags: needinfo?(fabrice)
Hi Joe,

As I mentioned in Comment 21, this bug should also be uplifted to 1.3T. Can you take a look and make a confirmation?
Flags: needinfo?(jcheng)
John, you need to ask for approval in the patch details page, and fill in the approval questionnaire.
Flags: needinfo?(fabrice)
John, 1.4 also need the patch.
since its 1.3+, it should also get uplifted to 1.3T
Flags: needinfo?(jcheng)
Comment on attachment 8387347 [details] [diff] [review]
(v1.3) Bug 964228 - Part 1: Fix race condition in NetworkStatsService.jsm. r=albert a=fabrice

Review of attachment 8387347 [details] [diff] [review]:
-----------------------------------------------------------------

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 
User impact if declined: Caused JS Error sometimes when launch CostControl App
Testing completed: yes
Risk to taking this patch (and alternatives if risky): low
String or UUID changes made by this patch: none
Attachment #8387347 - Flags: approval-mozilla-b2g28?
(In reply to Albert [:albert] from comment #46)
> John, 1.4 also need the patch.

The patch for 1.4 is already provided, so I'm guessing you are saying 1.3t, right ? ;)
I'll provide the patch later.
rebase for v1.3t
Attachment #8387347 - Flags: approval-mozilla-b2g28? → approval-mozilla-b2g28+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/861f6c26ef3a
https://hg.mozilla.org/mozilla-central/rev/49ec48262f79
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla30
Goes without saying that this made for some good times when merging b2g28 to v1.3t. Please double-check the files patched in this bug on the 1.3t branch to make sure everything looks sane. Basically what I did was revert them to their pre-merge versions then manually apply the 1.3t patches attached to this bug, and committed the results as part of the merge cset. Everything went smoothly like that, but it would still be nice to get confirmation :)
Flags: needinfo?(jshih)
Thanks Ryan! I've verified the newest 1.3t build locally, and it works fine (both running device and test cases)
Flags: needinfo?(jshih)
Thank *you* for posting the 1.3t patches. I would have been completely lost without them :)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: