Closed Bug 1092941 Opened 10 years ago Closed 10 years ago

mozSettings leaves orphan lock when running mnw tests.

Categories

(Core :: DOM: Device Interfaces, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla36
blocking-b2g 2.1+
Tracking Status
firefox34 --- wontfix
firefox35 --- wontfix
firefox36 --- fixed
b2g-v2.1 --- fixed
b2g-v2.2 --- fixed

People

(Reporter: jessica, Assigned: edgar)

References

Details

(Keywords: regression)

Attachments

(1 file, 2 obsolete files)

After bug 1082001, marionette-webapi tests time-out when getting settings using mozSettings. We have enabled settings log, and it seems that some locks are not finalized on OOP.

https://tbpl.mozilla.org/php/getParsedLog.php?id=51658340&tree=Try&full=1

You may not detect the failure cause mnw is currently hidden in treeherder.mozilla.org. :(
Alexandre, Kyle, do you have any idea about this? Thanks.
Flags: needinfo?(lissyx+mozillians)
Flags: needinfo?(kyle)
Attached patch Patch, v1 (obsolete) — Splinter Review
Comment on attachment 8515881 [details] [diff] [review]
Patch, v1

Review of attachment 8515881 [details] [diff] [review]:
-----------------------------------------------------------------

Before starting b2g marionette-webapi tests, marionette framework will unload apps first (e.g. FTU).
And this leaves orphan lock, because SettingsRequestManager.jsm doesn't handle locks well when receiving `child-process-shutdown`.
1). `child-process-shutdown` doesn't come with principal [1], checking |lock._mm| seems enough.
2). The finalize task pushed in |enqueueForceFinalize| should be in running situation, just like how we handle |Settings:Finalize|.

Hi Ben, may I have your feedback about this patch? Thank you.

[1] http://dxr.mozilla.org/mozilla-central/source/dom/ipc/ContentParent.cpp#1506-1509
Attachment #8515881 - Flags: feedback?(bent.mozilla)
(In reply to Jessica Jong [:jjong] [:jessica] from comment #1)
> Alexandre, Kyle, do you have any idea about this? Thanks.

We did detect the failure but: I was not able to reproduce this locally. Only on B2G Emulator on TBPL. :mdas was working on getting us more logging capabilities to be able to investigate why this was failing.
Flags: needinfo?(mdas)
Flags: needinfo?(lissyx+mozillians)
Flags: needinfo?(kyle)
Edgar, the code you removed is exactly code I added after ben reviewed :).
Do you mind providing a try run with multiple Mnw retrigger to show this fixes the issue ?
Confere comment 5.
Flags: needinfo?(echen)
Here is the try with the setting fix: https://tbpl.mozilla.org/?tree=Try&rev=98b8de915433
((with setting debug logging enabled))

But we encounter other telephony test failure :(, we are working on that now.
Will provide a try result again after we fixe them, keeps ni? to me for tracking.
(In reply to Edgar Chen [:edgar][:echen] from comment #7)
> Here is the try with the setting fix:
> https://tbpl.mozilla.org/?tree=Try&rev=98b8de915433
> ((with setting debug logging enabled))
> 
> But we encounter other telephony test failure :(, we are working on that now.
> Will provide a try result again after we fixe them, keeps ni? to me for
> tracking.

I took the liberty to do some retrigger. It seems to effectively improve things on Mnw.
I'm very sorry for the mess we created :(
Comment on attachment 8515881 [details] [diff] [review]
Patch, v1

Review of attachment 8515881 [details] [diff] [review]:
-----------------------------------------------------------------

I think Alexandre can manage this, otherwise feel free to kick it back to me.
Attachment #8515881 - Flags: feedback?(bent.mozilla) → feedback?(lissyx+mozillians)
Comment on attachment 8515881 [details] [diff] [review]
Patch, v1

Review of attachment 8515881 [details] [diff] [review]:
-----------------------------------------------------------------

That looks quite sane. And that will probably fix the Mnw bustage I landed :(
Make sure you also don't break the xpcshell test we wrote, in dom/settings/test/unit/, since it explicitely does a call to Settings:Run.

::: dom/settings/SettingsRequestManager.jsm
@@ +778,4 @@
>          }
>        );
> +      // Finalize is considered a task running situation, but it also needs to
> +      // queue a task.

Makes sense, but I'm a bit frightened, since we have a simiar logic running for the handling of messages. The switch() case handling Settings:Finalize message will do the same kind of forcing a Settings:Run operation, so I think we should share code.
Attachment #8515881 - Flags: feedback?(lissyx+mozillians) → feedback+
Thanks for the feedback, Alexandre.
I am working on formal patch, will upload it and request review soon.
Assignee: nobody → echen
Attached patch Patch, v2 (obsolete) — Splinter Review
Address comment #10:
- Share the code of forcing running operation.

Try result is on the way.

Hi Alexandre, may I have your review? Thank you.
Attachment #8515881 - Attachment is obsolete: true
Flags: needinfo?(echen)
Attachment #8517375 - Flags: review?(lissyx+mozillians)
Comment on attachment 8517375 [details] [diff] [review]
Patch, v2

Review of attachment 8517375 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me. I'll make sure to test on mulet oop and non oop and on device before r+ this, so keeping the r?, but I'll wait for the try run to do this.

::: dom/settings/SettingsRequestManager.jsm
@@ +800,5 @@
>          }
>        );
> +      // Finalize is considered a task running situation, but it also needs to
> +      // queue a task.
> +      this.startRunning(lock.lockID);

Shouldn't we run this in the success callback of the queueTask() call?
(In reply to Alexandre LISSY :gerard-majax from comment #13)
> Comment on attachment 8517375 [details] [diff] [review]
> Patch, v2
> 
> Review of attachment 8517375 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: dom/settings/SettingsRequestManager.jsm
> @@ +800,5 @@
> >          }
> >        );
> > +      // Finalize is considered a task running situation, but it also needs to
> > +      // queue a task.
> > +      this.startRunning(lock.lockID);
> 
> Shouldn't we run this in the success callback of the queueTask() call?

No, we can not do it in the success callback.
The success callback of queueTask() is for the task is proccessed successfully, not for pushed into task queue successfully.
Blocks: 1093121
Right. I still have to check on device, sorry for the latency, got caught up in stuff :(
Comment on attachment 8517375 [details] [diff] [review]
Patch, v2

Review of attachment 8517375 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/settings/SettingsRequestManager.jsm
@@ +401,5 @@
> +  startRunning: function(aLockID) {
> +    let lock = this.lockInfo[aLockID];
> +
> +    if (!lock) {
> +      if (DEBUG) debug("Lock no longer alive, cannot run tasks");

Please make sure we don't collide debug messages. This one does collide with some in runTasks().
This will make reading logs easier in the future.

@@ +409,5 @@
> +    lock.consumable = true;
> +    if (aLockID == this.settingsLockQueue[0] || this.settingsLockQueue.length == 0) {
> +      // If a lock is currently at the head of the queue, run all tasks for
> +      // it.
> +      if (DEBUG) debug("Running tasks for " + aLockID);

Same.
Comment on attachment 8517375 [details] [diff] [review]
Patch, v2

Review of attachment 8517375 [details] [diff] [review]:
-----------------------------------------------------------------

That seems to be okay on device and on Mulet. Just fix the remaining nits regarding the debug messages :)
This try: https://tbpl.mozilla.org/?tree=Try&rev=d1d08f9640e8 is already mostly green.
I retriggered a couple of Mnw, just to be sure.
Attachment #8517375 - Flags: review?(lissyx+mozillians) → review+
Blocks: 1082001
Address review comment #17,
- Only modify the debug message which collides with some in runTasks().
Attachment #8517375 - Attachment is obsolete: true
Attachment #8518180 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/832e23a206cc
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla36
Please request b2g34 approval on this patch so bug 1076597 can re-land.
Blocks: 1076597
blocking-b2g: --- → 2.1?
Flags: needinfo?(echen)
Hi Alexandre, this bug also depends on bug 1082001, do you think we should uplift bug 1082001 as well?
Flags: needinfo?(lissyx+mozillians)
Ryan, Edgar, I agree that we should uplift: bug 1082001, this one and we could reland bug 1076597. But I think it would be good that we want a bit that those patch do live a bit more on master.
Flags: needinfo?(ryanvm)
Flags: needinfo?(lissyx+mozillians)
Flags: needinfo?(echen)
Flags: needinfo?(ryanvm)
Comment on attachment 8518180 [details] [diff] [review]
Patch, v3, r=lissyx

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 1082001
User impact if declined: Marionette WebAPI tests failing to run
Testing completed: made Mnw green again, fixes bug 1093121 where keyboard never shows
Risk to taking this patch (and alternatives if risky): low
String or UUID changes made by this patch: none
Attachment #8518180 - Flags: approval-mozilla-b2g34?
blocking-b2g: 2.1? → 2.1+
Attachment #8518180 - Flags: approval-mozilla-b2g34? → approval-mozilla-b2g34+
It looks like the settings app is broken in v2.1, with the same symptomes I had in bug  1091542 that were fixed with this bug, with a build from yesterday.

Any clue?
Flags: needinfo?(arthur.chen)
Looks like this happen after the phone is running for a long time. Nical (Nicolas Silva) showed me on his phone flashed from yesterday.

It seems it's also breaking other apps too.
(In reply to Julien Wajsberg [:julienw] from comment #31)
> Looks like this happen after the phone is running for a long time. Nical
> (Nicolas Silva) showed me on his phone flashed from yesterday.
> 
> It seems it's also breaking other apps too.

If it also affects other apps, it seems like an issue of the settings manager in gecko. But I haven't encountered this on v2.1 yet.
Flags: needinfo?(arthur.chen)
Nical, can you please file a bug for what you're seeing?
Flags: needinfo?(nical.bugzilla)
Bug 1105511 seems to be this bug.
Flags: needinfo?(nical.bugzilla)
Confirmed :julienw's comment. We're pulling null principals and not checking their validity when checking permissions on them, which causes us to throw silently and lock up. This is causing bug 1105511, bug 1093932, as most likely many others.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: