Race condition in settings api during app shutdown

RESOLVED FIXED in Firefox 35, Firefox OS v2.1

Status

()

Core
DOM: Device Interfaces
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: qdot, Assigned: gerard)

Tracking

25 Branch
2.1 S6 (10oct)
x86_64
Linux
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(blocking-b2g:2.1+, firefox33 wontfix, firefox34 wontfix, firefox35 fixed, b2g-v2.0 unaffected, b2g-v2.0M unaffected, b2g-v2.1 fixed, b2g-v2.2 fixed)

Details

(Whiteboard: [systemsfe])

Attachments

(2 attachments, 15 obsolete attachments)

9.78 KB, patch
gerard
: review+
Details | Diff | Splinter Review
46.75 KB, text/plain
Details
When an app shuts down, the SettingsManager will try to close all of its locks during dom-window-destroyed, by sending the Finalize message for each lock to the SettingsRequestManager. However, at the time of child message manager death, the SettingsRequestManager receives a child-process-shutdown event through the observer, and calls removeMessageManager, which removes all tracked locks and tasks for that message manager. This means that if we have things that shut down quickly while also running settings calls (like, say, the callscreen in bug 1070066), settings calls can be made that will never actually commit to the database, finish, and be broadcast to observers.

The fix here is an interesting question. I /think/ the right solution is that instead of just killing locks/tasks in removeMessageManager, we somehow make sure that finalize is going to be called on the remaining locks for the message manager. The app itself won't get anything back, but we'll still let the rest of the system update.
Created attachment 8498652 [details] [diff] [review]
Patch 1 (v1) - WIP: Always finalize open locks on child process death

Here's my cleaned up WIP. Honestly, this could probably land this way.

This changes what happens when we remove a message manager due to child process death. It used to be that we would just clear out all of the tasks, and transactions would die unfinished. Now, we make sure that all locks have a finalize task. If they don't, we queue one for them. This means that all locks for the dead message manager will still be in the queue, but should finish their transaction, which means observers will be called.

Note that I have NOT run mochis on this yet, just ran it on the phone, and it seems ok there. Doing a desktop build now, will post mochi results when that's done.
Attachment #8498652 - Flags: feedback?(lissyx+mozillians)
(Assignee)

Comment 3

4 years ago
Comment on attachment 8498652 [details] [diff] [review]
Patch 1 (v1) - WIP: Always finalize open locks on child process death

Review of attachment 8498652 [details] [diff] [review]:
-----------------------------------------------------------------

I have been looking at this since yesterday, and it all makes a lot of sense. I have been testing this on my Xperia ZR and now on my Flame. As far as I can say, for the ZR at least, I could not reproduce anymore most of the settings-related bugs once applied.

I would like, however, to be able to write some tests to make this nicer.
Attachment #8498652 - Flags: feedback?(lissyx+mozillians) → feedback+
(Assignee)

Comment 4

4 years ago
[Blocking Requested - why for this release]: because triggered by STRs of bug 1070066 which is 2.1+
Assignee: nobody → lissyx+mozillians
blocking-b2g: --- → 2.1?
Whiteboard: [systemsfe]
(Assignee)

Comment 5

4 years ago
Created attachment 8500406 [details] [diff] [review]
Settings API messages unit testing
(Assignee)

Comment 6

4 years ago
Created attachment 8500408 [details] [diff] [review]
Fix Settings API shutdown race condition
Blocks: 1074379
Blocks: 1070066
(Assignee)

Comment 7

4 years ago
Created attachment 8500480 [details] [diff] [review]
Settings API messages unit testing
(Assignee)

Comment 8

4 years ago
Created attachment 8500481 [details] [diff] [review]
Fix Settings API shutdown race condition
(Assignee)

Updated

4 years ago
Attachment #8500406 - Attachment is obsolete: true
(Assignee)

Updated

4 years ago
Attachment #8500408 - Attachment is obsolete: true
(Assignee)

Comment 9

4 years ago
There is few changes on the fix itself, mostly cosmetic. Tests are getting into shape, and now all bits should be in place to implement the core logic exposing the race condition there.
Regression
blocking-b2g: 2.1? → 2.1+
No longer blocks: 1070066
See Also: → bug 1070066
(Assignee)

Comment 12

4 years ago
(In reply to Alexandre LISSY :gerard-majax from comment #11)
> Try with tests and fix:
> https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=a9bf91bc0468
> Try with only tests:
> https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=1a879a93cd4f

And those are invalid because one need to add the xpcshell.ini to http://dxr.mozilla.org/mozilla-central/source/testing/xpcshell/xpcshell_b2g.ini :(
(Assignee)

Updated

4 years ago
Attachment #8498652 - Attachment is obsolete: true
(Assignee)

Updated

4 years ago
Attachment #8500480 - Attachment is obsolete: true
(Assignee)

Updated

4 years ago
Attachment #8500481 - Attachment is obsolete: true
(Assignee)

Comment 14

4 years ago
Created attachment 8501037 [details] [diff] [review]
Settings API messages unit testing
(Assignee)

Comment 15

4 years ago
Created attachment 8501038 [details] [diff] [review]
Fix Settings API shutdown race condition
(Assignee)

Comment 16

4 years ago
With those two patches, we expect to have:
 - try green with the fix
 - try red (timeout) without
(Assignee)

Comment 17

4 years ago
Created attachment 8501072 [details] [diff] [review]
Fix Settings API shutdown race condition r=bent

When child message manager dies, it sends a child-process-shutdown
message to the SettingsRequestManager. This would just close the locks
and tasks of this message manager. The race happens when some
applications can shutdown quickly: settings requests will never be
committed to the database. One example is the callscreen. The fix,
provided by qDot, simply put those tasks in a finalize state to make
sure they are properly executed and committed.
(Assignee)

Comment 18

4 years ago
Comment on attachment 8501072 [details] [diff] [review]
Fix Settings API shutdown race condition r=bent

Ben, according to git log you have been reviewing a lot of patches on dom/settings. Since qDot is away, I'm asking you for review on this. This patch is the same as the two others one, it's just a squashed as one commit and with debug disabled.

If someone else is more suited to review this, feel free to transfer :)
Attachment #8501072 - Flags: review?(bent.mozilla)
Target Milestone: --- → 2.1 S6 (10oct)
(Assignee)

Comment 19

4 years ago
Created attachment 8501192 [details] [diff] [review]
Fix Settings API shutdown race condition r=bent

When child message manager dies, it sends a child-process-shutdown
message to the SettingsRequestManager. This would just close the locks
and tasks of this message manager. The race happens when some
applications can shutdown quickly: settings requests will never be
committed to the database. One example is the callscreen. The fix,
provided by qDot, simply put those tasks in a finalize state to make
sure they are properly executed and committed.
(Assignee)

Updated

4 years ago
Attachment #8501072 - Attachment is obsolete: true
Attachment #8501072 - Flags: review?(bent.mozilla)
(Assignee)

Comment 20

4 years ago
Comment on attachment 8501192 [details] [diff] [review]
Fix Settings API shutdown race condition r=bent

Changed the name of the setting used in the test since the previous one exists and made the first test to fail (since it expects it does not exists)
Attachment #8501192 - Flags: review?(bent.mozilla)
Comment on attachment 8501192 [details] [diff] [review]
Fix Settings API shutdown race condition r=bent

Review of attachment 8501192 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/settings/SettingsRequestManager.jsm
@@ +751,5 @@
>      let closedLockIDs = [];
>      let lockIDs = Object.keys(this.lockInfo);
>      for (let i in lockIDs) {
> +      let lock = this.lockInfo[lockIDs[i]];
> +      if (lock._mm == aMsgMgr) {

Can you also verify that the principal is the same?

@@ +753,5 @@
>      for (let i in lockIDs) {
> +      let lock = this.lockInfo[lockIDs[i]];
> +      if (lock._mm == aMsgMgr) {
> +        let is_finalizing = false;
> +        for (let task_index in lock.tasks) {

Can you iterate backwards? Finalize should be at the end.
Attachment #8501192 - Flags: review?(bent.mozilla) → review+
(Assignee)

Comment 23

4 years ago
(In reply to Alexandre LISSY :gerard-majax from comment #21)
> Re-triggered try because of failure.
> Try with tests:
> https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=2577906394be
> Try with tests and fix:
> https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=1d6aa022af4a

Tests worked as expected, but we have an issue on https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=1d6aa022af4a (which should be green): it's broken on emulator debug, with an IndexedDB error :(
(Assignee)

Comment 25

4 years ago
Created attachment 8501537 [details] [diff] [review]
Fix Settings API shutdown race condition r=bent

When child message manager dies, it sends a child-process-shutdown
message to the SettingsRequestManager. This would just close the locks
and tasks of this message manager. The race happens when some
applications can shutdown quickly: settings requests will never be
committed to the database. One example is the callscreen. The fix,
provided by qDot, simply put those tasks in a finalize state to make
sure they are properly executed and committed.
(Assignee)

Comment 26

4 years ago
Comment on attachment 8501537 [details] [diff] [review]
Fix Settings API shutdown race condition r=bent

Carrying r+ from attachment 8501192 [details] [diff] [review]
Adding principal check and backward iteration on tasks.
Attachment #8501537 - Flags: review+
(Assignee)

Updated

4 years ago
Attachment #8501192 - Attachment is obsolete: true
(Assignee)

Comment 27

4 years ago
The error seems to be 100% reproductible on b2g emulator debug. Looking at full log, I see:
> 00:41:00     INFO -  10-08 06:06:50.791 I/Gecko   ( 1567): -*- SettingsRequestManager: Making READWRITE transaction for 435d2192-4f21-48d4-90b7-285f147a56be
> 00:41:00     INFO -  10-08 06:06:50.841 I/Gecko   ( 1567): -*- SettingsRequestManager: Making get request for xpcshell.language.current
> 00:41:00     INFO -  10-08 06:06:50.981 I/Gecko   ( 1567): [1567] WARNING: 'NS_FAILED(rv)', file ../../../gecko/dom/indexedDB/ActorsParent.cpp, line 6582
> 00:41:00     INFO -  10-08 06:06:51.001 I/Gecko   ( 1567): [1567] WARNING: 'NS_FAILED(rv)', file ../../../gecko/dom/indexedDB/ActorsParent.cpp, line 12877
> 00:41:00     INFO -  10-08 06:06:51.011 I/Gecko   ( 1567): [1567] WARNING: Converting non-IndexedDB error code (0x80630001) to NS_ERROR_DOM_INDEXEDDB_UNKNOWN_ERR: file ../../../gecko/dom/indexedDB/ActorsParent.cpp, line 408
> 00:41:00  WARNING -  10-08 06:06:51.021 W/GeckoConsole( 1567): [JavaScript Error: "IndexedDB UnknownErr: ActorsParent.cpp:414"]
> 00:41:00     INFO -  10-08 06:06:51.101 I/Gecko   ( 1567): -*- SettingsRequestManager: Transaction for lock 435d2192-4f21-48d4-90b7-285f147a56be aborted
> 00:41:00     INFO -  10-08 06:06:51.141 I/Gecko   ( 1567): -*- SettingsRequestManager: getRequest FAILED xpcshell.language.current

Doing a local run against b2g desktop in debug mode, I don't reproduce :(
(Assignee)

Comment 28

4 years ago
Created attachment 8501613 [details] [diff] [review]
Fix Settings API shutdown race condition r=bent

When child message manager dies, it sends a child-process-shutdown
message to the SettingsRequestManager. This would just close the locks
and tasks of this message manager. The race happens when some
applications can shutdown quickly: settings requests will never be
committed to the database. One example is the callscreen. The fix,
provided by qDot, simply put those tasks in a finalize state to make
sure they are properly executed and committed.
(Assignee)

Comment 29

4 years ago
Comment on attachment 8501613 [details] [diff] [review]
Fix Settings API shutdown race condition r=bent

Carrying r+ from attachment 8501192 [details] [diff] [review]
Fixing some js warning/errors triggered in debug build. Also changed the Settings:Get, hoping this will make emulator debug happy.
Try: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=98ad1b0fb7e5
Attachment #8501613 - Flags: review+
(Assignee)

Comment 30

4 years ago
(In reply to Alexandre LISSY :gerard-majax from comment #24)
> New set of try with the addressed review comments:
>  https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=0afae351c3de
>  https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=49b4f89be8b5

Apart from the emulator-debug failure still present, those try are ok: red without the fix, green with.
(Assignee)

Comment 31

4 years ago
Created attachment 8501718 [details] [diff] [review]
Fix Settings API shutdown race condition r=bent

When child message manager dies, it sends a child-process-shutdown
message to the SettingsRequestManager. This would just close the locks
and tasks of this message manager. The race happens when some
applications can shutdown quickly: settings requests will never be
committed to the database. One example is the callscreen. The fix,
provided by qDot, simply put those tasks in a finalize state to make
sure they are properly executed and committed.
(Assignee)

Comment 32

4 years ago
(In reply to Alexandre LISSY :gerard-majax from comment #31)
> Created attachment 8501718 [details] [diff] [review]
> Fix Settings API shutdown race condition r=bent
> 
> When child message manager dies, it sends a child-process-shutdown
> message to the SettingsRequestManager. This would just close the locks
> and tasks of this message manager. The race happens when some
> applications can shutdown quickly: settings requests will never be
> committed to the database. One example is the callscreen. The fix,
> provided by qDot, simply put those tasks in a finalize state to make
> sure they are properly executed and committed.

Changed the lock id to use "{uuid}" instead of "uuid".

New try: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=58f9248a4c99

Locally, this is passing in my emulator debug :(
(Assignee)

Comment 33

4 years ago
Comment on attachment 8501718 [details] [diff] [review]
Fix Settings API shutdown race condition r=bent

Carrying r+
Attachment #8501718 - Flags: review+
(Assignee)

Updated

4 years ago
Attachment #8501537 - Attachment is obsolete: true
(Assignee)

Updated

4 years ago
Attachment #8501613 - Attachment is obsolete: true
(Assignee)

Updated

4 years ago
Attachment #8501718 - Attachment is obsolete: true
(Assignee)

Comment 34

4 years ago
Try at: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=98ad1b0fb7e5 is green for opt, with some known unrelated intermittents.
(Assignee)

Comment 35

4 years ago
Try: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=58f9248a4c99
This one just fixes the lock id, adding curly bracsed around.
(Assignee)

Comment 36

4 years ago
Try: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=c12c7577df9c
This one just removes debug, in case it could explain the failures
(Assignee)

Comment 37

4 years ago
Try: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=839f49403404
This one blacklist the added test for gonk debug, to avoid red on b2g emulator debug builds
(Assignee)

Comment 38

4 years ago
(In reply to Alexandre LISSY :gerard-majax from comment #27)
> The error seems to be 100% reproductible on b2g emulator debug. Looking at
> full log, I see:
> > 00:41:00     INFO -  10-08 06:06:50.791 I/Gecko   ( 1567): -*- SettingsRequestManager: Making READWRITE transaction for 435d2192-4f21-48d4-90b7-285f147a56be
> > 00:41:00     INFO -  10-08 06:06:50.841 I/Gecko   ( 1567): -*- SettingsRequestManager: Making get request for xpcshell.language.current
> > 00:41:00     INFO -  10-08 06:06:50.981 I/Gecko   ( 1567): [1567] WARNING: 'NS_FAILED(rv)', file ../../../gecko/dom/indexedDB/ActorsParent.cpp, line 6582
> > 00:41:00     INFO -  10-08 06:06:51.001 I/Gecko   ( 1567): [1567] WARNING: 'NS_FAILED(rv)', file ../../../gecko/dom/indexedDB/ActorsParent.cpp, line 12877
> > 00:41:00     INFO -  10-08 06:06:51.011 I/Gecko   ( 1567): [1567] WARNING: Converting non-IndexedDB error code (0x80630001) to NS_ERROR_DOM_INDEXEDDB_UNKNOWN_ERR: file ../../../gecko/dom/indexedDB/ActorsParent.cpp, line 408
> > 00:41:00  WARNING -  10-08 06:06:51.021 W/GeckoConsole( 1567): [JavaScript Error: "IndexedDB UnknownErr: ActorsParent.cpp:414"]
> > 00:41:00     INFO -  10-08 06:06:51.101 I/Gecko   ( 1567): -*- SettingsRequestManager: Transaction for lock 435d2192-4f21-48d4-90b7-285f147a56be aborted
> > 00:41:00     INFO -  10-08 06:06:51.141 I/Gecko   ( 1567): -*- SettingsRequestManager: getRequest FAILED xpcshell.language.current
> 
> Doing a local run against b2g desktop in debug mode, I don't reproduce :(

Ben, I could not even reproduce locally on a debug b2g in emulator. See above, I've sent a couple of try with slight changes. My plan is to land the fix with the test running on b2g emulator opt, and file a followup for this issue.
Flags: needinfo?(bent.mozilla)
The error code here says that the database is locked by another process... I don't know how that could happen since we should only touch the file from the parent process.
Flags: needinfo?(bent.mozilla)
(Assignee)

Comment 40

4 years ago
(In reply to Alexandre LISSY :gerard-majax from comment #37)
> Try: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=839f49403404
> This one blacklist the added test for gonk debug, to avoid red on b2g
> emulator debug builds
>
> Try: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=c12c7577df9c
> This one just removes debug, in case it could explain the failures

Both are green, so it would have been fixed by removing debug \o/

I retriggered 10 jobs for each xpcshell tests in each build, just to make sure. But if we can land this without the disabling for b2g emulator debug, it's a good news :)
(Assignee)

Comment 41

4 years ago
Try with skip: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=a92a8ee50547
Try without skip: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=5c80e298220a

From the second link, it seems that removing the SettingsRequestManager debug improved the situation but it is still failing often enough for this not to be reliable for sherrifs. So I think the safe way is to disable this test for b2g emulator debug builds.
(Assignee)

Updated

4 years ago
Attachment #8501037 - Attachment is obsolete: true
(Assignee)

Updated

4 years ago
Attachment #8501038 - Attachment is obsolete: true
(Assignee)

Comment 42

4 years ago
Created attachment 8502305 [details] [diff] [review]
Fix Settings API shutdown race condition r=bent

When child message manager dies, it sends a child-process-shutdown
message to the SettingsRequestManager. This would just close the locks
and tasks of this message manager. The race happens when some
applications can shutdown quickly: settings requests will never be
committed to the database. One example is the callscreen. The fix,
provided by qDot, simply put those tasks in a finalize state to make
sure they are properly executed and committed.
(Assignee)

Comment 43

4 years ago
Comment on attachment 8502305 [details] [diff] [review]
Fix Settings API shutdown race condition r=bent

Carrying r+ from :bent
Attachment #8502305 - Flags: review+
(Assignee)

Comment 44

4 years ago
Try is green for b2g emulator opt, after retriggers: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=a92a8ee50547

The only failure are known intermittents.
(Assignee)

Updated

4 years ago
Keywords: checkin-needed
(Assignee)

Comment 46

4 years ago
Created attachment 8502421 [details] [diff] [review]
Fix Settings API shutdown race condition r=bent

When child message manager dies, it sends a child-process-shutdown
message to the SettingsRequestManager. This would just close the locks
and tasks of this message manager. The race happens when some
applications can shutdown quickly: settings requests will never be
committed to the database. One example is the callscreen. The fix,
provided by qDot, simply put those tasks in a finalize state to make
sure they are properly executed and committed.
(Assignee)

Comment 47

4 years ago
Comment on attachment 8502305 [details] [diff] [review]
Fix Settings API shutdown race condition r=bent

obsolete by backout
Attachment #8502305 - Attachment is obsolete: true
(Assignee)

Updated

4 years ago
Attachment #8502421 - Attachment is obsolete: true
(Assignee)

Comment 48

4 years ago
Created attachment 8502423 [details] [diff] [review]
Fix Settings API shutdown race condition r=bent

When child message manager dies, it sends a child-process-shutdown
message to the SettingsRequestManager. This would just close the locks
and tasks of this message manager. The race happens when some
applications can shutdown quickly: settings requests will never be
committed to the database. One example is the callscreen. The fix,
provided by qDot, simply put those tasks in a finalize state to make
sure they are properly executed and committed.
(Assignee)

Comment 49

4 years ago
Comment on attachment 8502423 [details] [diff] [review]
Fix Settings API shutdown race condition r=bent

Carrying r+ from :bent

Disabling the test on non b2g builds and on emulator/debug.
Attachment #8502423 - Flags: review+
(Assignee)

Comment 50

4 years ago
Finger crossed, but this one should be green ...
Keywords: checkin-needed
had to backout again for https://treeherder.mozilla.org/ui/logviewer.html#?job_id=2886050&repo=mozilla-inbound

maybe we need to run try again ?
Keywords: checkin-needed
(Assignee)

Comment 52

4 years ago
Created attachment 8502447 [details] [diff] [review]
Fix Settings API shutdown race condition r=bent

When child message manager dies, it sends a child-process-shutdown
message to the SettingsRequestManager. This would just close the locks
and tasks of this message manager. The race happens when some
applications can shutdown quickly: settings requests will never be
committed to the database. One example is the callscreen. The fix,
provided by qDot, simply put those tasks in a finalize state to make
sure they are properly executed and committed.
(Assignee)

Updated

4 years ago
Attachment #8502423 - Attachment is obsolete: true
(Assignee)

Comment 53

4 years ago
Comment on attachment 8502447 [details] [diff] [review]
Fix Settings API shutdown race condition r=bent

Carrying r+, fixing typo in skip-if
Attachment #8502447 - Flags: review+
(Assignee)

Updated

4 years ago
Keywords: checkin-needed
Duplicate of this bug: 1074379
https://hg.mozilla.org/mozilla-central/rev/69866036d793
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED
Please request b2g34 approval on this patch when you get a chance.
status-b2g-v2.2: affected → fixed
status-firefox33: --- → wontfix
status-firefox34: --- → wontfix
status-firefox35: --- → fixed
Flags: needinfo?(lissyx+mozillians)
(Assignee)

Comment 58

4 years ago
Sure.
Flags: needinfo?(lissyx+mozillians)
(Assignee)

Comment 59

4 years ago
Comment on attachment 8502447 [details] [diff] [review]
Fix Settings API shutdown race condition r=bent

NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Settings OOP
User impact if declined: improper api settings behavior on quickly shutting down applications
Testing completed: adding xpcshell tests to ensure the race is covered, merged in master since a couple of days, multiple QA (:jlorenzo) checking of the patch to assert it was fixing the depends
Risk to taking this patch (and alternatives if risky): low
String or UUID changes made by this patch: none
Attachment #8502447 - Flags: approval-mozilla-b2g34?

Updated

4 years ago
Attachment #8502447 - Flags: approval-mozilla-b2g34? → approval-mozilla-b2g34+
OK, so this land on b-i and around the same time, Mnw goes basically permafail (with errors that I swear I remember qDot hitting at one point too). Now it gets uplifted to b2g34 and also permafail. Coincidence? :)
https://treeherder.mozilla.org/ui/logviewer.html#?job_id=3660&repo=mozilla-b2g34_v2_1

I'm backing this out from b2g34 for now to see if Mnw indeed greens up. If so, I'm going to recommend doing the same on trunk as well.
https://hg.mozilla.org/releases/mozilla-b2g34_v2_1/rev/76b4a13e4fc7
status-b2g-v2.1: fixed → affected
Yep, backing it out got us back from permafail to our more usual just fails most of the time state. I'd recommend backing it out from m-c as well, but seeing as how Mnw isn't a sheriffed test suite at this point, I'll defer to the bug owners here to decide.
Well, it sort of fixed the perma-fail. I did indeed make the cascade of failures from comment 61 go away. However, test_dom_BluetoothManager_enabled.js is still perma-failing afterwards.
(Assignee)

Comment 64

4 years ago
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #63)
> Well, it sort of fixed the perma-fail. I did indeed make the cascade of
> failures from comment 61 go away. However,
> test_dom_BluetoothManager_enabled.js is still perma-failing afterwards.

So test_dom_BluetoothManager_enabled.js is perma-fail without my patch, how can we be sure that it's my patch that is causing this mess then?
(Assignee)

Comment 65

4 years ago
Mach does not work. This will be hard to work on:

alex@portable-alex:~/codaz/Mozilla/b2g/devices/Emulator/B2G$ ./mach help marionette-webapi
usage: mach [global arguments] marionette-webapi [command arguments]

Run a Marionette webapi test (test WebAPIs using marionette).

Global Arguments:
  -v, --verbose         Print verbose output.
  -l FILENAME, --log-file FILENAME
                        Filename to write log data to.
  --log-interval        Prefix log line with interval from last message rather
                        than relative time. Note that this is NOT execution
                        time if there are parallel operations.
  --log-no-times        Do not prefix log lines with times. By default, mach
                        will prefix each output line with the time since
                        command start.
  -h, --help            Show this help message.

Command Arguments:
  --type TESTTYPE  Test type, usually one of: browser, b2g, b2g-qemu.
  TESTS            Path to test(s) to run.
alex@portable-alex:~/codaz/Mozilla/b2g/devices/Emulator/B2G$ ./mach --log-file bluetooth.log --log-interval marionette-webapi dom/bluetooth/tests/marionette/test_dom_BluetoothManager_enabled.js
Usage: mach [options]

mach: error: no such option: --log-file
alex@portable-alex:~/codaz/Mozilla/b2g/devices/Emulator/B2G$ git pull 
Already up-to-date.
alex@portable-alex:~/codaz/Mozilla/b2g/devices/Emulator/B2G$ ./mach -l bluetooth.log --log-interval marionette-webapi dom/bluetooth/tests/marionette/test_dom_BluetoothManager_enabled.js
Usage: mach [options]

mach: error: no such option: -l
alex@portable-alex:~/codaz/Mozilla/b2g/devices/Emulator/B2G$ ./mach -l bluetooth.log marionette-webapi dom/bluetooth/tests/marionette/test_dom_BluetoothManager_enabled.js
Usage: mach [options]

mach: error: no such option: -l
(Assignee)

Updated

4 years ago
Depends on: 1083685
(Assignee)

Comment 66

4 years ago
My patch landed in:
 - https://tbpl.mozilla.org/?rev=c0a98d9b9a33 with green Mnw
 - https://tbpl.mozilla.org/?rev=95d1486223f7 with green Mnw
 - https://tbpl.mozilla.org/?rev=b91b22431613 is the first failure.
 - https://tbpl.mozilla.org/?rev=50b689feab5f is then back to green Mnw
 - https://tbpl.mozilla.org/?rev=097821fd89ed green Mnw
 - https://tbpl.mozilla.org/?rev=708b45d9b1b1 orange Mnw but not similar failure
Starting with https://tbpl.mozilla.org/?rev=999ad2f1e108 I do see a perma red Mnw matching pattern.
(Assignee)

Comment 67

4 years ago
I ran the marionette-webapi test locally a dozen of times against a B2G Emulator Debug build, could not reproduce any problem ...
(Assignee)

Comment 68

4 years ago
(In reply to Alexandre LISSY :gerard-majax from comment #67)
> I ran the marionette-webapi test locally a dozen of times against a B2G
> Emulator Debug build, could not reproduce any problem ...

No luck with opt build locally when running: |./mach marionette-webapi gecko/dom/bluetooth/tests/marionette/|
(Assignee)

Comment 69

4 years ago
Retriggered a set of try jobs for master (where it has not been reverted): https://tbpl.mozilla.org/?tree=Try&rev=66345e60b0e2
(Assignee)

Comment 70

4 years ago
(In reply to Alexandre LISSY :gerard-majax from comment #69)
> Retriggered a set of try jobs for master (where it has not been reverted):
> https://tbpl.mozilla.org/?tree=Try&rev=66345e60b0e2

And there are errors on master, but intermittent.
Flags: needinfo?(ryanvm)
Yes, Mnw is known to be incredibly flaky. Always has been. Not sure what specifically you're expecting from me about it.
Flags: needinfo?(ryanvm)
(Assignee)

Comment 72

4 years ago
Failed run: https://tbpl.mozilla.org/php/getParsedLog.php?id=50360599&tree=Try&full=1
Successful run: https://tbpl.mozilla.org/php/getParsedLog.php?id=50378525&tree=Try&full=1

In the later case, we can see that we have the logcat of the emulator, and all the debug that has been switched. We don't have any logcat at all in the first case, so I cannot even know if emulator properly started or not ...
We're seeing increasing failures on Mnw (Bug 1081529), and that busted build problem is becoming more and more frequent. The problem is we can't get the logcat or any other build artifact b/c they don't get uploaded if buildbot kills the mozharness script before it can upload the artifacts, so we have no visibility into the core problem. I'm requesting this feature Bug 1069641, but in the mean time, I'll make time next week to try to isolate the cause of the busted builds.
Alexandre just pointed out to me that the patch that landed on m-c was the wrong one. I'm not sure how that happened (qimportbz should have Just Worked here AFAICT), but I've gone ahead and backed out and re-landed at least.
https://hg.mozilla.org/integration/b2g-inbound/rev/0197ff8217ec
https://hg.mozilla.org/integration/b2g-inbound/rev/cf4d42ad0275
Created attachment 8513491 [details]
logcat

By running the test with some additional parameters in mozharness, I have the logcat, but I don't see anything suspicious, especially not with marionette (we would have seen marionette error logs or something suspicious with the xpcom interfaces it uses).

You can see line 280 corresponds with https://mxr.mozilla.org/mozilla-central/source/dom/bluetooth/tests/marionette/head.js#551, so it got that far in execution, but https://mxr.mozilla.org/mozilla-central/source/dom/bluetooth/tests/marionette/test_dom_BluetoothManager_enabled.js#30 is never logged, so I suspect that a script you've been using is now timing out in your test's head.js or something before it can get there.
interestingly, the subsequent tests are failing because a setup script in marionette's test harness is timing out... I wonder if the emulator is just incredibly slow.
(Assignee)

Comment 82

4 years ago
Also:
> 17:13:46     INFO -  10-29 00:12:42.145 W/(Preallocated app)(  271): [JavaScript Error: "TypeError: infos is null" {file: "chrome://global/content/BrowserElementChild.js" line: 57}]
> 17:13:46     INFO -  10-29 00:12:45.085 W/GeckoConsole(   44): [JavaScript Error: "TypeError: actor.target is undefined" {file: "resource://gre/modules/InterAppCommService.jsm" line: 810}]
(Assignee)

Comment 83

4 years ago
Do you think you could capture a logcat after enabling all debug in dom/settings ?

Just switching the |DEBUG| to true in all js/jsm files in this directory, maybe it could help :)
Flags: needinfo?(mdas)
Unfortunately, I can't yet. That logcat output was a test run of a mozharness script. You can't modify any other code when testing out mozharness changes, so I can't change the test/gecko/anything. In the meantime, I filed https://bugzilla.mozilla.org/show_bug.cgi?id=1090978 to get logcat output interleaved in the stdout. When that lands, you'll be able to see logcat output in MnW, so you can try anything out and see what happens.
Flags: needinfo?(mdas)
(Assignee)

Comment 85

4 years ago
Ok. Well the log already show us that bluetooth test starts and that after this, we properly get the permissions. The very next step here is to read bluetooth.enabled setting, so we cannot go further until we can trigger debugging of this code :)
(Assignee)

Comment 86

4 years ago
Landing of bug 1092941 fixed Mnw.
Depends on: 1092941
*fingers crossed*

https://hg.mozilla.org/releases/mozilla-b2g34_v2_1/rev/90f687bae552
status-b2g-v2.1: affected → fixed
You need to log in before you can comment on or make changes to this bug.