Closed Bug 1046127 Opened 7 years ago Closed 7 years ago

ADB connection lost during B2G startup

Categories

(Firefox OS Graveyard :: General, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(firefox32 wontfix, firefox33 wontfix, firefox34 fixed, b2g-v1.4 verified, b2g-v2.0 verified, b2g-v2.1 verified, b2g-v2.2 verified)

VERIFIED FIXED
2.1 S2 (15aug)
Tracking Status
firefox32 --- wontfix
firefox33 --- wontfix
firefox34 --- fixed
b2g-v1.4 --- verified
b2g-v2.0 --- verified
b2g-v2.1 --- verified
b2g-v2.2 --- verified

People

(Reporter: jgraham, Assigned: dhylands)

References

Details

Attachments

(1 file, 1 obsolete file)

This is running on 1.4 so it isn't impossible that the behaviour has changed in more recent versions.
Sigh, bugzilla decided that I wanted to submit too early.

We are targeting 1.4 for the certification suite. During setup we install the marionette extension (assuming a user build), run b2g stop; b2g start, then wait for the adb connection to come back up and start interacting with the device via adb and/or marionette. However this doesn't work too well because the adb connection gets temporarily dropped at some point during the startup process. The last message in the logcat before this happens is:

I/GeckoDump(  285): XXX FIXME : Got a mozContentEvent: system-message-listener-ready

It's not entirely clear to me if the connection being dropped is a direct response to this event or something that happens to occur at roughly the same time.

The code in [1] seems somewhat suspicious as the comment there specifically mentions this situation, but I'm not sure if it's actually being called here or not.

[1] http://dxr.mozilla.org/mozilla-central/source/b2g/chrome/content/settings.js#329
Do you see this by running run.sh, or by calling some piece of the cert suite directly?
Ah nm, you mentioned yesterday you get it by doing:

marionette_extension --uninstall 1.4; adb reboot; adb wait-for-device; runcertsuite

..and I was able to reproduce it on the Flame as well.
This problem causes the cert suite to die between the cert tests and the wpt tests on 1.4, since adb dies shortly after the reboot that occurs between those two suites.
Yeah, the ability to start b2g and know when it is safe to interact with the device is critical for many aspects of the cert suite e.g. wpt itself will both reboot and stop/start b2g to install a testing profile, we reboot between subsuites, and so on.

For 1.4-as-is we could work around the problem by always waiting for adb to disconnect before we try to use the device. However that may not work well in future versions of Firefox OS.

In any case I think having a solution here is a blocker for the certsuite.
Unfortunately, I don't think we can block on a fix for this for 1.4, since even if this is fixed, it's quite likely that OEM's won't be using a version of B2G which contains the fix.
I mean it's a blocker to have some reliable way of avoiding this problem by knowing when it's safe to use adb, not to have an upstream fix. For 1.4 that could be be "wait for adb to disconnect every time the device is restarted" method, which would be ugly.

Alternatively there might be something more futureproof we could do like poll for some state (e.g. a file being created / modified) that we know indicates that the device is in a usable state. That polling itself could be insulated against errors if adb disconnects.
Do we already wait for the homescreen when we reboot the device?  If not, that might be a reliable way to solve this, assuming we make the method that waits for the homescreen tolerant of this issue.
We might be able to solve this by waiting for the timestamp of prefs.js (in the profile) to be updated, after rebooting.  Local experiments show this file isn't updated until after the adb interruption.
The adb connection will be dropped any time that there are changes to the

sys.usb.config property.

The AdbController code (http://dxr.mozilla.org/mozilla-central/source/b2g/chrome/content/settings.js#200) runs when b2g starts and it tries not to change sys.usb.config if it doesn't have to.

It would be worthwhile to modify settings.js and set AdbController.DEBUG to true (line 201), and then see what's printed when the connection is dropped.
This problem isn't observed on the Flame with the v123 base image (which contains 1.3).  Shallow-flashing 1.4 on top of this causes the problem to appear.  I'm doing a local build now (of 1.4) to see if the problem persists.
(In reply to Dave Hylands [:dhylands] from comment #10)
> The adb connection will be dropped any time that there are changes to the
> 
> sys.usb.config property.
> 
> The AdbController code
> (http://dxr.mozilla.org/mozilla-central/source/b2g/chrome/content/settings.
> js#200) runs when b2g starts and it tries not to change sys.usb.config if it
> doesn't have to.
> 
> It would be worthwhile to modify settings.js and set AdbController.DEBUG to
> true (line 201), and then see what's printed when the connection is dropped.

Making this single change on a 1.4 user build causes the device to fail to boot.  Trying again with an eng build, but I'm not sure this problem will reproduce there anyway.
(In reply to Jonathan Griffin (:jgriffin) from comment #12)
> (In reply to Dave Hylands [:dhylands] from comment #10)
> > The adb connection will be dropped any time that there are changes to the
> > 
> > sys.usb.config property.
> > 
> > The AdbController code
> > (http://dxr.mozilla.org/mozilla-central/source/b2g/chrome/content/settings.
> > js#200) runs when b2g starts and it tries not to change sys.usb.config if it
> > doesn't have to.
> > 
> > It would be worthwhile to modify settings.js and set AdbController.DEBUG to
> > true (line 201), and then see what's printed when the connection is dropped.
> 
> Making this single change on a 1.4 user build causes the device to fail to
> boot.  Trying again with an eng build, but I'm not sure this problem will
> reproduce there anyway.

This also causes the device not to boot with an eng build.  :(
Sigh - I seem to remember that there were some bugs in the code when DEBUG was set to 1 (I ran into this on master). 

Let me see if I can a 1.4 build up and running and see what other changes are required.
Hmm. I built a 1.4 image and setting AdbController.DEBUG to true seems to be working fine.

Now I noticed that on the boot after doing my full-flash that my adb connection was dropped (It might have been around the time that the screen blanked, but I wasn't paying enough attention).

I was able to reconnect afterwards and here is the logcat:

> I   AdbController: setLockscreenState = true
> I   AdbController: Checking availability of storage: 'sdcard
> I   AdbController: Storage: 'sdcard' is 'available
> I   AdbController: Checking availability of storage: 'sdcard1
> I   AdbController: Storage: 'sdcard1' is 'available
> I   AdbController: updateStateInternal: called
> I   AdbController: isDebugging=false
> I   AdbController: updateState: enableAdb = true remoteDebuggerEnabled = true lockEnabled = true locked = true usbFuncActive = false
> I   *** UTM:SVC TimerManager:notify - notified @mozilla.org/b2g/webapps-update-timer;1
> E   UTM:SVC TimerManager:notify - notified @mozilla.org/b2g/webapps-update-timer;1

This is when the adb connection was dropped. What follows is additional logcat that was produced after I reconnected.

> E   BPUnw: [8 total] thread_unregister_for_profiling(me=0x741578)
> E   BPUnw: [9 total] thread_register_for_profiling(me=0x742a10, stacktop=0xb13ffdcf)
> E   BPUnw: [10 total] thread_register_for_profiling(me=0x741578, stacktop=0xb11ffdcf)
> I   UpdateState: umsAvail:1 umsEnabled:1 mode:0 usbCablePluggedIn:0 tryToShare:0
> I   UpdateState: Volume sdcard is Mounted and inserted @ /storage/sdcard gen 1 locked 0 sharing en-n
> I   UpdateState: Volume sdcard1 is Mounted and inserted @ /storage/sdcard1 gen 2 locked 0 sharing dis
> I   UpdateState: umsAvail:1 umsEnabled:1 mode:0 usbCablePluggedIn:1 tryToShare:0
> I   UpdateState: Volume sdcard is Mounted and inserted @ /storage/sdcard gen 1 locked 0 sharing en-n
> I   UpdateState: Volume sdcard1 is Mounted and inserted @ /storage/sdcard1 gen 2 locked 0 sharing dis
> E   BPUnw: [9 total] thread_unregister_for_profiling(me=0x742a10)
> E   BPUnw: [8 total] thread_unregister_for_profiling(me=0x741578)
> I   *** UTM:SVC TimerManager:notify - notified timerID: user-agent-updates-timer
> E   UTM:SVC TimerManager:notify - notified timerID: user-agent-updates-timer
> I   *** UTM:SVC TimerManager:registerTimer - id: user-agent-updates-timer
> E   UTM:SVC TimerManager:registerTimer - id: user-agent-updates-timer

What's significant is this line:

> UpdateState: umsAvail:1 umsEnabled:1 mode:0 usbCablePluggedIn:0 tryToShare:0

The usbCablePluggedIn status is determined by watching /sys/devices/virtual/android_usb/android0/state which is essentially watching the configured status of the usb connection.

One of the reasons for the usb connection to become unconfigured is because sys.usb.config is written to. I don't see any evidence that AdbController is the one doing the writing.

I'm going to put some debug right in bionic's function that does the property set and see if that uncovers anything.

I haven't figured out exactly what triggers the disconnect. Just rebooting or reflashing doesn't seem to cause the problem. I'll see if it happens again if I reflash the base image and then flash over top.
Looking a little closer at the log (with timestamps):

> 20:01:19.526 311/311                 GeckoDump  I   AdbController: updateState: enableAdb = true remoteDebuggerEnabled = true lockEnabled = true locked = true usbFuncActive = false
> 20:01:41.426 311/311                     Gecko  I   *** UTM:SVC TimerManager:notify - notified @mozilla.org/b2g/webapps-update-timer;1
> 20:01:41.426 311/311              GeckoConsole  E   UTM:SVC TimerManager:notify - notified @mozilla.org/b2g/webapps-update-timer;1
> 20:01:49.546 311/103                  Profiler  E   BPUnw: [8 total] thread_unregister_for_profiling(me=0x741578)
> ...drop occurred in here...
> 20:02:29.466 311/105                  Profiler  E   BPUnw: [9 total] thread_register_for_profiling(me=0x742a10, stacktop=0xb13ffdcf)
> 20:02:29.476 311/105                  Profiler  E   BPUnw: [10 total] thread_register_for_profiling(me=0x741578, stacktop=0xb11ffdcf)
> 20:02:29.526 311/510               AutoMounter  I   UpdateState: umsAvail:1 umsEnabled:1 mode:0 usbCablePluggedIn:0 tryToShare:0


it would seem that the screen lock occurred at 20:01:19.526 and the adb connection was dropped at 20:02:29.466 (I looked at my actual console and I was off by 1 line from what I reported in comment 15). The AutoMounter uses uevent's to notice that the usb configuration changed, so I'm pretty confident that the UpdateState message occurs very shortly (within 10's of milliseconds) after then usb configuration happens.

So the adb connection was dropped a minute and ten seconds after the screen was locked.

The next time I see this, I'll grab dmesg as well.
I haven't been able to reproduce the problem that I saw (of course - now that I have something to help debug)
easy steps to reproduce:

adb reboot; adb wait-for-device; while [ 0 == 0 ]; do adb shell ls -al /system/b2g/b2g; done

You'll see the 'adb shell' line succeed and then start to fail within a few seconds as the adb connection is interrupted.
I'd also like to clarify the exact steps needed to setup the phone.

- which base image?
- full flash or shallow flash?
- eng or userdebug or user build?
- any additional steps?
(In reply to Dave Hylands [:dhylands] from comment #19)
> I'd also like to clarify the exact steps needed to setup the phone.
> 
> - which base image?

v123 +fonts (the latest)

> - full flash or shallow flash?

shallow flash of B2G 1.4

> - eng or userdebug or user build?

user build only

> - any additional steps?

no other steps
ok - so what's happening is that the base image has:

ro.secure=0

This causes adb to start at bootup (before b2g starts).

Then, because this is a user build, when b2g starts it turns adb off until the screen is unlocked. If you disable the screen lock, then b2g will leave adb on (for 12 hours).
I can reproduce this problem with the screen.timeout set to 0 and the lockscreen disabled.  adb comes up, then briefly gets interrupted.
Hmm.

I'm not seeing that.

If I do:
adb reboot; adb wait-for-device; adb shell
or
adb reboot; adb wait-for-device; adb logcat

then my shell doesn't get dropped, nor does my logcat get interrupted.

I'm using master (is that what you're using?)

Ahh - I used the flash tool to shallow flash a 1.4 image over the 123 base and I can reproduce the problem.

I'll dig a bit deeper to see if I can figure out what's happening.
Oops - Spoke too soon. I forgot to disable screen lock. With screen lock disabled, I'm not seeing the problem.

Do you have a particular 1.4 build I can try?
I can repro this 100% of the time using this build shallow-flashed (gecko+gaia) onto the V123 base image:  https://pvtbuilds.mozilla.org/pvt/mozilla.org/b2gotoro/nightly/mozilla-b2g30_v1_4-flame/latest/

After flashing, I use the settings app to disable screen timeout and the lock screen, and to enable ADB access.

Then: adb reboot; adb wait-for-device; adb logcat

the logcat will start and then die, about the time the splash screen changes from the animated FirefoxOS logo to the static FirefoxOS screen.
OK - what tool are you using to flash?

I used the same 1.4 image (I used ./flash_pvt.sh from the B2G-flash-tool repository) and I'm not seeing the problem.

time wise - that sounds like about the time that b2g starts and then sets up the adb.

Here's the exact set of steps I did:
- run ./flash.sh from v123 directory
- Nexted through the FTU app, pressed Done and Skip
- Launch Settings
  - Enable Remote debugging
  - Disabled Screen Lock
- Ran ./flash_pvt.py -d flame -v mozilla-b2g30_v1_4 -b 20140811000206 --usr --gaia --gecko
- Nexted through the FTU app, pressed Done and Skip
- Launch Settings app
  - Disable Screen Lock
  - Enable Developer Menu
  - Set Remote debugging to ADB only
- Ran adb reboot; adb wait-for-device; adb logcat

And I see the problem (and it now seems to reproduce every time I reboot). Now that it seems reproducable (I'm still baffled why I couldn't reproduce it earlier). I'll see if I can figure out what's going on.
Progress. I pulled the omni.ja file, unpacked it, turned on Debug in the AdbController and pushed it, and I'm able to get more debug information.

I've identified the problem. We used to use the preference devtools.debugger.remote-enabled to control both remote debugging and adb.

Somewhere along the way we added support for the dropdown with 3 choices, and the preference was changed to: debugger.remote-mode

For whatever reason, we're getting triggered on both of these variables, and devtools.debugger.remote-enabled is set to false, which winds up causing the adb connection to drop.

When debugger.remote-mode gets called with 'adb-only' then it re-enables adb.

The simple solution seems to be to remove the call to   AdbController.setRemoteDebuggerState(value);
 from devtools.debugger.remote-enabled.

In order to do this, we need to be sure that when we upgrade from 1.3 to 1.4 that the devtools.debugger.remote-enabled gets properly translated into debugger.remote-mode or say that we don't care (the worst case behaviour would be that if a user has adb enabled in 1.3, then it might not be enabled after updating to 1.4, but they could still go into the settings app and turn it back on.
I tested this patch on my phone that was reproducing the problem by doing these steps:

> adb pull /system/b2g/omni.ja
> mkdir unpacked
> cd unpacked
> unzip ../omni.ja
> edited chrome/chrome/content/settings.js (commented out line 452)
> zip -r ../omni-new.ja *
> adb remount
> adb push ../omni-new.ja /system/b2g
> adb shell stop b2g
> adb shell mv /system/b2g/omni-new.ja /system/b2g/omni.ja
> adb reboot; adb wait-for-device; adb logcat

and observed logcat doesn't get dropped.
Attachment #8471266 - Flags: review?(fabrice)
(In reply to Dave Hylands [:dhylands] from comment #27)

> The simple solution seems to be to remove the call to  
> AdbController.setRemoteDebuggerState(value);
>  from devtools.debugger.remote-enabled.

I agree. I totally forgot to do this after things stabilized with the 3 states version.

> In order to do this, we need to be sure that when we upgrade from 1.3 to 1.4
> that the devtools.debugger.remote-enabled gets properly translated into
> debugger.remote-mode or say that we don't care (the worst case behaviour
> would be that if a user has adb enabled in 1.3, then it might not be enabled
> after updating to 1.4, but they could still go into the settings app and
> turn it back on.

I think we don't care. Users can still switch back manually and it's a dev setting.
Comment on attachment 8471266 [details] [diff] [review]
Don't drop adb connection during startup.

Review of attachment 8471266 [details] [diff] [review]:
-----------------------------------------------------------------

I would just remove the AdbController.setRemoteDebuggerState(value != 'disabled'); block. No need to keep history in comments!
Attachment #8471266 - Flags: review?(fabrice) → review+
Updated patch to remove the comments.
Attachment #8471266 - Attachment is obsolete: true
Attachment #8471323 - Flags: review+
Is this something that I should ask for approval on?
Much appreciated Dave.
https://hg.mozilla.org/mozilla-central/rev/332a663642b0
Assignee: nobody → dhylands
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Comment on attachment 8471323 [details] [diff] [review]
Don't drop adb connection during startup.

NOTE: This flag is now for security issues only. Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 956185
User impact if declined: Affects v1.4 certification testing
Testing completed: Manual Testing
Risk to taking this patch (and alternatives if risky): Low
String or UUID changes made by this patch: None

I've been told by #ateam that they have a workaround, but would prefer if this could land.
Attachment #8471323 - Flags: approval-mozilla-b2g30?
Comment on attachment 8471323 [details] [diff] [review]
Don't drop adb connection during startup.

NOTE: This flag is now for security issues only. Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): 956185
User impact if declined: Affects v1.4 certification testing (so it will also affect 2.0 certification testing)
Testing completed: Manual Testing
Risk to taking this patch (and alternatives if risky): Low
String or UUID changes made by this patch: None
Attachment #8471323 - Flags: approval-mozilla-b2g32?
Comment on attachment 8471323 [details] [diff] [review]
Don't drop adb connection during startup.

This looks low risk enough to get it landed and avoid the annoyance, hence making an exception on approving here given the request was made a while back. If this breaks any tests or has fallouts lets back this out immediately and live with workaraound instead Please make sure to confirm this is fixed after it lands on branches
Attachment #8471323 - Flags: approval-mozilla-b2g32?
Attachment #8471323 - Flags: approval-mozilla-b2g32+
Attachment #8471323 - Flags: approval-mozilla-b2g30?
Attachment #8471323 - Flags: approval-mozilla-b2g30+
Keywords: verifyme
It looks like the wrong attachment got uplifted to 2.0

If you look at the commits in comment 39 they're different.
Flags: needinfo?(ryanvm)
Sorry, backed out and re-landed with the correct hunk removed.

https://hg.mozilla.org/releases/mozilla-b2g32_v2_0/rev/13b41e22c8f6
Flags: needinfo?(ryanvm)
Duplicate of this bug: 1060427
Issue verified fixed on Flame 1.4, Flame 2.0, Flame 2.1, and Flame 2.2

Actual Results: With ADB Debugging enabled and lockscreen disabled, running command "adb reboot; adb wait-for-device; adb logcat" in terminal output logcat without interruption following adb reboot. Logcat does not get dropped at any time.

Device: Flame 1.4 (319mb)(Kitkat Base)(Full Flash)
BuildID: 20141027000203
Gaia: bb76c81f83e1e4acc2d2972a451db2bce78c8f34
Gecko: 1bde54b2e7b0
Gonk: 
Version: 30.0 (1.4)
Firmware: V123
User Agent: Mozilla/5.0 (Mobile; rv:30.0) Gecko/30.0 Firefox/30.0

Device: Flame 2.0 (319mb)(Kitkat Base)(Full Flash)
Build ID: 20141027000202
Gaia: 2183b4f3ec0eb47ab1f133c31732ec53b08ad253
Gecko: 43bee45176c4
Version: 32.0 (2.0)
Firmware Version: v188
User Agent: Mozilla/5.0 (Mobile; rv:32.0) Gecko/32.0 Firefox/32.0

Device: Flame 2.1 (319mb)(Kitkat Base)(Full Flash)
Build ID: 20141027001201
Gaia: c97463d61f45513a2123b19610386ddbfc916819
Gecko: 4f8c0c021128
Version: 34.0 (2.1)
Firmware Version: v188
User Agent: Mozilla/5.0 (Mobile; rv:34.0) Gecko/34.0 Firefox/34.0

Device: Flame Master (319mb)(Kitkat Base)(Full Flash)
Build ID: 20141027040237
Gaia: e91d99e4d96954f06383c00bb9d79598a697e310
Gecko: 8230834302c9
Version: 36.0a1 (Master)
Firmware Version: v188
User Agent: Mozilla/5.0 (Mobile; rv:36.0) Gecko/36.0 Firefox/36.0
Status: RESOLVED → VERIFIED
QA Whiteboard: [QAnalyst-Triage?]
Flags: needinfo?(ktucker)
Keywords: verifyme
Correction: 1.4 device in Comment 43 should be - "Device: Flame 1.4 (319mb)(Jellybean Base)(Shallow Flash)"
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)
You need to log in before you can comment on or make changes to this bug.