Closed Bug 959520 Opened 12 years ago Closed 11 years ago

Make socket.timeout error messages more useful

Tracking

(b2g-v1.3 fixed)

Status:

RESOLVED FIXED

Milestone:

mozilla30

Tracking Flags:

Tracking

Status

b2g-v1.3

---

fixed

People

(Reporter: mdas, Assigned: mdas)

References

Details

Attachments

(1 file, 3 obsolete files)

Determine B2G state and output recent errors 12 years ago Malini Das [:mdas] - Away, not checking bugmail 9.22 KB, patch	jgriffin : review-	Details \| Diff \| Splinter Review
Determine B2G state and output recent errors 12 years ago Malini Das [:mdas] - Away, not checking bugmail 10.40 KB, patch	jgriffin : review+	Details \| Diff \| Splinter Review
Determine B2G state and output recent errors 12 years ago Malini Das [:mdas] - Away, not checking bugmail 10.30 KB, patch	mdas : review+	Details \| Diff \| Splinter Review
Determine B2G state and output recent errors 11 years ago Malini Das [:mdas] - Away, not checking bugmail 10.73 KB, patch	jgriffin : review+	Details \| Diff \| Splinter Review

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Description

•

12 years ago

This is the most mystifying error for users, and it makes it seem like marionette is causing the failure when that is likely not the case. When we see this error in CI, we should have some logic to detect what has crashed or suggest areas to investigate. When we see this error, we should check our platform, and if it is a b2g device, we should check if the b2g process is still running, if not, report that. If it is still running we should grep some of the logcat for errors and suggest those as investigation points.

Stephen Donner [:stephend] Not actively reading bugmail

Updated

•

12 years ago

Blocks: 801898

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Comment 1

•

12 years ago

Attached patch Determine B2G state and output recent errors (obsolete) — Details — Splinter Review

This patch adds a new TestResult mixin that will automatically be added by the base runner if it detects that we're running against a device. The mixin will check the test results, and if socket.timeout or Broken Pipe show up in the test result, it will run a diagnosis to figure out what went wrong. I don't like having to shell out to get the logcat output, but I don't see the benefit of doing that work through python. To test this out locally, I'd recommend the following changes: - in client.py, in line 60, "def connect(self, timeout=360.0):" change timeout to be 10 - test_log.py. At the end of the first test, add: self.marionette.set_context("chrome") self.marionette.navigate("chrome://browser/content/aboutCertError.xhtml") Then run test_log.py against a device, and desktop. The device will have B2G related output, and desktop output should have no difference. Also, I've changed the 'bla bla bla socket closed?' string since it's confusing.

Attachment #8363193 - Flags: review?(jgriffin)

Jonathan Griffin (:jgriffin)

Comment 2

•

12 years ago

Comment on attachment 8363193 [details] [diff] [review] Determine B2G state and output recent errors Review of attachment 8363193 [details] [diff] [review]: ----------------------------------------------------------------- Cool, I think something like this is exactly what we need! My primary concern is that I believe the system restarts the B2G process if it crashes, at least in some cases, and if that happens, depending on the timing of the Python exception, this code may report that there is an unknown cause of the error. To avoid that, can we record the PID of the b2g process at some opportune times, and then infer that B2G has crashed if one of these errors occurs and the PID has changed? ::: testing/marionette/client/marionette/runner/mixins/b2g.py @@ +50,5 @@ > + b2g_output = device_manager.shellCheckOutput(['b2g-ps']) > + if 'b2g' in b2g_output: > + # find recent errors > + # In the event that the logcat is very large(ie: endurance tests), we use files and grep instead of using dm.getLogcat > + b2g_running = subprocess.Popen(['adb', 'logcat'], You can use logcat -d to dump the logcat and exit, which would be cleaner and less error prone than waiting for a second and then killing the logcat process. @@ +55,5 @@ > + stdout=open('logcat_output.log', 'w'), > + stderr=subprocess.STDOUT) > + time.sleep(1) > + b2g_running.kill() > + relevant_output = subprocess.Popen(['grep', '-i', 'error\|exception', 'logcat_output.log'], It seems like this code would be cleaner using Python re; i.e. you could avoid using the secondary file. Using mozprocess for the invocation of 'adb logcat -d' with a processOutputLine handler would probably be cleaner still (e.g. http://mxr.mozilla.org/mozilla-central/source/build/mobile/b2gautomation.py#26), but it's not totally necessary.

Attachment #8363193 - Flags: review?(jgriffin) → review-

Stephen Donner [:stephend] Not actively reading bugmail

Updated

•

12 years ago

Blocks: 962700

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Comment 3

•

12 years ago

(In reply to Jonathan Griffin (:jgriffin) from comment #2) > Comment on attachment 8363193 [details] [diff] [review] > Determine B2G state and output recent errors > > Review of attachment 8363193 [details] [diff] [review]: > ----------------------------------------------------------------- > > Cool, I think something like this is exactly what we need! > > My primary concern is that I believe the system restarts the B2G process if > it crashes, at least in some cases, and if that happens, depending on the > timing of the Python exception, this code may report that there is an > unknown cause of the error. To avoid that, can we record the PID of the b2g > process at some opportune times, and then infer that B2G has crashed if one > of these errors occurs and the PID has changed? > Sure I'm fiddling around with that now. > ::: testing/marionette/client/marionette/runner/mixins/b2g.py > @@ +50,5 @@ > > + b2g_output = device_manager.shellCheckOutput(['b2g-ps']) > > + if 'b2g' in b2g_output: > > + # find recent errors > > + # In the event that the logcat is very large(ie: endurance tests), we use files and grep instead of using dm.getLogcat > > + b2g_running = subprocess.Popen(['adb', 'logcat'], > > You can use logcat -d to dump the logcat and exit, which would be cleaner > and less error prone than waiting for a second and then killing the logcat > process. > > @@ +55,5 @@ > > + stdout=open('logcat_output.log', 'w'), > > + stderr=subprocess.STDOUT) > > + time.sleep(1) > > + b2g_running.kill() > > + relevant_output = subprocess.Popen(['grep', '-i', 'error\|exception', 'logcat_output.log'], > > It seems like this code would be cleaner using Python re; i.e. you could > avoid using the secondary file. Using mozprocess for the invocation of 'adb > logcat -d' with a processOutputLine handler would probably be cleaner still > (e.g. > http://mxr.mozilla.org/mozilla-central/source/build/mobile/b2gautomation. > py#26), but it's not totally necessary. Okay, we can use mozdevice for this. I've changed this and it looks simpler, hopefully the size of the logcat won't be an issue.

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Comment 4

•

12 years ago

Attached patch Determine B2G state and output recent errors (obsolete) — Details — Splinter Review

Updated according to comments

Attachment #8363193 - Attachment is obsolete: true

Attachment #8365177 - Flags: review?(jgriffin)

Jonathan Griffin (:jgriffin)

Comment 5

•

12 years ago

Comment on attachment 8365177 [details] [diff] [review] Determine B2G state and output recent errors Review of attachment 8365177 [details] [diff] [review]: ----------------------------------------------------------------- ::: testing/marionette/client/marionette/client.py @@ +90,4 @@ > > for packet in [data[i:i + self.max_packet_length] for i in > range(0, len(data), self.max_packet_length)]: > + try: nit: trailing whitespace ::: testing/marionette/client/marionette/runner/base.py @@ +340,5 @@ > + > + def run(self, test): > + dm_type = os.environ.get('DM_TRANS', 'adb') > + if dm_type == 'adb': > + self.b2g_pid = get_b2g_pid(get_dm()) nite: trailing whitespace @@ +341,5 @@ > + def run(self, test): > + dm_type = os.environ.get('DM_TRANS', 'adb') > + if dm_type == 'adb': > + self.b2g_pid = get_b2g_pid(get_dm()) > + return super(B2GMarionetteTextTestRunner, self).run(test) We'll want to save self.b2g_pid during gaiatest's start_b2g() method as well, in a separate patch, since that method restarts B2G within the context of a test. Eventually, we may want to pull a few of the generic GaiaDevice methods into a B2GDevice mixin, along with get_b2g_pid(). ::: testing/marionette/client/marionette/runner/mixins/b2g.py @@ +62,5 @@ > + # find recent errors > + message = "" > + error_re = re.compile(r"""[\s\S]*(exception|error)[\s\S]*""", flags=re.IGNORECASE) > + logcat = device_manager.getLogcat() > + latest = [] nit: trailing whitespace @@ +74,5 @@ > + iters -= 1 > + message += "\nMost recent errors/exceptions are:\n" > + for line in reversed(latest): > + message += "%s" % line > + pid = get_b2g_pid(device_manager) We call 'b2g-ps' on line 60 too; I think we could consolidate these and just call 'b2g-ps' once.

Attachment #8365177 - Flags: review?(jgriffin) → review+

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Comment 6

•

12 years ago

Attached patch Determine B2G state and output recent errors (obsolete) — Details — Splinter Review

Thanks, I've addressed the comments and filed Bug 964423 with respect to the Gaia restart issue. Moving r+ forward

Attachment #8365177 - Attachment is obsolete: true

Attachment #8366156 - Flags: review+

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Comment 7

•

12 years ago

landed in m-i: https://hg.mozilla.org/integration/mozilla-inbound/rev/a37e6c57ae0f

Phil Ringnalda (:philor)

Comment 8

•

11 years ago

Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/4ca21fff1c94 - it would have been permaorange in marionette-webapi, from https://tbpl.mozilla.org/php/getParsedLog.php?id=33650158&tree=Mozilla-Inbound, except that buildbot treats "devicemanager.DMError" as a sign that a failure was from a busted tegra or panda and the job should be retried, so we just kept rerunning it and rerunning it all afternoon.

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Comment 9

•

11 years ago

I've fixed this locally and will test on try as soon as the tree is open. Once green, I have to wait for https://bugzilla.mozilla.org/show_bug.cgi?id=940554 to land, since it changes the 'socket.timeout' message we rely on here.

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Comment 10

•

11 years ago

Attached patch Determine B2G state and output recent errors — Details — Splinter Review

this patch passed on try: https://tbpl.mozilla.org/?tree=Try&rev=cc15cec23fd4 The difference here is using the given adb path when running against an emulator, and the change from 'socket.timeout' to 'Connection timed out'

Attachment #8366156 - Attachment is obsolete: true

Attachment #8367309 - Flags: review?(jgriffin)

Zac C (:zac)

Updated

•

11 years ago

Blocks: 965256

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Updated

•

11 years ago

Blocks: 964438

Jonathan Griffin (:jgriffin)

Comment 11

•

11 years ago

Comment on attachment 8367309 [details] [diff] [review] Determine B2G state and output recent errors Review of attachment 8367309 [details] [diff] [review]: ----------------------------------------------------------------- lgtm

Attachment #8367309 - Flags: review?(jgriffin) → review+

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Updated

•

11 years ago

Depends on: 940554

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Comment 12

•

11 years ago

landing on m-i: https://hg.mozilla.org/integration/mozilla-inbound/rev/083e26c54750

Ryan VanderMeulen [:RyanVM]

Comment 13

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/083e26c54750

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla30

Malini Das [:mdas] - Away, not checking bugmail

Assignee

Updated

•

11 years ago

status-b2g-v1.3: --- → fixed

BMO Automation

Updated

•

2 years ago

Product: Testing → Remote Protocol

You need to log in before you can comment on or make changes to this bug.