Closed Bug 1389815 Opened 3 years ago Closed 1 year ago

Intermittent remote-tp4m autophone-talos | application crashed [unknown top frame]

Categories

(Firefox for Android :: General, defect, P5)

Tracking

()

RESOLVED INCOMPLETE
Tracking Status
fennec + ---
firefox57 --- affected

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell disabled])

Attachments

(1 file)

tracking-fennec: --- → ?
It looks like these occur on shutdown: logcats show the crash happens after the test summary is dumped. This seems typical:

https://autophone.s3.amazonaws.com/v1/task/DtbDxJDKS625WLVbaldb2g/runs/0/artifacts/public/build/66c2c183-b23e-405d-a84e-d0589f99f893-autophone.log

2017-07-21 04:27:13,442        pixel-01 INFO     TalosTestJob mozilla-inbound 20170721110318 opt api-25 android-api-15 remote-tp4m logcat: 07-21 07:27:08.490 I/GeckoDump( 4759): -------- Summary: end --------
2017-07-21 04:27:13,443        pixel-01 INFO     TalosTestJob mozilla-inbound 20170721110318 opt api-25 android-api-15 remote-tp4m logcat: 07-21 07:27:08.490 I/GeckoDump( 4759):
2017-07-21 04:27:13,443        pixel-01 INFO     TalosTestJob mozilla-inbound 20170721110318 opt api-25 android-api-15 remote-tp4m logcat: 07-21 07:27:08.494 D/GeckoSuggestedSites( 4759): Number of suggested sites: 8
2017-07-21 04:27:13,443        pixel-01 INFO     TalosTestJob mozilla-inbound 20170721110318 opt api-25 android-api-15 remote-tp4m logcat: 07-21 07:27:08.512 I/GeckoPushService( 4759): Handling event: PushServiceAndroidGCM:Uninitialized
2017-07-21 04:27:13,443        pixel-01 INFO     TalosTestJob mozilla-inbound 20170721110318 opt api-25 android-api-15 remote-tp4m logcat: 07-21 07:27:08.513 I/nsScreenManagerAndroid( 4759): nsWindow[0xc1a66920]::Show 0
2017-07-21 04:27:13,443        pixel-01 INFO     TalosTestJob mozilla-inbound 20170721110318 opt api-25 android-api-15 remote-tp4m logcat: 07-21 07:27:08.518 I/nsScreenManagerAndroid( 4759): nsWindow[0xbc590980]::Show 0
2017-07-21 04:27:13,443        pixel-01 INFO     TalosTestJob mozilla-inbound 20170721110318 opt api-25 android-api-15 remote-tp4m logcat: 07-21 07:27:08.530 D/GeckoNetworkManager( 4759): Incoming event disableNotifications for state OnWithListeners -> OnNoListeners
2017-07-21 04:27:13,444        pixel-01 INFO     TalosTestJob mozilla-inbound 20170721110318 opt api-25 android-api-15 remote-tp4m logcat: 07-21 07:27:08.532 I/nsScreenManagerAndroid( 4759): nsWindow[0xbc590820]::Show 0
2017-07-21 04:27:13,444        pixel-01 INFO     TalosTestJob mozilla-inbound 20170721110318 opt api-25 android-api-15 remote-tp4m logcat: 07-21 07:27:08.532 I/nsScreenManagerAndroid( 4759): trying to show invisible window! ignoring..
2017-07-21 04:27:13,444        pixel-01 INFO     TalosTestJob mozilla-inbound 20170721110318 opt api-25 android-api-15 remote-tp4m logcat: 07-21 07:27:08.639 W/google-breakpad( 4759): ExceptionHandler::GenerateDump cloned child
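
To check other logs for the same pattern, something like the following could be used. This is only a sketch: the two marker strings are copied from the excerpt above, but the function name `crashed_after_summary` and the idea of scanning the autophone log line by line are assumptions for illustration, not part of Autophone.

```python
# Marker strings copied from the log excerpt; everything else is hypothetical.
SUMMARY_END = "-------- Summary: end --------"
BREAKPAD_DUMP = "ExceptionHandler::GenerateDump"


def crashed_after_summary(log_lines):
    """Return True if a breakpad dump line appears after the summary end marker."""
    summary_seen = False
    for line in log_lines:
        if SUMMARY_END in line:
            # The test has finished reporting; anything later is shutdown.
            summary_seen = True
        elif summary_seen and BREAKPAD_DUMP in line:
            return True
    return False
```

A True result would suggest a shutdown crash rather than a crash during the test itself; a dump that appears before the summary marker is not counted.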
This has become quite frequent and needs attention -- fix or disable?
Flags: needinfo?(snorp)
Flags: needinfo?(bob)
Whiteboard: [stockwell needswork]
Similar to the svg case which we disabled. We can do it here too if it is ok with jmaher.
Flags: needinfo?(bob) → needinfo?(jmaher)
I am fine with disabling; I assume this is related to the browser shutting down and not the actual test itself.
Flags: needinfo?(jmaher)
Disable tp4m pixel everywhere.
Attachment #8899521 - Flags: review+
https://github.com/mozilla/autophone/commit/3c772e9c76cf4091b2bc478dabbf2e8106120cf0
deployed 2017-08-21 09:50

leaving open for the crash.
Whiteboard: [stockwell needswork] → [stockwell disabled]
Jim, can you look?
Flags: needinfo?(snorp) → needinfo?(nchen)
Looks like for some reason, we get incomplete crash dumps for some crashes, and because we can't automatically analyze the incomplete crash dumps, they all get grouped into this bug. For example, the crash from [1] is really a crash inside `MessageLoop::PostTask_Helper`, i.e. bug 1394428.

[1] https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=124486347
Flags: needinfo?(nchen)
After the recent crash fixes, I started looking into re-enabling talos. svg looks green across the board.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b42f3165db903d7bb137f6611e7399a82794fee9&filter-searchStr=a(tpn)
is green in production but
https://treeherder.allizom.org/#/jobs?repo=try&revision=b42f3165db903d7bb137f6611e7399a82794fee9&filter-searchStr=a(tpn)
has intermittent bustage on autophone-4. jmaher, any idea what is wrong here? I looked at the logs but nothing stood out as to why tpn intermittently is busted on autophone-4.
Flags: needinfo?(jmaher)
From looking at some logs, I see Wi-Fi disconnects:
2017-10-03 13:39:17,629 45342  pixel-12       MainThread          pixel-12 INFO     TalosTestJob try 20171003073646 opt api-25 android-api-16 remote-tp4m logcat: 10-03 13:34:33.745 D/ConnectivityService( 1099): NetworkAgentInfo [WIFI () - 100] EVENT_NETWORK_INFO_CHANGED, going from CONNECTED to DISCONNECTED
2017-10-03 13:39:17,629 45342  pixel-12       MainThread          pixel-12 INFO     TalosTestJob try 20171003073646 opt api-25 android-api-16 remote-tp4m logcat: 10-03 13:34:33.746 D/ConnectivityService( 1099): NetworkAgentInfo [WIFI () - 100] got DISCONNECTED, was satisfying 6
2017-10-03 13:39:17,629 45342  pixel-12       MainThread          pixel-12 INFO     TalosTestJob try 20171003073646 opt api-25 android-api-16 remote-tp4m logcat: 10-03 13:34:33.749 V/NativeCrypto( 1956): SSL shutdown failed: ssl=0x70f1503b80: I/O error during system call, Broken pipe

I think this is what is going on. Why is this tpn only, though? Is it specific to one device rather than another?
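
To compare devices, the disconnects could be counted per phone. A rough sketch, assuming the log lines keep the format shown above: the regular expressions are derived from those three lines only, and `count_wifi_drops` is a made-up helper name, not an existing Autophone function.

```python
import re
from collections import Counter

# Matches the CONNECTED -> DISCONNECTED transition seen in the excerpt.
WIFI_DROP = re.compile(
    r"ConnectivityService\(\s*\d+\): NetworkAgentInfo \[WIFI .*?\] "
    r"EVENT_NETWORK_INFO_CHANGED, going from CONNECTED to DISCONNECTED"
)
# Device names in the excerpt look like "pixel-12"; other names are not handled.
DEVICE = re.compile(r"\b(pixel-\d+)\b")


def count_wifi_drops(log_lines):
    """Count Wi-Fi disconnect events per device name found on the same line."""
    drops = Counter()
    for line in log_lines:
        if WIFI_DROP.search(line):
            match = DEVICE.search(line)
            drops[match.group(1) if match else "unknown"] += 1
    return drops
```

Running this over the logs from each host would show whether one phone or host drops Wi-Fi more often than the others.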
Flags: needinfo?(jmaher)
Both pixels attached to autophone-4 had similar results. Busted: 2 runs on pixel-11 and 3 on pixel-12; passed: 4 runs on pixel-11 and 3 on pixel-12. I see similar disconnections on the other hosts, though not as frequently. I wonder if autophone-4's placement in the rack makes its phones' network connections less reliable. Van, thoughts?
Flags: needinfo?(vle)
Spoke to :bc offline about this. I don't think there are any real differences, as the phones are less than a foot away from other phones. The main difference is the USB cards: they're not powered.
Flags: needinfo?(vle)
Still a problem. Reopening to help with sheriffing.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Re-triaging per https://bugzilla.mozilla.org/show_bug.cgi?id=1473195

Needinfo :susheel if you think this bug should be re-triaged.
Priority: P2 → P5

There's an r+ patch which didn't land and no activity in this bug for 2 weeks.
:bc, could you have a look please?

Flags: needinfo?(bob)

The patch to disable the test did land in the autophone github repo, but Autophone is no more, we no longer run talos on Android hardware, and the crash reports from this error are not useful. I'll call this incomplete.

Status: REOPENED → RESOLVED
Closed: 3 years ago → 1 year ago
Flags: needinfo?(bob)
Resolution: --- → INCOMPLETE