1380134 - Crash [@ mozilla::jni::detail::NativePtr<mozilla::widget::GeckoEditableSupport, 2>::Clear<mozilla::jni::LocalRef<mozilla::java::GeckoEditableChild> >

Reporter

Description

8 years ago

Beginning with https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=707b8c089de26e0eb3051e743a1cefea84ce2196&filter-searchStr=autophone I see a number of different crashes. Unfortunately the logging does not show the signature in the Treeherder Failure Summary, but the crash signatures and stacks are available in the log. In addition to several stacks without symbols, I see multiple instances of:

[@ mozilla::jni::detail::NativePtr<mozilla::widget::GeckoEditableSupport, 2>::Clear<mozilla::jni::LocalRef<mozilla::java::GeckoEditableChild> >]
[@ mozilla::java::VsyncSource::ObserveVsync]

This also appears, but may be an existing intermittent:

[@ nsWindow::RedrawAll]

snorp, esawin: Do these crashes make sense with regard to the patch?

Bob Clary [:bc] (inactive)

Reporter

Comment 1

8 years ago

Other signatures that are appearing:

[@ MessageLoop::PostTask_Helper]
[@ nsWindow::CreateLayerManager]

Flags: needinfo?(snorp)

Flags: needinfo?(esawin)

Bob Clary [:bc] (inactive)

Reporter

Comment 2

8 years ago

This also appears to have started on autoland at https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=064f2ad7ca3bcd37df0ca60dd870a722ca96c756&filter-searchStr=autophone This leads me to think that this is not related to the check-in where the failures began, but to the landing of the shutdown intent change in Autophone itself in bug 1371291. I don't yet see this on mozilla-central, but if it appears there after the next check-in, I would consider it settled that this is an Autophone-caused issue.

Bob Clary [:bc] (inactive)

Reporter

Comment 3

8 years ago

I've prepared a patch to revert bug 1371291. I've started the Autophone servers with pulse disabled so that we have a chance to catch up on all jobs before we deploy the change. It will then be clearer whether or not the reversion is the fix.

Geoff Brown [:gbrown]

Updated

8 years ago

Whiteboard: [stockwell needswork]

Bob Clary [:bc] (inactive)

Reporter

Updated

8 years ago

Blocks: 1371291
No longer blocks: 1379461

Bob Clary [:bc] (inactive)

Reporter

Comment 4

8 years ago

Crashes fixed by the backout in bug 1371291. The range of bad jobs is:

https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=autophone&exclusion_profile=false&fromchange=707b8c089de26e0eb3051e743a1cefea84ce2196&tochange=35eb832df41bef4cc58ee7df1e9950c608ce91ba
https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=autophone&exclusion_profile=false&fromchange=064f2ad7ca3bcd37df0ca60dd870a722ca96c756&tochange=d5e832bbd4ee293d92a0520c91caa80beba6faa4

Leaving open to track why using the shutdown intent caused these crashes. I do have 3 Nexus 5s and 2 Pixels we can use on autophone-4 to test this and submit to staging. I'll get that set up.

Bob Clary [:bc] (inactive)

Reporter

Comment 5

8 years ago

autophone-4 is set up to report to staging and phonedash-dev.allizom.org. It is not listening to pulse, so we will need to trigger the jobs manually. For example, from autophone-4 I did:

python trigger_runs.py --build-location=tinderbox --loglevel=DEBUG --repo=autoland --buildtype=opt --buildtype=debug --first-revision=66d21dbbe9160da0df4a0329f6370a25d2eb78dc --last-revision=66d21dbbe9160da0df4a0329f6370a25d2eb78dc

which resulted in the jobs https://treeherder.allizom.org/#/jobs?repo=autoland&revision=66d21dbbe9160da0df4a0329f6370a25d2eb78dc&filter-searchStr=autophone

Comment hidden (Intermittent Failures Robot)

Eugen Sawin [:esawin]

Comment 7

8 years ago

It looks like we're seeing an invalid/null reference in DisposeNative(mEditable), called from the lambda in GeckoEditableSupport::OnDetach. I don't see how that could happen, given that we pass a RefPtr of this to the lambda. It's also interesting that we didn't see that signature when using the Quitter add-on. Any ideas, Jim?

Flags: needinfo?(esawin) → needinfo?(nchen)

Bob Clary [:bc] (inactive)

Reporter

Comment 8

8 years ago

Note that initially we had both Quitter and the shutdown intent, since I needed a cross-branch solution and we hadn't landed the shutdown intent on beta or release. Perhaps it is an interaction between the two.

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Comment 9

8 years ago

Eugen on the case

Flags: needinfo?(snorp)

Comment hidden (Intermittent Failures Robot)

Eugen Sawin [:esawin]

Comment 11

8 years ago

Bob, have you backed the Autophone patch out? I'm confused why we are still seeing the failures.

Flags: needinfo?(bob)

Bob Clary [:bc] (inactive)

Reporter

Comment 12

8 years ago

Comment 10 covered the whole preceding week, which included the range where I had landed the patch. If you follow the link to brasstacks, you'll see that the count dropped to zero.

Flags: needinfo?(bob)

Bob Clary [:bc] (inactive)

Reporter

Comment 13

8 years ago

I set up autophone-4 to run without installing Quitter and without references to Quitter in any of the files, relying solely on the shutdown intent, and ran t, tg, tpn and svg on my Samsung GS3 (Android 4.0), Nexus 5 (Android 4.4) and Pixel (Android 7.1). Only t on the Pixel failed. It had the same signature as before:

[@ mozilla::jni::detail::NativePtr<mozilla::widget::GeckoEditableSupport, 2>::Clear<mozilla::jni::LocalRef<mozilla::java::GeckoEditableChild> >]

This contrasts with the failures when I was running with both Quitter and the shutdown intent, where we saw fairly consistent failures on the Pixels, the 6Ps and an occasional Nexus 5 for both t and svg.

Note that in the most recent run the crash appears to happen just after the profile is created: Fennec starts to load initialize_profile.html and an am force-stop is issued to the app within a couple of seconds. Subsequent invocations to load the page for the first and second visits do not appear to crash. You can see this run on staging at: https://treeherder.allizom.org/#/jobs?repo=mozilla-inbound&revision=e4e7f0468271b6bdc8c703d1d52c9de6e2325be3&filter-searchStr=autophone

The Samsung GS3 was run locally with DEBUG level logging, but unfortunately the Nexus 5 and Pixel runs were done on autophone-4 with INFO level logging. I've started a new run with only t, tpn and svg where autophone-4 has DEBUG level logging so we can see more details. It has already shown the same crash on tpn on both of the Pixels. You can see this run at: https://treeherder.allizom.org/#/jobs?repo=mozilla-inbound&revision=05e705204d3bced8e56631b0ef6fe2abe65710c7&filter-searchStr=autophone

One possibility is to increase the time I wait before issuing stop_application after loading the initialize_profile.html page just after creating the profile. Once this current run completes, I will test it.

Bob Clary [:bc] (inactive)

Reporter

Comment 14

8 years ago

It appears from the latest run that am stop isn't the cause. Looking at https://treeherder.allizom.org/logviewer.html#?job_id=109545184&repo=mozilla-inbound&lineNumber=1516 we see that initialize_profile is loaded, then we call stop_application, which:

1. gives the home screen focus
2. issues the shutdown intent
3. waits up to 10 seconds for the app to close before issuing am stop.

In this particular case, we waited 8 seconds before the app shut itself down, then found the dmp files. I'll adjust the timing of issuing the shutdown intent to see if sending it too quickly is the cause.
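The three-step stop_application sequence described above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not Autophone's actual code: the four callables (focus_home_screen, send_shutdown_intent, app_is_running, force_stop) are hypothetical stand-ins for whatever adb/am calls Autophone really makes, and the 0.5 second poll interval is an assumption. The clock and sleep functions are injectable so the timing can be simulated.

```python
import time
from typing import Callable


def stop_application(
    focus_home_screen: Callable[[], None],
    send_shutdown_intent: Callable[[], None],
    app_is_running: Callable[[], bool],
    force_stop: Callable[[], None],
    timeout: float = 10.0,
    poll_interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Sketch of the three-step shutdown; all callables are hypothetical hooks.

    Returns True if the app exited on its own after the shutdown intent,
    False if the force-stop fallback had to be used.
    """
    focus_home_screen()           # 1. give the home screen focus
    send_shutdown_intent()        # 2. ask the app to shut itself down
    deadline = clock() + timeout  # 3. wait up to `timeout` seconds...
    while clock() < deadline:
        if not app_is_running():
            return True           # app closed on its own within the window
        sleep(poll_interval)
    if not app_is_running():      # last check in case it exited at the deadline
        return True
    force_stop()                  # ...then fall back to a forced stop (am stop)
    return False
```

In the run cited above the app exited after about 8 seconds, so a call like this would return True without ever reaching the force-stop fallback, which is consistent with the observation that am stop is not the cause.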

Bob Clary [:bc] (inactive)

Reporter

Comment 15

8 years ago

I inserted a 10 second wait after loading initialize_profile, which eliminated the crash: https://hg.mozilla.org/integration/mozilla-inbound/rev/9ea7e992bf5224d216e9a2546b6dd7be70cbe419

esawin: I'm reluctant to add 160 seconds to each test run and am leery of shortening the wait time, which might result in intermittent crashes. Is there something we can do in Fennec to handle the shutdown intent gracefully even if it is received quickly? If not, what is a good way for me to determine when I can safely issue the shutdown intent after loading initialize_profile?
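One possible shape for the second question, replacing the fixed 10 second sleep with a bounded wait on a readiness signal, is sketched below. This is an assumption-laden illustration: the thread does not establish what signal reliably indicates that Fennec can safely handle the shutdown intent, so page_ready and send_shutdown_intent in the usage comment are hypothetical names, and the choice to fall back to the full timeout is likewise an assumption.

```python
import time
from typing import Callable, Optional


def wait_until(
    predicate: Callable[[], bool],
    timeout: float,
    poll_interval: float = 0.25,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> Optional[float]:
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Returns the elapsed seconds when the predicate became true, or None
    if the timeout expired first.
    """
    start = clock()
    while True:
        if predicate():
            return clock() - start
        if clock() - start >= timeout:
            return None
        sleep(poll_interval)


# Hypothetical usage: send the shutdown intent as soon as the (assumed)
# readiness signal is seen, paying the full 10 s only in the worst case.
# elapsed = wait_until(page_ready, timeout=10.0)
# send_shutdown_intent()
```

The design trade-off is that the common case costs only as long as the readiness signal takes to appear, while the worst case is no slower than the current fixed wait; whether any such signal actually rules out the race is exactly the open question for Fennec.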

Flags: needinfo?(esawin)

Bob Clary [:bc] (inactive)

Reporter

Comment 16

8 years ago

FYI, at Jim's suggestion I did a try run with the patches from bug 1382335: https://treeherder.allizom.org/#/jobs?repo=try&revision=cd1899bd5674c5a0ffa76a42f33f8b193e6959b3&exclusion_profile=false

There were several crashes at [@ mozilla::jni::detail::ProxyNativeCall<nsWindow::LayerViewSupport, mozilla::java::LayerView::Compositor, false, false, int, int, int, int>::operator()], with some at [@ MessageLoop::PostTask_Helper].

Eugen Sawin [:esawin]

Comment 17

8 years ago

bc: could you please try another run on m-c? I think some patches (including patch 3 from bug 1382335) have landed in the meantime which should fix the race we're experiencing here.

Flags: needinfo?(nchen)

Flags: needinfo?(esawin)

Bob Clary [:bc] (inactive)

Reporter

Comment 18

8 years ago

sure

Flags: needinfo?(bob)

Bob Clary [:bc] (inactive)

Reporter

Comment 19

8 years ago

Tested smoketest, talos svg/tpn, s1s2geckoview and s1s2: https://treeherder.allizom.org/#/jobs?repo=mozilla-central&revision=a3e675a3b10a0ea289c301bedc31866f3daf7875&exclusion_profile=false&group_state=expanded&filter-searchStr=autophone

smoketest, talos svg/tpn and s1s2geckoview were all happy, but s1s2 intermittently crashed.

nexus-5:
crashed [@ nsWindow::RedrawAll]
crashed [@ nsWindow::LayerViewSupport::SyncResumeResizeCompositor(mozilla::jni::LocalRef<mozilla::java::LayerView::Compositor> const&, int, int, mozilla::jni::Ref<mozilla::jni::Object, _jobject*> const&)::OnResumedEvent::Run() + 0x3b]
crashed [@ nsWindow::CreateLayerManager]

pixel:
crashed [@ nsWindow::CreateLayerManager]

Flags: needinfo?(bob)

Geoff Brown [:gbrown]

Updated

8 years ago

Whiteboard: [stockwell needswork] → [stockwell fixed:backout]

Michael Comella (:mcomella) [NI reported issues only: ex-Mozilla]

Updated

7 years ago

Priority: -- → P3

Firefox Bug Husbandry Bot

Comment 20

7 years ago

Re-triaging per https://bugzilla.mozilla.org/show_bug.cgi?id=1473195. Needinfo :susheel if you think this bug should be re-triaged.

Priority: P3 → P5

BugBot [:suhaib / :marco/ :calixte]

Comment 21

6 years ago

Closing because no crashes reported for 12 weeks.

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → WORKSFORME

Nobody; OK to take it and work on it

Assignee

Updated

5 years ago

Product: Firefox for Android → Firefox for Android Graveyard