Closed Bug 1380134 Opened 8 years ago Closed 6 years ago

Crash [@ mozilla::jni::detail::NativePtr<mozilla::widget::GeckoEditableSupport, 2>::Clear<mozilla::jni::LocalRef<mozilla::java::GeckoEditableChild> >

Categories

(Firefox for Android Graveyard :: General, defect, P5)

defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: bc, Unassigned)


Details

(Keywords: crash, regression, Whiteboard: [stockwell fixed:backout])


I see a number of different crashes beginning with this push:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=707b8c089de26e0eb3051e743a1cefea84ce2196&filter-searchStr=autophone

Unfortunately, the logging does not show the signatures in the Treeherder Failure Summary, but the crash signatures and stacks are available in the log. In addition to several stacks without symbols, I see multiple instances of:

[@ mozilla::jni::detail::NativePtr<mozilla::widget::GeckoEditableSupport, 2>::Clear<mozilla::jni::LocalRef<mozilla::java::GeckoEditableChild> >]
[@ mozilla::java::VsyncSource::ObserveVsync]

This one also appears, but may be an existing intermittent:

[@ nsWindow::RedrawAll]

snorp, esawin: Do these crashes make sense with regard to the patch?
Other signatures that are appearing:

[@ MessageLoop::PostTask_Helper]
[@ nsWindow::CreateLayerManager]
Flags: needinfo?(snorp)
Flags: needinfo?(esawin)
This also appears to have started on autoland:
https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=064f2ad7ca3bcd37df0ca60dd870a722ca96c756&filter-searchStr=autophone

This leads me to think that this is not related to the check-in where the failures began, but to the landing of the shutdown intent change in Autophone itself in bug 1371291. I don't yet see this on mozilla-central, but if it appears there on the next check-in, I would consider it settled that this is an Autophone-caused issue.
I've prepared a patch to revert bug 1371291. I've started the Autophone servers with pulse disabled so that we have a chance to catch up on all jobs before we deploy the change. It will then be clearer whether the reversion is the fix.
Whiteboard: [stockwell needswork]
Blocks: 1371291
No longer blocks: 1379461
autophone-4 is set up to report to staging and phonedash-dev.allizom.org. It is not listening to pulse, so we will need to trigger the jobs manually. For example, from autophone-4 I ran:

python trigger_runs.py --build-location=tinderbox --loglevel=DEBUG --repo=autoland --buildtype=opt --buildtype=debug --first-revision=66d21dbbe9160da0df4a0329f6370a25d2eb78dc --last-revision=66d21dbbe9160da0df4a0329f6370a25d2eb78dc

which resulted in these jobs:
https://treeherder.allizom.org/#/jobs?repo=autoland&revision=66d21dbbe9160da0df4a0329f6370a25d2eb78dc&filter-searchStr=autophone
It looks like we're seeing an invalid/null reference in DisposeNative(mEditable) called from the lambda in GeckoEditableSupport::OnDetach. I don't see how that could happen, given that we pass a RefPtr of this to the lambda. It's also interesting that we didn't see that signature when using the Quitter add-on. Any ideas, Jim?
Flags: needinfo?(esawin) → needinfo?(nchen)
Note that initially we had both Quitter and the shutdown intent, since I needed a cross-branch solution and we hadn't landed the shutdown intent on beta or release. Perhaps it is an interaction between the two.
Eugen is on the case.
Flags: needinfo?(snorp)
Bob, have you backed the Autophone patch out? I'm confused why we are still seeing the failures.
Flags: needinfo?(bob)
Comment 10 covered the whole preceding week, which included the range where I had landed the patch. If you follow the link to brasstacks, you'll see that the count dropped to zero.
Flags: needinfo?(bob)
I set up autophone-4 to run without installing Quitter and without references to Quitter in any of the files, relying solely on the shutdown intent, and ran t, tg, tpn and svg on my Samsung GS3 (Android 4.0), Nexus 5 (Android 4.4) and Pixel (Android 7.1). Only t on the Pixel failed. It had the same signature as before:

[@ mozilla::jni::detail::NativePtr<mozilla::widget::GeckoEditableSupport, 2>::Clear<mozilla::jni::LocalRef<mozilla::java::GeckoEditableChild> >]

This contrasts with the failures when I was running with both Quitter and the shutdown intent, where we saw fairly consistent failures on the Pixels and the 6Ps, and an occasional one on the Nexus 5, for both t and svg.

Note that in the most recent run the crash appears to happen just after the profile is created: Fennec starts to load initialize_profile.html, and an am force-stop is issued to the app within a couple of seconds. Subsequent invocations to load the page for the first and second visits do not appear to crash. You can see this run on staging at:
https://treeherder.allizom.org/#/jobs?repo=mozilla-inbound&revision=e4e7f0468271b6bdc8c703d1d52c9de6e2325be3&filter-searchStr=autophone

The Samsung GS3 was run locally with DEBUG-level logging, but unfortunately the Nexus 5 and Pixel runs were done on autophone-4 with INFO-level logging. I've started a new run with only t, tpn and svg, where autophone-4 has DEBUG-level logging, so we can see more details. It has already shown the same crash on tpn on both of the Pixels. You can see this run at:
https://treeherder.allizom.org/#/jobs?repo=mozilla-inbound&revision=05e705204d3bced8e56631b0ef6fe2abe65710c7&filter-searchStr=autophone

One possibility is to increase the time I wait before issuing stop_application after loading initialize_profile.html just after creating the profile. Once this current run completes, I will test it.
It appears from the latest run that am stop isn't the cause. Looking at
https://treeherder.allizom.org/logviewer.html#?job_id=109545184&repo=mozilla-inbound&lineNumber=1516
we see that initialize_profile is loaded, and then we call stop_application, which:

1. gives the home screen focus,
2. issues the shutdown intent,
3. waits up to 10 seconds for the app to close before issuing am stop.

In this particular case, we waited 8 seconds before the app shut itself down, and then found the dmp files. I'll adjust the timing of the shutdown intent to see whether sending it too quickly is the cause.
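The three-step stop_application sequence in the comment can be sketched in Python. This is a minimal sketch under stated assumptions, not Autophone's actual code: the function signature and the callables focus_home_screen, send_shutdown_intent, app_is_running and force_stop are hypothetical stand-ins for the device operations (force_stop would correspond to am force-stop). Only the order of the steps and the 10-second timeout come from the comment.

```python
import time
from typing import Callable


def stop_application(
    focus_home_screen: Callable[[], None],     # hypothetical: bring the home screen to the front
    send_shutdown_intent: Callable[[], None],  # hypothetical: ask Fennec to shut down cleanly
    app_is_running: Callable[[], bool],        # hypothetical: is the app process still alive?
    force_stop: Callable[[], None],            # hypothetical: stands in for "am force-stop"
    timeout: float = 10.0,                     # wait up to 10 seconds, as in the comment
    poll_interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Return True if the app exited on its own, False if it had to be force-stopped."""
    # Step 1: give the home screen focus.
    focus_home_screen()
    # Step 2: issue the shutdown intent.
    send_shutdown_intent()
    # Step 3: poll until the app exits or the timeout expires.
    deadline = clock() + timeout
    while True:
        if not app_is_running():
            return True
        if clock() >= deadline:
            break
        sleep(poll_interval)
    # The app did not exit in time, so fall back to a force-stop.
    force_stop()
    return False
```

Injecting clock and sleep keeps the timeout logic testable without a device. In the run described, where the app shut itself down after about 8 seconds, this sketch would return True without reaching the force-stop.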
I inserted a 10-second wait after loading initialize_profile, which eliminated the crash:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9ea7e992bf5224d216e9a2546b6dd7be70cbe419

esawin: I'm reluctant to add 160 seconds to each test run, and I'm leery of shortening the wait time, which might result in intermittent crashes. Is there something we can do in Fennec to handle the shutdown intent gracefully even if it is received quickly? If not, what is a good way for me to determine when I can safely issue the shutdown intent after loading initialize_profile?
Flags: needinfo?(esawin)
FYI, at Jim's suggestion I did a try run with the patches from bug 1382335:
https://treeherder.allizom.org/#/jobs?repo=try&revision=cd1899bd5674c5a0ffa76a42f33f8b193e6959b3&exclusion_profile=false

There were several crashes at [@ mozilla::jni::detail::ProxyNativeCall<nsWindow::LayerViewSupport, mozilla::java::LayerView::Compositor, false, false, int, int, int, int>::operator()], with some at [@ MessageLoop::PostTask_Helper].
bc: could you please try another run on m-c? I think some patches (including patch 3 from bug 1382335) have landed in the meantime, which should fix the race we're experiencing here.
Flags: needinfo?(nchen)
Flags: needinfo?(esawin)
sure
Flags: needinfo?(bob)
Tested smoketest, talos svg/tpn, s1s2geckoview and s1s2:
https://treeherder.allizom.org/#/jobs?repo=mozilla-central&revision=a3e675a3b10a0ea289c301bedc31866f3daf7875&exclusion_profile=false&group_state=expanded&filter-searchStr=autophone

smoketest, talos svg/tpn and s1s2geckoview were all happy, but s1s2 crashed intermittently:

nexus-5:
crashed [@ nsWindow::RedrawAll]
crashed [@ nsWindow::LayerViewSupport::SyncResumeResizeCompositor(mozilla::jni::LocalRef<mozilla::java::LayerView::Compositor> const&, int, int, mozilla::jni::Ref<mozilla::jni::Object, _jobject*> const&)::OnResumedEvent::Run() + 0x3b]
crashed [@ nsWindow::CreateLayerManager]

pixel:
crashed [@ nsWindow::CreateLayerManager]
Flags: needinfo?(bob)
Whiteboard: [stockwell needswork] → [stockwell fixed:backout]
Re-triaging per https://bugzilla.mozilla.org/show_bug.cgi?id=1473195

Needinfo :susheel if you think this bug should be re-triaged.
Priority: P3 → P5

Closing because no crashes reported for 12 weeks.

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME
Product: Firefox for Android → Firefox for Android Graveyard