Closed
Bug 1380134
Opened 8 years ago
Closed 6 years ago
Crash [@ mozilla::jni::detail::NativePtr<mozilla::widget::GeckoEditableSupport, 2>::Clear<mozilla::jni::LocalRef<mozilla::java::GeckoEditableChild> >]
Categories
(Firefox for Android Graveyard :: General, defect, P5)
Tracking
(Not tracked)
RESOLVED
WORKSFORME
People
(Reporter: bc, Unassigned)
Details
(Keywords: crash, regression, Whiteboard: [stockwell fixed:backout])
Beginning with https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=707b8c089de26e0eb3051e743a1cefea84ce2196&filter-searchStr=autophone I see a number of different crashes. Unfortunately, the logging does not show the signature in the Treeherder Failure Summary, but the crash signatures and stacks are available in the log.
In addition to several stacks without symbols, I see multiple instances of:
[@ mozilla::jni::detail::NativePtr<mozilla::widget::GeckoEditableSupport, 2>::Clear<mozilla::jni::LocalRef<mozilla::java::GeckoEditableChild> >]
[@ mozilla::java::VsyncSource::ObserveVsync]
This also appears, but may be an existing intermittent:
[@ nsWindow::RedrawAll]
snorp, esawin: Do these crashes make sense with regard to the patch?
Comment 1 (Reporter) • 8 years ago
Other signatures that are appearing:
[@ MessageLoop::PostTask_Helper]
[@ nsWindow::CreateLayerManager]
Flags: needinfo?(snorp)
Flags: needinfo?(esawin)
Comment 2 (Reporter) • 8 years ago
This also appears to have started on autoland at https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=064f2ad7ca3bcd37df0ca60dd870a722ca96c756&filter-searchStr=autophone
This leads me to think that this is not related to the check-in where the failures began, but to the landing of the shutdown intent change in Autophone itself in bug 1371291. I don't yet see this on mozilla-central, but if it appears there on the next check-in, I would consider it settled that this is an Autophone-caused issue.
Comment 3 (Reporter) • 8 years ago
I've prepared a patch to revert bug 1371291. I've started the Autophone servers with Pulse disabled so that we have a chance to catch up on all jobs before we deploy the change. It will then be clearer whether or not the reversion is the fix.
Updated • 8 years ago
Whiteboard: [stockwell needswork]
Comment 4 (Reporter) • 8 years ago
Crashes fixed by the backout in bug 1371291.
The range of bad jobs is:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-searchStr=autophone&exclusion_profile=false&fromchange=707b8c089de26e0eb3051e743a1cefea84ce2196&tochange=35eb832df41bef4cc58ee7df1e9950c608ce91ba
https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=autophone&exclusion_profile=false&fromchange=064f2ad7ca3bcd37df0ca60dd870a722ca96c756&tochange=d5e832bbd4ee293d92a0520c91caa80beba6faa4
Leaving this open to track why using the shutdown intent caused these crashes. I have 3 Nexus 5s and 2 Pixels we can use on autophone-4 to test this and submit to staging. I'll get that set up.
Comment 5 (Reporter) • 8 years ago
autophone-4 is set up to report to staging and phonedash-dev.allizom.org. It is not listening to Pulse, so we will need to trigger the jobs manually.
For example, from autophone-4 I did:
python trigger_runs.py --build-location=tinderbox --loglevel=DEBUG --repo=autoland --buildtype=opt --buildtype=debug --first-revision=66d21dbbe9160da0df4a0329f6370a25d2eb78dc --last-revision=66d21dbbe9160da0df4a0329f6370a25d2eb78dc
which resulted in the jobs
https://treeherder.allizom.org/#/jobs?repo=autoland&revision=66d21dbbe9160da0df4a0329f6370a25d2eb78dc&filter-searchStr=autophone
Comment hidden (Intermittent Failures Robot)
Comment 7 • 8 years ago
It looks like we're seeing an invalid/null reference in DisposeNative(mEditable) called from the lambda in GeckoEditableSupport::OnDetach.
I don't see how that could happen, given that we pass a RefPtr of this to the lambda.
It's also interesting that we didn't see that signature when using the Quitter add-on.
Any ideas, Jim?
Flags: needinfo?(esawin) → needinfo?(nchen)
Comment 8 (Reporter) • 8 years ago
Note that initially we had both Quitter and the shutdown intent, since I needed a cross-branch solution and we hadn't landed the shutdown intent on beta or release. Perhaps it is an interaction between the two.
Eugen on the case
Flags: needinfo?(snorp)
Comment hidden (Intermittent Failures Robot)
Comment 11 • 8 years ago
Bob, have you backed the Autophone patch out? I'm confused why we are still seeing the failures.
Flags: needinfo?(bob)
Comment 12 (Reporter) • 8 years ago
Comment 10 covered the whole preceding week, which included the range where I had landed the patch. If you follow the link to brasstacks, you'll see that the count dropped to zero.
Flags: needinfo?(bob)
Comment 13 (Reporter) • 8 years ago
I set up autophone-4 to run without installing Quitter and without references to Quitter in any of the files, relying solely on the shutdown intent. I ran t, tg, tpn and svg on my Samsung GS3 (Android 4.0), Nexus 5 (Android 4.4) and Pixel (Android 7.1). Only t on the Pixel failed. It had the same signature as before:
[@ mozilla::jni::detail::NativePtr<mozilla::widget::GeckoEditableSupport, 2>::Clear<mozilla::jni::LocalRef<mozilla::java::GeckoEditableChild> >]
This contrasts with the failures when I was running with both Quitter and the shutdown intent, where we saw fairly consistent failures on the Pixels, the 6Ps and an occasional Nexus 5 for both t and svg.
Note that in the most recent run, the crash appears to happen just after the profile is created: Fennec starts to load initialize_profile.html, and an am force-stop is issued to the app within a couple of seconds. Subsequent invocations to load the page for the first and second visits do not appear to crash.
You can see this run on staging at:
https://treeherder.allizom.org/#/jobs?repo=mozilla-inbound&revision=e4e7f0468271b6bdc8c703d1d52c9de6e2325be3&filter-searchStr=autophone
The Samsung GS3 was run locally with DEBUG level logging, but unfortunately the Nexus 5 and Pixel runs were done on autophone-4 with INFO level logging.
I've started a new run with only t, tpn, svg where autophone-4 has DEBUG level logging so we can see more details. Already, it has shown the same crash on tpn on both of the pixels. You can see this run at:
https://treeherder.allizom.org/#/jobs?repo=mozilla-inbound&revision=05e705204d3bced8e56631b0ef6fe2abe65710c7&filter-searchStr=autophone
One possibility is to increase the time I wait before issuing the stop_application after loading the initialize_profile.html page just after creating the profile. Once this current run completes, I will test it.
Comment 14 (Reporter) • 8 years ago
It appears from the latest run that am stop isn't the cause. Looking at https://treeherder.allizom.org/logviewer.html#?job_id=109545184&repo=mozilla-inbound&lineNumber=1516 we see that initialize_profile is loaded, then we call stop_application which:
1. gives the home screen focus
2. issues the shutdown intent
3. waits for up to 10 seconds for the app to close before issuing am stop.
In this particular case, we waited 8 seconds before the app shut itself down, then found the dmp files. I'll adjust the timing of issuing the shutdown intent to see if sending it too quickly is the cause.
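For readers unfamiliar with Autophone, the three-step stop_application sequence above can be sketched as follows. This is a minimal, hypothetical sketch, not Autophone's actual code: the device operations (focus_home, send_shutdown_intent, is_running, force_stop) are assumed callables standing in for whatever adb/Autophone helpers really do these steps. The clock and sleep functions are injectable so the timing can be exercised without real waits.

```python
import time
from typing import Callable


def stop_application(
    focus_home: Callable[[], None],
    send_shutdown_intent: Callable[[], None],
    is_running: Callable[[], bool],
    force_stop: Callable[[], None],
    timeout: float = 10.0,
    poll_interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Hypothetical sketch of the stop sequence; NOT Autophone's real code.

    Returns True if the app exited on its own after the shutdown intent,
    False if it had to be force-stopped.
    """
    focus_home()               # 1. give the home screen focus
    send_shutdown_intent()     # 2. ask the app to shut itself down
    deadline = clock() + timeout
    while clock() < deadline:  # 3. wait up to `timeout` seconds for exit
        if not is_running():
            return True
        sleep(poll_interval)
    if not is_running():       # re-check once at the deadline
        return True
    force_stop()               # fall back to a forced stop (am force-stop)
    return False
```

In the run described above, the app exited on its own after about 8 seconds, so a sketch like this would return True without ever reaching the force-stop fallback. Even so, the crash dumps were already on disk, which is consistent with the crash happening during the intent-driven shutdown itself rather than being caused by the force-stop.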
Comment 15 (Reporter) • 8 years ago
I inserted a 10 second wait after the loading of initialize_profile which eliminated the crash:
https://hg.mozilla.org/integration/mozilla-inbound/rev/9ea7e992bf5224d216e9a2546b6dd7be70cbe419
esawin: I'm reluctant to add 160 seconds to each test run and am leery of shortening the wait time which might result in intermittent crashes. Is there something we can do in Fennec to handle the shutdown intent gracefully even if it is received quickly? If not, what is a good way for me to determine when I can safely issue the shutdown intent after loading initialize_profile?
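One common alternative to a fixed 10-second delay is a bounded poll. This approach waits on a readiness condition and returns as soon as the condition holds, so the full delay is only paid in the worst case. Below is a minimal sketch, assuming some observable "startup finished" signal exists. The predicate is hypothetical: the thread does not identify such a signal, and whether one reliably indicates that Fennec is safe to shut down is exactly the open question here.

```python
import time
from typing import Callable


def wait_until(
    predicate: Callable[[], bool],
    timeout: float,
    poll_interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll `predicate` until it returns True or `timeout` seconds elapse.

    Returns True as soon as the predicate holds, False on timeout.
    """
    deadline = clock() + timeout
    while True:
        if predicate():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_interval)


# Hypothetical usage: `startup_finished` is a placeholder for whatever
# readiness check is available; it is not an existing Autophone helper.
# if wait_until(startup_finished, timeout=10.0):
#     ...issue the shutdown intent...
```

The timeout keeps the worst case bounded at the current 10 seconds, while a run whose readiness signal fires early stops waiting immediately.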
Flags: needinfo?(esawin)
Comment 16 (Reporter) • 8 years ago
FYI, at Jim's suggestion I did a try run with the patches from Bug 1382335:
https://treeherder.allizom.org/#/jobs?repo=try&revision=cd1899bd5674c5a0ffa76a42f33f8b193e6959b3&exclusion_profile=false
There were several crashes at [@ mozilla::jni::detail::ProxyNativeCall<nsWindow::LayerViewSupport, mozilla::java::LayerView::Compositor, false, false, int, int, int, int>::operator()], with some at [@ MessageLoop::PostTask_Helper].
Comment 17 • 8 years ago
bc: could you please try another run on m-c? I think some patches (including patch 3 from bug 1382335) have landed in the meanwhile which should fix the race we're experiencing here.
Flags: needinfo?(nchen)
Flags: needinfo?(esawin)
Comment 19 (Reporter) • 8 years ago
Tested smoketest, talos svg/tpn, s1s2geckoview and s1s2:
https://treeherder.allizom.org/#/jobs?repo=mozilla-central&revision=a3e675a3b10a0ea289c301bedc31866f3daf7875&exclusion_profile=false&group_state=expanded&filter-searchStr=autophone
smoketest, talos svg/tpn and s1s2geckoview were all happy, but s1s2 intermittently crashed.
nexus-5:
crashed [@ nsWindow::RedrawAll]
crashed [@ nsWindow::LayerViewSupport::SyncResumeResizeCompositor(mozilla::jni::LocalRef<mozilla::java::LayerView::Compositor> const&, int, int, mozilla::jni::Ref<mozilla::jni::Object, _jobject*> const&)::OnResumedEvent::Run() + 0x3b]
crashed [@ nsWindow::CreateLayerManager]
pixel:
crashed [@ nsWindow::CreateLayerManager]
Flags: needinfo?(bob)
Updated • 8 years ago
Whiteboard: [stockwell needswork] → [stockwell fixed:backout]
Priority: -- → P3
Comment 20 • 7 years ago
Re-triaging per https://bugzilla.mozilla.org/show_bug.cgi?id=1473195
Needinfo :susheel if you think this bug should be re-triaged.
Priority: P3 → P5
Comment 21 • 6 years ago
Closing because no crashes reported for 12 weeks.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME
Updated • 5 years ago
Product: Firefox for Android → Firefox for Android Graveyard