arm64 nightly: tab crash on github documentation/wiki pages

RESOLVED FIXED in Firefox 66

Status

defect
RESOLVED FIXED
5 months ago
4 months ago

People

(Reporter: leif.lindholm, Assigned: mgaudet, NeedInfo)

Tracking

(Blocks 1 bug, {crash, regression})

67 Branch
mozilla67
ARM64
Windows 10
Points:
---
Bug Flags:
qe-verify +

Firefox Tracking Flags

(firefox-esr60 unaffected, firefox65 unaffected, firefox66 fixed, firefox67 fixed)

Details

(Whiteboard: [Feb 14: awaiting crash report to move out of Core:General])

Attachments

(1 attachment)

Reporter

Description

5 months ago

User Agent: Mozilla/5.0 (Windows NT 10.0; rv:67.0) Gecko/20100101 Firefox/67.0

Steps to reproduce:

On arm64 windows nightly, loaded https://help.github.com/articles/associating-an-email-with-your-gpg-key/.

Actual results:

Loads seemingly fine, then after 5-10 seconds, tab crashes. Clicking "Restore this tab" just repeats the process.

I have seen this on multiple documentation/wiki pages on github - not just the github documentation, but project documentation as well.

First noticed a day or two ago. Definitely affects 67.0a1 (2019-01-30) and (2019-01-31).

Tested both on Lenovo Yoga C630 and HP Envy X2. Both running windows version 1803 on the "windows insider" slow track.

Expected results:

Tab not crashed.

Comment 1

5 months ago

Not reproducible for me.
Tested on following builds. Given link is loaded and no crashes observed.
Build ID 20190130215539
User Agent Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0

Build ID 20190205023948
User Agent Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:67.0) Gecko/20100101 Firefox/67.0

Component: Untriaged → General

Can you provide a crash report ID from about:crashes? Thanks.

For the future, you can file the bug from the crash report website and it will pre-fill fields in Bugzilla.

Reporter

Comment 3

5 months ago

Hmm.

about:crashes says only "No crash reports have been submitted."

"Tab crash reporter" contains only

Gah. Your tab just crashed.
We can help!

Choose Restore This Tab to reload the page.

And "Close tab", "Restore this tab" buttons.

Sorry, I don't know the URL of the crash report website, and Google isn't helping me.

Hmm… do you have crash reporting turned on? If you open about:preferences and search for crash, do you have

[x] "Allow Nightly to send technical and interaction data to Mozilla"

checked?

Flags: needinfo?(leif.lindholm)
Reporter

Comment 5

4 months ago

(Apologies for delay - the last few nights' arm64 Windows builds have been completely non-functional, but now working again.)

Yes, "Allow Nightly to send technical and interaction data to Mozilla" is checked.

Ted, any ideas for why there are no crash reports for this user?

Flags: needinfo?(ted)

There are some known issues with crash reporting on aarch64-windows currently, but I'm not actively involved in any of that work.

Status: UNCONFIRMED → RESOLVED
Closed: 4 months ago
Flags: needinfo?(ted)
Resolution: --- → DUPLICATE
Duplicate of bug: 1526276
Reporter

Comment 8

4 months ago

Umm, this bug wasn't about crash reporting - that was just something I was asked to do to provide more input on the actual problem. So I'm not sure RESOLVED:DUPLICATE of the crash reporting bug is the appropriate state?

Flags: needinfo?(leif.lindholm)

Ok, moving to Core::General then, since it's very unlikely this issue is in browser/ code.

If you're able to use a debugger (lldb or gdb) then you could get the crash stack. Otherwise we'll have to wait for bug 1526276 or until someone else can reproduce this.

Status: RESOLVED → REOPENED
Depends on: 1526276
Ever confirmed: true
Product: Firefox → Core
Resolution: DUPLICATE → ---

Andrew, since you have a Lenovo Yoga C630, can you please try to repro?

Flags: needinfo?(overholt)

Updated

4 months ago
Component: General → Tabbed Browser
Product: Core → Firefox

(The crash reporting stuff is being worked on in bug 1526276.)

This reliably reproduces for me but of course due to bug 1526276 we can't tell why so back to Core:General we go :)

Note that Leif reported this before Ion was an option, and I have Ion turned on and still crash after a few seconds, so it's not related to Ion.

Component: Tabbed Browser → General
Flags: needinfo?(overholt)
OS: Unspecified → Windows 10
Product: Firefox → Core
Hardware: Unspecified → ARM64
Whiteboard: [Feb 14: awaiting crash report to move out of Core:General]

> This reliably reproduces for me but of course due to bug 1526276 we can't
> tell why so back to Core:General we go :)

Y'know... WinDbg is your friend :-)

This is a crash in JITted code. The memory at xip1 (aka x17) is inaccessible. I can't get a stack because we don't generate proper unwind info. I don't suppose anyone from JS could glance at this disassembly and magically know where it came from?

000001dc`c7ac1590 9100039f mov         sp,x28
000001dc`c7ac1594 cb30ef9c sub         x28,x28,xip0 sxtx #3
000001dc`c7ac1598 9278df9c and         x28,x28,#-0x100
000001dc`c7ac159c 9100039f mov         sp,x28
000001dc`c7ac15a0 aa1c03f1 mov         xip1,x28
000001dc`c7ac15a4 ea01003f tst         x1,x1
000001dc`c7ac15a8 540000a0 beq         000001dc`c7ac15bc
000001dc`c7ac15ac f8408458 ldr         x24,[x2],#8
000001dc`c7ac15b0 f8008638 str         x24,[xip1],#8  <<<<<<<<<<<<<< crash here
000001dc`c7ac15b4 f1000610 subs        xip0,xip0,#1
000001dc`c7ac15b8 54ffffa1 bne         000001dc`c7ac15ac
000001dc`c7ac15bc b94000f0 ldr         wip0,[x7]
000001dc`c7ac15c0 d100439f sub         sp,x28,#0x10
000001dc`c7ac15c4 a9bf4384 stp         x4,xip0,[x28,#-0x10]!
000001dc`c7ac15c8 cb1c0273 sub         x19,x19,x28
000001dc`c7ac15cc d378de73 lsl         x19,x19,#8

Actually maybe I can take a stab at this...

xip1 was set equal to sp. They have the value of 00000033c7dea700.

That address is in a MEM_RESERVE region from 00000033c7c00000 to 00000033c7ded000.

The next block above that (starting at 00000033c7ded000) has PAGE_READWRITE|PAGE_GUARD bits, suggesting that the next block above that was our stack.

It sounds like we grew the stack in such a sudden increment that we didn't take a guard page fault?
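The address reasoning above can be checked with a few lines of arithmetic. This is only a minimal sketch using the addresses quoted in this comment; the 4 KiB page size, the assumption that the guard block is a single page, and the helper name `classify` are all illustrative and not taken from the debugger output.

```python
# Addresses quoted in the comment above (from WinDbg).
RESERVE_START = 0x00000033C7C00000  # start of the MEM_RESERVE region
GUARD_START = 0x00000033C7DED000    # end of that region, start of the PAGE_GUARD block
FAULT_ADDR = 0x00000033C7DEA700     # value of sp / xip1 at the crash

PAGE_SIZE = 0x1000  # assumption: 4 KiB pages, single guard page


def classify(addr):
    """Hypothetical helper: name the region an address falls in."""
    if RESERVE_START <= addr < GUARD_START:
        return "reserved"
    if GUARD_START <= addr < GUARD_START + PAGE_SIZE:
        return "guard"
    return "other"


# Distance between the faulting address and the start of the guard block.
gap = GUARD_START - FAULT_ADDR
print(classify(FAULT_ADDR), hex(gap), gap)
```

This prints `reserved 0x2900 10496`: the faulting write landed 0x2900 bytes below the start of the guard block, inside memory that was reserved but never committed, which is consistent with the guard page having been skipped.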

Lars, this is sounding an awful lot like bug 1351278 comment 21 -- with a similar github repro too. Are you aware of any reason that this might remain unfixed on arm64?

Flags: needinfo?(lhansen)

000001dc`c7ac1594 cb30ef9c sub x28,x28,xip0 sxtx #3

xip0 was 8768, so we subtracted about 70 KB (8768 × 8 = 70,144 bytes) from the stack pointer all in one go. Definitely the same type of issue...
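As a rough check of that figure: the `sxtx #3` operand means xip0 is shifted left by three bits (multiplied by 8) before the subtraction. The sketch below works out the byte count and how many pages it spans; the 4 KiB page size is an assumption, not something stated in this bug.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages

xip0 = 8768  # register value reported above
shift = 3    # from the "sxtx #3" operand: value is shifted left by 3

# Total stack adjustment in bytes, then split into whole pages plus remainder.
bytes_subtracted = xip0 << shift
full_pages, remainder = divmod(bytes_subtracted, PAGE_SIZE)
print(bytes_subtracted, full_pages, remainder)
```

This prints `70144 17 512`, so under the 4 KiB assumption the single subtraction moves the stack pointer down by a little more than 17 pages, far more than one guard page can absorb.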

Component: General → JavaScript Engine: JIT

David, can you go back and edit comment 12 and put triple-backticks around the backtrace so that it's possible to read it properly? Presumably the backticks in the addresses from WinDbg are confusing Markdown in a major way.

Anyway, that's compiled JS code that we're looking at (x28 used as a stack pointer gives it away). It's possible that it's a similar problem to what we had with apply in bug 1351278. But the fix there was in platform-independent code, so there's no reason to believe that that bug in particular should be biting on arm64.

Flags: needinfo?(lhansen)

(In reply to Lars T Hansen [:lth] from comment #16)

> David, can you go back and edit comment 12 and put triple-backticks around the backtrace so that it's possible to read it properly? Presumably the backticks in the addresses from WinDbg are confusing Markdown in a major way.

Hmph, it looked fine on my end, and I have no edit button. We must be using a different interface. :-)

Lars, in bug 1351278 comment 22 you mentioned a possible exception for arm64. Is that still the case? It was quite some time ago so I admit it's a long shot.

It looks like bug 1488763 missed this one case on arm64.

Blocks: 1488763

Matt, can you take this?

Flags: needinfo?(mgaudet)

Yeah, let me take a look.

Assignee: nobody → mgaudet
Flags: needinfo?(mgaudet)
Assignee

Updated

4 months ago
Blocks: 1528399
Reporter

Comment 24

4 months ago

I just got a tab crash on a different site, and the crash reporter finally appeared.
So I figured I would go back and generate a crash report for the aforementioned github URL.
But I can't - it's no longer crashing!

This is on 67.0a1 (2019-02-20) (64-bit).

Comment 25

4 months ago
Pushed by mgaudet@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/29f06630e00e
Incrementally touch stack on arm64 r=tcampbell

Comment 26

4 months ago
bugherder
Status: REOPENED → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67

I'm guessing we'll want this on 66 as well. Please nominate for Beta approval when you get a chance.

Flags: needinfo?(mgaudet)

Comment on attachment 9044291 [details]
Bug 1524419: Incrementally touch stack on arm64 r?tcampbell

Beta/Release Uplift Approval Request

  • Feature/Bug causing the regression: None
  • User impact if declined: Potential crashes on aarch64 windows
  • Is this code covered by automated tests?: Unknown
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Changes stack touching to match the windows ABI, so should only improve things.
  • String changes made/needed: None.
Flags: needinfo?(mgaudet)
Attachment #9044291 - Flags: approval-mozilla-beta?

Matthew, touching the stack is only needed on Windows, right? Can we use an #ifdef to disable it for the other platforms?

Flags: needinfo?(mgaudet)

I thought touching the stack on all platforms was deliberate, no? (bug 1488763 comment 2)

Touching the stack is not /strictly/ needed on other platforms, but it doesn't seem worth having divergent behavior over. Linux has still experienced things like stack-clash and Bug 909094.
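To illustrate why touching the stack page by page matters, here is a toy model of a downward-growing stack protected by one guard page. It is a sketch under stated assumptions, not the actual patch: the class `GuardedStack`, both `grow_*` functions, the 4 KiB page size, and the example addresses are all hypothetical, and a real OS and JIT behave in more detail than this.

```python
PAGE_SIZE = 4096  # assumption: 4 KiB pages


class GuardedStack:
    """Toy model of a stack that grows downward through one guard page.

    Pages at or above `committed_low` are usable. The single page just
    below is the guard page: touching it commits it and moves the guard
    down one page. Touching anything lower is an access violation.
    """

    def __init__(self, committed_low):
        self.committed_low = committed_low

    def touch(self, addr):
        page = addr - addr % PAGE_SIZE
        if page >= self.committed_low:
            return  # already committed
        if page == self.committed_low - PAGE_SIZE:
            self.committed_low = page  # guard hit: grow by one page
            return
        raise MemoryError(f"access violation at {addr:#x}")


def grow_one_shot(stack, sp, size):
    """Move sp down by `size` and write at the new sp in one step."""
    new_sp = sp - size
    stack.touch(new_sp)
    return new_sp


def grow_incrementally(stack, sp, size):
    """Touch one address per page on the way down before using the new sp."""
    new_sp = sp - size
    probe = sp
    while probe - PAGE_SIZE > new_sp:
        probe -= PAGE_SIZE
        stack.touch(probe)
    stack.touch(new_sp)
    return new_sp


SIZE = 8768 << 3  # 8768 eight-byte slots, the adjustment reported in this bug

for grow in (grow_one_shot, grow_incrementally):
    stack = GuardedStack(committed_low=0x1F0000)  # made-up example address
    try:
        grow(stack, sp=0x1F0800, size=SIZE)
        print(grow.__name__, "ok")
    except MemoryError as exc:
        print(grow.__name__, "faulted:", exc)
```

In this model the one-shot version faults because its first write lands below the guard page, while the incremental version succeeds because every intermediate page is touched in order, so the guard page is always the next one hit.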

Flags: needinfo?(mgaudet)

Comment on attachment 9044291 [details]
Bug 1524419: Incrementally touch stack on arm64 r?tcampbell

Should help avoid a crash and help with testing arm64 on beta.
OK for uplift to beta 12.

Attachment #9044291 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Whiteboard: [Feb 14: awaiting crash report to move out of Core:General] → [Feb 14: awaiting crash report to move out of Core:General][qa-triaged]

Comment 34

4 months ago

I couldn't reproduce the issue on the Firefox Nightly 67.0a1 (2019-01-30) and (2019-01-31) aarch64 and non-aarch64 builds on a Lenovo Yoga C630-13Q50 with Windows 10, so I can't check whether it's fixed.

Leif Lindholm, could you please check whether the issue is fixed on the latest Firefox Nightly and on Firefox 66.0b12?
Thanks.

Flags: needinfo?(leif.lindholm)
QA Whiteboard: [qa-triaged]
Whiteboard: [Feb 14: awaiting crash report to move out of Core:General][qa-triaged] → [Feb 14: awaiting crash report to move out of Core:General]