1671132 - Profile initialization | application crashed [@ __llvm_profile_instrument_target + 0x53]

Reporter

Description

•

4 years ago

treeherder

Filed by: sgiesecke [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=318581365&repo=try
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Kb_S2E05RcmjhT8UGBjL1g/runs/0/artifacts/public/logs/live_backing.log

With some local changes, I get crashes in the generate-profile-macosx64-shippable/opt jobs. Other platforms do not seem to be affected.

Could this be related to the issue mentioned in https://github.com/llvm/llvm-project-staging/commit/27650ec5541cd604a5027ad63895e0badfd35efe? Do we have that fix?

Simon Giesecke [:sg] [he/him]

Updated

•

4 years ago

Flags: needinfo?(dmajor)

Simon Giesecke [:sg] [he/him]

Updated

•

4 years ago

Blocks: 1663924

(Away)

Comment 1

•

4 years ago

Could this be related to the issue mentioned in https://github.com/llvm/llvm-project-staging/commit/27650ec5541cd604a5027ad63895e0badfd35efe? Do we have that fix?

The code that that patch reverted didn't land until clang trunk was version 12, so I don't think that's (directly) it.

Flags: needinfo?(dmajor)

Simon Giesecke [:sg] [he/him]

Comment 2

•

4 years ago

OK, but this looks like an issue in LLVM rather than one caused by the specific (dom/indexedDB) code changes in my push. Right?

(Away)

Comment 3

•

4 years ago

At first glance, yes.

It's unfortunate that it's on Mac. If it was Linux or Windows, I'd be able to debug it much more easily. Any chance you could paste a disassembly of __llvm_profile_instrument_target from beginning up to the point of failure?

Simon Giesecke [:sg] [he/him]

Comment 4

•

4 years ago

(In reply to :dmajor from comment #3)

At first glance, yes.

It's unfortunate that it's on Mac. If it was Linux or Windows, I'd be able to debug it much more easily. Any chance you could paste a disassembly of __llvm_profile_instrument_target from beginning up to the point of failure?

I don't have a Mac either. I have no idea how I would do that, unfortunately.

(Away)

Comment 5

•

4 years ago

Going by these values from the treeherder log

[task 2020-10-14T10:50:39.180Z] Thread 32 (crashed)
[task 2020-10-14T10:50:39.180Z]  0  XUL!__llvm_profile_instrument_target + 0x53
...
[task 2020-10-14T10:50:39.180Z]     rip = 0x000000010e66f293
...
[task 2020-10-14T10:50:39.203Z] Loaded modules:
...
[task 2020-10-14T10:50:39.203Z] 0x10e65f000 - 0x117dc0fff  XUL  ???

Then the offset of the function within XUL should be 0x10240, so I think this is it:

0000000000010240	pushq	%rbp
0000000000010241	movq	%rsp, %rbp
0000000000010244	pushq	%r15
0000000000010246	pushq	%r14
0000000000010248	pushq	%r12
000000000001024a	pushq	%rbx
000000000001024b	testq	%rsi, %rsi
000000000001024e	je	0x103ad
0000000000010254	movl	%edx, %r14d
0000000000010257	movq	%rsi, %rbx
000000000001025a	movq	%rdi, %r15
000000000001025d	movq	0x20(%rsi), %r12
0000000000010261	testq	%r12, %r12
0000000000010264	je	0x102db
0000000000010266	movl	%r14d, %r14d
0000000000010269	movq	(%r12,%r14,8), %rsi
000000000001026d	testq	%rsi, %rsi
0000000000010270	je	0x10338
0000000000010276	movq	$-0x1, %rdx
000000000001027d	xorl	%ecx, %ecx
000000000001027f	xorl	%eax, %eax
0000000000010281	nopw	%cs:(%rax,%rax)
000000000001028b	nopl	(%rax,%rax)
0000000000010290	movq	%rsi, %rbx
0000000000010293	movq	0x8(%rsi), %rsi
0000000000010297	cmpq	%r15, (%rbx)
000000000001029a	je	0x10398
00000000000102a0	cmpq	%rdx, %rsi
00000000000102a3	cmovbq	%rbx, %rax
00000000000102a7	cmovbq	%rsi, %rdx
00000000000102ab	incb	%cl
00000000000102ad	movq	0x10(%rbx), %rsi
00000000000102b1	testq	%rsi, %rsi
00000000000102b4	jne	0x10290
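The address arithmetic behind this listing can be double-checked with a short script. This is only a sketch: the three constants are copied from the log excerpt above, and the variable names are made up for illustration.

```python
# Values copied from the treeherder log excerpt in this comment.
rip = 0x000000010E66F293   # faulting instruction pointer
xul_base = 0x10E65F000     # load address of the XUL module
symbol_delta = 0x53        # from "__llvm_profile_instrument_target + 0x53"

# Offset of the faulting instruction inside XUL.
crash_offset = rip - xul_base

# Offset of the start of the function inside XUL.
function_offset = crash_offset - symbol_delta

print(hex(crash_offset))     # 0x10293
print(hex(function_offset))  # 0x10240
```

0x10240 is the first `pushq %rbp` in the listing, and 0x10293 is the `movq 0x8(%rsi), %rsi` line.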

It looks like we are crashing reading CurVNode->Next (For context, I believe 10290-102b4 is the while loop) because CurVNode is full of e5e5.
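To make the loop at 10290-102b4 easier to follow, here is a rough Python model of it. It is a sketch read off the disassembly, not the actual LLVM runtime source: the `ValueProfNode` field names and their offsets (value at 0x00, count at 0x08, next at 0x10) and the function name `walk` are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ValueProfNode:
    value: int                               # assumed offset 0x00
    count: int                               # assumed offset 0x08
    next: Optional["ValueProfNode"] = None   # assumed offset 0x10


def walk(head, target_value):
    """Return (matching node, least-used node seen so far, nodes visited)."""
    min_count = 0xFFFFFFFFFFFFFFFF       # movq $-0x1, %rdx
    min_node = None                      # xorl %eax, %eax
    visited = 0                          # xorl %ecx, %ecx
    cur = head
    while cur is not None:               # testq %rsi, %rsi / jne 0x10290
        count = cur.count                # movq 0x8(%rsi), %rsi  (the +0x53 load)
        if cur.value == target_value:    # cmpq %r15, (%rbx) / je 0x10398
            return cur, min_node, visited
        if count < min_count:            # cmpq %rdx, %rsi / cmovbq x2
            min_node, min_count = cur, count
        visited += 1                     # incb %cl
        cur = cur.next                   # movq 0x10(%rbx), %rsi
    return None, min_node, visited


# Small hand-built list: 10 -> 20 -> 30
c = ValueProfNode(30, 5)
b = ValueProfNode(20, 1, c)
a = ValueProfNode(10, 7, b)
found, least_used, visited = walk(a, 30)
```

In the real code `cur` is a raw pointer, so if a node's memory has been overwritten, the pointer loaded at the end of one iteration is garbage and the first load through it on the next iteration faults.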

jemalloc shouldn't be poisoning the instrumentation control blocks, of course. Any chance the push might have had some memory unsafety where a bad pointer got passed to free() and jemalloc poisoned the wrong area? Otherwise, if it's a miscompile, it could be anywhere (code, jemalloc, instrumentation, etc.) and this will be terrible to debug.
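As a small illustration of why a node "full of e5e5" leads straight to a crash, the sketch below decodes a block filled with the poison byte as three 64-bit fields. The 24-byte node size, the field order, and the constant names are assumptions for illustration. The canonical-address check assumes 48-bit x86-64 virtual addressing.

```python
import struct

POISON_BYTE = 0xE5   # assumed poison byte, matching the "e5e5" pattern above
NODE_SIZE = 24       # assumed: three 8-byte fields (value, count, next)

# A freed node as it would look after being filled with the poison byte.
poisoned = bytes([POISON_BYTE]) * NODE_SIZE
value, count, next_ptr = struct.unpack("<QQQ", poisoned)


def is_canonical_x86_64(addr: int) -> bool:
    """True if bits 63..47 are all equal (48-bit virtual addressing)."""
    upper = addr >> 47
    return upper == 0 or upper == 0x1FFFF


print(hex(next_ptr))                  # 0xe5e5e5e5e5e5e5e5
print(is_canonical_x86_64(next_ptr))  # False
```

A pointer with that bit pattern is not a valid user-space address, so any load through it faults immediately, which is consistent with the crash landing on a memory read inside the loop.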

Simon Giesecke [:sg] [he/him]

Comment 6

•

4 years ago

jemalloc shouldn't be poisoning the instrumentation control blocks, of course. Any chance the push might have had some memory unsafety where a bad pointer got passed to free() and jemalloc poisoned the wrong area?

I cannot completely rule this out, but I think it's rather unlikely, given:

- the nature of the changes,
- all tests looking fine,
- it seems to be deterministically reproducible on OS X, and
- the Linux generate-profile job is ok.

Unfortunately, the Windows generate-profile didn't run because of Bug 1670712.

Otherwise, if it's a miscompile, it could be anywhere (code, jemalloc, instrumentation, etc.) and this will be terrible to debug.

(Away)

Comment 7

•

4 years ago

It looks like the push is part of a large stack. Could you try narrowing it down to a specific changeset?

Simon Giesecke [:sg] [he/him]

Comment 8

•

4 years ago

(In reply to :dmajor from comment #7)

It looks like the push is part of a large stack. Could you try narrowing it down to a specific changeset?

Ok, will try to do that.

Flags: needinfo?(sgiesecke)

Simon Giesecke [:sg] [he/him]

Comment 9

•

4 years ago

Oh. I found that this is caused by disabling optimizations in two directories (https://hg.mozilla.org/try/rev/2100b35947620404ea1f2cd78cf8641079cf977f). This wasn't intended for landing, of course. Maybe that's expected? It's a bit annoying, but definitely low priority then.

Flags: needinfo?(sgiesecke)

(Away)

Comment 10

•

4 years ago

I vaguely recall that there are some footguns around -O0 but I don't recall offhand if they are related here. If this was on a more convenient OS then I'd like to look deeper just for curiosity's sake, but in practice if it's not blocking then I probably won't be able to spend time on it.

Simon Giesecke [:sg] [he/him]

Comment 11

•

4 years ago

Fine with me, I will close this as WORKSFORME then. Thanks for looking into it so quickly, and sorry for not noticing this part of the stack earlier.

Status: NEW → RESOLVED

Closed: 4 years ago

Resolution: --- → WORKSFORME


Categories

(Firefox Build System :: Toolchains, defect)

Tracking

(Not tracked)

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: crash)

Crash Data

Security

(public)
