Closed Bug 1945507 Opened 1 year ago Closed 1 year ago

Some uses of MOZ_CRASH_ANNOTATE get optimized out in release builds

Categories

(Toolkit :: Crash Reporting, defect)

defect

Tracking

()

RESOLVED FIXED
137 Branch
Tracking Status
firefox137 --- fixed

People

(Reporter: yannis, Assigned: sergesanspaille)

References

Details

(Keywords: perf-alert)

Attachments

(2 files, 2 obsolete files)

:padenot mentioned that the crash reason annotation was missing while working on crashes in mozilla::AudioChunk::ChannelDataForWrite<T> (example report). The crash goes through mozilla::detail::InvalidArrayIndex_CRASH:

MFBT_API MOZ_NORETURN MOZ_COLD void mozilla::detail::InvalidArrayIndex_CRASH(
    size_t aIndex, size_t aLength) {
  MOZ_CRASH_UNSAFE_PRINTF("ElementAt(aIndex = %zu, aLength = %zu)", aIndex,
                          aLength);
}

MOZ_CRASH_UNSAFE_PRINTF resolves to MOZ_Crash(__FILE__, __LINE__, MOZ_CrashPrintf("" format, __VA_ARGS__)); where:

static MOZ_ALWAYS_INLINE_EVEN_DEBUG MOZ_COLD MOZ_NORETURN void MOZ_Crash(
    const char* aFilename, int aLine, const char* aReason) {
  MOZ_FUZZING_HANDLE_CRASH_EVENT4("MOZ_CRASH", aFilename, aLine, aReason);
#if defined(DEBUG) || defined(FUZZING)
  MOZ_ReportCrash(aReason, aFilename, aLine);
#endif
  MOZ_CRASH_ANNOTATE(aReason);
  MOZ_REALLY_CRASH(aLine);
}

Looking at the binaries for release build 134.0.2 (and possibly others) shows that the call to MOZ_Crash gets inlined and the inner call to MOZ_CRASH_ANNOTATE(aReason); gets optimized out (or is considered a no-op), both on Windows x64 and Android aarch64:

    mozglue!mozilla::detail::InvalidArrayIndex_CRASH:
sub     rsp, 28h
mov     r8, aLength (rdx)
mov     rdx, aIndex (rcx)
lea     rcx, [mozglue!... (7ffae87cfa5d)]
call    mozglue!MOZ_CrashPrintf (7ffae87c5f20)
// start of MOZ_REALLY_CRASH(0x33)
int     3
mov     ecx, 33h
call    mozglue!MOZ_NoReturn (7ffae878e797)
    mozilla::detail::InvalidArrayIndex_CRASH
str  x30, [sp, #-0x10]!
mov  x2, x1
mov  x1, x0
nop  
adr  x0, s_ElementAtString
bl   MOZ_CrashPrintf
// start of MOZ_REALLY_CRASH(0x33)
mov  x8, xzr
movz w9, #0x33
str  w9, [x8]
bl   abort

We probably want to annotate gMozCrashReason as volatile? I'm not sure if that will be enough.

Assignee: nobody → sguelton

jstutte mentioned this bug in relation to bug 1945246. They seem similar but not the same, in that in the other bug we're crashing on the assignment to gMozCrashReason instead of skipping it entirely.

See Also: → 1945246

The asm statement forces the load of the global, so that a store
happens and the reason appears as live to the compiler.

I filed a bug about the same basic problem in 2023, bug 1858670. Maybe that could be duped here or something.

See Also: → 1858670
Attachment #9463495 - Attachment description: Bug 1945507 - Use a combination of asm(..) and __builtin_trap to implement AnnotateMozCrashReason r=gsvelto → Bug 1945507 - Use an asm volatile statement to prevent optimisation in AnnotateMozCrashReason r=gsvelto
Attachment #9463495 - Attachment description: Bug 1945507 - Use an asm volatile statement to prevent optimisation in AnnotateMozCrashReason r=gsvelto → Bug 1945507 - get rid of now useless asm memory barrier in AnnotateMozCrashReason r=gsvelto
Attachment #9463495 - Attachment description: Bug 1945507 - get rid of now useless asm memory barrier in AnnotateMozCrashReason r=gsvelto → Bug 1945507 - improve asm memory barrier in AnnotateMozCrashReason r=gsvelto
Blocks: 1945934
Attachment #9463495 - Attachment description: Bug 1945507 - improve asm memory barrier in AnnotateMozCrashReason r=gsvelto → Bug 1945507 - improve asm memory barrier in AnnotateMozCrashReason r=gsvelto!

The difference it makes is that it makes m mfbt inherit the
LIBRARY_DEFINES from mozglue, which were intended to propagate
there.

The difference it makes is that it makes m mfbt inherit the
LIBRARY_DEFINES from mozglue, which were intended to propagate
there.

Attachment #9464050 - Attachment is obsolete: true
Attachment #9466117 - Attachment is obsolete: true
Pushed by mh@glandium.org: https://hg.mozilla.org/integration/autoland/rev/9a6f61aab352 improve asm memory barrier in AnnotateMozCrashReason r=gsvelto https://hg.mozilla.org/integration/autoland/rev/4bf33527613e Use FINAL_LIBRARY to link mfbt to mozglue. r=firefox-build-system-reviewers,sergesanspaille

Mike, I checked if at that point https://searchfox.org/mozilla-central/source/mfbt/Assertions.cpp#48 we had (defined(MOZ_HAS_MOZGLUE) || defined(MOZILLA_INTERNAL_API)) and it's still false, while it should be true. Can you see why your patch didn't do the trick?

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 137 Branch

My bad, as Mike pointed out in a conversation we had, Assertions.cpp is compiled several times across the codebase, some callers don't need the extra annotation support. We seem to be interested by the one in Firefox and it's fine:

% objdump --disassemble=_ZN7mozilla6detail23InvalidArrayIndex_CRASHEmm obj-x86_64-pc-linux-gnu/dist/bin/firefox   

obj-x86_64-pc-linux-gnu/dist/bin/firefox:     file format elf64-x86-64


Disassembly of section .text:

0000000000051422 <_ZN7mozilla6detail23InvalidArrayIndex_CRASHEmm>:
   51422:	55                   	push   %rbp
   51423:	48 89 e5             	mov    %rsp,%rbp
   51426:	53                   	push   %rbx
   51427:	50                   	push   %rax
   51428:	48 89 f2             	mov    %rsi,%rdx
   5142b:	48 89 fe             	mov    %rdi,%rsi
   5142e:	48 8d 3d 1b e3 fb ff 	lea    -0x41ce5(%rip),%rdi        # f750 <_ZZN7mozilla3dmd19ToIdStringConverter6Base32EjE6digits+0x220>
   51435:	31 db                	xor    %ebx,%ebx
   51437:	31 c0                	xor    %eax,%eax
   51439:	e8 72 fe ff ff       	call   512b0 <MOZ_CrashPrintf>
   5143e:	48 8d 05 fb 77 06 00 	lea    0x677fb(%rip),%rax        # b8c40 <_ZL18sPrintfCrashReason>
   51445:	48 8d 0d e4 77 06 00 	lea    0x677e4(%rip),%rcx        # b8c30 <gMozCrashReason>
   5144c:	48 89 01             	mov    %rax,(%rcx)
   5144f:	b8 33 00 00 00       	mov    $0x33,%eax
   51454:	48 89 03             	mov    %rax,(%rbx)
   51457:	e8 74 91 ff ff       	call   4a5d0 <abort>

(In reply to Sandor Molnar[:smolnar] from comment #9)

https://hg.mozilla.org/mozilla-central/rev/9a6f61aab352
https://hg.mozilla.org/mozilla-central/rev/4bf33527613e

Perfherder has detected a browsertime performance change from push 4bf33527613eb08cac37f891c362ecba9e15c718.

Improvements:

Ratio Test Platform Options Absolute values (old vs new) Performance Profiles
7% speedometer3 TodoMVC-jQuery/CompletingAllItems/total linux1804-64-nightlyasrelease-qr fission webrender 167.84 -> 156.54 Before/After
5% speedometer3 TodoMVC-jQuery/CompletingAllItems/total linux1804-64-nightlyasrelease-qr fission webrender 165.00 -> 156.50
4% speedometer3 TodoMVC-JavaScript-ES6-Webpack-Complex-DOM/Adding100Items/Sync linux1804-64-nightlyasrelease-qr fission webrender 50.62 -> 48.37 Before/After
4% speedometer jQuery-TodoMVC linux1804-64-nightlyasrelease-qr fission webrender 351.24 -> 336.60
4% speedometer3 TodoMVC-JavaScript-ES6-Webpack-Complex-DOM/Adding100Items/total linux1804-64-nightlyasrelease-qr fission webrender 57.05 -> 54.70 Before/After
... ... ... ... ... ...
3% speedometer3 TodoMVC-JavaScript-ES5/total linux1804-64-nightlyasrelease-qr fission webrender 104.94 -> 102.11 Before/After

Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests.

If you need the profiling jobs you can trigger them yourself from treeherder job view or ask a sheriff to do that for you.

You can run these tests on try with ./mach try perf --alert 44075

For more information on performance sheriffing please see our FAQ.

Keywords: perf-alert

How did these patches improve some subtests on SP3?
(https://treeherder.mozilla.org/perfherder/alerts?id=44075&hideDwnToInv=0 also looks to be becuase of this bug, but still being confirmed)

I've observed that this patch changes the inlining decision for several functions: some functions are now inlined, some are no longer etc. As a result the final binary is actually smaller (but looks like its also slower)...

The alerts say these were improvements. And is it possible to deliberately do more of what this bug did accidentally/incidentally?

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: