Crashes in lul::DerefTUW

RESOLVED FIXED in Firefox 52

Status

()

Core
Gecko Profiler
RESOLVED FIXED
2 years ago
9 months ago

People

(Reporter: nical, Assigned: jseward)

Tracking

(Blocks: 1 bug, {crash})

unspecified
mozilla54
x86_64
Linux
crash
Points:
---

Firefox Tracking Flags

(firefox52 fixed, firefox-esr52 fixed, firefox53 fixed, firefox54 fixed)

Details

(crash signature)

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

2 years ago
https://crash-stats.mozilla.com/report/list?product=Firefox&signature=lul%3A%3ADerefTUW#tab-reports

I've had several crashes in the last few days with the same signature on nightly (Linux). The crash happens in unwinding related code, which sounds like it might be related to the gecko profiler addon, the latter being installed, indeed.

I don't have clear steps to reproduce. Once I was watching stuff on air mozilla, a few times I was going through my emails (fastmail.com), so no clear correlation there, unless its a tab in the background (fastmail being always open as an app tab).
Mozilla/5.0 (X11; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0
Build ID: 20160211030242

Unfortunately, I couldn't install the gecko profiler addon on the latest Nightly on my Ubuntu 14.04 64 bit machine, I'm getting the message that the add-on appears to be corrupt. Tried to install it from: https://github.com/bgirard/Gecko-Profiler-Addon

Moving this to Gecko Profiler for further investigation.
Crash Signature: [@lul::DerefTUW]
Component: Untriaged → Gecko Profiler
Keywords: crash
OS: Unspecified → Linux
Hardware: Unspecified → x86_64

Comment 2

2 years ago
Got this too.
(and disabled Gecko profiler for now)
(In reply to Simona B [:simonab ] from comment #1)
> Unfortunately, I couldn't install the gecko profiler addon on the latest
> Nightly [...] I'm getting the message that the add-on appears to be corrupt

For the record, the place to get the modern profiler addon is now: https://new.cleopatra.io/
(That has a link to the addon download, which is hosted in mstange's github-space.)

(I just hit this crash with the current version of the addon installed & latest Nightly, too -- so this still happening.)
(Assignee)

Comment 4

11 months ago
I tried the profiler on nightly, but I can't get it to crash, after
quite some minutes of surfing and 7 tabs open.  Does anyone have a
repeatable way to make it crash?
I just got this too. Was looking at a Google map. I'm guessing that it's mostly a matter of what code happens to be running when you sample the stack, not so much what you're doing, but I don't know. Sadly, the stack above doNativeBacktrace looks totally corrupt.

All of the crash addresses are 0xfffff something. It appears to be dereferencing 

  *(uintptr_t)(aStackImg->mContents + aAddr.Value() - aStackImg->mStartAvma))

Is it possible for aStackImg->mContents to be NULL? That would match the crash addrs.

Updated

11 months ago
Blocks: 1329181
Just hit this on Nightly.
(In reply to Steve Fink [:sfink] [:s:] from comment #5)
> I just got this too. Was looking at a Google map. I'm guessing that it's
> mostly a matter of what code happens to be running when you sample the
> stack, not so much what you're doing, but I don't know. Sadly, the stack
> above doNativeBacktrace looks totally corrupt.
> 
> All of the crash addresses are 0xfffff something. It appears to be
> dereferencing 
> 
>   *(uintptr_t)(aStackImg->mContents + aAddr.Value() - aStackImg->mStartAvma))
> 
> Is it possible for aStackImg->mContents to be NULL? That would match the
> crash addrs.

StackImage::mContents is an array:

https://dxr.mozilla.org/mozilla-central/source/tools/profiler/lul/LulMain.h#173

so mContents being NULL seems unlikely.  IIUC, that expression is trying to reference something inside of mContents, and it's being bounds-checked prior to the dereference:

https://hg.mozilla.org/mozilla-central/annotate/9c06e744b1be/tools/profiler/lul/LulMain.cpp#l1014

And the crashing address from https://crash-stats.mozilla.com/report/index/ae5f5753-0778-4057-9d52-d0f4d2170131 is 0xffffffffffff79a7, which is just a bit over the N_STACK_BYTES size of mContents.  So it looks like the bounds checks here are not quite right?
Flags: needinfo?(jseward)
Heh. Thanks for reading the actual code. It raises more mysteries, though.

  if (aAddr.Value() < aStackImg->mStartAvma) {
    return TaggedUWord();
  }
  return TaggedUWord(*(uintptr_t*)(aStackImg->mContents + aAddr.Value() - aStackImg->mStartAvma));

From the first, we get Value >= mStartAvma, so Value - mStartAvma is positive. How the heck are we getting a negative address with mContents + positive, then? Value and mStartAvma are both uintptr_t, so I don't see a weird overflow issue. I guess Value and/or mStartAvma are bogus, so that mContents + Value > 2**64 and mStartAvma is smallish?

Anyway, I think I'd better leave this for jseward.
(Assignee)

Comment 9

10 months ago
Nathan and I looked at the bounds checks in DerefTUW in some detail.
It seems to us that the checks are correct -- no off-by-ones -- but
they are insufficient.  In particular they fail to guard against the
case where |aAddr| itself wraps around.  For example, consider aAddr
being 0xFFFF...FFFF (right at the end of the address space).  It is
not (quite) beyond the realms of possibility that the unwinder picks
up such a value from the stack (stack end marker?) or register, and
tries to dereference it.

Then what we have is

  // lower limit check
  if 0xFFFF....FFFF  <unsigned  (aAddr.Value() < aStackImg->mStartAvma
     reject

So that doesn't kick in, and

  // upper limit check
  if 0xFFFF....FFFF + sizeof(uintptr_t) >unsigned ...
     reject

we are testing 7 > ..., which is very unlikely, so that doesn't
kick in either.  So we've accepted an obviously-invalid address.

Unfortunately I still can't reproduce this.  I can get an immediate
segfault though by placing this at the top of the function

  aAddr = TaggedUWord(~ (uintptr_t)1);

which validates the above theory a bit.
Flags: needinfo?(jseward)
(Assignee)

Comment 10

10 months ago
Created attachment 8832508 [details] [diff] [review]
bug1245477-1.cset

Guards against |aAddr| wraparounds.

How shall we proceed here, given that I can't check that it
fixes the reported problem?
Comment on attachment 8832508 [details] [diff] [review]
bug1245477-1.cset

Review of attachment 8832508 [details] [diff] [review]:
-----------------------------------------------------------------

Could you use a CheckedInt<uintptr_t> here?
(In reply to Julian Seward [:jseward] from comment #10)
> How shall we proceed here, given that I can't check that it
> fixes the reported problem?

I'd say land it, because we agree that the problem can theoretically happen; watch crash stats, and if the crash still occurs, file a new bug to do a new investigation.
(Assignee)

Comment 13

10 months ago
Created attachment 8832889 [details] [diff] [review]
bug1245477-2.cset

Now using CheckedInt<uintptr_t> as suggested in comment #11.
Attachment #8832508 - Attachment is obsolete: true
(Assignee)

Updated

10 months ago
Attachment #8832889 - Flags: review?(nfroyd)
Comment on attachment 8832889 [details] [diff] [review]
bug1245477-2.cset

Review of attachment 8832889 [details] [diff] [review]:
-----------------------------------------------------------------

FWIW, examining the register values in a crash report and correlating them with the assembly indicates that receiving 0xfff...ff is exactly what's happening.

::: tools/profiler/lul/LulMain.cpp
@@ +1029,5 @@
> +  // avoid overflow.  In particular if |aAddr| is 0xFFF...FFF or the
> +  // 3/7 values below that, then we will get overflow.  See bug #1245477.
> +  CheckedInt<uintptr_t> highest_requested_plus_one
> +    = CheckedInt<uintptr_t>(aAddr.Value())
> +      + CheckedInt<uintptr_t>(sizeof(uintptr_t));

FWIW, you ought to be able to say:

... = CheckedInt<uintptr_t>(aAddr.Value()) + sizeof(uintptr_t);

and the expected coercions will happen.

A local:

typedef CheckedInt<uintptr_t> CheckedWord;

and using that might make this a little less verbose.
Attachment #8832889 - Flags: review?(nfroyd) → review+

Comment 15

10 months ago
Pushed by jseward@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4f8613cea566
Crashes in lul::DerefTUW.  r=nfroyd.

Comment 16

9 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/4f8613cea566
Status: NEW → RESOLVED
Last Resolved: 9 months ago
status-firefox54: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Please request Aurora/Beta approval on this when you're comfortable doing so.
Assignee: nobody → jseward
status-firefox52: --- → affected
status-firefox53: --- → affected
Flags: needinfo?(jseward)
(Assignee)

Comment 18

9 months ago
Comment on attachment 8832889 [details] [diff] [review]
bug1245477-2.cset

Approval Request Comment

[Feature/Bug causing the regression]:
Range checks in lul::DerefTUW are inadequate, causing invalid
acceptance of addresses that wrap around the end of the address space.

[User impact if declined]:
For users of the Gecko profiler on Linux-{x86_64,i686} -- occasional
crashes when profiling.  For all other scenarios -- none.

[Is this code covered by automated tests?]:
No

[Has the fix been verified in Nightly?]:
Yes

[Needs manual test from QE? If yes, steps to reproduce]: 
I never managed to reproduce it with the profiler.  The fix
was generated by inspecting the code and crash dump carefully.

[List of other uplifts needed for the feature/fix]:
None

[Is the change risky?]:
No

[Why is the change risky/not risky?]:
Very tiny fix, Linux only, just adds extra validity checks for this function.

[String changes made/needed]:
None
Flags: needinfo?(jseward)
Attachment #8832889 - Flags: approval-mozilla-beta?
Attachment #8832889 - Flags: approval-mozilla-aurora?
Comment on attachment 8832889 [details] [diff] [review]
bug1245477-2.cset

Fix a crash. Aurora53+.
Attachment #8832889 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Comment 20

9 months ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-aurora/rev/1b6df84a2353
status-firefox53: affected → fixed
Comment on attachment 8832889 [details] [diff] [review]
bug1245477-2.cset

address range check fix in profiler, beta52+
Attachment #8832889 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment 22

9 months ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-beta/rev/446c05b65b97
status-firefox52: affected → fixed

Comment 23

9 months ago
bugherderuplift
https://hg.mozilla.org/releases/mozilla-esr52/rev/446c05b65b97
status-firefox-esr52: --- → fixed
You need to log in before you can comment on or make changes to this bug.