Closed
Bug 1245477
Opened 9 years ago
Closed 8 years ago
Crashes in lul::DerefTUW
Categories
(Core :: Gecko Profiler, defect)
Tracking
()
RESOLVED
FIXED
mozilla54
People
(Reporter: nical, Assigned: jseward)
References
(Blocks 1 open bug)
Details
(Keywords: crash)
Crash Data
Attachments
(1 file, 1 obsolete file)
2.98 KB,
patch
|
froydnj
:
review+
gchang
:
approval-mozilla-aurora+
jcristau
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
https://crash-stats.mozilla.com/report/list?product=Firefox&signature=lul%3A%3ADerefTUW#tab-reports
I've had several crashes in the last few days with the same signature on nightly (Linux). The crash happens in unwinding related code, which sounds like it might be related to the gecko profiler addon, the latter being installed, indeed.
I don't have clear steps to reproduce. Once I was watching stuff on air mozilla, a few times I was going through my emails (fastmail.com), so no clear correlation there, unless its a tab in the background (fastmail being always open as an app tab).
Comment 1•9 years ago
|
||
Mozilla/5.0 (X11; Linux x86_64; rv:47.0) Gecko/20100101 Firefox/47.0
Build ID: 20160211030242
Unfortunately, I couldn't install the gecko profiler addon on the latest Nightly on my Ubuntu 14.04 64 bit machine, I'm getting the message that the add-on appears to be corrupt. Tried to install it from: https://github.com/bgirard/Gecko-Profiler-Addon
Moving this to Gecko Profiler for further investigation.
Crash Signature: [@lul::DerefTUW]
Component: Untriaged → Gecko Profiler
Keywords: crash
OS: Unspecified → Linux
Hardware: Unspecified → x86_64
Comment 2•9 years ago
|
||
Got this too.
(and disabled Gecko profiler for now)
Comment 3•8 years ago
|
||
(In reply to Simona B [:simonab ] from comment #1)
> Unfortunately, I couldn't install the gecko profiler addon on the latest
> Nightly [...] I'm getting the message that the add-on appears to be corrupt
For the record, the place to get the modern profiler addon is now: https://new.cleopatra.io/
(That has a link to the addon download, which is hosted in mstange's github-space.)
(I just hit this crash with the current version of the addon installed & latest Nightly, too -- so this still happening.)
Assignee | ||
Comment 4•8 years ago
|
||
I tried the profiler on nightly, but I can't get it to crash, after
quite some minutes of surfing and 7 tabs open. Does anyone have a
repeatable way to make it crash?
Comment 5•8 years ago
|
||
I just got this too. Was looking at a Google map. I'm guessing that it's mostly a matter of what code happens to be running when you sample the stack, not so much what you're doing, but I don't know. Sadly, the stack above doNativeBacktrace looks totally corrupt.
All of the crash addresses are 0xfffff something. It appears to be dereferencing
*(uintptr_t)(aStackImg->mContents + aAddr.Value() - aStackImg->mStartAvma))
Is it possible for aStackImg->mContents to be NULL? That would match the crash addrs.
![]() |
||
Comment 6•8 years ago
|
||
Just hit this on Nightly.
![]() |
||
Comment 7•8 years ago
|
||
(In reply to Steve Fink [:sfink] [:s:] from comment #5)
> I just got this too. Was looking at a Google map. I'm guessing that it's
> mostly a matter of what code happens to be running when you sample the
> stack, not so much what you're doing, but I don't know. Sadly, the stack
> above doNativeBacktrace looks totally corrupt.
>
> All of the crash addresses are 0xfffff something. It appears to be
> dereferencing
>
> *(uintptr_t)(aStackImg->mContents + aAddr.Value() - aStackImg->mStartAvma))
>
> Is it possible for aStackImg->mContents to be NULL? That would match the
> crash addrs.
StackImage::mContents is an array:
https://dxr.mozilla.org/mozilla-central/source/tools/profiler/lul/LulMain.h#173
so mContents being NULL seems unlikely. IIUC, that expression is trying to reference something inside of mContents, and it's being bounds-checked prior to the dereference:
https://hg.mozilla.org/mozilla-central/annotate/9c06e744b1be/tools/profiler/lul/LulMain.cpp#l1014
And the crashing address from https://crash-stats.mozilla.com/report/index/ae5f5753-0778-4057-9d52-d0f4d2170131 is 0xffffffffffff79a7, which is just a bit over the N_STACK_BYTES size of mContents. So it looks like the bounds checks here are not quite right?
Flags: needinfo?(jseward)
Comment 8•8 years ago
|
||
Heh. Thanks for reading the actual code. It raises more mysteries, though.
if (aAddr.Value() < aStackImg->mStartAvma) {
return TaggedUWord();
}
return TaggedUWord(*(uintptr_t*)(aStackImg->mContents + aAddr.Value() - aStackImg->mStartAvma));
From the first, we get Value >= mStartAvma, so Value - mStartAvma is positive. How the heck are we getting a negative address with mContents + positive, then? Value and mStartAvma are both uintptr_t, so I don't see a weird overflow issue. I guess Value and/or mStartAvma are bogus, so that mContents + Value > 2**64 and mStartAvma is smallish?
Anyway, I think I'd better leave this for jseward.
Assignee | ||
Comment 9•8 years ago
|
||
Nathan and I looked at the bounds checks in DerefTUW in some detail.
It seems to us that the checks are correct -- no off-by-ones -- but
they are insufficient. In particular they fail to guard against the
case where |aAddr| itself wraps around. For example, consider aAddr
being 0xFFFF...FFFF (right at the end of the address space). It is
not (quite) beyond the realms of possibility that the unwinder picks
up such a value from the stack (stack end marker?) or register, and
tries to dereference it.
Then what we have is
// lower limit check
if 0xFFFF....FFFF <unsigned (aAddr.Value() < aStackImg->mStartAvma
reject
So that doesn't kick in, and
// upper limit check
if 0xFFFF....FFFF + sizeof(uintptr_t) >unsigned ...
reject
we are testing 7 > ..., which is very unlikely, so that doesn't
kick in either. So we've accepted an obviously-invalid address.
Unfortunately I still can't reproduce this. I can get an immediate
segfault though by placing this at the top of the function
aAddr = TaggedUWord(~ (uintptr_t)1);
which validates the above theory a bit.
Flags: needinfo?(jseward)
Assignee | ||
Comment 10•8 years ago
|
||
Guards against |aAddr| wraparounds.
How shall we proceed here, given that I can't check that it
fixes the reported problem?
Comment 11•8 years ago
|
||
Comment on attachment 8832508 [details] [diff] [review]
bug1245477-1.cset
Review of attachment 8832508 [details] [diff] [review]:
-----------------------------------------------------------------
Could you use a CheckedInt<uintptr_t> here?
Comment 12•8 years ago
|
||
(In reply to Julian Seward [:jseward] from comment #10)
> How shall we proceed here, given that I can't check that it
> fixes the reported problem?
I'd say land it, because we agree that the problem can theoretically happen; watch crash stats, and if the crash still occurs, file a new bug to do a new investigation.
Assignee | ||
Comment 13•8 years ago
|
||
Now using CheckedInt<uintptr_t> as suggested in comment #11.
Attachment #8832508 -
Attachment is obsolete: true
Assignee | ||
Updated•8 years ago
|
Attachment #8832889 -
Flags: review?(nfroyd)
![]() |
||
Comment 14•8 years ago
|
||
Comment on attachment 8832889 [details] [diff] [review]
bug1245477-2.cset
Review of attachment 8832889 [details] [diff] [review]:
-----------------------------------------------------------------
FWIW, examining the register values in a crash report and correlating them with the assembly indicates that receiving 0xfff...ff is exactly what's happening.
::: tools/profiler/lul/LulMain.cpp
@@ +1029,5 @@
> + // avoid overflow. In particular if |aAddr| is 0xFFF...FFF or the
> + // 3/7 values below that, then we will get overflow. See bug #1245477.
> + CheckedInt<uintptr_t> highest_requested_plus_one
> + = CheckedInt<uintptr_t>(aAddr.Value())
> + + CheckedInt<uintptr_t>(sizeof(uintptr_t));
FWIW, you ought to be able to say:
... = CheckedInt<uintptr_t>(aAddr.Value()) + sizeof(uintptr_t);
and the expected coercions will happen.
A local:
typedef CheckedInt<uintptr_t> CheckedWord;
and using that might make this a little less verbose.
Attachment #8832889 -
Flags: review?(nfroyd) → review+
Comment 15•8 years ago
|
||
Pushed by jseward@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4f8613cea566
Crashes in lul::DerefTUW. r=nfroyd.
Comment 16•8 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 8 years ago
status-firefox54:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
Comment 17•8 years ago
|
||
Please request Aurora/Beta approval on this when you're comfortable doing so.
Assignee: nobody → jseward
status-firefox52:
--- → affected
status-firefox53:
--- → affected
Flags: needinfo?(jseward)
Assignee | ||
Comment 18•8 years ago
|
||
Comment on attachment 8832889 [details] [diff] [review]
bug1245477-2.cset
Approval Request Comment
[Feature/Bug causing the regression]:
Range checks in lul::DerefTUW are inadequate, causing invalid
acceptance of addresses that wrap around the end of the address space.
[User impact if declined]:
For users of the Gecko profiler on Linux-{x86_64,i686} -- occasional
crashes when profiling. For all other scenarios -- none.
[Is this code covered by automated tests?]:
No
[Has the fix been verified in Nightly?]:
Yes
[Needs manual test from QE? If yes, steps to reproduce]:
I never managed to reproduce it with the profiler. The fix
was generated by inspecting the code and crash dump carefully.
[List of other uplifts needed for the feature/fix]:
None
[Is the change risky?]:
No
[Why is the change risky/not risky?]:
Very tiny fix, Linux only, just adds extra validity checks for this function.
[String changes made/needed]:
None
Flags: needinfo?(jseward)
Attachment #8832889 -
Flags: approval-mozilla-beta?
Attachment #8832889 -
Flags: approval-mozilla-aurora?
Comment 19•8 years ago
|
||
Attachment #8832889 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 20•8 years ago
|
||
bugherder uplift |
Comment 21•8 years ago
|
||
Comment on attachment 8832889 [details] [diff] [review]
bug1245477-2.cset
address range check fix in profiler, beta52+
Attachment #8832889 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 22•8 years ago
|
||
bugherder uplift |
Comment 23•8 years ago
|
||
bugherder uplift |
status-firefox-esr52:
--- → fixed
You need to log in
before you can comment on or make changes to this bug.
Description
•