Closed Bug 1431868 Opened 6 years ago Closed 6 years ago

UBSan: -49.9797 is outside the range of representable values of type 'unsigned int' in include/mozilla/Telemetry.h:190

Categories

(Toolkit :: Telemetry, defect, P2)

59 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
firefox59 --- wontfix
firefox60 --- fixed

People

(Reporter: tsmith, Assigned: chutten)

References

Details

(Keywords: csectype-undefined)

Attachments

(2 files)

This is triggered on launch when built with UBSan

changeset: 399986:6bb6f3b25f9f

/mozilla-central/objdir-ff-ubsan/dist/include/mozilla/Telemetry.h:190:47: runtime error: -49.9797 is outside the range of representable values of type 'unsigned int'
    #0 0x7fb18c49f1c8 in compute /mozilla-central/objdir-ff-ubsan/dist/include/mozilla/Telemetry.h:190:47
    #1 0x7fb18c49f1c8 in mozilla::Telemetry::AutoTimer<(mozilla::Telemetry::HistogramID)1521, (mozilla::Telemetry::TimerResolution)0>::~AutoTimer() /mozilla-central/objdir-ff-ubsan/dist/include/mozilla/Telemetry.h:227
    #2 0x7fb18c49ee09 in mozilla::Maybe<mozilla::Telemetry::AutoTimer<(mozilla::Telemetry::HistogramID)1521, (mozilla::Telemetry::TimerResolution)0> >::reset() /mozilla-central/objdir-ff-ubsan/dist/include/mozilla/Maybe.h:446:17
    #3 0x7fb18c46f3e1 in ~Maybe /mozilla-central/objdir-ff-ubsan/dist/include/mozilla/Maybe.h:100:14
    #4 0x7fb18c46f3e1 in nsThread::ProcessNextEvent(bool, bool*) /mozilla-central/xpcom/threads/nsThread.cpp:1041
    #5 0x7fb18c4a3a10 in NS_ProcessNextEvent(nsIThread*, bool) /mozilla-central/xpcom/threads/nsThreadUtils.cpp:517:10
    #6 0x7fb18d816e48 in mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) /mozilla-central/ipc/glue/MessagePump.cpp:97:21
    #7 0x7fb18d68b9e9 in RunHandler /mozilla-central/ipc/chromium/src/base/message_loop.cc:319:3
    #8 0x7fb18d68b9e9 in MessageLoop::Run() /mozilla-central/ipc/chromium/src/base/message_loop.cc:299
    #9 0x7fb193c6e615 in nsBaseAppShell::Run() /mozilla-central/widget/nsBaseAppShell.cpp:157:27
    #10 0x7fb199ccf677 in nsAppStartup::Run() /mozilla-central/toolkit/components/startup/nsAppStartup.cpp:288:30
    #11 0x7fb199ec1ec2 in XREMain::XRE_mainRun() /mozilla-central/toolkit/xre/nsAppRunner.cpp:4702:22
    #12 0x7fb199ec3cce in XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) /mozilla-central/toolkit/xre/nsAppRunner.cpp:4841:8
    #13 0x7fb199ec4f11 in XRE_main(int, char**, mozilla::BootstrapConfig const&) /mozilla-central/toolkit/xre/nsAppRunner.cpp:4933:21
    #14 0x516a0f in do_main /mozilla-central/browser/app/nsBrowserApp.cpp:231:22
    #15 0x516a0f in main /mozilla-central/browser/app/nsBrowserApp.cpp:304
    #16 0x7fb1be1881c0 in __libc_start_main /build/glibc-itYbWN/glibc-2.26/csu/../csu/libc-start.c:308
    #17 0x41f059 in _start (/mozilla-central/objdir-ff-ubsan/dist/bin/firefox+0x41f059)
This seems to be due to the fact that |end| is 0 here? https://searchfox.org/mozilla-central/rev/e9a067a401e41017766f4ab90bc3906603273d51/toolkit/components/telemetry/Telemetry.h#190

Chris, thoughts?
Flags: needinfo?(chutten)
Priority: -- → P2
`end` is TimeStamp::Now(), which is monotonically-increasing relative to other TimeStamp::Now() (which is what most/all of TelemetryAutoTimer instances use for `start`... and is what this particular use (at nsThread::ProcessNextEvent[1]) uses)

So, end - start should always result in a timeduration greater than 0.

So... I'm confused how two sequential executions of TimeStamp::Now() could contain non-monotonically-increasing stamps. Maybe there was an accuracy adjustment in between?

:tsmith - How often does this happen? Every time?

[1]: https://searchfox.org/mozilla-central/source/xpcom/threads/nsThread.cpp#983
Flags: needinfo?(chutten) → needinfo?(twsmith)
(In reply to Chris H-C :chutten from comment #2)
> :tsmith - How often does this happen? Every time?

Yes every time.
Flags: needinfo?(twsmith)
Well, at least it's consistent.

The more I stare at this, the less I can see how this has anything to do with the AutoTimer. It's an RAII class being used correctly... though the `Maybe<T>` around them are a construct I haven't seen before. I'm not sure what purpose they're supposed to solve. ni?:farre for what the Maybes are for.
Flags: needinfo?(afarre)
Conditional telemetry! :) We only want to collect telemetry MAIN_THREAD_RUNNABLE_MS when we run an event on the main thread, so we do a construct in place of a Some value at https://searchfox.org/mozilla-central/source/xpcom/threads/nsThread.cpp#991 and the same for idleTimer at https://searchfox.org/mozilla-central/source/xpcom/threads/nsThread.cpp#1010.

If that construct never occurs the destructor of the RAII won't run and we'll skip collecting telemetry. At least this is the idea.
Flags: needinfo?(afarre)
See Also: → 1432362
Well, hello:

>          // If we construct the AutoTimer with the deadline, then we'll
>          // compute TimeStamp::Now() - mNextIdleDeadline when
>          // accumulating telemetry.  If that is positive we've
>          // overdrawn our idle budget, if it's negative it will go in
>          // the 0 bucket of the histogram.
>          idleTimer.emplace(name, mNextIdleDeadline);


That'd probably be our problem right there. To my knowledge we don't actually allow negative values in AutoTimer (given we cast the resultant millis to a uint32_t). I'm guessing this is exactly what UBSan is complaining of. Our compilers probably (kindly) allow us to static_cast that negative into a 0, but it's not the safest thing.

So... I guess we have to support now > end in AccumulateDelta_impl... and hey, what are the chances we can remove the wrappers that say 

>/**
> * Those wrappers are needed because the VS versions we use do not support free
> * functions with default template arguments.
> */

That comment was introduced about five years ago, surely we can do something about that now, right? *checks* Hm, no one actually appears to use Microsecond-resolution Telemetry AutoTimers. Maybe we can just revert back to millis-only and side-step the problem.

Thoughts, Alessio?
Flags: needinfo?(alessio.placitelli)
(In reply to Chris H-C :chutten from comment #6)
> Well, hello:
> 
> >          // If we construct the AutoTimer with the deadline, then we'll
> >          // compute TimeStamp::Now() - mNextIdleDeadline when
> >          // accumulating telemetry.  If that is positive we've
> >          // overdrawn our idle budget, if it's negative it will go in
> >          // the 0 bucket of the histogram.
> >          idleTimer.emplace(name, mNextIdleDeadline);
> 
> 
> That'd probably be our problem right there. To my knowledge we don't
> actually allow negative values in AutoTimer (given we cast the resultant
> millis to a uint32_t). I'm guessing this is exactly what UBSan is
> complaining of. Our compilers probably (kindly) allow us to static_cast that
> negative into a 0, but it's not the safest thing.

Ha! Good catch!

> So... I guess we have to support now > end in AccumulateDelta_impl... and
> hey, what are the chances we can remove the wrappers that say 

Yes, I think that's the safest bet for now :(

> >/**
> > * Those wrappers are needed because the VS versions we use do not support free
> > * functions with default template arguments.
> > */
> 
> That comment was introduced about five years ago, surely we can do something
> about that now, right? *checks* Hm, no one actually appears to use
> Microsecond-resolution Telemetry AutoTimers. Maybe we can just revert back
> to millis-only and side-step the problem.
> 
> Thoughts, Alessio?

Not sure if we won't ever need that. However, I'm all for chopping out untested/unused code. Let's defer this to Georg for the final call.
Flags: needinfo?(alessio.placitelli) → needinfo?(gfritzsche)
Assignee: nobody → chutten
Status: NEW → ASSIGNED
:tsmith, is there a cheatsheet for building gecko with UBSan?
Flags: needinfo?(twsmith)
Attached file mozconfig
(In reply to Chris H-C :chutten from comment #8)
> :tsmith, is there a cheatsheet for building gecko with UBSan?

Not at the moment. As it is you need to pick and choose the sanitizers because some break the build and others cause crashes at runtime, but I'm working on cleaning it up a bit at least. You can try the attached mozconfig but no guarantees. I've used it with LLVM4.

That said I'm always happy to test patches if needed.
Flags: needinfo?(twsmith)
:tsmith, could you give the attached mozreview patch a go? It should take care of the listed error without moving too many bits around.
(In reply to Chris H-C :chutten from comment #11)
> :tsmith, could you give the attached mozreview patch a go? It should take
> care of the listed error without moving too many bits around.

I tested and the patch does get ride of the UB.
Comment on attachment 8944836 [details]
bug 1431868 Handle AutoTimer users passing us future values for start.

https://reviewboard.mozilla.org/r/214986/#review220772

::: commit-message-94057:1
(Diff revision 1)
> +bug 1431868 Handle AutoTimer users passing us future values for start. r?Dexter

nit: shouldn't this be "Bug 1431868 - ..." ?

::: toolkit/components/telemetry/Telemetry.h:188
(Diff revision 1)
>  template<>
>  struct AccumulateDelta_impl<Millisecond>
>  {
>    static void compute(HistogramID id, TimeStamp start, TimeStamp end = TimeStamp::Now()) {
> +    if (start > end) {
> +      Accumulate(id, 0);

nit: do you think it's worth dropping a comment here to explain what we're doing? i.e. "If the elapsed time ends up being negative, accumulate it in the underflow bucket (0)" or something like that.
Attachment #8944836 - Flags: review?(alessio.placitelli) → review+
Comment on attachment 8944836 [details]
bug 1431868 Handle AutoTimer users passing us future values for start.

https://reviewboard.mozilla.org/r/214986/#review220772

> nit: shouldn't this be "Bug 1431868 - ..." ?

Lowercase "bug" is permitted, and the hyphen is optional (for parsing purposes, at any rate). I'll remember to capitalize and punctuate future commit messages I ask you to review :)

> nit: do you think it's worth dropping a comment here to explain what we're doing? i.e. "If the elapsed time ends up being negative, accumulate it in the underflow bucket (0)" or something like that.

I don't think it's worth it. A comment like that suggests it was a semantics-level decision we're making. This code is specifically here to avoid undefined behaviour in the below code, since casting a negative value to an unsigned is Bad News(tm).

(that the code happens to satisfy the semantic decision we document [1] is a happy accident :)

[1]: https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/telemetry/collection/histograms.html#low
Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e7b87f259f43
Handle AutoTimer users passing us future values for start. r=Dexter
Blocks: 1432791
Moving ni?gfritzsche to bug 1432791
Flags: needinfo?(gfritzsche)
https://hg.mozilla.org/mozilla-central/rev/e7b87f259f43
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: