Implement tools/power/rapl, a RAPL-reading program for power profiling

RESOLVED FIXED in Firefox 43

Status

()

Core
General
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: njn, Assigned: njn)

Tracking

Trunk
mozilla43
Points:
---

Firefox Tracking Flags

(firefox43 fixed)

Details

Attachments

(1 attachment, 2 obsolete attachments)

(Assignee)

Description

3 years ago
The Intel RAPL interface provides machine-specific registers containing energy approximations for a processor's package, cores, GPU and RAM.

Some existing tools can read some of these, but Linux's |perf| is the only one that can read all of them. It would be nice to have a tiny multi-platform utility in the tree that that gets just these values.

I know which system APIs can be used to read them on both Mac (diagCall64()) and Linux (perf_event_open()) and I have sample code on both platforms that works. I just need to combine the code into a single program, clean it up, and do some more testing.
(Assignee)

Updated

3 years ago
Summary: Implement tools/rapl-power, a RAPL-reading program for power profiling → Implement tools/power/rapl, a RAPL-reading program for power profiling
(Assignee)

Comment 1

3 years ago
Created attachment 8649189 [details] [diff] [review]
Add tools/power/rapl, a RAPL-reading program for power profiling

glandium: can you please review the build system stuff, and anything else you
care to look at.

erahm: can you please review the rest.

froydnj: feel free to take a look or not, as you like.

The patch has no tests in it. Locally I've been verifying it via small scripts 
comparing with |perf| on Linux (can compare all four RAPL estimates) and with
|powermetrics| on Mac (can compare just the package RAPL estimate). I wonder if
including something like that would be good, though I'm not sure which test
harness it would fit in with.
Attachment #8649189 - Flags: review?(mh+mozilla)
Attachment #8649189 - Flags: review?(erahm)
Attachment #8649189 - Flags: feedback?(nfroyd)
(Assignee)

Comment 2

3 years ago
I'm having problems with the Mac code. Compiling with no optimization it works, but compiling with -Os I get various failures. I strongly suspect the __asm__ code is the problem. I'll probably need to use extended asm instead. But reviews can still go ahead, because it's just that one corner that's causing problems.
Comment on attachment 8649189 [details] [diff] [review]
Add tools/power/rapl, a RAPL-reading program for power profiling

Review of attachment 8649189 [details] [diff] [review]:
-----------------------------------------------------------------

This is a great addition, it should really help with the power efforts. I didn't double check that the platform specific bits were absolutely correct (I trust you did your research thoroughly). Overall just a few minor comments, so r=me.

::: tools/power/rapl.cpp
@@ +56,5 @@
> +{
> +  if (aMsg) {
> +    fprintf(stderr, "%s: %s\n", gArgv0, aMsg);
> +  }
> +  fprintf(stderr, "Use --help for more information.\n");

Maybe just call usage for them.

@@ +67,5 @@
> +//---------------------------------------------------------------------------
> +// Linux-specific code
> +//---------------------------------------------------------------------------
> +
> +#ifdef __linux__

So just super high level: I think this would be easier to review (now and in the future) if we moved the platform specific bits to their own files and just exposed |Init| and |RAPLEnergyEstimates|.

@@ +86,5 @@
> +static bool
> +ReadValueFromPowerFile(const char* aStr1, const char* aStr2, const char* aStr3,
> +                       const char* aScanfString, T* aOut)
> +{
> +  char filename[128];

min: Maybe MAX_PATH?

@@ +120,5 @@
> +    if (!ReadValueFromPowerFile("events/energy-", aName, "", "event=%x",
> +         &config)) {
> +      // Failure is allowed for optional domains.
> +      if (!aIsOptional) {
> +        Abort("failed to open file for non-optional domain");

It would be useful to know which domain.

@@ +132,5 @@
> +    ReadValueFromPowerFile("events/energy-", aName, ".scale", "%lf",
> +                           &mJoulesPerTick);
> +
> +    char units[128];
> +    ReadValueFromPowerFile("events/energy-", aName, ".unit", "%s", units);

min: We should specify the max size in the format string: "%128s"

@@ +134,5 @@
> +
> +    char units[128];
> +    ReadValueFromPowerFile("events/energy-", aName, ".unit", "%s", units);
> +    if (strcmp(units, "Joules") != 0) {
> +      Abort("unexpected unit in .unit file");

Perhaps print the units here as well.

@@ +169,5 @@
> +      return -kUnsupported_j;
> +    }
> +
> +    int64_t thisTicks;
> +    if (read(mFd, &thisTicks, 8) != 8) {

nit: Perhaps sizeof instead of hardcoding 8

@@ +196,5 @@
> +  gCores = new Domain("cores", type, /* isOptional = */ false);
> +  gGpu   = new Domain("gpu",   type, /* isOptional = */ true);
> +  gRam   = new Domain("ram",   type, /* isOptional = */ true);
> +  if (!gPkg || !gCores || !gGpu || !gRam) {
> +    Abort("new Domain() failed");

Is this linking w/ mozglue (or whatever defines infallible new)? If so we could get rid of the null checks.

@@ +217,5 @@
> +
> +#elif __APPLE__
> +
> +// Because of the pkg_energy_statistics_t::pkes_version check below, the
> +// earliest OS X version this code will work with is 10.9.0 (xnu-2422.1.72).

All of this inline documentation was super helpful, thanks for taking the time to add it.

@@ +432,5 @@
> +        mIsRamSupported = true;
> +        break;
> +
> +      default:
> +        Abort("unknown CPU model");

Maybe print the CPU model so we can add it.

@@ +459,5 @@
> +    // Do an initial measurement so that the first sample's diffs are sensible.
> +    double dummy1, dummy2, dummy3, dummy4;
> +    JoulesSinceLastSample(dummy1, dummy2, dummy3, dummy4);
> +  }
> +

min: For completeness could add a dtor (and cleanup mPkes).

@@ +535,5 @@
> +
> +// "Normalize" here means convert kUnsupported_j to zero so it can be used in
> +// additive expressions.
> +static void
> +NormalizeAndPrintAsWatts(char* aBuf, double& aValue_J)

Perhaps pass in a buffer size, use snprintf.

@@ +584,5 @@
> +  double total = pkg_J + ram_J;
> +  NormalizeAndPrintAsWatts(totalStr, total);
> +
> +  printf("#%02d %s W = %s (%s + %s + %s) + %s W\n",
> +         sampleNumber++, totalStr, pkgStr, coresStr, gpuStr, otherStr, ramStr);

It might be nice provide this in an easy to parse format like csv (or maybe just an option for that).

@@ +641,5 @@
> +        if (*endptr) {
> +          CmdLineAbort("sample interval is not an integer");
> +        }
> +        if (sampleInterval_msec < 1 || sampleInterval_msec > 3600000) {
> +          CmdLineAbort("sample interval must be in the range 1..3600000 ms");

Perhaps add this to the usage message?

@@ +651,5 @@
> +        if (*endptr) {
> +          CmdLineAbort("sample count is not an integer");
> +        }
> +        if (sampleCount < 0 || sampleCount > 1000000) {
> +          CmdLineAbort("sample count must be in the range 0..1000000");

Perhaps add this to the usage message?
Attachment #8649189 - Flags: review?(erahm) → review+
Comment on attachment 8649189 [details] [diff] [review]
Add tools/power/rapl, a RAPL-reading program for power profiling

Review of attachment 8649189 [details] [diff] [review]:
-----------------------------------------------------------------

Very comments.  Much helpful.
Attachment #8649189 - Flags: feedback?(nfroyd) → feedback+
(Assignee)

Comment 5

3 years ago
Thank you for the fast review.


> > +  fprintf(stderr, "Use --help for more information.\n");
> 
> Maybe just call usage for them.

Some programs do that, but some just print a short error. I personally prefer the latter because it's often caused by a typo and so you don't want the actual error message overwhelmed by the full usage message.


> So just super high level: I think this would be easier to review (now and in
> the future) if we moved the platform specific bits to their own files and
> just exposed |Init| and |RAPLEnergyEstimates|.

I can see this, but it also makes it harder to test on different machines. I found it extremely helpful to be able to copy just a single C++ file to another machine, compile and run it. This is the same reason why I avoided using any Mozilla libraries, as the comment at the top says.


> > +  char filename[128];
> 
> min: Maybe MAX_PATH?

I've added a comment about how "/sys/bus/event_source/devices/power/events/energy-cores.scale" is the longest filename we have to deal with here. I also upped it to 256 just to be super-safe :)


> It would be useful to know which domain.

I changed Abort() to be a varargs function and call vfprintf(), so I could print the erroneous value here and all the other places you requested.


> > +  if (!gPkg || !gCores || !gGpu || !gRam) {
> > +    Abort("new Domain() failed");
> 
> Is this linking w/ mozglue (or whatever defines infallible new)? If so we
> could get rid of the null checks.

As above, it's not using any Mozilla libraries, so the null checks are necessary.


> > +static void
> > +NormalizeAndPrintAsWatts(char* aBuf, double& aValue_J)
> 
> Perhaps pass in a buffer size, use snprintf.

I instead added a comment about the expected size requirement.


> > +  printf("#%02d %s W = %s (%s + %s + %s) + %s W\n",
> > +         sampleNumber++, totalStr, pkgStr, coresStr, gpuStr, otherStr, ramStr);
> 
> It might be nice provide this in an easy to parse format like csv (or maybe
> just an option for that).

I'll defer this due to YAGNI. It's trivial to parse this format with a regexp, which is the most likely way it'll be done.

BTW, I forgot to show you sample output. Here it is:

>     total W = _pkg_ (cores + _gpu_ + other) + _ram_ W
> #01  8.20 W =  4.54 ( 1.06 +  0.22 +  3.25) +  3.67 W
> #02 13.63 W =  9.34 ( 3.27 +  1.24 +  4.84) +  4.28 W
> #03 20.57 W = 15.01 ( 4.47 +  4.26 +  6.28) +  5.56 W
> #04 18.74 W = 13.45 ( 3.92 +  3.76 +  5.76) +  5.29 W
> #05 17.15 W = 12.53 ( 4.87 +  2.03 +  5.63) +  4.62 W


> > +        if (sampleInterval_msec < 1 || sampleInterval_msec > 3600000) {
> > +          CmdLineAbort("sample interval must be in the range 1..3600000 ms");
> 
> Perhaps add this to the usage message?
> 
> > +        if (sampleCount < 0 || sampleCount > 1000000) {
> > +          CmdLineAbort("sample count must be in the range 0..1000000");
> 
> Perhaps add this to the usage message?

I tried but couldn't get them both to fit nicely on a single line. I figured the limits are extreme enough that people shouldn't hit them in practice, in which case failing to mention them in the usage message is forgivable :)


I addressed all your other comments.
(Assignee)

Comment 6

3 years ago
Created attachment 8649654 [details] [diff] [review]
Add tools/power/rapl, a RAPL-reading program for power profiling

Updated version, with erahm's comments addressed. I also fixed up the Mac
inline asm problems with help from Julian and Stack Overflow.
Attachment #8649654 - Flags: review?(mh+mozilla)
(Assignee)

Updated

3 years ago
Attachment #8649189 - Attachment is obsolete: true
Attachment #8649189 - Flags: review?(mh+mozilla)
(Assignee)

Comment 7

3 years ago
I did a try build: https://hg.mozilla.org/try/rev/7bb42573dae4

A couple of strange failures...

- OS X debug built ok, but OS X opt failed, and the failure is this:

> error: inline asm error: This value type register class is not natively supported!
> 21:48:57     INFO -      "syscall"
> 21:48:57     INFO -      ^

  Something's very wrong with the inline asm parsing there.

- Android builds fail like this:

> /builds/slave/try-and-api-9-0000000000000000/build/src/tools/power/rapl.cpp:80:30: fatal error: linux/perf_event.h: No such file or directory 

  But I don't even want to build on Android. I guess I need this in moz.build:

> if CONFIG['OS_ARCH'] == 'Linux' and CONFIG['OS_TARGET'] != 'Android'
(Assignee)

Comment 8

3 years ago
>   But I don't even want to build on Android. I guess I need this in
> moz.build:
> 
> > if CONFIG['OS_ARCH'] == 'Linux' and CONFIG['OS_TARGET'] != 'Android'

Ok, so that works:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=22811b666a3c

But I still get the opt-only inline asm error on OS X. Bizarre.
(Assignee)

Comment 9

3 years ago
Created attachment 8649802 [details] [diff] [review]
Add tools/power/rapl, a RAPL-reading program for power profiling

As before, just review for the build system bits, thank you.
Attachment #8649802 - Flags: review?(mh+mozilla)
(Assignee)

Updated

3 years ago
Attachment #8649654 - Attachment is obsolete: true
Attachment #8649654 - Flags: review?(mh+mozilla)
Comment on attachment 8649802 [details] [diff] [review]
Add tools/power/rapl, a RAPL-reading program for power profiling

Review of attachment 8649802 [details] [diff] [review]:
-----------------------------------------------------------------

I was curious, so I took an overall look.

::: moz.build
@@ +18,5 @@
>  
>  DIRS += [
>      'config',
>      'python',
> +    'tools/power',

This would be better around the other tools in toolkit/toolkit.mozbuild.

::: tools/power/moz.build
@@ +4,5 @@
> +# License, v. 2.0. If a copy of the MPL was not distributed with this
> +# file, You can obtain one at http://mozilla.org/MPL/2.0/.
> +
> +if CONFIG['OS_ARCH'] == 'Darwin' or \
> +   (CONFIG['OS_ARCH'] == 'Linux' and CONFIG['OS_TARGET'] != 'Android'):

There should be a check for x86.

@@ +9,5 @@
> +    SimplePrograms([
> +        'rapl',
> +    ])
> +
> +if CONFIG['GNU_CXX']:

Why the condition? None of the platforms this will build on will have GNU_CXX not set.

::: tools/power/rapl.cpp
@@ +68,5 @@
> +  exit(1);
> +}
> +
> +// A special value that represents an estimate from an unsupported RAPL domain.
> +static double kUnsupported_j = -1.0;

You want to use a const here.

@@ +120,5 @@
> +
> +  // These three are only set if |mIsSupported| is true.
> +  double mJoulesPerTick;  // How many Joules each tick of the MSR represents.
> +  int mFd;                // The fd through which the MSR is read.
> +  double mPrevTicks;      // The previous sample's MSR value.

You'd get better packing with { double, double, int, bool }.

Although, looking at the rest of the code, mIsSupported and mFd are always correlated, so you might as well remove mIsSupported if you want.

@@ +123,5 @@
> +  int mFd;                // The fd through which the MSR is read.
> +  double mPrevTicks;      // The previous sample's MSR value.
> +
> +public:
> +  Domain(const char* aName, int aType, bool aIsOptional)

if you use an enum for aIsOptional, and use a default value in the function declaration, then callers can do:
 new Domain(name, type) for non optional domains, and
 new Domain(name, type, OPTIONAL) for optional ones.

@@ +126,5 @@
> +public:
> +  Domain(const char* aName, int aType, bool aIsOptional)
> +  {
> +    int config;
> +    if (!ReadValueFromPowerFile("events/energy-", aName, "", "event=%x",

You could create and pass a temporary std::string instead of passing 3 strings.

%x wants an *unsigned* int.

@@ +149,5 @@
> +      Abort("unexpected unit '%s' in .unit file", unit);
> +    }
> +
> +    struct perf_event_attr attr;
> +    memset(&attr, 0, sizeof(attr));

Note that declaring: struct perf_event_attr attr(); should do the memset for you.

@@ +150,5 @@
> +    }
> +
> +    struct perf_event_attr attr;
> +    memset(&attr, 0, sizeof(attr));
> +    attr.type = aType;

attr.type is a uint32_t, aType an int.

@@ +163,5 @@
> +            "Did you run as root or "
> +            "set /proc/sys/kernel/perf_event_paranoid to 0?");
> +    }
> +
> +    mPrevTicks = 0;

Without actually trying to run this code: is this skewing the first value JoulesSinceLastSample() returns?

@@ +179,5 @@
> +    if (!mIsSupported) {
> +      return -kUnsupported_j;
> +    }
> +
> +    int64_t thisTicks;

The values you get on a perf_event_open fd are unsigned.

@@ +205,5 @@
> +
> +  gPkg   = new Domain("pkg",   type, /* isOptional = */ false);
> +  gCores = new Domain("cores", type, /* isOptional = */ false);
> +  gGpu   = new Domain("gpu",   type, /* isOptional = */ true);
> +  gRam   = new Domain("ram",   type, /* isOptional = */ true);

Not that it matters much, but you're leaking all those. Use std::unique_ptr?

@@ +219,5 @@
> +  aPkg_J   = gPkg->JoulesSinceLastSample();
> +  aCores_J = gCores->JoulesSinceLastSample();
> +  aGpu_J   = gGpu->JoulesSinceLastSample();
> +  aRam_J   = gRam->JoulesSinceLastSample();
> +}

Seems to me like Init() and RAPLEnergyEstimates() could go in a class. Init would become a constructor. Then all those static variables could just be member variables.

@@ +266,5 @@
> +// core_energy_stat_t and pkg_energy_statistics_t are both from
> +// osfmk/i386/Diagnostics.c.
> +// - In 10.8.4 (xnu-2050.24.15) both structs were introduced, but with many
> +//   fewer fields.
> +// - In 10.8.5 (xnu-2050.48.11) both structs were both substantially expanded,

two many "both"s.

@@ +355,5 @@
> +  static const uint64_t diagCallNum = 0x4000001;
> +  uint64_t rv;
> +
> +  __asm__ __volatile__(
> +    "syscall"

I guess this doesn't work on i386. Doesn't using the syscall() function work?

@@ +473,5 @@
> +
> +    // Over-allocate by 1024 bytes to allow for the uncertainty around
> +    // core_energy_stat_t::gpmcs and for any other future extensions to that
> +    // struct. (The fields we read all come before the core_energy_stat_t
> +    // array, so it won't matter to us whether gpmcs is present or not.)

You might as well add a dummy large field to the core_energy_stat_t struct.

@@ +476,5 @@
> +    // struct. (The fields we read all come before the core_energy_stat_t
> +    // array, so it won't matter to us whether gpmcs is present or not.)
> +    size_t pkesSize = sizeof(pkg_energy_statistics_t) +
> +                      logicalcpu_max * sizeof(core_energy_stat_t) + 1024;
> +    mPkes = (pkg_energy_statistics_t*) malloc(pkesSize);

Then you can use new here, and use a std::unique_ptr for mPkes

@@ +618,5 @@
> +  double total = pkg_J + ram_J;
> +  NormalizeAndPrintAsWatts(totalStr, total);
> +
> +  printf("#%02d %s W = %s (%s + %s + %s) + %s W\n",
> +         sampleNumber++, totalStr, pkgStr, coresStr, gpuStr, otherStr, ramStr);

Wouldn't a timestamp be useful?

@@ +638,5 @@
> +"\n"
> +"- On Mac, this program can be run by any user.\n"
> +"\n"
> +"- On Linux, this program can only be run by the super-user unless the\n"
> +"  contents of /proc/sys/kernel/perf_event_paranoid is set to 0 or lower.\n"

Seems to me you could skip the non-native platform comment (iow, #ifdef __APPLE__ the mac part and #ifdef __linux__ the linux part).
Attachment #8649802 - Flags: review?(mh+mozilla)
(Assignee)

Comment 11

3 years ago
> - OS X debug built ok, but OS X opt failed, and the failure is this:
> 
> > error: inline asm error: This value type register class is not natively supported!
> > 21:48:57     INFO -      "syscall"
> > 21:48:57     INFO -      ^
> 
>   Something's very wrong with the inline asm parsing there.

This is actually a clang bug:

> 02:59:20     INFO -  /builds/slave/try-m64-0000000000000000000000/build/src/tools/power/rapl.cpp:359:5: error: inline asm error: This value type register class is not natively supported!
> 02:59:20     INFO -      "syscall"
> 02:59:20     INFO -      ^
> 02:59:20     INFO -  fatal error: error in backend: inline asm error: This value type register class is not natively supported!
> 02:59:20     INFO -  clang: error: clang frontend command failed with exit code 70 (use -v to see invocation)
> 02:59:20     INFO -  clang version 3.3 (branches/release_33 183744)
> 02:59:20     INFO -  Target: i386-apple-darwin11.2.0
> 02:59:20     INFO -  Thread model: posix
> 02:59:20     INFO -  clang: note: diagnostic msg: PLEASE submit a bug report to http://llvm.org/bugs/ and include the crash backtrace, preprocessed source, and associated run script.
> 02:59:20     INFO -  clang: note: diagnostic msg:
> 02:59:20     INFO -  ********************
> 02:59:20     INFO -  PLEASE ATTACH THE FOLLOWING FILES TO THE BUG REPORT:
> 02:59:20     INFO -  Preprocessed source(s) and associated run script(s) are located at:
> 02:59:20     INFO -  clang: note: diagnostic msg: /var/folders/vs/t_n93bks6pd2b2mjk9q8pmdr00000w/T/rapl-Fnogdr.cpp
> 02:59:20     INFO -  clang: note: diagnostic msg: /var/folders/vs/t_n93bks6pd2b2mjk9q8pmdr00000w/T/rapl-Fnogdr.sh
> 02:59:20     INFO -  clang: note: diagnostic msg:
> 02:59:20     INFO -  ********************

And the file gets compiled with |-arch i386|. I suspect that's related to the
problem. I don't know how/why this compiles successfully on debug builds when
|-arch i386| is used.

And I just discovered that my local Mac builds don't have |-arch i386| in the
clang command invocation, unlike the ones on try server.

I wonder if all this is related to the use of -arch in mozilla-central/build/macosx/universal/mozconfig.common.

glandium, any ideas about all this?
Flags: needinfo?(mh+mozilla)
(Assignee)

Comment 12

3 years ago
I've managed to reproduce the clang abort locally when building with |-arch i386|. So I just need to work out the right way to avoid that. This code will only work on x86-64.
cf. my review comment :)
Flags: needinfo?(mh+mozilla)
(Assignee)

Comment 14

3 years ago
Thank you for the review comments, glandium. I've addressed most of them.
https://hg.mozilla.org/mozilla-central/rev/67fdca00897d
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
status-firefox43: affected → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
(Assignee)

Comment 20

3 years ago
FWIW, I investigated the possibility of implementing rapl in Windows. Windows doesn't have any API that exposes the RAPL data, and it appears that reading MSRs required a driver. I asked aklotz and he said:

> 1) There is an undocumented system call in the NT Native API that used to
> allow for reading MSRs from user mode. I threw some code together to try it
> but I learned that unfortunately the call was deprecated in Windows Vista.
> Now it returns a "not implemented" error code. From what I've read, it would
> still work on Windows XP.
> 
> 2) Microsoft debuggers now package their own private device driver that they
> extract and load at runtime to provide interfaces to do this stuff. It sounds
> like the device driver route is the only way to access MSRs from user mode
> these days.
> 
> 3) WinRing0 looks OK, or I suppose we could roll our own. One annoyance with
> modern 64-bit Windows is that without signing the driver, you'll need to boot
> Windows in kernel debugging mode. That is accomplished by running the bcdedit
> utility. See
> https://msdn.microsoft.com/en-us/library/windows/hardware/ff542191%28v=vs.85%29.aspx
> for some documentation.

(WinRing0 is a driver that reads some MSRs.) I'm not sure what "modern 64-bit Windows" means exactly here, but it sounds like this information is enough to kill the idea of porting rapl to Windows.
You need to log in before you can comment on or make changes to this bug.