Open Bug 881389 (OneLogger) Opened 11 years ago Updated 3 days ago

[meta] Improve Gecko's logging story

Categories

(Core :: General, defect)

Tracking

()

People

(Reporter: justin.lebar+bug, Unassigned)

References

(Depends on 7 open bugs, Blocks 1 open bug)

Details

(Keywords: meta, Whiteboard: [fxos:media])

Attachments

(4 obsolete files)

Gecko's B2G logging story is a mess.

I know this might not seem like an important issue to tackle, but I think it
is.  Bear with me.

Right now we log messages in one of a few ways:

1) Unconditionally print to logcat with __android_log_print.  This is common in
   gonk-only code.

2) Use prlog, which prints to logcat if PR_LOGGING is enabled at build time (or
   FORCE_PR_LOG is defined in the given file), and if an environment variable
   is defined in b2g.sh.

3) Use hand-rolled logging macros (e.g. in ProcessPriorityManager), which
   usually require recompiling to enable.

Option (1) is fine for Gonk-only code, but if we over-use this approach, we can
clutter up the logs and potentially slow down the phone.  I also think I've
seen us using this idiom in non-Gonk-only code (because it's so convenient),
but that's very bad, because then there's no way to get those log messages on
desktop, short of editing the code.

Option (2) requires editing b2g.sh on the device and, if we don't use
FORCE_PR_LOG, also requires re-compiling.

Option (3) requires re-compiling.

Re-compiling is usually fine for developers, but it's not fine for many of the
people doing QA.  It's also not an option for a user who wants to submit a bug
report.

I'd also prefer not to make users or QA people edit b2g.sh.  Indeed, that
probably requires root.

There's a lot of data that's hidden in logs that use option (2) or (3) above,
and right now every time we want someone from QA to get that information, it's
difficult for everyone involved.  Asking a user to get this information is
going to be a non-starter for all but the most technically-inclined users.

I'd like to simplify and unify our logging story.  I'd like a solution that:

1) Doesn't require recompiling to enable logging of any module.
2) Works using the same code on desktop and mobile.
3) Has convenient syntax (so developers don't have to work around it today, as
   we currently have to work around prlog).

prlog completely disables itself in release builds, because it doesn't want to
add overhead.  This new logging module is going to have the opposite default,
that log messages are (usually) infrequent enough that checking whether logging
is enabled isn't a big deal.
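
To make the shape of this concrete, here's a minimal sketch (entirely hypothetical names, not a proposed patch) of a log macro whose gate is a cheap runtime check rather than a build-time #ifdef:

#include <cstdio>

struct LogModule {
  const char* name;
  int level;  // 0 = off; raised at runtime via env var, file, settings, ...
};

#define MY_LOG(module, lvl, ...)                    \
  do {                                              \
    if ((module).level >= (lvl)) {                  \
      fprintf(stderr, "[%s] ", (module).name);      \
      fprintf(stderr, __VA_ARGS__);                 \
      fputc('\n', stderr);                          \
    }                                               \
  } while (0)

// Usage (hypothetical):
//   static LogModule sCameraLog = { "Camera", 0 };
//   MY_LOG(sCameraLog, 3, "dropped frame %d", frameNo);
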
Suggestion:

4) Be able to enable logging without rebooting the phone or restarting processes, to help diagnose hard-to-reproduce bugs.
We should look at dynamic debug, a feature that exists in the kernel.

There was a really good paper written on this here:
https://www.kernel.org/doc/ols/2009/ols2009-pages-39-46.pdf
Which reminds me: can we include getting kernel messages inline with logcat somehow?
I would really like at the very least to annotate crashes with "OOM", etc in logcat, because asking people to look at dmesg is too complicated.

But I think this is orthogonal to what I'm trying to accomplish here.
I filed bug 805476 to get the OOM messages into logcat...
(In reply to Mike Habicher [:mikeh] from comment #1)
> 4) Be able to enable logging without rebooting the phone or restarting
> processes, to help diagnose hard-to-reproduce bugs.

That would be really nice.  I'll see what I can do!
Hm, compile-time hashing of the filename is going to be fun...
I have an in-progress patch for this.  It works pretty okay, and I'm hopeful the overhead will be very low.
Assignee: nobody → justin.lebar+bug
Alias: OneLogger
Summary: Improve B2G's logging story → Improve Gecko's logging story
Attached patch Part 1, WIP: One logger to rule them all (obsolete) — — Splinter Review
The main thing that isn't complete here is Android/B2G integration.  I plan to write code which will watch a file on the FS and use its contents to determine which logging statements to enable.

I'm also still tweaking the interface.  I've been going through and converting PR_LOG's and __android_log_print statements to the new interface in order to test that it's the right interface.  I think I'm mostly happy with it now.

I still need to measure the effect of these log statements on code size and speed.  I'm hoping it's negligible, or if not can easily be made negligible.

Anyway, there's a long comment in Logging.h that explains what's going on.
This is an illustrative patch, if you just want to see the interface.
Another illustrative patch.  I have a bunch more; pipe up if you want to see 'em.
Can this also incorporate thread names (as bug 885952 does) and timestamp to the log output?
(In reply to Honza Bambas (:mayhemer) from comment #12)
> Can this also incorporate thread names (as bug 885952 does) and timestamp to
> the log output?

Sure, that's a good idea.

These log messages are getting a lot of metadata:

 - log module name
 - filename + lineno of the log message
 - optionally "CRITICAL"
 - thread name
 - pid, if running in multiprocess mode
 - process name ("calculator", etc), if running in multiprocess mode

I need to spend some time figuring out how not to overload us with all this info, but that's on me.
(In reply to Justin Lebar [:jlebar] from comment #13)
>
>  - pid, if running in multiprocess mode

|logcat -v thread| or |logcat -v threadtime| will already show the pid and tid that generated the logcat entry, so no need to duplicate that on supporting platforms.
Except adb logcat --help doesn't list any of these things, so it might be more user-friendly to display them ourselves.
See bug 451283 and bug 884397 for tentative plans on the JS side of things. Not sure how much we care about having C++ and JS play together at this juncture.
I'd been explicitly not considering structured logging here; I figured that we cared only about consumers with heartbeats.

Should I care, or do we want structured logging only for test output?
I can't think of an adequate use case for structured logging in C++. Now, having a unified manner for quickly changing log emission settings that affects both JS and C++, that would be interesting.
I had an interesting conversation with dveditz and Lucas on IRC, in which we identified a few security issues.

For example, it would be bad if an app process could turn on logging and direct it to a file.  Then that app process could

  (a) fill up the phone's storage, and
  (b) if the file is globally-readable (which it probably is), the app could read sensitive data (e.g. URLs) out of the logs.

I think these are all moot if, on B2G, we don't allow logging to files.  jcranmer convinced me that we want log-to-file on desktop, but I'm not convinced we need it on B2G.  Everything goes through logcat right now anyway.  Using logcat exclusively also solves problems around interleaving messages between processes.
(In reply to Justin Lebar [:jlebar] from comment #19)
> jcranmer convinced me that we want log-to-file on desktop, 

If desktop means b2g desktop builds, then +1. 

We have issues when other processes are logging using NSPR now.  E.g. a Flash plugin can just overwrite the start of the file on Windows - really crappy.  Also, when a child process is spawned, it would be great to have the logs in one file.  Or, at least somehow nicely and automatically separated into multiple files.

I'm terrified of the day I'll have to debug a deep problem on b2g directly on the phone w/o being able to get logs :(
As far as this bug is concerned, B2G desktop should behave just like Firefox desktop.
I'll be frank and point out that I mostly care about logging in terms of the logging used in mailnews, which is predominantly the protocol logs (this is basically the only way we have a hope of QAing many problems). This means my basic desiderata are few:

1. Ability to log to a file.
2. Log in both debug and release builds.
3. Ability to turn on logging via UI somehow.
4. Ability to give loggers simple names (e.g., IMAP, NNTP, etc.)
5. Turn on logging via an env variable (like NSPR_LOG_MODULES=IMAP:5 right now)--helps when debugging xpcshell test failures.
6. Logging from multiple threads needs to work (IMAP runs on its own thread).

I don't particularly care about structured logging, nor do I feel strongly about JS integration at this point (since you can work around them).
What I'm designing should meet all of those criteria.

I'm probably not going to move a ton of existing NSPR loggers over to this new thing, but it should be easy to do so for anything you care about.
Blocks: 887971
I think we should support C++ streams as well. C++ streams are useful when you're not sure which format specifier to use with printf-style logging, e.g. for typedefs, uint64_t, and objects. The primary concern is being able to lazily evaluate the arguments while using C++ streams. Perhaps we can have something like:

LOG("Count: " << myInt);
C++ streams would also let us do

  nsIURI* uri;
  LOG("URI: " << uri);

instead of

  nsIURI* uri;
  LOG("URI: %s", LOG_AS_STR(uri));

I think the former is clearly better.

It's harder to do field width specifiers with <<, but those are going to be pretty uncommon with log messages, I think.

So I guess I'll figure out if I can convert everything to using streams.
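
For illustration (hypothetical type, not an existing Gecko overload), the stream style works for any value that has an operator<<, with no format specifier to get wrong:

#include <ostream>

struct Size { int width; int height; };

inline std::ostream& operator<<(std::ostream& aOut, const Size& aSize) {
  return aOut << aSize.width << "x" << aSize.height;
}

// LOG("surface: " << Size{1280, 720});   // vs. LOG("surface: %dx%d", w, h)
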
Have you looked at the prior art in the chromium loggers? In particular their API has LOG (always) and DLOG (debug-only):

http://src.chromium.org/svn/trunk/src/third_party/cld/base/logging.h

The client code is written as:

LOG(ERROR) << "Pickle error decoding Histogram: " << histogram_name; 

There is enough magic to avoid evaluating some of the arguments if the logging of that severity is disabled.
Does it prevent something like 'ExpensiveFunction()' in:
LOG(DEBUG) << "Tree Size: " << ExpensiveFunction();
from being evaluated?
(In reply to Benoit Girard (:BenWa) from comment #27)
> Does it prevent something like 'ExpensiveFunction()' in:
> LOG(DEBUG) << "Tree Size: " << ExpensiveFunction();
> from being evaluated?

It doesn't look like it to me, but I could be misreading it...
If LOG(ERROR) were defined as something along the lines of:

#define LOG(level) if (LoggingEnabled(level)) logstream

then ExpensiveFunction() would be inside an if and would only be evaluated if the if passes.
> #define LOG(level) if (LoggingEnabled(level)) logstream
>
> then ExpensiveFunction() would be inside an if and would only be evaluated if the if 
> passes.

How would that work?  Suppose I did

> LOG(DEBUG) << ExpensiveFunction();

in the #define above, LOG(DEBUG) doesn't evaluate to anything, regardless of whether LoggingEnabled(DEBUG).  But even if we made it evaluate to something, I don't see how that fixes things.
I'm not following.

Using my somewhat simple example if you had

#define LOG(level) if (LoggingEnabled(level)) logstream

then

LOG(DEBUG) << ExpensiveFunction();

would become

if (LoggingEnabled(DEBUG)) logstream << ExpensiveFunction();

And ExpensiveFunction() would only be called if LoggingEnabled(DEBUG) returned true.

If you got more exotic, you could make the LoggingEnabled portion expand to a compile-time constant based on the debug level, so that in release builds, debug-level stuff gets optimized out.

if (0) logstream << ExpensiveFunction();

should be eliminated entirely.
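
A minimal sketch of that idiom (hypothetical names; the `if (!...) {} else` form just avoids the dangling-else problem when the macro is used inside an unbraced if):

#include <iostream>

inline bool LoggingEnabled(int aLevel) {
  return aLevel <= 1;  // stand-in for a real per-module runtime check
}

#define LOG(level) \
  if (!LoggingEnabled(level)) { } else std::cerr

// LOG(2) << "Tree size: " << ExpensiveFunction();
// ExpensiveFunction() is only called when LoggingEnabled(2) returns true.
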
Ah, I see what you mean.

That might work...
Sadly I didn't have a chance to finish this bug before leaving Mozilla.

In case someone wants to pick this up, here's the status: I had something which
was mostly complete, but jcranmer convinced me that an iostream-style interface
(i.e., using "<<") instead of a printf-style interface would be better.  I
never had a chance to do the conversion.

This patch builds a system where it's easy to tell, in a threadsafe and
hopefully efficient manner, whether logging for a file/module is enabled.  So
the remaining work is just converting the interface.  Hopefully it's clear how
all of this stuff fits together.
Attachment #766994 - Attachment is obsolete: true
Attachment #766995 - Attachment is obsolete: true
Attachment #766996 - Attachment is obsolete: true
I'd hate for this to fall through the cracks.
(In reply to ben turner [:bent] (needinfo? encouraged) from comment #34)
> I'd hate for this to fall through the cracks.

I can assign this to me, and then at least I'll be reminded of it. It may be several weeks before I can get to it...

If somebody else wants to take this on, let me know.
Assignee: justin.lebar+bug → dhylands
Added [fxos:media] whiteboard for bugs assigned to me so that they can be triaged/prioritized, etc.
Whiteboard: [fxos:media]
I can take a look at getting this landed.
(In reply to Eric Rahm [:erahm] from comment #38)
> I can take a look at getting this landed.

Sounds good to me.
Assignee: dhylands → erahm
Gabriele, you noted on dev-b2g that prlog is not an option for chrome code (such as RIL). Can you elaborate on what the issues are so that we can make sure that use case is covered?
Flags: needinfo?(gsvelto)
(In reply to Eric Rahm [:erahm] from comment #40)
> Gabriele, you noted on dev-b2g that prlog is not an option for chrome code
> (such as RIL). Can you elaborate on what the issues are so that we can make
> sure that use case is covered?

Since we're talking about pr logging, I ran into an issue with PR_Assert yesterday.

In particular, as part of bug 1049243, I wound up adding some calls to Mutex::AssertCurrentThreadOwns() which eventually calls down to PR_Assert:
http://dxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/io/prlog.c?from=PR_Assert#543

If I put a call to __android_log_print in front of the PR_LogPrint, then I would see the assert failure in logcat, but I wouldn't see the output from PR_LogPrint.

This was for debug eng B2G build running on a flame.

I know that the ANDROID symbol was defined, because I could see that there were undefined symbols in the prlog.o file for the __android_log_print call I added as well as __android_log_write calls being made from here:
http://dxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/io/prlog.c?from=PR_Assert#105

Now I hadn't done anything to enable PR_LOGGING, but I would have expected PR_Assert's to show up regardless. I'm going to guess logfile was NULL and that's why PR_LogPrint didn't do anything.

Currently, you need to set environment variables to enable PR logging. Under android, I think it would be cool if you could also set an android property. Properties which start with persist. survive across reboots, and you can use getprop/setprop from an adb shell to manipulate properties.
(In reply to Eric Rahm (out 8/29 - 9/7) [:erahm] from comment #40)
> Gabriele, you noted on dev-b2g that prlog is not an option for chrome code
> (such as RIL). Can you elaborate on what the issues are so that we can make
> sure that use case is covered?

PR_LOG() is in general a decent interface for our C/C++ code. One can define modules (but not submodules) and it's got logging levels; and it's also relatively easy to pipe its output to the logcat (though it could be easier as Dave mentioned above).

Two drawbacks that it currently has are that it's a pure C interface - so it cannot be called by JavaScript chrome code obviously - and that it's not compiled in unless debugging is enabled (or logging is forced using FORCE_PR_LOG/MOZ_LOGGING).

For Chrome code specifically we'd need a new JavaScript interface. Right now the most common idiom we use is the following with a conditional dump():

http://hg.mozilla.org/mozilla-central/file/3be45b58fc47/dom/system/gonk/ril_worker.js#l48

This is OK for developers but is really annoying for QA to use (they have to both edit preferences *and* manually modify omni.ja, see the horror here https://wiki.mozilla.org/B2G/QA/Tips_And_Tricks#RIL_Debugging ). It's also very crude as we don't have modules or levels, which are at the very least available in PR_LOG().

I haven't had the time to analyze all of :jlebar's work here but I remember it didn't cover JS code so we'd still need to figure how to do that.

A few things we could do to improve the current situation without introducing disrupting changes are (in order of complexity):

- Make it easier to access PR_LOG() output from a device. This would consist first of all in better documenting how to get output in the logcat at all (my fault, I should have done this weeks ago) and as Dave mentioned make it easier to switch it on via properties.

- Evaluate enabling PR_LOG() in engineering builds. This would be *very* useful for debugging but it might have a measurable performance impact so it's important to figure out what the impact is. We regularly use engineering builds for performance testing but since there's already a performance gap between user and engineering builds we might as well add a little more overhead provided it's not too much.

- Modify PR_LOG() to leverage Android's logcat modules and levels interface. I think this would make filtering easier but might require some significant changes to PR_LOG(). Again, the performance impact should also be evaluated.

- Expose PR_LOG() to chrome via a JavaScript interface. This would allow us to port the existing dump()-based code to use modules and levels which would be an improvement. Since currently having logging disabled is essentially free in JavaScript we should leave some high-level switch to globally enable/disable logging in JavaScript proper to avoid having all messages trickle down to PR_LOG() and be filtered there which would have quite a performance impact.
Flags: needinfo?(gsvelto)
FWIW I agree with working on PR_Log to make it the one true logger (although I haven't looked at jlebar's patch). It already has a pretty good feature set and it's just missing a few things. Being able to enable per-module logging at runtime is the biggest one, as mentioned above. Even FORCE_PR_LOG would make me happy, as that can be used to enable per-file logging at compile time in non-debug builds, but that doesn't play well with unified compilation, which we have started doing of late.

I would also like to see a stream interface or wrapper added for it. In the gfx code we have a bunch of different logging systems now but in general the stream interface ones seem to be gaining more traction because we often have to print many complex data structures. See gfx/layers/LayersLogging.* and gfx/2d/Logging.h as examples of what we do now. I particularly like the syntax in the Logging.h one and think it could be used for PR logging as well:
PRLog(module, DEBUG) << stuff << foo;
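
A rough sketch of what such an external wrapper could look like (hypothetical names, not a reviewed design): a temporary object buffers the streamed values and hands the finished string to PR_LOG() in its destructor at the end of the statement. Note that the streamed expressions are still evaluated even when the module is disabled; only the formatting is skipped, so expensive arguments would still want an enabled-check around the whole statement.

#include <sstream>
#include "prlog.h"

class PRLogStream {
public:
  PRLogStream(PRLogModuleInfo* aModule, PRLogModuleLevel aLevel)
    : mModule(aModule), mLevel(aLevel) {}

  ~PRLogStream() {
    PR_LOG(mModule, mLevel, ("%s", mStream.str().c_str()));
  }

  template <typename T>
  PRLogStream& operator<<(const T& aValue) {
    if (PR_LOG_TEST(mModule, mLevel)) {  // skip formatting when disabled
      mStream << aValue;
    }
    return *this;
  }

private:
  PRLogModuleInfo* mModule;
  PRLogModuleLevel mLevel;
  std::ostringstream mStream;
};

// Usage: PRLogStream(gLayersLog, PR_LOG_DEBUG) << "size: " << w << "x" << h;
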
See Also: → 1057642
(In reply to Gabriele Svelto [:gsvelto] from comment #42)
> Two drawbacks that it currently has are that it's a pure C interface - so it
> cannot be called by JavaScript chrome code obviously - and that it's not
> compiled in unless debugging is enabled (or logging is forced using the
> FORCE_PR_LOG/MOZ_LOGGING).
> 
> For Chrome code specifically we'd need a new JavaScript interface. Right now
> the most common idiom we use is the following with a conditional dump():
> 
> http://hg.mozilla.org/mozilla-central/file/3be45b58fc47/dom/system/gonk/
> ril_worker.js#l48

Okay, so this is JS code. Implementing a unified JS/C++ logger is probably beyond the scope of this bug (which is vaguely make a better C++ logger that all of our C++ code will use). The discussion in dev-b2g seems to be leaning towards just using console.log for gaia code. I know it works for regular workers, is it available for chrome workers as well?
(In reply to (away|aug30-sep3) Kartikaya Gupta (email:kats@mozilla.com) from comment #43)
> FWIW i agree with working in PR_Log to make it the one true logger (although
> I haven't looked at jlebar's patch). 

It's worth taking a look at what he's done to see if these type of changes would fit into PR_Log (and pass review by the module owner). It's probably also worth adding a recap of the motivations/requirements and a quick overview of the current implementation, I can take care of that when I'm back in a week.
(In reply to Eric Rahm (out 8/29 - 9/7) [:erahm] from comment #44)
> Okay, so this is JS code. Implementing a unified JS/C++ logger is probably
> beyond the scope of this bug (which is vaguely make a better C++ logger that
> all of our C++ code will use).

If we implement a new C++ logger (whether on top of PR_LOG() or stand-alone) giving it an XPCOM interface wouldn't be too difficult and it would automatically make it usable to JavaScript chrome code.

> The discussion in dev-b2g seems to be leaning
> towards just using console.log for gaia code.

Yeah, we don't have much choice there. Still, making it faster and allowing for an easy way to disable it (via props, for example) so that it can always be left enabled in the code would improve the situation.

> I know it works for regular
> workers, is it available for chrome workers as well?

You mean console.log()?
(In reply to Dave Hylands [:dhylands] from comment #41)
> Now I hadn't done anything to enable PR_LOGGING, but I would have expected
> PR_Assert's to show up regardless. I'm going to guess logfile was NULL and
> that's why PR_LogPrint didn't do anything.

Yes, note that if NSPR_LOG_MODULES is not set then logFile won't be set either even though NSPR_LOG_FILE might be set:

http://dxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/io/prlog.c?from=PR_Assert#190
http://dxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/io/prlog.c?from=PR_Assert#247

This is annoying because it will prevent us from seeing the PR_LogPrint() direct callers, namely PR_Assert() and PR_Abort(). It might be worth filing a bug to ensure that those will at least go to the logcat (like they go to stderr) besides going through PR_LogPrint().

> Currently, you need to set environment variables to enable PR logging. Under
> android, I think it would be cool if you could also set an android property.
> Properties which start with persist. survive across reboots, and you can use
> getprop/setprop from an adb shell to manipulate properties.

+1, let's file a bug.
I've filed two bugs for the issues mentioned in my comment above.
See Also: → 1060292, 1060335
One more suggestion: it would be nice to start/stop/change modules at run time.  Sometimes people out in the field experience problems (me included), but normal users of course don't have logging turned on.  But when the problem is there, being able to turn logs on would be so ultra mega helpful!
(In reply to Honza Bambas (:mayhemer) from comment #49)
> One more suggestion: it would be nice to start/stop/change modules at run
> time.  Sometimes people outside experience problems (including me) but
> no-one (normal users) of course turn logging on.  But when the problem is
> there, turning logs on would be so ultra mega helpful!

I wrote an extension that can turn on NSPR logging dynamically... it is very hacky though, and works by noticing that PRLogModule is a singly linked list of modules.
Do extensions work with OOP and B2G?

I think having a Web API (perhaps only for certified apps) and an app to update the settings would be great. Especially if the app could be used on regular Firefox.

For linux/B2G we could use mechanisms similar to what's being done for get-about-memory.
- Send commands over a named pipe
- Send a signal and have a file containing the ENV stuff that's reparsed
- For android builds, use properties and detect changes to the properties and have that trigger a reparse.

Using a file to contain a list of ENV_VAR=value lines seems like it could work on any platform. Then you just need a triggering mechanism which may be platform dependent.
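
A minimal sketch (hypothetical helper, not the proposed implementation) of that re-parse step: read NAME=value lines from a file, e.g. NSPR_LOG_MODULES=nsHttp:5, whenever the platform-specific trigger fires.

#include <fstream>
#include <string>

void ReparseLogSettings(const char* aPath) {
  std::ifstream file(aPath);
  std::string line;
  while (std::getline(file, line)) {
    if (line.empty() || line[0] == '#') {
      continue;  // skip blank lines and comments
    }
    size_t eq = line.find('=');
    if (eq == std::string::npos) {
      continue;  // malformed line
    }
    std::string name = line.substr(0, eq);
    std::string value = line.substr(eq + 1);
    // Hand the pair off to whatever applies it, e.g. a per-module level
    // setter (ApplyLogSetting here is a placeholder, not an existing API).
    // ApplyLogSetting(name, value);
  }
}
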
(In reply to Dave Hylands [:dhylands] [off Fri Aug 29] from comment #51)
> I think having a Web API (perhaps only for certified apps) and an app to
> update the settings would be great. Especially if the app could be used on
> regular Firefox.

Depending on where we'd want to put this functionality one might use regular settings. The idea would be to have a setting that corresponds to the module list and verbose levels (or even multiple entries, one per module with the level as the pref value). To make it work one would need to watch for the pref changes and then implement a mechanism to re-submit the list of modules to the PR_LOG() machinery (which IIRC is not possible right now).
I'm planning on spinning off a few bugs from this to make life easier. I would have liked to remove NSPR from the equation, but it feels like there's more support for building on top of what we already have.

To sum up what we're looking for:

*Runtime configuration*
  - NSPR logging needs to expose an interface to modify the log level of a module
  - NSPR needs to expose an interface for loading config from a file/ENV var/string
  - Triggering change notifications on linux/B2G/android(/OSX) (comment 51)
  - Web API and app for modifying settings (this would be very useful for non-devs and would be useful on the gaia side as well)

*Streaming interface*
  ie: PR_LOG(module) << stuff_to_log
The PR_LOG interface is c-only, so this would have to be an external wrapper.

*Logging in ENG builds*
Investigate enabling logging outside of debug builds. Currently there's a --disable-logging flag that's related but doesn't quite do this; we could hook into that.

*Improved formatting*
Improved formatting including:
  - log level
  - module name
  - application name
  - perhaps others

*Integration with gaia*
I think this is going to end up being done through console logging, so there might not be anything PR_LOG related that needs to be done. Dynamically configuring log levels will probably share code though.
(In reply to Eric Rahm [:erahm] from comment #54)
> I'm planning on spinning off a few bugs from this to make life easier. I
> would have liked to remove NSPR from the equation, but it feels like there's
> more support for building on top of what we already have.

I don't think compatibility with NSPR is a necessary requirement on anyone's list, especially if it's rather easy to migrate from NSPR logging to $ONE_TRUE_LOGGER. I actually think trying to be explicitly compatible with NSPR logging is annoying, since most logging requests appear to want logging in release builds, and you can only get that in NSPR with FORCE_PR_LOG.
(In reply to Eric Rahm [:erahm] from comment #54)
>   - NSPR logging needs to expose an interface to modify the log level of a
> module

That would be a matter of exposing _PR_SetLogModuleLevel(); the biggest issue with it is that it's supposed to be called from the same thread that owns the module. If we're OK with requiring every module owner to listen for log level changes and then adjust them on their own, this should be fairly simple. However, if we want the logging system to pick up changes automatically, this would be trickier as we'll need to ensure that the module's log level can be safely adjusted from another thread while the owner is in the process of logging. The level is a single field currently, so this could be done asynchronously, but it needs some proper verification.

>   - NSPR needs to expose an interface for loading config from a file/ENV
> var/string

You mean re-loading the configuration? Because currently it's read from an environment variable when calling _PR_SetLogModuleLevel() but environment variables are not a pretty way to store configuration that can change at runtime.

>   - Triggering change notifications on linux/B2G/android(/OSX) (comment 51)

+1 provided we can address the issues with threading I've pointed out above.

> *Streaming interface*
>   ie: PR_LOG(module) << stuff_to_log
> The PR_LOG interface is c-only, so this would have to be an external wrapper.

IMHO this isn't exactly a priority, making existing logging code work better is a more pressing issue.

> Investigate enabling logging outside of debug builds. Currently there's a
> --disable-logging flag that's kind of related but not, we could hook into
> that.

The only way to force PR_LOG() to keep all statements in non-debug build is via FORCE_PR_LOG. We already use it in a lot of places where we deem that the log statements have little impact on production code and are valuable to have in non-debug builds. We could try forcing all logging to be on in a production build to measure the impact and see if it's viable.
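
For reference, the per-file opt-in idiom mentioned above is roughly this, placed at the top of a source file before anything pulls in prlog.h:

// Defined before including prlog.h so this file keeps its PR_LOG()
// statements in non-debug builds.
#ifdef MOZ_LOGGING
#define FORCE_PR_LOG
#endif
#include "prlog.h"
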
(In reply to Joshua Cranmer [:jcranmer] from comment #55)
> I actually think trying to be explicitly compatible with
> NSPR logging is annoying, since most logging requests appear to want logging
> in release builds, and you can only get that in NSPR with FORCE_PR_LOG.

I did a very unscientific search in the codebase (*cough* grep *cough) which shows 60 source files and 14 headers declaring FORCE_PR_LOG (some other headers contain a commented out declaration of FORCE_PR_LOG so I took those out of the count). 65 headers are declaring macros that use PR_LOG() and 439 source files are using PR_LOG() directly; from this data it seems to me that most of our logging is done for diagnostics which we don't need in release builds.
(In reply to Joshua Cranmer [:jcranmer] from comment #55)
> (In reply to Eric Rahm [:erahm] from comment #54)
> > I'm planning on spinning off a few bugs from this to make life easier. I
> > would have liked to remove NSPR from the equation, but it feels like there's
> > more support for building on top of what we already have.
> 
> I don't think compatibility with NSPR is a necessary requirement on anyone's
> list, especially if it's rather easy to migrate from NSPR logging to
> $ONE_TRUE_LOGGER. I actually think trying to be explicitly compatible with
> NSPR logging is annoying, since most logging requests appear to want logging
> in release builds, and you can only get that in NSPR with FORCE_PR_LOG.

I'm certainly open to not using NSPR logging, but there has been feedback in the bug that improving PR_LOG might be preferable. kats explicitly stated so in comment 43, gsvelto sounds like he's for it, I'm not sure where Dave stands. Those in favor of starting fresh would probably be you, me, jlebar. I don't really want to get into a fight over this, so it's probably best to be pragmatic and make a decision based on whether or not we can get what we want out of NSPR logging.

Cons:
* FORCE_PR_LOG is definitely a bit of a mess: any file that uses it gets excluded from unified builds, it's a build-time only option, it has to be done at a file level not a module level.
* c-only prevents us from directly implementing a streaming interface.
* NSPR logging just isn't thread safe (creating modules isn't thread safe, setting log levels isn't thread safe, etc). Arguably this is a bug that we might be able to fix.
* And yet every time we actually log we grab a global lock
* Can't hook in directly to log level change notifications (whether we use settings, an FS watcher, etc)
* Purely logistical, making NSPR changes can be a *slow* process

Pros:
* It's already in use (there are 4347 instances of PR_LOG in our code base)
* If we switch away from it, NSPR will still be using it so we'll still end up with multiple log systems
(In reply to Gabriele Svelto [:gsvelto] from comment #57)
> (In reply to Joshua Cranmer [:jcranmer] from comment #55)
> > I actually think trying to be explicitly compatible with
> > NSPR logging is annoying, since most logging requests appear to want logging
> > in release builds, and you can only get that in NSPR with FORCE_PR_LOG.
> 
> I did a very unscientific search in the codebase (*cough* grep *cough) which
> shows 60 source files and 14 headers declaring FORCE_PR_LOG (some other
> headers contain a commented out declaration of FORCE_PR_LOG so I took those
> out of the count). 65 headers are declaring macros that use PR_LOG() and 439
> source files are using PR_LOG() directly; from this data it seems to me that
> most of our logging is done for diagnostics which we don't need in release
> builds.

One of the big requests here is that we can enable logging in non-debug builds at runtime. This is particularly useful because debug builds are perceived as too slow, so plenty of devs don't use them. Additionally it would be nice to be able to have users turn on logging (let's say in nightly/aurora) to help debug issues.

I do like having stats, so here are a few more datapoints:
  - printf_stderr  -  289 instances, 108 files
  - __android_log  -  426 instances, 145 files
  - fprintf(stderr - 5184 instances, 695 files
  - PR_LOG         - 4347 instances, 505 files
(In reply to Eric Rahm [:erahm] from comment #58)
> gsvelto sounds like he's for it

Note that I'm not strongly biased towards using NSPR logging; I'm just being pragmatic and since we're already using it everywhere it makes more sense to improve it than to replace it IMHO. We already have lots of duplicated utility functionality in Gecko and I don't really feel like adding more.

That being said if a new logging system that is measurably better than NSPR logging can be implemented in a reasonable amount of time I'm all for it.

It's just that I'd start by fixing the biggest snags in PR_LOG() and make sure we use the same logging system everywhere instead of our mix of logging methods. Note that we've also got a ton of raw printf()s with roughly half of them being in external code we import and the other half being our own stuff (ugh). BTW we've got a handful of g_printerr()s too.

Finally I'll add another point to your list of things that need to be addressed: the configure script option(s). The --disable-logging option defines NS_DISABLE_LOGGING, which isn't used anywhere in our code; when the option isn't passed it defines MOZ_LOGGING instead, which in turn is used to define FORCE_PR_LOG in some files. Maybe we could turn that option into a 3-way choice:

- disable: doesn't define anything
- MOZ_LOGGING-only: defines MOZ_LOGGING so that you get logs from the files requesting it, this would correspond to our current release builds
- force: defines FORCE_PR_LOG globally
Do we consider redirecting stdout/stderr? On b2g we usually have both pointing to /dev/null, but sometimes we might want to see the content written to either or both.
(In reply to Cervantes Yu from comment #61)
> Do we consider redirecting stdout/stderr? On b2g we usually have both
> pointing to /dev/null, but sometimes we might want to see the content
> written to either or both.

This is an interesting idea. Where would you want to redirect the output to? I could see us outputting to a "stdout" module where we'd get the same formatting as other loggers.

Also can you point to me to where we're currently redirecting to /dev/null?
I did a test where I enabled PR_LOGGING on a Linux 64 opt build. I ran a baseline unmodified build against mochitest on tbpl 3X [1] and my modified build 3X [2]. I then went through the full log for each run and extracted the amount of time in seconds for the test suite to run (so excluding various VM setup steps).

tbpl tests seem to vary a fair amount, but with the average of the 3 runs we should get a reasonable idea of how much overhead is added by having logging enabled (without outputting anything). The results looked pretty good to me:

Mochitest	1	2	3	4	5	bc1	bc2	bc3	dt	oth
avg w/o logging	1690	715	422	552	502	1574	131	673	2315	942
avg w/  logging	1692	738	423	558	472	1437	138	646	2340	979
% change	        0.12%	3.26%	0.32%	1.21%	-5.97%	-8.73%	5.60%	-4.06%	1.09%	3.86%

So the greatest %increase was for bc2 (5.6%), but that came out to running 7 seconds longer. The greatest increase by time was on oth at 37 seconds. Other tests actually ran faster. 

With these results I don't see a compelling case _not_ to turn on logging for non-debug builds (although we may still want to disable it for proper releases). To be clear, by "turn on logging" I just mean have PR_LOG (or its equivalent) not #defined to a no-op.

[1] https://tbpl.mozilla.org/?tree=Try&rev=6dffff85a149
[2] https://tbpl.mozilla.org/?tree=Try&rev=69d5cb95bb74
(In reply to Eric Rahm [:erahm] from comment #63)

Thanks for running the test, that's a great first step.  I like the idea of turning on logging by default because of the benefits that brings to everyone. However, I would argue that the threading issues have to be fixed first unless we are going to release with logging on.  The reason is that if all code acquires a global lock for every log call, that will have an effect on the timing of all async operations that perform logging (and with the number of things going off-main-thread/out-of-process, that's almost everything). Therefore all the builds we test (with logging enabled) will have different async/non-deterministic behaviors than those that we release.

I want to minimize the differences in async/non-deterministic behavior between what we release and what we test as much as possible. And while I know that's not at all possible, we should at least endeavor not to make that divide worse.
(In reply to Clint Talbert ( :ctalbert ) from comment #64)
> (In reply to Eric Rahm [:erahm] from comment #63)
> 
> ... if all code acquires a global lock for every log call, then that
> will have an affect on the timing of all async operations that perform
> logging (and with the number of things going off-main-thread/out-of-process,
> that's almost everything. Therefore all the builds we test (with logging
> enabled) will have different async/non-deterministic behaviors than those
> that we release.

Luckily the situation isn't as bad as I implied, so hopefully this is a non-issue. By default a log module's output level is "None" -- even with logging "enabled" (in the build sense) nothing is output by default.

To elaborate on the global locking issue:
  - The lock is *not* needed to check if logging for that module is enabled [1].
  - The lock is only needed if we actually end up outputting something. This is in order to protect an internal buffer of log messages [2] and prevents multi-threaded access to the output file descriptor [3].

[1] http://hg.mozilla.org/mozilla-central/annotate/5e704397529b/nsprpub/pr/include/prlog.h#l165
[2] http://hg.mozilla.org/mozilla-central/annotate/5e704397529b/nsprpub/pr/src/io/prlog.c#l517
[3] http://hg.mozilla.org/mozilla-central/annotate/5e704397529b/nsprpub/pr/src/io/prlog.c#l509
(In reply to Eric Rahm [:erahm] from comment #65)
> To elaborate on the global locking issue:
>   - The lock is *not* needed to check if logging for that module is enabled
> [1].

If you want to enable logging dynamically, you'll need either a lock or an atomic access for this check.
(In reply to Joshua Cranmer [:jcranmer] from comment #66)
> (In reply to Eric Rahm [:erahm] from comment #65)
> > To elaborate on the global locking issue:
> >   - The lock is *not* needed to check if logging for that module is enabled
> > [1].
> 
> If you want to enable logging dynamically, you'll need either a lock or an
> atomic access for this check.

I think we can sidestep this by saying log levels can only be modified by the main thread, but read from any other thread.

To quote jlebar:
> This is safe because 32-bit [reads/]writes are atomic on all platforms we care about
(In reply to Eric Rahm [:erahm] from comment #67)
> (In reply to Joshua Cranmer [:jcranmer] from comment #66)
> > (In reply to Eric Rahm [:erahm] from comment #65)
> > > To elaborate on the global locking issue:
> > >   - The lock is *not* needed to check if logging for that module is enabled
> > > [1].
> > 
> > If you want to enable logging dynamically, you'll need either a lock or an
> > atomic access for this check.
> 
> I think we can sidestep this by saying log levels can only be modified by
> the main thread, but read from any other thread.
> 
> To quote jlebar:
> > This is safe because 32-bit [reads/]writes are atomic on all platforms we care about

Hell balls no it ain't safe--the compiler can still cache reads if they aren't atomic. Case in point:

void SomeThread() {
  PRLogModule *logger = PR_GetLogModule();
  while (!shutdown) {
     GetSomeEvent();
     PR_LOG(logger, event details);
     ProcessEvent();
  }
}

The compiler is perfectly within its rights to read the log level outside the loop.
I still don't think you need a lock.

You need the access to be volatile so that it doesn't get cached in a register, but reading or writing a 32-bit word is atomic (in that you will never read/write 16-bits of one value and 16-bits of another value). The 32-bit value you get will always be self-consistent. If the access is non-volatile, it may very well be stale (because you're reading a cached value).
It would also be nice to get rid of the LEVEL numbering and replace it with something smarter, like a layer chain.  One example is our http stack: it starts at nsHttpHandler, goes down to nsHttpChannel, then nsHttpTransaction, nsHttpConnection, nsSocketTransport and then down to another module (NSPR).  I'd like to find an API to express the dependency chain (that nsHttpTransaction is under nsHttpChannel etc) and then be able to set a module with the level depth somehow, like "network:transaction".

Just an idea..
(In reply to Dave Hylands [:dhylands] from comment #69)
> I still don't think you need a lock.
> 
> You need the access to be volatile so that it doesn't get cached in a
> register, but reading or writing a 32-bit word is atomic (in that you will
> never read/write 16-bits of one value and 16-bits of another value). The
> 32-bit value you get will always be self-consistent. If the access is
> non-volatile, it may very well be stale (because you're reading a cached
> value).

Is volatile enough to make sure the compiler/linker doesn't do optimizations like reading the value outside of the loop as suggested by jcranmer?

Either way I'm going to do two runs, which will hopefully make it clear how much this matters:
  - one w/ an atomic read of the module level
  - one that acquires a lock local to the module prior to reading
Adding results for using an atomic read [1] and using a lock to guard access [2] of the module log level.  

Locking seems a bit slower (though not blatantly so); atomics don't appear to add much overhead. Either way there's no significant increase, so to satisfy jcranmer's concerns, using atomic read/write seems reasonable. 

> Mochitest	1	2	3	4	5	bc1	bc2	bc3	dt	oth
> w/o logging	1690	715	422	552	502	1574	131	673	2315	942
> w/     	1692	738	423	558	472	1437	138	646	2340	979
> w/ +atomics	1628	718	448	537	465	1435	133	685	2359	969
> w/ +locks	1713	744	441	558	469	1474	136	654	2433	1026

[1] https://tbpl.mozilla.org/?tree=Try&rev=8214b6280888
[2] https://tbpl.mozilla.org/?tree=Try&rev=3e383c1c3500
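
For reference, a minimal sketch of the atomic variant measured here (hypothetical names, not the actual patch): the level is written only from the main thread, per the proposal in comment 67, and read with a relaxed atomic load on logging threads. Relaxed ordering is enough for this use, since no ordering with other memory is required, only that readers eventually observe the new value.

#include <atomic>

struct LogModule {
  const char* name;
  std::atomic<int> level{0};
};

inline bool LogTest(const LogModule& aModule, int aLevel) {
  return aModule.level.load(std::memory_order_relaxed) >= aLevel;
}

// Main thread only, per comment 67.
inline void SetLogLevel(LogModule& aModule, int aLevel) {
  aModule.level.store(aLevel, std::memory_order_relaxed);
}
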
(In reply to Eric Rahm [:erahm] from comment #62)
> (In reply to Cervantes Yu from comment #61)
> > Do we consider redirecting stdout/stderr? On b2g we usually have both
> > pointing to /dev/null, but sometimes we might want to see the content
> > written to either or both.
> 
> This is an interesting idea. Where would you want to redirect the output to?
> I could see us outputing to a "stdout" module where we'd get the same
> formatting as other loggers.

We can redirect the content written to stdout or stderr to the logging system. Adding a tag or label saying that it's redirection of stdout or stderr would be good so we can filter the redirected messages easily.

> Also can you point to me to where we're currently redirecting to /dev/null?

We don't redirect stdout/stderr to /dev/null. I think it's what init (pid=1) does for us and is common practice for daemon processes.
(In reply to Eric Rahm [:erahm] from comment #63)
> With these results I don't see a compelling case _not_ to turn on logging
> for non-debug builds (although we may still want to disable for proper
> releases). To be clear by "turn on logging" I just mean have PR_LOG (or it's
> equivalent) not #defined to a no-op.

We should check how much this bloats the final executable but I don't think a bunch of strings and function calls should make much of a difference.

BTW as an additional performance-mitigation feature we might want to use something like __builtin_expect() in PR_LOG_TEST() to inform the compiler that the check is biased towards not being taken. This shouldn't have any visible effect on desktop builds but might help for FxOS where we run on Cortex A5/A7 processors which have relatively small & weak branch predictors.
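
A sketch of that suggestion (hypothetical macro names; mirrors the existing PR_LOG_TEST check of the module's level field):

#if defined(__GNUC__) || defined(__clang__)
#define LOG_UNLIKELY(x) __builtin_expect(!!(x), 0)
#else
#define LOG_UNLIKELY(x) (!!(x))
#endif

// Hint that the common case is "logging disabled" so the compiler lays out
// the branch accordingly.
#define MY_LOG_TEST(_module, _level) \
  LOG_UNLIKELY((_module)->level >= (_level))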

Also we should do some performance tests there because the impact might be measurably different than on the desktop (and the increase in executable size is more likely to have an impact).

Also, would we want the logging to be enabled in all builds (including production ones) or just nightly/aurora for desktop and eng/userdebug for FxOS? I'd be for the latter choice because that's where it'd bring the most benefit without risking any burden to our production builds.
Depends on: 1073578
(In reply to Gabriele Svelto [:gsvelto] from comment #74)
> We should check how much this bloats the final executable but I don't think
> a bunch of strings and functions calls should make much of a difference.

Good point, stats in bytes below:
>				no logging	logging		delta	percent change
> compressed linux64 tarball	52,118,232	52,194,342	76,110	0.15%
> uncompressed linux64 tarball	121,528,320	121,794,560	266,240	0.22%
> uncompressed b2g		39,415,808	39,657,472	241,664	0.61%

So on desktop our "distro" is 0.22% larger when uncompressed and on b2g the /system/b2g dir is 0.61% larger. While not insignificant, I don't think it's large enough to warrant much concern when weighed against the benefits of being able to turn on logging in these builds.

> BTW as an additional performance-mitigation feature we might want to use
> something like __builtin_expect() in PR_LOG_TEST() to inform the compiler
> that the check is biased towards not been taken. This shouldn't have any
> visible effect on desktop builds but might help for FxOS where we run on
> Cortex A5/A7 processors which have relatively small & weak branch predictors.

This seems reasonable. I know we have MOZ_UNLIKELY or something to that effect, I'm not sure if NSPR has an equivalent but I'm sure we can port something over.

> Also we should do some performance tests there because the impact might be
> measurably different than on the desktop (and the increase in executable
> size is more likely to have an impact).

This seems pretty important, I'll see if I can work out how to do perf testing on a device.

> Also would we want the logging to be enable in all builds (including
> production ones) or just nightly/aurora for desktop and eng/userdebug for
> FxOS? I'd be for the former choice because that's where it'd bring the most
> benefit without risking any burden to our production builds.

Including only in nightly/aurora and eng/userdebug seems reasonable. We could just tie inclusion/exclusion to the |--disable-logging| flag and make sure production builds are using that.
Product: Firefox OS → Core
Depends on: 1073712
Depends on: 1072605
While looking through PR_LOG() related bugs I've stumbled upon bug 806819 (Define FORCE_PR_LOG globally so release builds have all NSPR logging available). We might want to move the discussion about enabling PR_LOG() globally there since it's been around for some time already.

BTW shall we turn this into a meta-bug to coordinate all logging-related activity?
Depends on: 806819
Depends on: 1074149
(In reply to Gabriele Svelto [:gsvelto] from comment #76)
> While looking through PR_LOG() related bugs I've stumbled upon bug 806819
> (Define FORCE_PR_LOG globally so release builds have all NSPR logging
> available). We might want to move the discussion about enabling PR_LOG()
> globally there since it's been around for some time already.

Perfect, I'm glad there is already some support for this. I have some blocking bugs due to build bustage I'll transfer over to that.

> BTW shall we turn this into a meta-bug to coordinate all logging-related
> activity?

That sounds good.
No longer depends on: 1072605
No longer depends on: 1073712
:hub can you (or point me to someone who can) help run perf tests on a b2g device [1] to determine the performance impact of enabling PR_LOGGING in non-debug builds?

I attempted to run some gaia perf tests as documented [2] but was unable to get them working.

[1] See comment 74 for a bit more context.
[2] https://developer.mozilla.org/en-US/Firefox_OS/Platform/Automated_testing/Gaia_performance_tests
Flags: needinfo?(hub)
providing help on IRC
Flags: needinfo?(hub)
(In reply to Eric Rahm [:erahm] from comment #71)
> (In reply to Dave Hylands [:dhylands] from comment #69)
> > I still don't think you need a lock.
> > 
> > You need the access to be volatile so that it doesn't get cached in a
> > register, but reading or writing a 32-bit word is atomic (in that you will
> > never read/write 16-bits of one value and 16-bits of another value). The
> > 32-bit value you get will always be self-consistent. If the access is
> > non-volatile, it may very well be stale (because you're reading a cached
> > value).
> 
> Is volatile enough to make sure the compiler/linker doesn't do optimizations
> like reading the value outside of the loop as suggested by jcranmer?

No.  Volatile sucks.  It isn't what people think it is.  It is NOT a cross-thread safety tool.

C++/C is not ASM.  You do NOT know how the compiler will schedule reads (referring to the >> quotes)

The REALLY nasty thing would be that if you introduce a TSAN race (write X on Thread A and read X on Thread B/C/etc) without an atomic or barriers/locks, you can truly screw things.  The core of the TSAN data-race problem is that the compiler is allowed (by spec!) when writing 2 to X (which used to have 1) to temporarily use it to store a pointer to the nuclear launch codes.  Or your bank account number.  Or anything else.  And threads B/C/etc can end up reading this wrong value.  And no volatile doesn't save you.

Atomic works.  However, I'm told Atomic sucks perf-wise on most ARMs, so it can be used and it's better than a full lock, but don't go hitting it all the time.  Locks work, of course.  But they suck even more perf-wise (and IIRC even more so on windows).

> Either way I'm going to do two runs, which will hopefully make it clear how
> much this matters:
>   - one w/ an atomic read of the module level
>   - one that acquires a lock local to the module prior to reading

> Locking seems a bit slower (though not blatantly so), atomics don't appear to add much overhead.
> Either way there's no significant increase, so to satisfy jcranmer's concerns using atomic
> read/write seems reasonable. 

I'd say locking seems noticeably slower.

Please please PLEASE do your perf runs on all major OSes, and also on Android & B2G, and if possible on low-end/consumer HW not Xeons/core-i7 ivybridge.  And please DON'T use TBPL times.  Those are mostly VMs on AWS - they will not give very good data on real-world perf.  TBPL run-times are NOT great (or likely even good) perf tests!
(In reply to Randell Jesup [:jesup] from comment #80)
> (In reply to Eric Rahm [:erahm] from comment #71)
> > (In reply to Dave Hylands [:dhylands] from comment #69)
> > > I still don't think you need a lock.
> > > 
> > > You need the access to be volatile so that it doesn't get cached in a
> > > register, but reading or writing a 32-bit word is atomic (in that you will
> > > never read/write 16-bits of one value and 16-bits of another value). The
> > > 32-bit value you get will always be self-consistent. If the access is
> > > non-volatile, it may very well be stale (because you're reading a cached
> > > value).
> > 
> > Is volatile enough to make sure the compiler/linker doesn't do optimizations
> > like reading the value outside of the loop as suggested by jcranmer?
> 
> No.  Volatile sucks.  It isn't what people think it is.  It is NOT a
> cross-thread safety tool.

This specific question is not really about thread safety (well, it was, but this is a slightly different point), but rather whether it's enough to tell compilers not to do overzealous optimizations such as the one jcranmer described.

> 
> C++/C is not ASM.  You do NOT know how the compiler will schedule reads
> (referring to the >> quotes)
> 
> The REALLY nasty thing would be that if you introduce a TSAN race (write X
> on Thread A and read X on Thread B/C/etc) without an atomic or
> barriers/locks, you can truly screw things.  The core of the TSAN data-race
> problem is that the compiler is allowed (by spec!) when writing 2 to X
> (which used to have 1) to temporarily use it to store a pointer to the
> nuclear launch codes.  Or your bank account number.  Or anything else.  And
> threads B/C/etc can end up reading this wrong value.  And no volatile
> doesn't save you.

This is logging. We're talking about setting log levels, if it takes a few milliseconds to take hold that's not the end of the world. If it _never_ takes hold that's a bigger issue. If we can somehow get a corrupt value that's definitely a big issue.

> Atomic works.  However, I'm told Atomic sucks perf-wise on most ARMs, so it
> can be used and it's better than a full lock, but don't go hitting it all
> the time.  Locks work, of course.  But they suck even more perf-wise (and
> IIRC even more so on windows).

For ARM (read: b2g) I definitely want to do some perf testing (I'm still working out how to do that, our existing perf tests target primarily startup time).

As far as Windows goes "mutexes" are known to be slow, on the other hand CriticalSections are reasonably performant, I'll check what we're doing under the hood. We also have to remember that these locks will generally be uncontended.

> 
> > Either way I'm going to do two runs, which will hopefully make it clear how
> > much this matters:
> >   - one w/ an atomic read of the module level
> >   - one that acquires a lock local to the module prior to reading
> 
> > Locking seems a bit slower (though not blatantly so), atomics don't appear to add much overhead.
> > Either way there's no significant increase, so to satisfy jcranmer's concerns using atomic
> > read/write seems reasonable. 
> 
> I'd say locking seems noticeably slower.
> 
> Please please PLEASE do your perf runs on all major OSes, and also on
> Android & B2G, and if possible on low-end/consumer HW not Xeons/core-i7
> ivybridge.  And please DON'T use TBPL times.  Those are mostly VMs on AWS -
> they will not give very good data on real-world perf.  TBPL run-times are
> NOT great (or likely even good) perf tests!

If we (Mozilla) have a test lab with these devices, that would be super helpful (I'm going to look into this). In the meantime I have to work with what I have access to, which is TBPL, my macbook, and some phones. Talos might be another option, although from what I can tell that's running on VMs as well.

Another option is to get a crew of people to run the same suite of tests locally on a diverse set of hardware.
(In reply to Eric Rahm [:erahm] from comment #58)
> * If we switch away from it, NSPR will still be using it so we'll still end
> up with multiple log systems

FWIW, I think we should move away from it, and make PR_LOG piggyback on top of whatever we end up with (and there are many ways in which we can do that without touching nspr).
(In reply to Randell Jesup [:jesup] from comment #80)
> No.  Volatile sucks.  It isn't what people think it is.  It is NOT a
> cross-thread safety tool.

Nobody said it was here, what volatile doesn't guarantee is ordering of regular memory accesses WRT volatile accesses (it only guarantees ordering of volatile accesses WRT other volatile accesses). But we don't care about memory ordering at all here (compiler or hardware). What was proposed was to set only a single field from a single thread - the main thread - and have others check that field when logging.

> Atomic works.  However, I'm told Atomic sucks perf-wise on most ARMs, so it
> can be used and it's better than a full lock, but don't go hitting it all
> the time.  Locks work, of course.  But they suck even more perf-wise (and
> IIRC even more so on windows).

If we cared about ordering issues here then atomics wouldn't do us any good either; we'd need a mutex or an explicit memory barrier to prevent hardware from reordering accesses around the level check.
(In reply to Gabriele Svelto [:gsvelto] from comment #84)
> (In reply to Randell Jesup [:jesup] from comment #80)
> > No.  Volatile sucks.  It isn't what people think it is.  It is NOT a
> > cross-thread safety tool.
> 
> Nobody said it was here, what volatile doesn't guarantee is ordering of
> regular memory accesses WRT volatile accesses (it only guarantees ordering
> of volatile accesses WRT other volatile accesses). But we don't care about
> memory ordering at all here (compiler or hardware). What was proposed was to
> set only a single field from a single thread - the main thread - and have
> others check that field when logging.
> 
> > Atomic works.  However, I'm told Atomic sucks perf-wise on most ARMs, so it
> > can be used and it's better than a full lock, but don't go hitting it all
> > the time.  Locks work, of course.  But they suck even more perf-wise (and
> > IIRC even more so on windows).
> 
> If we cared about ordering issues here then atomics wouldn't do us any good
> either; we'd need a mutex or an explicit memory barrier to prevent hardware
> from reordering accesses around the level check.

+1
(In reply to Gabriele Svelto [:gsvelto] from comment #84)
> If we cared about ordering issues here then atomics wouldn't do us any good
> either; we'd need a mutex or an explicit memory barrier to prevent hardware
> from reordering accesses around the level check.

Can you clarify what you mean here?  The whole point of atomics is to provide the necessary memory barriers without having to write nine different #ifdefs of architecture-specific code.
(In reply to Nathan Froyd (:froydnj) from comment #86)
> Can you clarify what you mean here?  The whole point of atomics is to provide
> the necessary memory barriers without having to write nine different #ifdefs
> of architecture-specific code.

This is a little OT but still: C++ atomics and similar compiler built-ins provide ready-made primitives for atomic operations on a single operand. However, they make no guarantee of ordering WRT other memory accesses, so, for example, you cannot implement a mutex with them unless you couple them with appropriate memory barriers (save maybe on x86, where LOCK-prefixed instructions also act as a memory barrier, IIRC; on weakly ordered architectures like ARM or MIPS, ldrex/strex and ll/sc pairs are used to implement atomics, but they don't enforce any ordering outside of the cache line they touch).
(In reply to Gabriele Svelto [:gsvelto] from comment #87)
> (In reply to Nathan Froyd (:froydnj) from comment #86)
> > Can you clarify what you mean here?  The whole point of atomics is to provide
> > the necessary memory barriers without having to write nine different #ifdefs
> > of architecture-specific code.
> 
> This is a little OT but still: C++ atomics and similar compiler built-ins
> provide ready-made primitives for atomic operations on a single operand.
> However they make no guarantee of ordering WRT other memory accesses

C++11 section 1.10 explicitly disagrees with you. For example, clause 5:
"The library defines a number of atomic operations (Clause 29) and operations on mutexes (Clause 30) that are specially identified as synchronization operations. These operations play a special role in making assignments in one thread visible to another."

The entire point of atomics is for cross-thread memory ordering rules.
(In reply to Gabriele Svelto [:gsvelto] from comment #84)
> (In reply to Randell Jesup [:jesup] from comment #80)
> > No.  Volatile sucks.  It isn't what people think it is.  It is NOT a
> > cross-thread safety tool.
> 
> Nobody said it was here, what volatile doesn't guarantee is ordering of
> regular memory accesses WRT volatile accesses (it only guarantees ordering
> of volatile accesses WRT other volatile accesses). But we don't care about
> memory ordering at all here (compiler or hardware). What was proposed was to
> set only a single field from a single thread - the main thread - and have
> others check that field when logging.

Sure - and that's almost the textbook example of a TSAN violation unless it's atomic.

https://software.intel.com/en-us/blogs/2013/01/06/benign-data-races-what-could-possibly-go-wrong
(In reply to Joshua Cranmer [:jcranmer] from comment #88)
> C++11 section 1.10 explicitly disagrees with you. For example, clause 5:
> The library defines a number of atomic operations (Clause 29) and operations
> on mutexes (Clause 30) that are specially identified as synchronization
> operations. These operations play a special role in making assignments in
> one thread visible to another.
> 
> The entire point of atomics is for cross-thread memory ordering rules.

All C++11 atomics provide an additional parameter to specify the type of ordering rules you want for other accesses; these range from no ordering at all (memory_order_relaxed) to sequential ordering (memory_order_seq_cst). When used appropriately, these should introduce the necessary barriers for the selected ordering. To guarantee ordering (as a mutex would) you have to opt for sequential ordering (SequentiallyConsistent in MFBT speak), but that's practically as bad performance-wise as using a mutex.

To get back to the topic here: using atomics w/o a stronger memory ordering option (i.e. memory barriers) doesn't make the potential change here any more correct than using volatile. Coupling it with appropriate barriers is practically equivalent to using mutexes.
(In reply to Gabriele Svelto [:gsvelto] from comment #90)
> To guarantee ordering (as a mutex would) you have to
> opt for sequential ordering (SequentiallyConsistent in MFBT speak) but
> that's practically as bad performance-wise as using a mutex.

No, you're not quite right here. *Only* memory_order_relaxed provides no memory ordering guarantees (beyond cache coherency and intra-thread ordering). memory_order_release/memory_order_acquire pairs actually provide a suitably strong ordering guarantee most of the time.

> To get back to the topic here: using atomics w/o a stronger memory ordering
> option (i.e. memory barriers) doesn't make the potential change here any
> more correct than using volatile. Coupling it with appropriate barriers is
> practically equivalent to using mutexes.

Mutexes tend to be sequentially consistent. Release/Acquire pairing is generally faster (only two barriers instead of three)--hell, on x86, release/acquire requires no memory barriers!
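
For readers less steeped in the memory model, the canonical release/acquire pairing looks like this (a generic producer/consumer sketch, nothing specific to log levels):

  #include <atomic>
  #include <cassert>
  #include <thread>

  static int gPayload = 0;                 // ordinary, non-atomic data
  static std::atomic<bool> gReady{false};  // the flag that publishes it

  void Producer() {
    gPayload = 42;                                  // (1) write the data
    gReady.store(true, std::memory_order_release);  // (2) publish it; (1) cannot sink below this
  }

  void Consumer() {
    while (!gReady.load(std::memory_order_acquire)) {  // (3) wait for the flag
    }
    assert(gPayload == 42);  // (4) guaranteed to observe the write from (1)
  }

  int main() {
    std::thread producer(Producer);
    std::thread consumer(Consumer);
    producer.join();
    consumer.join();
    return 0;
  }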
(In reply to Randell Jesup [:jesup] from comment #89)
> Sure - and that's almost the textbook example of a TSAN violation unless
> it's atomic.

We're not talking about declaring a variable volatile; it's a field, and it's not ordering access to a shared resource. In fact, there's no sharing here at all except for the field itself.
(In reply to Joshua Cranmer [:jcranmer] from comment #91)
> No, you're not quite right here. *Only* memory_order_relaxed provides no
> memory ordering guarantees (beyond cache coherency and intra-thread
> ordering). memory_order_release/memory_order_acquire pairs actually provide
> a suitably strong ordering guarantee most of the time.

That's the "most of the time" part that is the issue. If you don't know that you actually need specifically release/acquire semantics (e.g. a produce/consume pattern) then it's not as good as using a mutex and for generic code can introduce subtle bugs. Besides the performance jump between relaxed and release/acquire is 1-2 orders of magnitude depending on the implementation on ARM. They're not in the same ballpark.

> Mutexes tend to be sequentially consistent. Release/Acquire pairing is
> generally faster (only two barriers instead of three)--hell, on x86,
> release/acquire requires no memory barriers!

Yeah, but my point is that nobody has pointed out why it would be needed here, in the case of checking the logging level, so it's impossible to establish which guarantee is required. My take is that we don't need any, besides preventing the compiler from caching the loaded field, as there's no sharing at all involved.
(In reply to Mike Hommey [:glandium] from comment #83)
> (In reply to Eric Rahm [:erahm] from comment #58)
> > * If we switch away from it, NSPR will still be using it so we'll still end
> > up with multiple log systems
> 
> FWIW, I think we should move away from it, and make PR_LOG piggyback on top
> of whatever we end up with (and there are many ways in which we can do that
> without touching nspr).

Can you elaborate on why you'd like to move away from it? It could be helpful to guide our final decision.
(In reply to Eric Rahm [:erahm] from comment #94)
> Can you elaborate on why you'd like to move away from it? It could be
> helpful to guide our final decision.

Because it has a horrible and broken API. Even fixing details like locking won't cut it. Part of it is that it forces you to have explicit initialization, which most of the time requires static initializers, or jumping through hoops not to use static initializers.
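
For context, the initialization being complained about looks roughly like this (the module name is made up; the first variant costs a static initializer, the second is the hoop-jumping, and as written isn't even safe if two threads race on the first call):

  #include "prlog.h"

  // Variant 1: a static initializer runs PR_NewLogModule() at load time.
  static PRLogModuleInfo* gFooLog = PR_NewLogModule("Foo");

  // Variant 2: avoid the static initializer with a lazily-initialized
  // accessor (not thread-safe for the first call as written).
  static PRLogModuleInfo* GetFooLog() {
    static PRLogModuleInfo* sLog = nullptr;
    if (!sLog) {
      sLog = PR_NewLogModule("Foo");
    }
    return sLog;
  }

  void Frobnicate(int aCount) {
    PR_LOG(GetFooLog(), PR_LOG_DEBUG, ("frobnicating %d items", aCount));
  }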
Just a reminder, we have a few bugs split out that might be better suited for some of the conversation going on right now:

- Bug 1074149 - Allow setting a PR_LOG() module's logging level at runtime
  - Do we need atomics, mutexes, or not for accessing log levels at runtime?
  - What's the perf impact of each of these options?
- Bug 806819 - Define FORCE_PR_LOG globally so release builds have all NSPR logging available
  - Should we do this? Should it be for all releases or just nightly/aurora?
  - What's the perf impact?
(In reply to Mike Hommey [:glandium] from comment #95)
> (In reply to Eric Rahm [:erahm] from comment #94)
> > Can you elaborate on why you'd like to move away from it? It could be
> > helpful to guide our final decision.
> 
> Because it has a horrible and broken API. Even fixing details like locking
> won't cut it. Part of it is that it forces you to have explicit
> initialization, which most of the time requires static initializers, or
> jumping through hoops not to use static initializers.

Oh, and most importantly, I'd rather see less use of NSPR than more.
(In reply to Mike Hommey [:glandium] from comment #97)
> (In reply to Mike Hommey [:glandium] from comment #95)
> > (In reply to Eric Rahm [:erahm] from comment #94)
> > > Can you elaborate on why you'd like to move away from it? It could be
> > > helpful to guide our final decision.
> > 
> > Because it has a horrible and broken API. Even fixing details like locking
> > won't cut it. Part of it is that it forces you to have explicit
> > initialization, which most of the time requires static initializers, or
> > jumping through hoops not to use static initializers.
> 
> Oh, and most importantly, I'd rather see less use of NSPR than more.

These are both good points; it sounds like one requirement you have is that whatever system we choose should discourage the use of static initializers. Do you have a proposal for how we can avoid this? Are function-level statics good enough, i.e.:
> Log& getLog() { static Log log; return log; }
Or do you have a more specific idea of what type of API would avoid this behavior?

Your second point, though not specific, highlights what feels like a general desire throughout the project to rely less on NSPR in our codebase. If you have any specific reasons, or can link to previous bugs and discussion, that would certainly be helpful.
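
Spelled out a bit more, the function-level-static idea might look like the following; C++11 makes the first-use construction thread-safe ("magic statics"), at the cost of a guard check on every call. All of the types and names here are hypothetical, not a concrete API proposal:

  #include <atomic>
  #include <cstdio>

  // Hypothetical log module type.
  class LogModule {
   public:
    explicit LogModule(const char* aName) : mName(aName), mLevel(0) {}
    bool Enabled(int aLevel) const {
      return mLevel.load(std::memory_order_relaxed) >= aLevel;
    }
    void SetLevel(int aLevel) { mLevel.store(aLevel, std::memory_order_relaxed); }
    const char* Name() const { return mName; }

   private:
    const char* mName;
    std::atomic<int> mLevel;
  };

  // No static initializer: the module is constructed on first use, and the
  // compiler guarantees that construction is thread-safe.
  LogModule& GetFooLog() {
    static LogModule sLog("Foo");
    return sLog;
  }

  void Example() {
    if (GetFooLog().Enabled(4)) {  // 4 == an arbitrary "debug" level
      fprintf(stderr, "[%s] something happened\n", GetFooLog().Name());
    }
  }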
See Also: → 1055227
(In reply to Eric Rahm [:erahm] from comment #94)

[ PR_Log ]
> Can you elaborate on why you'd like to move away from it? It could be
> helpful to guide our final decision.

I'd like to weigh in with some observations from bug 1060419 and bug 553032.
Those bugs are concerned with marking various functions with GCC's
printf attribute, to provide format checking.  Doing this uncovers a
large number of nit-picky bugs in the logging code (stuff like using %x
to format a pointer -- not 64-bit clean), but also a smaller number of actual
bugs in non-logging code.  So, it's a worthwhile thing to do.
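
For reference, the attribute in question is GCC's (and Clang's) format attribute; a minimal example of the kind of mistake it catches (the logging function here is illustrative):

  #include <cstdarg>
  #include <cstddef>
  #include <cstdio>

  // Argument 1 is a printf-style format string; the variadic arguments
  // being checked start at position 2.
  __attribute__((format(printf, 1, 2)))
  void MyLog(const char* aFmt, ...) {
    va_list args;
    va_start(args, aFmt);
    vfprintf(stderr, aFmt, args);
    va_end(args);
  }

  void Caller(void* aPtr, size_t aLen) {
    MyLog("ptr=%p len=%zu\n", aPtr, aLen);  // OK
    // MyLog("ptr=%x\n", aPtr);             // -Wformat: %x expects unsigned int,
    //                                      // not a pointer (not 64-bit clean)
  }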

However, it can't really be done, because PR_Log (and the JS copy of the
underlying code) doesn't actually interpret its flags the same way that
printf does.  For example, nspr takes "%l" to mean "64 bit integer", rather
than "long".

Also, nspr (and js) don't play well with mfbt/SizePrintfMacros.h.  Neither
of these implementations handle the 'z' or 'I' specifiers.
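
To spell out the wrinkle: MSVC's printf historically spelled the size_t length modifier 'I' rather than the standard 'z', which is what SizePrintfMacros.h-style macros paper over; that only helps, of course, if the sink really is the standard *printf family. The macro name below is illustrative, not the actual one from that header:

  #include <cstddef>
  #include <cstdio>

  // Pick the right size_t length modifier per toolchain.
  #if defined(_MSC_VER)
  #  define EX_PRIuSIZE "Iu"
  #else
  #  define EX_PRIuSIZE "zu"
  #endif

  void ReportSize(size_t aBytes) {
    printf("allocated %" EX_PRIuSIZE " bytes\n", aBytes);
  }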

I think these are reasons to avoid nspr and instead, if a printf-like (and not
streaming) approach is used, to base it on the underlying standard *printf family.
In particular I was very surprised -- and I think others would be too -- to discover
that some "printf" calls in the tree use the same specifiers with meanings different
from their ordinary ones.


One other wrinkle here is that js adds the "%hs" format for printing char16_t*
strings.  There is no standard way to do this (whoops!).  Here a few choices
present themselves:

* Go ahead and add "%hs" and then write a compiler plugin to do the checking.
  Ouch, but it would work.

* Use a standard printf under the hood and require a converter when printing char16_t*;
  this would have to allocate but given the "ExpensiveFunction()" discussion above
  it seems doable (see the sketch after this list).

* Only present a streaming API, where this is a non-issue.
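
As a sketch of the converter option (the logging function is hypothetical; NS_ConvertUTF16toUTF8 is the existing string helper, and the allocation it performs is exactly the "ExpensiveFunction()" cost discussed above):

  #include <cstdio>
  #include "nsString.h"  // NS_ConvertUTF16toUTF8

  void LogUtf16(const char16_t* aMessage) {
    // Convert (and allocate) up front, then hand plain UTF-8 to a standard
    // printf-style sink; no custom "%hs" specifier needed.
    NS_ConvertUTF16toUTF8 utf8(aMessage);
    fprintf(stderr, "message: %s\n", utf8.get());
  }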
(In reply to Tom Tromey :tromey from comment #100)

> Also, nspr (and js) don't play well with mfbt/SizePrintfMacros.h.  Neither
> of these implementations handle the 'z' or 'I' specifiers.

Bug 1088790 adds support for %zu -- though it hasn't landed yet.
Depends on: 1165515
Depends on: 1174785
Depends on: 1174972
Depends on: 1219461
Depends on: 1223222
Depends on: 1233881
Depends on: 1297323
Depends on: 1334042
Comment on attachment 798046 [details] [diff] [review]
WIP rewrite of part 1 (the main event): Add the One Logger.

We've diverged pretty far from this patch at this point and moved on to tracking improvements in blocking bugs.
Attachment #798046 - Attachment is obsolete: true
Moving this over to a meta bug.
Assignee: erahm → nobody
Summary: Improve Gecko's logging story → [meta] Improve Gecko's logging story
Depends on: 1259807
Depends on: 1617441
Severity: normal → S3
Depends on: about_logging
Depends on: 1828100, 1828156
Depends on: 1430810
Depends on: 1838086
Depends on: 1862989
See Also: → 1881519
See Also: → 1889052
Depends on: 1891789