Closed Bug 1244306 Opened 8 years ago Closed 8 years ago

Use circular buffers for mozilla logging file

Categories

(Core :: XPCOM, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla51
Tracking Status
firefox51 --- fixed

People

(Reporter: jaws, Assigned: mayhemer)

References

Details

(Keywords: dev-doc-complete)

Attachments

(1 file, 5 obsolete files)

As discussed in https://bugzilla.mozilla.org/show_bug.cgi?id=1239686#c10, networking logs can grow to unbounded size. We should limit these logs from overtaking too much harddrive space and also making the data harder to analyze.

One approach would be to use a circular buffer and restrict the size to some pref'd limit.
Whiteboard: [necko-would-take]
Moving to the right component.  This must happen inside xpcom/logging.
Component: Networking → XPCOM
Whiteboard: [necko-would-take]
Summary: Use circular buffers for networking logs → Use circular buffers for mozilla logging file
How should this work in the presence of multiple active logging modules?  Or is the expectation that we'll just hold the most recent N log messages in memory and flush them to disk at application exit or similar?
Flags: needinfo?(honzab.moz)
(In reply to Nathan Froyd [:froydnj] from comment #2)
> How should this work in the presence of multiple active logging modules?

It would be unrelated.

>  Or
> is the expectation that we'll just hold the most recent N log messages in
> memory and flush them to disk at application exit or similar?

No memory, no.  I have two ideas:

- have two files, we will write alternately to one or other with just some limit ; files will overwrite

- have a single file and when it reaches the limit, we simply start from the beginning, but not overwriting it ; simply write over the already written lines, this would mean to always use timestamp to find the beginning and probably also always append some trailing blank lines after the last written line.

Like:

...
2016-06-06 11:25:50.540000 UTC - [Main Thread]: D/nsHttp nsHttpAuthCache::ClearAll
2016-06-06 11:25:50.540000 UTC - [Main Thread]: D/nsHttp nsHttpAuthCache::ClearAll
2016-06-06 11:25:50.543000 UTC - [Main Thread]: D/cache2 CacheFileIOManager::Shutdown() [gInstance=0]

[[ LOG BEGIN ]]

che2 CacheFileIOManager::Init()
2016-06-06 11:24:42.493000 UTC - [Main Thread]: D/cache2 CacheFileHandles::CacheFileHandles() [this=c1f1f50]
2016-06-06 11:24:42.493000 UTC - [Main Thread]: D/cache2 CacheFileIOManager::CacheFileIOManager [this=c1f1f20]
...


the "\n\n[[ LOG BEGIN ]]\n\n" part would be written after every line we write to the log ; on every following line, we'd seek a bit back to simply overwrite it in file.  some small additional IO, but I think bearable and the result is the same set of files we have now.
Flags: needinfo?(honzab.moz)
(In reply to Honza Bambas (:mayhemer) from comment #3)
> the "\n\n[[ LOG BEGIN ]]\n\n" part would be written after every line we
> write to the log ; on every following line, we'd seek a bit back to simply
> overwrite it in file.  some small additional IO, but I think bearable and
> the result is the same set of files we have now.

Unfortunately doesn't work.  I forgot LogModuleManager::Print is not thread safe enough to do seek/write/write.  The method does only one write that is atomic by crt.  One option would be to add a lock.
I'm not really sure this is the patch we should land, but I'd like to get feedback anyway.

Other option would be to have an array of files with limited size (5MB?) like my.log.0001, my.log.0002 etc.. and we would simply delete the oldest files automatically.
Attachment #8762103 - Flags: feedback?(erahm)
Comment on attachment 8762103 [details] [diff] [review]
v1 (circulate one file, added lock, file is binary)

Review of attachment 8762103 [details] [diff] [review]:
-----------------------------------------------------------------

If we want to do this I'd prefer we go with the logrotate option.

Other points to address:
  - Should the log file size limit apply to the sum of all logs (content processes included) or should it be per log?
  - Should we consider cleaning up old logs that included the process PID? We might want to break out a follow up for this.

::: xpcom/base/Logging.cpp
@@ +232,5 @@
>      FILE* out = mOutFile ? mOutFile : stderr;
>  
> +    static const char kCircularTrailer[] = "\n[[HEAD]]\n\n";
> +    // Will be filled by fprintf result of writing the above string.
> +    static int circularTrailerSeekBack = 0;

This could just be a constant.

@@ +234,5 @@
> +    static const char kCircularTrailer[] = "\n[[HEAD]]\n\n";
> +    // Will be filled by fprintf result of writing the above string.
> +    static int circularTrailerSeekBack = 0;
> +
> +    OffTheBooksMutexAutoLock guard(mIOLock);

Up to this point we've been able to avoid locking while logging, I'd hate to have to change this.

@@ +299,5 @@
> +
> +    if (mAddTrailer) {
> +      MOZ_ASSERT(out != stderr);
> +      MOZ_ASSERT(mFileSizeLimit);
> +      circularTrailerSeekBack = fprintf(out, kCircularTrailer);

This whole thing is going to get weird if you enable appending. After session A where there was a wrap we'd have something like:

logfile:
  -- newer stuff from A --
  kCircularTrailer
  -- older stuff from A --

Then we have session B which will immediately wrap:

logfile:
  -- stuff from B --
  kCircularTrailer
  -- truncated newer stuff from A --
  kCircularTrailer
  -- older stuff from A --
Attachment #8762103 - Flags: feedback?(erahm)
Assignee: nobody → honzab.moz
Status: NEW → ASSIGNED
Attached patch v2 (rotate 4 files per process) (obsolete) — Splinter Review
- new module "rotate" added, the arg to it (the number after ':') is number of MBs the log per-process must not go over
- we create and rotate (up to) 4 files (the number of files is hardcoded and can be easily changed)
- when log path is /path/moz.log we create /path/moz.log.0, /path/moz.log.1 etc files each having rotate/4 max size
- when all files are exhausted, we start over from 0, rewriting the existing files
- all rotate log files are deleted prior the app start, including non-rotated (see comments in the code why)
- rotate implies timestamp
- rotate doesn't support append
- no new mutex added, using just Atomics and compareExchange


The rest should be explained by comments in the code.


https://treeherder.mozilla.org/#/jobs?repo=try&revision=f859fec900f9
Attachment #8762103 - Attachment is obsolete: true
Attachment #8764620 - Flags: review?(erahm)
Even better tries:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6f13143512df00a799950f3059ac7d630ddcd515 (confirmation of functionality)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3af5bbcc27190cdbe98f005eef6b7bb5b594b3ce (actually builds correctly)

I will update the patch after review (expecting more comments anyway)
Attached patch v2 [building, backup] (obsolete) — Splinter Review
:erahm, ping on review here, this is something we really would use for diagnosing network issues.
(In reply to Honza Bambas (:mayhemer) from comment #11)
> :erahm, ping on review here, this is something we really would use for
> diagnosing network issues.

Taking a look today, sorry for the delay!
Comment on attachment 8764620 [details] [diff] [review]
v2 (rotate 4 files per process)

Review of attachment 8764620 [details] [diff] [review]:
-----------------------------------------------------------------

I think this is heading in the right direction, a few notes I think we should discuss first though.

::: xpcom/base/Logging.cpp
@@ +30,5 @@
>  // NB: Initial amount determined by auditing the codebase for the total amount
>  //     of unique module names and padding up to the next power of 2.
>  const uint32_t kInitialModuleCount = 256;
> +// When rotate option is added to the modules list, this is the hardcoded
> +// number of files we create and rotate.  When there is rotate:40000000,

The |rotate| param is in MB, right? As in the comment should be "rotate:40"

@@ +114,5 @@
> + * release() directly on this class, which happens on two places:
> + *  - when we run out of the one-file size limit
> + *  - when we are shutting down
> + */
> +class LogFile

I think instead of reinventing ref counting we could try using atomic RefCounted [1] and friends.

[1] https://dxr.mozilla.org/mozilla-central/rev/4c05938a64a7fde3ac2d7f4493aee1c5f2ad8a0a/mfbt/RefCounted.h#195

@@ +147,5 @@
> + * Every thread must enclose usage of global LogFile instance
> + * with this stack class.  It ensures the file lives while
> + * any thread is using it (writing to it).
> + */
> +class AutoLogFile MOZ_STACK_CLASS

Then we could get rid of this, and just use RefPtr.

@@ +169,5 @@
> +  operator LogFile*() const { return mFile; }
> +  LogFile* operator->() const { return mFile; }
> +};
> +
> +struct FreePolicy { void operator()(char* p) { free(p); } };

This can go if we use |UniqueFreePtr|.

@@ +195,5 @@
>      //     its destructor.
> +    LogFile* logFile = mOutFile.exchange(nullptr);
> +    if (logFile) {
> +      logFile->release();
> +    }

If |mOutFile| is a |RefPtr| we can get rid of this as well.

@@ +412,5 @@
> +        // number.  When no other thread shifted the global file number yet,
> +        // we are the thread to open the next file.
> +        if (mOutFileNum.compareExchange(file_num, next_file_num)) {
> +          // This is a sequential consistent assignment.
> +          mOutFile = OpenFile(false, next_file_num);

This seems okay, but I think the standard log rotation logic is usually the most recent log file is always 0. So something like:

> if overflow:
>   delete foo.log.1 // just assume kRotateFilesNumber == 1
>   rename foo.log.0 => foo.log.1
>   mOutFile = open(foo.log.0)

That's a bit more involved and I don't feel too strongly about it.

@@ +431,5 @@
> +  // to this member is made before we release() any previously stored objects
> +  // in it.
> +  Atomic<LogFile*, SequentiallyConsistent> mOutFile;
> +  // Just keeps the actual file path for further use.
> +  UniquePtr<char[], FreePolicy> mOutFilePath;

We can use |UniqueFreePtr| here.

::: xpcom/tests/gtest/TestNSPRLogModulesParser.cpp
@@ +56,5 @@
>  
>    for (size_t i = 0; i < MOZ_ARRAY_LENGTH(expected); i++) {
>      bool callbackInvoked = false;
>      mozilla::NSPRLogModulesParser(currTest->first,
> +        [&](const char* aName, mozilla::LogLevel aLevel, int32_t) {

Can you expand this test to check the raw value as well?
Attachment #8764620 - Flags: review?(erahm) → feedback+
(In reply to Eric Rahm [:erahm] from comment #13)
> Comment on attachment 8764620 [details] [diff] [review]
> v2 (rotate 4 files per process)
> 
> Review of attachment 8764620 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I think this is heading in the right direction, a few notes I think we
> should discuss first though.
> 
> ::: xpcom/base/Logging.cpp
> @@ +30,5 @@
> >  // NB: Initial amount determined by auditing the codebase for the total amount
> >  //     of unique module names and padding up to the next power of 2.
> >  const uint32_t kInitialModuleCount = 256;
> > +// When rotate option is added to the modules list, this is the hardcoded
> > +// number of files we create and rotate.  When there is rotate:40000000,
> 
> The |rotate| param is in MB, right? As in the comment should be "rotate:40"
> 
> @@ +114,5 @@
> > + * release() directly on this class, which happens on two places:
> > + *  - when we run out of the one-file size limit
> > + *  - when we are shutting down
> > + */
> > +class LogFile
> 
> I think instead of reinventing ref counting we could try using atomic
> RefCounted [1] and friends.
> 
> [1]
> https://dxr.mozilla.org/mozilla-central/rev/
> 4c05938a64a7fde3ac2d7f4493aee1c5f2ad8a0a/mfbt/RefCounted.h#195

We'd really strongly prefer that new uses of RefCounted<> not be added to the tree.  I led Eric astray on IRC by suggesting RefCounted<> here, since I think the goal is to try to avoid refcount logging...or something?  But using RefCounted<> here wouldn't solve that particular problem, and is strongly discouraged in any event.

If we really need to avoid refcount logging, or leak checking, or whatever, let's be clear about what we need to do and why we need to do it, and then try adapting the existing XPCOM primitives to do that, rather than writing one-off things.
Priority: -- → P3
(In reply to Nathan Froyd [:froydnj] from comment #14)
> > [1]
> > https://dxr.mozilla.org/mozilla-central/rev/
> > 4c05938a64a7fde3ac2d7f4493aee1c5f2ad8a0a/mfbt/RefCounted.h#195
> 
> We'd really strongly prefer that new uses of RefCounted<> not be added to
> the tree.  I led Eric astray on IRC by suggesting RefCounted<> here, since I
> think the goal is to try to avoid refcount logging...or something?  

No, nothing like that.  But lifetime of anything that RefCounted<> may use may be shorter than lifetime of the logging engine, potentially.

(In reply to Eric Rahm [:erahm] from comment #13)
> @@ +195,5 @@
> >      //     its destructor.
> > +    LogFile* logFile = mOutFile.exchange(nullptr);
> > +    if (logFile) {
> > +      logFile->release();
> > +    }
> 
> If |mOutFile| is a |RefPtr| we can get rid of this as well.

I plan to change this anyway, there is a potential race because of RefPtr<T> = mMember not being atomic.  There may be a crash.  My helper classes suffer from that as well.

> 
> @@ +412,5 @@
> > +        // number.  When no other thread shifted the global file number yet,
> > +        // we are the thread to open the next file.
> > +        if (mOutFileNum.compareExchange(file_num, next_file_num)) {
> > +          // This is a sequential consistent assignment.
> > +          mOutFile = OpenFile(false, next_file_num);
> 
> This seems okay, but I think the standard log rotation logic is usually the
> most recent log file is always 0. So something like:
> 
> > if overflow:
> >   delete foo.log.1 // just assume kRotateFilesNumber == 1
> >   rename foo.log.0 => foo.log.1
> >   mOutFile = open(foo.log.0)
> 
> That's a bit more involved and I don't feel too strongly about it.

Even harder to do w/o a lock.  First, you have to delete .3, rename .2 to .3, rename .1 to .2.  Then close .0 and rename .0 to .1, only then you can open a new file .0.  Other threads would have to wait until the last 3 steps are done.

> 
> @@ +431,5 @@
> > +  // to this member is made before we release() any previously stored objects
> > +  // in it.
> > +  Atomic<LogFile*, SequentiallyConsistent> mOutFile;
> > +  // Just keeps the actual file path for further use.
> > +  UniquePtr<char[], FreePolicy> mOutFilePath;
> 
> We can use |UniqueFreePtr| here.

Yep!

> 
> ::: xpcom/tests/gtest/TestNSPRLogModulesParser.cpp
> @@ +56,5 @@
> >  
> >    for (size_t i = 0; i < MOZ_ARRAY_LENGTH(expected); i++) {
> >      bool callbackInvoked = false;
> >      mozilla::NSPRLogModulesParser(currTest->first,
> > +        [&](const char* aName, mozilla::LogLevel aLevel, int32_t) {
> 
> Can you expand this test to check the raw value as well?

Will do.
Attached patch v2.1 (obsolete) — Splinter Review
- logic of the file numbering is unchanged
- I moved away from using any Auto/RefPtr classes for keeping the file alive
- I have now a single atomic counter in LogModuleManager that tracks number of threads that use the current file (actually number of enters to Print())
- when the file size limit is reached, the current file is saved to be released only after the counter drops to zero

Nathan, please check my use of Atomics is correct.  I think it should be.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=45355fb726b4
Attachment #8764620 - Attachment is obsolete: true
Attachment #8767145 - Attachment is obsolete: true
Attachment #8773432 - Flags: review?(erahm)
Attachment #8773432 - Flags: feedback?(nfroyd)
Comment on attachment 8773432 [details] [diff] [review]
v2.1

Review of attachment 8773432 [details] [diff] [review]:
-----------------------------------------------------------------

I think the atomic usage is OK, if a little frightening.  A few minor comments below.

I thought about suggesting that we just open all the log files at once, kind of like comment 5, but I don't know if that would make things any less complex, since you still have to do a dance with atomic values when you need to switch files.

::: xpcom/base/Logging.cpp
@@ +111,5 @@
> + *  - the order number it was created for when rotating (actual path)
> + *  - number of active references
> + *
> + * The reference starts at 1.  To close the file, just call
> + * Release() directly on this class, which happens on two places:

This function doesn't exist in the class?

@@ +250,5 @@
> +    if (mRotate > 0) {
> +      char buf[2048];
> +      snprintf_literal(buf, "%s.%d", mOutFilePath.get(), aFileNum);
> +
> +      // rotate doesn't support append.

Is it possible to notify people about this?

@@ +383,5 @@
> +          // and try to release two files at the same time.  Still rather
> +          // assert that.  If this would fail too often, we may have a linked
> +          // list inside LogFile class that would take all pending files to be
> +          // released along.
> +          MOZ_ASSERT(!mToReleaseFile);

I think it's going to be hard to tell how often this fails, given that we don't run automation tests with logging enabled...

@@ +391,5 @@
> +        }
> +      }
> +    }
> +
> +    if (--mPrintEntryCount == 0 && mToReleaseFile) {

Was going to ask what happens if a thread gets interrupted between these two tests when there's a file to be released, but I see the exchange below takes care of that case.
Attachment #8773432 - Flags: feedback?(nfroyd) → feedback+
(In reply to Nathan Froyd [:froydnj] from comment #17)
> Comment on attachment 8773432 [details] [diff] [review]
> v2.1
> 
> Review of attachment 8773432 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I think the atomic usage is OK, if a little frightening.  A few minor
> comments below.

Thanks for the quick feedback!

> 
> I thought about suggesting that we just open all the log files at once, kind
> of like comment 5, but I don't know if that would make things any less
> complex, since you still have to do a dance with atomic values when you need
> to switch files.

I had a similar idea.  But I think there is a chance we could loose some data when we switch the number on one thread while another thread writes lines to this newly picked file number before we do the truncation on the first thread.  Those lines would be lost.

> 
> ::: xpcom/base/Logging.cpp
> @@ +111,5 @@
> > + *  - the order number it was created for when rotating (actual path)
> > + *  - number of active references
> > + *
> > + * The reference starts at 1.  To close the file, just call
> > + * Release() directly on this class, which happens on two places:
> 
> This function doesn't exist in the class?

I have to update the comment.

> 
> @@ +250,5 @@
> > +    if (mRotate > 0) {
> > +      char buf[2048];
> > +      snprintf_literal(buf, "%s.%d", mOutFilePath.get(), aFileNum);
> > +
> > +      // rotate doesn't support append.
> 
> Is it possible to notify people about this?

NS_WARNING?  That's these days pretty quiet.  Not sure how we could shout this out loud enough.

> 
> @@ +383,5 @@
> > +          // and try to release two files at the same time.  Still rather
> > +          // assert that.  If this would fail too often, we may have a linked
> > +          // list inside LogFile class that would take all pending files to be
> > +          // released along.
> > +          MOZ_ASSERT(!mToReleaseFile);
> 
> I think it's going to be hard to tell how often this fails, given that we
> don't run automation tests with logging enabled...

It's there more to express it's expected to be null.  

Although, I have additional simple patch that releases all files that need to (effectively removing this assertion).  There is a simple linked list on the LogFile class that is deleted along.  It only needs to change mOutFile to be rel/acq, hence a little slower.

I can provide the interdiff for reference.

> 
> @@ +391,5 @@
> > +        }
> > +      }
> > +    }
> > +
> > +    if (--mPrintEntryCount == 0 && mToReleaseFile) {
> 
> Was going to ask what happens if a thread gets interrupted between these two
> tests when there's a file to be released, but I see the exchange below takes
> care of that case.

Yup!  mToReleaseFile is added to the condition just to prevent call to exchange() when null (99.9% cases) which is slow.
Comment on attachment 8773432 [details] [diff] [review]
v2.1

Review of attachment 8773432 [details] [diff] [review]:
-----------------------------------------------------------------

All I really have are super minor comments. I think this could work, but I also think it's rather fragile juggling a few atomics.

As much as it pains me to say, I'd almost prefer just using an OffTheBooksMutex here. So something like:

> AtomicRef<LogFile> mOutFile;
> OffTheBooksMutex mLogFileLock;
> ...
> 
> Print():
>   AtomicRef outFile(mOutFile);
>   ...
>   if needs_rotate(outFile) {
>     lock(mLogFileLock) {
>       if (outFile == mOutFile) {
>         // original LogFile loses a ref
>         mOutFile = rotate();
>       }
>     }
>   }
> 
>   // We drop out of scope, original log file eventually gets closed if we rotated

So in this scenario the rotate number doesn't need to be atomic, we don't need a |toBeReleased| var, we don't need a |printEntryCount| var (that would be handled by the AtomicRefPtr, so yes it's still there, just encapsulated). It's just a bit simpler and easier not to break in the future. Anyhow this is just me thinking about the overall design and maintainability, I can see the Atomics being a perf win (or my off the cuff draft being not quite good enough still).

What do you think? Maybe we could do a perf test with some of the netwerk tests with http logging enabled to see how much of a win all-atomics gets us?

Clearing review for now, but I'd probably r+ after the nits are cleaned up and we talk it over a bit more.

::: xpcom/base/Logging.cpp
@@ +117,5 @@
> + *  - when we are shutting down
> + */
> +class LogFile
> +{
> +  FILE* mFile;

I suppose we could still use |ScopedCloseFile| here.

@@ +121,5 @@
> +  FILE* mFile;
> +  uint32_t mFileNum;
> +
> +public:
> +  LogFile(FILE* file, uint32_t file_num)

nit: param names, aFooBar

@@ +132,5 @@
> +  {
> +    fclose(mFile);
> +  }
> +
> +  FILE* file() const { return mFile; }

nit: UpperCamel() for functions

@@ +196,5 @@
>              addTimestamp = true;
>            } else if (strcmp(aName, "sync") == 0) {
>              isSync = true;
> +          } else if (strcmp(aName, "rotate") == 0) {
> +            rotate = (aValue << 20) / kRotateFilesNumber;

nit: |20| could be a konst as well.

@@ +250,5 @@
> +    if (mRotate > 0) {
> +      char buf[2048];
> +      snprintf_literal(buf, "%s.%d", mOutFilePath.get(), aFileNum);
> +
> +      // rotate doesn't support append.

I think a warning would be fine, but really just putting in the docs would be good enough. Rotating isn't on by default.

@@ +363,5 @@
>        PR_smprintf_free(buffToWrite);
>      }
> +
> +    if (mRotate > 0 && outFile) {
> +      int32_t file_size = ftell(out);

min: style, vars should be lowerCamel
Attachment #8773432 - Flags: review?(erahm) → feedback+
(In reply to Eric Rahm [:erahm] from comment #20)
> Comment on attachment 8773432 [details] [diff] [review]
> v2.1
> 
> Review of attachment 8773432 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> All I really have are super minor comments. I think this could work, but I
> also think it's rather fragile juggling a few atomics.
> 
> As much as it pains me to say, I'd almost prefer just using an
> OffTheBooksMutex here. So something like:
> 
> > AtomicRef<LogFile> mOutFile;
> > OffTheBooksMutex mLogFileLock;
> > ...
> > 
> > Print():
> >   AtomicRef outFile(mOutFile);
> >   ...
> >   if needs_rotate(outFile) {
> >     lock(mLogFileLock) {
> >       if (outFile == mOutFile) {
> >         // original LogFile loses a ref
> >         mOutFile = rotate();
> >       }
> >     }
> >   }
> > 
> >   // We drop out of scope, original log file eventually gets closed if we rotated
> 
> So in this scenario the rotate number doesn't need to be atomic, we don't
> need a |toBeReleased| var, we don't need a |printEntryCount| var (that would
> be handled by the AtomicRefPtr, so yes it's still there, just encapsulated).
> It's just a bit simpler and easier not to break in the future. Anyhow this
> is just me thinking about the overall design and maintainability, I can see
> the Atomics being a perf win (or my off the cuff draft being not quite good
> enough still).
> 
> What do you think? Maybe we could do a perf test with some of the netwerk
> tests with http logging enabled to see how much of a win all-atomics gets us?

I'll do it.  Let's see.  But I'm anyway confident the code is stable enough and in this form has its advantages (must less influence on the calling code and timing - it often happens a problem cannot be reproduced with logging on because too much locking!).

> 
> Clearing review for now, but I'd probably r+ after the nits are cleaned up
> and we talk it over a bit more.
> 
> ::: xpcom/base/Logging.cpp
> @@ +117,5 @@
> > + *  - when we are shutting down
> > + */
> > +class LogFile
> > +{
> > +  FILE* mFile;
> 
> I suppose we could still use |ScopedCloseFile| here.

It works differently.  And my class fits here well.

> @@ +196,5 @@
> >              addTimestamp = true;
> >            } else if (strcmp(aName, "sync") == 0) {
> >              isSync = true;
> > +          } else if (strcmp(aName, "rotate") == 0) {
> > +            rotate = (aValue << 20) / kRotateFilesNumber;
> 
> nit: |20| could be a konst as well.

Hmm... maybe, but I like it be visible here at the first sight how we are converting the argument from the env var to actual internal number we then work with.  But if you insist on having some macros/consts around here, I can do it.

> @@ +363,5 @@
> >        PR_smprintf_free(buffToWrite);
> >      }
> > +
> > +    if (mRotate > 0 && outFile) {
> > +      int32_t file_size = ftell(out);
> 
> min: style, vars should be lowerCamel

(I hope one day we will make our formatting overall more modern one day..)
(In reply to Eric Rahm [:erahm] from comment #20)
> Comment on attachment 8773432 [details] [diff] [review]
> v2.1
> 
> Review of attachment 8773432 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> All I really have are super minor comments. I think this could work, but I
> also think it's rather fragile juggling a few atomics.
> 
> As much as it pains me to say, I'd almost prefer just using an
> OffTheBooksMutex here. So something like:
> 
> > AtomicRef<LogFile> mOutFile;
> > OffTheBooksMutex mLogFileLock;
> > ...
> > 
> > Print():
> >   AtomicRef outFile(mOutFile);

What is |AtomicRef| ?

> >   ...
> >   if needs_rotate(outFile) {
> >     lock(mLogFileLock) {
> >       if (outFile == mOutFile) {
> >         // original LogFile loses a ref
> >         mOutFile = rotate();

And mOutFile is what?  a RefPtr?

> >       }
> >     }
> >   }
> > 
> >   // We drop out of scope, original log file eventually gets closed if we rotated

Honestly, I don't think the way you suggest using the lock would be enough to make sure we are really swapping in an atomic way.  Please also read http://www.janbambas.cz/illusion-atomic-reference-counting/, written during development of this patch ;)
Flags: needinfo?(erahm)
(In reply to Honza Bambas (:mayhemer) from comment #21)
> > What do you think? Maybe we could do a perf test with some of the netwerk
> > tests with http logging enabled to see how much of a win all-atomics gets us?
> 
> I'll do it.  Let's see.  But I'm anyway confident the code is stable enough
> and in this form has its advantages (must less influence on the calling code
> and timing - it often happens a problem cannot be reproduced with logging on
> because too much locking!).


So, I have some results, all on an optimized build, windows 10, SSD, measuring how many writes it's capable in exactly 5 seconds.

My heavy atomicity code:
single thread: 725793 writes
two threads:   748282 writes

A locked code *):
single thread: 730716 writes
two threads:   528947 writes

*) The lock has been added over the whole block that is using mOutFile and anything locally copied from it.  Using a RefPtr doesn't make sense at all w/o a lock anyway.  See reference to my blog post in comment 22.

For fun, I've removed the lock altogether - crashes immediately.

For more fun, I've reintroduced the refcnting for LogFile class, making mOutFile RefPtr and grabbing a local copy to a RefPtr.  I added the lock only to the place where we delete and reassign mOutFile (when we switch to another file) - what you have suggested - crashes too, just takes a longer time to reproduce and is also slower than my atomicity code.  BTW, the crash was pretty odd to analyze, if you wouldn't know what's going on.

I did the same test (run for 120s) with my atomicity code and it didn't crash after four such trials.

So I think we have a winner.
Flags: needinfo?(erahm)
I've just updated according the last review comments and left the code based on using atomics (confirmed be faster and stable).

https://treeherder.mozilla.org/#/jobs?repo=try&revision=61300e6f354f
Attachment #8773432 - Attachment is obsolete: true
Attachment #8773876 - Attachment is obsolete: true
Attachment #8775979 - Flags: review?(erahm)
(In reply to Honza Bambas (:mayhemer) from comment #23)
> (In reply to Honza Bambas (:mayhemer) from comment #21)
> > > What do you think? Maybe we could do a perf test with some of the netwerk
> > > tests with http logging enabled to see how much of a win all-atomics gets us?
> > 
> > I'll do it.  Let's see.  But I'm anyway confident the code is stable enough
> > and in this form has its advantages (must less influence on the calling code
> > and timing - it often happens a problem cannot be reproduced with logging on
> > because too much locking!).
> 
> 
> So, I have some results, all on an optimized build, windows 10, SSD,
> measuring how many writes it's capable in exactly 5 seconds.
> 
> My heavy atomicity code:
> single thread: 725793 writes
> two threads:   748282 writes
> 
> A locked code *):
> single thread: 730716 writes
> two threads:   528947 writes
> 
> *) The lock has been added over the whole block that is using mOutFile and
> anything locally copied from it.  Using a RefPtr doesn't make sense at all
> w/o a lock anyway.  See reference to my blog post in comment 22.
> 
> For fun, I've removed the lock altogether - crashes immediately.
> 
> For more fun, I've reintroduced the refcnting for LogFile class, making
> mOutFile RefPtr and grabbing a local copy to a RefPtr.  I added the lock
> only to the place where we delete and reassign mOutFile (when we switch to
> another file) - what you have suggested - crashes too, just takes a longer
> time to reproduce and is also slower than my atomicity code.  BTW, the crash
> was pretty odd to analyze, if you wouldn't know what's going on.
> 
> I did the same test (run for 120s) with my atomicity code and it didn't
> crash after four such trials.
> 
> So I think we have a winner.

Excellent, thanks for doing the extra legwork.
Comment on attachment 8775979 [details] [diff] [review]
v2.2 (rotate 4 files, no lock code)

Review of attachment 8775979 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, lets finally get this landed! Thanks for taking the time to go through multiple revisions.
Attachment #8775979 - Flags: review?(erahm) → review+
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d4130ffb3209
Rotate mozlog files within a size limit. r=erahm
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/d4130ffb3209
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla51
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: