761564 - Memory leaks in Mozmill Python code which keeps references around and never completely releases the Mozmill object

Assignee

Description

•

12 years ago

Run the appropriate Mutt test and you will notice that Mutt gets stuck in an endless loop. You can only exit via Ctrl+Z.

mutt -b %app% -m tests/python/tests.ini

Result are endless line of:

ERROR | Traceback (most recent call last):
ERROR | Traceback (most recent call last):
ERROR | ERROR | Traceback (most recent call last):
ERROR | ERROR | ERROR | Traceback (most recent call last):
ERROR | Traceback (most recent call last):
ERROR | Traceback (most recent call last):
ERROR | ERROR | Traceback (most recent call last):
ERROR | Traceback (most recent call last):
ERROR | ERROR | Traceback (most recent call last):
ERROR | ERROR | ERROR | Traceback (most recent call last):
ERROR | ERROR | ERROR | ERROR | Traceback (most recent call last):
ERROR | ERROR | ERROR | ERROR | ERROR | Traceback (most recent call last):
ERROR | ERROR | ERROR | ERROR | ERROR | ERROR | Traceback (most recent call last):
ERROR | ERROR | ERROR | ERROR | ERROR | ERROR | ERROR | Traceback (most recent call last):

This is a basic test which tests Mozmill so I feel it should be fixed for Mozmill 2.0.

cmtalbert

Comment 1

•

12 years ago

If this is the mutt framework, then punt. But if this is the test itself, then please update it (which you'll need to do when you fix the CLI issue for 2.0)

Whiteboard: [mozmill-2.0?] → [mozmill-2.0-][mozmill-next]

Jeff Hammel

Comment 2

•

12 years ago

I also feel this should be fixed ASAP.  If we're going to maintain mozmill mutt tests, having them work is imperative.  If we're not, we should figure out (also ASAP) what we're going to do to ensure things don't break.

Jeff Hammel

Updated

•

12 years ago

Blocks: 698769

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 3

•

12 years ago

I will have a look into tomorrow. It was intermittent for me, so lets see if I can find the cause and fix it appropriately.

Assignee: nobody → hskupin

Status: NEW → ASSIGNED

OS: Mac OS X → All

Hardware: x86 → All

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 4

•

12 years ago

As discussed yesterday lets bring this back into the mozmill-2.0 schedule. It's important enough to have fixed to be able to run tests.

As what I have seen this infinite traceback loop is not related to the way how we call mozmill (CLI vs. API) but in analyzing the results.

The command line I have performed was:

mutt -b /Applications/Firefox/Nightly.app/ -m mutt/mutt/tests/python/tests.ini

After the python tests have been finished Mutt tried to run the JS tests where none have been set. This could be the cause. Here the output:

FAILED (errors=6)
INFO | Running JS Tests
INFO | 
ERROR | Traceback (most recent call last):

Summary: Mutt Python test usemozmill.py fails with infinite loop in handling traceback → Mutt Python tests cause infinite loop in handling traceback

Whiteboard: [mozmill-2.0-][mozmill-next] → [mozmill-2.0?]

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 5

•

12 years ago

So this actually happens whenever a Python test makes use of a logger. We overwrite the stdout and stderr streams with our own handlers but never unregister those again. Printing out the final results will then cause this infinite loop because we are still trying to log the output to a handler which doesn't exist anymore. An exception is thrown and it will be logged again.

So we should really reset sys.stdout and sys.stderr to their original value. Not sure when we should do that. Also should the consumer be reliable for it or can we do it inside the logger class? At least we can't have a __del__ method because it will never be called given existent references to the output handlers.

Jeff, given that you have written all the logger code what would you propose to do?

Summary: Mutt Python tests cause infinite loop in handling traceback → Mutt Python tests cause infinite loop in handling traceback when loggers are used

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 6

•

12 years ago

As the latest investigation shows it is a real problem with the logging module in Mozmill and we were only able to cover that because we make use of the API now. We probably wouldn't have caught it by using the CLI. So I think it was a good move!

Jeff Hammel

Comment 7

•

12 years ago

So confirmed: this is only a problem when running the python tests and JS tests.

Jeff Hammel

Comment 8

•

12 years ago

(In reply to Henrik Skupin (:whimboo) from comment #6)
> As the latest investigation shows it is a real problem with the logging
> module in Mozmill and we were only able to cover that because we make use of
> the API now. We probably wouldn't have caught it by using the CLI. So I
> think it was a good move!

I'm not looking to spawn an argument but I find this logic a bit flawed.  I am glad that we found this bug.  I am not glad that finding == 'Not only are all tests broken but the test harness hangs forever so running tests is impossible'. If we want to have the logging functionality tested, the appropriate way to do so is to write a test for it, not to have the test harness broken.  So no, I can't agree with your enthusiasm that this is a good move.

Jeff Hammel

Comment 9

•

12 years ago

Attached patch WIP (obsolete) — Details — Splinter Review

A basic approach.  Please don't assign me this bug as I don't know when I will be able to get back to it

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 10

•

12 years ago

Comment on attachment 649877 [details] [diff] [review]
WIP

>+    def __del__(self):
>+        # replace sys stdout, stderr handles
>+        self.reset_streams()
>+

So this doesn't answer my question I have raised how it should be fixed for consumers. Keep in mind that the destructor will not be called until all references to other objects are killed. Which means you would have to call reset_streams() manually somewhere in the code of the consumer.

The approach you have here I also have implemented yesterday in my local repository, but simply miss the right behavior when we want to reset the streams.

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 11

•

12 years ago

As I have discovered now while continuing working on our automation scripts, this issue blocks our further work because we have to create the Mozmill object multiple times for update and addon tests.

Summary: Mutt Python tests cause infinite loop in handling traceback when loggers are used → Infinite loop in handling traceback because registered loggers are not released correctly

Jeff Hammel

Comment 12

•

12 years ago

I can think of 3 ways to fix this:

1. Have code in Mozmill that particular checks for the LoggingHandler and resets these streams on either stop or exception.

2. Add a cleanup() method to all handlers and have code in Mozmill that call cleanup on either stop or exception (etc).

3. Remove the patching of sys.stdout and sys.stderr.  This will require changing or ignoring all code that expects logging to these channels.

FWIW, I'm in favor of 2. Also FWIW I think all three of them will be hard to do correctly.

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 13

•

12 years ago

How would this affect logger instances which also log content before and after mozmill is active? Right now we instantiate the logger before the mozmill object is getting created. That means we already route output to our own stream handlers. Not sure if consumers also want to have this feature for clean-up tasks. So I wonder if Mozmill should take care of it or the consumers themselves. I would prefer Mozmill simply to the fact that it can cause an infinite loop. Shouldn't it then be possible to reset the stream handlers in the finish() method for each handler?

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 14

•

12 years ago

(In reply to Jeff Hammel [:jhammel] from comment #12)
> 3. Remove the patching of sys.stdout and sys.stderr.  This will require
> changing or ignoring all code that expects logging to these channels.

Out of curiosity, which code will that be? I'm not familiar with that code yet and would have to dive in first. So some pre-information would be kinda appreciated.

Jeff Hammel

Comment 15

•

12 years ago

> Shouldn't it then be possible to reset the stream handlers in the finish() method for each handler?

Yes, this is possible though the method is called .stop() not .finish(). https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/__init__.py#L73  

However, this ignores exceptions and is probably a reason that my patch does not fix this problem. We will have to ensure that any[*] place an exception is caught that .stop() is called correctly or this will remain a problem.

----
[*] and i do emphasize 'any'

Jeff Hammel

Comment 16

•

12 years ago

(In reply to Henrik Skupin (:whimboo) from comment #14)
> (In reply to Jeff Hammel [:jhammel] from comment #12)
> > 3. Remove the patching of sys.stdout and sys.stderr.  This will require
> > changing or ignoring all code that expects logging to these channels.
> 
> Out of curiosity, which code will that be? I'm not familiar with that code
> yet and would have to dive in first. So some pre-information would be kinda
> appreciated.

We currently log output from e.g. mozprocess, mozrunner, mozprofile, etc in the mozmill logs, let alone Firefox.  If we remove patching these streams, then all of this will be affected.

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 17

•

12 years ago

Attached file Patch (remove logger from usemozmill.py) [checked-in] — Details

Pointer to Github pull-request

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 18

•

12 years ago

Comment on attachment 650495 [details]
Patch (remove logger from usemozmill.py) [checked-in]

There is really no need to use the logger in usemozmill.py. So lets remove its usage which fixes at least the mutt testrun.

Lets see if I can find a fix for the core issue today.

Attachment #650495 - Attachment description: Pointer to Github pull request: https://github.com/mozilla/mozmill/pull/86 → Patch (remove logger from usemozmill.py)

Attachment #650495 - Flags: review?(jhammel)

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 19

•

12 years ago

So the problem here seems to be the Mozmill class. When I do not pass in the logger into the create factory, an added __del__ destructor for the logger gets executed and we can reset both output streams. But when passing in the logger into Mozmill, we somewhere keep a reference of the logger instance and therefor the destructor never gets called.

I haven't investigated the Mozmill class yet, but wanted to give an update. Could be easier as we thought.

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 20

•

12 years ago

So this is most likely to happen due to a couple of memory leaks we have at least in Mozmill whenever you create an instance of Mozmill the refcount for it is 7 immediately. It is majorly related to the callback listeners we are registering and never releasing. I will step through all of those references now and figure out how to nail this down so our destructors are really getting called.

Keep in mind that right now even the Mozmill destructor is not getting called! I wonder how much comes via Mozrunner. But lets see.

Summary: Infinite loop in handling traceback because registered loggers are not released correctly → Memory leaks in Mozmill Python code which keeps references around and never completely releases the Mozmill object

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 21

•

12 years ago

Attached file WIP (obsolete) — Details

Pointer to Github pull-request

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 22

•

12 years ago

Comment on attachment 650838 [details]
WIP

So this WIP cleans-up all the left-over references we had on various places in the Mozmill class. It also changes the API by moving the finish() method from the TestResults class to the Mozmill class. That way we can make sure to safely clean-up all internal members so they can be destroyed by the garbage collector.

But the most important part here was that we never have reset the back_channel and bridge instances. In create_network() we created new instances for both each time when start_runner() was called, whereby the old ones with assigned listeners never got deleted.

Keep in mind that I do not have updated all the other tests yet. Largely because I want to have feedback on that change first.

Attachment #650838 - Attachment description: Pointer to Github pull request: https://github.com/mozilla/mozmill/pull/87 → WIP

Attachment #650838 - Flags: feedback?(jhammel)

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Updated

•

12 years ago

Attachment #649877 - Attachment is obsolete: true

Jeff Hammel

Comment 23

•

12 years ago

Comment on attachment 650495 [details]
Patch (remove logger from usemozmill.py) [checked-in]

I fear a little that we're fixing the symptom and not the cause here, but this is fine for now.  We should have tests for the logging listener (as well as handles in general) at some point, though there's no reason to do it in this particular file.

Attachment #650495 - Flags: review?(jhammel) → review+

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 24

•

12 years ago

(In reply to Jeff Hammel [:jhammel] from comment #23)
> I fear a little that we're fixing the symptom and not the cause here, but

Can you please explain, what you mean? We do not fix the symptoms only. We really make sure that the destructors are getting called for any of our created objects. That way we also reset the stdout/stderr streams. Probably I should have done the whole cleanup in a separate bug. But anyway please tell me what in your opinion the cause is.

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 25

•

12 years ago

Please scratch my last comment. I mixed the review up with the feedback request for the real fix. This one will contain tests once ready to ensure it works.

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 26

•

12 years ago

Comment on attachment 650495 [details]
Patch (remove logger from usemozmill.py) [checked-in]

Pushed as:
https://github.com/mozilla/mozmill/commit/8dd5c6931e2ed311cf8e20265a7a3929481e2552

Attachment #650495 - Attachment description: Patch (remove logger from usemozmill.py) → Patch (remove logger from usemozmill.py) [checked-in]

Jeff Hammel

Comment 27

•

12 years ago

(In reply to Henrik Skupin (:whimboo) from comment #19)
> So the problem here seems to be the Mozmill class. When I do not pass in the
> logger into the create factory, an added __del__ destructor for the logger
> gets executed and we can reset both output streams. But when passing in the
> logger into Mozmill, we somewhere keep a reference of the logger instance
> and therefor the destructor never gets called.
> 
> I haven't investigated the Mozmill class yet, but wanted to give an update.
> Could be easier as we thought.

In general, one shouldn't expect __del__ to be called any particular time in python.  While it is considered (by some) good practice to have __del__ point to a cleanup function, in general if you want cleanup to happen, you have to make it happen explicitly.

Jeff Hammel

Comment 28

•

12 years ago

> Keep in mind that right now even the Mozmill destructor is not getting called! I wonder how much comes via Mozrunner. But lets see.

Not sure what you mean here or how this is related?

Jeff Hammel

Comment 29

•

12 years ago

Comment on attachment 650838 [details]
WIP

This is a major API change. My general feeling is that impactful API
changes should be considered as more than just bug fixes.

The way that things work now, or were designed to work anyway, is that
you may want to run multiple mozmill instances.  Results should
persist across instances (that is, results should be additive).
If that is the desired use-case, the results object should be
instantiated before mozmill runs and the final reporting and shutdown
should be done after the last mozmill invocation.  This was done to
make mozmill not a one-stop runner and to be able to accumulate
results across runs.

I'm not crazy about this architecture but, at least at the time,
perceived the need to be able to run multiple instances of mozmill and
aggregate this into one results class and I think the above
architecture makes sense for doing this.  If we want to keep this
architecture, I am convinced the bug can be fixed without this
architectural and interface change.

If we want one set of results per mozmill instance, then I think there
is more to think about here.  We should not allow passing results into
the Mozmill class: it should just be instantiated there.  Since the
handlers are instantiated outside of the MozMill class and passed into
its constructor (unless we want to change that too), their API should
probably be changed so that only instantiation happens in the ctor and
other starting logic happens in e.g. ``start``.  Using the
LoggingHandler as an example, we currently patch the streams in __init__:

https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/logger.py

If we're going with "finish is done in Mozmill", this should go in its
own method.

The documentation will also need to be updated, e.g.

https://github.com/mozilla/mozmill/blob/master/mozmill/docs/architecture.txt

I feel like you're being overly concerned with python garbage
collection.  Really, the correct way of ensuring when state gets set
(e.g. reseting sys.stdout, sys.stderr) is to explicitly call a method
that does so, not relying on a reference count going to zero and
waiting for __del__ to be called.  I also wouldn't call this a memory
leak.  It is a "reference leak", although the general python attitude
about such things is not to worry about them until they're a problem.
I know of very few python programs that verbosely try to ensure that
they're cleaning up unused references and those that do are usually
because they absolutely need to or because they're doing something
that actually involves reference counting or executation tracing.  I'm
not worried at this point in time about the memory consumption of
mozmill. While it is better to clean up our references, I don't think
we should make functionality depend on this or on having __del__ be
called. There are ways to fix this bug other than that.  Similarly, I
find the test unnecessary.  I don't see what testing the reference
counting in mozmill achieves in production.  It is not testing API.
It is testing an an internal implementation.

TL;DR:
Moving results .stop events to be called from Mozmill vs externally
will tie one mozmill instance to one results instance and will be a
API and architecture change that should be carefully considered.  If
we do decide to do this, this patch isn't sufficient as is (I realize
its a WIP, but wanted to be explicit).  Not really related,
functionality, IMO, shouldn't rely on reference counting to go to zero
to ensure that __del__ is called.  Instead, methods should be called
explicitly to do so.

Attachment #650838 - Flags: feedback?(jhammel) → feedback-

Jeff Hammel

Comment 30

•

12 years ago

Thinking about it further, if it is desired to have one results set per mozmill instance, this could reduce code complexity.  The API will look something like

m = MozMill(...) # or MozMill.create(), instantiate the object
m.run(...)  # run some tests
m.run(...)  # you can keep running tests until you call finish
m.finish()  # do final reporting (handler.stop())

Once you call finish, in this incarnation, you can't really use the mozmill instance anymore.  I'm not at all sure this is good, either implementation wise or architecturally, but if it is what we want we could move to this model.

FWIW, I'm not sure which is "better": the current architecture or the new (and then, "better" for what purpose?). They both have their shortcomings and strengths.  The current architecture is more flexible, as you don't have a results instance per mozmill instance.

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Updated

•

12 years ago

Blocks: 764643

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Updated

•

12 years ago

Whiteboard: [mozmill-2.0?] → [mozmill-2.0+]

Jeff Hammel

Comment 31

•

12 years ago

Attached patch example — Details — Splinter Review

This is similar to attachment 650838 [details] . This patch moves one result instance / mozmill instance as well as no longer logs stdout/stderr.

I was originally intending to do more but I realize that a lot of this comes down to priorities.

Currently (and as has always been the case), results are tailored for a fairly static mozmill environment.  This is mostly because we gather one instance of appinfo per results instance.  This will be inaccurate if we install any extensions or upgrade e.g. the browser. My patch does not fix this or make this better.  I worry a lot more about this in conjunction with "should mozmill be a one-shot runner or not" vs anything else.  We need to know when these things change.

One thing that having one results instance to one mozmill instance *does* fix is what handler shutdown methods should be called.  Currently this is a very loose architecture:

- you instantiate some handlers (including maybe a Results instance)
- you pass them into one or more mozmill instances
- you call the results instance .finish() method with the handlers you want to have their stop method called.  E.g. for reporting.

Some would say this is a good architecture.  Its certainly flexible.  OTOH, it is in theory easy to get wrong as you need to know what handlers you want to call stop() on.  Philosophical arguments aside, in practice this hasn't been a problem since we use one mozmill instance and a simple set of handlers (and presumedly if we used multiple mozmill instances each would get the same set of handlers).

Now, what do we lose by making mozmill (again) a one-shot runner?  Probably the most important thing is that since we pass the runner in, which contains implicitly the profile (and prefs and extensions), this can't really be changed in this model.  Think about carefully if we are fine with this.  

One thing this patch doesn't do is any sort of testing for one-shottedness.  That is, we don't reset the mozmill instance on finish().  We don't err out if you call run again even though we should.  If we do go with this approach, one of two things should be done IMHO:

1. Make __init__ idempotent.  In this model, .finish() calls __init__ which completely resets the class.

2. Mark all methods that aren't allowed after .finish() is called with a decorator that will raise an error if .finish() has already been called.

I'm in favor of 1, though that could be confusing for people.  Or maybe not.

Similarly, I think we probably *should* report the preferences and other environment as managed by mozrunner in the report.  We should likewise report when they change, etc, as we should appinfo.  This is going to be an issue regardless of whether or not we make mozmill a one-shot runner or not since tests can change preferences, etc.

So that is all issue A: whether to tie one results instance to each mozmill instance.  The short takeaway: in both cases we're doing it wrong now.  Since tests can drastically change the application under test and its environment, and since we don't record when this is done, we can't claim that e.g. reporting appinfo is right.  I'm happy to talk more about this at the Monday meeting or what not, but there are a lot of subtle issues here.  Personally, I would much rather get a really solid basis for what we're doing right now than add features to mozmill like the ability to run in parallel.  Every time we add a feature like this, it ends up causing problems later since we generally don't lay the groundwork first.  Let's reverse that trend.

The second issue is logging stdout/stderr.  There are a couple of questions here:

- do we actually care about this for mozmill right now?
- if we do, what priority is this?

IMHO, mozmill should move to mozlog and we should enhance mozlog to do this.  At some point, preferably yesterday, we will need multiprocess logging of stdout/stderr.  So far this hasn't been a priority for anyone.  ABICT, logging is only used for screen reading of human beings currently, so I am tempted to say just kill this for now, move to mozlog, fix mozlog, and have someone work on these issues there.  multiprocess logging is a "fun" problem with lots of race conditions and contingencies.  We will need it, but as I understand priorities, we don't need it now.

Attachment #661286 - Flags: feedback?(ctalbert)

Jeff Hammel

Comment 32

•

12 years ago

-        self.format = format
-        self.debug = debug

Of course I didn't mean to take out these lines.  This patch is an example anyway, not a ready-to-commit solution

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 33

•

12 years ago

Comment on attachment 661286 [details] [diff] [review]
example

I'm on PTO today but have seen you made an update on this bug. I want to mention that I'm still working on a patch, mostly yesterday. It contains most of your suggested changes and way more regarding weak references we have to use. So I would be happy from feedback from Clint. I hope that we can combine both patches to a final one.

Jeff, the removal of the stdout/stderr code will not affect the way how we display the output on the screen, right? We still see everything?

Jeff Hammel

Comment 34

•

12 years ago

> Jeff, the removal of the stdout/stderr code will not affect the way how we display the output > on the screen, right? We still see everything?

The only difference is that stdout will not be prefixed by

INFO | some output

and stderr will not be prefixed by

WARNING | some stderr output

cmtalbert

Comment 35

•

12 years ago

Looking at comment 31, and thinking about the other test harnesses, I really don't see any benefit to continuing down the path where we allow mozmill to be called in parallel. That combined with the focus issues we can't seem to get the developers to solve within Gecko itself just strikes me as way too much work for way too little gain.

I'm in favor of removing this ability entirely and going back to the one shot runner. After looking in more detail about python memory management, I think that's a symptom of the issue of how we're doing things and not at all the root cause of what's happening here.

I think the fact that we check the appinfo only once is not an issue either. That was only meant to be done for the reporting system so that the generated reports can understand what environment the test was initially run in. It was never intended to be some kind of mechanism to keep track of the changes to that environment that a test performs. If it were doing the latter, then it would need a complete overhaul also. It's entirely possible to construct the "final" state of the environment given the initial environment and the set of tests run. We currently leave that determination up to the human to perform, but I think for our purposes here that's entirely fine.

I think both patches are moving toward the same goal, namely: making mozmill a one shot runner. And second, ensuring mozmill.finish() closes down all our handlers. I don't think there's as much disagreement here as either of you think there is, at least that's the notion I get from looking at the patches.

So, in the interests of solving this issue and moving to a less complex test runner:
* Can we return to a one-shot test runner
* Leave appinfo as it stands - a report of the initial test run environment.
* Use mozlog for logging output rather than a mozmill one-off logging class? (would this actually help?)
* Ensure finish() cleans up our handlers
* Document the architecture change
* Change the mozmill-automation pieces to work this way using a one-shot API (though I'm assuming they never updated to use the parallel API so maybe this is already done).

cmtalbert

Comment 36

•

12 years ago

Comment on attachment 661286 [details] [diff] [review]
example

In general I like the approach. But the lack of INFO/WARNING headers on our log output will break all our log parsers.  Also, I don't see any return of the results from the mozmill.finish() call, that looks like an omission since you included that in your comment.

But f- for those two things. I do like the overall approach and I think that it is a step toward converging on a solution here.

Attachment #661286 - Flags: feedback?(ctalbert) → feedback-

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 37

•

12 years ago

Thanks Clint! As mentioned earlier I was working on the memory related stuff last Thursday but wasn't able to finish up due to its complexity. I hope that I have something later for you both. This patch will not care about the logging misbehavior (infinite loop), but would fix it by a correct handling of references.

The only thing you should never do is:

logger = LoggerListener(log_file=log_file,
                        file_level="DEBUG", debug=True)
m = mozmill.MozMill.create(handlers=(logger,))
logger = LoggerListener(log_file=log_file,
                        file_level="DEBUG", debug=True)

In such a case the formerly created instance of LoggerListener will still be around when you are trying to create the second instance. Exactly this will cause an infinite loop too.

I will still try to find a better solution for that, because I have to second Clint that I don't want to miss the output of the test states on stdout/stderr.

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 38

•

12 years ago

Attached file Patch v1 (reference handling) — Details

After reading a lot of memory related articles in the web I was able to rework my patch. As Jeff pointed out the WIP wasn't that good because I didn't know the root cause at this point. But know its clear that in our case we have had circular references due to strong references between the Mozmill and handler classes. Those were causing the GC not to free the memory. By using weak references and making sure that strong references are getting reset if not needed anymore, all objects can now be safely destroyed if their scope is left.

Keep in mind that this patch does not change the behavior of the stdout and stderr output handling. I should better file a new bug on it, given that this bug drifted away a bit now, which is clearly my fault. Sorry for that.

So what's the change:

1. As discussed earlier we changed Mozmill so that the results class will be initialized inside of Mozmill and no results can be passed around between multiple Mozmill instances.

2. Handlers supplied to the Mozmill class and internal handlers (TestResults) are now stored in the cache via a weak reference. A strong reference only exists to objects which are alive longer than the Mozmill class. For internal handlers we do correctly reset the reference in finish().

3. We clear the listener and handler caches when finishing the Mozmill results. That will ensure that ref counts are decreased. 

4. The back_channel and bridge instances were not freed up correctly for the last iteration. By setting those to None in stop() ensures that resources are cleaned-up.

5. The CLI class was using a list for the handlers, while the Mozmill.create() factory method expects a tuple.

6. For safety we reset the stdout and stderr output handlers to the default ones once the logger gets destroyed. Again, the real fix will be taken care of in another bug.

7. Docs and tests have been updated to match the API change.

8. A new test has been added which checks the ref count for the mozmill, logger, and results instances. With those three checks we can make sure to not leak memory for those classes.

Attachment #650838 - Attachment is obsolete: true

Attachment #662548 - Flags: review?(jhammel)

Attachment #662548 - Flags: feedback?(ctalbert)

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Updated

•

12 years ago

Blocks: 792412

Jeff Hammel

Comment 39

•

12 years ago

> 5. The CLI class was using a list for the handlers, while the Mozmill.create() factory method expects a tuple.

MozMill.create() and MozMill.__init__() expect an iterable, which can be a tuple or a list or other:

https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/__init__.py#L197

As long as list.extend() can handle the type, we don't care what it is.  A (empty) tuple is used as the default because function argument defaults should not be mutable.

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 40

•

12 years ago

Got it. Thanks Jeff. Haven't known that this is possible. So I pushed another commit to the pull request.

Jeff Hammel

Comment 41

•

12 years ago

Comment on attachment 662548 [details]
Patch v1 (reference handling)

So I'm probably not the person to review this.  I don't think weakref and counting on .e.g. __del__ is the answer here.  I think that handler cleanup should be done in .stop(), or if there are cases where that won't be called (e.g. exceptions), then there should be an idempotent .cleanup() method that should be called by .stop() and possibly by __del__ .  I'm going to give an r-, but I don't really have time to argue about this right now.  I don't think that depending on __del__ or reference counting is a good pattern.  Feel free to get someone else to review if you believe this to be contentious.

(I'll also note that I wrote out a much lengthier review last night, but the code change so much since then it is no longer applicable.  This is one thing I don't like about pull requests for review.  the logger handler __del__ method is gone, probably moved to another bug, and test_references.py is back, which again I don't think is what we should be concentrating on.  We are not testing python's garbage collection.)

Attachment #662548 - Flags: review?(jhammel) → review-

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 42

•

12 years ago

Comment on attachment 662548 [details]
Patch v1 (reference handling)

(In reply to Jeff Hammel [:jhammel] from comment #41)
> So I'm probably not the person to review this.  I don't think weakref and
> counting on .e.g. __del__ is the answer here.

The latest code doesn't rely on any __del__ method anymore. Not exactly sure which version of the pull you was looking at. The one which was in for the LoggerListener is not necessary anymore given my patch on bug 792412, which handles that in a way cleaner way.

> I think that handler cleanup
> should be done in .stop(), or if there are cases where that won't be called
> (e.g. exceptions), then there should be an idempotent .cleanup() method that
> should be called by .stop() and possibly by __del__ .  I'm going to give an

Which clean-up for handlers you are referring here? Is it the following?

https://github.com/mozilla/mozmill/pull/99/files#L1R447

The handlers don't know that Mozmill sets a property, so I don't think we should let the handler do that in the stop method. Another reason is that there would still be circular references around for those handlers which do not have a stop() method. In __init__ we set the mozmill property on all the handlers while in finish() we only call stop() for those which have such a method.

> think that depending on __del__ or reference counting is a good pattern. 
> Feel free to get someone else to review if you believe this to be
> contentious.

Ref counting is important here to ensure that we do not build up the above mentioned cyclic references. If you want to integrate the Mozmill API into a daemon like process, do you want that it grabs all the memory over time only because not used Mozmill instances can't be garbage collected?

But you are probably right with the one weakref instance I'm making use of. Given that we reset the handler cache in finish() now we could even store strong references in the cache.

> test_references.py is back, which again I don't think is what we should be
> concentrating on.  We are not testing python's garbage collection.)

This test ensures that we do not have cyclic references which are not unbound before the instance should be removed, e.g. by leaving the current scope the instance was defined in. I have chosen results, the logger listener, and Mozmill because they are the default handlers.

So I will flip the review to Clint then.

Attachment #662548 - Attachment mime type: text/plain → text/html

Attachment #662548 - Flags: feedback?(ctalbert) → review?(ctalbert)

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 43

•

12 years ago

So I made some further checks and as what I have mentioned in my last review we are good to remove the weak references. But only because we clean-up everything now in the Mozmill.finish() method.

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Updated

•

12 years ago

Severity: critical → normal

cmtalbert

Comment 44

•

12 years ago

Comment on attachment 662548 [details]
Patch v1 (reference handling)

Ok, this version looks great. I made two comments on the pull request - one to fix a comment that still indicated that we used a weakref, and another to ensure that we always call .close() on the bridge and the backchannel objects; I think it's quite important that we ensure the asyncore object (what Bridge/Backchnnel inherit from) has its close method called.  

Otherwise, I think we have properly solved this issue, and I think we can count this as done.  Good work both Jeff and Henrik for pressing forward to a good solution in the end.

Attachment #662548 - Flags: review?(ctalbert) → review+

Henrik Skupin [:whimboo][⌚️UTC+1]

Assignee

Comment 45

•

12 years ago

Thanks Clint. I have fixed all the remaining issues in another commit.

Landed on master:
https://github.com/mozilla/mozmill/commit/e341328345688a33e2bd79b669d1a488b35a97b0

Status: ASSIGNED → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Nobody; OK to take it and work on it

Updated

•

8 years ago

Product: Testing → Testing Graveyard

WIP 12 years ago Jeff Hammel 1.51 KB, patch		Details \| Diff \| Splinter Review
Patch (remove logger from usemozmill.py) [checked-in] 12 years ago Henrik Skupin [:whimboo][⌚️UTC+1] 346 bytes, text/html	k0scist : review+	Details
WIP 12 years ago Henrik Skupin [:whimboo][⌚️UTC+1] 346 bytes, text/html	k0scist : feedback-	Details
example 12 years ago Jeff Hammel 5.28 KB, patch	cmtalbert : feedback-	Details \| Diff \| Splinter Review
Patch v1 (reference handling) 12 years ago Henrik Skupin [:whimboo][⌚️UTC+1] 378 bytes, text/html	k0scist : review- cmtalbert : review+	Details