Closed Bug 1126544 Opened 9 years ago Closed 9 years ago

Reduce hazard analysis memory usage

Categories: Core :: JavaScript: GC (defect)
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla39
Tracking Status: firefox39 --- fixed

People

(Reporter: sfink, Assigned: sfink)

References

Details

Attachments

(2 files)

Forked off from bug 1049290. It's killing slaves more and more often these days and pissing off the sheriffs. Pasting in bhackett's earlier response in bug 1049290 comment 60:

- What has memory usage for xmanager been over time?  When you first started with whole browser builds, do you remember what it was like?

- sixgill tries to keep track of the allocations it performs and you can call PrintAllocs() to get this data.  You could try modifying handle_event in xmanager.cpp to, say, call this function once a minute and see if it indicates where the high memory usage is coming from.

- There should be a -memory-limit option which you can try using.  If we get to high VM usage then xmanager should flush out the entries it keeps in memory for callgraph edges, though I don't remember whether xmanager is even populating these databases.  After the frontend runs, are there body_caller/callee.xdb databases?  What databases and other files do exist after the frontend runs, and how large are they typically?

- I don't think I've tried this before, but the frontend should be able to recover from an interrupted build, if things are shut down cleanly.  When memory usage gets high, you could try killing make, killing xmanager with SIGINT (which tells it to shut down cleanly) and then restarting the build.
(In reply to Steve Fink [:sfink, :s:] from comment #0)
> - What has memory usage for xmanager been over time?  When you first started
> with whole browser builds, do you remember what it was like?

I don't know, but I know it has climbed over time. I dropped the make -j flag down to 4 for the b2g build when it was failing too frequently, then later the same thing happened with the browser build. I could drop it further, but it has little further effect since the bulk of the memory is used by xmanager, not the concurrent compiles.

I wish our builds tracked the peak system memory usage as a matter of course.

> - sixgill tries to keep track of the allocations it performs and you can
> call PrintAllocs() to get this data.  You could try modifying handle_event
> in xmanager.cpp to, say, call this function once a minute and see if it
> indicates where the high memory usage is coming from.

I had to modify a couple of things to get this to work, and it's still a little weird -- it seems to work fine for xmanager, but the xdb.so from the same build crashes with an invalid free. (I guess when gcc allocates something and passes it to xdb.so to free, it isn't happy.)

Maybe I'll push forward with this by creating a hybrid sixgill package that has the memory tracking binary for xmanager but not for xdb.so. I don't know how much time I want to put into getting it all working happily (or fixing up the build to avoid the tracking for the plugin.)
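For context, the kind of change being described -- calling PrintAllocs() from xmanager's handle_event roughly once a minute -- could look like the sketch below. Only the names PrintAllocs and handle_event come from the comments above; the timing bookkeeping is invented for illustration.

    // Hypothetical sketch of the periodic PrintAllocs() call suggested above.
    // PrintAllocs and handle_event are the names from the comments; the
    // interval bookkeeping here is invented for illustration.
    #include <ctime>

    void PrintAllocs();  // provided by sixgill's allocation tracking

    static time_t g_last_alloc_report = 0;

    static void MaybeReportAllocs()
    {
      time_t now = time(nullptr);
      if (g_last_alloc_report == 0)
        g_last_alloc_report = now;
      if (now - g_last_alloc_report >= 60) {  // roughly once a minute
        PrintAllocs();                        // dump the tracked allocation counters
        g_last_alloc_report = now;
      }
    }

Calling MaybeReportAllocs() at the top of handle_event would then dump the counters about once a minute as transactions arrive.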

> - There should be a -memory-limit option which you can try using.  If we get
> to high VM usage then xmanager should flush out the entries it keeps in
> memory for callgraph edges, though I don't remember whether xmanager is even
> populating these databases.  After the frontend runs, are there
> body_caller/callee.xdb databases?  What databases and other files do exist
> after the frontend runs, and how large are they typically?

No, I'm not getting callgraph xdbs. I see

file_preprocess.xdb
file_source.xdb
src_body.xdb
src_comp.xdb
src_init.xdb

The -memory-limit option isn't enabled for xmanager, but I've enabled it now and will try using it. (After rebuilding and uploading to tooltool and pushing a custom mozharness script and all that fun stuff.)

> - I don't think I've tried this before, but the frontend should be able to
> recover from an interrupted build, if things are shut down cleanly.  When
> memory usage gets high, you could try killing make, killing xmanager with
> SIGINT (which tells it to shut down cleanly) and then restarting the build.

I'm very leery of this. I've attempted to do the analysis incrementally, and have run into weird problems with xdbs getting corrupted. I'm sure there was only one xmanager running, and I always pass in -fplugin-arg-xgill-remote, so I don't see how it could have had multiple things writing to any of the xdbs.

It's true that it would be cool to get that working, though. Incremental analysis would be slick.
Attached file leak.log
Er, then again, perhaps this is just a huge memory leak. I'm attaching a valgrind log claiming that xmanager leaked 69MB when only compiling jsalloc.cpp. I'm trying to look at the leaks, but the sixgill memory management is confusing me. I think I maybe sort of get the basic idea behind the Persist/UnPersist mechanism, but I don't understand how it is getting used in practice.
Flags: needinfo?(bhackett1024)
(In reply to Steve Fink [:sfink, :s:] from comment #2)
> Created attachment 8556752 [details]
> leak.log
> 
> Er, then again, perhaps this is just a huge memory leak. I'm attaching a
> valgrind log claiming that xmanager leaked 69MB when only compiling
> jsalloc.cpp. I'm trying to look at the leaks, but the sixgill memory
> management is confusing me. I think I maybe sort of get the basic idea
> behind the Persist/UnPersist mechanism, but I don't understand how it is
> getting used in practice.

Never mind, that seems to be an artifact of turning on memory tracking. Then again, -memory-limit seems to require memory tracking. I'm not sure if it's real, or just the overrides confusing valgrind.
Flags: needinfo?(bhackett1024)
Final memory dump for a b2g build http://ftp.mozilla.org/pub/mozilla.org/b2g/try-builds/sfink@mozilla.com-48e3e21ba7cb/try-linux64-b2g-haz/b2g_try_linux64-b2g-haz_dep-bm76-try1-build3323.txt.gz

22:04:29     INFO -  Allocations: 4648 mB
22:04:29     INFO -    HashCache: 7 mB
22:04:29     INFO -    HashCons: 572 mB
22:04:29     INFO -    HashObject: 2531 mB
22:04:29     INFO -    StreamInfo: 49 mB
22:04:29     INFO -    StreamInfoKey: 120 mB
22:04:29     INFO -    Vector: 409 mB
22:04:29     INFO -    Net: 3692 mB

(Note that the build failed later, but it completed the compile & link fine. The failure is probably due to some unrelated mozharness changes I'm testing.)

3.7GB of memory is a lot!
I've been trying to weave my way through the sixgill web, and am good and lost.

First, the -memory-limit thing isn't going to work, because the stuff that gets flushed is not stuff we're generating anyway. (I tried flushing anyway during an analysis build, and it didn't change the memory usage at all.)

Why does it keep all of those HashObjects in memory? Are those in the variable storage for the transaction processor or whatever it's called? (The thing in transaction.cpp and implemented in the backend/* stuff.)

I've been experimenting with killing the build in the middle and restarting it. It does seem to reset the memory usage, and I ended up with valid xdbs. I don't know whether they were missing anything, though.

But it's hard to do that in a nice way. There are multiple gcc invocations in flight that already have the socket open. I could have the compiler wrapper periodically shut down the xmanager (via a final transaction maybe?), perhaps after a wait long enough to ensure any concurrent builds are complete (and using a global signal so all other compiler invocations would wait). Then it would start up a new one, rewrite the xgill.config file and have the other waiting invocations read the new port number, and continue.

But that's a flaky mess. Is there some way I could garbage collect xmanager or something? I don't understand what's going on and the data model well enough yet.

Maybe I could come up with something where xmanager re-execs itself but hangs onto the file descriptor for the listening socket. But that still feels hacky. Is there an easy way to just throw out everything and start over? I don't understand what's hanging onto the data, nor whether I need to do the reset in between the right transactions or something. I guess I could make the plugins start up subtransactions or something so the manager knows that none of them are in the middle of things?
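For what it's worth, the generic version of that re-exec trick -- keep the listening socket's descriptor alive across exec and hand its number to the new process -- might look like the sketch below. This is not anything sixgill does today; ReExecKeepingSocket and the --listen-fd flag are invented names.

    // Hypothetical sketch of re-exec'ing a server while keeping its listening
    // socket open. "--listen-fd" is an invented flag the new process would
    // parse and adopt instead of re-binding; none of this is existing sixgill
    // code.
    #include <fcntl.h>
    #include <unistd.h>
    #include <cstdio>
    #include <string>

    static void ReExecKeepingSocket(int listen_fd, char **argv)
    {
      // Make sure the descriptor survives exec (clear FD_CLOEXEC if set).
      int flags = fcntl(listen_fd, F_GETFD);
      if (flags >= 0)
        fcntl(listen_fd, F_SETFD, flags & ~FD_CLOEXEC);

      // Tell the new image which descriptor to adopt.
      std::string fd_arg = "--listen-fd=" + std::to_string(listen_fd);
      execl("/proc/self/exe", argv[0], fd_arg.c_str(), (char *) nullptr);
      perror("execl");  // only reached if the exec failed
    }

The in-memory state would be discarded wholesale, which is the point; the harder part, as noted above, is making sure no plugin is in the middle of a transaction when the swap happens.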
Flags: needinfo?(bhackett1024)
(In reply to Steve Fink [:sfink, :s:] from comment #5)
> But that's a flaky mess. Is there some way I could garbage collect xmanager
> or something? I don't understand what's going on and the data model well
> enough yet.

This sounds like a good idea.

The data model used by xmanager and the other sixgill processes is that each process hash-conses the various things it allocates, i.e. it uses hash tables (HashCons in the list above) to make sure that it only has a single copy of particular strings, IR data, and other immutable stuff.  Basically the same as what we do with string atomization.  The things stored in the HashCons tables are HashObjects, which are using up most of the memory in the list above.  Probably most of the 1 GB in untracked allocations are things hanging off of HashObjects.

Now, it used to be that HashObjects were reference counted and would be destroyed when they were no longer in use. Managing references was a colossal headache, though, so I eventually removed the reference counts; now the HashCons tables fill up, and entries that are no longer in use are never removed.

If we could keep track of the roots in the graph, and had trace methods on the HashObjects, then the HashCons tables could be GC'ed.  I don't think that xmanager has all that many roots, and most of the things in the HashCons are created for temporary use (i.e. comparing newly compiled functions with ones fetched from a database), but I don't remember if this is the case for sure.
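To make the shape of that concrete, a stripped-down version of "trace methods plus a sweep over the HashCons table" might look like the sketch below. The types and names are invented for illustration and are not sixgill's actual classes.

    // Minimal sketch of a mark/sweep pass over a hash-cons table, assuming
    // each interned object can enumerate the objects it points at (a "trace"
    // method). These types are invented; they are not sixgill's classes.
    #include <unordered_set>
    #include <vector>

    struct Interned {
      bool marked = false;
      std::vector<Interned*> children;   // what a trace method would visit
    };

    using ConsTable = std::unordered_set<Interned*>;

    static void Mark(Interned *obj) {
      if (obj->marked) return;
      obj->marked = true;
      for (Interned *child : obj->children)   // the "trace" step
        Mark(child);
    }

    // Mark everything reachable from the roots, then drop unreached entries.
    static void CollectConsTable(ConsTable &table,
                                 const std::vector<Interned*> &roots) {
      for (Interned *entry : table)
        entry->marked = false;
      for (Interned *root : roots)
        Mark(root);
      for (auto it = table.begin(); it != table.end(); ) {
        if (!(*it)->marked) {
          delete *it;
          it = table.erase(it);
        } else {
          ++it;
        }
      }
    }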

I can look at this myself but realistically I won't be able to get to it for at least a month.
Flags: needinfo?(bhackett1024)
(In reply to Brian Hackett (:bhackett) from comment #6)
> the same as what we do with string atomization.  The things stored in the
> HashCons tables are HashObjects, which are using up most of the memory in
> the list above.  Probably most of the 1 GB in untracked allocations are
> things hanging off of HashObjects.

Er, what 1 GB? If you're talking about the discrepancy between "Allocations: 4648 mB" and "Net: 3692 mB", that's because "Allocations:" is the raw number of allocations, without subtracting out anything that got freed.

The actual memory size isn't fully explained by these tracked numbers, but iirc it's pretty close.
(In reply to Steve Fink [:sfink, :s:] from comment #7)
> Er, what 1 GB? If you're talking about the discrepancy between "Allocations:
> 4648 mB" and "Net: 3692 mB", that's because "Allocations:" is the raw number
> of allocations, without subtracting out anything that got freed.

The 'Allocations' value should account for freed things.  When allocation tracking is on, heap-allocated things have a preheader with the size of the thing, and we decrement the total by that size when the thing is freed.  See g_alloc_total in alloc.h
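A minimal sketch of that kind of preheader accounting is below, assuming a plain malloc wrapper; it illustrates the idea rather than reproducing sixgill's actual alloc.h.

    // Sketch of the size-preheader scheme described above: each tracked
    // allocation records its size just before the pointer handed to the
    // caller, so a running total can be decremented on free. Illustrative
    // only; not sixgill's actual alloc.h implementation.
    #include <cstddef>
    #include <cstdlib>

    static size_t g_alloc_total = 0;            // net bytes currently live

    union Preheader {
      size_t size;
      std::max_align_t align;                   // keep the caller's block aligned
    };

    void *TrackedMalloc(size_t size)
    {
      Preheader *base = (Preheader *) malloc(sizeof(Preheader) + size);
      if (!base)
        return nullptr;
      base->size = size;                        // remember the size in the header
      g_alloc_total += size;
      return base + 1;                          // caller sees memory past the header
    }

    void TrackedFree(void *ptr)
    {
      if (!ptr)
        return;
      Preheader *base = (Preheader *) ptr - 1;
      g_alloc_total -= base->size;              // subtract the size recorded earlier
      free(base);
    }

With accounting like that, a total such as g_alloc_total really is net of frees, which matches the explanation above.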
Finally landing this. It revealed several other random problems along the way (some in the new memory management code, some in new code constructs that tripped up gcc). But it's now working as well as the previous version did, plus it should consume much less memory.
Assignee: nobody → sphink
Status: NEW → ASSIGNED
https://hg.mozilla.org/mozilla-central/rev/8d29d5a1dd31
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39