Investigate changing the Web IDL binding generator to only read in the parser data once

RESOLVED FIXED in mozilla24

Status

()

defect
RESOLVED FIXED
6 years ago
3 months ago

People

(Reporter: gps, Assigned: bzbarsky)

Tracking

(Blocks 1 bug)

unspecified
mozilla24
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite -

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [buildfaster:?])

Attachments

(1 attachment, 5 obsolete attachments)

Reporter

Description

6 years ago
I was investigating parts of the build system where Python is consuming all the CPU and the Web IDL binding generator piqued my interest.

I put it under a profiler and Configuration.getDescriptors() is taking about ~50% of the wall execution time for the webidl files I tested. Lines 119 and 120 are the hottest:

119                getter = lambda x: getattr(x, key)
120            curr = filter(lambda x: getter(x) == val, curr)

I suspect it's because they are being called *a lot* - ~250,000 times on the profile I'm looking at right now.

Given the call frequency, I wouldn't be surprised if the probe overhead of the profiler was contributing to the "heat" of these lines. That being said, these lines appeared to be called a lot more frequently than others - like a magnitude more.

I haven't measured how much total time the web IDL binding generator consumers during builds, but if the binding generator could be optimized in the lines above, you are probably looking at gains of 25-50% of run time.

Keep in mind that function calls are expensive in Python. Looking at the offending function, you may consider inlining the filter + lambda calls. It will result in more lines of code, sure. But it should execute much, much faster. I'm not familiar with the code, but eliminating calls to getDescriptors() or caching the output might also do wonders.

Until someone says otherwise, I think the priority of this is "nice to have."
Reporter

Updated

6 years ago
Whiteboard: [buildfaster:?]
Reporter

Comment 1

6 years ago
make -C dom/bindings export

real    0m31.863s
user    3m42.130s
sys     0m14.377s

That's a decent chunk of time. Performed on an i7-2600K (4 core + 4 HT) w/ -j8. (-j12 showed similar results and I confirmed we're CPU bound in Python.)

I'd consider this a moderate component of total build times. It's not as bad as say C++ compilation. But it is long enough to cause me to raise an eyebrow. And, I /think/ I've heard that we're only going to convert more things to Web IDL over time. Or, has that happened already? If we're only say 1/3 where we want to be with Web IDL deployment, I may start insisting someone look at this. If we're 95% there and/or expect slow growth, it's not super high on the priority list. But if someone has some spare cycles, this would be a nice win...
Fast growth is still expected, yes. I definitely think this is worth doing.
Hmm.  When I'd profiled this stuff, I didn't see getDescriptors show up at all; the time I saw was in actual declare() doing all the string bits and in unpickling the parser data.  I wonder why we're seeing such different numbers...

In any case, we should try making this bit faster and seeing what the impact is, I guess.
Reporter

Comment 4

6 years ago
I was using the built-in cProfile module. e.g. python -m cProfile <program with arguments>. It may bloat time in frequently called functions depending on how it's implemented. Honestly, I can't remember.

I think the important takeaway from this bug is "Web IDL binding generation is starting to take a lot of time - let's look at optimizing it."

I also noticed that we invoke the binding generator twice - once for headers and another for .cpp. Perhaps we could combine the invocations to reuse the parser context? Also, new process overhead on Windows can add up, so process reduction is generally something to strive for in the build system.
Yeah, cProfile is what I used too.

But yes, we absolutely need to optimize the bindings stuff; we've known that for a few months now.  :(

What Kyle has wanted to do was use several worker processes that keep the data model in memory so we don't have to keep pickling and unpickling.  That wouldn't help getDescriptors, obviously...
Ok, so I sat down to profile this some more.  Specifically, I did the following:

1)  Modify the $(binding_header_files) rule in the makefile to start with:

  PYTHONDONTWRITEBYTECODE=1 $(PYTHON) -m cProfile -o /tmp/$*Binding.prof $(topsrcdir)/config/pythonpath.py

2)  touch dom/bindings/Codegen.py && time make -C ../obj-firefox/dom/bindings/ export

3)  When this finishes, grab the list of .prof files in /tmp and load them all into pstats via p = pstats.Stats(*[list of filenames here]).

4)  p.strip_dirs().sort_stats('cumulative').print_stats(20):

  128477649 function calls (128330830 primitive calls) in 194.724 seconds

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      318    0.003    0.000  194.725    0.612 pythonpath.py:7(<module>)
      318    0.005    0.000  194.721    0.612 pythonpath.py:10(main)
  636/318    1.687    0.003  194.698    0.612 {execfile}
      318   21.405    0.067  194.502    0.612 BindingGen.py:5(<module>)
      318    0.035    0.000  154.003    0.484 BindingGen.py:41(main)
      318   85.959    0.270   85.959    0.270 {cPickle.load}
      318    1.845    0.006   63.348    0.199 Configuration.py:15(__init__)
   145326   14.459    0.000   23.979    0.000 Configuration.py:191(__init__)
   145326    6.529    0.000   23.586    0.000 {all}
      318   10.204    0.032   18.762    0.059 Configuration.py:5(<module>)
 63119502   17.061    0.000   17.061    0.000 Configuration.py:60(<genexpr>)
      318    8.038    0.025    8.555    0.027 WebIDL.py:5(<module>)
   122234    2.231    0.000    6.670    0.000 Configuration.py:469(getTypesFromDescriptor)
      318    0.047    0.000    4.537    0.014 BindingGen.py:10(generate_binding_header)
 14687760    4.352    0.000    4.352    0.000 WebIDL.py:2306(isMethod)
      318    0.049    0.000    3.917    0.012 Codegen.py:7167(__init__)
15087/13704    0.541    0.000    3.897    0.000 {filter}
     4094    0.021    0.000    3.331    0.001 Configuration.py:98(getDescriptors)
   138321    1.927    0.000    2.893    0.000 Configuration.py:487(getFlatTypes)
  2279604    0.881    0.000    2.810    0.000 Configuration.py:120(<lambda>)


so at first glance getDescriptors accounts for about 3s of the 195s total CPU time.

Things that jump out at me:

* Unpickling is about 86s.
* Configuration.__init__ is about 63s; of that time 23s is descriptor init, and 17s is
  the O(N^2) loop trying to determine which descriptors share the same native type(!).
* I'm not sure what Configuration.py:5(<module>) means.  Is that the time needed to
  import those symbols from the WebIDL module???

Tentative conclusion: we could cut out about 3/4 of the CPU time here if we simply did the configuration setup in memory once and only once....

I'll try profiling the .cpp generation next.
So following the same procedure as above, but now I've munged the $(binding_cpp_files) rule.  Now I get:

  136303169 function calls (135818443 primitive calls) in 203.065 seconds

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
      318    0.003    0.000  203.065    0.639 pythonpath.py:7(<module>)
      318    0.005    0.000  203.062    0.639 pythonpath.py:10(main)
  636/318    1.718    0.003  203.039    0.638 {execfile}
      318   21.993    0.069  202.844    0.638 BindingGen.py:5(<module>)
      318    0.036    0.000  161.394    0.508 BindingGen.py:41(main)
      318   86.719    0.273   86.719    0.273 {cPickle.load}
      318    1.887    0.006   65.164    0.205 Configuration.py:15(__init__)
   145326   14.918    0.000   24.682    0.000 Configuration.py:191(__init__)
   145348    6.766    0.000   24.483    0.000 {all}
      318   10.355    0.033   19.127    0.060 Configuration.py:5(<module>)
 63119502   17.721    0.000   17.721    0.000 Configuration.py:60(<genexpr>)
      318    8.242    0.026    8.769    0.028 WebIDL.py:5(<module>)
      312    0.045    0.000    8.712    0.028 BindingGen.py:26(generate_binding_cpp)
50937/13704    1.064    0.000    8.549    0.001 {filter}
   122234    2.258    0.000    6.756    0.000 Configuration.py:469(getTypesFromDescriptor)
     8205    0.035    0.000    5.485    0.001 Configuration.py:98(getDescriptors)
      312    0.004    0.000    5.181    0.017 Codegen.py:7391(define)
29641/746    0.078    0.000    5.106    0.007 Codegen.py:318(join)
126035/2786    0.050    0.000    5.098    0.002 Codegen.py:319(<genexpr>)
29564/1386    0.057    0.000    5.054    0.004 Codegen.py:400(define)

A bit more Codegen.py here, but this is still dominated by the pickle and Configuration bits.  The entirety of the actual codegen is 9s or so, out of 203s total....
Reporter

Comment 8

6 years ago
I wonder why my (partial set) profile yielded completely different results. Most weird. Chalk up another point to being thorough.

In case it's not obvious, Python's pickling API isn't the most efficient serializer in the world. I believe in many cases the built-in json serializer will outperform it. Unless you really need to effortlessly shuffle around native Python types, it's probably not worth using.
Unfortunately, what we're pickling here is the object graph representing the data model produced by the parser.  So we really are shuffling around Python objects.  :(

I think we really need to move to some sort of model where we have a few (how many?) worker processes that take filenames as input and spit out files.  The question is how to drive that sort of thing from a makefile.
Reporter

Comment 10

6 years ago
Regarding pickling, if you are pickling class instances instead of built-in primitive types (like str, int, tuple, list, dict), you should consider implementing __setstate__ and __getstate__ (http://docs.python.org/2/library/pickle.html#pickling-and-unpickling-normal-class-instances) to define custom pickling behavior that is more optimal (only pickles built-in primitive types).

Regarding a process worker pool, it's complicated. Python has multiprocessing.Pool. However, it has a lot of sharp edges and flat out doesn't work on BSDs (because CPython hasn't implemented the process locking primitive - derp). We have bug 819048 to add a framework that actually works (we'll likely need to roll our own).

Even if you have a decent worker process pool, hooking it up to the build system with proper dependencies would not be fun. You would ideally only pass targets that need rebuilt. However, I don't believe there is an easy way to do that in make. Although, maybe there is a neat trick to collect all out-of-date targets and pass them in a single process call. Joey? Mike?
I am probably missing some context here, but how is a "process worker pool" that takes "filenames as input and spits out files" while handling "proper dependencies" something different than a build system? Is this all to avoid the startup cost of python?
The binding generator needs the entire WebIDL data model to do anything useful.

Right now we put that data model in a pickle and then unpickle it in each binding generator process.  Unpickling it takes about 300-500ms on my machine, and will get slower as we convert more stuff.

We end up with 636 invocations of the binding generator on trunk as of today (318 headers and 318 implementations).  That means several hundred seconds of CPU time is spent just unpickling.

So the point would be to have a single process that has the data model in memory all the time and isn't doing all this pickling and unpickling....
(In reply to Boris Zbarsky (:bz) from comment #12)
> The binding generator needs the entire WebIDL data model to do anything
> useful.
> 
> Right now we put that data model in a pickle and then unpickle it in each
> binding generator process.  Unpickling it takes about 300-500ms on my
> machine, and will get slower as we convert more stuff.

So in order to generate (as an example) URLBinding.h from URL.webidl, we have to parse every other .webidl file? Why? That sounds like the real problem to me.
> we have to parse every other .webidl file? 

Yes.

> Why?

The conceptual model of the webidl setup in the specs is that there are no separate "files".  There is just one single IDL definition thing.  That means that any interface can refer to any other interface, or to anything else defined in any other spec, for that matter.  It means that you can do this:

  interface A : B {
  };
  interface B {};

and it's fine.

So for example, to generate URLBinding.h and especially URLBinding.cpp we need to know things like:

1) Does any other interface inherit from URL?
2) Are Blob and MediaStream are ever used on the right-hand side of "implements"
   statements?
3) Are there any objects that could implement both Blob and MediaStream (this comes down
   to "implements" plus knowing whether Blob and MediaStream inherit from each other).

Now this conceptual model in the specs is a huge PITA for any sort of incremental rebuild prospects.  So what we have is something that tries to preserve the "one big ball of string" behavior while doing dependency tracking and allowing partial rebuilds.  The bad things happen if we change Bindings.conf or Codegen.py, which mean we _do_ have to regenerate everything, because now we have to pay the serialize/deserialize overhead for this big data blob for each separate file we want to be able to build incrementally.
(In reply to Boris Zbarsky (:bz) from comment #14)
> 
> The conceptual model of the webidl setup in the specs is that there are no
> separate "files".  There is just one single IDL definition thing.  That
> means that any interface can refer to any other interface, or to anything
> else defined in any other spec, for that matter.  It means that you can do
> this:
> 
>   interface A : B {
>   };
>   interface B {};
> 
> and it's fine.
> 
> So for example, to generate URLBinding.h and especially URLBinding.cpp we
> need to know things like:
> 
> 1) Does any other interface inherit from URL?
> 2) Are Blob and MediaStream are ever used on the right-hand side of
> "implements"
>    statements?
> 3) Are there any objects that could implement both Blob and MediaStream
> (this comes down
>    to "implements" plus knowing whether Blob and MediaStream inherit from
> each other).
> 
> Now this conceptual model in the specs is a huge PITA for any sort of
> incremental rebuild prospects.

Well it's really awful for full builds too, hence this bug :). Any sort of "anything-depends-on-everything model" is going to suck. Incremental builds will be bad, because to generate a single file, we have to read the contents of N others (which is all bundled in the pickle file in this case, but still N "units" of information in one file). So something that should be O(1) is O(N). Similarly, full builds should be O(N) but are O(N^2). I don't think there's much that can really be done, short of changing the spec. Is that possible?

> preserve the "one big ball of string" behavior while doing dependency
> tracking and allowing partial rebuilds.  The bad things happen if we change
> Bindings.conf or Codegen.py, which mean we _do_ have to regenerate
> everything, because now we have to pay the serialize/deserialize overhead
> for this big data blob for each separate file we want to be able to build
> incrementally.

If things are really "one big ball of string", why not just treat it as such and do all the .webidl generation in a single pass (similar to .ipdl)? As a quick test, I compared the 317 BindingGen.py invocations (for creating headers) using the current one-at-a-time invocation to one where all .webidl files are passed into a single BindingGen.py invocation:

One-at-a-time: 1m32s (not parallelized - I ran them in a separate shell script instead of from make)
All at once: 1.377s

For reference, a single .webidl header generation is .300s - so as I understand it, the whole point of adding a python job farm with separate dependency tracker that somehow has to integrate with make is to get the .300s time when a single .webidl file has changed, but get the 1.377s time when doing a full build (or Bindings.conf changes). Is that really worth the effort and complexity when we could get 1.377s for both cases by just passing in all webidl files at once all the time? Yes, we lose ~1s in the incremental case, but that's what happens when a crappy spec says "before doing anything, read everything".

Thanks for taking the time to explain this.
> Well it's really awful for full builds too, hence this bug :)

I don't think so.  If we were willing to just always generate all the .cpp and .h, we'd parse the IDL once, then generate them all; it would be a lot faster than what ends up happening now in full-build situation.  :(  The biggest problem would be that it'd all happen on one core in a naive implementation.

> Is that possible?

It's not changing "the spec".  It's changing the entire web platform's worth of specs... somehow.  It's not even clear how.  Not going to happen.  :(

> why not just treat it as such and do all the .webidl generation in a single pass

And do what in that pass?  That's the question.

There are three separate things that need to happen here:

1)  Generate all the headers.
2)  Generate all the .cpp files.
3)  Compile all the .cpp files.

The numbers earlier in this bug are for steps 1 and 2: when something changes that causes them to actually be done (as in, when parts of the binding codegen change, or in a clobber build) those take about 150s of CPU time each for me.

But step 3 involves compiling 300+ .cpp files.  I just measured, and if I'm not getting ccache hits for them that takes about 950s of CPU time for me.

So we could make steps 1 and 2 much faster (down to 75s or less of CPU time instead of 300s easily, just looking at those profiles) by doing the entire codegen every time.  But then we'd have those 950s of .cpp compiling, modulo ccache on every single build, no?  Including dep builds.  _That_ is what the current setup is trying to avoid.  :(
Reporter

Comment 17

6 years ago
We should reimplement the WebIDL binding generator in Rust :)
Taking that as a serious suggestion, for the moment, I don't see how that would help.

As I see it, the constraints for an ideal solution here are:

1)  We only want to output .cpp and .h files that actually need to be modified.
2)  We want to do this via dependency tracking, instead of what we used to have where we
    would always generate them and sometimes not output them (because that caused us to
    have to generate-and-throw-away every time).
3)  We want to do this generation from a small number of processes so that the webidl data
    model can be created/deserialized a small number of times.

All of which points to somehow passing the exact list of things that need to be rebuilt to something that then parcels it out to said processes or something.  The question is how to do that in make, or whether it's possible at all.  :(
(In reply to Boris Zbarsky (:bz) from comment #18)
> Taking that as a serious suggestion, for the moment, I don't see how that
> would help.

Rust solves everything! ;)

> 1)  We only want to output .cpp and .h files that actually need to be
> modified.
> 2)  We want to do this via dependency tracking, instead of what we used to
> have where we
>     would always generate them and sometimes not output them (because that
> caused us to
>     have to generate-and-throw-away every time).
> 3)  We want to do this generation from a small number of processes so that
> the webidl data
>     model can be created/deserialized a small number of times.
> 
> All of which points to somehow passing the exact list of things that need to
> be rebuilt to something that then parcels it out to said processes or
> something.  The question is how to do that in make, or whether it's possible
> at all.  :(

Can we derive better dependencies from the data model?  The deps files current say that a .cpp/.h depends solely on the immediate .webidl file, which really isn't right.  Comment 14 makes it sound like there are some dependencies that aren't just "this interface depends on that one", but maybe there's a good way of representing those too (auxiliary files)?
(In reply to Boris Zbarsky (:bz) from comment #16)
> > Well it's really awful for full builds too, hence this bug :)
> 
> I don't think so.  If we were willing to just always generate all the .cpp
> and .h, we'd parse the IDL once, then generate them all; it would be a lot
> faster than what ends up happening now in full-build situation.  :(  The
> biggest problem would be that it'd all happen on one core in a naive
> implementation.

At the moment, parsing all the IDLs and generating them all in a single BindingGen.py instance is on the order of compiling a single cpp file (a few seconds), so I don't think it makes sense to worry about it running on a single core.

> 
> > Is that possible?
> 
> It's not changing "the spec".  It's changing the entire web platform's worth
> of specs... somehow.  It's not even clear how.  Not going to happen.  :(

Sadness :(

> 
> > why not just treat it as such and do all the .webidl generation in a single pass
> 
> And do what in that pass?  That's the question.
> 
> There are three separate things that need to happen here:
> 
> 1)  Generate all the headers.
> 2)  Generate all the .cpp files.
> 3)  Compile all the .cpp files.
> 
> The numbers earlier in this bug are for steps 1 and 2: when something
> changes that causes them to actually be done (as in, when parts of the
> binding codegen change, or in a clobber build) those take about 150s of CPU
> time each for me.

My 1.3s number was actually just for part 1 - when I add cpp generation into the mix and do them all at once, I get 3.9s. Still way worse than it *should* be, but a lot better than the 150s number.

> 
> But step 3 involves compiling 300+ .cpp files.  I just measured, and if I'm
> not getting ccache hits for them that takes about 950s of CPU time for me.
> 
> So we could make steps 1 and 2 much faster (down to 75s or less of CPU time
> instead of 300s easily, just looking at those profiles) by doing the entire
> codegen every time.  But then we'd have those 950s of .cpp compiling, modulo
> ccache on every single build, no?  Including dep builds.  _That_ is what the
> current setup is trying to avoid.  :(

Doesn't the replaceFileIfChanged() call mean that not all .cpp files will be modified if we codegen them every time? So if all the .webidl generation happens in a single BindingGen.py instance, you shouldn't account for compiling all .cpp files every time - if only URLBinding.cpp was changed, it should be the only one that gets compiled.

Right now we have:

*.webidl -> GlobalGen.py -> ParserResults.pkl
for each *.webidl:
   x.webidl and ParserResults.pkl -> BindingGen.py -> xBinding.h
   x.webidl and ParserResults.pkl -> BindingGen.py -> xBinding.cpp

I'm suggesting to try:

*.webidl -> BindingGen.py -> *Binding.h *Binding.cpp

It's definitely not ideal, but this brings the total generation time down to a few seconds. Using writeifmodified should allow incremental builds to only compile the changed files.

Is there really no way to have BindingGen.py only pull in the information it needs when it is needed (ie: ditch ParserResults.pkl)? So parsing URL.webidl will bring in Blob.webidl and MediaStream.webidl when it finds those identifiers, but ignore all unrelated .webidl files. This is the ideal solution, in my opinion.
> So parsing URL.webidl will bring in Blob.webidl and MediaStream.webidl when it finds
> those identifiers, but ignore all unrelated .webidl

Again, we need to know whether some other webidl file has an interface that implements both Blob and MediaStream.  So we have to parse them all to find out whether they're unrelated or not...
So I think this is pretty much what comment 20 describes.

What I observe on my machine is that without this patch I have:

mozilla% make -C ../obj-firefox/dom/bindings/                                   
mozilla% time make -C ../obj-firefox/dom/bindings/
4.176u 0.142s 0:04.31 100.0%    0+0k 0+0io 0pf+0w
mozilla% touch dom/bindings/Codegen.py && time time make -C ../obj-firefox/dom/bindings/
372.880u 46.915s 1:52.84 372.0% 0+0k 560+4052io 0pf+0w
mozilla% touch dom/webidl/XMLHttpRequest.webidl && time make -C ../obj-firefox/dom/bindings/
7.021u 0.381s 0:07.23 102.3%    0+0k 0+972io 0pf+0w

(those last two last commands rebuilt some .cpp files, but hit ccache).

With this patch I have:

mozilla% make -C ../obj-firefox/dom/bindings/
mozilla% time make -C ../obj-firefox/dom/bindings/
5.358u 0.860s 0:04.87 127.5%    0+0k 0+0io 0pf+0w
mozilla% touch dom/bindings/Codegen.py && time time make -C ../obj-firefox/dom/bindings/
13.136u 0.950s 0:12.79 110.0%   0+0k 0+4io 0pf+0w
mozilla% touch dom/webidl/XMLHttpRequest.webidl && time make -C ../obj-firefox/dom/bindings/
21.374u 1.749s 0:15.93 145.0%   0+0k 6+658io 1pf+0w

(I actually changed the XMLHttpRequest.webidl so that the .cpp would change, not just touched it, and this did NOT hit ccache.)

So this seems promising.  Esp if we can somehow make it so that the no-op "parse the makefile" bit doesn't take 4s (maybe by removing some of the stuff from the makefile that we might no longer need now?)

As we add bindings that 13s for codegen might go up, but for now this seems like a clear win.
And note that we may be able to speed up things from that 13s.  For example, in the new setup the lambda in getDescriptors (hey, that thing that started this bug) is about 5s (cumulative) out of the total 10.176s that cProfile reports spent in python.  So suddenly speeding it up makes a lot of sense.  ;)  Especially if we can nix the 4s we spend parsing the makefile or whatever that no-op make overhead is.
Reporter

Comment 24

6 years ago
From what I think I understand from reading the comments, we should work towards the following:

  If *any* .webidl has changed we invoke BindingGen.py *once* to generate
  *all* the .h and .cpp files that need changing.

According to the bug comments, the worst case is 13s with our current file set. Presumably we will add files over time. Presumably we can eliminate hotspots from BindingGen.py to compensate for that.

If we set up make dependencies properly, we should be able to avoid regenerate-the-world if a single .webidl changes (assuming it's not a commonly used one) to attain less than 13s regen time for non-clobber builds. We should try to do that. However, I don't believe not having it would be a huge loss. Comments indicate that we go from over 3 minutes CPU-core time to 15s. That's huge. I /think/ that will more than offset losses from unpruned make dependencies. That being said, we should still aim for proper dependencies.

My conclusion is based on the understanding that bulk-processing the .webidls in a single process call is on the order of a magnitude faster than separate process calls. Even if we only utilize a single core, you'd still need a 10+ core machine to come out ahead wall time. Even then, the CPU is performing a lot more work. Since the build system could be performing other work on those cores, that's wasted CPU and bad for the overall build system.
> According to the bug comments, the worst case is 13s with our current file set. 

On my particular (fairly fast) hardware, but yes.

> If we set up make dependencies properly, we should be able to avoid regenerate-the-world
> if a single .webidl changes

The key to that would be passing some set of webidl files smaller than "all of them" to that one invocation of BindingGen.py.  If we can figure out how to do that.

But yes, I agree that for now that seems like the right goal to me.  Kyle, thoughts?

By the way, I would love whatever we can do to make the no-op make here not take 4s.  I know for a full build it doesn't matter, but for a modify/build/debug cycle that's like 1/3 of the wait time!
Flags: needinfo?(khuey)
Reporter

Comment 26

6 years ago
Looking at the output of |make -C dom/bindings -d export|, it appears reading the .pp files and then processing the result is consuming much of those "4s". Although, |make -C dom/bindings export| only takes 2.5s on my machine. Although, a full no-op make takes 5.17s.

Anyway, khuey has experience making .pp dependencies suck less. And I know for a fact it is very fresh in his brain :D
Hmm.  How are you getting amounts of time out of the -d output?
Reporter

Comment 28

6 years ago
(In reply to Boris Zbarsky (:bz) from comment #27)
> Hmm.  How are you getting amounts of time out of the -d output?

I'll confess to eyeballing it. I think it's pretty obvious when you compare the actual output against what your eyes saw - the "Reading makefile `.deps/..." messages are on the terminal a disproportionate amount of time!
I wonder whether we should morph this bug into "stop spending time on unpickling" and file a new one on the getDescriptors issue.... I have some patches for that one.  ;)
Summary: Investigate optimizing Configuration.getDescriptors in the Web IDL binding generator → Investigate changing the Web IDL binding generator to only read in the parser data once
Posted patch WIP (obsolete) — Splinter Review
Assignee: nobody → bzbarsky
So with that patch things sort of work, I think.  If I touch Codegen.py I see toRegenerate in BindingGen.py end up with all the webidl files.  If I just touch XMLHttpRequest.webidl, I see it end up with just XMLHttpRequest.webidl.

The one remaining issue I have is that if I touch XMLHttpRequestEventTarget.webidl then I see _two_ calls to BindingGen.py.  The first passes "ParserResults.pkl XMLHttpRequestEventTarget.webidl" for $?.  The second passes "XMLHttpRequest.webidl XMLHttpRequestUpload.webidl".  Ideally we would have only _one_ call that passes all four filenames...

The second call is presumably due to the relevant .pp files for XMLHttpRequest and XMLHttpRequestUpload.  But why do we end up with two separate build-steps for the ".BindingGen" target, with different values of $?

I just tried experimenting with dropping the dependency from the $(binding_dependency_trackers) rule.  If I do that, then I still see two calls to BindingGen.py, but now the first passes in just ParserResults.pkl and the second passes the three binding filenames....  I suppose I could just work on making that first call a very fast no-op, but I'd like to understand why it's happening at all.
Flags: needinfo?(gps)
Looks like the first call to BindingGen.py happens during export and the second happens during libs.  Only the second one seems to be taking the .pp files into account...

That's actually bad: we want to regenerate the right headers during export, not during libs.
Ah, I bet this rule in rules.mk is relevant:

1616 ifneq (,$(filter-out all chrome default export realchrome tools clean clobber clobber_all distclean realclean,$(MAKECMDGOALS)))
1617 MDDEPEND_FILES		:= $(strip $(wildcard $(foreach file,$(OBJS) $(PROGOBJS) $(HOST_OBJS) $(HOST_PROGOBJS) $(TARGETS) $(XPIDLSRCS:.idl=.h) $(XPIDLSRCS:.idl=.xpt),$(MDDEPDIR)/$(notdir $(file)).pp) $(addprefix $(MDDEPDIR)/,$(EXTRA_MDDEPEND_FILES))))
Yeah, ok, that totally did it...
Flags: needinfo?(khuey)
Flags: needinfo?(gps)
Attachment #744040 - Attachment is obsolete: true
Specifically, see https://tbpl.mozilla.org/?tree=Try&rev=a887419f5f59 for the Windows issue.
I guess one possible issue here is the size of the command line passed to BindingGen.py... Not sure whether that's what Windows is hitting.
Reporter

Comment 38

6 years ago
Result code of 101120 is interesting. I found bug 770558, but that shouldn't apply.

Bug 837618 is for a too long argument list and it has a different error message (explicitly stating argument list is too long).
Attachment #744050 - Attachment is obsolete: true
Attachment #744050 - Flags: review?(khuey)
Blocks: 867903
So your patch builds fine on my system.

In the logs I see 
0 [main] sh 2448 handle_exceptions: Exception: STATUS_ACCESS_VIOLATION
897 [main] sh 2448 open_stackdumpfile: Dumping stack trace to sh.exe.stackdump

So my guess is that we're really close to the maximum shell command length somewhere and tryserver's longer paths push it over the line.
Reporter

Comment 41

6 years ago
If running in pymake (ifdef .PYMAKE), commands prefixed with '%' are executed as native Python function calls, without involving a shell. The syntax is |% module function args|. I reckon you can put this knowledge to use.
I could also just stick the relevant data into a file and read it from there...  Though doing that with $? might be interesting.  :(
Posted patch This is green on try and all (obsolete) — Splinter Review
Attachment #745482 - Flags: review?(khuey)
Attachment #744429 - Attachment is obsolete: true
Attachment #744429 - Flags: review?(khuey)
Or at least green on try on Windows.
Whiteboard: [buildfaster:?] → [need review][buildfaster:?]
Comment on attachment 745482 [details] [diff] [review]
This is green on try and all

Review of attachment 745482 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/bindings/BindingGen.py
@@ +51,5 @@
> +            file.close()
> +        return contents
> +    allWebIDLFiles = readFile(args[2]).split()
> +    changedDeps = readFile(args[3]).split()
> +    

Super-nit: extraneous whitespace.

::: dom/bindings/Makefile.in
@@ +59,5 @@
> +# test cycle much faster, which is why we're doing it.
> +#
> +# XXXbz Could we cheat even more and only include our CPPSRCS when
> +# $(MAKECMDGOALS) contains libs, so that we can skip loading all those .o.pp
> +# when trying to make a single .cpp file too?

That would break | make FooBinding.obj | no?

@@ +112,1 @@
>  $(CPPOBJS): PrototypeList.h

It does.  This is an artifact from when including all the pp files was blowing out the shell command line limit on Windows, and as a result there was no pp data available there.  It can be removed now.
Attachment #745482 - Flags: review?(khuey) → review+
Fixed the whitespace, updated the comment, removed the CPPOBJS thing, and https://hg.mozilla.org/integration/mozilla-inbound/rev/712d3684efe4
Flags: in-testsuite-
Whiteboard: [need review][buildfaster:?] → [buildfaster:?]
Target Milestone: --- → mozilla23
Reporter

Updated

6 years ago
Whiteboard: [buildfaster:?]
Attachment #745482 - Attachment is obsolete: true
Attachment #739167 - Attachment is obsolete: true
Whiteboard: [buildfaster:?]
From #developers:

13:58:53 - NeilAway: eek, bz's Makefiles have spaces on some of their indentation :-s
(In reply to Ed Morley [:edmorley UTC+1] from comment #49)
> From #developers:
> 
> 13:58:53 - NeilAway: eek, bz's Makefiles have spaces on some of their
> indentation :-s

http://hg.mozilla.org/integration/mozilla-inbound/rev/5784ad28aa83#l4.83
(In reply to Ed Morley [:edmorley UTC+1] from comment #50)
> (In reply to Ed Morley [:edmorley UTC+1] from comment #49)
> > From #developers:
> > 
> > 13:58:53 - NeilAway: eek, bz's Makefiles have spaces on some of their
> > indentation :-s
> 
> http://hg.mozilla.org/integration/mozilla-inbound/rev/5784ad28aa83#l4.83

Actually,

http://hg.mozilla.org/integration/mozilla-inbound/rev/5784ad28aa83#l4.179
https://hg.mozilla.org/mozilla-central/rev/5784ad28aa83
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Reporter

Updated

6 years ago
Blocks: 874923

Comment 54

6 years ago
Comment on attachment 748865 [details] [diff] [review]
With a fix for the Linux PGO problem

>diff --git a/dom/bindings/Makefile.in b/dom/bindings/Makefile.in
>--- a/dom/bindings/Makefile.in
>+++ b/dom/bindings/Makefile.in
>+.BindingGen: $(bindinggen_dependencies) $(binding_dependency_trackers)
>+	# Make sure .deps actually exists, since we'll try to write to it from
>+	# BindingGen.py but we're typically running in the export phase, which
>+	# is before anyone has bothered creating .deps.
>+	$(MKDIR) -p .deps
>+        # Pass our long lists through files to try to avoid blowing
>+        # out the command line
>+	echo $(all_webidl_files) > .all-webidl-file-list
>+	echo $? > .changed-dependency-list
>+	# BindingGen.py will examine the changed dependency list to figure out
>+	# what it really needs to regenerate.

>+	$PYTHONDONTWRITEBYTECODE=1 $(PYTHON) $(topsrcdir)/config/pythonpath.py \

Typo: the extraneous '$' ahead of PYTHONDONTWRITEBYCODE=1 results that 'YTHONDONTWRITEBYCODE=1' gets exported during the shell commands. You can see this in every tb build log. My homebrew build even braked because the '$' was substituted by something else (which might be due to my special build setup). Anyway, exporting a false env variable is probably undesired.
Walter, thank you for catching that!

Pushed http://hg.mozilla.org/integration/mozilla-inbound/rev/ed1fdc585eb1 to address that issue and while I was there comment 52.
khuey, or bz, can we fix the following issue?

* many many many many lines of:

# Just bring it up to date, if it's out of date, so that we'll know that
# we have to redo binding generation and flag this prerequisite there as
# being newer than the bindinggen target.
# Just bring it up to date, if it's out of date, so that we'll know that
# we have to redo binding generation and flag this prerequisite there as
# being newer than the bindinggen target.
# Just bring it up to date, if it's out of date, so that we'll know that
# we have to redo binding generation and flag this prerequisite there as
# being newer than the bindinggen target.


output to console during a build, this is because each comment line is its own makefile "command", we should instead dedent them and throw them above the Make target in the makefile.....

It needlessly bloats log length which can be a problem for buildbot after too much of it, and can also be annoying for sheriffs who need to load these logs regularly
Flags: needinfo?(khuey)
Flags: needinfo?(bzbarsky)
I'm happy to review a patch to do that, sure.  I never noticed it, because I always build with make -s....
Flags: needinfo?(bzbarsky)
Reporter

Updated

6 years ago
Blocks: 890744
Depends on: 893798
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.