Closed Bug 430863 Opened 16 years ago Closed 16 years ago

Treehydra: commit extra JS libs and analysis frameworks

Categories

(Developer Infrastructure :: Source Code Analysis, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dmandelin, Assigned: dmandelin)

References

Details

Attachments

(1 file, 1 obsolete file)

Attached patch Code written for outparams (obsolete) — Splinter Review
This bug is to add the potentially reusable code from the outparams analysis to the Treehydra distribution as libraries. Here's the code, please review.
Attachment #317771 - Flags: review?(tglek)
It doesn't look like CALL_EXPR stuff in that code will work on a mac :(
(In reply to comment #1)
> It doesn't look like CALL_EXPR stuff in that code will work on a mac :(

 :), this is why we have to get everything into mainline.

Dave, bunch of good work there. It's a shame we haven't started syncing up later. I suppose it's my fault for not prompting you sooner.

In the future, please share all of the gcc func ports upstream asap. In fact I think we should take out all of the gcc compat functions from treehydra.js and move them to gcc_compat.js which would be included from treehydra.js

I frown at 2 incompatible Map implementations between the two of us. Lets try to join efforts by implementing the es4 spec as close as possible since our implementations implement different aspects of the same datastructure.

The equals stuff is hacky. I'm not sure why you implemented it as a toplevel func with half it in the Map. I'm pretty sure that structurally_equals() from system.js is what you want here, as it's general.

As for the rest of the utility functions, we need to go through the functions one by one and try to define a utility api out of them. I a couple of the same of the same funcs with more or less features.  Trouble with this approach is then we have to rewrite existing code, which you have a lot more of than I do. I'd like to do this because it will result in a higher quality codebase and once we converge on a common library of utility functions, it should make writing scripts easier for other users.
(In reply to comment #2)
> (In reply to comment #1)
> > It doesn't look like CALL_EXPR stuff in that code will work on a mac :(
> 
>  :), this is why we have to get everything into mainline.
> 
> Dave, bunch of good work there. It's a shame we haven't started syncing up
> later. I suppose it's my fault for not prompting you sooner.

Each of these points entails significant design decisions. I didn't really want to worry about that at the same time as I prototyped the analysis and my needs were evolving rapidly. 

> In the future, please share all of the gcc func ports upstream asap. In fact I
> think we should take out all of the gcc compat functions from treehydra.js and
> move them to gcc_compat.js which would be included from treehydra.js

Sounds good. I assume this includes ports of both macros and functions.

> I frown at 2 incompatible Map implementations between the two of us. Lets try
> to join efforts by implementing the es4 spec as close as possible since our
> implementations implement different aspects of the same datastructure.

Those are good goals. But there's a bunch of stuff going on here:

- I created the simplest Map implementation that would meet my needs. The API is pretty close to ES4, I think the difference there is little more than a few function names.

- The big difference is that the ES4 spec allows non-unique hashes and uses an equality function to take care of the difference. This means more implementation complexity (and potentially worse performance), so I avoided it for now. It would be possible to use my implementation with the same API, but it would require unique hashes. (I'm also not sure about the type of the ES4 hashcodes, or whether it matters here.)

- I absolutely need to be able to customize the notion of equality, as ES4 map allows.

> The equals stuff is hacky. I'm not sure why you implemented it as a toplevel
> func with half it in the Map. I'm pretty sure that structurally_equals() from
> system.js is what you want here, as it's general.

No. structurally_equal() dies horribly on cyclic/large data structures. And my maps contain GCC DECLs. And that is just a specific instance of the general problem that the notion of equality needs to be customizable, thus my reliance on an equals() method defined per-object. Parts of some objects should be excluded for performance (e.g., fields of State other than .substates), and parts of others for semantics (e.g., by definition, Substates are equal if their abstract value maps are equal, blames don't matter).

> As for the rest of the utility functions, we need to go through the functions
> one by one and try to define a utility api out of them. I a couple of the same
> of the same funcs with more or less features.  Trouble with this approach is
> then we have to rewrite existing code, which you have a lot more of than I do.
> I'd like to do this because it will result in a higher quality codebase and
> once we converge on a common library of utility functions, it should make
> writing scripts easier for other users.

How do you want to plan this? The general space seems to range over something like:

1. Hold 420933 until this code is fully integrated with Treehydra and brought up to a level of generality expected for standard libraries.

2. Commit this code as part of 420933. Then, an API feature at a time, improve the libraries, move them into Treehydra, and update outparams accordingly.

3. Commit this code and 420933. Then, an API feature at a time, improve Treehydra libs and update outparams accordingly.

Here's what I suggest:

a. Split the libs into two categories: "alpha" and "scary". 

"alpha" means it works pretty well, but we might expect significant evolution. (E.g., map). So, a new user wouldn't go too far astray by using "alpha" features, but should expect them to change--after all, Treehydra is pretty new. And I don't think we have any other users that are hitting it too hard just yet, so it should be no problem for now.

"scary" means it's not too easy for anyone else to figure out what to do with it, and we'd like to discourage people from using it (unless they're interested in actively collaborating on its developement). I'm not sure anything necessarily falls in this category yet, but link_switches comes to mind as being scary even to me.

b. Check everything in to Treehydra with some separation that users can see between the two sets. I was already thinking we have a lot of JS now to be living in the dehydra-gcc dir, so perhaps a 'libs' directory with an 'unstable' subdir.

c. Change the Map API to more closely track the ES4 proposal.

d. Defer other library changes until we have real evidence of practical need. 
> > In the future, please share all of the gcc func ports upstream asap. In fact I
> > think we should take out all of the gcc compat functions from treehydra.js and
> > move them to gcc_compat.js which would be included from treehydra.js
> 
> Sounds good. I assume this includes ports of both macros and functions.

Yes. In fact I think we should start from there.

> 
> > I frown at 2 incompatible Map implementations between the two of us. Lets try
> > to join efforts by implementing the es4 spec as close as possible since our
> > implementations implement different aspects of the same datastructure.
> 
> Those are good goals. But there's a bunch of stuff going on here:
> 
> - I created the simplest Map implementation that would meet my needs. The API
> is pretty close to ES4, I think the difference there is little more than a few
> function names.

I noticed that.

> 
> - The big difference is that the ES4 spec allows non-unique hashes and uses an
> equality function to take care of the difference. This means more
> implementation complexity (and potentially worse performance), so I avoided it
> for now. It would be possible to use my implementation with the same API, but
> it would require unique hashes. (I'm also not sure about the type of the ES4
> hashcodes, or whether it matters here.)
string representations of ints make good hashcodes, or so says my experience. I'm not sure which part of es4 map scares you performance-wise. It's likely we could specialize certain Map methods for hashcode/equals.

> 
> - I absolutely need to be able to customize the notion of equality, as ES4 map
> allows.

That's a frequent requirement.
> 
> > The equals stuff is hacky. I'm not sure why you implemented it as a toplevel
> > func with half it in the Map. I'm pretty sure that structurally_equals() from
> > system.js is what you want here, as it's general.
> 
> No. structurally_equal() dies horribly on cyclic/large data structures. And my
> maps contain GCC DECLs. And that is just a specific instance of the general
> problem that the notion of equality needs to be customizable, thus my reliance
> on an equals() method defined per-object. Parts of some objects should be
> excluded for performance (e.g., fields of State other than .substates), and
> parts of others for semantics (e.g., by definition, Substates are equal if
> their abstract value maps are equal, blames don't matter).

Fair enough.

> 
> > As for the rest of the utility functions, we need to go through the functions
> > one by one and try to define a utility api out of them. I a couple of the same
> > of the same funcs with more or less features.  Trouble with this approach is
> > then we have to rewrite existing code, which you have a lot more of than I do.
> > I'd like to do this because it will result in a higher quality codebase and
> > once we converge on a common library of utility functions, it should make
> > writing scripts easier for other users.
> 
> How do you want to plan this? The general space seems to range over something
> like:
> 
> 1. Hold 420933 until this code is fully integrated with Treehydra and brought
> up to a level of generality expected for standard libraries.


> 
> 2. Commit this code as part of 420933. Then, an API feature at a time, improve
> the libraries, move them into Treehydra, and update outparams accordingly.

I don't like holding up code. I think we both agree, that it's better to commit what works and then incrementally modify it.


> 
> 3. Commit this code and 420933. Then, an API feature at a time, improve
> Treehydra libs and update outparams accordingly.
> 
> Here's what I suggest:
> 
> a. Split the libs into two categories: "alpha" and "scary". 
> 
> "alpha" means it works pretty well, but we might expect significant evolution.
> (E.g., map). So, a new user wouldn't go too far astray by using "alpha"
> features, but should expect them to change--after all, Treehydra is pretty new.
> And I don't think we have any other users that are hitting it too hard just
> yet, so it should be no problem for now.
> 
> "scary" means it's not too easy for anyone else to figure out what to do with
> it, and we'd like to discourage people from using it (unless they're interested
> in actively collaborating on its developement). I'm not sure anything
> necessarily falls in this category yet, but link_switches comes to mind as
> being scary even to me.

> 
> b. Check everything in to Treehydra with some separation that users can see
> between the two sets. I was already thinking we have a lot of JS now to be
> living in the dehydra-gcc dir, so perhaps a 'libs' directory with an 'unstable'
> subdir.

So we should add a libs/ dir with an alpha subdir which is enabled by the user using the new include functionality?


> 
> c. Change the Map API to more closely track the ES4 proposal.
> 
> d. Defer other library changes until we have real evidence of practical need. 

Right, so that would imply that duplicate functionality in our code should be merged.
(In reply to comment #4)
> > - The big difference is that the ES4 spec allows non-unique hashes and uses an
> > equality function to take care of the difference. This means more
> > implementation complexity (and potentially worse performance), so I avoided it
> > for now. It would be possible to use my implementation with the same API, but
> > it would require unique hashes. (I'm also not sure about the type of the ES4
> > hashcodes, or whether it matters here.)
> string representations of ints make good hashcodes, or so says my experience.
> I'm not sure which part of es4 map scares you performance-wise. It's likely we
> could specialize certain Map methods for hashcode/equals.

What I was secretly thinking as I typed that:

1. If hash functions are one-to-one (i.e., hash(u) == hash(v) => u == v), then there is no need for searching a chain and checking key equality, avoiding a potential performance hit and making the Map implementation simpler. The more general semantics of ES4 Map require searching and equality testing.

2. In my code, the hashcode is a string, which is used directly as a property on the underlying JS object in the implementation. It works perfectly well, but diverges from ES4, where I am told hashcodes will be Number, interpreted as integers. Note that if we go to integer hashcodes, it's not practical to make the hash functions injective. 

It's kind of a tough call. The present version is significantly simpler than a more exact dup of ES4, so I'm thinking it's best for now to use injective string hash functions, but encapsulate that fact so that it's easy to change later.

> > b. Check everything in to Treehydra with some separation that users can see
> > between the two sets. I was already thinking we have a lot of JS now to be
> > living in the dehydra-gcc dir, so perhaps a 'libs' directory with an 'unstable'
> > subdir.
> 
> So we should add a libs/ dir with an alpha subdir which is enabled by the user
> using the new include functionality?

How about:

  libs/
    treehydra.js (contains core treehydra reqs, e.g., EnumValue, and
                  includes enums.js, useful_arrays.js, gcc_compat.js)
    gcc_compat.js
    # The next group is considered alpha, but the system as a whole is
    # at an alpha stage right now so no need to mark that specially. But
    # the user does have to include them specifically.
    gcc_print.js
    map.js
    set.js

    # The need to type "include('unstable/esp.js')" should be enough of a clue.
    unstable/
      analysis.js
      esp.js
      util.js   (should rename/move things around a bit)      

> > c. Change the Map API to more closely track the ES4 proposal.
> > 
> > d. Defer other library changes until we have real evidence of practical need. 
> 
> Right, so that would imply that duplicate functionality in our code should be
> merged.

I just audited treehydra.js, and the only duplication I see is decl_name with my decl_name_string. decl_name_string is used in 2 files: gcc_print.js and outparams.js. 

For gcc_print, it should be replaced by decl_name, as decl_name is basically a better version of decl_name_string.

In outparams.js, I think the same replacement might work OK, but I don't want to do that because outparams is using it for semantic purposes, so mods to decl_name_string that make sense for a pretty-printer could break outparams. outparams.js should use the semantics of decl_name_string.

I think both names need to be changed: I don't want to have both DECL_NAME and decl_name, and the one used by outparams.js shouldn't have a name that makes it seem like a pretty-printer. But I'm having a hard time coming up with names I really like. The pretty-printing one is also hard because it specifically pretty-prints the 'name' part only of the DECL, but what it prints isn't necessarily the actual DECL_NAME either, if it's anonymous. Ugh. How about:

  decl_id_string   "convert the decl's identifier to a string"

  decl_name_string "convert the decl's name, if any, to a string"

Still kind of confusing. Maybe need to use "label" or "display" or "print" or "pp" as a prefix for pretty-printing stuff?

Anyway, I didn't see any other dups (unless there are dup'd macro ports, but it's obvious what to do with those). Was there anything else you had noticed?
Comment on attachment 317771 [details] [diff] [review]
Code written for outparams


>+/** Return true if 'predicate' is true for every element of 'array',
>+ *  where 'predicate' is a function of one argument. */
>+function array_all(array, predicate) {
>+  for (let i = 0; i < array.length; ++i) {
>+    if (!predicate(array[i])) return false;
>+  }
>+  return true;
>+}
>+

is this the same as Array.every?
Attached patch Revised patchSplinter Review
Yep, array_all is the same as Array.every. Those post-1.5 methods sneak out of my sight in the docs.

I only made a few revisions so far: removing array_all and renaming methods to make the Map thing more compatible with ES4. 

I'm still waiting for more comments on directory packaging and Map equality/hashcode semantics before going too far there. 

Personally, I'd like to get the directory packaging sorted out, as well as the content of the gcc_compat lib, before committing this. But the Map/Set stuff and various improvements to the pretty-printing that would be nice I think can wait a little longer and be filed as separate bugs.
Attachment #317771 - Attachment is obsolete: true
Attachment #317771 - Flags: review?(tglek)
I just whipped up a nearer clone to ES4 map, with the equals and hashcode parameters. It's not too complicated. 

I then compared performance in the outparams analysis on moz. I just replaced the "innermost" state map type with the new map and ran on 200 files or so. In those files, 1856 functions get analyzed, in 217 seconds with the old map, 240 for the new map. (These times are just the sum of the times taken, so it suggests that the total time would about 40 minutes on all the files, or maybe 10-15 if run in parallel, which I think is about the cost I see on dm-oink01 in previous runs.) So it would be about an extra 5 minutes on all of moz, maybe about 2 on a parallel machine. So, not too bad.

I think I'm going to set up an ADT factory that I can use to switch everything between the injective-hash map and the general map and see what happens there. If the cost of general map is non-trivial, then I'd like to just keep both around and use the factory to control it.
> I only made a few revisions so far: removing array_all and renaming methods to
> make the Map thing more compatible with ES4. 

Cool.

> I'm still waiting for more comments on directory packaging and Map
> equality/hashcode semantics before going too far there. 
> 
> Personally, I'd like to get the directory packaging sorted out, as well as the
> content of the gcc_compat lib, before committing this. But the Map/Set stuff
> and various improvements to the pretty-printing that would be nice I think can
> wait a little longer and be filed as separate bugs.
> 

I'm happy with your include dir proposal. Feel free to commit.

Also, I think we agree that following ES4 map proposal exactly without having es4, may not be the best idea. Perhaps we should just settle for somewhat es4-like, but faster in spidermonkey approach that works for both of us. Either way, that should be a separate bug. Lets do the initial commits and then turn this into a tracking bug for the rest of the api work.

Also, I don't like holding up utility code from being committed. Lets commit and improve API/perf/etc incrementally. I just want to be able to make changes so scripts using the libs should be expected to adapt to changes instead of holding us back.
> In outparams.js, I think the same replacement might work OK, but I don't want
> to do that because outparams is using it for semantic purposes, so mods to
> decl_name_string that make sense for a pretty-printer could break outparams.
> outparams.js should use the semantics of decl_name_string.
> 
> I think both names need to be changed: I don't want to have both DECL_NAME and
> decl_name, and the one used by outparams.js shouldn't have a name that makes it
> seem like a pretty-printer. But I'm having a hard time coming up with names I
> really like. The pretty-printing one is also hard because it specifically
> pretty-prints the 'name' part only of the DECL, but what it prints isn't
> necessarily the actual DECL_NAME either, if it's anonymous. Ugh. How about:
> 
>   decl_id_string   "convert the decl's identifier to a string"
> 
>   decl_name_string "convert the decl's name, if any, to a string"
> 
> Still kind of confusing. Maybe need to use "label" or "display" or "print" or
> "pp" as a prefix for pretty-printing stuff?
> 
> Anyway, I didn't see any other dups (unless there are dup'd macro ports, but
> it's obvious what to do with those). Was there anything else you had noticed?
> 

I'm not sure why mods to how the names are printed would break your analysis. I'm not convinced that you need 2 functions to show variable names.

I looked at the rest of the code, I don't understand the analysis parts as I'd need to try to use it first.
However the pattern 
function wrapper (foo) {
 return func(foo)
} is inefficient. I suggest const wrapper = func pattern.
OK, I'm going to start committing chunks. I just wanted to mention I finished some perf tests with the outparams with different maps.

                           wall clock 
No-op script                 17m 20s                  
Outparams with fast map      26m 45s
Outparams with slow map      27m 30s

This was for 1970 files of moz on 4 processors.
Pushed, with most but not all suggested changes included. I'll file bugs for the others. 

It's a pretty big change, so be sure to pull before doing any JS hacking.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Product: Core → Firefox Build System
Product: Firefox Build System → Developer Infrastructure
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: