Closed Bug 427115 Opened 12 years ago Closed 2 years ago

Rewriting+Analysis pass: strip code which is dead if allocators can't fail

Categories

(Firefox Build System :: Source Code Analysis, defect)

x86
Linux
defect
Not set

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 550991

People

(Reporter: benjamin, Assigned: dmandelin)

References

(Depends on 1 open bug)

Details

Attachments

(1 file)

Bug 427109 will make it so that normal allocators (je_malloc and operator new) don't ever fail. Once this happens, we should rewrite our codebase so that code which branches based on allocators failing will be removed:

   nsFoo *foo = new nsFoo();
-  if (!foo) {
-    // CLEAN UP
-    return NS_ERROR_OUT_OF_MEMORY;
-  }

We should also create a dehydra/treehydra pass which warns about code that is dead because of never-failing allocators.... perhaps that's the first step, even, and the results of that can be fed back into pork to produce a stripping patch.
My first question is, what's an allocator? operator new, obviously, and malloc(), but there must be others. Is there a list, or should I just see what callee-return-values things like 'return NS_ERROR_OUT_OF_MEMORY' are control-dependent on?
Another question: A lot of methods that used to be able to return NS_ERROR_OUT_OF_MEMORY won't, so their callers should have any code devoted to handling that condition ripped out. But I assume that error code won't leave entirely, because the big allocations can fail. Do we need a separate analysis to sort this issue out?
well... let's use this list of allocators to start:

alloc()
NS_Alloc()
PR_Malloc()
realloc()
NS_Realloc()
PR_Realloc()
strdup()

We can add more later if desirable.

For now I think we should ignore second-order effects.
and of course ::operator new and ::operator new[]
Random thought here. GCC has an option -Wunreachable-code that prints warnings for unreachable code. This might be used to identify code that is made dead by changes to the allocator. But it doesn't look easy to use: it's not smart enough to know that new never returns NULL (e.g., in the standard C++ setup), and it's not smart enough to figure out what functions never return NULL, even inline functions. 

Ideally, I'd like to replace allocator functions and uses of operator new with versions or code snippets that never return null and let GCC figure it out. But it looks like it can only be done with macros, and not easily (or at all?) there (esp. for new). Anyone have any experience with that kind of trickery that can say if it's possible or how to do it?
Dave, 
Generally GCC kind sucks at these analyses because it is a compiler and has to do time vs benefit tradeoff. I haven't looked at this particular analysis, but I suspect it isn't something to write home about(it'll definetly miss all of the app-specific allocation points).
 I think this is a perfectly reasonable analysis to do in Treehydra(for all cases) + Elsa(for most easy cases).

I'd be happy to do this one or at least the elsa part.
Just wondering: if you do the analysis pass in gcc (treehydra), how would you do the rewriting? Pass the location information to an elsa/oink tool for the actual rewriting?
What I've started on is the 'NS_ENSURE_TRUE(ptr, NS_ERROR_OOM)' pattern, as it seems to be easy. If I can prove in Treehydra that there is a "dead branch" (I think it's technically a "constant conditional expression") on the line of the macro invocation, then it's safe to delete the line. 

After that, I'll see what's the next simplest pattern and see if some repeat of the approach works. I'm thinking that if I output from Treehydra the line number of a constant conditional and the line numbers of statements on the unreachable branch, then that's enough to figure out the exact statements and positions in Elsa.
(In reply to comment #7)
> Just wondering: if you do the analysis pass in gcc (treehydra), how would you
> do the rewriting? Pass the location information to an elsa/oink tool for the
> actual rewriting?
> 

I was thinking of writing the logic separately in elsa. Elsa would deal with simple to detect/rewrite patterns and treehydra would ensure that there is no more OOM code left.

(In reply to comment #8)
> What I've started on is the 'NS_ENSURE_TRUE(ptr, NS_ERROR_OOM)' pattern, as it
> seems to be easy. If I can prove in Treehydra that there is a "dead branch" (I
> think it's technically a "constant conditional expression") on the line of the
> macro invocation, then it's safe to delete the line. 
> 
> After that, I'll see what's the next simplest pattern and see if some repeat of
> the approach works. I'm thinking that if I output from Treehydra the line
> number of a constant conditional and the line numbers of statements on the
> unreachable branch, then that's enough to figure out the exact statements and
> positions in Elsa.

It looks to me like in this case it's easier to detect NS_ENSURE_TRUE(*, oom) if you make it an inline func, then specialize the oom-error-code case. If you keep NS_ENSURE_TRUE as a macro things get really really messy when you try to pass the info to Elsa.
(In reply to comment #9)
> (In reply to comment #7)
> > Just wondering: if you do the analysis pass in gcc (treehydra), how would you
> > do the rewriting? Pass the location information to an elsa/oink tool for the
> > actual rewriting?
> > 
> 
> I was thinking of writing the logic separately in elsa. Elsa would deal with
> simple to detect/rewrite patterns and treehydra would ensure that there is no
> more OOM code left.

It could work, but it seems so much easier to do the analysis in Treehydra, I figured I'd give it shot and see how it goes.

> (In reply to comment #8)
> > What I've started on is the 'NS_ENSURE_TRUE(ptr, NS_ERROR_OOM)' pattern, as it
> > seems to be easy. If I can prove in Treehydra that there is a "dead branch" (I
> > think it's technically a "constant conditional expression") on the line of the
> > macro invocation, then it's safe to delete the line. 
> > 
> > After that, I'll see what's the next simplest pattern and see if some repeat of
> > the approach works. I'm thinking that if I output from Treehydra the line
> > number of a constant conditional and the line numbers of statements on the
> > unreachable branch, then that's enough to figure out the exact statements and
> > positions in Elsa.
> 
> It looks to me like in this case it's easier to detect NS_ENSURE_TRUE(*, oom)
> if you make it an inline func, then specialize the oom-error-code case. If you
> keep NS_ENSURE_TRUE as a macro things get really really messy when you try to
> pass the info to Elsa.

That's a pretty good idea, but I don't know if it's needed. For NS_ENSURE_TRUE, you don't pass anything to Elsa, you just delete that line. Also, if you make it macro, you alter the control flow semantics of the method, so I figured I would just do it the other way.
(In reply to comment #10)
> > I was thinking of writing the logic separately in elsa. Elsa would deal with
> > simple to detect/rewrite patterns and treehydra would ensure that there is no
> > more OOM code left.
> 
> It could work, but it seems so much easier to do the analysis in Treehydra, I
> figured I'd give it shot and see how it goes.

I 100% agree with this approach.
> 
> > (In reply to comment #8)
> > > What I've started on is the 'NS_ENSURE_TRUE(ptr, NS_ERROR_OOM)' pattern, as it
> > > seems to be easy. If I can prove in Treehydra that there is a "dead branch" (I
> > > think it's technically a "constant conditional expression") on the line of the
> > > macro invocation, then it's safe to delete the line. 
> > > 
> > > After that, I'll see what's the next simplest pattern and see if some repeat of
> > > the approach works. I'm thinking that if I output from Treehydra the line
> > > number of a constant conditional and the line numbers of statements on the
> > > unreachable branch, then that's enough to figure out the exact statements and
> > > positions in Elsa.
> > 
> > It looks to me like in this case it's easier to detect NS_ENSURE_TRUE(*, oom)
> > if you make it an inline func, then specialize the oom-error-code case. If you
> > keep NS_ENSURE_TRUE as a macro things get really really messy when you try to
> > pass the info to Elsa.
> 
> That's a pretty good idea, but I don't know if it's needed. For NS_ENSURE_TRUE,
> you don't pass anything to Elsa, you just delete that line. Also, if you make
> it macro, you alter the control flow semantics of the method, so I figured I
> would just do it the other way.
> 

I'm not sure I see what the harm in altering the semantics is. Watch out about "just delete" approach.

a) this NS_ENSURE_TRUE could be nested within another macro, which elsa detects and warns about. No way to do it with GCC.

b) You'd break code like NS_ENSURE_TRUE(); foo(assuming it exists). Lack of end-of-node position info sucks :)
(In reply to comment #11)
> I'm not sure I see what the harm in altering the semantics is. 

I'm not sure either, but if you don't alter the semantics then you don't have to think about it :-). One possibility is how it works with loops, or code further down, but it probably doesn't actually cause any problems with those things either.

> Watch out about "just delete" approach.
> 
> a) this NS_ENSURE_TRUE could be nested within another macro, which elsa detects
> and warns about. No way to do it with GCC.
> 
> b) You'd break code like NS_ENSURE_TRUE(); foo(assuming it exists). Lack of
> end-of-node position info sucks :)

True, but I have the complete list of all the occurrences of this pattern, and those things don't happen. It's pretty easy to guarantee--just test the line to be deleted against a regexp.
Depends on: 433121
I have an analysis that seems to be working fairly well at detecting branches that are constant because their condition depends on an allocated pointer being null. Right now I'm at the kind of fuzzy point of figuring out how to validate the results and what do with them in terms of patch generation.

As a first step, I've found that the following macro definitions contain code that needs to be stripped. It's probably best to just do them by hand. bsmedberg--do you want me to prepare a patch for this, or would you prefer someone else to do it. I looked at them and it looks fairly easy even for somebody that's not an expert on this code.

      5 INIT_HANDLER
      1 NS_GENERIC_AGGREGATED_CONSTRUCTOR
      3 NS_GENERIC_AGGREGATED_CONSTRUCTOR_INIT
    300 NS_GENERIC_FACTORY_CONSTRUCTOR
     70 NS_GENERIC_FACTORY_CONSTRUCTOR_INIT
      3 NS_GENERIC_FACTORY_SINGLETON_CONSTRUCTOR
     48 NS_IMPL_ELEMENT_CLONE
     56 NS_IMPL_ELEMENT_CLONE_WITH_INIT
     56 NS_IMPL_NS_NEW_SVG_ELEMENT
     17 NS_INTERFACE_MAP_ENTRY_CONTENT_CLASSINFO_IF_TAG
     27 NS_NSS_GENERIC_FACTORY_CONSTRUCTOR
      5 NS_NSS_GENERIC_FACTORY_CONSTRUCTOR_INIT
      8 NS_REGISTER_FIRST_COMMAND
     37 NS_REGISTER_ONE_COMMAND
     21 NS_REGISTER_STYLE_COMMAND
      3 NS_REGISTER_TAG_COMMAND
This patch is against mozilla-central and removes about 900 if statements that are dead if allocators can't fail. I read through it by hand and also confirmed that it builds on Linux. I can't seem to get the try server to run right now--the basic one seems to use CVS, so my patch doesn't apply, and the repo one bombs trying to run client.py:

  added 15290 changesets with 85835 changes to 37372 files
  34208 files updated, 0 files merged, 0 files removed, 0 files unresolved
  Usage: client.py [options] update_nspr tagname | update_nss tagname

  client.py: error: no such option: -m

I'm going to post a patch early next week that strips some code from macros and from a few sites that the autopatcher can't handle, which should pick up another 700 or so places (90% in about 10 macros).
Attachment #324112 - Flags: review?
Attachment #324112 - Flags: review? → review?(benjamin)
As per the API in bug 427109, we need the following changes:

* we need to globally override ::operator new so that it uses the xmalloc call... I can do this in a separate bug
* when you're removing a null-check for malloc, we need to replace the call with xmalloc
* nsCRT::strdup currently forwards to PL_strdup (which is silly!), which can return null... we should probably fix this to use the xallocator somehow, but I'm not sure how

But this is great!
BTW, I filed bug 437792 about the try server hg bustage.
(In reply to comment #16)
> As per the API in bug 427109, we need the following changes:

How do you want to plan this?

1. It seems to me that it's OK to check in this patch before overriding operator new, etc, because the OOM handling code doesn't really work anyway. But maybe I'm wrong about that. In any case, I can always redo an automatic run at a later time, but it still takes time to setup and some manual review of the results is needed.

2. On malloc/xmalloc (and I assume the other x functions as well): Do you want me to do both stripping and x-adding in one patch, or do you think it's better to do a separate renaming patch?
1) land the xmalloc functions
2) global override of ::operator new
3) land the "new Foo" dead-code stripping
4) land the malloc dead-code stripping and rename to xmalloc at the same time

Sound reasonable?
> 2. On malloc/xmalloc (and I assume the other x functions as well): Do you want
> me to do both stripping and x-adding in one patch, or do you think it's better
> to do a separate renaming patch?
> 

I suggest writing a proper elsa renaming tool instead of doing this step in a script. It's easy, I can do that if you want.
(In reply to comment #19)
> 1) land the xmalloc functions
> 2) global override of ::operator new
> 3) land the "new Foo" dead-code stripping
> 4) land the malloc dead-code stripping and rename to xmalloc at the same time

OK. I'll just code up the renamer like Taras said and then hold onto it until the xmalloc stuff lands.

Assignee: nobody → dmandelin
Attachment #324112 - Flags: review?(benjamin)
Blocks: 583074
No longer blocks: 583074
Blocks: 583074
Product: Core → Firefox Build System
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 550991
You need to log in before you can comment on or make changes to this bug.