Closed Bug 585814 Opened 10 years ago Closed 10 years ago

add perl NYTPROF environment to tinderbox

Categories

(Infrastructure & Operations :: RelOps: General, task)

x86
macOS
task
Not set

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: bear, Assigned: justdave)

References

Details

Attachments

(1 file)

In order for us to see what is happening inside of the tinderbox perl cgi runs, rhemler worked up the following to get apache to call the perl cgi code with profiling enabled:

SetEnv NYTPROF "addpid=1:file=/tmp/nytprof.out"
SetEnv PERL5OPT "-d:NYTProf"

This will create a file per PID /tmp/nyprof.out.${PID}

These will allow us to gather profile data across a number of runs - I would like to see this added and then the files bundled into a tarball so we can review them.

sound like the beginnings of a plan or pure gibberish? thanks!
as rhelmer just noted in IRC "it will definitely slow things down on production"... so maybe this is done for a couple of hours to get a snapshot of data - but I think we need to see the data from it running on the production server to get a view as to what the true bottleneck is.
I tested this with http://search.cpan.org/~timb/Devel-NYTProf-4.04/lib/Devel/NYTProf.pm

I think even a staging run, as long as it has production data, would be very useful.. it would be nice to see where showbuilds.cgi is spending it's time.
Installed:
  perl-Devel-NYTProf.x86_64 0:4.04-1.el5.rf                                     

Dependency Installed:
  perl-JSON-Any.x86_64 0:1.22-1.el5.rf
  perl-JSON-XS.x86_64 0:2.27-1.el5.rf
  perl-common-sense.x86_64 0:3.0-1.el5.rf

Added the config to production and restarted httpd.

and I'm not seeing any log files generated....
(In reply to comment #3)
> Installed:
>   perl-Devel-NYTProf.x86_64 0:4.04-1.el5.rf                                     
> 
> Dependency Installed:
>   perl-JSON-Any.x86_64 0:1.22-1.el5.rf
>   perl-JSON-XS.x86_64 0:2.27-1.el5.rf
>   perl-common-sense.x86_64 0:3.0-1.el5.rf
> 
> Added the config to production and restarted httpd.
> 
> and I'm not seeing any log files generated....

This is what I am using on my dev (Ubuntu) box, and I get "/tmp/nytprof.out.${PID}" after hitting showbuilds.cgi:

       <Directory "/var/www/tinderbox">
                AllowOverride None
                Options +ExecCGI -MultiViews +SymLinksIfOwnerMatch
                Order allow,deny
                Allow from all
                SetEnv NYTPROF "addpid=1:file=/tmp/nytprof.out"
                SetEnv PERL5OPT "-d:NYTProf"
        </Directory>
(In reply to comment #4)
> (In reply to comment #3)
> > Installed:
> >   perl-Devel-NYTProf.x86_64 0:4.04-1.el5.rf                                     

I installed from CPAN but looks like the same exact version.
justdave got this working, here's a bit of data... looks like this was a dynamic JSON query, just what we're looking for:

Calls    P        F        Exclusive Time       Inclusive Time         Subroutine
1        1        1        12.4s                14.1s                  main::::load_scrape main::load_scrape
1        1        1        10.5s                10.5s                  main::::make_build_table main::make_build_table
1        1        1        5.58s                5.58s                  Data::Dumper::::DumpxsData::Dumper::Dumpxs (xsub)
1        1        1        2.09s                34.4s                  main::::do_json main::do_json
363065   6        1        1.66s                1.66s                  main::::CORE:readline main::CORE:readline (opcode)

I am surprised at how long load_scrape takes! scrape.dat must be very large, but I didn't expect it to dominate so much.

make_build_table is actually building a bunch of info we don't really need for the JSON (it's basically a data structure to hold the table that Tinderbox makes) so we could probably just avoid it altogether.

One approach would be to either make an alternative tb_load_data, or make tb_load_data optionally load some of this data.
Assignee: server-ops → justdave
Is the profiling still active in production (dm-webtools02)? If so, can it be disabled, please? munin is showing a lot of extra load over the past many hours.
(In reply to comment #7)
> Is the profiling still active in production (dm-webtools02)? If so, can it be
> disabled, please? munin is showing a lot of extra load over the past many
> hours.

Alrighty, disabled.
(In reply to comment #8)
> (In reply to comment #7)
> > Is the profiling still active in production (dm-webtools02)? If so, can it be
> > disabled, please? munin is showing a lot of extra load over the past many
> > hours.
> 
> Alrighty, disabled.

Thanks, I didn't realize it was still going... would you mind zipping up /tmp/nytprof* and putting it somewhere we can grab?
Also if you could enable it on tinderbox-stage that would be fanatastic...
(In reply to comment #9)
> (In reply to comment #8)
> > (In reply to comment #7)
> > > Is the profiling still active in production (dm-webtools02)? If so, can it be
> > > disabled, please? munin is showing a lot of extra load over the past many
> > > hours.
> > 
> > Alrighty, disabled.
> 
> Thanks, I didn't realize it was still going... would you mind zipping up
> /tmp/nytprof* and putting it somewhere we can grab?

Apparently those files disappeared when it was turned off, according to fox2mike :(.
Blocks: 582246
(In reply to comment #11)
> (In reply to comment #9)
> > (In reply to comment #8)
> > > (In reply to comment #7)
> > > > Is the profiling still active in production (dm-webtools02)? If so, can it be
> > > > disabled, please? munin is showing a lot of extra load over the past many
> > > > hours.
> > > 
> > > Alrighty, disabled.
> > 
> > Thanks, I didn't realize it was still going... would you mind zipping up
> > /tmp/nytprof* and putting it somewhere we can grab?
> 
> Apparently those files disappeared when it was turned off, according to
> fox2mike :(.

Ah, no worries. Right now I am more concerned with comparing json relative to json2 with the patches from bug 585876 (which justdave put on stage last night), I think we should be able to do most profiling on staging anyway.
(In reply to comment #8)
> Alrighty, disabled.

I don't see the load drop at 6am PDT, could we please double check it is disabled ?
justdave says that the profiling code was only in place for 5 minutes, so dunno what happened at comment #8. fox2mike ?

So something else is causing all the iostat load in this graph, starting from about 1:30am PDT Tuesday ? 

I'm also seeing slow processing of mail right now (ie builds take a long time to appear on the waterfall). Eg http://tinderbox.mozilla.org/showlog.cgi?log=Firefox3.5/1281494131.1281494599.10760.gz ran between 19:35:31 and 19:43:08 according to buildbot, yet tinderbox says 19:34 to 20:13.
(In reply to comment #14)
> Created attachment 464690 [details]
> CPU usage on dm-webtools02
> 
> justdave says that the profiling code was only in place for 5 minutes, so dunno
> what happened at comment #8. fox2mike ?

The following lines :

SetEnv NYTPROF "addpid=1:file=/tmp/nytprof.out"
SetEnv PERL5OPT "-d:NYTProf"

were still there and so I commented them out. Not sure if Dave disabled it elsewhere before this.
(In reply to comment #15)
> The following lines :
> 
> SetEnv NYTPROF "addpid=1:file=/tmp/nytprof.out"
> SetEnv PERL5OPT "-d:NYTProf"
> 
> were still there and so I commented them out. Not sure if Dave disabled it
> elsewhere before this.

It didn't work anyway, perl ignores those when running in setuid mode.

They have to be hard-coded in the perl script itself to work in our environment, and that's what I removed when I disabled it.

In general this is a wontfix (cantfix actually).  I can run specific tests on individual files if you need it, it'll be on a one-off basis.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
Component: Server Operations: RelEng → RelOps
Product: mozilla.org → Infrastructure & Operations
You need to log in before you can comment on or make changes to this bug.