Closed Bug 710840 Opened 13 years ago Closed 11 years ago

Track peak virtual memory usage of link.exe process during libxul PGO link on graph server

Categories

(Release Engineering :: General, defect, P1)

x86
Windows Server 2003
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ted, Assigned: armenzg)

References

Details

(Keywords: sheriffing-P1, Whiteboard: [graphserver][pgo] 2012-01-21 --> linker max vsize:3757MB)

Attachments

(8 files, 3 obsolete files)

bug 710712 is going to add the ability to measure the peak virtual memory usage of the linker during the final PGO link phase on Windows. We should track this number on the graph server so we can monitor the situation.
catlee asked if we could make the build go orange if we went over a threshold for this value. I'm totally in favor of this, but we should see where we're currently at before deciding on what the threshold needs to be. I just pushed bug 710712 to inbound, so we should get some numbers soon.
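(For context on how a number like this can be captured: the actual measurement is the subject of bug 710712, but the general idea is a watcher that polls the linker process's virtual size while it runs. A minimal sketch, assuming the third-party psutil module is available; note that a poll-based watcher can miss very short spikes:)

import subprocess
import time

import psutil  # third-party, assumed available

def peak_vsize(cmd, interval=0.5):
    """Run cmd and return the largest virtual size (in bytes) seen while polling."""
    child = subprocess.Popen(cmd)
    watched = psutil.Process(child.pid)
    peak = 0
    while child.poll() is None:
        try:
            peak = max(peak, watched.memory_info().vms)
        except psutil.NoSuchProcess:
            break  # the process exited between poll() and the query
        time.sleep(interval)
    return peak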
OS: Windows 7 → Windows Server 2003
Priority: -- → P3
Whiteboard: [graphserver][pgo]
For reference (WINNT 5.2 x86 tinderbox pgo):
2011-12-16: 2887.55 MB / 3027816448 bytes (from bug 710712 comment 16)
2011-01-04: 2886.54 MB / 3026759680 bytes (inbound rev 5025534b9d88)
And that should of course read 2012-01-04 (first of many instances of doing that this month I'm sure :-))
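(The MB figures throughout this bug are just the raw byte counts divided by 2^20; e.g. in Python:)

>>> 3027816448 / 1048576.0
2887.55078125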
In lieu of having the graph server set up for this yet, I'll continue posting numbers periodically (at least whilst the memory of the ever so fun sheriffing weekend of bug 709193 is fresh in the mind):

2012-01-10: 2886.26 MB / 3026460672 bytes (inbound rev 01d69766026d)
2012-01-31: 2902.54 MB / 3043528704 bytes (inbound rev 5a8ff4828791)

(a 16 MB increase in the last 3 weeks)
2012-02-27: 2785.78 MB / 2921103360 bytes (inbound rev 0714ec049da2)

(Down 116 MB from 4 weeks ago, due to the MSVC 2010 switch perhaps?)
(In reply to Ed Morley [:edmorley] from comment #6)
> 2012-02-27: 2785.78 MB / 2921103360 bytes (inbound rev 0714ec049da2)
> 
> (Down 116 MB from 4 weeks ago, due to the MSVC 2010 switch perhaps?)

Right.
3021553664 bytes (inbound rev 0831ce6ba72f on the retrigger after the first one died with "compiler is out of heap space in pass 2")
I'll start doing some bisecting to see if we can work out where this fairly hefty increase has come from...
Oh and we've just had another on inbound:
ac1504ff8740
https://tbpl.mozilla.org/php/getParsedLog.php?id=11354573&tree=Mozilla-Inbound

This is looking bad :-(
Severity: normal → critical
Priority: P3 → P1
Blocks: 750661
Hopefully bug 750717 should take away some of the pain of not having this at least for now.
Severity: critical → major
So here is my armchair quarterback summary of what I think needs to happen here:
1. Write buildbot step to send this data from the buildslave to the graphserver system.
2. Modify the graphserver to accept this value from the builders (jmaher and/or rhelmer - can you file the database modification bug needed for this?)
3. Ensure the networking flows are in place between the builders and the graphserver systems (I can file this but I need the vlan numbers for all the releng builders - I'm sort of assuming they are separate from the test slaves - if they are on the same vlan as the slaves then maybe we don't need this).
4. Update the script for the dev.treemanagement auto-emailer to send this new data. Catlee, can you take this? You're the only person I know of that knows where the code for that mailer is and how to re-deploy a new version of it.

Please if I have any details wrong, do add a comment and correct me.
(In reply to Clint Talbert ( :ctalbert ) from comment #13)
> So here is my armchair quarterback summary of what I think needs to happen
> here:
> 1. Write buildbot step to send this data from the buildslave to the
> graphserver system.

we have similar code in place to submit leak info / codesighs already

> 2. Modify the graphserver to accept this value from the builders (jmaher
> and/or rhelmer - can you file the database modification bug needed for this?)

> 3. Ensure the networking flows are in place between the builders and the
> graphserver systems (I can file this but I need the vlan numbers for all the
> releng builders - I'm sort of assuming they are separate from the test
> slaves - if they are on the same vlan as the slaves then maybe we don't need
> this).

no need - we already submit info to graph server from the build machines

> 4. Update the script for the dev.treemanagement auto-emailer to send this
> new data. Catlee, can you take this? You're the only person I know of that
> knows where the code for that mailer is and how to re-deploy a new version
> of it.

it will automatically get picked up
it looks like we just need to solve 2 things:
1. Write buildbot step to send this data from the buildslave to the graphserver system.
2. Modify the graphserver to accept this value from the builders (jmaher and/or rhelmer - can you file the database modification bug needed for this?)

I can work on the graph server database mods.  What is the name we want to use for this test?  'libxul_link'?
(In reply to Joel Maher (:jmaher) from comment #15)
> I can work on the graph server database mods.  What is the name we want to
> use for this test?  'libxul_link'?

Thanks joel, that works for me.  Do you also have to add all the machine names for the builders or are the machine names in the graphserver db populated at run-time?
For whatever reason, build metrics like this use a generic platform name as the machine name, e.g. http://hg.mozilla.org/graphs/file/2018284ed6e7/sql/data.sql#l1171

so you can use those names (with "_leak_test" == debug), and just add the new test to the database.
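(Concretely, "adding the new test" is a single row in the graph server's tests table; the exact statement, quoted again below once it was run, was:)

insert into tests values (NULL,"libxul_link","LibXUL Memory during link",0,1,NULL);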
Attachment #622059 - Flags: review?(rhelmer) → review+
Depends on: 753767
(In reply to Ed Morley [:edmorley UTC+1] from comment #6)
> 2012-02-27: 2785.78 MB / 2921103360 bytes (inbound rev 0714ec049da2)
> 
> (Down 116 MB from 4 weeks ago, due to the MSVC 2010 switch)

2012-09-28: 3,217.13 MB / 3373408256 bytes (inbound rev 938e09d5a465)
So, we're at 3.4 gigs.  Any chance we can get those graphs before The Next Big Surprise?  :-)
Could this be fixed by tbpl turning orange after a certain threshold?
Could a tool be written to grab the information and create a graph?
Well TBPL already turns red after a certain threshold.  The point is to see what is causing the increases, not when we pass some arbitrary threshold.
The easiest way, I think, is to submit this data to the graph server and graph it there.  The point here is to see the problem approaching before it actually hits.
Oh!
It seems we only do TinderboxPrint!

Anyone know what is required to post the data in the graphs server?

I can see that this was added:
insert into tests values (NULL,"libxul_link","LibXUL Memory during link",0,1,NULL);
and IT ran it on the DB.
Armen, look for usage of GraphServerPost in buildbotcustom/process/factory.py.
Component: Release Engineering → Release Engineering: Automation (General)
QA Contact: catlee
(In reply to Ed Morley [:edmorley UTC+0] from comment #20)
> (In reply to Ed Morley [:edmorley UTC+1] from comment #6)
> > 2012-02-27: 2785.78 MB / 2921103360 bytes (inbound rev 0714ec049da2)
> > 
> > (Down 116 MB from 4 weeks ago, due to the MSVC 2010 switch)
> 
> 2012-09-28: 3,217.13 MB / 3373408256 bytes (inbound rev 938e09d5a465)

2012-01-07: 3,701.80 MB / 3881619456 bytes (m-c rev 795632f0e4fe)

500MB more in 3 months!

We need to fix this sooner rather than later (and ideally import all the old nightly figures from the logs, so we can more easily see what has bumped it up so much).

At current rate of increase we have only a couple of months before we hit this again.
Keywords: sheriffing-P1
Sorry, s/2012-01-07/2013-01-07/
(In reply to comment #27)
> (In reply to Ed Morley [:edmorley UTC+0] from comment #20)
> > (In reply to Ed Morley [:edmorley UTC+1] from comment #6)
> > > 2012-02-27: 2785.78 MB / 2921103360 bytes (inbound rev 0714ec049da2)
> > > 
> > > (Down 116 MB from 4 weeks ago, due to the MSVC 2010 switch)
> > 
> > 2012-09-28: 3,217.13 MB / 3373408256 bytes (inbound rev 938e09d5a465)
> 
> 2012-01-07: 3,701.80 MB / 3881619456 bytes (m-c rev 795632f0e4fe)
> 
> 500MB more in 3 months!
> 
> We need to fix this sooner rather than later (and ideally import all the old
> nightly figures from the logs, so we can more easily see what has bumped it up
> so much).
> 
> At current rate of increase we have only a couple of months before we hit this
> again.

So what parts of the WebRTC code ended up going into libxul?
Flags: needinfo?(rjesup)
media/webrtc went in (/signaling had to be in, and it references many things in /trunk).  We could in the future (especially as the code gets more locked down) probably move trunk to gkmedia, and deal with adding a lot of symbols to symbols.def.in (or finding a better way to deal with that!)  The decision IIRC was to allow those two to come in, but leave the rest in gkmedia.

Signaling landed in m-c at the FF 18 uplift, around Oct 6th.  Off the top of my head, I can't remember if media/webrtc/trunk was in gkmedia before that (I think it was), so both moved to libxul around then.
Flags: needinfo?(rjesup)
OK, I filed bug 827985 to move that code out of libxul.  Thanks for the clarification.

Ed, this needs to be treated with utmost priority.  Who should work on the graphing thing?
FYI, per above: on 9/28 we were at 3.2G, on 11/2 we were at 3.4G (after signaling landed), and now we're at 3.7G.  So the 500MB since 11/2 is NOT webrtc.  And I think (thinking back) that webrtc/trunk was in xul before 10/6; if so we took a (guess) 50-100MB hit for signaling, and likely webrtc has contributed little since then.

Can you run a PGO --disable-webrtc build and report the number?  I can't build on Windows currently - thanks Microsoft!!  --disable-webrtc will be a significant over-estimate of what you'll get back (as signaling won't get compiled).
(In reply to comment #32)
> FYI, per above: on 9/28 we were at 3.2G, on 11/2 we were at 3.4G (after signaling
> landed), now we're at 3.7G.  So the 500MB since 11/2 is NOT webrtc.  And I
> think (thinking back) that webrtc/trunk was in xul before 10/6; if so we took a
> (guess) 50-100MB hit for signaling, and likely webrtc has contributed little
> since then.

It doesn't matter how much we're going to win from this; we need to move all of the code that we can outside of libxul, and the WebRTC stuff is just part of it.

> Can you run a PGO --disable-webrtc build and report the number?  I can't build
> on Windows currently - thanks Microsoft!!  --disable-webrtc will be a
> significant over-estimate of what you'll get back (as signaling won't get
> compiled).

I only have VS2012, so the numbers that I get will not be representative (I actually don't know how to run the linker vmem usage measurement script locally.)  That being said, you can push to try to get the numbers, but like I said it doesn't matter much, we need *all* of the wins that we can get.
Let me try to help with this.
Assignee: nobody → armenzg
I know that the following file contains the value that we need to post:
obj-firefox\toolkit\library\linker-vsize

We could add code in here:
http://hg.mozilla.org/mozilla-central/file/0faa1d47ea80/build/link.py#l19
and post to the graph server.

Or we can add a new post compilation step on buildbot to post it.

I will look at the releng side, since we already have some GraphServer logic.
Right, the file contains the info, and it's also output to stdout in the build step in a line starting with "TinderboxPrint: linker max vsize:". You should use whichever is easier.
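(A rough sketch of the reading side, using the file path from the previous comment; the byte-count assumption, MB rounding, and output format here are illustrative rather than the exact buildbot code:)

import os

def read_linker_vsize(objdir):
    # The build writes the linker's peak vsize to this file (see above);
    # assumed here to hold a raw byte count.
    path = os.path.join(objdir, 'toolkit', 'library', 'linker-vsize')
    with open(path) as f:
        return int(f.read().strip())

vsize = read_linker_vsize('obj-firefox')
print('TinderboxPrint: linker max vsize: %dMB' % (vsize / 1048576))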
Attachment #702314 - Flags: review?(coop)
Attachment #702314 - Flags: review?(coop) → review+
Attachment #702314 - Flags: checked-in+
Attached patch post vsize (obsolete) — Splinter Review
Attachment #703578 - Flags: review?(bhearsum)
I have my first data point on staging:
http://graphs.allizom.org/graph.html#tests=[[205,6,8]]

At what point is this going to blow up? What is the upper limit?

With regards to producing a graph I would suggest to book a project branch and request releng to add PGO jobs to it. At that point I would suggest creating a list of changesets that we want data points for and trigger PGO builds.

Once bhearsum reviews this and we land it we will start having the data points.

(In reply to Ed Morley (Away 18th-20th Jan) [:edmorley UTC+0] from comment #27)
> (In reply to Ed Morley [:edmorley UTC+0] from comment #20)
> > (In reply to Ed Morley [:edmorley UTC+1] from comment #6)
> > > 2012-02-27: 2785.78 MB / 2921103360 bytes (inbound rev 0714ec049da2)
> > > (Down 116 MB from 4 weeks ago, due to the MSVC 2010 switch)
> > 2012-09-28: 3,217.13 MB / 3373408256 bytes (inbound rev 938e09d5a465)
> 2012-01-07: 3,701.80 MB / 3881619456 bytes (m-c rev 795632f0e4fe)
> 500MB more in 3 months!
2012-01-18: 3756.02 MB / 3938476032 bytes (m-c rev b52c02f77cf5)
Whiteboard: [graphserver][pgo] → [graphserver][pgo] 2012-01-18 --> linker max vsize:3756.02MB
The blowup point is somewhere near 4GB (the total amount of virtual memory available to a 32-bit process running on Windows x64), but we don't know exactly where. Essentially, once the linker tries to allocate more virtual memory and runs out, it will blow up; that last failing allocation could be fairly large, so it's hard to tell.
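(To put numbers on that: against a 4 GiB ceiling, the most recent reading above leaves only about 340 MB of headroom:)

(4 * 1024**3 - 3938476032) / 1048576.0  # ~= 339.98 MB left before link.exe falls over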
(In reply to comment #39)
> With regards to producing a graph I would suggest to book a project branch and
> request releng to add PGO jobs to it. At that point I would suggest creating a
> list of changesets that we want data points for and trigger PGO builds.

Who would own updating the branch though?  Why can't we just get the graphs for inbound and central?
(In reply to :Ehsan Akhgari from comment #41)
> (In reply to comment #39)
> > With regards to producing a graph I would suggest to book a project branch and
> > request releng to add PGO jobs to it. At that point I would suggest creating a
> > list of changesets that we want data points for and trigger PGO builds.
> 
> Who would own updating the branch though?  Why can't we just get the graphs
> for inbound and central?

I thought you guys mentioned that you wanted to get some history to see what things increased the memory usage. To build up history we would need to set up a project branch and trigger old changesets.

As soon as we land the patch we will get coverage on all branches that have pgo enabled from there on.
(In reply to comment #42)
> (In reply to :Ehsan Akhgari from comment #41)
> > (In reply to comment #39)
> > > With regards to producing a graph I would suggest to book a project branch and
> > > request releng to add PGO jobs to it. At that point I would suggest creating a
> > > list of changesets that we want data points for and trigger PGO builds.
> > 
> > Who would own updating the branch though?  Why can't we just get the graphs
> > for inbound and central?
> 
> I thought you guys mentioned that you wanted to get some history to see what
> things increased the memory usage. To build up history we would need to setup a
> project branch and trigger old changesets.

I don't see why.  We do have the data in the old logs, right?  We should just be able to write a script to parse them out or something.  Am I missing something?
(In reply to :Ehsan Akhgari from comment #43)
> (In reply to comment #42)
> > (In reply to :Ehsan Akhgari from comment #41)
> > > (In reply to comment #39)
> > > > With regards to producing a graph I would suggest to book a project branch and
> > > > request releng to add PGO jobs to it. At that point I would suggest creating a
> > > > list of changesets that we want data points for and trigger PGO builds.
> > > 
> > > Who would own updating the branch though?  Why can't we just get the graphs
> > > for inbound and central?
> > 
> > I thought you guys mentioned that you wanted to get some history to see what
> > things increased the memory usage. To build up history we would need to setup a
> > project branch and trigger old changesets.
> 
> I don't see why.  We do have the data in the old logs, right?  We should
> just be able to write a script to parse them out or something.  Am I missing
> something?

Good point! That would save lots of time.
Comment on attachment 703578 [details] [diff] [review]
post vsize

Review of attachment 703578 [details] [diff] [review]:
-----------------------------------------------------------------

::: process/factory.py
@@ +1344,5 @@
>                      data=WithProperties('TinderboxPrint: num_ctors: %(num_ctors:-unknown)s'),
>                      ))
>  
> +    def addPostBuildSteps(self):
> +        if self.profiledBuild and self.platform in ('win32',) and self.baseName:

Please add an explicit flag to MercurialBuildFactory/config.py for this rather than guessing based on 3 different things.

@@ +1356,5 @@
> +                    return {'testresults': []}
> +
> +            self.addStep(SetProperty(
> +                name='get_linker_vsize',
> +                command=['cat', '%s\\toolkit\\library\\linker-vsize' % self.mozillaObjdir],

Why the '\\'? All of the other steps use / without issue.
Attachment #703578 - Flags: review?(bhearsum) → review-
closed trees: bug 832992 :(
(In reply to Armen Zambrano G. [:armenzg] from comment #46)
> closed trees: bug 832992 :(

well, in the end that may be just a disk space issue, though we are slowly getting near the limit (now at 3939495936 bytes).
Whiteboard: [graphserver][pgo] 2012-01-18 --> linker max vsize:3756.02MB → [graphserver][pgo] 2012-01-18 --> linker max vsize:3757MB
Whiteboard: [graphserver][pgo] 2012-01-18 --> linker max vsize:3757MB → [graphserver][pgo] 2012-01-21 --> linker max vsize:3757MB
I'm gonna call this a blocker this time.
Severity: major → blocker
Could this be added to the releng Q1 goals, please? We keep hitting the problem, and while there isn't a clear solution to it, this is the only way we have to track its evolution.
This bug is in progress, adding it to the goals list won't make it happen any faster. Armen had a working implementation, it just needs small tweaks before it can be landed.
Blocks: 832992
Attached patch post vsize (obsolete) — Splinter Review
dump_masters shows that this gets added for every PGO and WINNT nightly build
Attachment #703578 - Attachment is obsolete: true
Attachment #704904 - Flags: review?(bhearsum)
Attachment #704905 - Flags: review?(bhearsum)
For historic values see:
https://blog.mozilla.org/nfroyd/2013/01/22/analyzing-linker-max-vsize/

Nathan, don't suppose you could attach the raw values, so we can backfill the gap on graphs.m.o?
Sure, Ed, no problem.  Here's the file I used; the format is:

<build-date> <hg-revision> <linker-vsize>

The data doesn't perfectly capture the hg revision for every log file, but the number of points that it missed was small enough that I wasn't going to worry about it.
...and for reference, here's the script I used to generate the previous file.  The script expects the names of the log files to start with:

YYYY-MM-DD-HH-MM-SS

for the timestamp portion, but that's probably not hard to change.  Simply invoke:

extract-info <list-of-log-files>
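(For readers without the attachment, a rough reconstruction of such a script; the revision regex in particular is a guess at the log format, not taken from the real script:)

import os
import re
import sys

# The vsize line format is described earlier in this bug; depending on the
# build, the captured number may be MB or bytes.
VSIZE_RE = re.compile(r'TinderboxPrint: linker max vsize:\s*(\d+)')
# Hypothetical: assume the changeset shows up somewhere as an hg ".../rev/" URL.
REV_RE = re.compile(r'/rev/([0-9a-f]{12})')

for path in sys.argv[1:]:
    date = os.path.basename(path)[:10]  # log names start with YYYY-MM-DD-HH-MM-SS
    rev = vsize = None
    with open(path) as f:
        for line in f:
            m = REV_RE.search(line)
            if m and rev is None:
                rev = m.group(1)
            m = VSIZE_RE.search(line)
            if m:
                vsize = m.group(1)
    if vsize:
        print('%s %s %s' % (date, rev or 'unknown', vsize))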
Comment on attachment 704904 [details] [diff] [review]
post vsize

Through IRC.
Attachment #704904 - Flags: review?(bhearsum) → review-
Attachment #704905 - Flags: review?(bhearsum) → review-
Comment on attachment 704904 [details] [diff] [review]
post vsize

Sorry, Armen and I talked on IRC about this awhile ago but I forgot to update the bug:
13:11 < bhearsum> armenzg: i meant that we should have a flag for 'post_linker_size' or something, not 'do_post_build_steps'
13:11 < bhearsum> i want this line gone:
13:11 < bhearsum>  if self.profiledBuild and self.platform in ('win32',) and self.baseName:
13:11 < bhearsum> because it guesses about what should happen
13:11 < bhearsum> that can be replaced with if self.postLinkerSize
13:12 < armenzg> k
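(So the guard ends up shaped roughly like this; postLinkerSize is the flag name from the IRC log above, the step itself follows the snippet in the earlier review, and this is only a sketch of the change, not the landed patch:)

def addPostBuildSteps(self):
    # Explicit config flag rather than inferring from profiledBuild /
    # platform / baseName, per the review comments above.
    if self.postLinkerSize:
        self.addStep(SetProperty(
            name='get_linker_vsize',
            property='linker_vsize',
            # forward slashes work in these steps, per the earlier review note
            command=['cat', '%s/toolkit/library/linker-vsize' % self.mozillaObjdir],
        ))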
Attachment #705069 - Flags: review?(bhearsum)
Attachment #704905 - Attachment is obsolete: true
Attachment #705072 - Flags: review?(bhearsum)
Attachment #705069 - Attachment description: do post build steps config changes → [buildbotcustom] do post vsize
Attachment #704904 - Attachment is obsolete: true
Attachment #705069 - Flags: review?(bhearsum) → review+
Attachment #705072 - Flags: review?(bhearsum) → review+
Attachment #705069 - Flags: checked-in+
Attachment #705072 - Flags: checked-in+
in production
(In reply to comment #60)
> in production

Where can the graphs be found?
Depends on: 833653
Comment on attachment 705072 [details] [diff] [review]
do post build steps config changes

Reverted this for bustage in bug 833653.

default:    http://hg.mozilla.org/build/buildbot-configs/rev/df9a319c5edd
production: http://hg.mozilla.org/build/buildbot-configs/rev/0dcbc3ce69f9
Attachment #705072 - Flags: checked-in+ → checked-in-
I don't know at which point we lost this line from the patch.
Without it, the sourcestamp info goes missing.
Attachment #705425 - Flags: review?(bhearsum)
Attachment #705425 - Flags: review?(bhearsum) → review+
Blocks: 833881
I will land and reconfig in the morning.
(In reply to comment #64)
> I will land and reconfig in the morning.

Thanks Armen!

Can you please also let me know how much historical data we can get out of this on each of central and inbound?  It would be absolutely amazing if we can get per-checkin data for the interesting ranges highlighted in <https://blog.mozilla.org/nfroyd/2013/01/22/analyzing-linker-max-vsize/>.
(In reply to :Ehsan Akhgari from comment #65)
> Can you please also let me know how much historical data we can get out of
> this on each of central and inbound?  It would be absolutely amazing if we
> can get per-checkin data for the interesting ranges highlighted in
> <https://blog.mozilla.org/nfroyd/2013/01/22/analyzing-linker-max-vsize/>.

Per-push logs are only kept for 30 days, so try runs will be required.
(In reply to Ed Morley [:edmorley UTC+0] from comment #66)
> (In reply to :Ehsan Akhgari from comment #65)
> > Can you please also let me know how much historical data we can get out of
> > this on each of central and inbound?  It would be absolutely amazing if we
> > can get per-checkin data for the interesting ranges highlighted in
> > <https://blog.mozilla.org/nfroyd/2013/01/22/analyzing-linker-max-vsize/>.
> 
> Per push logs are only kept for 30days, so try runs will be required.

Would a project branch be more interesting for this project?

(In reply to :Ehsan Akhgari from comment #65)
> (In reply to comment #64)
> > I will land and reconfig in the morning.
> 
> Thanks Armen!
> 
> Can you please also let me know how much historical data we can get out of
> this on each of central and inbound?  It would be absolutely amazing if we
> can get per-checkin data for the interesting ranges highlighted in
> <https://blog.mozilla.org/nfroyd/2013/01/22/analyzing-linker-max-vsize/>.

What edmorley says is correct. I think selfserve would be needed for this:
(In reply to comment #66)
> (In reply to :Ehsan Akhgari from comment #65)
> > Can you please also let me know how much historical data we can get out of
> > this on each of central and inbound?  It would be absolutely amazing if we
> > can get per-checkin data for the interesting ranges highlighted in
> > <https://blog.mozilla.org/nfroyd/2013/01/22/analyzing-linker-max-vsize/>.
> 
> Per push logs are only kept for 30days, so try runs will be required.

OK, so I guess we can explore that path when we need to.  Thanks!
(In reply to comment #67)
> (In reply to Ed Morley [:edmorley UTC+0] from comment #66)
> > (In reply to :Ehsan Akhgari from comment #65)
> > > Can you please also let me know how much historical data we can get out of
> > > this on each of central and inbound?  It would be absolutely amazing if we
> > > can get per-checkin data for the interesting ranges highlighted in
> > > <https://blog.mozilla.org/nfroyd/2013/01/22/analyzing-linker-max-vsize/>.
> > 
> > Per push logs are only kept for 30days, so try runs will be required.
> 
> Would a project branch be more interesting for this project?

Not sure how that would help?
(In reply to :Ehsan Akhgari from comment #69)
> (In reply to comment #67)
> > (In reply to Ed Morley [:edmorley UTC+0] from comment #66)
> > > (In reply to :Ehsan Akhgari from comment #65)
> > > > Can you please also let me know how much historical data we can get out of
> > > > this on each of central and inbound?  It would be absolutely amazing if we
> > > > can get per-checkin data for the interesting ranges highlighted in
> > > > <https://blog.mozilla.org/nfroyd/2013/01/22/analyzing-linker-max-vsize/>.
> > > 
> > > Per push logs are only kept for 30days, so try runs will be required.
> > 
> > Would a project branch be more interesting for this project?
> 
> Not sure how that would help?

Turnaround is faster.
Other pgo jobs (not related to this data gathering) could be posting numbers to the try graph and would pollute it, though I can't see any PGO that was triggered by a dev today.

On the other hand, customizing a branch to only do Windows PGO without test jobs could add some overhead to setup.

I think either way is fine, assuming that it works for the try server. I will try one build after I reconfig in the morning.
(In reply to comment #70)
> (In reply to :Ehsan Akhgari from comment #69)
> > (In reply to comment #67)
> > > (In reply to Ed Morley [:edmorley UTC+0] from comment #66)
> > > > (In reply to :Ehsan Akhgari from comment #65)
> > > > > Can you please also let me know how much historical data we can get out of
> > > > > this on each of central and inbound?  It would be absolutely amazing if we
> > > > > can get per-checkin data for the interesting ranges highlighted in
> > > > > <https://blog.mozilla.org/nfroyd/2013/01/22/analyzing-linker-max-vsize/>.
> > > > 
> > > > Per push logs are only kept for 30days, so try runs will be required.
> > > 
> > > Would a project branch be more interesting for this project?
> > 
> > Not sure how that would help?
> 
> Turnaround is faster.
> Other pgo jobs (not related to this data gathering) could be posting numbers to
> the try graph and would pollute it, though I can't see any PGO that was
> triggered by a dev today.

Hmm, good point.  But that also takes away the ability to just push new heads and get builds on them, right?  I mean, we would need to push new heads in the right order, right?

> On the other hand, customizing a branch to only do Windows PGO without test
> jobs could add some overhead to setup.

Ouch.

> I think either way is fine, assuming that it works for the try server. I will
> try one build after I reconfig in the morning.

Thanks, that's a good idea regardless.
Attachment #705425 - Flags: checked-in+
Attachment #705072 - Flags: checked-in- → checked-in+
The good news is that this is live.
The original purpose of the bug is fulfilled (as I understand it).

The bad news is that there is no way to trigger PGO builds on try.
Developers change the mozconfig but that does not trigger the PGO/try builders.
Pushing to try as PGO would print the linker size but it won't post to the graph server (as before).

Booking a project branch with PGO would give the ability to push changesets in a chronological order, but care would be needed to prevent coalescing from happening (perhaps this can be configured on our side - I don't know if it is easy).

Is it good enough to have data points on the graphs DB from here on?

IIUC there are means to gather historical data by pushing to try and scraping the linker size.
(In reply to comment #72)
> The good news is that this is live.
> The original purpose of the bug is fulfilled (as I understand it).
> 
> The bad news is that there is no way to trigger PGO builds on try.
> Developers change the mozconfig but that does not trigger the PGO/try builders.
> Pushing to try as PGO would print the linker size but it won't post to the try
> server (as before).
> 
> Booking a project branch with PGO would give the ability to push changesets in
> a chronological order but care would be needed to not prevent coallescing from
> happening (perhaps this can be configured on our side to be prevented - I don't
> know if it is easy).
> 
> Is it good enough to have data points on the graphs DB from here on?

It's good but definitely not enough.

So the first step is to parse through the PGO logs for the past 30 days, and also the nightly logs for as long as we have them stored, and report the values to the graph server associated with the correct date and changeset.  Then, I guess we'll need to fill in the gaps for the individual changesets in the spikes that we've seen in Nathan's analysis.  That would help us experiment with finding the culprit changesets which have added the most to the linker memory usage, and see how we can deal with that.

That all being said, gathering detailed historical data only matters if we decide to keep PGO enabled and try to keep the linker memory usage bounded, which is a call that we have not made yet.  We need more of the dependencies of bug 833881 to be resolved before we can make a meaningful decision on that.  If we do decide to keep PGO enabled, I'll file another bug in the RelEng component to gather more historical data.

Last but not least, thanks everyone for your help here, really appreciated!  :-)
I thought my reconfig this morning would have done the trick, but it seems that when a change is backed out from both branches ("production" and "default"), the typical land-to-default-then-merge-to-production flow misses the change [1]. I've seen this happen a couple of times in the past.

I landed it again (on production) and reconfigured the build masters again:
http://hg.mozilla.org/build/buildbot-configs/rev/92846acd0ba5

I re-triggered a second pgo in here that should be successful:
https://tbpl.mozilla.org/?jobname=WINNT%205.2%20mozilla-central%20pgo-build&rev=680e46fecff0

[1] http://hg.mozilla.org/build/buildbot-configs/graph
(In reply to Armen Zambrano G. [:armenzg] from comment #72)
> Booking a project branch with PGO would give the ability to push changesets
> in a chronological order but care would be needed to not prevent coallescing
> from happening (perhaps this can be configured on our side to be prevented -
> I don't know if it is easy).

For a project branch you could probably just use self-serve to force pgo builds on a revision, so no need to push. That assumes disabling merging is easily done, to speed the process up. Whether the history is easily transferable to the m-c branch in the graph server is another question.
(In reply to comment #75)
> (In reply to Armen Zambrano G. [:armenzg] from comment #72)
> > Booking a project branch with PGO would give the ability to push changesets
> > in a chronological order but care would be needed to not prevent coallescing
> > from happening (perhaps this can be configured on our side to be prevented -
> > I don't know if it is easy).
> 
> For a project branch you could probably just use self-serve force pgo builds on
> a revision, so no need to push. That assumes disabling merging is easily done
> to speed the process up. Whether the history is easily transferable to the m-c
> branch in the graph server is another question.

Hmm, I'm not quite sure what that exactly means...
Depends on: 834596
(In reply to :Ehsan Akhgari from comment #76)
> (In reply to comment #75)
> > (In reply to Armen Zambrano G. [:armenzg] from comment #72)
> > > Booking a project branch with PGO would give the ability to push changesets
> > > in a chronological order but care would be needed to not prevent coallescing
> > > from happening (perhaps this can be configured on our side to be prevented -
> > > I don't know if it is easy).
> > 
> > For a project branch you could probably just use self-serve force pgo builds on
> > a revision, so no need to push. That assumes disabling merging is easily done
> > to speed the process up. Whether the history is easily transferable to the m-c
> > branch in the graph server is another question.
> 
> Hmm, I'm not quite sure what that exactly means...

I think what nthomas is suggesting is to trigger PGO builds on a project branch, which would add data points for that branch on the graph server. We could then ask a DBA to transfer the data points to the mozilla-central records.
One note is that tbpl might not show any jobs, since the changesets are from the past.
We are now getting data points:
http://graphs.mozilla.org/graph.html#tests=[[205,63,8]]&sel=none&displayrange=7&datatype=running

mozilla-central will show data point in the next few hours.
I missed inserting the machine name on the graphs DB. This has now been fixed.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
(In reply to comment #77)
> (In reply to :Ehsan Akhgari from comment #76)
> > (In reply to comment #75)
> > > (In reply to Armen Zambrano G. [:armenzg] from comment #72)
> > > > Booking a project branch with PGO would give the ability to push changesets
> > > > in a chronological order but care would be needed to not prevent coallescing
> > > > from happening (perhaps this can be configured on our side to be prevented -
> > > > I don't know if it is easy).
> > > 
> > > For a project branch you could probably just use self-serve force pgo builds on
> > > a revision, so no need to push. That assumes disabling merging is easily done
> > > to speed the process up. Whether the history is easily transferable to the m-c
> > > branch in the graph server is another question.
> > 
> > Hmm, I'm not quite sure what that exactly means...
> 
> I think what nthomas is suggesting is to trigger PGO builds on a project
> branch, which would add data points for that branch on the graph server. We
> could then ask a DBA to transfer the data points to the mozilla-central
> records.
> One note is that tbpl might not show any jobs, since the changesets are from
> the past.

OK, like I said we might need this in the near future.  In the case that we do, I'll file another bug and let you guys do what's needed.  Thanks!  :-)
Depends on: 863061
Product: mozilla.org → Release Engineering
Blocks: 1084483
Component: General Automation → General