Closed Bug 582246 Opened 13 years ago Closed 4 years ago

Tinderbox is too slow

Categories

(Webtools Graveyard :: Tinderbox, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: jrmuizel, Unassigned)

References

(Depends on 1 open bug)

Details

I often get "Parsing Tinderbox failed" when tinderbox pushlog tries to load the json for the previous date range. This means I can't get the results of my builds after a couple of hours.
(In reply to comment #1)
> For example:
> 
> time wget http://tinderbox.mozilla.org/MozillaTry/json.js?_=1280246075648
> 
> takes about 5s
> 
> time wget
> "http://tinderbox.mozilla.org/showbuilds.cgi?tree=MozillaTry&json=1&maxdate=1280246109&hours=24&_=1280246109357"
> 
> takes over 1 minute

The first URL uses a cached file (json.js) which updates every time new data comes into Tinderbox. The second loads up all the data files (which can be quite large if old data is not expunged regularly, not sure if that's automated but it used to be manual).
(In reply to comment #2)
> The first URL uses a cached file (json.js) which updates every time new data
> comes into Tinderbox. The second loads up all the data files (which can be
> quite large if old data is not expunged regularly, not sure if that's automated
> but it used to be manual).

So what can we do to improve the response time?
(In reply to comment #3)
> (In reply to comment #2)
> > The first URL uses a cached file (json.js) which updates every time new data
> > comes into Tinderbox. The second loads up all the data files (which can be
> > quite large if old data is not expunged regularly, not sure if that's automated
> > but it used to be manual).
> 
> So what can we do to improve the response time?

It's been a few years since I worked on it, but I suspect that showbuilds.cgi is not very efficient at going back in time. Each tree has a set of separate data files that need to be loaded on each CGI request, and these tend to get quite large. I took a quick look at the functions which do this and they don't appear to use an index or anything like that, so I think that for each request like this a handful of files are being opened and grepped and loaded into memory.

I have a dev VM with Tinderbox server in it and it's quite fast at doing this type of operation, so I suspect that pruning history much more frequently would be a good quick fix. Filing a bug in server-ops to find out how often this happens now and talk about changing it, at least on the staging server to start, would be a good way to test my assumption.

I've seen bugs about Tinderbox load problems lately; this is probably a factor too (although perhaps those problems are themselves related to more people using the history feature via JSON, due to the issues I am guessing about above).
Longer-term, I think that a storage mechanism with reasonable indexing and partitioning is a better way to make this fast.
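
To illustrate the indexing idea, here is a minimal Python sketch. It is not Tinderbox code: the `BuildIndex` class, the `(timestamp, payload)` record shape, and the `scan` helper are all assumptions made for illustration. It contrasts a full scan of every record with a lookup over timestamps kept in sorted order, using the same `maxdate`/`hours` window that showbuilds.cgi accepts.

```python
import bisect


class BuildIndex:
    """Hypothetical in-memory index of build records, sorted by timestamp."""

    def __init__(self, records):
        # records: iterable of (unix_timestamp, payload) tuples (assumed shape)
        self._records = sorted(records, key=lambda r: r[0])
        self._times = [r[0] for r in self._records]

    def window(self, maxdate, hours):
        """Return records with maxdate - hours*3600 <= timestamp <= maxdate."""
        mindate = maxdate - hours * 3600
        lo = bisect.bisect_left(self._times, mindate)
        hi = bisect.bisect_right(self._times, maxdate)
        return self._records[lo:hi]


def scan(records, maxdate, hours):
    """Unindexed equivalent: look at every record on every request."""
    mindate = maxdate - hours * 3600
    return [r for r in records if mindate <= r[0] <= maxdate]


# One fake build per hour, going back 100 hours from the maxdate in comment 1.
records = [(1280246109 - i * 3600, f"build{i}") for i in range(100)]
index = BuildIndex(records)
recent = index.window(maxdate=1280246109, hours=24)
```

The two binary searches cost O(log n) regardless of how much history is kept, whereas the scan grows with the size of the data files. An on-disk index or a partition per day would follow the same principle.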

Buildbot has bugs to serve logs and JSON directly which I think is not that far off, and getting TBPL integrated with these is probably worthy of more attention than Tinderbox server. I am probably veering outside of scope for this bug though so I'll stop short of enumerating these here.

I just want to point out that "replace Tinderbox server" would be one way to fix "Tinderbox is slow", and my opinion is that it would be less work to replace than to bolt on the infrastructure changes Tinderbox server would need to solve the deep underlying scalability issues.

However I am not against trying, or someone doing this large amount of work, of course.
We are doing some profiling on Tinderbox, starting with showbuilds.cgi, and have already spotted a couple of (hopefully) simple ways to improve the performance here (bug 585814 comment 6). As a side effect, this would remove a lot of redundant data from the JSON output, and bug 399190 would make it valid JSON to boot.

Also see bug 585691. I probably should have put bug 585691 comment 1 and bug 585691 comment 2 in this bug instead; here is a synopsis:

1) there is a patch to greatly speed up showlog.cgi by caching it https://bug390341.bugzilla.mozilla.org/attachment.cgi?id=446579

2) #1-style caching could be done for showbuilds.cgi, to store historical copies of json.js
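
A rough Python sketch of what idea 2 could look like, under stated assumptions: `cached_history`, the cache file naming, and the `generate` callback (standing in for the expensive showbuilds.cgi work) are all hypothetical, and tree names are assumed to be safe to use in a file name. This is not the approach taken by the showlog.cgi patch, just the general file-cache pattern.

```python
import json
import os
import tempfile


def cached_history(cache_dir, tree, maxdate, hours, generate):
    """Return JSON text for a history window, generating it only on a miss.

    `generate(tree, maxdate, hours)` is a hypothetical callable that does
    the slow work of loading the tree's data files.
    """
    path = os.path.join(cache_dir, f"{tree}-{maxdate}-{hours}.json")
    if os.path.exists(path):
        with open(path, encoding="utf-8") as f:
            return f.read()

    text = generate(tree, maxdate, hours)

    # Write to a temporary file, then rename, so that a concurrent reader
    # never sees a half-written cache entry.
    fd, tmp = tempfile.mkstemp(dir=cache_dir)
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(text)
    os.replace(tmp, path)
    return text


def slow_generate(tree, maxdate, hours):
    # Placeholder for the real (expensive) JSON generation.
    return json.dumps({"tree": tree, "maxdate": maxdate, "hours": hours})
```

A cache like this only helps if clients ask for the same windows repeatedly. The requests in this bug use arbitrary `maxdate` values, so it would probably need windows aligned to fixed boundaries (for example, whole hours) to get useful hit rates.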
Depends on: 585876
Depends on: 399190
Depends on: 585700, 585691
Depends on: 583098
Depends on: 585814
Depends on: 586123
There is a new JSON method in Tinderbox now that should be valid JSON and should generate faster (still testing this):

Dynamic (generated at request time, responds to mindate/maxdate):
http://tinderbox-stage.mozilla.org/showbuilds.cgi?tree=Firefox&json2=1

Static (generated and updated on build/addnote):
http://tinderbox-stage.mozilla.org/Firefox/json2.js
(In reply to comment #1)
> For example:
> 
> time wget http://tinderbox.mozilla.org/MozillaTry/json.js?_=1280246075648
> 
> takes about 5s
> 
> time wget
> "http://tinderbox.mozilla.org/showbuilds.cgi?tree=MozillaTry&json=1&maxdate=1280246109&hours=24&_=1280246109357"
> 
> takes over 1 minute

Does wget do "Accept-encoding: gzip" by default? That should reduce the transfer time quite a bit.
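
As far as I know, wget does not request gzip unless told to, so the numbers quoted above are probably for uncompressed transfers. To give a feel for how much gzip can save on this kind of output, here is a small Python sketch. The sample records are made up (the field names are assumptions, not the actual Tinderbox schema); the point is only that repetitive JSON compresses well.

```python
import gzip
import json


def gzip_ratio(payload: bytes) -> float:
    """Compressed size divided by original size (smaller is better)."""
    return len(gzip.compress(payload)) / len(payload)


# Made-up, repetitive build records, loosely resembling a build status feed.
sample = json.dumps(
    [
        {
            "buildname": "Linux try build",
            "buildstatus": "success",
            "buildtime": 1280246109 + i,
        }
        for i in range(1000)
    ]
).encode("utf-8")

ratio = gzip_ratio(sample)
print(f"compressed size is {ratio:.1%} of the original")
```

Compression only reduces transfer time. It does nothing for the time the server spends generating the response, which appears to be the larger cost here.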

Anyway, here is JSON1 versus JSON2 on tinderbox-stage (which should *mostly* have production data, though I think not the notes):

$ curl -H 'Accept-encoding:gzip' 'http://tinderbox-stage.mozilla.org/showbuilds.cgi?tree=MozillaTry&json2=1&maxdate=1280246109&hours=24&_=1280246109357' > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100 83532    0 83532    0     0  13464      0 --:--:--  0:00:06 --:--:-- 20338
-----
$ curl -H 'Accept-encoding:gzip' 'http://tinderbox-stage.mozilla.org/showbuilds.cgi?tree=MozillaTry&json=1&maxdate=1280246109&hours=24&_=1280246109357' > /dev/null
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
100  204k    0  204k    0     0  19559      0 --:--:--  0:00:10 --:--:-- 58306
-----

Over several runs tinderbox-stage json2 is about 6s, json about 10s.

Production Tinderbox took about 30 seconds, this could be due to load or other factors that I don't have insight into right now (like lots of note data).

Filed bug 586123 to ask TBPL to use this new feed, and help test staging.
(In reply to comment #8)
> Production Tinderbox took about 30 seconds, this could be due to load or other
> factors that I don't have insight into right now (like lots of note data).

Lots of variance here; it sometimes takes several minutes. Tinderbox-stage is relatively unused and has been pretty stable in my testing so far, so I suspect overall server load (switching to json2 should help but may not be a cure-all).
Product: Webtools → Webtools Graveyard
Tinderbox isn't maintained anymore. Closing.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX