Closed Bug 718632 Opened 8 years ago Closed 8 years ago

TBPL should prefetch logs

Categories

(Tree Management Graveyard :: TBPL, defect)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: justin.lebar+bug, Assigned: Swatinem)

References

Details

(Whiteboard: [sheriff-want])

Attachments

(7 files, 5 obsolete files)

14.63 KB, patch
mstange
: review+
Details | Diff | Splinter Review
1.78 KB, patch
emorley
: review+
Details | Diff | Splinter Review
3.11 KB, patch
Swatinem
: review+
Details | Diff | Splinter Review
3.66 KB, patch
Swatinem
: review+
Details | Diff | Splinter Review
979 bytes, patch
emorley
: review+
Details | Diff | Splinter Review
893 bytes, patch
Swatinem
: review+
Details | Diff | Splinter Review
1.30 KB, patch
Details | Diff | Splinter Review
In bug 717005 comment 31, I wrote:

> Is there any reason we can't aggressively prefetch all the logs, as soon
> as the build finishes?  Or, if not, just the non-green logs, as soon as we
> know the build is non-green?
> 
> If not, a "fetch all teh orange!" button might be helpful.  We can, of
> course, do this as a separate bug.

This change would make TBPL many times more pleasant to use!
Depends on: 717005
Attached patch patch (obsolete) — Splinter Review
This has been a lot easier than I thought.
I need some comments on the IPC though.
Assignee: nobody → arpad.borsos
Status: NEW → ASSIGNED
Attachment #595478 - Flags: review?(mstange)
Attachment #595478 - Flags: review?(ehsan)
Comment on attachment 595478 [details] [diff] [review]
patch

Review of attachment 595478 [details] [diff] [review]:
-----------------------------------------------------------------

Neat!
Attachment #595478 - Flags: review?(ehsan) → review+
Comment on attachment 595478 [details] [diff] [review]
patch

Looks great to me. How much longer does the script now take to run, usually?
Attachment #595478 - Flags: review?(mstange) → review+
Well that is a problem, really.
I had started the process on my server, and after 2 hours it was still not finished prefetching the logs for ~5000 runs.
Things got worse: I have no lock file for the cron job, so every 5 minutes a new job was spawned, blocked on the log processing of the first one (each import job spawned a PHP job that was busy-waiting for the first job to finish processing).
I was lucky that I checked on my server, otherwise it would have ended up in swap-hell.

philor measured how long it takes to generate one summary. Doing 5000 serially is an extreme performance killer.


So as far as I remember from a previous bug, the production tbpl actually has a lock on the importer so it’s not possible to start it while the first process is not finished yet. That would avoid the problem that I had. However, we still need to speed the process up a bit.
The subprocess.call as it is now is blocking, so only a single PHP worker runs at a time. We could just use subprocess.Popen and leave it be; as far as I understand, that’s non-blocking. But that could leave us with 5000 parallel PHP workers, which would swamp the machine just as easily.


I could try creating a configurable number (say 10) of python threads, each blocking on a php worker. I don’t know python that well, are lists threadsafe?
Any other ideas?
> Any other ideas?

You could prefetch only logs from orange builds?

> I don’t know python that well, are lists threadsafe?

No, although deques are.  http://docs.python.org/library/collections.html#deque-objects
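The worker-pool idea under discussion can be sketched roughly like this (a minimal illustration, not TBPL's actual importer code; the jobs here are placeholder values rather than real PHP commands):

```python
import collections
import threading

def run_workers(jobs, worker_count=10):
    """Drain a shared deque of jobs with a fixed pool of threads.

    deque.popleft() and deque.append() are atomic, so the queue itself
    needs no explicit lock. In TBPL the body of the loop would be a
    blocking subprocess call to a PHP worker; here we just record each
    job to keep the sketch self-contained.
    """
    queue = collections.deque(jobs)
    results = collections.deque()

    def worker():
        while True:
            try:
                job = queue.popleft()
            except IndexError:
                return  # queue drained, this worker is done
            results.append(job)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sorted(results)
```

The key point is that each thread blocks on its own subprocess, so the number of concurrent PHP workers is capped at `worker_count` regardless of how many jobs are queued.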
Attached patch v2 with worker threads (obsolete) — Splinter Review
(In reply to Justin Lebar [:jlebar] from comment #5)
> > Any other ideas?
> 
> You could prefetch only logs from orange builds?

You are right. Clicking a green run is not very likely, so there is no use in prefetching the tinderbox_print for it.

> > I don’t know python that well, are lists threadsafe?
> 
> No, although deques are. 
> http://docs.python.org/library/collections.html#deque-objects

Thanks for the pointer.
I now tried 10 python threads, each spawning and blocking on a php worker. I also added a timeout to the php scripts as I observed one of the workers going rogue and just sitting there for 15 minutes.

I did an incremental import with ~500 new runs (no idea how many of them were != 'success') and the import finished in 6 minutes with 10 workers. This should be better with more workers and better hardware (my vserver just has 512 MB RAM + 512 MB swap).

Note to self: make the worker count configurable

So we are slowly getting there. And I’m starting to like Python, although I wish tbpl had been written in lovely non-blocking node.js :-)
Attachment #595478 - Attachment is obsolete: true
Attachment #595558 - Flags: review?(mstange)
Attachment #595558 - Flags: review?(ehsan)
Comment on attachment 595558 [details] [diff] [review]
v2 with worker threads

Rather than using multiple Python threads, you could open multiple subprocesses in one thread and then use select() on them.

This shouldn't perform any worse, because Python has this Global Interpreter Lock which prevents multiple threads from running Python code concurrently.

That may or may not be easier than the current state, but might have less overhead if you want to bump up the concurrency.
I think the python code is really concise and readable this way.

But I’m still having problems with long-running php workers that delay the importer for too long. Any ideas?
> But I’m still having problems with long-running php workers that delay the importer for too long. 
> Any ideas?

Well, if you use select(), it's easy to time out a subprocess which has been running for too long.  :)
Can't we just spawn the subprocesses in the background and finish without waiting on them?  I can't see why we need to wait on the subprocesses.
(In reply to Ehsan Akhgari [:ehsan] from comment #10)
> Can't we just spawn the subprocesses in the background and finish without
> waiting on them?  I can't see why we need to wait on the subprocesses.

That way we can’t control how many there are. Like I said in comment 4, depending on the import size, this could be as many as 5000.

And here are a few numbers before I go to sleep:
Note: I have no idea how many of them are != 'success'

Inserted 632 new run entries.   
real    5m48.689s
user    2m29.890s
sys     0m23.040s

Inserted 200 new run entries.   
real    2m28.915s
user    0m36.920s
sys     0m4.550s
Keep in mind, while planning on fetching up the log the second that builds-4hr tells us about a build, that buildbot makes absolutely no promises about the log existing prior to telling us about the build.
(In reply to Phil Ringnalda (:philor) from comment #12)
> Keep in mind, while planning on fetching up the log the second that
> builds-4hr tells us about a build, that buildbot makes absolutely no
> promises about the log existing prior to telling us about the build.

That’s not a problem; we just throw when the log is not available on the FTP.
So I’ve added some instrumentation which logs something like this:

generating raw log (9203783): 2461ms
generating tinderbox_print log (9203783): 2706ms
fetching/waiting for raw log (9203783): 0ms
fetching bugs for "pseudo-element-of-native-anonymous.html" (9203889): 10995ms
generating general_error log (9203783): 617ms
fetching bugs for "spellcheck-textarea-attr-inherit.html" (9204044): 12022ms
fetching bugs for "browser_permissions.js" (9204008): 14296ms
fetching bugs for "poster-15.html" (9203921): 13554ms
generating annotatedsummary log (9203921): 99778ms
fetching bugs for "spellcheck-textarea-property-dynamic-override-inherit.html" (9204145): 12775ms
generating annotatedsummary log (9204145): 164062ms
(full output at http://tbpl.swatinem.de/dataimport/log.txt )

So fetching the log from ftp and log parsing in php is 2-3s.

The worst offender by far is the Bugzilla queries; I see 8-19 seconds there. According to the PHP manual, max_execution_time only refers to script execution itself, not to stream operations. That explains why the processes run for longer than 30s.

So what we can do here is to:
1. extend the ignore list of bug searches. I see "plain,", "html,<body>", "Main app process exited normally" and other things as search terms.
2. make a real bug cache. The one we have now, AnnotatedSummaryGenerator->bugsCache, is per-process, so not a real cache at all. We could save the results in a new database table and expire entries that are older than, say, 2 days and re-fetch them.
or 3. delegate this problem to bzapi or server ops to make bz queries faster and do the caching on their end.
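Option 2 (a persistent bug cache with expiry) could look roughly like this; the class and table names are made up for illustration, and sqlite3 stands in for TBPL's MySQL database, which is actually written from PHP:

```python
import json
import sqlite3
import time

class BugsCache:
    """Sketch of a db-backed bug-search cache with a time-to-live.

    Results are stored as JSON keyed by the failing filename; entries
    older than `ttl` seconds are purged before every lookup, forcing a
    re-fetch from Bugzilla.
    """

    def __init__(self, path=":memory:", ttl=2 * 24 * 3600):
        self.ttl = ttl
        self.db = sqlite3.connect(path)
        self.db.execute(
            "CREATE TABLE IF NOT EXISTS bugscache "
            "(filename TEXT PRIMARY KEY, bugs TEXT, fetched_at REAL)")

    def get(self, filename):
        # Expire stale entries first, then look up the filename.
        self.db.execute("DELETE FROM bugscache WHERE fetched_at < ?",
                        (time.time() - self.ttl,))
        row = self.db.execute(
            "SELECT bugs FROM bugscache WHERE filename = ?",
            (filename,)).fetchone()
        return json.loads(row[0]) if row else None

    def put(self, filename, bugs):
        self.db.execute(
            "INSERT OR REPLACE INTO bugscache VALUES (?, ?, ?)",
            (filename, json.dumps(bugs), time.time()))
```

Unlike the per-process bugsCache, this survives across importer runs, so repeated oranges on the same test only hit Bugzilla once per TTL window.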
Attached patch v3 (obsolete) — Splinter Review
I’m quite happy with the patch now:
- Nice logging ( http://tbpl.swatinem.de/dataimport/log.txt )
- A real db-based bug-cache
- select() and hopefully no deadlocks :-)
- configurable worker count and timeout

It’s still very slow on my machine, but with more workers it should be fine.
Attachment #595558 - Attachment is obsolete: true
Attachment #595558 - Flags: review?(mstange)
Attachment #595558 - Flags: review?(ehsan)
Attachment #595726 - Flags: review?(mstange)
Attachment #595726 - Flags: review?(ehsan)
Attachment #595726 - Flags: feedback?(justin.lebar+bug)
Comment on attachment 595726 [details] [diff] [review]
v3

Looks pretty sensible to me.

I dunno how happy the log server will be to be hammered like this on the occasion that a build turns everything orange.  I guess the oranges won't all come in at the same time.

Also, what about comment 12 -- will we retry in case the logs aren't there?
Attachment #595726 - Flags: feedback?(justin.lebar+bug) → feedback+
Comment on attachment 595726 [details] [diff] [review]
v3

Review of attachment 595726 [details] [diff] [review]:
-----------------------------------------------------------------

To be honest, this patch goes way beyond my Python knowledge.  Anyone else wanna give this a shot?
Attachment #595726 - Flags: review?(ehsan)
(In reply to Ehsan Akhgari [:ehsan] from comment #16)
> Comment on attachment 595726 [details] [diff] [review]
> v3
> 
> Review of attachment 595726 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> To be honest, this patch goes way beyond my Python knowledge.  Anyone else
> wanna give this a shot?

This looks pretty reasonable to me (calling out to PHP to do processing seems a bit odd, but it's good reuse of existing code).

The use of lambda and passing functions makes this a bit more tricky, but I don't think it's too bad.

I can review this and also test it in my local install if you'd like, lmk.
Comment on attachment 595726 [details] [diff] [review]
v3

(In reply to Robert Helmer [:rhelmer] from comment #17)
> I can review this and also test it in my local install if you'd like, lmk.

I would appreciate it, thanks.


(In reply to Justin Lebar [:jlebar] from comment #15)
> I dunno how happy the log server will be to be hammered like this on the
> occasion that a build turns everything orange.  I guess the oranges won't
> all come in at the same time.

It should not be that bad when we load smaller incremental changes via the cron job.

> Also, what about comment 12 -- will we retry in case the logs aren't there?

We throw in that case, and thanks to the change I made in ParallelLogGenerating.php, we rethrow and die() when control reaches getLogExcerpt.php.
Before, we would just time out after 60s. The assumption was that maybe a parallel script would fetch a valid log, but that assumption wasn’t a good one.
Attachment #595726 - Flags: review?(rhelmer)
Comment on attachment 595726 [details] [diff] [review]
v3


>+    class PrefetchJob(object):
>+        def __init__(self, job):
>+            self.process = subprocess.Popen(job, stdout = subprocess.PIPE, stderr = open('/dev/null', 'w'))


Why not:
stderr = None

Also it'd be nice to catch OSError and print the failing command (I didn't have the PHP CLI installed and the traceback isn't very informative :) )

Code looks ok to me (and seems to work fine) otherwise, although I do occasionally see:
PHP Notice:  Undefined property: stdClass::$logDescription in /var/www/tbpl/php/inc/AnnotatedSummaryGenerator.php on line 110

This is likely a pre-existing issue and goes to the Apache error logs (that nobody regularly looks at)
Attachment #595726 - Flags: review?(rhelmer) → review+
(In reply to Robert Helmer [:rhelmer] from comment #19)
> Why not:
> stderr = None

That way a lot of php startup spew (failed loading extension, ...) goes to stderr, I just wanted to suppress that.

> Also it'd be nice to catch OSError and print the failing command (I didn't
> have the PHP CLI installed and the traceback isn't very informative :) )

Done.

> Code looks ok to me (and seems to work fine) otherwise, although I do
> occasionally see:
> PHP Notice:  Undefined property: stdClass::$logDescription in
> /var/www/tbpl/php/inc/AnnotatedSummaryGenerator.php on line 110
> 
> This is likely a pre-existing issue and goes to the Apache error logs (that
> nobody regularly looks at)

Thanks for catching that. That was my copy&paste fail :-)
Attachment #595726 - Attachment is obsolete: true
Attachment #595726 - Flags: review?(mstange)
Attachment #596127 - Flags: review?(mstange)
Comment on attachment 596127 [details] [diff] [review]
v4, comments addressed

I don't completely understand the python part, but I trust rhelmer and you on that. The rest looks great.
Attachment #596127 - Flags: review?(mstange) → review+
In case the Python part causes problems, we can just return after the db.commit().
That way we at least get the real bug cache, which should be a massive improvement by itself.
Depends on: 728203
Pushed: http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/rev/a266f04ed4ec
Let’s see how the staging server handles the load. It should run a lot smoother once the bug cache is primed.
At the moment, it seems to be saying "Log not available" as a summary for everything, which won't be much fun to debug.
https://tbpl-dev.allizom.org/php/getLogExcerpt.php?id=9492362&tree=Firefox&type=annotated <- this works (the "filename" is blacklisted in the bug search)

For all the others, we are catching some kind of Exception: http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/file/a266f04ed4ec/php/getLogExcerpt.php#l65
The bugscache table was created in the wrong database. Should work fine now.
Suppose we had the once-promised QA team. What would you tell them to look for on tbpl-dev, to tell whether or not this is working as designed?

Near as I can tell just from gut feeling of how long things should take, we're never successfully prefetching logs (everything always spins for a fetching-length time "loading results" and "retrieving summary"), and we might be caching bug searches (the permaorange 10.7 failures are handy for loading the same failures over and over).
Well, I would just ask how full the database is, or how high the load on the server is.
A log from the Python importer would be good; maybe there is something wrong with the importer, or maybe there is no php-cli installed, like the issue rhelmer was seeing?
Depends on: 730677
Sorry for the noise, going to temporarily back this out so we can get bug 733556 tested on -dev and out into production.

Maybe we should use a feature branch for this?
(In reply to Robert Helmer [:rhelmer] from comment #29)
> Sorry for the noise, going to temporarily back this out so we can get bug
> 733556 tested on -dev and out into production.
> 
> Maybe we should use a feature branch for this?

Backed out - http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/file/54c6550a89c4
Not sure how we could manage it on a feature branch: so far as Swatinem knows it is finished and working just fine on his server, and what's left is to find out whether or not it works on MoCo's server.

I stared at tbpl-dev for several days without being able to guess whether it was working at all, asked how I could tell, then Swatinem filed bug 730677 to find out, and we'd sure like to find out, what with the whole "this implements a bug cache, and gerv is yelling at us about hitting bzapi too much, probably because someone else is yelling at him about bzapi hitting bmo too much."
(In reply to Robert Helmer [:rhelmer] from comment #30)
> Backed out -
> http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/file/
> 54c6550a89c4

The newly added Timer.php wasn't backed out/removed, so have done so here:
http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/rev/68a941894bf2
Whiteboard: [sheriff-want]
It’s been a long time since there was any progress here.
Since we have had importer logs for some time now, I pushed the patch again (http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/rev/4caf05b3d84d); let’s see how it behaves on tbpl-dev.
Depends on: 785616
This broke the initial data import when using Vagrant:

err: /Stage[main]/Tbpl-base/Exec[/usr/bin/python /var/www/tbpl/dataimport/import-buildbot-data.py -d 2]/returns: change from notrun to 0 failed: /usr/bin/python /var/www/tbpl/dataimport/import-buildbot-data.py -d 2 returned 1 instead of one of [0] at /tmp/vagrant-puppet/manifests/classes/tbpl-base.pp:90
So what does that error message mean exactly?
(In reply to Ed Morley [:edmorley] from comment #34)
> This broke the initial data import when using Vagrant:
> 
> err: /Stage[main]/Tbpl-base/Exec[/usr/bin/python
> /var/www/tbpl/dataimport/import-buildbot-data.py -d 2]/returns: change from
> notrun to 0 failed: /usr/bin/python
> /var/www/tbpl/dataimport/import-buildbot-data.py -d 2 returned 1 instead of
> one of [0] at /tmp/vagrant-puppet/manifests/classes/tbpl-base.pp:90

You might want to have this set somewhere (e.g. in init.pp):

Exec { logoutput => on_failure }
(In reply to Robert Helmer [:rhelmer] from comment #36)
> You might want to have this set somewhere (e.g. in init.pp):
> 
> Exec { logoutput => on_failure }

Thank you, I didn't know that Vagrant trick :-)

(In reply to Arpad Borsos (Swatinem) from comment #35)
> So what does that error message mean exactly?

More useful output:

notice: import-buildbot-data.py -d 2]/returns: Fetching http://builddata.pub.build.mozilla.org/buildjson/builds-4hr.js.gz ...
notice: import-buildbot-data.py -d 2]/returns: Traversing runs and inserting into database...
notice: import-buildbot-data.py -d 2]/returns: Inserted 3387 new run entries.
notice: import-buildbot-data.py -d 2]/returns: Traversing builders and updating database...
notice: import-buildbot-data.py -d 2]/returns: Updated 1269 builders.
notice: import-buildbot-data.py -d 2]/returns: Fetching http://builddata.pub.build.mozilla.org/buildjson/builds-2012-09-03.js.gz ...
notice: import-buildbot-data.py -d 2]/returns: Traceback (most recent call last):
notice: import-buildbot-data.py -d 2]/returns:   File "/var/www/tbpl/dataimport/import-buildbot-data.py", line 382, in <module>
notice: import-buildbot-data.py -d 2]/returns:     main()
notice: import-buildbot-data.py -d 2]/returns:   File "/var/www/tbpl/dataimport/import-buildbot-data.py", line 315, in main
notice: import-buildbot-data.py -d 2]/returns:     inserted_runs.extend(do_date(today - datetime.timedelta(i), db, options.overwrite))
notice: import-buildbot-data.py -d 2]/returns: TypeError: 'NoneType' object is not iterable
err: import-buildbot-data.py -d 2]/returns: change from notrun to 0 failed: /usr/bin/python /var/www/tbpl/dataimport/import-buildbot-data.py -d 2 returned 1 instead of one of [0] at /tmp/vagrant-puppet/manifests/classes/tbpl-base.pp:90
(Filed bug 787922 to add that logoutput line to init.pp)
Attached patch Fix daily import (obsolete) — Splinter Review
Ah yeah, it broke the daily import somehow, in the case that the URL is invalid.
Which it is, for some reason. Does the exporter only export past days and not the current day anymore?
Attachment #657848 - Flags: review?(bmo)
Comment on attachment 657848 [details] [diff] [review]
Fix daily import

This fixes that error, however the Vagrant TBPL instance still times out when accessed (works with the initial landing in this bug backed out).

(In reply to Arpad Borsos (Swatinem) from comment #39)
> Ah yeah, it broke the daily import somehow. In case that the url is invalid.
> Which it is for some reason. Does the exporter only export past days and not
> the current day anymore?

(If I've understood correctly,) yeah builds-YYYY-MM-DD.js.gz only get updated once a day, meaning we have a gap between the last complete builds-YYYY-MM-DD.js.gz and builds-4hr.js.gz :-( (See bug 765451).
Attachment #657848 - Flags: review?(bmo) → review+
(In reply to Ed Morley [:edmorley] from comment #40)
> This fixes that error, however the Vagrant TBPL instance still times out
> when accessed (works with the initial landing in this bug backed out).

Presume it's trying to pre-fetch the last two days' worth of logs (given Vagrant's '-d 2'). Perhaps we should only pre-fetch logs from builds-4hr.js.gz and not for the full range being imported when using -d?

ie not do the inserted_runs.extend for the options.days and s/inserted_runs/inserted_recent_runs/ or similar?
This is without the .extend(); I guess we really don’t need to prefetch for older entries. Also, you can try running the job with -t 2 to set a timeout of 2 seconds per job; -w sets the number of PHP workers (default is 10).
That should bring the run time down a bit...
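The -t/-w/-d flags mentioned here might be wired up roughly like this (a hypothetical reconstruction using argparse; the real import-buildbot-data.py's option handling and defaults may differ):

```python
import argparse

def parse_options(argv):
    """Parse the importer's command-line flags as described in the thread:
    -t is the per-job timeout, -w the PHP worker count (default 10),
    -d the number of past days to import."""
    parser = argparse.ArgumentParser(
        description="TBPL buildbot data import (sketch)")
    parser.add_argument("-t", "--timeout", type=int, default=30,
                        help="timeout in seconds per PHP prefetch job")
    parser.add_argument("-w", "--workers", type=int, default=10,
                        help="number of parallel PHP workers")
    parser.add_argument("-d", "--days", type=int, default=0,
                        help="number of past days to import")
    return parser.parse_args(argv)
```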
Attachment #657848 - Attachment is obsolete: true
Attachment #657860 - Flags: review?(bmo)
Only pre-fetch logs on runs imported from builds-4hr.js.gz, to avoid fetching all the things.
Attachment #657861 - Flags: review?(arpad.borsos)
Attachment #657861 - Attachment is obsolete: true
Attachment #657861 - Flags: review?(arpad.borsos)
Comment on attachment 657860 [details] [diff] [review]
Dont prefetch for old logs

:-)
Attachment #657860 - Flags: review?(bmo) → review+
Depends on: 787950
With the Vagrant logging patch from bug 787950, I get logs like:
http://pastebin.mozilla.org/1800918
Yep, that’s what it’s supposed to look like. Once the bug cache is primed, generating the summaries should be a lot faster...
(In reply to Arpad Borsos (Swatinem) from comment #46)
> Yep, thats what its supposed to look like. Once the bug cache is primed,
> generating the summaries should be a lot faster...

Sorry I had hit submit too soon...

(In reply to Ed Morley [:edmorley] from comment #45)
> With the Vagrant logging patch from bug 787950, I get logs like:
> http://pastebin.mozilla.org/1800918

...which I'm presuming is now correct.

However, in order to see if the prefetching is actually working (given that I need to find a recent result from builds-4hr.js.gz), it would help to have the tree/revision/test name in the log output so I don't have to manually look it up from builds-4hr.js.gz. We can also ditch the |$str.= ' ('.$_GET['id'].')';| in Timer.php unless you have any objections.

Patch coming up.
Turns:
{
php /var/www/tbpl/dataimport/../php/getLogExcerpt.php id=14924778&type=annotated:
generating raw log (14924778): 1012ms
generating general_error log (14924778): 1103ms
generating annotatedsummary log (14924778): 1ms
completed after 1234ms
}

Into something like:
{
Prefetching: TB Rev3 Fedora 12 comm-release opt test mozmill on f884da2e73ac
php /var/www/tbpl/dataimport/../php/getLogExcerpt.php id=14927314&type=annotated
generating raw log: 1496ms
generating general_error log: 1863ms
generating annotatedsummary log: 6ms
completed after 1999ms
}
Attachment #657919 - Flags: review?(arpad.borsos)
Attachment #657919 - Flags: review?(arpad.borsos) → review+
Comment on attachment 657919 [details] [diff] [review]
Output the buildername and revision

Thank you for the quick review :-)

http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/rev/64c28d46e682
Comment on attachment 657960 [details] [diff] [review]
Temporary extra logging

To try and help debug bug 785616 comment 4.
Attachment #657960 - Attachment description: Temporary ex → Temporary extra logging
Attachment #657960 - Attachment is patch: true
Attachment #657960 - Flags: review?(arpad.borsos)
Comment on attachment 657960 [details] [diff] [review]
Temporary extra logging

>+          if (PHP_SAPI === 'cli')
>+            echo 'exception generating log: ' . $e . "\n";
>           // rethrow, so we actually kill the script instead of timing out
>           throw $e;

With display_errors now set to On in this version of the logging patch, this hunk is redundant I guess. I won't bother reattaching, but will qref locally.
Comment on attachment 657960 [details] [diff] [review]
Temporary extra logging

Review of attachment 657960 [details] [diff] [review]:
-----------------------------------------------------------------

I don’t quite like the `if (PHP_SAPI === 'cli')` everywhere.
Feel free to leave it for now, but the better solution would be something like Timer, which can enable logging if we need it.
Attachment #657960 - Flags: review?(arpad.borsos) → review+
Comment on attachment 657960 [details] [diff] [review]
Temporary extra logging

(In reply to Arpad Borsos (Swatinem) from comment #53)
> I don’t quite like the `if (PHP_SAPI === 'cli')` everywhere.
> Feel free to leave it for now, but the better solution would be something
> like Timer, which can enable logging if we need it.

Yeah I almost added a switch, but I intend on backing this all out once we have seen the output on tbpl-dev.

http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/rev/e3325a59f257
Depends on: 788197
The import logs in https://tbpl-dev.allizom.org/cache/ are now 0 bytes.

Does the MYSQL_PASSWORD variable in the cron file need updating to the new password too? (At least for the Vagrant project, the sql password is specified in both the cron file & also php/config.php).
Bah, wrong bug sorry.
```
Warning: fclose() expects parameter 1 to be resource, string given in /data/genericrhel6-dev/src/tbpl-dev.allizom.org/tbpl/php/inc/RawGzLogDownloader.php on line 45
```
I must have missed that when I wrote the original patch. We use file_get_contents() now, so the fclose is not necessary.
Attachment #658228 - Flags: review?(bmo)
Comment on attachment 658228 [details] [diff] [review]
Remove leftover fclose()

:-)
Attachment #658228 - Flags: review?(bmo) → review+
(In reply to Ed Morley [:edmorley] from comment #54)
> Comment on attachment 657960 [details] [diff] [review]
> Temporary extra logging
> http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/rev/
> e3325a59f257

Backed out in:
http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/rev/435be348c285

Thanks for doing that Arpad :-)
(In reply to Ed Morley [:edmorley] from comment #58)
> Comment on attachment 658228 [details] [diff] [review]
> Remove leftover fclose()
> 
> :-)

http://hg.mozilla.org/users/mstange_themasta.com/tinderboxpushlog/rev/750c0f7d20ab
Blocks: 788518
In bug 728203, the new bugscache table schema was revised before landing on tbpl production. This patch makes the in-tree copy match reality.
Attachment #659281 - Flags: review?(arpad.borsos)
Attachment #659281 - Flags: review?(arpad.borsos) → review+
I've played about with this on tbpl-dev and it seems to work pretty well.

The only regression per se is for failures where there were already results returned from the sw:orange search, but none of them matched, so a bug was filed. It now takes 24 hours before the new annotated summaries will show the new bug, due to the bugscache. This doesn't affect failures where the original sw:orange search returned nothing (which from personal experience is >50% of new oranges), since in that case we don't insert the filename into the bugscache.

However, I think that 24 hours may be a bit long to have to manually star (some) new failures, so we may need to resolve this before pushing to prod.

Short term, perhaps we can just reduce the 24 hour expiration to every couple of hours maybe? (Which given that we don't have any caching at all on prod at the moment, will still mean less of a hit on bzapi than at present)

Longer term:
* Add a reload link to the annotated summary panel which refreshes that cache entry.
* We could make the (eventual) solution to bug 779529 clear the cache entry for us (however this doesn't help for bugs filed directly).
* ...any other ideas?

As for pushing to production:
* We'll need to check that bug 785616 (php-cli) & bug 788197 (php-sql) were done for production too (if it is on a different machine).
* The bugscache table is already on production (bug 728203 comment 5), so no DB changes needed.
* ...have I missed anything else?
Depends on: 789506
Swatinem/philor/mbrubeck, any thoughts on comment 62? 

Latest TBPL push to prod is blocked on this, would like to not have to backout if possible - but things like the marionette changes are needed pretty soon :-)
You can change the time range in the importer script as you want. Forcing a reload, well, we could maybe add that later...
If we can refresh every 2-3 hours (or more frequently) then I think we'll be fine.  Pasting the bug manually for the first couple hours isn't too bad, since it's still fresh in the mind of the person who filed it.
Yeah, I don't think I have any problem with 3 hours, if you're starring something that much in the first 3 hours you should be backing out the cause instead of starring it.
Only keeps entries for 3 hours instead of 24.
Depends on: 790559
We may need to increase timeouts; will leave as is for now & we can look on prod:

{
Prefetching: Rev5 MacOSX Mountain Lion 10.8 mozilla-aurora debug test mochitests-5/5 on 8a8505859cd1
php /data/genericrhel6-dev/src/tbpl-dev.allizom.org/tbpl/dataimport/../php/getLogExcerpt.php id=15154383&type=annotated
generating raw log: 755ms
generating general_error log: 1088ms
fetching bugs for "test_handlerApps.xhtml": 6183ms
fetching bugs for "test_unsafeBidiChars.xhtml": 4615ms
fetching bugs for "test_bug760802.html": 6689ms
generating annotatedsummary log: 17782ms
timed out...

Prefetching: Rev5 MacOSX Mountain Lion 10.8 mozilla-aurora debug test mochitests-1/5 on 8a8505859cd1
php /data/genericrhel6-dev/src/tbpl-dev.allizom.org/tbpl/dataimport/../php/getLogExcerpt.php id=15154402&type=annotated
generating raw log: 1105ms
generating general_error log: 2881ms
fetching bugs for "test_webgl_conformance_test_suite.html": 5542ms
fetching bugs for "plugin process 841": 8218ms
fetching bugs for "tab process 839": 4951ms
generating annotatedsummary log: 19140ms
timed out...

Prefetching: Rev3 WINNT 5.1 mozilla-inbound debug test mochitests-4/5 on fe96a330ddd8
php /data/genericrhel6-dev/src/tbpl-dev.allizom.org/tbpl/dataimport/../php/getLogExcerpt.php id=15154400&type=annotated
generating raw log: 1056ms
generating general_error log: 1870ms
fetching bugs for "test_bug500328.html": 5420ms
fetching bugs for "test_bug586662.html": 6578ms
fetching bugs for "test_bug332636.html": 5228ms
fetching bugs for "test_bug410986.html": 4666ms
fetching bugs for "test_bug414526.html": 5313ms
fetching bugs for "test_bug417418.html": 5232ms
fetching bugs for "TestRunner.js)": 5969ms
generating annotatedsummary log: 38706ms
timed out...
}

Alternatively, switching from whiteboard [orange] to a keyword to improve the bzapi call times (which b.m.o admins would prefer anyway) & then bug 788518, may mean we hit the timeouts less anyway.
The timeouts are not a real problem, I would say. Due to the bugscache, processing just continues where it left off once someone requests the log again, in which case it is retrieved faster than it is currently.
Depends on: 732433
(In reply to Ed Morley [:edmorley UTC+1] from comment #70)
> Alternatively, switching from whiteboard [orange] to a keyword to improve
> the bzapi call times (which b.m.o admins would prefer anyway) 

Filed bug 788518.

(In reply to Arpad Borsos (Swatinem) from comment #71)
> The Timeouts are not a real problem I would say. Due to the bugscache,
> processing just continues where it left of once someone requests the log
> again, in which case it is retrieved faster than it is currently.

Yeah agreed it's not a problem, I just meant more that summaries that take ages to load drive sheriffs up the wall, so it would be nice if we could prefetch more of them in the future. As you say, with this bug they will still load quicker than currently - and with bug 788518 + bug 788518 we should just hit the timeout less :-)
In production.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Depends on: 790889
Depends on: 790895
Depends on: 794483
Product: Webtools → Tree Management
Product: Tree Management → Tree Management Graveyard