
Inbound bisection takes too long/times out since it tries to retrieve all builds in the last 7 days

RESOLVED FIXED

Status

Product: Testing
Component: mozregression
Status: RESOLVED FIXED
Reported: 4 years ago
Last modified: 4 years ago

People

(Reporter: emorley, Assigned: parkouss)

Tracking

Firefox Tracking Flags: (Not tracked)

Attachments

(1 attachment, 1 obsolete attachment)

(Reporter)

Description

4 years ago
Got as far as we can go bisecting nightlies...
Ensuring we have enough metadata to get a pushlog...
Last good revision: 0b81c10a9074 (2014-11-03)
First bad revision: 5dde8ea48fef (2014-11-04)
Pushlog:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=0b81c10a9074&tochange=5dde8ea48fef

... attempting to bisect inbound builds (starting from previous week, to make sure no inbound revision is missed)
Getting http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/10/2014-10-27-11-30-17-mozilla-central/firefox-36.0a1.en-US.win32.txt
Getting inbound builds between da125623d9cb and 5dde8ea48fef
Retrieving valid builds from u'http://inbound-archive.pub.build.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1414601774/' generated an exception: 503 Server Error: Service Unavailable: Back-end server is at capacity



Whilst 1 day was probably too small, switching to 7 days (bug 1059856) without more gracefully fetching revisions in between means that the console appears to hang for several minutes; in fact, I've been unable to get it to work without timeouts. We should perhaps switch to a smaller number of days until we do things like fetching a few revisions at a time within the 7 day range.
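
To make the problem concrete, here is a minimal sketch of the eager approach (the helper name and URL handling are illustrative, not mozregression's actual code; the URL pattern is taken from the log above): one HTTP request per candidate build folder, so a 7-day range means a long series of requests before bisection even starts, and a single 503 can abort the run.

    import requests

    BASE_URL = ("http://inbound-archive.pub.build.mozilla.org/pub/"
                "mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/")

    def fetch_valid_build_folders(push_timestamps):
        # Eager approach: one HTTP request per candidate folder.
        # Over a 7-day range this means hundreds of requests up front,
        # and a single transient 503 aborts the whole bisection.
        valid = []
        for timestamp in push_timestamps:
            response = requests.get("%s%d/" % (BASE_URL, timestamp))
            if response.status_code >= 500:
                response.raise_for_status()
            if response.ok:
                valid.append(timestamp)
        return valid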
(Assignee)

Comment 1

4 years ago
I wonder if we can filter changesets by their date?

When we enter inbound bisection from nightly, we first get a full list of changesets for a date range (the previous week is used here).

We ask it like this:

https://hg.mozilla.org/integration/mozilla-inbound/json-pushes?fromchange=6bd2071b373f&tochange=2114ef80f6ae

This step is fast enough, and we have the associated timestamps of the changes.

Then we look for inbound builds for each changeset on inbound-archive.pub - and this is where it becomes very slow, because we have to do a lot of HTTP requests.

But maybe we could look for inbound builds only if the date of the changeset is greater than or equal to the one we really care about?

With an example:

Say a nightly bisection led us to 2014-11-06 (good) - 2014-11-07 (bad).

1. we ask for the list of changesets from 1 week ago, to be sure we don't miss anything
2. we filter these changesets - their datetime must be greater than or equal to 2014-11-06
3. on these filtered changesets, we look for inbound builds

This way we will cut out a lot of unneeded downloads and build checks. This requires that we can compare the changeset timestamps from inbound against the date given by the nightly bisection (can we?).
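
A minimal sketch of the proposed filter, assuming the good date from the nightly bisection is available as a datetime.date (the helper name is hypothetical; json-pushes does return a JSON object whose entries carry a unix "date" and a "changesets" list). Comment 2 below explains why this approach turned out not to be viable.

    import datetime

    import requests

    PUSHLOG_URL = ("https://hg.mozilla.org/integration/mozilla-inbound/"
                   "json-pushes")

    def changesets_on_or_after(fromchange, tochange, min_date):
        # json-pushes returns {push_id: {"date": <unix time>,
        #                                "changesets": [...], ...}, ...}
        pushes = requests.get(PUSHLOG_URL, params={
            "fromchange": fromchange,
            "tochange": tochange,
        }).json()
        kept = []
        for push in pushes.values():
            push_date = datetime.datetime.utcfromtimestamp(
                push["date"]).date()
            if push_date >= min_date:
                kept.extend(push["changesets"])
        return kept

    # e.g. changesets_on_or_after("6bd2071b373f", "2114ef80f6ae",
    #                             datetime.date(2014, 11, 6))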
Comment 2

4 years ago

(In reply to Julien Pagès from comment #1)
> But maybe we could look for inbound builds only if the date of the changeset
> is greater than or equal to the one we really care about?

Unfortunately we can't do this, as there might be revisions in inbound which are rather old but have not yet been merged into central (this is why we go all the way back a week).

A better strategy might be to only grab the actual *revision* information on-demand, instead of all at once. Either that, or cache this information in a better data structure (bug 1095756).
(Reporter)

Comment 3

4 years ago
Inbound is normally merged a couple of times a day. The longest it might go without a merge (either because of the weekend, or due to breakage) is 2-3 days. As such, could we just go back 3-4 days rather than 7?

Agree that fetching each revision on demand is probably the best long term approach.
(Assignee)

Comment 4

4 years ago
(In reply to William Lachance (:wlach) from comment #2)
> A better strategy might be to only grab the actual *revision* information
> on-demand, instead of all at once. Either that or cache this information in
> a better data structure (bug 1095756)

(In reply to Ed Morley [:edmorley] from comment #3)
> Agree that fetching each revision on demand is probably the best long term
> approach.

Sounds interesting, I will work on this.
Comment 5

4 years ago

(In reply to Ed Morley [:edmorley] from comment #3)
> Inbound is normally merged a couple of times a day. The longest it might go
> without a merge (either because of the weekend, or due to breakage) is 2-3
> days. As such, could we just go back 3-4 days rather than 7?

Yeah, before we were just doing one day, which wasn't enough (see bug 1059856). But 7 may be a bit extreme. I could certainly tweak this, though I fear that we'll continue to see this behaviour even with that change implemented. I think maybe fetching revisions on demand or the caching solution (bug 1095756) is a better way of addressing this.

Thoughts? I guess I could just twiddle the value and do an interim release until we have a better solution.
(Assignee)

Comment 6

4 years ago
Created attachment 8520666 [details] [review]
fetch inbound data on demand

This is an attempt to load inbound data on demand.

Briefly:

We have the list of inbound build folders in the range (this step does not take a lot of time).

Instead of checking each build folder up front (which takes a lot of time) to keep only the good ones (i.e. the valid ones), it does the following (see the sketch below):

1. fetch the lower and higher limits of the list if we don't have them (the first and last valid build folders)
2. fetch the middle of the list if we don't have it (the central valid build folder)
3. split the list to bisect, and go to 1 with the new list.

The data size is only an estimate, because it may shrink when we actually fetch data and find out that some builds are invalid; but this size is only used to show an approximation of the remaining steps during inbound bisection, so that seems OK to me.
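
A rough sketch of the idea (illustrative names, not the actual patch): folders start out unvalidated, and an HTTP request is only issued for the endpoints and the midpoint of the current slice.

    import requests

    class BuildFolder(object):
        """A tinderbox build folder whose validity is fetched lazily."""

        def __init__(self, url):
            self.url = url
            self._valid = None  # unknown until we actually need it

        def is_valid(self):
            if self._valid is None:
                # Only now do we hit the server for this folder.
                self._valid = requests.get(self.url).ok
            return self._valid

    def ensure_limits(folders):
        # Step 1: find the first and last *valid* folders, dropping
        # invalid ones from the ends as we discover them.
        while folders and not folders[0].is_valid():
            folders.pop(0)
        while folders and not folders[-1].is_valid():
            folders.pop()
        return folders

    def fetch_mid(folders):
        # Step 2: validate the midpoint; if it turns out to be invalid,
        # drop it and re-center (this is why the size is only an estimate).
        mid = len(folders) // 2
        while 0 < mid < len(folders) and not folders[mid].is_valid():
            del folders[mid]
            mid = len(folders) // 2
        return mid

    # Step 3: depending on the test result for folders[mid], bisection
    # continues on folders[:mid + 1] or folders[mid:].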

This may need some more testing; I'd appreciate any feedback.
Attachment #8520666 - Flags: feedback?(wlachance)
Comment 7

4 years ago

Comment on attachment 8520666 [details] [review]
fetch inbound data on demand

I took this for a spin and it looks really good! I had some minor nits.

I think it would be really helpful if we could add some "debug" level logging so that we could verify that it is bisecting the inbound range correctly. This would help give us confidence before pushing this.
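
For instance, something like this hypothetical sketch using the stdlib logging module (not the eventual mozregression logging patch), emitted at each split:

    import logging

    log = logging.getLogger("mozregression")

    def log_bisection_step(folders, mid):
        # Emitted at each split so the inbound range being bisected
        # can be verified from the output.
        log.debug("bisecting %d build folders; testing midpoint %s",
                  len(folders), folders[mid].url)

    # enable with: logging.basicConfig(level=logging.DEBUG)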
Attachment #8520666 - Flags: feedback?(wlachance) → feedback+
(Assignee)

Comment 8

4 years ago
Sure. I will fix the nits, and will add a couple of print statements to indicate what we are doing - we could move them to log.debug statements later, once we use logging. Is that OK with you?
Comment 9

4 years ago

(In reply to Julien Pagès from comment #8)
> Sure. I will fix the nits, and will add a couple of print statements to
> indicate what we are doing - we could move them to log.debug statements
> later, once we use logging. Is that OK with you?

Yes, that makes sense to me. It looks like we're pretty close on the logging anyway, right?
(Assignee)

Comment 10

4 years ago
Yes, you're right! I have something working for master and I will get back to it once this one has landed - this one seemed more important.
(Assignee)

Comment 11

4 years ago
Created attachment 8520919 [details] [review]
fetch inbound data on demand

I think I fixed the nits and added a few print statements. I also added a commit to retry on HTTP errors when fetching inbound builds.
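
Something along these lines (a sketch, not the actual commit; the retry count and delay are made up) avoids aborting on transient errors like the 503 from the description:

    import time

    import requests

    def get_with_retries(url, attempts=3, delay=10):
        # Retry transient HTTP errors instead of failing the bisection.
        for attempt in range(attempts):
            try:
                response = requests.get(url)
                response.raise_for_status()
                return response
            except requests.HTTPError:
                if attempt == attempts - 1:
                    raise
                time.sleep(delay)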
Assignee: nobody → j.parkouss
Attachment #8520666 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8520919 - Flags: review?(wlachance)
Comment 12

4 years ago

Comment on attachment 8520919 [details] [review]
fetch inbound data on demand

This looks good! The print statements are a bit verbose; I think I'd like to get the logging patch landed before doing a new release.
Attachment #8520919 - Flags: review?(wlachance) → review+
Merged PR
Status: ASSIGNED → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED