Closed Bug 1797704 Opened 2 years ago Closed 1 year ago

firefox-android: Migrate focus-android to the new android monorepo

Categories

(Release Engineering :: General, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jlorenzo, Unassigned)

References

Details

(Keywords: meta)

The second major step to accomplish bug 1782733 is to migrate focus-android[1] to the new android monorepo[2]. I'm filing this meta-bug to track what we did for this specific step. This meta-bug fulfills the same purpose as the one I created for android components (bug 1797702).

[1] https://github.com/mozilla-mobile/focus-android/
[2] https://github.com/mozilla-mobile/firefox-android

Depends on: 1797244
Depends on: 1797245
Depends on: 1798878
Depends on: 1799666
Depends on: 1799668
Depends on: 1799670
Depends on: 1799696
Depends on: 1799698
Depends on: 1799701
Depends on: 1800611
Blocks: 1803130
Depends on: 1805639
Depends on: 1806137
Depends on: 1806432
Depends on: 1805513
Depends on: 1805533
Depends on: 1805572
Depends on: 1805762
Depends on: 1803141
Depends on: 1806804

8 days ago, we started to migrate focus-android to the new Android monorepo. I followed what we learned during the Android-Components migration (bug 1797702 comment 1). It turns out we faced many issues that didn't happen in bug 1797702. We basically hit 6 classes of bugs in different shapes and forms.

1. 🔁 Missing scopes

It's a classic and we were expecting them. Either one of the decision/action tasks is busted, or Chain of Trust doesn't validate a task, or a production worker doesn't behave properly because it's not using the right config.

That's the type of stuff we can't fully test in staging because by nature, we're using a different set of scopes. That said, we went the extra mile by testing some jobs on the production environment the day before the migration and we were confident about it.

Regarding the decision task bustages, :ahal suggested we could add an API to Taskcluster that could test scopes without scheduling tasks, so that the decision task could fail with all scope errors at once. I think it's a great idea!
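To make :ahal's suggestion a bit more concrete, here is a minimal sketch of the "report every missing scope at once" idea. It is not an existing Taskcluster API: `scope_satisfied`, `missing_scopes` and the scope strings are hypothetical names picked for illustration. The only rule it relies on is that a granted scope ending in `*` covers every scope sharing that prefix.

```python
from typing import Iterable, List


def scope_satisfied(required: str, granted: Iterable[str]) -> bool:
    """Return True if at least one granted scope covers the required scope.

    A granted scope ending in "*" covers any scope that starts with the
    part before the star; any other granted scope must match exactly.
    """
    for scope in granted:
        if scope == required:
            return True
        if scope.endswith("*") and required.startswith(scope[:-1]):
            return True
    return False


def missing_scopes(required: Iterable[str], granted: Iterable[str]) -> List[str]:
    """Collect every required scope that is not covered, instead of
    stopping at the first failure."""
    granted = list(granted)
    return sorted({s for s in required if not scope_satisfied(s, granted)})


if __name__ == "__main__":
    # Hypothetical scope names, for illustration only.
    granted = ["example:signing:*", "example:route:notify"]
    required = [
        "example:signing:cert:dep",
        "example:route:notify",
        "example:beetmover:bucket:release",
    ]
    for scope in missing_scopes(required, granted):
        print(f"missing scope: {scope}")
```

A decision task could run a check like this over all the tasks it is about to create and fail once with the full list, rather than failing on one scope per attempt.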

2. Fallout coming from the new version scheme in Android-Components

That's bug 1800611. We changed the pattern in the 109 cycle and there are so many places that don't expect beta numbers like 109.0b1. We're still fixing some of them up.

This could have been prevented by shipping this change one cycle before the Focus migration, although we didn't have a cycle to spare.

The good news is: we won't face that again during the Fenix migration. We're done changing version numbers.
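For reference, here is a minimal sketch of a parser that accepts beta numbers like 109.0b1 alongside plain release numbers. The regular expression and the `Version`, `parse_version` and `sort_key` names are my own assumptions for illustration, not the actual code used in the monorepo or in the release tooling.

```python
import re
from typing import NamedTuple, Optional, Tuple

# Assumed pattern: MAJOR.MINOR, an optional .PATCH, an optional bN beta suffix.
VERSION_RE = re.compile(
    r"^(?P<major>\d+)\.(?P<minor>\d+)(?:\.(?P<patch>\d+))?(?:b(?P<beta>\d+))?$"
)


class Version(NamedTuple):
    major: int
    minor: int
    patch: int
    beta: Optional[int]  # None for a non-beta version


def parse_version(text: str) -> Version:
    """Parse strings such as "109.0b1" or "108.1.1"; raise ValueError otherwise."""
    match = VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unsupported version string: {text!r}")
    beta = match["beta"]
    return Version(
        major=int(match["major"]),
        minor=int(match["minor"]),
        patch=int(match["patch"] or 0),
        beta=int(beta) if beta is not None else None,
    )


def sort_key(version: Version) -> Tuple[int, int, int, int, int]:
    """Order betas before the release that shares their major.minor.patch."""
    is_release = 0 if version.beta is not None else 1
    return (version.major, version.minor, version.patch, is_release, version.beta or 0)


if __name__ == "__main__":
    versions = ["109.0", "108.1.1", "109.0b2", "109.0b1"]
    print(sorted(versions, key=lambda v: sort_key(parse_version(v))))
```

Code that assumed a purely numeric pattern would reject or mis-sort the `b1` suffix, which is the kind of fallout described above.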

3. 🔁 Missing uplifts

One of the major takeaways of bug 1797702 was to handle release branches better. We did get better by automating the creation of the 2 release branches. However, the 108 branch (currently hosting release builds) was totally out of date. We should have uplifted most of our migration patches to the 108 branch. I basically lost a whole day uplifting patches because I kept facing bugs that we had already fixed on main.

Next time, I’ll keep an eye out to uplift as many changes as we can.

4. 🔁 Performance regression on CI

In bug 1788606, we knew performance of the decision task could be improved. In bug 1803141, we doubled (and sometimes tripled) the duration of the decision task making CI a real pain to start. We also hit bug 1805762, which made the decision workers unstable for a whole day.

We knew it wasn't ideal but we believed it was okay, mainly because I had implemented a Taskcluster cache that would make it less painful. Results showed very few decision tasks reused the existing cache, making the optimization close to useless. In the end, it took us another day to switch gears and fix the performance regression as fast as we could.

The toolchain task in charge of generating gradle caches made the day of the migration painful too. It took an hour to complete. It's a known perf issue. It's been like this for years. I'd like to fix that before the Fenix migration.

5. 🔁 No code freeze before the migration

The toolchain task alone could have been bearable if we had run it once on the day of the migration. However, a breaking change landed on main less than 12 hours before the migration. This breaking change busted that toolchain task after an hour. The combination made us waste several hours.

We will implement a stronger code freeze before the Fenix migration.

6. 🔁 Duplicated tasks that overwrite one another.

We migrated the Focus release promotion graphs. We actually merged these graphs into the existing Android-Components ones. As a result, we now sometimes have 2 tasks doing similar jobs. The GitHub Release one is actually misconfigured (bug 1806804).

We could have noticed that in staging. We missed it. That's the type of bug we will face again in the Fenix migration.
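One way to catch this class of bug in staging could be a sanity check over the generated graph. Below is a minimal sketch, assuming each task can be reduced to a (label, destination) pair; the `find_overwrites` helper, the labels and the destinations are hypothetical and do not come from the real release promotion graphs.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def find_overwrites(tasks: Iterable[Tuple[str, str]]) -> Dict[str, List[str]]:
    """Group task labels by the destination they write to and keep only
    the destinations written by more than one task."""
    writers: Dict[str, List[str]] = defaultdict(list)
    for label, destination in tasks:
        writers[destination].append(label)
    return {
        destination: sorted(labels)
        for destination, labels in writers.items()
        if len(labels) > 1
    }


if __name__ == "__main__":
    # Hypothetical labels and destinations, for illustration only.
    graph = [
        ("github-release-components", "github:releases/v1"),
        ("github-release-focus", "github:releases/v1"),
        ("push-apk-focus", "store:focus"),
    ]
    for destination, labels in find_overwrites(graph).items():
        print(f"{destination} is written by: {', '.join(labels)}")
```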

Depends on: 1806884

The team met yesterday and we're ready to start the work on fenix (bug 1803130). I'm closing this metabug as FIXED!

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Depends on: 1817944