Closed Bug 1217490 Opened 9 years ago Closed 8 years ago

[Aries] The dogfood/dogfood-latest channel are reset to nightly after latest OTA

Categories
Product/Component: Firefox OS Graveyard :: General
Type: defect
Platform: ARM, Gonk (Firefox OS)
Priority: Not set
Severity: normal

Tracking
Status: RESOLVED FIXED
blocking-b2g: 2.5+

People
Reporter: marcia
Assignee: Unassigned

Details
Keywords: foxfood
Whiteboard: [dogfood-blocker]

Delphine and I have now seen this a couple of times on the Aries device.

I run on dogfood-latest, and when I update the next day I sometimes don't get an update. When I check the channel, I see that it has been reset to either nightly or dogfood.

I will try to get logs the next time it happens - today when I updated the channel was preserved.
QA Whiteboard: [FOTA Issues]
Today for me the channel was not preserved. It happens exactly as Marcia explains in the description, every time.
Bug 1206741 is probably the root cause; that bug is where we ensure the channel and update URL across updates.
I'm not sure how to resolve this situation where we want to preserve custom channels, especially if we want to change the channels for some people...

Pinging Janx.
Flags: needinfo?(janx)
Good find Naoki, that's indeed the root cause.

The "app.update.url" and "app.update.channel" custom settings have recently been changed so that they get reset to the underlying pref if that ever changes between updates. In effect, this allows updates to modify these, e.g. to the update URL format (we needed to add a new parameter).

The cases where your "dogfood-latest" channel gets reset to "nightly" or "dogfood" are:
- The underlying "app.update.channel" pref has changed between updates (e.g. it was "nightly" and now it is "dogfood", so in that case your update channel gets reset to the latest value "dogfood").
- We don't know what the underlying pref was before (before bug 1206741 landed, we weren't keeping track of pref values across updates, so for that update we decided to reset everyone's settings just this once).
- You deleted or changed the "app.update.channel.backup" pref (that is what we use to detect pref changes across updates, so if it's different from the "app.update.channel" pref, or if it's missing, the "app.update.channel" setting gets reset to the pref value).
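A minimal Python sketch of the reset rule as described in the three cases above. This is not the actual Gecko code (which this thread doesn't show); the function name and the "reset, then record the new pref as backup" order are assumptions for illustration only.

```python
from typing import Optional, Tuple


def resolve_channel_setting(setting: str,
                            pref: str,
                            backup: Optional[str]) -> Tuple[str, str]:
    """Return (new_setting, new_backup) after an update is applied.

    setting: current user-visible "app.update.channel" setting (may be custom)
    pref:    "app.update.channel" pref shipped in the newly installed build
    backup:  "app.update.channel.backup" recorded at the previous update,
             or None if it was never recorded (or was deleted)
    """
    if backup is None or backup != pref:
        # Pref changed between updates, or history unknown: reset the
        # setting to the pref and remember the pref for next time.
        return pref, pref
    # Pref unchanged since the last update: keep any custom setting.
    return setting, backup


print(resolve_channel_setting("dogfood-latest", "dogfood", "nightly"))  # pref changed: reset
print(resolve_channel_setting("dogfood-latest", "nightly", None))       # no backup: reset
print(resolve_channel_setting("dogfood-latest", "dogfood", "dogfood"))  # unchanged: kept
```

In this model the custom value only survives when the backup matches the newly shipped pref exactly.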

Now, I agree that allowing updates to change the "app.update.url" or ".channel" is against the original design of these prefs in 2005, and maybe the new "reset if the underlying pref has changed between updates" behavior might only be warranted for "app.update.url", not for "app.update.channel".

Naoki, do you think that OTA or FOTA updates could ever seek to change a device's update channel? If we think that will never be the case, we could completely ignore "app.update.channel" pref changes across updates, and always keep the custom setting value.
Flags: needinfo?(janx) → needinfo?(nhirata.bugzilla)
Ah. So basically a flash of the nightly, and then a channel change to nightly-test or nightly-latest, will cause the channel to reset again. This is something that QA will do. It would mean that they would have to be aware of app.update.channel.backup and change that pref as well in order to keep the pref. Some nightly testers and devs prefer to be on nightly-latest.

I believe that there may be a need to change the channel in the near future, but not at this current time.  I'm not sure of the best solution for this.
Flags: needinfo?(nhirata.bugzilla) → needinfo?(janx)
(In reply to Naoki Hirata :nhirata (please use needinfo instead of cc) from comment #4)
> Ah.  so basically a flash of the nightly, and then a channel change to
> nightly-test or nightly-latest will cause the channel to reset again.

Actually the flash of nightly by itself will reset the channel the first time (because before the flash there was no backup pref at all).

You can then manually change back to any custom value like nightly-test or nightly-latest, using the setting. That value will then persist (even after reboot) until you flash again with a pref value that's different from whatever the initial flash used (so, flashing again with the same channel as in the first flash will keep your custom nightly-test or nightly-latest value).
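The sequence above (first flash, custom channel, re-flash with the same pref, flash with a different pref) can be modeled as a toy state machine. This is a hypothetical Python sketch assuming the rule "reset when the backup is missing or differs from the shipped pref"; the class and method names are made up for illustration.

```python
from typing import Optional


class ChannelState:
    """Toy model of one device's channel state (hypothetical names)."""

    def __init__(self) -> None:
        self.setting: Optional[str] = None  # "app.update.channel" setting
        self.backup: Optional[str] = None   # "app.update.channel.backup" pref

    def install(self, pref: str) -> None:
        """Flash or update to a build whose "app.update.channel" pref is `pref`."""
        if self.backup != pref:  # also true when no backup exists yet
            self.setting = pref
            self.backup = pref

    def choose_channel(self, value: str) -> None:
        """User picks a custom channel in Settings."""
        self.setting = value


device = ChannelState()
device.install("nightly")               # first flash: no backup, so reset
device.choose_channel("nightly-latest")
device.install("nightly")               # same pref as before: custom value kept
print(device.setting)                   # nightly-latest
device.install("dogfood")               # different pref: reset
print(device.setting)                   # dogfood
```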

> This is something that QA will do.  it would mean that they would have to be
> aware of the app.update.channel.backup and change that pref as well in order
> to keep the pref.  Some nightly testers and dev prefer to be on the
> nightly-latest

How can I make QA, nightly testers and dev aware of the backup pref? I was hoping that underlying pref changes would be sufficiently rare not to inconvenience people using custom channels too much (i.e. the value will be reset once after installing the latest FOTA, but hopefully not again until we actually need to change the URL or channel for everyone).

> I believe that there may be a need to change the channel in the near future,
> but not at this current time.  I'm not sure of the best solution for this.

Ok, what about this then: We don't use the backup/reset mechanism for app.update.channel now, just for app.update.url (because that one we actually need to change now). That way custom channels will stay custom instead of being reset. And if later, we ever want to actually change the channel for everyone, we can move app.update.channel to the backup/reset mechanism.

Sound fair?
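Under this proposal, only "app.update.url" would go through the backup/reset mechanism. A hypothetical Python sketch, assuming the mechanism iterates over a set of tracked names; RESET_TRACKED and apply_update are illustrative names, and "old-url"/"new-url" are placeholder values, not real update URLs.

```python
from typing import Dict, Iterable

# Hypothetical: names covered by the backup/reset mechanism under this
# proposal. Adding "app.update.channel" later would opt the channel back in.
RESET_TRACKED = ("app.update.url",)


def apply_update(settings: Dict[str, str],
                 prefs: Dict[str, str],
                 backups: Dict[str, str],
                 tracked: Iterable[str] = RESET_TRACKED) -> None:
    """Update `settings` and `backups` in place after a new build is installed."""
    for name in tracked:
        new_pref = prefs[name]
        # A missing backup (.get returns None) also counts as "changed".
        if backups.get(name) != new_pref:
            settings[name] = new_pref
            backups[name] = new_pref
    # Untracked names such as "app.update.channel" are never touched here.


settings = {"app.update.url": "old-url", "app.update.channel": "nightly-latest"}
prefs = {"app.update.url": "new-url", "app.update.channel": "nightly"}
backups = {"app.update.url": "old-url"}
apply_update(settings, prefs, backups)
print(settings)  # URL follows the new pref, custom channel is kept
```

Moving the channel back under the mechanism later would then just mean adding its name to the tracked set.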
Flags: needinfo?(janx) → needinfo?(nhirata.bugzilla)
(In reply to Jan Keromnes [:janx] from comment #5)
> (In reply to Naoki Hirata :nhirata (please use needinfo instead of cc) from
> comment #4)
> > Ah.  so basically a flash of the nightly, and then a channel change to
> > nightly-test or nightly-latest will cause the channel to reset again.
> 
> Actually the flash of nightly by itself will reset the channel the first
> time (because before the flash there was no backup pref at all).
> 
> You can then manually change back to any custom value like nightly-test or
> nightly-latest, using the setting. That value will then persist (even after
> reboot) until you flash again with a pref value that's different from
> whatever the initial flash used (so, flashing again with the same channel as
> in the first flash will keep your custom nightly-test or nightly-latest
> value).
> 
> > This is something that QA will do.  it would mean that they would have to be
> > aware of the app.update.channel.backup and change that pref as well in order
> > to keep the pref.  Some nightly testers and dev prefer to be on the
> > nightly-latest

Thanks for the info

> How can I make QA, nightly testers and dev aware of the backup pref? I was
> hoping that underlying pref changes would be sufficiently rare not to
> inconvenience people using custom channels too much (i.e. the value will be
> reset once after installing the latest FOTA, but hopefully not again until
> we actually need to change the URL or channel for everyone).

MozDev would be a good place to start. I kind of think we need to start documenting all of our prefs in one spot. In any case, having some blurb in https://developer.mozilla.org/en-US/docs/Mozilla/Preferences/A_brief_guide_to_Mozilla_preferences?redirectlocale=en-US&redirectslug=A_Brief_Guide_to_Mozilla_Preferences or https://developer.mozilla.org/en-US/Firefox_OS/Building_and_installing_Firefox_OS/Firefox_OS_update_packages may help. Chris Mills is an excellent person to contact, and he's very easy to work with when it comes to documentation.

> > I believe that there may be a need to change the channel in the near future,
> > but not at this current time.  I'm not sure of the best solution for this.
> 
> Ok, what about this then: We don't use the backup/reset mechanism for
> app.update.channel now, just for app.update.url (because that one we
> actually need to change now). That way custom channels will stay custom
> instead of being reset. And if later, we ever want to actually change the
> channel for everyone, we can move app.update.channel to the backup/reset
> mechanism.
> 
> Sound fair?

Sounds fair. After thinking about it a little, another thought was to have some sort of flag so that we can force the mechanism to take place if need be, e.g. FORCE_BACKUP_CHANNEL=1 or something similar as a build flag... I mean, the code is already there, so we might as well use it whenever we want to, right? :)
Flags: needinfo?(nhirata.bugzilla)
We should also point to the pref documentation when they launch the FxOS Participation Hub sometime in the near future.
Chris, can you please document this change somewhere? I'm not sure of the appropriate location.

The change is that the pref "app.update.channel.backup" should be changed on the device, using WebIDE, to the channel you are using if you change the channel (i.e. nightly-latest or nightly-test), in order to keep it the same after you do an update.
Flags: needinfo?(cmills)
So, if I follow properly:

* all our foxfooders that installed latest OTA got their channel switched to "nightly".
* so they won't get the updates that come from the channel "dogfood".

Am I right here?
How do we get out of this situation?
[Blocking Requested - why for this release]:
blocking-b2g: --- → 2.5?
More importantly, should we hold off on giving away dogfood Sony Z3Cs to contributors at MozFest this weekend, if they are going to get stuck with this OTA?
To be clear, we have 50 devices we plan to distribute at MozFest over the weekend to new Foxfooders from more than 30 different countries. The whole point of distributing at MozFest is to avoid having the logistical headache of shipping devices to all these different countries, paying crazy custom/shipping fees etc...

It would be extremely unfortunate if we did not distribute these phones. 

Instead, what would be great is to explain very clearly to the Foxfooders what the situation is, and make sure they follow the necessary steps to get their phone back to foxfooding once we have found a fix for this.
Summary: [Aries] After OTA my dogfood-latest channel is sometimes reset to nightly or dogfood → [Aries] The dogfood-latest channel is reset to nightly or dogfood after latest OTA
After some investigation, it seems like we always switch back to nightly even though we're on dogfood.

:julienw has always OTA'd on the dogfood channel (no FOTA update), and he's now on the nightly channel. On my side, I have changed that setting a couple of times since I started dogfooding. However, the last time I did an update, I'm sure I was on the dogfood channel.

Finally, I managed to reproduce it (2/2) while updating from [1] (a build close to the second-to-last available OTA) to the current OTA[2]. I did not change any setting. Here are the STR:

1. Download and flash [1]
2. Go to settings, developer and check you're on the dogfood channel
3. Enable Wi-Fi and search for updates
4. Download and install the available system update
5. Go to the developer menu again and see you're now in nightly.

As said in Friday's standup, pushing any update to dogfooders on that channel might have some unexpected consequences. Flagging this bug as dogfood-blocker.


[1] https://tools.taskcluster.net/task-inspector/#EXpLFpaBQa6n10MaEcDWtw/1
[2] Build ID               20151023104059
Gaia Revision          410e91ddabc7ba82a9b43b3711a3fdf2cb8de309
Gaia Date              2015-10-23 05:57:04
Gecko Revision         https://hg.mozilla.org/mozilla-central/rev/0625c68c0abcfe4d10880d15d8fe7d06df3369c9
Gecko Version          44.0a1
Device Name            aries
Firmware(Release)      4.4.2
Firmware(Incremental)  eng.worker.20150619.224059
Firmware Date          Fri Jun 19 22:41:08 UTC 2015
Bootloader             s1
blocking-b2g: 2.5? → 2.5+
Summary: [Aries] The dogfood-latest channel is reset to nightly or dogfood after latest OTA → [Aries] The dogfood/dogfood-latest channel are reset to nightly after latest OTA
Whiteboard: [dogfood-blocker]
Johan, even with app.update.channel.backup set to dogfood?
Flags: needinfo?(jlorenzo)
Here is a summary of what happened:

- On 2015-10-05, we landed bug 1206741, which changed the way we treat the "app.update.{channel,url}" settings. Before the patch, they never changed unless the user went to the Settings app to edit their value; after the patch, these settings started being reset if their underlying pref changed. Also, both settings are reset the very first time this new code lands.

- On 2015-10-23, we pushed an OTA out to the "dogfood" channel, but it had the pref "app.update.channel" set to "nightly". This, in combination with the new code, caused every device on the "dogfood" channel to permanently switch to the "nightly" channel.

Here is how I suggest to fix this:

1) Change the code again to make resets of the "app.update.channel" setting harder (e.g. require a "RESET_UPDATE_CHANNEL=1" build flag, without which we never touch the "app.update.channel" setting). I'll write the patch for it.

2) Find out why we're pushing OTAs on the "dogfood" channel that have the pref "app.update.channel" set to "nightly" instead of "dogfood". I think that's a build problem, and that all the builds we push on a given channel should have the "app.update.channel" pref set to the same channel. Naoki, any ideas what went wrong / how to fix it?

3) Once 1) and 2) are fixed, find a way to get all the wrongly updated dogfood devices back to the "dogfood" channel. Maybe we can push a "nightly" build that detects dogfood devices and, if it finds one, switches its channel back to "dogfood"?
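A hypothetical Python sketch of fix 1), assuming the proposed RESET_UPDATE_CHANNEL=1 build flag is surfaced to the reset logic as a boolean. The parameter and function names are illustrative, not an existing Gecko option, and recording the shipped pref even when the flag is absent is an assumed design choice.

```python
from typing import Optional, Tuple


def resolve_channel(setting: str,
                    pref: str,
                    backup: Optional[str],
                    reset_update_channel: bool) -> Tuple[str, str]:
    """Return (new_setting, new_backup) after installing a build.

    `reset_update_channel` stands in for the proposed RESET_UPDATE_CHANNEL=1
    build flag; it is a hypothetical parameter, not an existing Gecko option.
    """
    if not reset_update_channel:
        # Flag absent: never touch the user's channel setting, but still
        # record the shipped pref so a later flagged build can detect changes.
        return setting, pref
    if backup != pref:
        # Flag present and the pref changed (or no backup yet): reset.
        return pref, pref
    return setting, backup


# The 2015-10-23 situation: a "nightly" build pushed to a "dogfood" device.
print(resolve_channel("dogfood", "nightly", "dogfood", reset_update_channel=False))
print(resolve_channel("dogfood", "nightly", "dogfood", reset_update_channel=True))
```

Under this sketch, an accidental "nightly" build without the flag leaves dogfood devices on "dogfood"; only a deliberately flagged build can move everyone to a new channel.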
Flags: needinfo?(nhirata.bugzilla)
With the help of :janx, I performed some more testing.

A) Before the OTA, creating the pref "app.update.channel.backup" and setting it to "dogfood" doesn't show any effect once the OTA is done.

B) The current dogfood update points to a different type of build than it used to. Today, this link[1] returns:
> <updates>
>   <update appVersion="44.0a1" buildID="20151023104059" displayVersion="44.0a1" platformVersion="44.0a1" type="minor"><patch URL="https://queue.taskcluster.net/v1/task/JNQe3bXTQ6ia7qabcB9vCA/runs/0/artifacts/public/build/b2g-aries-gecko-update.mar" hashFunction="sha512"
>     hashValue="fda242c5db4cbfc50e35fc289eb759e8d8cdef28498a9c97718462c2c2158cb037016df56a27770dc28a56086bb4dd29ecb988e818dc4745217d7ac6dc28c35c" size="129364858" type="complete"/></update>
> </updates>
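To check which build a channel is actually serving, the relevant fields can be pulled out of an update.xml response like the one above. A minimal Python sketch using the standard library's xml.etree.ElementTree; the function name is made up, and the "/task/<taskId>/" URL shape is an assumption based on the single URL shown. The task ID it extracts (JNQe3bXTQ6ia7qabcB9vCA) matches the Taskcluster job linked in [2].

```python
import xml.etree.ElementTree as ET

# The response quoted above; the long hashValue attribute is omitted here.
UPDATE_XML = """<updates>
  <update appVersion="44.0a1" buildID="20151023104059" displayVersion="44.0a1"
          platformVersion="44.0a1" type="minor">
    <patch URL="https://queue.taskcluster.net/v1/task/JNQe3bXTQ6ia7qabcB9vCA/runs/0/artifacts/public/build/b2g-aries-gecko-update.mar"
           hashFunction="sha512" size="129364858" type="complete"/>
  </update>
</updates>"""


def summarize_updates(xml_text: str) -> list:
    """List buildID, Taskcluster task ID, patch type and size for each patch."""
    root = ET.fromstring(xml_text)
    rows = []
    for update in root.findall("update"):
        for patch in update.findall("patch"):
            url = patch.get("URL", "")
            task_id = None
            # Assumes a queue URL of the form .../v1/task/<taskId>/runs/...
            if "/task/" in url:
                task_id = url.split("/task/", 1)[1].split("/", 1)[0]
            rows.append({
                "buildID": update.get("buildID"),
                "task_id": task_id,
                "patch_type": patch.get("type"),
                "size": int(patch.get("size", "0")),
            })
    return rows


for row in summarize_updates(UPDATE_XML):
    print(row)
```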

This taskcluster job[2] is stored under a new namespace that ends with "aries-ota.opt", whereas the previous one[3] ends with "aries-dogfood.debug". My current assumption is that this new "aries-ota.opt" job doesn't have the right build flags.

One more thing that supports this assumption: if you flash the full image from [2] directly onto your phone, you get the "nightly" channel.

Naoki, do you know the bug that introduced jobs like "aries-ota.opt"? 

[1] https://aus5.mozilla.org/update/3/B2G/44.0a1/20151005195518/aries/en-US/dogfood/Boot2Gecko%202.5.0.0-prerelease%20%28SDK%2019%29/default/default/update.xml?force=1
[2] https://tools.taskcluster.net/task-inspector/#JNQe3bXTQ6ia7qabcB9vCA
[3] https://tools.taskcluster.net/task-inspector/#DIWC88Y9SGeCvaWjQCJwMg
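The request URL in [1] encodes the device's own state, including the channel it asked for. A hypothetical Python sketch that splits it into named fields, assuming the usual AUS "update/3" path order (product, version, build ID, build target, locale, channel, OS version, distribution, distribution version); that field order is an assumption here, not something stated in this thread. It uses only urllib.parse from the standard library.

```python
from urllib.parse import parse_qs, unquote, urlsplit

# Assumed field order of an AUS "update/3" request path.
FIELDS = ["product", "version", "build_id", "build_target", "locale",
          "channel", "os_version", "distribution", "distribution_version"]


def parse_aus_v3(url: str) -> dict:
    """Split an AUS v3 update URL into named, percent-decoded fields."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    if segments[:2] != ["update", "3"] or segments[-1] != "update.xml":
        raise ValueError("not an AUS v3 update URL")
    values = [unquote(s) for s in segments[2:-1]]
    if len(values) != len(FIELDS):
        raise ValueError("unexpected number of path segments")
    info = dict(zip(FIELDS, values))
    info["query"] = parse_qs(parts.query)
    return info


url = ("https://aus5.mozilla.org/update/3/B2G/44.0a1/20151005195518/aries/en-US/"
       "dogfood/Boot2Gecko%202.5.0.0-prerelease%20%28SDK%2019%29/default/default/"
       "update.xml?force=1")
info = parse_aus_v3(url)
print(info["channel"], info["build_id"], info["os_version"])
```

This makes it easy to confirm that the request itself asked for "dogfood", so the "nightly" build came from what the server returned for that channel.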
Flags: needinfo?(jlorenzo)
I've made a first stab at documenting this issue, at 

https://developer.mozilla.org/en-US/Firefox_OS/Phone_guide/Flame/Updating_your_Flame#Update_channel_reset_bug
Flags: needinfo?(cmills)
(In reply to Chris Mills (Mozilla, MDN editor) [:cmills] from comment #17)
> I've made a first stab at documenting this issue, at 
> 
> https://developer.mozilla.org/en-US/Firefox_OS/Phone_guide/Flame/
> Updating_your_Flame#Update_channel_reset_bug

Thanks Chris. However, this affects not just the Flame, but any foxfooding devices AFAIK. Certainly the Sony.
(In reply to Brian King [:kinger] from comment #18)
> (In reply to Chris Mills (Mozilla, MDN editor) [:cmills] from comment #17)
> > I've made a first stab at documenting this issue, at 
> > 
> > https://developer.mozilla.org/en-US/Firefox_OS/Phone_guide/Flame/
> > Updating_your_Flame#Update_channel_reset_bug
> 
> Thanks Chris. However, this affects not just the Flame, but any foxfooding
> devices AFAIK. Certainly the Sony.

Yeah, I appreciate that. However, I was told not to document the Sony device on MDN, as it is not a general release device. And we don't really have a page for general foxfooding device/Firefox OS device troubleshooting. I thought this would at least be a good first step to have something to point people towards.
1) sounds good.
2) I think we're going to try to fix the builds so that they have B2G_UPDATE_CHANNEL="". It's more than likely that the build that was used (OTA) has it set to nightly.
3) aries-OTA.opt shouldn't be used for dogfood builds; those are supposed to be for nightlies. aries-dogfood should be used for dogfood builds, and it's set to the dogfood channel.

I think the main issue is that people should be using the right full flash builds for the right channel devices.  Otherwise you get weird resets.
Flags: needinfo?(nhirata.bugzilla)
Flags: needinfo?(jlorenzo)
Flags: needinfo?(janx)
Found out the main issue: OTA server is serving OTA nightlies instead of the dogfood OTA.
We need to make a switch there for this bug to be resolved for Dogfooders.
Naoki, what's the plan to fix the dogfooders that did the update already (I'd guess most of them)? Manually changing the channel?
(In reply to Julien Wajsberg [:julienw] from comment #22)
> Naoki, what's the plan to fix the dogfooders that did the update already
> (I'd guess; most of them) ? Manually changing the channel ?

As mentioned in comment 15, "maybe we can push a "nightly" build that detects dogfood-devices, and fixes the channel back to "dogfood" if it does?"

Otherwise ask dogfooders to change back manually, once we're certain they won't be forced to "nightly" again.
Flags: needinfo?(janx)
We also have bug 1222527 because of this issue; I hope we'll be able to restore the right settings to the right values...
Agreed with pushing a build that detects the dogfood devices, on the nightly channel.
Flags: needinfo?(jlorenzo)
The only way is to change it manually for now. From what I saw on Balrog, Nightly Aries doesn't currently provide any updates.
I'm not sure about a mechanism to switch nightly to dogfood.

There's a legal aspect to this: dogfood has metrics collection, and we cannot run the risk of including people who didn't sign up for it.

Otherwise it would violate their privacy. We have the IMEI of all the devices that are supposed to be in the dogfood program; I would rather push the correct dogfood FOTA build to those IMEIs later on, once IMEI whitelisting is implemented.

I filed bug 1224357 to explicitly fix central builds for Aries.
It turns out that hash only appears for dogfood builds as it stands right now.

We need to make sure we have a coordinated plan of who is doing what and how we're going to switch the dogfood build over correctly rather than people just pushing code patches, otherwise it will continue to be in a mess.

Let's figure this out in a different bug.
Depends on: 1224725
Depends on: 1209503
Blocks: 1224725
No longer depends on: 1224725
QA Whiteboard: [FOTA Issues] → [FOTA Issues],[foxfood-triage]
Jan, regarding bug 1206741: can you back out the patch that is the root cause of this problem? My concern is that we don't have enough time to implement a solution before Mozlando on Dec 8, when we are planning to give 200 Z3C phones to contributors. This would create another situation similar to MozFest, which was messy.
Flags: needinfo?(janx)
Hi Jean, bug 1206741 is not the root cause of this problem. It simply makes it possible for an OTA update to change app.update.channel and app.update.url values, which was required for the IMEI-whitelisting (before the patch, there was no way to make devices send their IMEI when asking for updates).

The root cause of this problem was that a "nightly" OTA update was accidentally pushed on the "dogfood" channel, which hopefully shouldn't happen again. With the 200 Z3Cs distributed at Mozlando, the situation wouldn't be similar to MozFest, unless people pushing out updates repeat the mistake of sending a "nightly" build on "dogfood" channel (I think that's unlikely).

The good news is, we're not far from starting our transition plan outlined in bug 1224725 comment 7, as soon as QA (:nhirata or :jlorenzo) find a valid "dogfood" OTA build that we can send to all devices. This should start getting devices we lost to "nightly" channel back to "dogfood" channel.

Actually, you don't even have to care about this problem at all, because devices on the "dogfood" channel are not affected. You can flash any valid "dogfood" build that QA gives you onto Mozlando devices, and they won't be affected.
Flags: needinfo?(janx) → needinfo?(jgong)
I just checked with Johan: the current "dogfood" channel still gives a "nightly" build. This was not changed when we discovered the issue.

We need to make sure that before Mozlando we get a "dogfood" build on this channel. Either using the latest "dogfood" build (instead of a "nightly" build) or a newer one that's validated by QA. My advice here would be to actually change it _now_ to latest known OTA build, so that at least we're safe about it.
Note: when I say "now", we can actually wait until next Tuesday and see if we have a sanctified OTA for this week.
Depends on: 1228330
This bug is now fixed. Marcia - Can you confirm and please close it?

Thanks
per comment 34.
Flags: needinfo?(mozillamarcia.knous)
Flags: needinfo?(jgong)
As diagnosed above, this issue happened only because a Settings database update coincided with a nightly build being pushed to dogfooders. I don't think this issue will happen again unless these two events happen together again. This mistake has been resolved by putting everybody back on the dogfood channel.

Let's mark this bug as fixed, and reopen it if these mistakes happen again.
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(mozillamarcia.knous)
Resolution: --- → FIXED