Permaorange on Windows nightly build checkUpdaterSigSvc.js | run_test - [run_test : 39] the maintenance service exit value should be 0 - 1 == 0

RESOLVED FIXED

Status

Toolkit
Application Update
RESOLVED FIXED
2 years ago
a year ago

People

(Reporter: philor, Assigned: mhowell)

Tracking

({intermittent-failure})

Trunk
intermittent-failure

Firefox Tracking Flags

(firefox45 affected)

Details

(Reporter)

Description

2 years ago
Regression from bug 1079858.

https://treeherder.mozilla.org/logviewer.html#?job_id=2876170&repo=mozilla-central
https://treeherder.mozilla.org/logviewer.html#?job_id=2876197&repo=mozilla-central
https://treeherder.mozilla.org/logviewer.html#?job_id=2876324&repo=mozilla-central

Comment 1

2 years ago
12 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-central: 12

Platform breakdown:
* windows8-64: 5
* windows7-32: 4
* windowsxp: 3

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1232078&startday=2015-12-07&endday=2015-12-13&tree=all

Comment 2

2 years ago
26 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-central: 17
* mozilla-aurora: 9

Platform breakdown:
* windowsxp: 9
* windows8-64: 9
* windows7-32: 8

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1232078&startday=2015-12-14&endday=2015-12-20&tree=all

Comment 3

2 years ago
24 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-central: 21
* mozilla-aurora: 3

Platform breakdown:
* windowsxp: 8
* windows8-64: 8
* windows7-32: 8

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1232078&startday=2015-12-21&endday=2015-12-27&tree=all

Comment 4

2 years ago
34 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-central: 18
* mozilla-aurora: 16

Platform breakdown:
* windows8-64: 13
* windowsxp: 11
* windows7-32: 10

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1232078&startday=2015-12-28&endday=2016-01-03&tree=all

Comment 5

2 years ago
37 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-aurora: 19
* mozilla-central: 18

Platform breakdown:
* windowsxp: 13
* windows8-64: 12
* windows7-32: 12

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1232078&startday=2016-01-04&endday=2016-01-10&tree=all

Comment 6

2 years ago
45 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-central: 30
* mozilla-aurora: 15

Platform breakdown:
* windowsxp: 15
* windows8-64: 15
* windows7-32: 15

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1232078&startday=2016-01-11&endday=2016-01-17&tree=all

Comment 7

Matt, could you take a look at this? Thanks!
Flags: needinfo?(mhowell)
(Assignee)

Comment 8

2 years ago
I've been trying, but I'm at a loss to understand how the referenced bug could have broken this test, or in fact how the test could ever have passed at all. As far as I can see, the test requires two things that won't ever be present: properly signed binaries, and a couple of Windows registry entries that the test can't create because they can only be written by an elevated user.
rstrong, it looks like you reviewed the patch that added this test; can you take a quick look and maybe point out what I'm missing?
Flags: needinfo?(mhowell) → needinfo?(robert.strong.bugs)

Comment 9

The build systems should have the required registry entries, which is why the tests passed before the cert changed, and the binaries are signed with a test certificate.
Flags: needinfo?(robert.strong.bugs)
(Assignee)

Comment 10

2 years ago
Oh, okay; thanks, Robert.
So this is another issue with the registry-pinned certificate information. Unless I misunderstand (always possible), it sounds like the build/test machines have those registry entries, but maybe they haven't been updated with the new SHA-2 cert info; that would explain why this test is failing.
Ben, would you be the right person to check on those registry entries and help get them up to date?
Flags: needinfo?(bhearsum)

Comment 11

(In reply to Matt Howell [:mhowell] from comment #10)
> Oh, okay; thanks, Robert.
> So this is another issue with the registry-pinned certificate information.
> Unless I misunderstand (always possible), it sounds like the build/test
> machines have those registry entries, but maybe they haven't been updated
> with the new SHA-2 cert info; that would explain why this test is failing.
> Ben, would you be the right person to check on those registry entries and
> help get them up to date?

So...we installed the new fake root in bug 1203990, but I'm not sure we ever updated the maintenance service registry entries. I have a vague memory of someone saying that they should be able to update themselves after the initial version is installed. If we need some extra registry updates, RelOps can probably do that.
Flags: needinfo?(bhearsum)
(Assignee)

Comment 12

a year ago
(In reply to Ben Hearsum (:bhearsum) from comment #11)

Thanks, Ben. Do you know who in RelOps I should pester? I don't know anybody over there.

I'm still learning how all the build/test infrastructure works. I think I have a better idea what's going on now, which as far as I can tell is this:
Bug 1203990 replaced the fallback certificate that's used to sign test builds with a shiny new SHA-256 one. The old fallback certificate had its subject name set to "Mozilla Fake SPC", and the registry entries on the test machines are set to that (I guess). But the new one has "Mozilla Fake CA" as the subject, and the registry entries weren't changed. And that's why the test is failing, because the name on the signing certificate doesn't match the name in the registry. As far as I can see, that's what's happening.

In addition to updating the test machines directly, we'll also want to update the maintenance service installer code to create the modified fallback key, since it's still using the old name as well.
Assignee: nobody → mhowell
Flags: needinfo?(bhearsum)

Comment 13

The new self created certificates should have the same values as the old ones so updating the names isn't necessary. Another reason that this is optimal is that the build systems are the same across all branches and don't always reboot between runs so they can't just be updated on boot.

Comment 14

(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #13)
> The new self created certificates should have the same values as the old
> ones so updating the names isn't necessary. Another reason that this is
> optimal is that the build systems are the same across all branches and don't
> always reboot between runs so they can't just be updated on boot.

Yeah, the root and cert subjects shouldn't have changed...

Matt, maybe a sloan would be the best thing here, so you can poke directly at a build machine on your own? You can request one through https://wiki.mozilla.org/ReleaseEngineering/How_To/Request_a_slave
Flags: needinfo?(bhearsum) → needinfo?(mhowell)
(Assignee)

Comment 15

a year ago
(In reply to Ben Hearsum (:bhearsum) from comment #14)
> Matt, maybe a sloan would be the best thing here, so you can poke directly
> at a build machine on your own? You can request one through
> https://wiki.mozilla.org/ReleaseEngineering/How_To/Request_a_slave

Oh, awesome, that's exactly what I need; I'll put in such a request and reference this bug.
Flags: needinfo?(mhowell)
(Assignee)

Comment 16

a year ago
You guys were right and I was wrong about the subject name; it really hasn't changed. I've had a look at a build slave [1], and I found something different. The test builds that are failing aren't being signed using the fake certificate, they're being signed using the "real" production DigiCert certificate. The slaves don't have that one in their registry, they only have entries for the old SHA-1 DigiCert and for the fake CA.
This isn't affecting all builds; here [2] is a recent push that failed because the build is signed with the DigiCert CA, but here [3] is one that passed because that build is signed with our fake CA. I'm not sure what the relevant difference between those two is.
So at this point, I don't know what fix to recommend, because I don't know whether using the different certificates at different times is intentional or not. If it is intentional, then the registry entries on the slaves do need to be updated (just not exactly how I thought before). But if the intent is for the same cert to always be used, then that's what needs to be fixed.

[1] Which I requested in bug 1241530, just for reference.
[2] https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=5644818538de
[3] https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=e7cfbfa5847c

Comment 17

41 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-central: 21
* mozilla-aurora: 19
* mozilla-inbound: 1

Platform breakdown:
* windowsxp: 14
* windows7-32: 14
* windows8-64: 13

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1232078&startday=2016-01-18&endday=2016-01-24&tree=all

Comment 18

(In reply to Matt Howell [:mhowell] from comment #16)
> This isn't affecting all builds; here [2] is a recent push that failed
> because the build is signed with the DigiCert CA, but here [3] is one that
> passed because that build is signed with our fake CA. I'm not sure what the
> relevant difference between those two is.

This is very likely to be that [2] is only a dep build, while [3] has dep and nightly builds (B and N on treeherder). Tests get run on Nightly builds too, so they care that we've used a real cert for end users. So we need to update the registry with the new information.
(Assignee)

Comment 19

a year ago
(In reply to Nick Thomas [:nthomas] from comment #18)
> This is very likely to be that [2] is only a dep build, while [3] has dep
> and nightly builds (B and N on treeherder). Tests get run on Nightly builds
> too, so they care that we've used a real cert for end users. So we need to
> update the registry with the new information.

Okay, cool. Who would be the person to contact to make those updates happen?
Flags: needinfo?(nthomas)

Comment 20

Passing the ni to bhearsum, who set this up initially.
Flags: needinfo?(nthomas) → needinfo?(bhearsum)

Comment 21

(In reply to Matt Howell [:mhowell] from comment #19)
> (In reply to Nick Thomas [:nthomas] from comment #18)
> > This is very likely to be that [2] is only a dep build, while [3] has dep
> > and nightly builds (B and N on treeherder). Tests get run on Nightly builds
> > too, so they care that we've used a real cert for end users. So we need to
> > update the registry with the new information.
> 
> Okay, cool. Who would be the person to contact to make those updates happen?

I think RelOps typically takes care of that at this point. Looking back at https://bug704578.bmoattachments.org/attachment.cgi?id=577617, it looks like we need to insert something like this:
[HKEY_LOCAL_MACHINE\SOFTWARE\Mozilla\MaintenanceService\3932ecacee736d366d6436db0f55bce4]

[HKEY_LOCAL_MACHINE\SOFTWARE\Mozilla\MaintenanceService\3932ecacee736d366d6436db0f55bce4\2]
"name"="Mozilla Corporation"
"issuer"="DigiCert SHA2 Assured ID Code Signing CA"
"programName"=""
"publisherLink"=""

If you can confirm that, I'll get the right stuff on file to make the change happen.
Flags: needinfo?(bhearsum) → needinfo?(mhowell)
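
The effect of the proposed entry can be sketched in Python. This is only an illustration of the check as it is described in this bug, not the maintenance service's actual code: the registry is modeled as a plain dictionary, `AllowedCert` and `is_signer_allowed` are hypothetical names, and the contents of subkeys 0 and 1 are placeholders because their real values are not quoted in this bug. Only the `name` and `issuer` values for subkey 2 come from the comment above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AllowedCert:
    """One numbered subkey under the MaintenanceService install-path key."""
    name: str
    issuer: str


def is_signer_allowed(name, issuer, allowed):
    """Return True if some entry matches both the signer name and the issuer."""
    return any(e.name == name and e.issuer == issuer for e in allowed.values())


# Placeholder entries standing in for what the slaves already have
# (old SHA-1 DigiCert and the fake CA); the real values are not shown in this bug.
registry = {
    "0": AllowedCert("<old SHA-1 cert name>", "<old SHA-1 issuer>"),
    "1": AllowedCert("<fake cert name>", "<fake CA issuer>"),
}

# Signer of the failing nightly builds, taken from the proposed registry entry.
nightly = ("Mozilla Corporation", "DigiCert SHA2 Assured ID Code Signing CA")

# Without subkey 2, no entry matches the nightly signer.
before = is_signer_allowed(*nightly, registry)

# Add the proposed subkey 2 and check again.
registry["2"] = AllowedCert(*nightly)
after = is_signer_allowed(*nightly, registry)

print(before, after)
```

Under these assumptions the nightly signer is rejected until subkey 2 exists, which matches the failures seen only on builds signed with the production certificate.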
(Assignee)

Comment 22

a year ago
(In reply to Ben Hearsum (:bhearsum) from comment #21)
> (In reply to Matt Howell [:mhowell] from comment #19)
> > (In reply to Nick Thomas [:nthomas] from comment #18)
> > > This is very likely to be that [2] is only a dep build, while [3] has dep
> > > and nightly builds (B and N on treeherder). Tests get run on Nightly builds
> > > too, so they care that we've used a real cert for end users. So we need to
> > > update the registry with the new information.
> > 
> > Okay, cool. Who would be the person to contact to make those updates happen?
> 
> I think RelOps typically takes care of that at this point. Looking back at
> https://bug704578.bmoattachments.org/attachment.cgi?id=577617, it looks like
> we need to insert something like this:
> [HKEY_LOCAL_MACHINE\SOFTWARE\Mozilla\MaintenanceService\3932ecacee736d366d6436db0f55bce4]
> 
> [HKEY_LOCAL_MACHINE\SOFTWARE\Mozilla\MaintenanceService\3932ecacee736d366d6436db0f55bce4\2]
> "name"="Mozilla Corporation"
> "issuer"="DigiCert SHA2 Assured ID Code Signing CA"
> "programName"=""
> "publisherLink"=""
> 
> If you can confirm that, I'll get the right stuff on file to make the change
> happen.

Yes, that looks correct, adding that should take care of it. Thanks.
Flags: needinfo?(mhowell)
Depends on: 1243803

Comment 23

44 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-aurora: 23
* mozilla-central: 21

Platform breakdown:
* windowsxp: 15
* windows8-64: 15
* windows7-32: 14

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1232078&startday=2016-01-25&endday=2016-01-31&tree=all

Comment 24

40 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-aurora: 22
* mozilla-central: 18

Platform breakdown:
* windows8-64: 14
* windows7-32: 13
* windowsxp: 12
* linux64: 1

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1232078&startday=2016-02-01&endday=2016-02-07&tree=all

Comment 25

I think this is fixed now that bug 1243803 is. I had a look at one nightly test log and found:
05:01:41     INFO -  TEST-PASS | toolkit/mozapps/update/tests/unit_service_updater/checkUpdaterSigSvc.js | took 1797ms

(from http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-central-win32/1455015657/mozilla-central_win7-ix_test-xpcshell-bm126-tests1-windows-build1.txt.gz)

Can someone confirm?
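
For anyone wanting to repeat this check on another log, here is a minimal Python sketch that pulls the result lines for one test out of log text. The regular expression is an assumption based only on the single line quoted above (status, test path, and detail separated by ` | `), and `results_for` is a hypothetical helper, not part of any Mozilla tooling.

```python
import re

# Assumed shape of a result line: "<status> | <test path> | <detail>".
RESULT_LINE = re.compile(r"(TEST-[A-Z-]+) \| (\S+) \| (.*)")


def results_for(log_text, test_name):
    """Return (status, detail) pairs for every result line of the given test."""
    found = []
    for line in log_text.splitlines():
        match = RESULT_LINE.search(line)
        if match and match.group(2).endswith(test_name):
            found.append((match.group(1), match.group(3).strip()))
    return found


# The line quoted in the comment above.
sample = (
    "05:01:41     INFO -  TEST-PASS | "
    "toolkit/mozapps/update/tests/unit_service_updater/checkUpdaterSigSvc.js"
    " | took 1797ms"
)

print(results_for(sample, "checkUpdaterSigSvc.js"))
# prints [('TEST-PASS', 'took 1797ms')]
```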
(Reporter)

Comment 26

a year ago
Yep, is fixed, thanks!
Status: NEW → RESOLVED
Last Resolved: a year ago
Resolution: --- → FIXED

Comment 27

6 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-central: 3
* mozilla-aurora: 3

Platform breakdown:
* windowsxp: 2
* windows8-64: 2
* windows7-32: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1232078&startday=2016-02-08&endday=2016-02-14&tree=all