milestone_winversion should be unique
Categories
(Firefox Build System :: General, defect, P3)
Tracking
(firefox128 fixed)
Tracking | Status | |
---|---|---|
firefox128 | --- | fixed |
People
(Reporter: yannis, Assigned: ahochheiden)
References
(Regressed 1 open bug)
Details
Attachments
(1 file)
In bug 1843977, we had a massive crash spike caused by a third-party antivirus. The antivirus was wrongly applying instrumentation to our 115 ESR xul.dll
as if it were our 115 Release xul.dll
. It was doing this because the two binaries that we shipped embedded the exact same version number - which the antivirus assumed would be a unique way to identify our 115 Release xul.dll
. While we could argue that the antivirus should not have made this assumption, it would still be better if we did ship binaries with a unique version number to avoid this kind of trouble in the future.
As figured out by [:RyanVM], our current code for generating version information for our binaries is as follows, and the logic for that was added in bug 386740:
milestone_winversion = ",".join(
split_and_normalize_version(milestone, 3)
+ [str(days_from_2000_to_buildid(buildid))]
)
Because we shipped the same 115 and 115 ESR minor versions on the same day with the same milestone
number, the corresponding binaries ended up with the exact same value for milestone_winversion
there.
As discussed with [:RyanVM] and [:rkraesig], we should at least guarantee that building with the same milestone
but for a different channel leads to a different version number in the end - independently of the day and hour of build. We probably also want more granularity than days. We should also check if we have the same issue on other platforms.
Comment 1•5 months ago
|
||
The severity field is not set for this bug.
:ahochheiden, could you have a look please?
For more information, please visit BugBot documentation.
Assignee | ||
Comment 2•5 months ago
•
|
||
(In reply to Yannis Juglaret [:yannis] from comment #0)
but for a different channel leads to a different version number in the end - independently of the day and hour of build. We probably also want more granularity than days. We should also check if we have the same issue on other platforms.
So you just want days_from_2000_to_buildid(buildid): to be seconds instead?
eg:
def seconds_from_2000_to_buildid(buildid):
start = datetime(2000, 1, 1, 0, 0, 0)
buildid_time = datetime.strptime(buildid, "%Y%m%d%H%M%S")
time_delta = (buildid_time - start)
return time_delta.total_seconds()
Assignee | ||
Comment 3•5 months ago
•
|
||
I dug into this with Glandium and we figured out why it's using 'days since 2000' in the first place (and why seconds won't work nicely, but can sort of still work).
https://learn.microsoft.com/en-us/windows/win32/menurc/versioninfo-resource
The 'version' that we care about is FILEVERSION
:
FILEVERSION version Binary version number for the file. The version consists of two 32-bit integers, defined by four 16-bit integers. For example, > "FILEVERSION 3,10,0,61" is translated into two doublewords: 0x0003000a and 0x0000003d, in that order. Therefore, if version is defined by the > DWORD values dw1 and dw2, they need to appear in the FILEVERSION statement as follows: HIWORD(dw1), LOWORD(dw1), HIWORD(dw2), > LOWORD(dw2).
Basically we have 4 "words" and currently we use the first 3 "words" for major
minor
patch
semantic versioning, then the 4th word for the days since 2000
part of the FILEVERSION
. Since they're "words", they're only 16 bits, which can have a maximum value of 65535 (which is why we're using days, since that's currently ~8700). Seconds will be too large to fit in that range (this would have been a helpful comment).
One way we can 'solve' this is by encoding the major
minor
and patch
versions into only the first two words, and using the last two words for the seconds
(that'll give us until the year ~2136 before we overflow). It'll kinda be ugly to read though, but it will work.
Example:
Current FILEVERSION
123.2.13.8780
(arbitrary example)
would break down as follows:
1st word: major
(123)
2nd word: minor * 100 + patch
: 2 * 100 + 13
(213)
seconds_since_2000
: 758660440
3rd word (Upper 16 bits of seconds_since_2000
): 11576
4th word (Lower 16 bits of seconds_since_2000
): 15704
New FILEVERSION
: 123.213.11576.15704
One caveat here being that this still doesn't fully resolve the issue, since if different channels build at the same second, the original error will still occur (but it's extremely unlikely).
We could use the same scheme to encode a number representing the channel in the decimal one's place of the 2nd "word" and and shift the other two by 10 each (eg: minor * 1000 + patch * 10 + encoded_channel_digit
: 2 * 1000 + 13 * 10 + 1
). Yielding 123.2131.11576.15704
.
Note: There are limitations for both approaches. The second (also encoding a channel digit
) is more restrictive and assumes the following:
- There will not be more than 10 channels (0-9)
- There will not be more than 100
patch
versions (0-99) - There will not be more than 60
minor
versions (0-59)
If you omit the channel digit
, the patch
and minor
restrictions could be:
- There will not be more than 1000
patch
versions (0-999) - There will not be more than 65
minor
versions (0-64)
That's assuming you want to keep the minor
and patch
numbers somewhat human readable. We could also use 2/16 bits for the channel encoding ((0-3 range) 4 values for 4 channels), and 7 bits each (0-127 range) for patch
and minor
, but that would in turn would just look like some nonsense. To be fair, the days_since_2000
as the 4th word currently can also be classified as nonsense.
We could also include the 1st "word" in the encoding scheme, but then we open up the possibility of collisions with older versions. I'm assuming here that the antivirus keeps a list of all past versions, so if we allow the 1st "word" to potentially be a number we've already used (eg: 120) it's possible to collide with a version sequence used previously (though, very unlikely, since by my estimate we've had in the range of 10s of thousands of releases, so let's say upper bound 100,000. 100000/2^64 is pretty low odds).
Reporter | ||
Comment 4•5 months ago
•
|
||
(In reply to Alex Hochheiden [:ahochheiden] (needinfo? me) from comment #2)
So you just want days_from_2000_to_buildid(buildid): to be seconds instead?
The current problem we identified with using days is that we ship two nightlies per day, so even if we split over release channels, using days would still lead to duplicate version IDs. I don't think we necessarily need seconds granularity though. The granularity should be decided by release managers.
Comment 5•5 months ago
|
||
I honestly don't have a super strong opinion on what to do here. This isn't a version we use anywhere in our tooling. I am a bit worried about fallout/unintended consequences from the proposal in comment 3 to change the second and third digit representations as well, but only because I don't know what might depend on that in the wild.
Comment 6•5 months ago
|
||
Possibly-more-straightforward suggestion for the encoding: since a separate timestamp is only useful on the Nightly channel(?), carve out a niche for the channel-identifier by means of addition rather than multiplication.
0
: release.1
: beta.2
: esr.3+n
: nightly, with some arbitrary countern
.
This is mostly compatible with our existing versioning scheme, puts no more constraints on major/minor/patch version-numbers than already exist, and has an obvious course of action if we want to reinstitute the Aurora channel (i.e., make aurora 3
(or whatever), and nightly 4+n
).
I have no opinion on whether the counter granularity should be increased to ensure uniqueness of Nightly builds' FILEVERSION
s — but if it is, and if it remains clock-based, I'd suggest counting from the time of the last major-version-number bump rather than from a fixed date.
Comment 7•5 months ago
|
||
Encoding hours would allow up to 7 years in 16-bits, which is plenty enough if we go with last major-version-number bump as a reference point. The downside, obviously, is that we store that information nowhere in the tree currently.
carve out a niche for the channel-identifier by means of addition rather than multiplication.
Addition to which number? If it's any of the "normal" numbers in the version, the result wouldn't allow to distinguish between that version and an older version on another channel.
Comment 8•5 months ago
|
||
We could also encode hours since beginning of the milestone year, which requires no additional information, would still allow to distinguish two nightlies from the same day, and leaves us 2 bits where we could encode the channel.
Comment 9•5 months ago
|
||
(In reply to Mike Hommey [:glandium] from comment #7)
Addition to which number? If it's any of the "normal" numbers in the version, the result wouldn't allow to distinguish between that version and an older version on another channel.
Not quite. I'm not suggesting "add 1 to the value to represent "beta"', I'm suggesting "add 3 to the value to carve out some unused values solely for the channel-identifier".
(So if you used it for any of the first three numbers, you wouldn't be able to distinguish between that version and any other version on the same channel.)
Assignee | ||
Comment 10•4 months ago
•
|
||
I do like the hours since current milestone year idea (we could even make it a more granular chunk: This wouldn't work when going back to much older revisions and building, disregard).value
= seconds since start of milestone year
/ 2^bits available
One thought, doesn't a beta channel release candidate become the full release? Should we even encode the channel at all with that in mind?
Comment 11•4 months ago
|
||
RC builds get shipped to the Beta channel, yes. But they're built off mozilla-release if that matters.
Assignee | ||
Comment 12•4 months ago
|
||
The previous implementation used days since Jan 1 2000 for the last
16-bit segment. This was not unique enough and caused issues with
Antivirus software if two different channels were built on the same day.
The new approach uses hours since the last milestone bump and uses the
VCS to determine how long ago that was relative to the build time. This
means it will always reset when a new cycle begins, but still be unique
since the digits in the first 3 segments have incremented.
We also now use two of the 16-bits to encode the channel (nightly, beta,
ESR, and release). So two channels built within the same hour will still
be unique.
Using only 14-bits to store the 'hours since version bump', we have
about ~682 days from a version bump before we reach the maximum value we
can store. If a build is done after that point, the segment value will
always be the maximum value for that channel.
Updated•4 months ago
|
Comment 13•4 months ago
|
||
Pushed by ahochheiden@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/344524bd7c75 Change how the last segment of `milestone_winversion` is generated to improve uniqueness r=firefox-build-system-reviewers,RyanVM,glandium
Comment 14•4 months ago
|
||
bugherder |
Comment 15•4 months ago
|
||
Backed out for causing bug 1879519
https://hg.mozilla.org/integration/autoland/rev/530b7d00997c09c7335cd57c741664680cf13b46
Comment 16•4 months ago
|
||
Backout merged to central: https://hg.mozilla.org/mozilla-central/rev/530b7d00997c09c7335cd57c741664680cf13b46
Comment 17•3 months ago
|
||
A Try push as a central-as-beta simulation with this patch applied succeeds - unknown why it passes now.
Alex, feel encouraged to reland as-is.
Comment 18•3 months ago
|
||
The problem still exists. It comes from the timestamp of milestone.txt ending up being "newer" than the buildid, making the delta between them negative. And the reason this happens is that the timezone is not taken into account. In the build from bug 1879519, the full timestamp from milestone.txt is 2024-02-09 12:51:24 +0200
. The buildid is 20240209105204, which in a more readable form is 2024-02-09 10:52:04
. The buildid itself is UTC, so translated in the same timezone as the milestone.txt timestamp, it's actually 2024-02-09 12:52:04 +0200
, which is properly greater.
I'd say the timezone should be accounted for, but also, we should make sure the value can't be negative.
Assignee | ||
Comment 19•3 months ago
•
|
||
The timezone should be accounted for in the current implementation. The value can also no longer be negative (It will raise an exception in CI if it is, and will set to MAX_VALUE outside of CI).
Comment 20•2 months ago
|
||
There is an r+ patch which didn't land and no activity in this bug for 2 weeks.
:ahochheiden, could you have a look please?
If you still have some work to do, you can add an action "Plan Changes" in Phabricator.
For more information, please visit BugBot documentation.
Updated•2 months ago
|
Comment 21•1 month ago
|
||
Just a friendly reminder that 128 will be the next ESR release and we're ~3 weeks from the start of that Nightly cycle.
Comment 22•1 month ago
|
||
Pushed by ahochheiden@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b685d0db4248 Change how the last segment of `milestone_winversion` is generated to improve uniqueness r=firefox-build-system-reviewers,glandium
Comment 23•1 month ago
|
||
Backed out for causing py3 failures in test_commit.py
- Backout link
- Push with failures
- Failure Log
- Failure line: python/mozversioncontrol/test/test_commit.py::test_commit[git] TEST-UNEXPECTED-FAIL
Comment 24•19 days ago
|
||
Pushed by ahochheiden@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9cc387be4b6c Change how the last segment of `milestone_winversion` is generated to improve uniqueness r=firefox-build-system-reviewers,glandium
Comment 25•18 days ago
|
||
bugherder |
Updated•18 days ago
|
Description
•