Create a Linux build job that compares two builds to detect reproducible regressions
Categories
(Firefox Build System :: General, enhancement)
Tracking
(Not tracked)
People
(Reporter: tjr, Assigned: glandium)
References
(Blocks 1 open bug)
Details
Attachments
(1 file, 1 obsolete file)
We recently made Linux builds reproducible (https://glandium.org/blog/?p=3923); it would be neat to detect regressions in reproducibility by comparing two builds. We could do a fuzzy comparison using a script that ignores expected, innocuous differences such as signatures and timestamps.
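A minimal sketch of such a fuzzy comparison, assuming both builds have already been unpacked into directories. It compares file contents by SHA-256, so filesystem timestamps are ignored, and skips paths matching an ignore list. IGNORED_PATTERNS and compare_trees are hypothetical placeholders, not the actual Firefox tooling; timestamps embedded inside file contents would still need a format-aware tool.

```python
import fnmatch
import hashlib
from pathlib import Path

# Hypothetical ignore list: glob patterns for files whose differences are
# expected (e.g. detached signatures). Adjust to the real artifact layout.
IGNORED_PATTERNS = ["*.sig", "*.asc"]


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def is_ignored(rel: str) -> bool:
    # fnmatch's "*" also matches "/", so "*.sig" matches files in subdirectories.
    return any(fnmatch.fnmatch(rel, pat) for pat in IGNORED_PATTERNS)


def compare_trees(a: Path, b: Path) -> list[str]:
    """Return sorted relative paths that are missing on one side or differ in content."""
    def listing(root: Path) -> set[str]:
        return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}

    files_a, files_b = listing(a), listing(b)
    diffs = []
    for rel in sorted(files_a | files_b):
        if is_ignored(rel):
            continue
        if rel not in files_a or rel not in files_b:
            diffs.append(rel)
        elif file_digest(a / rel) != file_digest(b / rel):
            diffs.append(rel)
    return diffs
```

An empty result means the two builds match apart from the ignored paths.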
Assignee
Comment 1•6 years ago
Duplicate the linux64-shippable/opt task, and edit taskcluster/ci/diffoscope/kind.yml to add a comparison job.
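The comparison job essentially runs diffoscope on the two build artifacts. A rough Python sketch of a wrapper, assuming diffoscope is on PATH: --html and --exclude are real diffoscope options, and diffoscope exits with 0 when the inputs are identical, 1 when they differ, and 2 on error. The function names and the default report path are hypothetical and do not reflect how the kind.yml task actually invokes it.

```python
import subprocess
from typing import Sequence


def diffoscope_command(build_a: str, build_b: str,
                       html_report: str = "diff.html",
                       exclude: Sequence[str] = ()) -> list[str]:
    """Build a diffoscope command line comparing two build artifacts."""
    cmd = ["diffoscope", "--html", html_report]
    for pattern in exclude:
        # Skip files whose names match the glob (e.g. known-nondeterministic libs).
        cmd += ["--exclude", pattern]
    cmd += [build_a, build_b]
    return cmd


def builds_are_identical(build_a: str, build_b: str, **kwargs) -> bool:
    """Run diffoscope: True if identical, False if different, raise on error."""
    result = subprocess.run(diffoscope_command(build_a, build_b, **kwargs))
    if result.returncode == 0:
        return True
    if result.returncode == 1:
        return False
    raise RuntimeError(f"diffoscope failed with exit code {result.returncode}")
```

Treating exit code 1 as a test failure (rather than an infrastructure error) keeps real reproducibility regressions distinguishable from broken tooling.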
Assignee
Comment 2•6 years ago
Assignee
Comment 3•6 years ago
The patch is trivial. The problem is it's not landable: reproducibility is already broken.
Comment 4•5 years ago
There's an r+ patch which didn't land and no activity in this bug for 2 weeks.
:glandium, could you have a look please?
For more information, please visit auto_nag documentation.
Assignee
Comment 5•5 years ago
Assignee
Updated•5 years ago
Comment 7•5 years ago
bugherder
Comment 10•5 years ago
The leave-open keyword is there and there is no activity for 6 months.
:rstewart, maybe it's time to close this bug?
Comment 11•5 years ago
Is there progress being made on the dependencies? Wondering if we should close as WONTFIX or just leave this open indefinitely.
Reporter
Comment 12•5 years ago
Yes, we are still pursuing this.
Comment 13•4 years ago
The leave-open keyword is there and there is no activity for 6 months.
:mhentges, maybe it's time to close this bug?
Comment 14•4 years ago
Based on the previous response by :tjr, this is still valuable.
Comment 15•4 years ago
Hm, I thought we had this already for ages. Why the leave-open? Is that to cover the fact that we only do 32-bit and still want 64-bit?
Assignee
Comment 16•4 years ago
yes
Comment 17•4 years ago
It looks like the blocker for 64-bit builds is wasm sandboxing (at least with clang 12, which has been really good wrt intermittents on Linux32). The lucet_trap_table symbols appear in nondeterministic order: https://firefoxci.taskcluster-artifacts.net/NnHtHnWFT4e3XWBD9pMgqQ/0/public/diff.html
Can I interest you in digging into that? Or maybe as baby steps we can zap those libraries with pre-diff-commands?
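To tell "same symbols, different order" apart from a real content change, one could compare the symbol names from both builds both as sequences and as sorted lists. A minimal sketch, assuming nm-style text output (address, type, name per line) was captured for the same library from each build with nm -p, which keeps symbol-table order (plain nm sorts by name and would hide the reordering). The helper names are hypothetical; only the lucet_trap_table prefix comes from the diff above.

```python
def trap_table_symbols(nm_output: str, prefix: str = "lucet_trap_table") -> list[str]:
    """Extract symbol names starting with prefix, preserving their order."""
    names = []
    for line in nm_output.splitlines():
        fields = line.split()
        # nm lines end with the symbol name; undefined symbols have no address.
        if fields and fields[-1].startswith(prefix):
            names.append(fields[-1])
    return names


def classify(a: list[str], b: list[str]) -> str:
    """Return 'identical', 'reordered' (same names, different order) or 'different'."""
    if a == b:
        return "identical"
    if sorted(a) == sorted(b):
        return "reordered"
    return "different"
```

A "reordered" result points at ordering nondeterminism rather than at different code being generated.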
Reporter
Comment 18•4 years ago
Tagging the rlbox folks in case they think the lucet thing is easy or could give a pointer.
Comment 19•4 years ago
(In reply to Tom Ritter [:tjr] (ni? for response to sec-[advisories/bounties/ratings/cves]) from comment #18)
Tagging the rlbox folks in case they think the lucet thing is easy or could give a pointer.
I suppose https://bugzilla.mozilla.org/show_bug.cgi?id=1612035 is the one for this particular issue, no?
Assignee
Comment 20•4 years ago
(In reply to :dmajor from comment #17)
It looks like the blocker for 64-bit builds is wasm sandboxing (at least with clang 12, which has been really good wrt intermittents on Linux32). The lucet_trap_table symbols appear in nondeterministic order: https://firefoxci.taskcluster-artifacts.net/NnHtHnWFT4e3XWBD9pMgqQ/0/public/diff.html
Can I interest you in digging into that? Or maybe as baby steps we can zap those libraries with pre-diff-commands?
The blocker(s) are in the bug dependencies and are not related to wasm sandboxing (although that's also another problem). They were also 64-bit only. They may or may not have been fixed in clang 11 or 12, but that would be something to test.
Comment 21•4 years ago
(In reply to Mike Hommey [:glandium] from comment #20)
The blocker(s) are in the bug dependencies and are not related to wasm sandboxing (although that's also another problem). They also were 64-bits only. They may or may not have been fixed in clang 11 or 12, but that would be something to test.
Bug 1596350 comment 2 says it's 32-bit only.
I don't see an explicit mention of bitness in bug 1596283, but it's not reproducing with clang 11 for me.
Let's start running tasks for 64-bit builds, minus lib*wasm.so. Spinoff bug incoming.
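A minimal sketch of that kind of pre-diff step, assuming both builds are unpacked into directories before comparison: it deletes every file matching lib*wasm.so (the pattern from the comment above) so the remaining files can still be checked. The function name is hypothetical, and this is not the actual pre-diff-commands implementation.

```python
from pathlib import Path

# Pattern taken from the comment above; the default is the only non-hypothetical part.
EXCLUDED_GLOB = "lib*wasm.so"


def zap_excluded(root: Path, pattern: str = EXCLUDED_GLOB) -> list[str]:
    """Delete files matching pattern anywhere under root; return their relative paths."""
    removed = []
    # sorted() materializes the rglob results before anything is deleted.
    for path in sorted(root.rglob(pattern)):
        if path.is_file():
            path.unlink()
            removed.append(path.relative_to(root).as_posix())
    return removed
```

Running it on both unpacked trees before diffing keeps the rest of the build covered while the sandboxed libraries remain nondeterministic.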
Comment 22•4 years ago
Re lucet reproducible builds: unfortunately, I don't have much info on this. I know that the C->Wasm compilation via clang is reproducible, so the problem is definitely inside lucet. FWIW, I believe lucet's codegen is first-party code, but the object-file output is based on third-party Rust components. The only pointer I have is that the first step is probably to figure out whether this is due to first-party Lucet code or to the third-party object-file output.
Updated•4 years ago
Comment 23•4 years ago
The main (only?) difference I see in this diff is that the lucet_trap_table_guest_func_... symbols are not in consistent order. Those symbols appear to be written out by write_trap_tables, but it's not clear to me where the problem originates: is write_trap_tables at fault for not sorting right there and then, or should some earlier piece of the pipeline be on the hook for feeding in the data in a stable order? This is getting outside my familiarity; Shravan, any chance you might have any ideas here?
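A minimal sketch of the "sort right there and then" option, assuming the writer receives (symbol name, offset) pairs in whatever order an upstream map happens to yield them: sorting before serializing makes the output independent of that order. This is hypothetical Python for illustration only; it is not lucet's write_trap_tables or its actual output format.

```python
def serialize_trap_tables(entries: dict[str, int]) -> bytes:
    """Emit one 'name offset' line per entry, sorted by name for determinism."""
    # Sorting removes any dependence on the order the entries were produced in.
    lines = [f"{name} {offset:#x}\n" for name, offset in sorted(entries.items())]
    return "".join(lines).encode("ascii")
```

The sort here is lexicographic (f10 sorts before f2); any total order works, as long as it does not depend on iteration order. Fixing the order upstream instead would make every consumer deterministic, not just this writer.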
Comment 24•4 years ago
We will be addressing this soon with https://bugzilla.mozilla.org/show_bug.cgi?id=1720828
Assignee
Comment 25•4 years ago
We can actually close this bug, because both the linux and linux64 jobs exist and are tier 2 right now.
Updated•4 years ago
Updated•4 years ago
Updated•4 years ago