Closed Bug 1577212 Opened 2 years ago Closed 3 months ago

Create a Linux build job that compares two builds to detect reproducible regressions

Categories

(Firefox Build System :: General, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: tjr, Assigned: glandium)

References

(Blocks 1 open bug)

Details

Attachments

(1 file, 1 obsolete file)

We recently made Linux builds reproducible (https://glandium.org/blog/?p=3923). It would be neat to detect regressions by building twice and comparing the two builds. We could do a fuzzy comparison using a script that ignores expected, innocuous differences like signatures and timestamps.
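
A minimal sketch of what such a fuzzy comparison could look like, assuming both builds are unpacked into directories. The function names and the ignore patterns (`*.sig`, `*.asc`) are hypothetical placeholders, not the names used in the tree. Hashing file contents means filesystem timestamps never enter the comparison.

```python
import hashlib
from fnmatch import fnmatch
from pathlib import Path

# Hypothetical patterns for files that are expected to differ between builds.
IGNORED = ("*.sig", "*.asc")


def file_digests(root, ignored=IGNORED):
    """Map each relative file path under root to the SHA-256 of its contents."""
    root = Path(root)
    digests = {}
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        # Skip files whose name matches an ignored pattern (e.g. signatures).
        if any(fnmatch(path.name, pattern) for pattern in ignored):
            continue
        rel = path.relative_to(root).as_posix()
        digests[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return digests


def compare_builds(build_a, build_b, ignored=IGNORED):
    """Return sorted relative paths that differ or exist on only one side."""
    a = file_digests(build_a, ignored)
    b = file_digests(build_b, ignored)
    return sorted(p for p in a.keys() | b.keys() if a.get(p) != b.get(p))
```

An empty result from `compare_builds` would mean the two builds match outside the ignored files. A real job would more likely hand the two trees to diffoscope to get a readable report of what differs.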

Duplicate the linux64-shippable/opt task, and edit taskcluster/ci/diffoscope/kind.yml to add a comparison job.

The patch is trivial. The problem is it's not landable: reproducibility is already broken.

There's an r+ patch which didn't land and no activity in this bug for 2 weeks.
:glandium, could you have a look please?
For more information, please visit auto_nag documentation.

Flags: needinfo?(mh+mozilla)
Depends on: 1596283
Depends on: 1596341
Depends on: 1596350
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/358e2c496639
Add diffoscope jobs ensuring reproducibility of the linux32 shippable builds. r=mshal
Flags: needinfo?(mh+mozilla)
Keywords: leave-open

Why is this on linux32 and not linux64?

Flags: needinfo?(mh+mozilla)

See the dependencies on this bug.

Flags: needinfo?(mh+mozilla)

The leave-open keyword is there and there is no activity for 6 months.
:rstewart, maybe it's time to close this bug?

Flags: needinfo?(rstewart)

Is there progress being made on the dependencies? Wondering if we should close as WONTFIX or just leave this open indefinitely.

Flags: needinfo?(rstewart)

Yes, we are still pursuing this.

The leave-open keyword is there and there is no activity for 6 months.
:mhentges, maybe it's time to close this bug?

Flags: needinfo?(mhentges)

Based on the previous response by :tjr, this is still valuable.

Flags: needinfo?(mhentges)

Hm, I thought we had this already for ages. Why the leave-open? Is that to cover the fact that we only do 32-bit and still want 64-bit?

Yes.

It looks like the blocker for 64-bit builds is wasm sandboxing (at least with clang 12, which has been really good wrt intermittents on Linux32). The lucet_trap_table symbols appear in nondeterministic order: https://firefoxci.taskcluster-artifacts.net/NnHtHnWFT4e3XWBD9pMgqQ/0/public/diff.html

Can I interest you in digging into that? Or maybe as baby steps we can zap those libraries with pre-diff-commands?

Flags: needinfo?(mh+mozilla)

Tagging the rlbox folks in case they think the lucet thing is easy or could give a pointer.

Flags: needinfo?(shravanrn)
Flags: needinfo?(deian)

(In reply to Tom Ritter [:tjr] (ni? for response to sec-[advisories/bounties/ratings/cves]) from comment #18)

> Tagging the rlbox folks in case they think the lucet thing is easy or could give a pointer.

I suppose https://bugzilla.mozilla.org/show_bug.cgi?id=1612035 is the one for this particular issue, no?

(In reply to :dmajor from comment #17)

> It looks like the blocker for 64-bit builds is wasm sandboxing (at least with clang 12, which has been really good wrt intermittents on Linux32). The lucet_trap_table symbols appear in nondeterministic order: https://firefoxci.taskcluster-artifacts.net/NnHtHnWFT4e3XWBD9pMgqQ/0/public/diff.html
>
> Can I interest you in digging into that? Or maybe as baby steps we can zap those libraries with pre-diff-commands?

The blocker(s) are in the bug dependencies and are not related to wasm sandboxing (although that's another problem). They were also 64-bit only. They may or may not have been fixed in clang 11 or 12, but that would be something to test.

Flags: needinfo?(mh+mozilla)

(In reply to Mike Hommey [:glandium] from comment #20)

> The blocker(s) are in the bug dependencies and are not related to wasm sandboxing (although that's another problem). They were also 64-bit only. They may or may not have been fixed in clang 11 or 12, but that would be something to test.

Bug 1596350 comment 2 says it's 32-bit only.
I don't see an explicit mention of bitness in bug 1596283, but it's not reproducing with clang 11 for me.

Let's start running tasks for 64-bit builds, minus lib*wasm.so. Spinoff bug incoming.
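
As a rough illustration of the "minus lib*wasm.so" step, here is a sketch that strips matching libraries from an unpacked build tree before it is compared. The function name is hypothetical, and this is not a description of how the pre-diff-commands mechanism is actually configured.

```python
from fnmatch import fnmatch
from pathlib import Path


def remove_matching(root, pattern="lib*wasm.so"):
    """Delete files under root whose name matches pattern; return their relative paths."""
    root = Path(root)
    removed = []
    # Materialize the listing first so deleting files does not disturb the walk.
    for path in sorted(root.rglob("*")):
        if path.is_file() and fnmatch(path.name, pattern):
            removed.append(path.relative_to(root).as_posix())
            path.unlink()
    return removed
```

Running this on both trees before the comparison keeps known-nondeterministic libraries out of the diff. The cost is that regressions inside those libraries go unnoticed.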

Depends on: 1686507
Depends on: 1686510

Re: Lucet reproducible builds: unfortunately, I don't have much info on this. I know that the C->Wasm compilation via clang is reproducible, so the problem is definitely inside Lucet. FWIW, I believe Lucet's codegen is first-party code, but the object file output is based on third-party Rust components. I think the first step is probably to figure out whether this is due to first-party Lucet code or the third-party object file output.

Flags: needinfo?(shravanrn)
Flags: needinfo?(deian)

The main (only?) difference I see in this diff is that the lucet_trap_table_guest_func_... symbols are not in a consistent order. Those symbols appear to be written out by write_trap_tables, but it's not clear to me where the problem originates: is write_trap_tables at fault for not sorting right there and then, or should some earlier piece of the pipeline be on the hook for feeding in the data in a stable order? This is getting outside my familiarity. Shravan, any chance you might have any ideas here?
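
To illustrate the "sort right there and then" option, here is a generic sketch in Python. It is not Lucet's code: `emit_trap_tables` and the symbol names are hypothetical, and the actual write_trap_tables is Rust. The point is only that sorting by symbol name at the point of emission makes the output independent of whatever order the earlier stages produced.

```python
def emit_trap_tables(trap_tables):
    """Return (symbol, data) pairs ordered by symbol name.

    trap_tables is a mapping from symbol name to table data. Sorting the keys
    here makes the emitted order independent of the mapping's iteration order.
    """
    return [(name, trap_tables[name]) for name in sorted(trap_tables)]


# Two mappings with the same contents, built in different orders.
first = {"func_b": b"\x02", "func_a": b"\x01"}
second = {"func_a": b"\x01", "func_b": b"\x02"}
assert emit_trap_tables(first) == emit_trap_tables(second)
```

The alternative is to have the earlier stages use an ordered container so the data already arrives in a stable order, which avoids a sort at every emission point.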

Flags: needinfo?(shravanrn)

We will be addressing this soon with https://bugzilla.mozilla.org/show_bug.cgi?id=1720828

Flags: needinfo?(shravanrn)

We can actually close this bug, because both the linux and linux64 jobs exist and are tier 2 right now.

Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Assignee: nobody → mh+mozilla
Attachment #9088937 - Attachment is obsolete: true