Open Bug 1973805 Opened 2 days ago Updated 2 days ago

URLPattern latent x86 shippable gtest crash

Categories

(Core :: Networking, defect, P2)

Tracking

()

People

(Reporter: edgul, Unassigned)

Details

(Whiteboard: [necko-triaged])

During development of Bug 1948330 (see also: bug 1731418 and bug 1948295), one of the WIP patches introduced a gtest that called the Rust-defined function urlp_get_protocol_component() directly through its cbindgen-generated C++ bindings. On x86 (32-bit) PGO/LTO builds, this call crashes when it attempts to clone q_pattern.protocol in the following Rust code:

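#[no_mangle]
pub unsafe extern "C" fn urlp_get_protocol_component(
    pattern: UrlpPattern,
    res: *mut UrlpComponent,
) {
    // Reinterpret the handle's raw pointer as the underlying Uq::UrlPattern.
    let q_pattern = &*(pattern.0 as *const Uq::UrlPattern);
    // The crash is observed here, while cloning the protocol component.
    let tmp: UrlpComponent = q_pattern.protocol.clone().into();
    // Hand the converted component back to the caller through the out-pointer.
    *res = tmp;
}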

The crash goes away when the test calls our C++-defined convenience getter UrlpGetProtocol instead of calling urlp_get_protocol_component() directly, even though the getter uses the same Rust function in its implementation. (scenario 1)

We also suspect that the crash disappears when the C++ getter UrlpGetProtocol is changed to take UrlpPattern by value instead of by reference, but we are having trouble reproducing the crash reliably. (scenario 2) Note that in this scenario the test continues to call the Rust-defined getter directly.
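To make the call shapes in the two scenarios easier to compare outside the tree, here is a minimal standalone sketch in Rust. It is not the actual Gecko code and is not expected to reproduce the crash on its own: Pattern, Handle, Component, new_handle, free_handle and the two wrapper functions are hypothetical stand-ins (for Uq::UrlPattern, UrlpPattern, UrlpComponent and the C++ getter UrlpGetProtocol), their layouts are assumptions, and #[no_mangle] is omitted because nothing here is called from C++.

```rust
use std::ffi::c_void;

// Hypothetical stand-in for the Rust-side pattern object (Uq::UrlPattern in the bug).
pub struct Pattern {
    pub protocol: String,
}

// Stand-in for UrlpPattern: an opaque handle wrapping a raw pointer (assumed layout).
#[repr(C)]
#[derive(Clone, Copy)]
pub struct Handle(pub *mut c_void);

// Stand-in for UrlpComponent; the real struct's fields are not shown in the bug.
#[repr(C)]
#[derive(Debug, PartialEq)]
pub struct Component {
    pub len: usize,
}

impl From<String> for Component {
    fn from(s: String) -> Self {
        Component { len: s.len() }
    }
}

// Allocate a Pattern on the heap and hide it behind an opaque handle.
pub fn new_handle(protocol: &str) -> Handle {
    let boxed = Box::new(Pattern { protocol: protocol.to_string() });
    Handle(Box::into_raw(boxed) as *mut c_void)
}

// Release a handle created by new_handle. Must be called at most once per handle.
pub unsafe fn free_handle(h: Handle) {
    unsafe { drop(Box::from_raw(h.0 as *mut Pattern)) };
}

// Same shape as urlp_get_protocol_component(): handle by value, result via out-pointer.
pub unsafe extern "C" fn get_protocol_component(pattern: Handle, res: *mut Component) {
    unsafe {
        let q_pattern = &*(pattern.0 as *const Pattern);
        let tmp: Component = q_pattern.protocol.clone().into();
        // Assignment drops the previous value at *res, so *res must be initialized.
        *res = tmp;
    }
}

// Wrapper taking the handle by reference (mirrors the by-reference getter).
pub unsafe fn get_protocol_by_ref(pattern: &Handle) -> Component {
    let mut out = Component { len: 0 };
    unsafe { get_protocol_component(*pattern, &mut out) };
    out
}

// Wrapper taking the handle by value (mirrors the scenario 2 change).
pub unsafe fn get_protocol_by_value(pattern: Handle) -> Component {
    let mut out = Component { len: 0 };
    unsafe { get_protocol_component(pattern, &mut out) };
    out
}
```

Calling get_protocol_component directly and through each wrapper on the same handle should give identical results; if a 32-bit PGO/LTO build of an equivalent test ever disagrees between the three paths, that would point at code generation rather than at the getter logic.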

So the current working hypothesis is that this is a compiler bug affecting 32-bit builds with PGO, LTO, or both.


First seen on mozilla-central: https://treeherder.mozilla.org/jobs?repo=mozilla-central&duplicate_jobs=visible&group_state=expanded&resultStatus=testfailed%2Cbusted%2Cexception%2Csuperseded%2Csuccess%2Cretry%2Cusercancel%2Crunning%2Cpending%2Crunnable&revision=bf77e4f0b323cd23ac5d935b5eac576f7986af23&selectedTaskRun=LTfqyWnLTL2q9ajvMcUgog.0

But also seen on try in followup builds (with Linux 24.04 x86 Shippable gtest-1proc manually added to the suite): https://treeherder.mozilla.org/jobs?repo=try&revision=4062e451a048b7d558c07df537e1b994b4e4531e&selectedTaskRun=J05_NjMxSHSNrhUogWKFdg.0


I've also created some additional test builds (currently still running) to try to narrow down whether PGO or LTO is responsible:

Severity: -- → S4
Priority: -- → P2
Whiteboard: [necko-triaged]