Closed Bug 1789399 Opened 2 years ago Closed 1 year ago

Making CheckedUnsafePtr be much easier debugging

Categories

(Core :: DOM: Service Workers, enhancement, P2)

enhancement

Tracking

()

RESOLVED FIXED
112 Branch
Tracking Status
firefox112 --- fixed

People

(Reporter: edenchuang, Assigned: edenchuang)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

CheckedUnsafePtr is used to detect the UAF of raw pointers.
Once a potential UAF is detected by CheckedUnsafePtr, MOZ_CRASH would be called with a message "Found dangling CheckedUnsafePtr." However, the message is not enough to help to debug the dangling issues.

To figure out the dangling pointer root cause, we would like to enhance CheckedUnsafePtr to give more debugging information.

At least, we want to know which objects hold the CheckedUnsafePtr and probably where they were assigned.

Assignee: nobody → echuang
Severity: -- → S3
Priority: -- → P2
Blocks: 1752377

It's a good idea to check which object is still being held, because it will help weed out crashes coming from users with bad hardware. Currently the check only tests the pointer against NULL, so a single bit-flip in the pointer will make it appear as if it were dangling.

Blocks: 1791978
Pushed by echuang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/e00c185cf4ea
Print out the creation stack and the last assignment stack of CheckedUnsafePtr when it is unsafe. r=dom-worker-reviewers,janv,jstutte

Backed out changeset e00c185cf4ea (Bug 1789399) for causing telemetry failures on test_dynamic_probes.py.
Backout link
Push with failures <--> c <--> X2 <--> dt3 <--> Mn-swr
Failure Log c
Also Failure Log X2
Also Failure Log dt3
Also Failure Log Mn-swr
Also Failure Log bc2
Also Failure Log bc4
Also Failure Log dt12
...

Flags: needinfo?(echuang)
Regressions: 1804451

Hmm, looking at the performance regression (which would only apply to Nightly and early Beta, but still) I assume we might better take the path of having some static strings we can recognize rather than doing the stack generation.

(In reply to Marian-Vasile Laza from comment #4)

Backed out changeset e00c185cf4ea (Bug 1789399) for causing telemetry failures on test_dynamic_probes.py.
Backout link
Push with failures <--> c <--> X2 <--> dt3 <--> Mn-swr
Failure Log c
Also Failure Log X2
Also Failure Log dt3
Also Failure Log Mn-swr
Also Failure Log bc2
Also Failure Log bc4
Also Failure Log dt12
...

== Change summary for alert #36312 (as of Mon, 05 Dec 2022 14:26:51 GMT) ==

Regressions:

Ratio Test Platform Options Absolute values (old vs new)
194% wikia loadtime linux1804-64-shippable-qr cold fission webrender 903.96 -> 2,660.54
10% wikia LastVisualChange linux1804-64-shippable-qr cold fission webrender 3,354.07 -> 3,686.67

Improvements:

Ratio Test Platform Options Absolute values (old vs new)
18% fandom loadtime linux1804-64-shippable-qr bytecode-cached fission warm webrender 227.46 -> 186.67
10% fandom loadtime linux1804-64-shippable-qr fission warm webrender 334.04 -> 301.14
8% wikia fcp linux1804-64-shippable-qr fission warm webrender 103.35 -> 94.75
7% fandom PerceptualSpeedIndex linux1804-64-shippable-qr fission warm webrender 253.17 -> 234.50
7% fandom SpeedIndex linux1804-64-shippable-qr fission warm webrender 249.25 -> 230.92
... ... ... ... ...
4% wikia LastVisualChange linux1804-64-shippable-qr fission warm webrender 1,566.67 -> 1,500.00

For up to date results, see: https://treeherder.mozilla.org/perfherder/alerts?id=36312

Pushed by echuang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/38fcdae79f60
Print out the creation stack and the last assignment stack of CheckedUnsafePtr when it is unsafe. r=dom-worker-reviewers,janv,jstutte,smaug
Pushed by echuang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/12293be9ad25
Print out the creation stack and the last assignment stack of CheckedUnsafePtr when it is unsafe. r=dom-worker-reviewers,janv,jstutte,smaug

Backed out for causing multiple failures

Backout link

Push with failures - Assertions -> Failure log

Push with failures - gl1e -> Failure log

There's also an xpcshell failure that might have a failure line that is important. Failure log here.

Could these reftest failures also be from here? Failure log 1 // Failure log 2

GPU test failure on Android 7 is present too. Failure log

Wpt failures -> Failure log

There's a r+ patch which didn't land and no activity in this bug for 2 weeks.
:edenchuang, could you have a look please?
If you still have some work to do, you can add an action "Plan Changes" in Phabricator.
For more information, please visit auto_nag documentation.

Flags: needinfo?(jstutte)
Flags: needinfo?(echuang)
Flags: needinfo?(echuang)
Pushed by echuang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5dc1b7f6d1c1
Print out the creation stack and the last assignment stack of CheckedUnsafePtr when it is unsafe. r=dom-worker-reviewers,janv,jstutte,smaug

Backed out changeset 5dc1b7f6d1c1 (Bug 1789399) for bustages on CheckedUnsafePtr.h and multiple other failures.
Backout link
Push with failures <--> Bbc
Failure Log
Also dt9
Also dt10
Also dt1
Also c
Also dt6
Also X2
Also X2
Also X2
Also dt7
Also X1
Also dt7
Also wpt31
Also wpt13
Also wpt3
Also c3
Also wpt14
Also bc1
Also bc6
Also bc3
... + probably other.

Next time, please do a Try push before pushing to autoland, thanks.

Pushed by echuang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8e36c4a3163b
Print out the creation stack and the last assignment stack of CheckedUnsafePtr when it is unsafe. r=dom-worker-reviewers,janv,smaug
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 112 Branch
Flags: needinfo?(jstutte)
Flags: needinfo?(echuang)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: