Crash in [@ mozilla::net::UrlClassifierFeatureEmailTrackingDataCollection::ProcessChannel]
Categories
(Core :: Privacy: Anti-Tracking, defect, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr91 | --- | unaffected |
firefox-esr102 | --- | unaffected |
firefox103 | --- | unaffected |
firefox104 | blocking | verified |
firefox105 | blocking | verified |
People
(Reporter: mccr8, Assigned: mccr8)
References
(Depends on 1 open bug, Regression)
Details
(Keywords: crash, regression)
Crash Data
Attachments
(1 file)
48 bytes,
text/x-phabricator-request
|
diannaS
:
approval-mozilla-beta+
diannaS
:
approval-mozilla-release+
|
Details | Review |
Crash report: https://crash-stats.mozilla.org/report/index/8e8416b7-b5fe-423c-ba99-a2a750220815
Reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
Top 10 frames of crashing thread:
0 XUL mozilla::net::UrlClassifierFeatureEmailTrackingDataCollection::ProcessChannel netwerk/url-classifier/UrlClassifierFeatureEmailTrackingDataCollection.cpp:201
1 XUL mozilla::detail::RunnableFunction<mozilla::net::AsyncUrlChannelClassifier::CheckChannel xpcom/threads/nsThreadUtils.h:531
2 XUL mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal xpcom/threads/TaskController.cpp:851
3 XUL NS_ProcessPendingEvents xpcom/threads/nsThreadUtils.cpp:430
4 XUL nsAppShell::ProcessGeckoEvents widget/cocoa/nsAppShell.mm:509
5 CoreFoundation __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__
6 CoreFoundation __CFRunLoopDoSource0
7 CoreFoundation __CFRunLoopDoSources0
8 CoreFoundation __CFRunLoopRun
9 CoreFoundation CFRunLoopRunSpecific
I hit this crash when I was re-loading a Reddit page shortly after closing it. Maybe there's some missing null check.
Assignee | ||
Comment 1•2 years ago
•
|
||
[Tracking Requested - why for this release]: The volume looks reasonably high, and this should be straightforward to fix. I'm not sure why we didn't have a bug on file for this already. It looks like a regression in 104, which is when some of this email tracking telemetry work seems to have landed, but I'm not sure exactly which bug.
Assignee | ||
Comment 2•2 years ago
|
||
Looks like it was regressed by bug 1773701, though there was some followup work in bug 1775627, so maybe the patch will need a bit of rebasing on 104. Tim, could you take a look? Thanks.
Comment 3•2 years ago
|
||
I can reliably reproduce with this STR
- go to https://www.theverge.com/
- Ctrl/Cmd/middle click on their (current) top story "Apple is allegedly threatening to fire an employee over a viral TikTok video"
- switch to the tab that got opened
Assignee | ||
Comment 4•2 years ago
|
||
[@ mozilla::dom::CanonicalBrowsingContext::GetTopWindowContext ] looks like a very similar crash, except that it is happening during the calculation of topWindowParent. I don't know if it is the same issue or a similar one, but it also looks like a missing null check. This signature is actually the top crashes on beta right now, so it is a bit odd that there's no bug on file for it. I'll bundle it into this bug for now.
Assignee | ||
Comment 5•2 years ago
|
||
Combined these appear to be around 75% of all beta crashes in the last week...
Assignee | ||
Comment 6•2 years ago
|
||
It looks like it massively spiked in the 20220811191329 build, but I'm not sure why. No changes landed in that beta for the UrlClassifierFeatureEmailTrackingDataCollection.cpp file as far as I can see.
Assignee | ||
Comment 7•2 years ago
|
||
Another set of steps I was able to reproduce this crash with, based on a comment on a crash report:
Updated•2 years ago
|
Assignee | ||
Updated•2 years ago
|
Comment 8•2 years ago
|
||
For mozilla::net::UrlClassifierFeatureEmailTrackingDataCollection::ProcessChannel
specifically, I see a lot of crashes in the last day or so, but some from older builds like build id 20220722214625 (https://crash-stats.mozilla.org/report/index/57d9709d-1fd8-4235-855f-c36190220815 for example). Can this be caused by a blocklist update or something like that?
(If not, I apologize for the noise. :))
Assignee | ||
Comment 9•2 years ago
|
||
Doing a bit of investigation, both the Verge case and the Yakima Herald case seem to be hitting a case where the browsing context is discarded. This function seems to be running off a runnable so I guess a navigation can happen while the runnable is waiting or something. I'll add checks for that and also similar checks for topWindowParent, just to be safe, as a number of callers also seem to do that.
Assignee | ||
Comment 10•2 years ago
|
||
Assignee | ||
Comment 11•2 years ago
|
||
The diff rebased for beta is here. It is basically identical, as the changed code is after the code I change, but I don't know how finicky hg is about rebasing so I've included it.
Updated•2 years ago
|
Comment 12•2 years ago
|
||
I ran into this crash today with 105 after opening https://sharedrop.io and closing the tab (I only clicked the initial message window which explains what the page is about) after a few seconds. For me this crash appears to be 100% reproducible right now. Here are the crash stats link for my attempts:
https://crash-stats.mozilla.org/report/index/562d2a45-5ffc-4b4b-bc92-6890a0220816
https://crash-stats.mozilla.org/report/index/9dad0a8d-4750-4f6d-b331-9f7740220816
https://crash-stats.mozilla.org/report/index/377c9073-7ebb-417f-b816-4f6240220816
https://crash-stats.mozilla.org/report/index/1897e817-14a9-4a35-a9e1-5a6a20220816
Note: for of them the stack trace doesn't point to mozilla::net::UrlClassifierFeatureEmailTrackingDataCollection::ProcessChannel, but I included it anyway just in case.
Comment 13•2 years ago
|
||
Forgot to mention that the same crash is for me not reproducible in 103 (although I'm using the different profile for the release channel).
Comment 14•2 years ago
|
||
I am getting this crash exclusively when I close (some) tabs. Approximately every fifth or so tab closing ends with this crash now.
https://crash-stats.mozilla.org/report/index/bp-532628ae-f67d-47bf-af8d-a3dfd0220816
https://crash-stats.mozilla.org/report/index/bp-8103de12-df51-492e-8752-841550220816
https://crash-stats.mozilla.org/report/index/bp-49d1c59c-f4ec-4798-95cb-e26640220816
https://crash-stats.mozilla.org/report/index/bp-39461282-ac0a-4126-866a-1561b0220816
After last update I've been browsing and presumably closing without crashes for a day, but have not hard data not reliable memory to swear on it (you know, typical tab hoarder, cleaning tabs only after midnight).
Comment 15•2 years ago
|
||
Confirming crash upon closing https://sharedrop.io/ on the first try and many times upon closing https://angular.io/ .
Closing unloaded tab using context menu works OK.
Comment 16•2 years ago
|
||
This suddenly crashed on me tonight, and I can no longer open DevEdition due to it (crashes on every startup attempt)
One of my crashes: https://crash-stats.mozilla.org/report/index/a264aa2e-4daf-4b9b-b814-0924a0220816#tab-details
Happy to try and narrow this down if there is more need.
Comment 17•2 years ago
|
||
(In reply to Justin Wood (:Callek) from comment #16)
Ooops, submitted too early, DevEd: 104.0b9
Comment 18•2 years ago
|
||
As a workaround, I was able to get the browser launched if I set privacy.trackingprotection.emailtracking.data_collection.enabled
to false
in my prefs.js manually.
Comment 19•2 years ago
|
||
Pushed by amccreight@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/038d31437d8e Don't try to gather telemetry for a discarded bc in UrlClassifierFeatureEmailTrackingDataCollection::ProcessChannel. r=dimi
Assignee | ||
Comment 20•2 years ago
|
||
Comment on attachment 9289989 [details]
Bug 1784983 - Don't try to gather telemetry for a discarded bc in UrlClassifierFeatureEmailTrackingDataCollection::ProcessChannel.
Beta/Release Uplift Approval Request
- User impact if declined: very high crash rates
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): It just ignores a few simple cases. I think the worst possibility is that it won't fix the crash. See comment 11 for the patch rebased for beta. Patch hasn't merged to m-c yet.
- String changes made/needed: none
- Is Android affected?: Yes
Comment 21•2 years ago
|
||
bugherder |
Comment 23•2 years ago
|
||
Comment on attachment 9289989 [details]
Bug 1784983 - Don't try to gather telemetry for a discarded bc in UrlClassifierFeatureEmailTrackingDataCollection::ProcessChannel.
Approved for 104.0rc2
Comment 24•2 years ago
|
||
bugherder uplift |
Comment 26•2 years ago
|
||
Comment on attachment 9289989 [details]
Bug 1784983 - Don't try to gather telemetry for a discarded bc in UrlClassifierFeatureEmailTrackingDataCollection::ProcessChannel.
Approved for 104.0b10 devedition
Comment 27•2 years ago
|
||
bugherder uplift |
Assignee | ||
Comment 28•2 years ago
|
||
This seems to have hit about 95% of all 104.0rc1 crashes in the last 24 hours.
Comment 29•2 years ago
|
||
Odd that the regression is old, but my crashing is just started today, twice with 104.0, both news.google.com
Comment 30•2 years ago
|
||
Yeah my nightly suddenly started crashing on every startup today. Reverting to a nightly from a month ago didn't help, I ended up having to switch to release until the fix landed.
Comment 31•2 years ago
|
||
Desktop Nightly builds should be ready with this fix now. Would be great to get some confirmation that the crashes are resolved. We're working on respins for all the other affected releases (desktop & mobile) as well.
Comment 32•2 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #31)
Desktop Nightly builds should be ready with this fix now. Would be great to get some confirmation that the crashes are resolved. We're working on respins for all the other affected releases (desktop & mobile) as well.
My build is working fine now.
Comment 33•2 years ago
|
||
(In reply to Dave Townsend [:mossop] from comment #30)
Yeah my nightly suddenly started crashing on every startup today. Reverting to a nightly from a month ago didn't help, I ended up having to switch to release until the fix landed.
Justin Wood (:Callek) pointed out here that you can temporarily hot-fix/circumvent it with about:config pref. If your profile crashes on load, try to locate <your Firefox profile folder>\prefs.js
file and ensure the is present
user_pref("privacy.trackingprotection.emailtracking.data_collection.enabled", false);
there before launch.
Thanks Justin I could use my primary ordinary Developer edition profile without juggling with release. (Thanks again!)
Assignee | ||
Comment 34•2 years ago
|
||
There's a low crash volume so far, but I don't see any crashes in the build with the fix that have either signature.
Comment 35•2 years ago
•
|
||
(In reply to Wayne Mery (:wsmwk) from comment #29)
Odd that the regression is old, but my crashing is just started today, twice with 104.0, both news.google.com
Mine is 104.0 beta, not nightly, on Mac. I've had some more crashes that were not news.google.com - mostly when closing tabs.
bp-b1f09e0c-63a2-4ea0-8fcd-826f00220816
bp-62237cf3-832e-4b7f-83c1-5630d0220816
bp-8fcd8729-74ed-4254-9ee9-cd2b00220816
bp-d7b0c33d-0a2a-4c0b-9749-d44e70220816
bp-ab41f48c-245c-487c-a9e1-76f150220816
Comment 36•2 years ago
|
||
DevEdition 104.0b10 and Firefox 104.0 RC2 are both now live with the fix. Fenix & Focus Nightly & Beta respins are also in review in Google Play.
Updated•2 years ago
|
Comment 38•2 years ago
|
||
Dimi, given the volume of this crash in such short order, I strongly think we should still take the time to fully understand what happened here to see what possible hardening we can do to avoid such cases in the future.
Comment 39•2 years ago
•
|
||
We tested the issue on several devices and it no longer occurs on RC 104.1.0 build2 and Firefox Beta 104.0b7. Tabs can be closed without Fenix crashing.
Devices used:
- Huawei MediaPad M3 (Android 7)
- Google Pixel 6 (Android 13)
- Samsung Galaxy S22 Ultra (Android 12)
- Samsung A32 5G (Android 11)
- Oppo Find X5 (Android 12)
- P40 Lite (Android 10)
- Sony Xperia (Android 6.0.1)
- Xperia 10 (Android 11)
Comment 40•2 years ago
|
||
The crash happened because the google tag manager script sends a beacon to report the region analytics data via the navigator.sendBeacon()
. The beacon request is async and has the lowest loading priority. If the page was closed before the beacon request starts to load, the top-level WindowGlobalParent
won't be available in this case. So, it will crash when we access it without a null check.
is there a possible way to enforce a null check when getting either a BrowsingContext
or a WindowGlobalParent
?
Comment 41•2 years ago
|
||
(In reply to Tim Huang[:timhuang] from comment #40)
The crash happened because the google tag manager script sends a beacon to report the region analytics data via the
navigator.sendBeacon()
. The beacon request is async and has the lowest loading priority. If the page was closed before the beacon request starts to load, the top-levelWindowGlobalParent
won't be available in this case. So, it will crash when we access it without a null check.
Was this a change on Google's side? I'm wondering why old nightly builds that were previously fine suddenly started crashing.
Also a little confused why this was happening to me on startup, I wasn't closing any pages. Maybe the webpages that were loading from the session were creating and then immediately removing frames to trigger this?
Comment 42•2 years ago
|
||
The crash is verified fixed on latest Focus RC 104.1.0 build2 and Focus Beta 104.0b7, as well. No crash occurs when closing tabs from various sites, including those mentioned above.
Devices used:
- Google Pixel 6 (Android 12)
- Xiaomi 12 Pro (Android 12)
- Lenovo Yoga Tab 11 (Android 11)
- OPPO A15s (Android 10)
- LG G7 fit (Android 8.1.0)
Comment 43•2 years ago
|
||
(In reply to Dave Townsend [:mossop] from comment #41)
(In reply to Tim Huang[:timhuang] from comment #40)
The crash happened because the google tag manager script sends a beacon to report the region analytics data via the
navigator.sendBeacon()
. The beacon request is async and has the lowest loading priority. If the page was closed before the beacon request starts to load, the top-levelWindowGlobalParent
won't be available in this case. So, it will crash when we access it without a null check.Was this a change on Google's side? I'm wondering why old nightly builds that were previously fine suddenly started crashing.
Did it do this in Google Chrome or any other browsers? Maybe there is your answer.
Comment 44•2 years ago
|
||
Hello,
I was able to reproduce the crash on Firefox RC1(build ID: 20220815150936) on Windows 10 using the steps from Comment 3 and Comment 7. Verified as fixed on Firefox RC2(build ID: 20220816115024), DevEd.0b10(build ID: 20220816121120) and Nightly 105.0a1(build ID: 20220816190318) on Windows 10, macOS 12, Ubuntu 22.04.
Assignee | ||
Comment 45•2 years ago
|
||
(In reply to Tim Huang[:timhuang] from comment #40)
is there a possible way to enforce a null check when getting either a
BrowsingContext
or aWindowGlobalParent
?
This issue is pretty common, so for IPC we have MaybeDiscarded<BrowsingContext>, which requires checking that the browsing context isn't discarded (in my testing, it was actually the bc->IsDiscarded()
check that caused us to avoid the crash). It would probably be possible to go through and find various APIs like LoadInfo::GetBrowsingContext where people are checking IsDiscarded on the return value and change them to return MaybeDiscarded<BrowsingContext>. Presumably it could also be done for WindowGlobalParent. I'll file a bug about that.
Comment 46•2 years ago
|
||
(In reply to Dave Townsend [:mossop] from comment #41)
(In reply to Tim Huang[:timhuang] from comment #40)
The crash happened because the google tag manager script sends a beacon to report the region analytics data via the
navigator.sendBeacon()
. The beacon request is async and has the lowest loading priority. If the page was closed before the beacon request starts to load, the top-levelWindowGlobalParent
won't be available in this case. So, it will crash when we access it without a null check.Was this a change on Google's side? I'm wondering why old nightly builds that were previously fine suddenly started crashing.
Also a little confused why this was happening to me on startup, I wasn't closing any pages. Maybe the webpages that were loading from the session were creating and then immediately removing frames to trigger this?
It's not on Google's side.
It's because the email tracking data collection feature never worked until yesterday. We have published the email tracker tables on our Shavar server yesterday, so it started to collect data and crash since then.
Comment 47•2 years ago
|
||
Would it be practical/useful to un-publish them for a week or so to allow affected clients to update to a fixed version? I'm worried about how many users are currently stuck on an affected build due to crashing on startup and being unable to update.
Comment 48•2 years ago
|
||
Yeah, it's reasonable. I have requested a rolling back of the Shavar changes.
Description
•