Startup Crash in [@ BaseNotification::SetContentString]
Categories
(Toolkit Graveyard :: Notifications and Alerts, defect)
Tracking
(firefox-esr102 unaffected, firefox112 wontfix, firefox113 wontfix, firefox114 fixed, firefox115 fixed)
Tracking | Status | |
---|---|---|
firefox-esr102 | --- | unaffected |
firefox112 | --- | wontfix |
firefox113 | --- | wontfix |
firefox114 | --- | fixed |
firefox115 | --- | fixed |
People
(Reporter: aryx, Assigned: nrishel)
References
(Regression)
Details
(Keywords: crash, regression, Whiteboard: [fidedi-notifications])
Crash Data
Attachments
(2 files)
48 bytes,
text/x-phabricator-request
|
pascalc
:
approval-mozilla-release+
|
Details | Review |
48 bytes,
text/x-phabricator-request
|
pascalc
:
approval-mozilla-release+
|
Details | Review |
250 crashes for Firefox 111.0.x on Windows, newer versions also affected. Firefox 111 had native Windows notifications turned on in bug 1497425.
Crash report: https://crash-stats.mozilla.org/report/index/9c773c95-e6cb-4531-9907-efd0b0230414
Reason: EXCEPTION_ACCESS_VIOLATION_READ
Top 10 frames of crashing thread:
0 wpnapps.dll BaseNotification::SetContentString onecoreuap\base\diagnosis\platform\notifications\developer\basenotification.cpp:659
0 wpnapps.dll BaseNotification::Post onecoreuap\base\diagnosis\platform\notifications\developer\basenotification.cpp:314
1 wpnapps.dll <lambda_d9910e541950f8d272504b6cd8567d65>::operator const onecoreuap\base\diagnosis\platform\notifications\developer\toast.cpp:669
2 wpnapps.dll ToastNotifierImpl::Show onecoreuap\base\diagnosis\platform\notifications\developer\toast.cpp:673
3 rpcrt4.dll Invoke
4 rpcrt4.dll ?Ndr64StubWorker@@YAJPEAX0PEAU_RPC_MESSAGE@@PEAU_MIDL_SERVER_INFO_@@PEBQ6AJXZPEAU_MIDL_SYNTAX_INFO@@PEAK@Z
5 rpcrt4.dll NdrStubCall3
6 combase.dll CStdStubBuffer_Invoke onecore\com\combase\ndr\ndrole\stub.cxx:1440
7 rpcrt4.dll CStdStubBuffer_Invoke
8 combase.dll InvokeStubWithExceptionPolicyAndTracing::__l6::<lambda_c9f3956a20c9da92a64affc24fdd69ec>::operator const onecore\com\combase\dcomrem\channelb.cxx:1279
Comment 1•2 years ago
|
||
:nrishel, since you are the author of the regressor, bug 1497425, could you take a look? Also, could you set the severity field?
For more information, please visit auto_nag documentation.
Assignee | ||
Comment 2•2 years ago
|
||
:aryx would this segfault have crashed the browser, or would we have recovered from it given the error is on a non-main thread? I'm assuming the former but I'm currently unable to recreate a crash.
Some hypotheses: might be an issue with our use of HString
and HStringReference
classes - possibly Windows violating "The Windows Runtime releases all references to the string reference when the call returns." - or an issue with the supplied AString
not being valid UTF-16. I was thinking the former to me makes more sense wrt causing a null pointer segfault as the classes are likely nulling their content on destruction, but I'd suspect significantly more crashes if that were the case. Alternatively maybe an issue with UTF-16 4 byte characters, but testing with \ud834\udd1e
didn't have any issue.
Reporter | ||
Comment 3•2 years ago
|
||
crash-stats lists the process type as parent
- the browser crashed.
Reporter | ||
Comment 4•2 years ago
|
||
67% of the crashes are in the first minute after launch. These might be users with session restore enabled or who visit a page which sends notifications. In case the user submitted the url of the page which they had open (assumption here it's not necessarily the one which reques7 the notification), there are also established assets like Google, Spotify, and Facebook - unlikely they all provide invalid strings.
Firefox in Japanese represent 7% of the crashes, Spanish and French 5% - but English (US) 48% and English (UK) 5%. No pattern.
Assignee | ||
Comment 5•2 years ago
|
||
:aryx Is there a way to query the matching bugs? From a random sampling I suspect they all contain OnWriteBitmapFinished
or OnWriteImageFinished
(same function renamed) but not OnWriteImageSuccess
. This would indicate an issue for notifications with images that fail to save to disk.
Comment 6•2 years ago
|
||
(In reply to Nick Rishel [:nrishel] from comment #5)
:aryx Is there a way to query the matching bugs? From a random sampling I suspect they all contain
OnWriteBitmapFinished
orOnWriteImageFinished
(same function renamed) but notOnWriteImageSuccess
. This would indicate an issue for notifications with images that fail to save to disk.
Yes: if you click on the crash report and then either "More reports" or "Search", you can log in to crash-stats (which might require access?) and get much more information. I will note that of the public data I am looking at, 87% are on AMD64. That's suspicious.
But as I do this, I don't see how to query the entire stack for a) frames that are not top-most and b) frames from the main thread. Hopefully :aryx can guide us.
Comment 7•2 years ago
|
||
Update: it looks like "proto signature" is what we want, but when I try to search for, e.g., proto signature contains "mozilla::widget::ToastNotificationHandler::OnWriteBitmapFinished"
I don't get sensible hits. I don't see how to view the "proto signature" in order to figure out what it actually contains so I'm stuck here.
Assignee | ||
Comment 8•2 years ago
|
||
I think proto signature
only contains the crashing stack trace (you can aggregate on proto signature
here to get an idea of what it contains). We need to match the stack trace of a thread blocked by the crashing thread.
Reporter | ||
Comment 9•2 years ago
|
||
Proto signature contains the top 15 frames of the crashing thread as far as I know.
Will, is there a way to search for crash reports which have a non-crashing thread with a substring in its stack?
Comment 10•2 years ago
|
||
Documentation for the proto signature field: https://crash-stats.mozilla.org/documentation/datadictionary/dataset/processed/field/proto_signature
Non-crashing thread information isn't indexed, so I think the answer to your question is no.
Updated•2 years ago
|
Assignee | ||
Comment 11•2 years ago
•
|
||
Do we have other metrics, e.g. can we check for low disk/memory availability? This is increasingly looking like something not being handled correctly Windows-side. When an image fails to write to disk we guard against using image toasts. Everything looks correctly guarded and I haven't seen any contradiction that this only occurs when either we're not able to write to disk (or retrieve the image surface). That suggests to me that perhaps either the disk/memory is at capacity and a null check isn't being performed.
Edit, if I'm reading the reports correctly I'm guessing they have sufficient disk/memory.
Comment 12•2 years ago
|
||
The severity field is not set for this bug.
:tspurway, could you have a look please?
For more information, please visit BugBot documentation.
Updated•2 years ago
|
Assignee | ||
Comment 14•2 years ago
•
|
||
:RyanVM No updates. I don't have any solid hypothesis what's triggering this other than it appears to correlate with failure to save an image to disk, but the control flow path that triggers are easy to reason about and don't have any apparent errors. Absent a MRE, I think this may be a Windows bug that triggers under some resource constrained scenario.
:aryx, missed NI for in my last comment's question.
Reporter | ||
Comment 15•2 years ago
|
||
bp-c646ac09-e1c9-48ce-9557-30d610230508 has "System Memory Use Percentage 41" - 59% of the memory are available. There is no telemetry about available disk space as far as I can see - both in the crash reports and locally at about:telemetry.
Comment 16•2 years ago
|
||
Yannis, does anything stand out to you in these crashes?
Comment 17•2 years ago
•
|
||
This crash happens while Microsoft's wpnapps.dll
tries to convert the XML for a toast notification from UTF16 to UTF8. For some reason the translation fails. The Microsoft internal function wpnapps!ConvertUTF16ToUtf8
outputs a null pointer in error case, which wpnapps!BaseNotification::SetContentString
will use without first checking that it is not null. This should result in a benign error instead of a crash if the check was there; so we can probably ping Microsoft about this part. However we should also investigate how a failing UTF16 sequence could be found there in the first place.
This analysis seems confirmed by the few crashes in Nightly (e.g. here), where the last error value is available and listed as ERROR_NO_UNICODE_TRANSLATION
. This is because wpnapps!ConvertUTF16ToUtf8
calls WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, lpWideCharStr, ...)
, which fails, resulting in this ERROR_NO_UNICODE_TRANSLATION
last error code followed by the crash.
It is possible to get this last error code manually e.g. by breaking surrogate pair rules:
For UTF-16, a "surrogate pair" is required to represent a single supplementary character. The first (high) surrogate is a 16-bit code value in the range U+D800 to U+DBFF. The second (low) surrogate is a 16-bit code value in the range U+DC00 to U+DFFF.
wchar_t chars[]{ 0xD800, 0 }; // Use a high surrogate followed by anything *except* a low surrogate
auto result = WideCharToMultiByte(CP_UTF8, WC_ERR_INVALID_CHARS, chars, -1, nullptr, 0, nullptr, nullptr);
printf("%d %d\n", result, GetLastError()); // Outputs: 0 1113
So far I haven't had success in reproducing the crash e.g. by modifying mTitle
and mMsg
in ToastNotificationHandler::CreateToastXmlDocument
to try to introduce this kind of problem. But it does reproduce if I manually alter the UTF16 argument of wpnapps!ConvertUTF16ToUtf8
in a debugger. Maybe this gives more ideas of things to try?
This being a startup crash could be explained by users reloading their session after crashing, and the same notification getting dispatched again, thus re-crashing the browser like comment 4 suggests.
Comment 18•2 years ago
•
|
||
(In reply to Nick Rishel [:nrishel] from comment #5)
:aryx Is there a way to query the matching bugs? From a random sampling I suspect they all contain
OnWriteBitmapFinished
orOnWriteImageFinished
(same function renamed) but notOnWriteImageSuccess
. This would indicate an issue for notifications with images that fail to save to disk.
This is a very interesting observation, this crash could definitely be related to notifications with images. But I don't think we should assume that this is limited to the failure-to-save case. If I read the code correctly the notification is always sent from within OnWriteImageFinished
even in case of success; OnWriteImageSuccess
only modifies what will be sent in the notification. To me this additional information could make the call to AppendUTF8toUTF16
rather suspicious here.
Comment 19•2 years ago
•
|
||
After finding some hints in the user comments, using aggregation on crash reports revealed that 619 of the 634 crashes have a common addon, listed with identifier jid0-GjwrPchS3Ugt7xydvqVK4DQk8Ls@jetpack:1.0.5
: Notifier for Gmail™
, version 1.0.5. It appears to be this addon which reports the same id/name/version. This addon should be helpful to reproduce the issue -- which may be more general.
I installed it and created a custom Gmail account, but so far have no success in reproducing the crash. The notifications are always shown with the same image: the icon for the addon. I wonder if receiving simultaneously hundreds of notifications (for unread emails) all with the same image (the icon) could trigger the crash?
I also found that this particular crash has some limited heap information that is relevant: at address 000001598f09881c
we have the beginning of the faulty UTF16 string:
<toast launch="program
firefox
profile
C:\Users\<removed>\AppData\Roaming\Mozilla\Firefox\Profiles\[removed]
windowsTag
[removed]"><visual><binding template="ToastImageAndText03"><image id="1" src="file:///C:/Users/[removed]/AppData/Local/Temp/notificationimages/0c63809e-10f7-45b9-8efd-0c07f10eb151.png"/><text id="1">Notifier for Gmail™
However the full string is not available in the crash report, but at least this shows that the problem is not with the image's URL like I suggested in comment 18. The string ends like this in raw bytes:
00000159`8f098ac0 3c 00 74 00 65 00 78 00-74 00 20 00 69 00 64 00 <.t.e.x.t. .i.d.
00000159`8f098ad0 3d 00 22 00 31 00 22 00-3e 00 4e 00 6f 00 74 00 =.".1.".>.N.o.t.
00000159`8f098ae0 69 00 66 00 69 00 65 00-72 00 20 00 66 00 6f 00 i.f.i.e.r. .f.o.
00000159`8f098af0 72 00 20 00 47 00 6d 00-61 00 69 00 6c 00 22 21 r. .G.m.a.i.l."!
00000159`8f098b00 ?? ?? ?? ?? ?? ?? ?? ??-?? ?? ?? ?? ?? ?? ?? ?? ???????????????? // remaining data not available in the crash dump
If I look at what the notifications from this addon look like on my own machine, I have the same start of UTF-16 string (and no crash):
<toast launch="program
firefox
profile
C:\Users\yjugl\AppData\Roaming\Mozilla\Firefox\Profiles\h8yxxtwx.default-release-5
windowsTag
2964831140"><visual><binding template="ToastImageAndText03"><image id="1" src="file:///C:/Users/yjugl/AppData/Local/Temp/notificationimages/5378b913-4a4f-4d7f-aaf7-84beb79820a3.png"/><text id="1">Notifier for Gmail™</text><text id="2">From: yjuglaret@whatever.com
Title: Hello
Summary: This is a test</text></binding></visual><actions><action content="Notification settings" arguments="program
firefox
profile
C:\Users\yjugl\AppData\Roaming\Mozilla\Firefox\Profiles\h8yxxtwx.default-release-5
windowsTag
2964831140
action
settings" placement="contextmenu"/></actions></toast>
(the line breaks are \r\n)
Here is what the raw bytes look like around the ™ (U+2122) on my machine, all seems normal:
00000248`bbb73930 70 00 6e 00 67 00 22 00-2f 00 3e 00 3c 00 74 00 p.n.g."./.>.<.t.
00000248`bbb73940 65 00 78 00 74 00 20 00-69 00 64 00 3d 00 22 00 e.x.t. .i.d.=.".
00000248`bbb73950 31 00 22 00 3e 00 4e 00-6f 00 74 00 69 00 66 00 1.".>.N.o.t.i.f.
00000248`bbb73960 69 00 65 00 72 00 20 00-66 00 6f 00 72 00 20 00 i.e.r. .f.o.r. .
00000248`bbb73970 47 00 6d 00 61 00 69 00-6c 00 22 21 3c 00 2f 00 G.m.a.i.l."!<./.
00000248`bbb73980 74 00 65 00 78 00 74 00-3e 00 3c 00 74 00 65 00 t.e.x.t.>.<.t.e.
Edit: Maybe the bad UTF-16 comes from the message title / summary. For example, maybe the summary can cut a message in the middle of a surrogate pair. I'll try that.
Comment 20•2 years ago
•
|
||
The addon from comment 19 contains code which looks like this:
// ...
app.notify = function(text, title, callback, buttons = []) {
// ...
chrome.notifications.create(null, options, id => {
// ...
});
};
// ...
function shorten(str) {
if (str.length < config.email.truncate) {
return str;
}
return str.substr(0, config.email.truncate / 2) + '...' + str.substr(str.length - config.email.truncate / 2);
}
var report = tmp.map(e => config.notification.format
.replace('[author_name]', e.author_name)
.replace('[author_email]', e.author_email)
.replace('[summary]', shorten(e.summary))
.replace('[title]', shorten(e.title))
.replace(/\[break\]/g, '\n'));
// ...
app.notify(report, '', () => {
// ...
}, buttons);
Where config.email.truncate
is 70 by default. And it seems that JS strings can indeed cut into the middle of a surrogate pair, as illustrated here in dev tools:
"𒐪".substr(0,1)
-> "\ud809"
"𒐪".substr(1,1)
-> "\udc2a"
Maybe the web notification path is safeguarded against the kind of strings described in comment 17 but not the Browser Extensions notification path? I'll do more tests in this direction next week.
Comment 21•2 years ago
•
|
||
I am able to reproduce the crash using the following code in a local HTML file:
<!DOCTYPE html>
<html>
<head>
<script type="text/javascript">
function notify(title) {
new Notification(title)
}
function tryNotify(title) {
if (!("Notification" in window)) {
alert("This browser does not support desktop notification")
} else if (Notification.permission === "granted") {
notify(title)
} else if (Notification.permission !== "denied") {
Notification.requestPermission().then((permission) => {
if (permission === "granted") {
notify(title)
}
});
}
}
window.onload = function() {
var charCode = parseInt(prompt("Enter a char code", defaultValue="dfff"), 16);
if (!isNaN(charCode)) {
tryNotify(String.fromCharCode(charCode))
}
};
</script>
</head>
</html>
Under the following conditions (otherwise a notification gets dispatched with no crash):
- using a char code between DC00 and DFFF i.e. a lonely low surrogate;
- assuming that this is the first notification dispatched by Firefox.
This seems to crash both in the title and body of the notification; and both using the web API and addon API.
The occasional presence of lonely low surrogates when using the addon mentioned in comment 19 is explained by the second substr
call in the shorten
function shown in comment 20. The first call can lead to lonely high surrogates, which somehow do not lead to a crash; while the second call can lead to lonely low surrogates, which somehow lead to a crash if this is the first notification getting dispatched.
[:nrishel], would you like to continue the investigation from here?
Assignee | ||
Comment 22•2 years ago
|
||
Much appreciated yannis! Yes, I'll take over from here.
Updated•2 years ago
|
Assignee | ||
Comment 23•1 years ago
|
||
Assignee | ||
Comment 24•1 years ago
|
||
Depends on D178745
Updated•1 years ago
|
Updated•1 years ago
|
Comment 25•1 years ago
|
||
Is this going to land shortly? Thanks
Assignee | ||
Comment 26•1 years ago
|
||
:pascalc I'm planning to try to land this tomorrow. No significant problems found in review, just needs a cleanup pass.
Comment 27•1 years ago
|
||
Assignee | ||
Comment 28•1 years ago
|
||
Comment on attachment 9335363 [details]
Bug 1828073 - Fix invalid UTF-16 crashing Windows Toast notifications. r=nalexander
Beta/Release Uplift Approval Request
- User impact if declined: Crashing under uncommon circumstances for Windows users.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Trivial to reason about, using existing functionality to clean up UTF-16 strings.
- String changes made/needed:
- Is Android affected?: No
Assignee | ||
Updated•1 years ago
|
Comment 29•1 years ago
|
||
Comment on attachment 9335363 [details]
Bug 1828073 - Fix invalid UTF-16 crashing Windows Toast notifications. r=nalexander
Fx114 is in RC, moving uplift request to release
Updated•1 years ago
|
Comment 30•1 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/338450ea3be0
https://hg.mozilla.org/mozilla-central/rev/62eed0dd28da
Comment 31•1 year ago
|
||
Comment on attachment 9335363 [details]
Bug 1828073 - Fix invalid UTF-16 crashing Windows Toast notifications. r=nalexander
Approved for 114.0.2 next week, thanks.
Updated•1 year ago
|
Comment 32•1 year ago
|
||
bugherder uplift |
Updated•1 year ago
|
Updated•1 year ago
|
Description
•