Closed Bug 1410276 Opened 2 years ago Closed 2 years ago
Add a canary field to ns
Add a temporary conary field to nsStringBuffer to help root out suspected heap crashes, bug 1405525.
Ping. See bug 1406996 comment 37 for why we should get moving with this.
(In reply to Bobby Holley (parental leave - send mail for anything urgent) from comment #1) > Ping. See bug 1406996 comment 37 for why we should get moving with this. Thanks. I'll make this my current task.
Hi bz, you were suggested as a reviewer for this patch. We want to add a canary value to nsStringBuffer and have it in nightly for a few days to try to detect suspected heap curruption. Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ab2a107d0312ca4acbd8e6c0816eaae5b69bab11 Thanks.
Attachment #8921862 - Flags: review?(bzbarsky)
Comment on attachment 8921862 [details] [diff] [review] Bug 1410276 - Add a canary field to nsStringBuffer You probably don't want to include the trailing ';' in the CHECK_STRING_BUFFER_CANARY macro (both versions). Making FromData out of line might be a perf hit that we notice in some cases, but that's OK for a temporary thing.
Attachment #8921862 - Flags: review?(bzbarsky) → review+
(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #4) > Comment on attachment 8921862 [details] [diff] [review] > Bug 1410276 - Add a canary field to nsStringBuffer > > You probably don't want to include the trailing ';' in the > CHECK_STRING_BUFFER_CANARY macro (both versions). Derp. > Making FromData out of line might be a perf hit that we notice in some > cases, but that's OK for a temporary thing. I thought so too, I'll add a comment for anyone else reading this. FWIW I wrote this in a way that we can keep it in if we chose to. If we do that we could remove the assertion in this function and move it back to the class definition. Thanks.
r+ carried forward.
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/mozilla-inbound/rev/a3785ec9a48c Add a canary field to nsStringBuffer. r=bz
This seems to have cought a few crashes, I might need some guideance on how to get the dump out of the crash reports and somehow examine it: https://crash-stats.mozilla.com/search/?signature=%3Dmozalloc_abort%20%7C%20NS_DebugBreak%20%7C%20nsStringBuffer%3A%3AFromData&signature=%3Dmozalloc_abort%20%7C%20Abort%20%7C%20NS_DebugBreak%20%7C%20nsStringBuffer%3A%3ARelease&version=58.0a1&date=%3E%3D2017-10-25T14%3A01%3A00.000Z&date=%3C2017-11-14T14%3A01%3A00.000Z&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature
Crash Signature: mozalloc_abort | Abort | NS_DebugBreak | nsStringBuffer::Release mozalloc_abort | NS_DebugBreak | nsStringBuffer::FromData
Crash Signature: mozalloc_abort | Abort | NS_DebugBreak | nsStringBuffer::Release mozalloc_abort | NS_DebugBreak | nsStringBuffer::FromData → [@ mozalloc_abort | Abort | NS_DebugBreak | nsStringBuffer::Release] [@ mozalloc_abort | NS_DebugBreak | nsStringBuffer::FromData]
Emilio, do you have access to the crash report dumps?
Yes, I do. Paul, let me know what I should look for, or reach to me on IRC so we can talk about it or what not. Though probably you should get access to the dumps given you're working on the GC...
Flags: needinfo?(emilio) → needinfo?(pbone)
The "moz crash reason" field for all of these crashes don't contain any useful information. These are all content process crashes, and in the content process NS_RUNTIMEABORT doesn't do anything: http://searchfox.org/mozilla-central/rev/af86a58b157fbed26b0e86fcd81f1b421e80e60a/xpcom/base/nsDebugImpl.cpp#394
Well, of course it crashes. I guess the actual field for this would be AbortMessage but that is blank, too.
Bug 1412048 replaced some uses of NS_RUNTIMEABORT, but it looks like it missed this one. But you can just land a patch to do a similar transformation for your call site.
mcrr8: When I read that other bug it sounds like all the relevant data _is_ captured, it's just not public. So hopefully having access to the full crash reports will help me. All: I have applied for minidump access and should be able to access this shortly. I noticed that all the crashes in Release() are on windows and all the crashes in FromData() are on Linux. This is probably just a coincidence: there are only 8 crashes in total.
Well, that comment isn't quite right. It only applies to annotations in the parent process, AFAICT. I have minidump access and I couldn't see them. It looks like a more recent patch did update the canary annotation so these should start showing up: https://hg.mozilla.org/mozilla-central/rev/e85f59ea455d
Okay thanks. I expect that with a minidump I'll be able to inspect the memory and find out what these values were.
(In reply to Andrew McCreight [:mccr8] from comment #14) > Bug 1412048 replaced some uses of NS_RUNTIMEABORT, but it looks like it > missed this one. But you can just land a patch to do a similar > transformation for your call site. That is not necessary. When I landed the patch for bug 1412048 that removed NS_RUNTIMEABORT last week, I replaced CHECK_STRING_BUFFER_CANARY's NS_RUNTIMEABORT call with MOZ_CRASH_UNSAFE_PRINTF: https://searchfox.org/mozilla-central/diff/4a8bdd4ca5087c07464aaab990dfc3ccd4d3c909/xpcom/string/nsSubstring.cpp#44
Backed out in inbound: https://hg.mozilla.org/integration/mozilla-inbound/rev/23f86a1ac424
Backed out in central: https://hg.mozilla.org/mozilla-central/rev/23f86a1ac424
Hi bz, The original patch was backed out because 58 is getting close to beta. But it hasn't caught many assertions yet. This patch should affect performance less and might be valuable. Is this worth persuing or should we close this bug?
Comment on attachment 8926718 [details] [diff] [review] Bug 1410276 - Add a canary field to nsStringBuffer This seems fine, though FromDataOutlineCheckCanary is only called when the check would fail, so we could just rename it to FromDataCanaryCheckFailed and nix the checks inside it. Either way, we should document what it does.
Attachment #8926718 - Flags: review?(bzbarsky) → review+
Updated based on review feedback. r+ carried forward.
Here are some talos jobs to help determine what affect this has on performance and help us decide if it's worth committing. Do we want this to go to Beta with 58 assuming there's no affect on performance? Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6b6a49fa7d322ed4e8acaf6f7556452659157b9a Talos: https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=6b6a49fa7d322ed4e8acaf6f7556452659157b9a NI: jonco, I'm not sure who else to ask and this relates to Bug 1405525
(In reply to Paul Bone [:pbone] from comment #24) If there's no impact on performance I'd say let's put this in and see what it turns up.
The latest patch causes a single 4.82% perf regression for tp5o_webext responsiveness opt e10s. I only bothered to test on linux64, there shouldn't be any platform differences. https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=e2f87726b608&newProject=try&newRevision=6b6a49fa7d322ed4e8acaf6f7556452659157b9a&framework=1&showOnlyImportant=1 I'm going to wait until 58 goes to beta and then push this on 59 for a while.
Pushed by email@example.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/ced8af085927 Add a canary field to nsStringBuffer r=bz
I've committed this again so that it can run for a while on 59. Ionuț this will likely cause some perf regressions, like last time but hopefully fewer / less sagnificant. Thanks.
(In reply to Paul Bone [:pbone] from comment #28) > I've committed this again so that it can run for a while on 59. > > Ionuț this will likely cause some perf regressions, like last time but > hopefully fewer / less sagnificant. > > Thanks. Thanks for the heads up!
Here are a few Mac nightly crashes in 59 that I spotted today while going through crash stats: http://bit.ly/2iWsYBg
Paul, any updates on this nsStringBuffer canary? It's still enabled in Nightly 60 and there are about a dozen crash reports from the last week. Should we remove this code or just leave it? https://crash-stats.mozilla.com/search/?signature=~FromDataCanaryCheckFailed&product=Firefox&date=%3E%3D2018-02-03T23%3A47%3A50.000Z&date=%3C2018-02-10T23%3A47%3A50.000Z&_sort=-date&_facets=signature&_facets=cpu_arch&_facets=version&_facets=platform_pretty_version&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature
Oh, I'll take a look at those crashes later today as Bug 1405525. I lost track of this bug and didn't realise it was still providing crashes. I'm happy to keep this code in since it's only used in Nightly and Debug builds. I'll close this bug and continue in the original Bug 1405525.
You need to log in before you can comment on or make changes to this bug.