Crash pings sometimes have null bytes after module names
Categories
(Toolkit :: Crash Reporting, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox84 | --- | fixed |
People
(Reporter: kats, Assigned: akshay1992kalbhor, Mentored)
References
(Blocks 1 open bug)
Details
(Keywords: good-first-bug, Whiteboard: [lang=c++])
Attachments
(2 files)
With :wlach and :willkg I've been working on a way to symbolicate stacks from crash pings and have them accessible via a BigQuery table in STMO. While setting this up I found that some crash pings fail to symbolicate with tecken, because they have some trailing null bytes in the module name which is invalid. For example:
[{'stacks': [[[0, 75652760]]], 'memoryMap': [['libxul.so\x00\x00', '0A3B2742A91342D89C9E7EE340F619ED0']], 'version': 5}]
I'm not sure where these null bytes are coming from but we should fix this at the source so that the data is cleaner and doesn't cause symbolication failures.
Comment 1•4 years ago
|
||
Did the malformed crash pings come from main process crashes or child process crashes? We use two different mechanism to generate them, so if the issue affects only one of the two it will make the investigation easier.
Reporter | ||
Comment 2•4 years ago
|
||
All the ones I'm seeing seem to be from the content process (payload.process_type in the crash ping is "content").
Reporter | ||
Comment 3•4 years ago
|
||
For future reference https://sql.telemetry.mozilla.org/queries/74131/source has a query that picks out some of these crash pings, they all seem to be on Linux or Mac.
Comment 4•4 years ago
|
||
Looking at the code that generates the ping we feed all the values with the trailing null bytes to PathnameStripper::File()
. There might be a bug lurking in there somewhere, or in the minidump writer.
Using the SHA256 checksums I found a crash report with this issue:
https://crash-stats.mozilla.org/report/index/88428fa6-d5e0-47d4-a811-b05470200812
If I process the raw minidump with the minidump-analyzer
tool I reproduce the issue. The interesting bit is that the original path that leads to the trailing NUL characters contains non-ASCII characters so we might be miscalculating the length of a non-ASCII string: there's one trailing NULL character for every non-ASCII character in the original path.
Comment 5•4 years ago
|
||
This is a simple enough issue for someone who wants to try and contribute. The amount of code involved is very small and reproducing the bug is trivial.
Assignee | ||
Comment 6•4 years ago
|
||
If no one is assigned this bug, can I try working on it?
Reporter | ||
Comment 7•4 years ago
|
||
Sorry for the delay here - Gabriele who is mentoring this bug is away until next week. https://firefox-source-docs.mozilla.org/toolkit/crashreporter/crashreporter/index.html has some information about the crash reporter and minidump analyzer if you're interested in some background reading material. I believe the minidump-analyzer tool gets built as part of the standard Firefox build, so if you haven't done that yet you can follow the steps at https://firefox-source-docs.mozilla.org/setup/index.html to get set up with that.
Assignee | ||
Comment 8•4 years ago
|
||
Thanks for the information. I tried accessing the raw minidump to reproduce the issue like Gabriele did but I need extra privileges I guess. Is that normal or am I doing something wrong?
Reporter | ||
Comment 9•4 years ago
|
||
Yes that is normal, the minidump usually has additional sensitive user information that we don't make public. Let me see if I can sanitize the dump or find one without any sensitive user information.
Comment 10•4 years ago
|
||
Kats: That's against our data policy. You can generate your own dump and then share that with akshay or provide instructions for akshay to generate their own dump, but you can't share a dump from Crash Stats sanitized or otherwise. Sorry!
Reporter | ||
Comment 11•4 years ago
|
||
(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #10)
Kats: That's against our data policy. You can generate your own dump and then share that with akshay or provide instructions for akshay to generate their own dump, but you can't share a dump from Crash Stats sanitized or otherwise. Sorry!
Thanks for the clarification!
I generated a dump locally that I think should reproduce the problem, and am attaching it to this bug (zipped). For reference, here are the steps I used to generate the dump (on Linux, because that's where the problem seems to manifest):
- Make a folder with non-ASCII characters. e.g.
mkdir ~/Downloads/Téléchargement
- Put a copy of Firefox in there. In my case I downloaded the July 1 nightly from here to make the next step easier
- Start this firefox instance (I used a fresh profile) and trigger a crash. With the July 1 build I did it by running with the
MOZ_WEBRENDER=1
environment variable and going to this URL which is a testcase for a crasher bug that I fixed later in July. - When the crash reporter thingy comes up, don't submit the crash dump, but instead go to your
~/.mozilla/firefox/Crash Reports/pending/
folder and look for the most recent.dmp
file.
I'm not super familiar with the minidump-analyzer code so I don't think I can help you much more beyond this but presumably feeding the dump into the analyzer will allow you to reproduce the trailing null bytes in the string, if Gabriele's hypothesis is correct.
Comment 12•4 years ago
|
||
(In reply to akshay1992kalbhor from comment #6)
If no one is assigned this bug, can I try working on it?
I have assigned the bug to you, so welcome and happy hacking!
The problem might be either in the minidump-analyzer
tool or in the code that writes out the name of the module when generating a minidump. My hunch is that the problem is in the latter.
The code that writes out the name of a module under Linux is here. It will read the module name using the LinuxDumper::GetMappingEffectiveNameAndPath()
method, then write it out using MinidumpFileWriter::WriteString()
. I guess that somewhere in there we're miscalculating the length of the string and inserting nul characters at the end. Note that the strings stored in the minidump have an explicit length which is why we're then copying those nul characters into our own strings later. If they were treated as regular C strings those would go away.
To investigate the issue do the following (I'm assuming you already managed to build Firefox, but if you didn't don't hesitate to ask for help):
- Put your
mozilla-central
checkout in a folder that contains a non-ASCII character, such as ~/tést/mozilla-central - Build Firefox as you normally would. Just make sure that your
.mozconfig
file contains theac_add_options --enable-crashreporter
option - Once built, launch firefox with
./mach run --enable-crash-reporter
- Open a new tab and load a page in it
- From the same tab navigate to
about:crashcontent
, this will crash the content process associated with the tab and trigger the affected code.
If you need any more information just let me know, I'll reply ASAP.
Comment 13•4 years ago
|
||
Here's another hint at where things might be going wrong. While reading the code I noticed that MinidumpFileWriter::WriteString()
calls MinidumpFileWriter::WriteStringCore()
. This code performs a UTF-8 to UTF-16 conversion in the CopyStringToMDString()
function, but a quick glance at the code shows that it does not take into account the length of the converted string (the UTF-16 one might have less characters than the UTF-8 one). I'm convinced that the issue is lurking there somewhere.
Assignee | ||
Comment 14•4 years ago
|
||
Thanks for all the information! I have started working on the issue and will give you an update in a day or two
Assignee | ||
Comment 15•4 years ago
|
||
So I have a few questions but a few of them or all might be stupid :)
- kats mentioned that the problem is on Linux and Mac. I am working on a Mac and was wondering if I should switch to Linux?
- I have generated the .dmp and .extra files but I could not manage to find the name of the module anywhere in the .extra file. What am I missing? What exactly should I be looking for? My source is at '~/tést/mozilla-central'. Also I tried using the minidump-analyzer tool on the .dmp file but it does not seem to generate anything.
Thanks in advance!
Comment 16•4 years ago
|
||
(In reply to akshay1992kalbhor from comment #15)
- kats mentioned that the problem is on Linux and Mac. I am working on a Mac and was wondering if I should switch to Linux?
No, mac is fine, the problem is most likely the same.
- I have generated the .dmp and .extra files but I could not manage to find the name of the module anywhere in the .extra file. What am I missing? What exactly should I be looking for? My source is at '~/tést/mozilla-central'. Also I tried using the minidump-analyzer tool on the .dmp file but it does not seem to generate anything.
The .extra file contains JSON, open it up in a graphical editor such as https://jsoneditoronline.org/ or whatever you prefer. Look for the StackTraces
field and open it up. Once in there open the modules
field, it's an array with a module entry for each element. You should find the affected module names there.
PS: Sorry for the late response, I was on PTO for a couple more days but forgot to adjust my bugzilla handle.
Comment 17•4 years ago
|
||
Did you manage to reproduce the issue on macOS? Do you need further help? Don't hesitate to ask if you're stuck.
Assignee | ||
Comment 18•4 years ago
|
||
So I managed to reproduce it and am now looking at the MinidumpFileWriter::WriteString()
and MinidumpFileWriter::WriteStringCore()
methods and figuring out how everything works! Like you said it definitely feels like the conversion between UTF-8 and UTF-16 might be the problem. Thanks!
Assignee | ||
Comment 19•4 years ago
|
||
So I think I know what the problem is. Basically the length of the UTF-8 string is calculated using a for loop which calculates the number of bytes in the UTF-8 string instead of the num of UTF-8 encoded chars. After that two bytes are reserved for every original byte in the UTF-8 string. So for ‘é’ (UTF-8: 0xC3A9, UTF-16: 0x00E9) 4 bytes are reserved but it only needs 2 and the two remaining bytes are null which we saw in the mini-dump. To confirm this I changed the name of my outer folder to ’t£st’ and determined that the module name in the minidump should have 4 null bytes at the end because ‘£’ has 3 bytes in UTF-8 and 2 bytes in UTF-16.
Extra bytes = (2 * 3) - (2) = 4
And that's what I saw in the minidump:
filename : XUL\u0000\u0000
Now I am not sure about the next step. What is the best way to solve this problem? The first thing that I thought of was to change the way the length of the UTF-8 string is calculated. Maybe there is already a function which calculates the length correctly? Any advice would be greatly appreciated! Thanks
Comment 20•4 years ago
|
||
You could use the UTF8ToUTF16Char() function to compute the length. The function returns the number of UTF-8 characters it has converted and you can compute the number of UTF-16 characters it generated by looking at the output. If the second UTF-16 is 0 then only one character was generated, otherwise two characters were generated. See how it's used here.
You can write a similar function that only computes the length of the string so you'll then use that to allocate the proper length here.
Assignee | ||
Comment 21•4 years ago
|
||
Yup. I think that will work! I am going to try that now! Thanks!
Comment 22•4 years ago
|
||
Did you have any luck fixing the issue? If you need further help don't hesitate to ask.
Assignee | ||
Comment 23•4 years ago
|
||
So I have finally managed to fix the problem. I apologise for the delay :(
I have done exactly what you asked me to in your second last comment. The next step is to clean it up a little bit ex. figure out what to name the new variables and decide if we need to delete some of the older code. I am going try to explain the problems below in detail.
-
What do we name the new variable which holds the number of uint16_t values which we need for the
mdstring
? -
Do we still need the code which calculates the value for
mdstring_length
? This mostly depends on the difference between the value of thelength
argument passed to theWriteStringCore
function and the value calculated formdstring_length
.
Comment 24•4 years ago
|
||
Put up a patch for review and we'll discuss the rest there, it's a lot simpler with the code at hand :-)
Assignee | ||
Comment 25•4 years ago
|
||
Comment 26•4 years ago
|
||
Comment 27•4 years ago
|
||
bugherder |
Description
•