Closed Bug 1660516 Opened 4 years ago Closed 4 years ago

Crash pings sometimes have null bytes after module names

Categories

(Toolkit :: Crash Reporting, defect, P3)

defect

Tracking

()

RESOLVED FIXED
84 Branch
Tracking Status
firefox84 --- fixed

People

(Reporter: kats, Assigned: akshay1992kalbhor, Mentored)

References

(Blocks 1 open bug)

Details

(Keywords: good-first-bug, Whiteboard: [lang=c++])

Attachments

(2 files)

With :wlach and :willkg I've been working on a way to symbolicate stacks from crash pings and have them accessible via a BigQuery table in STMO. While setting this up I found that some crash pings fail to symbolicate with tecken, because they have some trailing null bytes in the module name which is invalid. For example:

[{'stacks': [[[0, 75652760]]], 'memoryMap': [['libxul.so\x00\x00', '0A3B2742A91342D89C9E7EE340F619ED0']], 'version': 5}]

I'm not sure where these null bytes are coming from but we should fix this at the source so that the data is cleaner and doesn't cause symbolication failures.

Did the malformed crash pings come from main process crashes or child process crashes? We use two different mechanism to generate them, so if the issue affects only one of the two it will make the investigation easier.

Flags: needinfo?(kats)

All the ones I'm seeing seem to be from the content process (payload.process_type in the crash ping is "content").

Flags: needinfo?(kats)

For future reference https://sql.telemetry.mozilla.org/queries/74131/source has a query that picks out some of these crash pings, they all seem to be on Linux or Mac.

Looking at the code that generates the ping we feed all the values with the trailing null bytes to PathnameStripper::File(). There might be a bug lurking in there somewhere, or in the minidump writer.

Using the SHA256 checksums I found a crash report with this issue:

https://crash-stats.mozilla.org/report/index/88428fa6-d5e0-47d4-a811-b05470200812

If I process the raw minidump with the minidump-analyzer tool I reproduce the issue. The interesting bit is that the original path that leads to the trailing NUL characters contains non-ASCII characters so we might be miscalculating the length of a non-ASCII string: there's one trailing NULL character for every non-ASCII character in the original path.

Severity: -- → S3
Priority: -- → P3

This is a simple enough issue for someone who wants to try and contribute. The amount of code involved is very small and reproducing the bug is trivial.

Mentor: gsvelto
Keywords: good-first-bug
Whiteboard: [lang=c++]

If no one is assigned this bug, can I try working on it?

Sorry for the delay here - Gabriele who is mentoring this bug is away until next week. https://firefox-source-docs.mozilla.org/toolkit/crashreporter/crashreporter/index.html has some information about the crash reporter and minidump analyzer if you're interested in some background reading material. I believe the minidump-analyzer tool gets built as part of the standard Firefox build, so if you haven't done that yet you can follow the steps at https://firefox-source-docs.mozilla.org/setup/index.html to get set up with that.

Thanks for the information. I tried accessing the raw minidump to reproduce the issue like Gabriele did but I need extra privileges I guess. Is that normal or am I doing something wrong?

Yes that is normal, the minidump usually has additional sensitive user information that we don't make public. Let me see if I can sanitize the dump or find one without any sensitive user information.

Kats: That's against our data policy. You can generate your own dump and then share that with akshay or provide instructions for akshay to generate their own dump, but you can't share a dump from Crash Stats sanitized or otherwise. Sorry!

(In reply to Will Kahn-Greene [:willkg] ET needinfo? me from comment #10)

Kats: That's against our data policy. You can generate your own dump and then share that with akshay or provide instructions for akshay to generate their own dump, but you can't share a dump from Crash Stats sanitized or otherwise. Sorry!

Thanks for the clarification!

I generated a dump locally that I think should reproduce the problem, and am attaching it to this bug (zipped). For reference, here are the steps I used to generate the dump (on Linux, because that's where the problem seems to manifest):

  1. Make a folder with non-ASCII characters. e.g. mkdir ~/Downloads/Téléchargement
  2. Put a copy of Firefox in there. In my case I downloaded the July 1 nightly from here to make the next step easier
  3. Start this firefox instance (I used a fresh profile) and trigger a crash. With the July 1 build I did it by running with the MOZ_WEBRENDER=1 environment variable and going to this URL which is a testcase for a crasher bug that I fixed later in July.
  4. When the crash reporter thingy comes up, don't submit the crash dump, but instead go to your ~/.mozilla/firefox/Crash Reports/pending/ folder and look for the most recent .dmp file.

I'm not super familiar with the minidump-analyzer code so I don't think I can help you much more beyond this but presumably feeding the dump into the analyzer will allow you to reproduce the trailing null bytes in the string, if Gabriele's hypothesis is correct.

(In reply to akshay1992kalbhor from comment #6)

If no one is assigned this bug, can I try working on it?

I have assigned the bug to you, so welcome and happy hacking!

The problem might be either in the minidump-analyzer tool or in the code that writes out the name of the module when generating a minidump. My hunch is that the problem is in the latter.

The code that writes out the name of a module under Linux is here. It will read the module name using the LinuxDumper::GetMappingEffectiveNameAndPath() method, then write it out using MinidumpFileWriter::WriteString(). I guess that somewhere in there we're miscalculating the length of the string and inserting nul characters at the end. Note that the strings stored in the minidump have an explicit length which is why we're then copying those nul characters into our own strings later. If they were treated as regular C strings those would go away.

To investigate the issue do the following (I'm assuming you already managed to build Firefox, but if you didn't don't hesitate to ask for help):

  • Put your mozilla-central checkout in a folder that contains a non-ASCII character, such as ~/tést/mozilla-central
  • Build Firefox as you normally would. Just make sure that your .mozconfig file contains the ac_add_options --enable-crashreporter option
  • Once built, launch firefox with ./mach run --enable-crash-reporter
  • Open a new tab and load a page in it
  • From the same tab navigate to about:crashcontent, this will crash the content process associated with the tab and trigger the affected code.

If you need any more information just let me know, I'll reply ASAP.

Assignee: nobody → akshay1992kalbhor
Status: NEW → ASSIGNED

Here's another hint at where things might be going wrong. While reading the code I noticed that MinidumpFileWriter::WriteString() calls MinidumpFileWriter::WriteStringCore(). This code performs a UTF-8 to UTF-16 conversion in the CopyStringToMDString() function, but a quick glance at the code shows that it does not take into account the length of the converted string (the UTF-16 one might have less characters than the UTF-8 one). I'm convinced that the issue is lurking there somewhere.

Thanks for all the information! I have started working on the issue and will give you an update in a day or two

So I have a few questions but a few of them or all might be stupid :)

  1. kats mentioned that the problem is on Linux and Mac. I am working on a Mac and was wondering if I should switch to Linux?
  2. I have generated the .dmp and .extra files but I could not manage to find the name of the module anywhere in the .extra file. What am I missing? What exactly should I be looking for? My source is at '~/tést/mozilla-central'. Also I tried using the minidump-analyzer tool on the .dmp file but it does not seem to generate anything.

Thanks in advance!

(In reply to akshay1992kalbhor from comment #15)

  1. kats mentioned that the problem is on Linux and Mac. I am working on a Mac and was wondering if I should switch to Linux?

No, mac is fine, the problem is most likely the same.

  1. I have generated the .dmp and .extra files but I could not manage to find the name of the module anywhere in the .extra file. What am I missing? What exactly should I be looking for? My source is at '~/tést/mozilla-central'. Also I tried using the minidump-analyzer tool on the .dmp file but it does not seem to generate anything.

The .extra file contains JSON, open it up in a graphical editor such as https://jsoneditoronline.org/ or whatever you prefer. Look for the StackTraces field and open it up. Once in there open the modules field, it's an array with a module entry for each element. You should find the affected module names there.

PS: Sorry for the late response, I was on PTO for a couple more days but forgot to adjust my bugzilla handle.

Did you manage to reproduce the issue on macOS? Do you need further help? Don't hesitate to ask if you're stuck.

Flags: needinfo?(akshay1992kalbhor)

So I managed to reproduce it and am now looking at the MinidumpFileWriter::WriteString() and MinidumpFileWriter::WriteStringCore() methods and figuring out how everything works! Like you said it definitely feels like the conversion between UTF-8 and UTF-16 might be the problem. Thanks!

Flags: needinfo?(akshay1992kalbhor)

So I think I know what the problem is. Basically the length of the UTF-8 string is calculated using a for loop which calculates the number of bytes in the UTF-8 string instead of the num of UTF-8 encoded chars. After that two bytes are reserved for every original byte in the UTF-8 string. So for ‘é’ (UTF-8: 0xC3A9, UTF-16: 0x00E9) 4 bytes are reserved but it only needs 2 and the two remaining bytes are null which we saw in the mini-dump. To confirm this I changed the name of my outer folder to ’t£st’ and determined that the module name in the minidump should have 4 null bytes at the end because ‘£’ has 3 bytes in UTF-8 and 2 bytes in UTF-16.

Extra bytes = (2 * 3) - (2) = 4
And that's what I saw in the minidump:
filename : XUL\u0000\u0000

Now I am not sure about the next step. What is the best way to solve this problem? The first thing that I thought of was to change the way the length of the UTF-8 string is calculated. Maybe there is already a function which calculates the length correctly? Any advice would be greatly appreciated! Thanks

You could use the UTF8ToUTF16Char() function to compute the length. The function returns the number of UTF-8 characters it has converted and you can compute the number of UTF-16 characters it generated by looking at the output. If the second UTF-16 is 0 then only one character was generated, otherwise two characters were generated. See how it's used here.

You can write a similar function that only computes the length of the string so you'll then use that to allocate the proper length here.

Yup. I think that will work! I am going to try that now! Thanks!

Did you have any luck fixing the issue? If you need further help don't hesitate to ask.

Flags: needinfo?(akshay1992kalbhor)

So I have finally managed to fix the problem. I apologise for the delay :(

I have done exactly what you asked me to in your second last comment. The next step is to clean it up a little bit ex. figure out what to name the new variables and decide if we need to delete some of the older code. I am going try to explain the problems below in detail.

  1. What do we name the new variable which holds the number of uint16_t values which we need for the mdstring?

  2. Do we still need the code which calculates the value for mdstring_length? This mostly depends on the difference between the value of thelength argument passed to the WriteStringCore function and the value calculated for mdstring_length.

Flags: needinfo?(akshay1992kalbhor)

Put up a patch for review and we'll discuss the rest there, it's a lot simpler with the code at hand :-)

Pushed by gsvelto@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/9f922038249f Crash pings sometimes have null bytes after module names. r=gsvelto
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 84 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: