Closed Bug 1323150 Opened 8 years ago Closed 8 years ago

Crash [@ ReadDBEntry ]

Categories

(NSS :: Libraries, defect)

x86
Unspecified
defect
Not set
critical

Tracking

(firefox51+ wontfix, firefox52- fixed, firefox53+ fixed)

RESOLVED FIXED
Tracking Status
firefox51 + wontfix
firefox52 - fixed
firefox53 + fixed

People

(Reporter: cbook, Assigned: franziskus)

References

()

Details

(Keywords: crash, Whiteboard: [adv-main52+])

Attachments

(2 files)

Attached file bughunter stack
found via bughunter and reproduced on latest windows aurora opt build (also bughunter reports this for trunk -> beta on opt builds for windows). Seems a windows only crash Steps to reproduce -> Load http://jobs.work-in-bavaria.de/ --> Crash Bughunter flagged this exploitable high. Opt crash id : https://crash-stats.mozilla.com/report/index/4802573c-18ef-452d-adac-e00b22161213
[Tracking Requested - why for this release]: Bughunter opt crash on real world website Franziskus, Kai, i guess this something for you guys. Could you take a look ?
Flags: needinfo?(kaie)
Flags: needinfo?(franziskuskiefer)
Track 51+ as this affected opt build.
Makes sense to track this reproducible crash for 52/53.
I could not repro with a x86_64 build on Windows or Linux, x86 only maybe?
Hardware: Unspecified → x86
Group: core-security → crypto-core-security
This bug was reported on Dec 13. The most recent change of NSS on aurora was an update to this NSS revision: https://hg.mozilla.org/projects/nss/rev/6c26f0cd19ba The uplift was made on Nov 28: https://hg.mozilla.org/releases/mozilla-aurora/file/071d92d89379/security/nss/TAG-INFO https://bugzilla.mozilla.org/show_bug.cgi?id=1305970#c30 If the NSS change is responsible for this crash, then all builds between Nov 29 and Dec 13 should crash, can this be confirmed? Can it be confirmed that Aurora builds before Nov 28 didn't crash? Before Nov 28, aurora was using NSS revision 5d2424d699a0 from Nov 11: https://hg.mozilla.org/projects/nss/rev/5d2424d699a0 https://hg.mozilla.org/releases/mozilla-aurora/file/3f1dcd04bc47/security/nss/TAG-INFO If the above assumptions are correct, the code causing the crash must have landed between the above two NSS revisions. That was a rather big diff, 14000 lines, to view it run (inside an NSS tree): hg diff -r 5d2424d699a0 -r 6c26f0cd19ba However, at first glance, I cannot see any changes to the code that's at the top in the call stack (dbm/softokn) If the above NSS revision really introduced the crash, it must have been caused by the changes to lib/ssl
Flags: needinfo?(kaie)
Carsten, would be able to help confirming which builds crash, and which don't, see my previous comment? Thank you.
Flags: needinfo?(cbook)
I'm looking into this. I can produce a crash with that site. A different one though. There's definitely something going on here that we have to fix.
seeem franziskus got a reproducible crash now , so letting him comment
Flags: needinfo?(cbook)
I'm not getting a crash on trunk. The cert has lots of subjectAltNames. Those that I tested don't show the same issue for me either. There are changes to certificate handling in the range that Kai provided, but this is a TLS 1.2 server and those changes are TLS 1.3 only. Franziskus, I would look at the changes to ssl3_CompleteHandleCertificate() to start with.
This crashes reproducibly on Win10 32-bit. I don't think this is caused by any changes we did. The intermediate certificate on the page is new (Dec 2) and as mt points out pretty large. This seems to be a bug in the legacydb. Trying to map the file to memory at [1] fails. One solution would be to always emulate the mapping here, which seems to solve the issue for me. I'm not entirely sure yet what exactly is going wrong here and there might be a better fix for this but I don't think it's worth spending too much time on this considering that it's the legacydb (Firefox should finally do the switch). [1] https://searchfox.org/nss/rev/8f7b205fa456a79d9b387ccf1717c1a8c64c36b1/lib/softoken/legacydb/dbmshim.c#347
Assignee: nobody → nobody
Component: Security: PSM → Libraries
Flags: needinfo?(franziskuskiefer)
Product: Core → NSS
Version: unspecified → trunk
Attached patch bug1323150.patchSplinter Review
This is the fix I'd suggest. This might influence performance though as we always read the file instead of mapping it to memory. Unfortunately I can't find the reason this is failing and only handle that specific case. I think input from Bob or Wan-Teh could help here.
Assignee: nobody → franziskuskiefer
Attachment #8822144 - Flags: feedback?(wtc)
Attachment #8822144 - Flags: feedback?(rrelyea)
Attachment #8822144 - Flags: feedback?(martin.thomson)
Comment on attachment 8822144 [details] [diff] [review] bug1323150.patch Review of attachment 8822144 [details] [diff] [review]: ----------------------------------------------------------------- It's possible that we're running into memory limits. win32 in particular is hard on RAM and adding a large memory mapped file to that probably chews up a lot. Let's see what Bob has to add.
Comment on attachment 8822144 [details] [diff] [review] bug1323150.patch Review of attachment 8822144 [details] [diff] [review]: ----------------------------------------------------------------- ::: lib/softoken/legacydb/dbmshim.c @@ +333,5 @@ > goto loser; > } > > len = dbs_getBlobSize(data); > + /* PR_MemMap fails on Windows for larger certificates. Let's always use We should at least add a link to this bug report in the comment "PR_MemMap fails on Windows for larger certificates." What is the return value of PR_GetError() after PR_MemMap() returns NULL? This info should be in the bug report. It seems that this is the paging file size limitation mentioned in MSDN: https://msdn.microsoft.com/en-us/library/windows/desktop/aa366761(v=vs.85).aspx If a file mapping object is backed by the paging file (CreateFileMapping is called with the hFile parameter set to INVALID_HANDLE_VALUE), the paging file must be large enough to hold the entire mapping. If it is not, MapViewOfFile fails. The initial contents of the pages in a file mapping object backed by the paging file are 0 (zero). It would be good to confirm that.
Comment on attachment 8822144 [details] [diff] [review] bug1323150.patch Review of attachment 8822144 [details] [diff] [review]: ----------------------------------------------------------------- ::: lib/softoken/legacydb/dbmshim.c @@ +335,5 @@ > > len = dbs_getBlobSize(data); > + /* PR_MemMap fails on Windows for larger certificates. Let's always use > + * the emulated map, i.e. read the file. */ > + addr = dbs_EmulateMap(filed, len); yes, I wondered what WTC does as well. My first thought is to simply always fall back to dbs_EmulateMap() if the map failed (so the majority of the cases we still get the "benefit" of file mapping). OTOH, this is yet another reason to avoid legacydb. It's a time bomb ready to go off (the code is question predates even me, so I have no idea why we are using memory mapping rather than just reading the file). bob
> We should at least add a link to this bug report in the comment "PR_MemMap fails on Windows for larger certificates." yep! > What is the return value of PR_GetError() after PR_MemMap()returns NULL? This info should be in the bug report. > simply always fall back to dbs_EmulateMap() if the map failed The problem is that PR_MemMap does NOT return NULL and doesn't set an error code (not entirely sure about the error code but I don't have the right VM with me to check). That's why I have trouble catching an error and only emulate the map in the error case. > OTOH, this is yet another reason to avoid legacydb. It's a time bomb ready to go off (the code is question predates even me, so I have no idea why we are using memory mapping rather than just reading the file). I totally agree. We should press Firefox harder to get off the legacydb.
Comment on attachment 8822144 [details] [diff] [review] bug1323150.patch Review of attachment 8822144 [details] [diff] [review]: ----------------------------------------------------------------- OK, it looks like you have the feedback you need. When this change lands in m-c, it might pay to watch for Talos regressions though. Please include the requested comments, and maybe a link to the MSDN article Wan-Teh found.
Attachment #8822144 - Flags: feedback?(martin.thomson) → review+
Comment on attachment 8822144 [details] [diff] [review] bug1323150.patch feedback given, patch looks fine based on Franziscus's comments.
Attachment #8822144 - Flags: feedback?(rrelyea) → feedback+
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → 3.29
Too late for 51 at this point. Are we going to want to backport an updated NSS release to 52 for this? Also, can we assign this bug a security rating?
Flags: needinfo?(franziskuskiefer)
Group: crypto-core-security → core-security-release
I don't see a big security risk here. The bug crashes firefox because of Win32 memory limits (which might be a reason to uplift). This landed in 3.29, so will be in FF53. If we want this in 52 and 51, we could make another point release of 3.28 (which is in both versions).
Flags: needinfo?(jcristau)
Flags: needinfo?(gchang)
Flags: needinfo?(franziskuskiefer)
How about we keep this one in mind if we need to bump nss in 52 for another reason, but don't do a point release just for this?
Flags: needinfo?(jcristau)
Flags: needinfo?(gchang)
Tracking "-" this one since it's now tagged as fix-optional. Please let me know if you have any concerns.
Blocks: 1337580
Whiteboard: [adv-main52+]
Group: core-security-release

dbs_EmulateMap() causes a memory leak. Before this bug fix, dbs_EmulateMap() was only used on platforms that don't support memory mapping files. However, after this bug fix, dbs_EmulateMap() is used unconditionally, which leads to this memory leak applying to all systems that use legacydb.

More specifically:
Memory for the "data" DBT object that is populated by db->get() is normally managed by the DB library, and callers of db->get() do not need to free this memory. However, dbs_EmulateMap() in dbmshim.c uses PORT_Alloc() to allocate memory for the blob data, then returns this data from a wrapper that emulates db->get(). After that, nothing ever frees this memory; The caller assumes that this memory is managed by the library and doesn't need to be freed, the underlying database library has no knowledge of this memory allocated to hold the blob, and nothing in dbmshim.c ever subsequently frees it.

Given that the legacydb format is effectively dead, I'm not advocating that this memory leak be fixed ... But since I just wasted an afternoon hunting it down, I wanted to throw this out there somewhere that others might be able to find it.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: