Closed Bug 1872261 Opened 1 year ago Closed 3 months ago

Crash in [@ RedBlackTree<T>::TreeNode::SetColor] with Trend Micro

Categories

(External Software Affecting Firefox :: Other, defect)

Tracking

(firefox-esr115 wontfix, firefox-esr140 wontfix, firefox143+ fixed, firefox144 fixed, firefox145 fixed)

RESOLVED FIXED
145 Branch
Tracking Status
firefox-esr115 --- wontfix
firefox-esr140 --- wontfix
firefox143 + fixed
firefox144 --- fixed
firefox145 --- fixed

People

(Reporter: merlino37, Assigned: gstoll)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: crash, regression, topcrash)

Crash Data

Attachments

(3 files)

Crash report: https://crash-stats.mozilla.org/report/index/a7e6eda1-46ca-42c2-a566-1206c0231228

MOZ_CRASH Reason: MOZ_RELEASE_ASSERT(mNode)

Top 10 frames of crashing thread:

0  firefox-bin  RedBlackTree<arena_chunk_map_t, ArenaRunTreeTrait>::TreeNode::SetColor  memory/build/rb.h:182
0  firefox-bin  RedBlackTree<arena_chunk_map_t, ArenaRunTreeTrait>::MoveRedRight  memory/build/rb.h:636
0  firefox-bin  RedBlackTree<arena_chunk_map_t, ArenaRunTreeTrait>::Remove  memory/build/rb.h:533
0  firefox-bin  RedBlackTree<arena_chunk_map_t, ArenaRunTreeTrait>::Remove  memory/build/rb.h:137
0  firefox-bin  arena_t::DallocSmall  memory/build/mozjemalloc.cpp:3702
0  firefox-bin  arena_dalloc  memory/build/mozjemalloc.cpp:3781
0  firefox-bin  BaseAllocator::free  memory/build/mozjemalloc.cpp:4591
0  firefox-bin  Allocator<MozJemallocBase>::free  memory/build/malloc_decls.h:54
0  firefox-bin  free  memory/build/malloc_decls.h:54
1  libxul.so  mozilla::layers::CompositableClient::Release  gfx/layers/client/CompositableClient.h:75

The same bug occurred in the previous 5 versions.

The Bugbug bot thinks this bug should belong to the 'Core::Graphics' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: General → Graphics
Product: Firefox → Core
Crash Signature: [@ RedBlackTree<T>::TreeNode::SetColor ]
Keywords: crash

I looked at about 10 crashes with this signature, none of them had the same stack as the linked crash. There were many distinct stacks, some not in graphics. There are likely multiple different problems under this signature.

The bug has a crash signature, thus the bug will be considered confirmed.

Status: UNCONFIRMED → NEW
Ever confirmed: true

This feels like the class of bug that might be background-level memory corruption.
I think this is probably not a graphics bug; rather, many stacks will contain graphics frames because graphics code is pervasive.

Component: Graphics → General
Severity: -- → S3
Component: General → Memory Allocator

The bug is linked to a topcrash signature, which matches the following criteria:

  • Top 20 desktop browser crashes on beta (startup)
  • Top 10 content process crashes on beta

:glandium, could you consider increasing the severity of this top-crash bug?

For more information, please visit BugBot documentation.

Flags: needinfo?(mh+mozilla)

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

Is it known why the crash volume spiked for Firefox 143 builds?

Crash Signature: [@ RedBlackTree<T>::TreeNode::SetColor ] → [@ _malloc_message] [@ RedBlackTree<T>::TreeNode::SetColor ]

The bug is marked as tracked for firefox143 (release). However, the bug still isn't assigned and has low severity.

:jstutte, could you please find an assignee and increase the severity for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.

Flags: needinfo?(jstutte)

This could be some external DLL thing causing issues? I looked at the correlations tab for release and it has this:

(100.0% in signature vs 01.88% overall) moz_crash_reason = MOZ_RELEASE_ASSERT(mNode)
(100.0% in signature vs 02.57% overall) Module "TmUmEvt64.dll" = true
(100.0% in signature vs 02.57% overall) Module "tmmon64.dll" = true
(100.0% in signature vs 25.69% overall) Module "bcryptprimitives.dll" = true

Google says that tmmon64.dll is associated with Trend Micro UMH Monitor Engine, and that TmUmEvt64.dll "belongs to AMSP UMH module".
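
For readers unfamiliar with the correlations tab, each line compares how often a property appears in crash reports with this signature against how often it appears in all crash reports. The following is a minimal sketch of that comparison, assuming a simplified report format; the `reports` records and the helper names are made up for illustration and are not real crash-stats data or its API.

```python
def share(reports, predicate):
    """Return the percentage of reports for which predicate(report) is true."""
    if not reports:
        return 0.0
    return 100.0 * sum(1 for r in reports if predicate(r)) / len(reports)


def correlation(all_reports, signature, predicate):
    """Compare a property's frequency within one signature against all reports."""
    in_signature = [r for r in all_reports if r["signature"] == signature]
    return share(in_signature, predicate), share(all_reports, predicate)


# Made-up sample records (assumption), not real crash-stats data.
SIG = "RedBlackTree<T>::TreeNode::SetColor"
reports = [
    {"signature": SIG, "modules": {"tmmon64.dll", "TmUmEvt64.dll"}},
    {"signature": SIG, "modules": {"tmmon64.dll", "TmUmEvt64.dll", "bcryptprimitives.dll"}},
    {"signature": "OOM | small", "modules": {"bcryptprimitives.dll"}},
    {"signature": "OOM | small", "modules": set()},
]

in_sig, overall = correlation(reports, SIG, lambda r: "tmmon64.dll" in r["modules"])
print(f'({in_sig:.1f}% in signature vs {overall:.2f}% overall) Module "tmmon64.dll" = true')
```

A property that is present in every report for the signature but rare overall, as with the two Trend Micro modules here, is a strong hint that it is involved in the crash.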

Component: Memory Allocator → Other
Product: Core → External Software Affecting Firefox
Summary: Crash in [@ RedBlackTree<T>::TreeNode::SetColor] → Crash in [@ RedBlackTree<T>::TreeNode::SetColor] with Trend Micro
Assignee: nobody → gstoll
Status: NEW → ASSIGNED

firefox-beta Uplift Approval Request

  • User impact if declined: Trend Micro users will experience tab crashes
  • Code covered by automated testing: no
  • Fix verified in Nightly: no
  • Needs manual QE test: yes
  • Steps to reproduce for manual QE testing: Install Trend Micro trial version and verify normal browsing works as expected
  • Risk associated with taking this patch: low
  • Explanation of risk level: Just a content process block of Trend Micro DLLs
  • String changes made/needed: no
  • Is Android affected?: no
Attachment #9514153 - Flags: approval-mozilla-beta?
Flags: qe-verify+
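
As a rough illustration of what "a content process block" means, the sketch below models a blocklist that rejects the two Trend Micro DLLs only in content (tab) processes and leaves other process types alone. This is a hypothetical model: the names `BLOCKED_IN_CONTENT` and `should_block` and the process-type strings are assumptions for illustration, not Firefox's actual blocklist definitions.

```python
# Hypothetical model of a per-process DLL blocklist; these names are
# illustrative and are not Firefox's real blocklist definitions.
BLOCKED_IN_CONTENT = {"tmmon64.dll", "tmumevt64.dll"}


def should_block(dll_name: str, process_type: str) -> bool:
    """Block listed DLLs only in content ("tab") processes.

    Windows file names are case-insensitive, so names are compared lowercased.
    """
    return process_type == "content" and dll_name.lower() in BLOCKED_IN_CONTENT


for process_type in ("main", "content"):
    for dll in ("TmUmEvt64.dll", "tmmon64.dll", "psapi.dll"):
        print(process_type, dll, should_block(dll, process_type))
```

Scoping the block to content processes is what keeps the risk low: the DLLs can still load in the main process, so only the process type where the crashes occur is affected.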

firefox-release Uplift Approval Request

  • User impact if declined: Trend Micro users will experience tab crashes
  • Code covered by automated testing: no
  • Fix verified in Nightly: no
  • Needs manual QE test: yes
  • Steps to reproduce for manual QE testing: Install Trend Micro trial version and verify normal browsing works as expected
  • Risk associated with taking this patch: low
  • Explanation of risk level: Just a content process block of Trend Micro DLLs
  • String changes made/needed: no
  • Is Android affected?: no
Attachment #9514154 - Flags: approval-mozilla-release?
Attachment #9514154 - Attachment description: Bug 1872261 - block Trend Micro DLLs in the content process r=yjuglaret! → Bug 1872261 - block Trend Micro DLLs in the content process r=yjuglaret
Attachment #9514153 - Attachment description: Bug 1872261 - block Trend Micro DLLs in the content process r=yjuglaret! → Bug 1872261 - block Trend Micro DLLs in the content process r=yjuglaret
Attachment #9514153 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9514154 - Flags: approval-mozilla-release? → approval-mozilla-release+
Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → 145 Branch

We attempted to reproduce the issue using Firefox 143.0 on both Windows 10 and 11, with Trend Micro Antivirus+ (v. 17.8.1476) installed. While browsing popular webpages and performing copy-paste actions across various websites, we were unable to reproduce the crash.

We also tested the above-mentioned scenarios with Firefox 143.0.1 and Firefox 144.0b3 (treeherder build from Comment 18) under the same setup on both Windows 10 and 11, verifying that tmmon64.dll and TmUmEvt64.dll were loaded in about:third-party with the Occurrences value “1”.
Additionally, we noticed that sometimes the “Block this module” option appears in about:third-party only after refreshing the page. @Greg, is this expected?

Testing with the 143.0.1 RC (also with tmmon64.dll and TmUmEvt64.dll blocked) and the 144.0b3 build also did not result in any crashes. However, since we were unable to reproduce the issue in the first place, we are unable to mark this bug as verified.

Flags: qe-verify+ → needinfo?(gstoll)
QA Contact: bhidecuti

(In reply to Bianca Hidecuti, Desktop Test Engineering [:bhidecuti] from comment #21)

Additionally, we noticed that sometimes the “Block this module” option appears in about:third-party only after refreshing the page. @Greg, is this expected?

Yes, this is expected. At the same time that the option appears, you should see a button to the far right containing a down arrow that looks like a v. You can use that button to get more info about a specific DLL entry. Here you could use this to confirm what processes the faulty DLLs are loaded into. The patch is working if there is no "Tab" process listed with status "Loaded" for these two DLLs, even after you load some web pages in some tabs.

Flags: needinfo?(gstoll)

Cleaning up earlier needinfos.

Flags: needinfo?(pbone)
Flags: needinfo?(mh+mozilla)
Flags: needinfo?(jstutte)

(In reply to Yannis Juglaret [:yannis] from comment #22)

(In reply to Bianca Hidecuti, Desktop Test Engineering [:bhidecuti] from comment #21)

Additionally, we noticed that sometimes the “Block this module” option appears in about:third-party only after refreshing the page. @Greg, is this expected?

Yes, this is expected. At the same time that the option appears, you should see a button to the far right containing a down arrow that looks like a v. You can use that button to get more info about a specific DLL entry. Here you could use this to confirm what processes the faulty DLLs are loaded into. The patch is working if there is no "Tab" process listed with status "Loaded" for these two DLLs, even after you load some web pages in some tabs.

Thank you for the reply! I can confirm that there is no "Tab" process listed after loading different webpages (only a "Main" process with the status "Loaded").
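
The verification criterion from comment 22 can be written down as a small check. This is only a sketch: the dictionary layout used for the about:third-party details (`dll`, `process`, `status`) is a hypothetical representation chosen for illustration, not a format that Firefox exports.

```python
# Hypothetical representation of the per-DLL details shown in about:third-party.
def patch_is_working(entries, dlls=("tmmon64.dll", "TmUmEvt64.dll")):
    """Return True if none of the given DLLs is loaded in a "Tab" process."""
    watched = {name.lower() for name in dlls}
    return not any(
        e["dll"].lower() in watched
        and e["process"] == "Tab"
        and e["status"] == "Loaded"
        for e in entries
    )


# Matches the observation above: only a "Main" process with status "Loaded".
observed = [
    {"dll": "tmmon64.dll", "process": "Main", "status": "Loaded"},
    {"dll": "TmUmEvt64.dll", "process": "Main", "status": "Loaded"},
]
print(patch_is_working(observed))
```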

Adding some STR for documentation. They seem to reliably reproduce the issue for me with bad builds. Essentially, open many tabs, wait for a while, and continue using the browser. The number of tabs and time to wait may need to be adjusted per machine.

STR:

  • install Trend Micro Maximum Security (trial version in my case);
  • open Firefox, double-check the presence of Trend Micro DLLs in about:third-party;
  • install the URLs List addon;
  • copy "many" URLs from the top 1000 domains list (for me 100 is a good number, 1000 works for sure, and 25 seems to be rather reliable);
  • paste into the URLs List addon and click Open;
  • leave the computer unattended for 20 minutes to 1 hour (for me 30 min is a good number);
  • navigate between the existing tabs, open new tabs.

Expected behavior: normal navigation.

Bad behavior: at least one of the new tabs or old tabs crashes.

Has STR: --- → yes
Blocks: 1990299

Following mozregression and an initial suspicion by :mccr8, I'm marking bug 1970638 as a regressor.

Since the landing of the patch in bug 1970638, after the main thread of a sandboxed content process calls RevertToSelf to start the sandbox, it remains possible to load well-known DLLs. Before that patch, loading any well-known DLL would fail once the sandbox is started.

As noted by :bobowen, the Trend Micro DLLs depend on psapi.dll, which is a well-known DLL that Firefox itself does not load during content process initialization. And based on debugging, the Trend Micro DLLs load on the main thread after Trend Micro queues a user-mode APC.

We suspect that there could be a race condition between the non-deterministic point where the user-mode APC gets called and the call to RevertToSelf that starts the sandbox. More precisely, the crashes could be occurring when the DLLs successfully load after the sandbox is started. Before bug 1970638, the DLLs would fail to load if the sandbox was already started, because of their dependency on psapi.dll. After bug 1970638, they will successfully load, and they might not expect to be running their initialization code in a sandboxed environment where some initialization calls would fail, ultimately causing crashes.

We have not confirmed this theory, but it sounds like a good starting point for investigation if Trend Micro wants to address this issue in the DLLs directly.
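
The suspected race can be summarized as a small decision model. This is a sketch of the unconfirmed theory above, not of actual Firefox or Trend Micro code: the event names, the outcome strings, and the `dll_load_outcome` helper are all assumptions made for illustration.

```python
from itertools import permutations

# The two main-thread events whose relative order is non-deterministic.
EVENTS = ("apc_runs", "sandbox_starts")


def dll_load_outcome(order, well_known_loadable_after_sandbox):
    """Model what happens when the queued APC tries to load the DLLs.

    order: the sequence in which the two events occur on the main thread.
    well_known_loadable_after_sandbox: False models the behavior before
    bug 1970638, True models the behavior after it.
    """
    sandboxed = order.index("sandbox_starts") < order.index("apc_runs")
    if not sandboxed:
        return "loaded before sandbox"
    if well_known_loadable_after_sandbox:
        # Suspected crash scenario: initialization runs in a sandboxed process.
        return "loaded inside sandbox"
    # The psapi.dll dependency cannot be satisfied, so the load fails.
    return "load failed"


for after_regressor in (False, True):
    for order in permutations(EVENTS):
        print(after_regressor, order, dll_load_outcome(order, after_regressor))
```

In this model only one of the four combinations, the APC running after the sandbox starts with well-known DLLs still loadable, leads to the DLLs initializing inside the sandbox, which would be consistent with crashes that are intermittent and that only appeared after bug 1970638.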

No longer blocks: 1990299
Keywords: regression
Regressed by: 1970638
Blocks: 1990299
QA Whiteboard: [qa-investig-done-c145/b144]
No longer blocks: 1865569
See Also: → 1865569
