Closed Bug 1887833 Opened 9 months ago Closed 9 months ago

Crash in [@ mozilla::dom::cache::db::(anonymous namespace)::DeleteSecurityInfo]

Categories

(Core :: Storage: Cache API, defect, P2)

x86
Windows
defect

Tracking

()

RESOLVED FIXED
126 Branch
Tracking Status
firefox-esr115 --- unaffected
firefox124 --- wontfix
firefox125 --- wontfix
firefox126 --- fixed

People

(Reporter: aryx, Assigned: asuth)

References

(Blocks 1 open bug)

Details

(Keywords: crash, topcrash)

Crash Data

Attachments

(1 file, 1 obsolete file)

This affects only early betas (b1-b6). <20 crashes for <=6 installs per beta affected. 85% of crashes with 32-bit builds.

Crash report: https://crash-stats.mozilla.org/report/index/c364aa86-3e6e-4370-8a9a-4f9900240326

MOZ_CRASH Reason: MOZ_DIAGNOSTIC_ASSERT(refcount >= aCount)

Top 10 frames of crashing thread:

0  xul.dll  mozilla::dom::cache::db::  dom/cache/DBSchema.cpp:1649
0  xul.dll  mozilla::dom::cache::db::  dom/cache/DBSchema.cpp:1687
1  xul.dll  mozilla::dom::cache::db::DeleteCacheId  dom/cache/DBSchema.cpp:696
2  xul.dll  mozilla::dom::cache:: const  dom/cache/Manager.cpp:129
2  xul.dll  mozilla::ReduceEach<`lambda at /builds/worker/workspace/obj-build/dist/include/mozilla/dom/quota/QuotaCommon.h:1218:7', mozilla::CheckedInt<long long>, `lambda at /builds/worker/checkouts/gecko/dom/cache/Manager.cpp:129:7'>::<lambda_1>::operator const  dom/quota/QuotaCommon.h:1197
2  xul.dll  mozilla::CollectEach  dom/quota/QuotaCommon.h:1180
2  xul.dll  mozilla::ReduceEach  dom/quota/QuotaCommon.h:1197
2  xul.dll  mozilla::Reduce  dom/quota/QuotaCommon.h:1217
2  xul.dll  mozilla::dom::cache::  dom/cache/Manager.cpp:129
3  xul.dll  mozilla::dom::cache::SyncDBAction::RunWithDBOnTarget  dom/cache/DBAction.cpp:140

FWIW this seems to happen only with Windows 32 Bit builds.

OS: All → Windows
Hardware: All → x86
Severity: -- → S3
Priority: -- → P2

The bug is linked to a topcrash signature, which matches the following criterion:

  • Top 20 desktop browser crashes on beta

:janv, could you consider increasing the severity of this top-crash bug?

For more information, please visit BugBot documentation.

Flags: needinfo?(jvarga)
Keywords: topcrash

Andrew, can you help with deciding if this should be S2 ?

Flags: needinfo?(jvarga) → needinfo?(bugmail)

My understanding is that this is just a diagnostic assertion and this crash will go away, as comment 0 says. But in release we would write negative values to our DB, IIUC. I do not know what consequences this would have, but it sounds like something will never be deleted from the table.

FWIW, I do not think that our recent change to DBAction::RunOnTarget affects things here, as the crash arrives inside RunWithDBOnTarget, which is beyond that check, which once true will be true for all callers.

(In reply to Jens Stutte [:jstutte] from comment #1)

FWIW this seems to happen only with Windows 32 Bit builds.

Unfortunately expanding the time range to 6 months shows Linux and Mac as well as amd64 and arm64, although the crashes are definitely biased towards these older machines.

A notable thing about the canonical crash https://crash-stats.mozilla.org/report/index/c364aa86-3e6e-4370-8a9a-4f9900240326 is that it is taking place during the cleanup of an unclean shutdown.

I don't know if this link encodes what I want it to, but the core idea is that faceting based on proto signature can let us tell what the high level action was. And we get:

  • Marker cleanup: 132 (66%)
    • SetupAction calling DeleteCacheId: 132 = (99 + 23 + 4 + 3 + 3)
  • Active usage: 68 (34%)
    • CachePutAllAction 45 = (25 + 14 + 6)
    • CacheDeleteAction 14 = (9 + 5)
    • DeleteOrphanedCacheAction 9 = (6 + 1 + 1 + 1)

So this is definitely catching an inconsistency and the possible options from more likely to less likely (where there can be multiple things going on amongst those crashes):

  1. Database corruption due to cosmic rays or similar. Everything we did involving the database was perfect, but the numbers don't add up now.
  2. Logic bug related to the context marker cleanup logic; this could manifest both directly during cleanup processing or subsequent actions if the math gets off.
  3. We're seeing inconsistent database state due to insufficient / inconsistent use of database transactions.
  4. Serious logic bug that happens even without shutdown.

I think we can rule out option 4 (big logic bug) based on the rate of incidence. I think option 3 is also unlikely because we explicitly use transactions at the manager level, so even though we currently do manual refcounting of the security data, that integrity should be maintained. Option 2 seems like something we need to take seriously since 2/3 of these crashes are startup recovery crashes and the other 1/3 might be fallout from bungled recovery. Option 1 is always a thing that can happen.

In terms of actions we can/should take, I think we should:

  • On this bug:
    • Drop the severity of the assertion to MOZ_ASSERT_DEBUG_OR_FUZZING and request uplift since this is clearly a thing that happens (although early beta goes away April 1st).
  • In follow-up(s), edit: bug 1888508 now filed for this:

The currently control-flow isn't well-suited to calling WipeDatabase and that would be a bit scary to uplift, plus there may be a correctness problem we'd want to address, hence why I'd suggest we just lower the assertion in this bug, so we can uplift it to beta where we're seeing the crashes.

(In reply to Jens Stutte [:jstutte] from comment #4)

My understanding is that this is just a diagnostic assertion and this crash will go away, as comment 0 says. But in release we would write negative values to our DB, IIUC. I do not know what consequences this would have, but it sounds like something will never be deleted from the table.

At a surface level we're looking at leaking security rows, yes. But depending on what's going on with the bad math, it also seems possible that mozilla::dom::cache::db::ReadResponse could fail in some cases.

FWIW, I do not think that our recent change to DBAction::RunOnTarget affects things here, as the crash arrives inside RunWithDBOnTarget, which is beyond that check, which once true will be true for all callers.

Agreed. That change might slightly increase the incidence of the bug if this is case 2 from above (given that some of the cleanup steps relate to deleted caches which might have otherwise been processed on shutdown), but it wouldn't be the cause of the bug and the shutdown hang timer already means we would be facing that scenario anyways.

Flags: needinfo?(bugmail)
Assignee: nobody → bugmail
Status: NEW → ASSIGNED

We are downgrading this assertion since it's clearly happening and we
want a patch that is upliftable to beta; we will investigate and
address the potential logic problem separately. (It could also just
be flipped memory bits; we need to investigate.)

Pushed by bugmail@asutherland.org: https://hg.mozilla.org/integration/autoland/rev/c6650caabc0c Downgrade Cache API DB inconsistency assertion. r=dom-storage-reviewers,janv
Blocks: 1888508

I filed bug 1888508 for next steps, consistent with comment 5.

We are downgrading this assertion since it's clearly happening and we
want a patch that is upliftable to beta; we will investigate and
address the potential logic problem separately. (It could also just
be flipped memory bits; we need to investigate.)

Original Revision: https://phabricator.services.mozilla.com/D206027

Attachment #9393855 - Flags: approval-mozilla-beta?

Uplift Approval Request

  • Code covered by automated testing: yes
  • Explanation of risk level: The assertion is being downgraded.
  • Needs manual QE test: no
  • Is Android affected?: yes
  • User impact if declined: Some users with unhappy databases are experience MOZ_DIAGNOSTIC_ASSERT crashes on beta until early beta ends.
  • String changes made/needed: none
  • Steps to reproduce for manual QE testing: n/a
  • Fix verified in Nightly: no
  • Risk associated with taking this patch: Extremely low
Status: ASSIGNED → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → 126 Branch
Attachment #9393855 - Attachment is obsolete: true
Attachment #9393855 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: