Assignee

Comment 5

•

6 years ago

(In reply to [:philipp] from comment #1)

the issue seems to become more common again at the start of the 71.0b cycle.

Maybe this is related to turning on DOH?
Valentin, could you take a look at this? Thanks.

Flags: needinfo?(kershaw) → needinfo?(valentin.gosu)

Updated

•

6 years ago

Group: core-security → network-core-security

Comment 6

•

6 years ago

(In reply to Kershaw Chang [:kershaw] from comment #5)

(In reply to [:philipp] from comment #1)

the issue seems to become more common again at the start of the 71.0b cycle.

Maybe this is related to turning on DOH?
Valentin, could you take a look at this? Thanks.

Looking at the spikes in crashes in beta & release it seems that the crash rate increased in Firefox 71.
Looking at DNS related bugs that landed in 71, there's bug 1587875 and bug 1586845

It's rather clear from bug 1530175, bug 1543331 and similar that we have a bug when removing things from mRecordDB, and maybe bug 1587875 is exercising that code path more often.

Flags: needinfo?(valentin.gosu)

Dragana Damjanovic [:dragana]

Updated

•

6 years ago

Priority: -- → P2

Whiteboard: [necko-triaged]

Updated

•

5 years ago

Assignee: dd.mozilla → valentin.gosu

Comment 7

•

5 years ago

This bug landed in 80 and the crashes seem to have gone away.
https://bugzilla.mozilla.org/show_bug.cgi?id=1649143

https://crash-stats.mozilla.org/report/index/e4f35ca6-70dc-4af8-8ea1-cbf9c0200914

Comment 8

•

5 years ago

(In reply to Valentin Gosu [:valentin] (he/him) from comment #7)

This bug landed in 80 and the crashes seem to have gone away.
https://bugzilla.mozilla.org/show_bug.cgi?id=1649143

There's still one crash report in 80.0.1 🙁

Comment hidden (obsolete)

Jens Stutte [:jstutte]

Comment 12

•

5 years ago

Talking with Valentin, there seems to be a theoretical possibility that duplicate entries in mEvictionQ could lead to this situation. To confirm or rule this out, we could promote this assert to a diagnostic assertion. See also bug 1665979.
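To illustrate the kind of check being discussed, here is a minimal sketch of a queue that refuses to accept the same record twice. `Record` and `EvictionQueue` are hypothetical stand-ins written with the standard library; they are not the Gecko classes, and in Gecko the rejected case would be an assertion rather than a return value.

```cpp
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>

// Hypothetical stand-in for a host record; only the membership flag matters.
struct Record {
  bool inQueue = false;
};

// Hypothetical stand-in for the eviction queue (mEvictionQ in this bug).
class EvictionQueue {
 public:
  // Returns false instead of inserting when the record is already queued.
  // A double insertion is the condition an assertion would flag.
  bool Push(const std::shared_ptr<Record>& rec) {
    if (rec->inQueue) {
      return false;
    }
    rec->inQueue = true;
    mItems.push_back(rec);
    return true;
  }

  // Removes and returns the oldest record, or nullptr if the queue is empty.
  std::shared_ptr<Record> Pop() {
    if (mItems.empty()) {
      return nullptr;
    }
    std::shared_ptr<Record> rec = mItems.front();
    mItems.pop_front();
    rec->inQueue = false;
    return rec;
  }

  std::size_t Length() const { return mItems.size(); }

 private:
  std::list<std::shared_ptr<Record>> mItems;
};
```

Tracking membership on the record itself keeps the check O(1), which matters if it is going to run in release builds.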

Updated

•

5 years ago

Depends on: 1666715

Christian Holler (:decoder)

Comment 13

•

5 years ago

My latest theory about this bug is that it's caused by unlocked removals from the LinkedList<RefPtr<nsHostRecord>>
While the refcounting of nsHostRecord is atomic, mNext and mPrev are unlocked pointers.
As such, we can get into a situation where:
T1: calls .remove on a hostRecord, that causes mNext and mPrev to be updated and decrements the refcount.
T2: calls .clear() which iterates through the linked list and removes each entry - but may be using old values of mNext so it may again decrement the refcount of the entry we just removed.
Later: we try to use the RefPtr we have from the hashtable, only to find that the memory has been freed already.
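If this theory holds, the usual remedy is to make every removal and every clear take the same lock. The sketch below shows that pattern with standard-library types. `HostRecord` and `RecordList` are hypothetical names, `std::shared_ptr` stands in for `RefPtr`, and `std::list` stands in for the intrusive `LinkedList`, so this only illustrates the locking discipline, not the actual resolver code.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <list>
#include <memory>
#include <mutex>
#include <utility>

// Hypothetical stand-in for nsHostRecord.
struct HostRecord {
  int id = 0;
};

// Hypothetical list wrapper: every operation that touches the links
// holds mLock, so Remove() and Clear() can never interleave.
class RecordList {
 public:
  void Add(std::shared_ptr<HostRecord> rec) {
    std::lock_guard<std::mutex> lock(mLock);
    mList.push_back(std::move(rec));
  }

  // Returns true if the record was found and unlinked. A record that was
  // already removed (for example by Clear()) is reported, not released again.
  bool Remove(const std::shared_ptr<HostRecord>& rec) {
    std::lock_guard<std::mutex> lock(mLock);
    auto it = std::find(mList.begin(), mList.end(), rec);
    if (it == mList.end()) {
      return false;
    }
    mList.erase(it);
    return true;
  }

  void Clear() {
    std::lock_guard<std::mutex> lock(mLock);
    mList.clear();
  }

  std::size_t Length() {
    std::lock_guard<std::mutex> lock(mLock);
    return mList.size();
  }

 private:
  std::mutex mLock;
  std::list<std::shared_ptr<HostRecord>> mList;
};
```

With this shape, whichever thread loses the race sees an empty or shorter list rather than stale link pointers, so each list reference is dropped exactly once.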

Comment 14

•

4 years ago

(In reply to Valentin Gosu [:valentin] (he/him) from comment #13)

My latest theory about this bug is that it's caused by unlocked removals from the LinkedList<RefPtr<nsHostRecord>>

Is the code in question Windows only? Otherwise I think this should show up in TSan.

Flags: needinfo?(valentin.gosu)

Comment 15

•

4 years ago

(In reply to Christian Holler (:decoder) from comment #14)

(In reply to Valentin Gosu [:valentin] (he/him) from comment #13)

My latest theory about this bug is that it's caused by unlocked removals from the LinkedList<RefPtr<nsHostRecord>>

Is the code in question Windows only? Otherwise I think this should show up in TSan.

Yes. It only seems to happen on Windows.
But it's possible that D106415 or bug 1513519 might have fixed this too.

Flags: needinfo?(valentin.gosu)

Assignee

Comment 16

•

4 years ago

Attached file Bug 1544190 - Check if the record is in the queue, r=#necko — Details

Assignee

Comment 17

•

4 years ago

Taking this from Valentin.

Assignee: valentin.gosu → kershaw

Tom Ritter [:tjr] (OOTO until April)

Assignee

Comment 18

•

4 years ago

Comment on attachment 9224033 [details]
Bug 1544190 - Check if the record is in the queue, r=#necko

Security Approval Request

How easily could an exploit be constructed based on the patch?: Unknown, since we still don't know the root cause of this crash. Also, the crash rate is really low.
Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
Which older supported branches are affected by this flaw?: all
If not all supported branches, which bug introduced the flaw?: None
Do you have backports for the affected branches?: No
If not, how different, hard to create, and risky will they be?: The risk is low.
How likely is this patch to cause regressions; how much testing does it need?: Should be no risk. This patch only adds some diagnostic assertions.

Attachment #9224033 - Flags: sec-approval?

Comment 19

•

4 years ago

Comment on attachment 9224033 [details]
Bug 1544190 - Check if the record is in the queue, r=#necko

Approved to land and uplift.

Attachment #9224033 - Flags: sec-approval?

Attachment #9224033 - Flags: sec-approval+

Attachment #9224033 - Flags: approval-mozilla-beta+

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 20

•

4 years ago

Check if the record is in the queue, r=necko-reviewers,valentin
https://hg.mozilla.org/integration/autoland/rev/65fbc94c6d9b92f87f7f605475ae044e22672c6c
https://hg.mozilla.org/mozilla-central/rev/65fbc94c6d9b

Group: network-core-security → core-security-release

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

status-firefox91: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 91 Branch

Assignee

Comment 21

•

4 years ago

Sorry, I forgot to add the leave-open keyword.
The landed patch only adds some diagnostic assertions; it does not actually fix the issue.

Status: RESOLVED → REOPENED

Keywords: leave-open

Resolution: FIXED → ---

Updated

•

4 years ago

Status: REOPENED → ASSIGNED

status-firefox67: affected → wontfix

status-firefox68: ? → wontfix

status-firefox71: affected → wontfix

status-firefox72: affected → wontfix

status-firefox89: --- → wontfix

status-firefox90: --- → fix-optional

status-firefox91: fixed → affected

status-firefox-esr78: --- → affected

Target Milestone: 91 Branch → ---

Comment 22

•

4 years ago

Since these crashes only happen on release, do you think it might be a good idea to turn them into MOZ_RELEASE_ASSERT conditioned on a pref?
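A rough sketch of what a pref-gated release assertion could look like follows. Everything here is an assumption for illustration: `gStrictQueueChecks` stands in for a real pref lookup, `CheckInvariant` is a hypothetical helper, and `std::abort` stands in for the crash that MOZ_RELEASE_ASSERT would produce.

```cpp
#include <atomic>
#include <cassert>
#include <cstdio>
#include <cstdlib>

// Hypothetical runtime switch standing in for a pref value. It is atomic
// because the check may be evaluated off the main thread.
std::atomic<bool> gStrictQueueChecks{false};

// Returns true when the invariant holds. When it does not hold, the process
// aborts only if the switch is on; otherwise the failure is reported to the
// caller so it can bail out gracefully.
bool CheckInvariant(bool condition, const char* message) {
  if (condition) {
    return true;
  }
  if (gStrictQueueChecks.load()) {
    std::fprintf(stderr, "Release assertion failed: %s\n", message);
    std::abort();
  }
  return false;
}
```

Gating the abort on a switch lets the check ship to release users while keeping the option to turn it off remotely if it fires more often than expected.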

Flags: needinfo?(kershaw)

https://hg.mozilla.org/releases/mozilla-beta/rev/21ac90433799

Comment 23

•

4 years ago

uplift

Comment 24

•

4 years ago

Comment on attachment 9224033 [details]
Bug 1544190 - Check if the record is in the queue, r=#necko

Clearing beta approval flag to get this off the needs-uplift queries.

Attachment #9224033 - Flags: approval-mozilla-beta+

Assignee

Comment 25

•

4 years ago

(In reply to Valentin Gosu [:valentin] (he/him) from comment #22)

Since these crashes only happen on release, do you think it might be a good idea to turn them into MOZ_RELEASE_ASSERT conditioned on a pref?

Yes, I think this is a good idea. I'll prepare a patch for this. Thanks.

Flags: needinfo?(kershaw)

Assignee

Comment 26

•

4 years ago

Attached file Bug 1544190 - Using MOZ_RELEASE_ASSERT, r=#necko (obsolete) — Details

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Assignee

Comment 27

•

4 years ago

We've seen this crash in beta. This means that the previous patch with diagnostic assertions is not working. Maybe there is another reason behind this.
We are going to land a refactor patch in bug 1713796 and hope it can help us figure out the root cause of this crash.

Comment 28

•

4 years ago

This also causes crashes with the signature [@ nsHostResolver::MaybeRenewHostRecordLocked], e.g. bp-e0f61756-b9d2-4aea-93b4-3a66a0210612. 4 crashes on Nightly so far.

Phabricator Automation

Updated

•

4 years ago

Attachment #9227190 - Attachment is obsolete: true

Assignee

Comment 29

•

4 years ago

Found a possible reason why a record could be put into the high queue twice.
Consider the following flow:

1. When TRRQuery::DispatchLookup is called, mTrrA and mTrrAAAA are set and we put these two TRR requests in an array.
2. Right before the TRR requests in the array are dispatched, TRRQuery::Cancel is called, so both mTrrA and mTrrAAAA are set to null.
3. The TRR requests in the array are still dispatched.
4. TRRQuery::CompleteLookup is called by the A request. Since mTrrA and mTrrAAAA are both null, we assume there is no pending TRR request, and mHostResolver->CompleteLookup is called.
5. TRRQuery::CompleteLookup is called again by the AAAA request, and we hit the diagnostic assertion.
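The flow above suggests tracking outstanding requests with a counter instead of inferring them from the request pointers. The sketch below models that idea in isolation. `PendingQuery` and its methods are hypothetical and are not taken from the landed patch; the point is only that cancelling does not reset the count, so the resolver is notified exactly once.

```cpp
#include <cassert>

// Hypothetical model of a query that issues one A and one AAAA request.
class PendingQuery {
 public:
  // Call once for each request handed to the dispatcher.
  void NoteDispatched() { ++mPending; }

  // Models cancellation: the request pointers would be nulled here, but the
  // counter is left alone because the requests may still run and complete.
  void Cancel() { mCancelled = true; }

  // Call when one request finishes. Returns true only when no request is
  // still outstanding, which is the single point where the resolver
  // should be told that the lookup completed.
  bool CompleteOne() {
    if (mPending == 0) {
      return false;  // Unexpected extra completion: never notify again.
    }
    --mPending;
    if (mPending > 0) {
      return false;  // The sibling request is still in flight.
    }
    ++mNotifications;
    return true;
  }

  bool Cancelled() const { return mCancelled; }
  int Notifications() const { return mNotifications; }

 private:
  int mPending = 0;
  int mNotifications = 0;
  bool mCancelled = false;
};
```

Replaying the flow against this model (two dispatches, a cancel, then two completions) yields one notification, on the second completion, rather than two.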

Assignee

Comment 30

•

4 years ago

Attached file Bug 1544190 - Use a counter to track if there is a pending TRR request, r=#necko — Details

Assignee

Comment 31

•

4 years ago

Comment on attachment 9227525 [details]
Bug 1544190 - Use a counter to track if there is a pending TRR request, r=#necko

Security Approval Request

How easily could an exploit be constructed based on the patch?: Not easily, since this is most likely to happen during shutdown.
Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
Which older supported branches are affected by this flaw?: all
If not all supported branches, which bug introduced the flaw?: None
Do you have backports for the affected branches?: No
If not, how different, hard to create, and risky will they be?: The risk is low. This issue has existed for a long time and the crash rate is low.
How likely is this patch to cause regressions; how much testing does it need?: Low risk. We already have tests that exercise this call path.

Attachment #9227525 - Flags: sec-approval?

Daniel Veditz [:dveditz]

Comment 32

•

4 years ago

Comment on attachment 9227525 [details]
Bug 1544190 - Use a counter to track if there is a pending TRR request, r=#necko

sec-approval = dveditz

Attachment #9227525 - Flags: sec-approval? → sec-approval+

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

4 years ago

status-firefox90: fix-optional → wontfix

Comment 33

•

4 years ago

Use a counter to track if there is a pending TRR request, r=necko-reviewers,valentin
https://hg.mozilla.org/integration/autoland/rev/9169af3a686327a9b9763c50b9826c4dca5a95d3
https://hg.mozilla.org/mozilla-central/rev/9169af3a6863

Can this bug get closed?

Flags: needinfo?(kershaw)

Assignee

Comment 34

•

4 years ago

(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #33)

Use a counter to track if there is a pending TRR request, r=necko-reviewers,valentin
https://hg.mozilla.org/integration/autoland/rev/9169af3a686327a9b9763c50b9826c4dca5a95d3
https://hg.mozilla.org/mozilla-central/rev/9169af3a6863

Can this bug get closed?

I think we should wait a bit more to see if this crash is really fixed.

Flags: needinfo?(kershaw)

Jens Stutte [:jstutte]

Comment 35

•

4 years ago

Removing the signatures that do not happen any more (or only with unsupported versions).

Summary: Crash in [@ arena_t::DallocSmall | BaseAllocator::free | je_free | AddrHostRecord::~AddrHostRecord] → Crash in [@ AddrHostRecord::~AddrHostRecord]

Assignee

Comment 36

•

4 years ago

•

Edited

I think we can close this bug, since we added a MOZ_RELEASE_ASSERT in bug 1717778. This release assertion is triggered if we add the same record to the linked list twice. It should be hit before the host record can be released twice, so this bug can be closed.

Status: ASSIGNED → RESOLVED

Closed: 4 years ago → 4 years ago

Resolution: --- → FIXED

Updated

•

4 years ago

status-firefox91: affected → fixed

Depends on: 1717778

Target Milestone: --- → 91 Branch

BugBot [:suhaib / :marco/ :calixte]

Updated

•

4 years ago

Keywords: leave-open

Comment 37

•

4 years ago

As part of a security bug pattern analysis, we are requesting your help with a high level analysis of this bug. It is our hope to develop static analysis (or potentially runtime/dynamic analysis) in the future to identify classes of bugs.

Please visit this google form to reply.

Flags: needinfo?(kershaw)

Whiteboard: [necko-triaged] → [necko-triaged][sec-survey]

Ryan VanderMeulen [:RyanVM]

Updated

•

4 years ago

status-firefox-esr78: affected → wontfix

Mihai Boldan, Desktop QA [:mboldan]

Updated

•

4 years ago

QA Whiteboard: [post-critsmash-triage]

Flags: qe-verify-

Tom Ritter [:tjr] (OOTO until April)

Updated

•

4 years ago

Whiteboard: [necko-triaged][sec-survey] → [necko-triaged][sec-survey][adv-main90+r]

Tom Ritter [:tjr] (OOTO until April)

Updated

•

4 years ago

Whiteboard: [necko-triaged][sec-survey][adv-main90+r] → [necko-triaged][sec-survey][adv-main91+r]