Closed Bug 1635865 · Opened 4 years ago · Closed 4 years ago

Sentry: ConcurrentModificationException

Categories

(Data Platform and Tools :: Glean: SDK, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mdroettboom, Assigned: mdroettboom)

Details

Attachments

(3 files)

java.util.ConcurrentModificationException: null
    at java.util.LinkedHashMap$LinkedHashIterator.nextNode(LinkedHashMap.java:757)
    at java.util.LinkedHashMap$LinkedKeyIterator.next(LinkedHashMap.java:780)
    at mozilla.telemetry.glean.GleanInternalAPI$initialize$1.invokeSuspend(Glean.kt:16)
    at kotlin.coroutines.jvm.internal.BaseContinuationImpl.resumeWith(ContinuationImpl.kt:2)
    at kotlinx.coroutines.DispatchedTask.run(DispatchedTask.kt:19)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:457)
    at java.util.concurrent.FutureTask.run(FutureTask.java:266)
    at java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:301)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1162)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:636)
    at java.lang.Thread.run(Thread.java:764)

https://sentry.prod.mozaws.net/operations/fenix/issues/7519801/

There is at least one theoretical cause of this that I can see:

The known ping types in PingTypeQueue are iterated over here, and then the ping is added again (which should be functionally a no-op) here.
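To make the failure mode concrete: `LinkedHashMap`/`LinkedHashSet` iterators are fail-fast, so any *structural* modification during iteration throws `ConcurrentModificationException`, while re-adding an element that is already present is not a structural modification and is harmless. A minimal sketch of both cases, in plain Java since the exception comes from `java.util` (the class and method names here are illustrative, not Glean code):

```java
import java.util.ConcurrentModificationException;
import java.util.LinkedHashSet;
import java.util.Set;

public class CmeDemo {
    // Returns true if mutating the set mid-iteration threw CME.
    static boolean addDuringIteration(String element) {
        Set<String> pingTypes = new LinkedHashSet<>();
        pingTypes.add("baseline");
        pingTypes.add("metrics");
        try {
            for (String name : pingTypes) {
                // Adding an element the set already contains does not
                // change modCount, so it is safe; adding a *new* element
                // is a structural modification and the iterator throws
                // on its next step.
                pingTypes.add(element);
            }
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        System.out.println("re-add existing: " + addDuringIteration("baseline"));
        System.out.println("add brand new:   " + addDuringIteration("events"));
    }
}
```

This is consistent with the comment above: the "added again, which should be functionally a no-op" path cannot trip the exception on its own, so the crash plausibly requires a *new* ping type being registered (from another thread) while initialization is iterating.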

Assignee: nobody → mdroettboom
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED

Hi Michael, I was going through the crashes on Nightly, and I'm seeing this signature continue to show up, even in build IDs from 5/13.

https://crash-stats.mozilla.org/report/index/1aae6d08-110e-4ad1-b0e4-2c86c0200515

(See "More Reports" and sort by build id for other reports.)

For our 5.0 release, this was a top 3 crasher from Google Play Store (although that was before your fix landed). I don't have as good visibility into Nightly, so I just wanted to raise it in case those reports are helpful.

Feel free to close again if there are other reasons this could be showing up! We'll also know more when we release 5.1 next week.

Status: RESOLVED → REOPENED
Flags: needinfo?(mdroettboom)
Resolution: FIXED → ---

Thanks @liuche. This has been tricky, because I still haven't been able to reproduce locally -- there's some combination of things folks are hitting in the wild here. I will take a renewed look -- I agree that this is high priority.

Flags: needinfo?(mdroettboom)
Priority: P3 → P1
Whiteboard: [telemetry:glean-rs:m?]

This Sentry link for Nightly shows the problem persisting with Glean 29.1.0 (which contains the first "fix"): https://sentry.prod.mozaws.net/operations/fenix-nightly/issues/7960114

Update: I was able to write a unit test that registers pings while Glean is being initialized, but this still doesn't lead to a ConcurrentModificationException:

    @Test
    fun `Initializing while registering pings isn't a race condition`() {
        Glean.testDestroyGleanHandle()

        Dispatchers.API.setTaskQueueing(true)
        Dispatchers.API.setTestingMode(false)

        val stringMetric = StringMetricType(
                disabled = false,
                category = "telemetry",
                lifetime = Lifetime.Application,
                name = "string_metric",
                sendInPings = listOf("store1")
        )
        stringMetric.set("foo")

        for (i in 1..1000) {
            val ping = PingType<NoReasonCodes>(
                    name = "race-condition-ping${i}",
                    includeClientId = true,
                    sendIfEmpty = false,
                    reasonCodes = listOf()
            )
            Glean.registerPingType(ping)
        }
        val config = Configuration()
        Glean.initialize(context, true, config)

        val ping = PingType<NoReasonCodes>(
                name = "race-condition-ping",
                includeClientId = true,
                sendIfEmpty = false,
                reasonCodes = listOf()
        )
        while (Dispatchers.API.taskQueue.size > 0) {
            Glean.registerPingType(ping)
        }
    }
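My reading of why this main-thread version never trips the exception (hedged -- I haven't verified this against the Glean `Dispatchers` internals): if both the queued initialization work and the `registerPingType` calls are serialized through the same single-threaded task queue, the iteration and the mutation can never overlap in time. A JVM sketch of that serialization property, in Java (hypothetical names, not Glean code):

```java
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class SerializedAccess {
    // Push one "iterate" task and many "add" tasks through the same
    // single-threaded executor; tasks run strictly one after another,
    // so the iteration can never observe a concurrent add.
    static int run() throws InterruptedException {
        Set<String> pingTypes = new LinkedHashSet<>();
        ExecutorService queue = Executors.newSingleThreadExecutor();

        queue.execute(() -> pingTypes.add("baseline"));
        queue.execute(() -> {
            for (String name : pingTypes) {
                // Would throw ConcurrentModificationException if an
                // add() could interleave with this loop -- here it can't.
            }
        });
        for (int i = 0; i < 1000; i++) {
            final int n = i;
            queue.execute(() -> pingTypes.add("ping" + n));
        }

        queue.shutdown();
        queue.awaitTermination(10, TimeUnit.SECONDS);
        return pingTypes.size();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("final size: " + run());
    }
}
```

Under that assumption, reproducing the crash requires the mutation to come from a thread *outside* the serialized queue -- which is exactly what the next test does.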

This unit test (very similar to the one above, but registering pings from another coroutine thread rather than the main thread) reproduces the bug more than 50% of the time. Woohoo. Progress... Now to experiment with various mitigations.

    @Test
    fun `Initializing while registering pings isn't a race condition`() {
        Glean.testDestroyGleanHandle()

        Dispatchers.API.setTaskQueueing(true)
        Dispatchers.API.setTestingMode(false)

        val stringMetric = StringMetricType(
                disabled = false,
                category = "telemetry",
                lifetime = Lifetime.Application,
                name = "string_metric",
                sendInPings = listOf("store1")
        )
        stringMetric.set("foo")

        for (i in 1..1000) {
            val ping = PingType<NoReasonCodes>(
                    name = "race-condition-ping${i}",
                    includeClientId = true,
                    sendIfEmpty = false,
                    reasonCodes = listOf()
            )
            Glean.registerPingType(ping)
        }
        val config = Configuration()
        Glean.initialize(context, true, config)

        GlobalScope.launch {
            val ping = PingType<NoReasonCodes>(
                    name = "race-condition-ping", 
                    includeClientId = true,
                    sendIfEmpty = false,
                    reasonCodes = listOf()
            )
            while (Dispatchers.API.taskQueue.size > 0) {
                Glean.registerPingType(ping)
            }
        }
    }
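For reference, one common mitigation for this class of race is to back the registry with a collection whose iterators walk an immutable snapshot, so a concurrent registration can never invalidate an in-progress iteration. A sketch in Java -- the `SafeRegistry` name and methods are hypothetical, and this is not necessarily the exact fix that landed in Glean 29.1.1:

```java
import java.util.Set;
import java.util.concurrent.CopyOnWriteArraySet;

public class SafeRegistry {
    // CopyOnWriteArraySet iterators operate on an immutable snapshot of
    // the backing array, so add() from another thread cannot throw
    // ConcurrentModificationException mid-iteration. Writes copy the
    // array, which is cheap for a small, rarely-mutated set of ping types.
    private final Set<String> pingTypes = new CopyOnWriteArraySet<>();

    public void register(String name) {
        pingTypes.add(name);
    }

    // Stand-in for the initialize-time loop that hands each known ping
    // type to the core; returns how many this iteration saw.
    public int snapshotCount() {
        int count = 0;
        for (String name : pingTypes) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        SafeRegistry registry = new SafeRegistry();
        registry.register("baseline");
        registry.register("baseline"); // duplicate: set semantics keep one
        registry.register("metrics");
        System.out.println("known ping types: " + registry.snapshotCount());
    }
}
```

An alternative with the same effect is to copy the set under a lock before iterating; either way the invariant is that the collection being iterated is never the one being mutated.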

Not seeing any reports from Fenix Nightly with Glean 29.1.1 (with the second fix), so closing this. Feel free to reopen if we see them again.

Status: REOPENED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED