Closed Bug 1634310 Opened 5 years ago Closed 5 years ago

No database found error for Glean Python SDK version 28.0.0 on Linux

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: raphael, Assigned: mdroettboom)

References

Details

(Whiteboard: [telemetry:glean-rs:m13][glean-py])

Attachments

(1 file)

Link to GitHub pull-request: https://github.com/mozilla/glean/pull/854 (added 5 years ago by GitHub Bugzilla PR Linker; 41 bytes, text/x-github-pull-request)

Raphael Aurich [:raphael] UTC+01:00

Reporter

Description

5 years ago

Log messages

I'm seeing the following error when submitting pings from Python on Linux:

cli2        | DEBUG:burnham.missions:Completed mission 'MISSION G: FIVE WARPS, FOUR JUMPS'
cli2        | DEBUG:burnham.missions:Submitting ping for mission 'MISSION G: FIVE WARPS, FOUR JUMPS'
cli2        | ERROR:glean._dispatcher:Timeout sending Glean telemetry
cli2        | thread '<unnamed>' panicked at 'No database found', glean-core/src/lib.rs:453:10
cli2        | note: run with `RUST_BACKTRACE=1` environment variable to display a backtrace
cli2        | [2020-04-30T09:14:54Z ERROR ffi_support::error] Caught a panic calling rust code: "No database found"
cli2        | thread '<unnamed>' panicked at 'No database found', glean-core/src/lib.rs:453:10
cli2        | [2020-04-30T09:14:54Z ERROR ffi_support::error] Caught a panic calling rust code: "No database found"
cli2        | thread '<unnamed>' panicked at 'assertion failed: error.get_code().is_success()', glean-core/ffi/src/handlemap_ext.rs:125:9
cli2        | fatal runtime error: failed to initiate panic, error 5

Steps to reproduce:

Clone https://github.com/hackebrot/burnham and run:

docker-compose up --build

The client, which produces the error, runs the following inside Docker:

burnham \
--verbose \
--telemetry \
--platform http://platform:5000 \
--test-run "11111111-aaaa-bbbb-cccc-123455555555" \
--test-name "test_cli1" \
--spore-drive "tardigrade-dna" \
"MISSION A: ONE WARP" \
"MISSION B: TWO WARPS" \
"MISSION D: TWO JUMPS" \
"MISSION E: ONE JUMP, ONE METRIC ERROR" \
"MISSION F: TWO WARPS, ONE JUMP" \
"MISSION G: FIVE WARPS, FOUR JUMPS"

Michael Droettboom [:mdroettboom]

Assignee

Comment 1

5 years ago

Edited

Here is the race condition I've found:

Glean currently has two atexit handlers: (a) one that makes sure the worker thread completes all of its tasks, and (b) one that (among other things) deletes the data directory if it's a tmpdir. atexit handlers run sequentially on the main thread, but their ordering depends on the order in which they are registered, which is somewhat non-deterministic in Glean.

If (b) runs before (a), the data directory is deleted, and any operations still waiting in the thread queue then fail with "No database found".

The fix is to combine the atexit handlers into one, and join on the thread queue before deleting the tempdir.
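A minimal sketch of that fix, not the code from the linked pull request: a single shutdown function first drains the worker queue and only then removes the temporary data directory. The names `Worker` and `make_shutdown` are hypothetical stand-ins and are not part of the Glean API.

```python
import atexit
import queue
import shutil
import tempfile
import threading
from pathlib import Path


class Worker:
    """Hypothetical stand-in for a worker thread that runs queued tasks."""

    def __init__(self):
        self._queue = queue.Queue()
        self._thread = threading.Thread(target=self._run, daemon=True)
        self._thread.start()

    def _run(self):
        while True:
            task = self._queue.get()
            try:
                if task is None:  # sentinel: stop the worker
                    return
                task()
            finally:
                self._queue.task_done()

    def submit(self, task):
        self._queue.put(task)

    def join(self):
        # Tasks queued before the sentinel run first, then the thread exits.
        self._queue.put(None)
        self._thread.join()


def make_shutdown(worker, data_dir):
    """Build one handler that performs (a) and then (b), in that order."""

    def shutdown():
        worker.join()  # (a) finish all pending tasks
        shutil.rmtree(data_dir, ignore_errors=True)  # (b) delete the tempdir

    return shutdown


if __name__ == "__main__":
    data_dir = Path(tempfile.mkdtemp())
    worker = Worker()
    # One registration, so the relative order of (a) and (b) is fixed.
    atexit.register(make_shutdown(worker, data_dir))
    worker.submit(lambda: (data_dir / "ping.txt").write_text("queued"))
```

Because both steps live in one function, the order no longer depends on when each handler happened to be registered.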

This raised another issue in my mind: using a tempdir by default is probably not a good choice, and we are seeing this bug in burnham only because burnham doesn't override the data dir (as other "real" apps, such as mozregression, have done). Changing the default to a retained directory probably makes sense, and I created bug 1634410 to track that work.
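As an illustration of overriding the data dir, here is a small sketch of how an application might pick a retained directory instead of a tempdir. The helper name `retained_data_dir` and the XDG-style layout are assumptions for illustration only. The resulting path would then be handed to Glean when it is initialized, assumed here to be the `data_dir` argument of `Glean.initialize`.

```python
import os
from pathlib import Path


def retained_data_dir(app_name, env=None):
    """Return a persistent per-application directory, creating it if needed.

    Hypothetical helper: uses $XDG_DATA_HOME when set, otherwise falls back
    to ~/.local/share, which is a common convention on Linux.
    """
    env = os.environ if env is None else env
    base = env.get("XDG_DATA_HOME")
    root = Path(base) if base else Path.home() / ".local" / "share"
    path = root / app_name / "glean"
    path.mkdir(parents=True, exist_ok=True)
    return path
```

A directory like this survives process exit, so no exit handler needs to delete it and queued operations cannot lose their database at shutdown.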

Whiteboard: [telemetry:glean-rs:m?] → [telemetry:glean-rs:m13][glean-py]

GitHub Bugzilla PR Linker

Comment 2

5 years ago

Attached file: Link to GitHub pull-request: https://github.com/mozilla/glean/pull/854

Michael Droettboom [:mdroettboom]

Assignee

Updated

5 years ago

Assignee: nobody → mdroettboom

Michael Droettboom [:mdroettboom]

Assignee

Updated

5 years ago

Status: NEW → RESOLVED

Closed: 5 years ago

Resolution: --- → FIXED

Raphael Aurich [:raphael] UTC+01:00

Reporter

Updated

5 years ago

Categories

(Data Platform and Tools :: Glean: SDK, defect, P3)

Security

(public)
