Closed Bug 1093064 Opened 5 years ago Closed 2 months ago

Intermittent test_file_resurrection_delete.html,test_file_resurrection_transaction_abort.html | Correct ref count - got 1, expected 0

Categories

(Core :: Storage: IndexedDB, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla70
Tracking Status
firefox-esr60 --- wontfix
firefox-esr68 --- fixed
firefox68 --- wontfix
firefox69 --- wontfix
firefox70 --- fixed
firefox71 --- fixed

People

(Reporter: cbook, Assigned: sg)

References

(Blocks 1 open bug, )

Details

(Keywords: intermittent-failure, regression)

Attachments

(2 files, 2 obsolete files)

Ubuntu VM 12.04 x64 mozilla-inbound debug test mochitest-2

https://treeherder.mozilla.org/ui/logviewer.html#?job_id=3509078&repo=mozilla-inbound

20:48:47 INFO - 1801 INFO TEST-UNEXPECTED-FAIL | /tests/dom/indexedDB/test/test_file_resurrection_delete.html | Correct ref count - got 1, expected 0
Duplicate of this bug: 1093427
OS: Linux → All
Hardware: x86 → All
Summary: Intermittent test_file_resurrection_delete.html | Correct ref count - got 1, expected 0 → Intermittent test_file_resurrection_delete.html,test_file_resurrection_transaction_abort.html | Correct ref count - got 1, expected 0
Any time this fails, look first for something that Terrence pushed, then second for any GC push - that November spike was from the landing-then-backout of bug 1219498, the spike that Obot is going to talk about either tomorrow or this Sunday was from the landing-then-backout of bug 1239099.
Bulk assigning P3 to all open intermittent bugs without a priority set in Firefox components per bug 1298978.
Priority: -- → P3
Priority: P3 → P5
Priority: P5 → P3
Priority: P3 → P5
Duplicate of this bug: 1523674
Duplicate of this bug: 1533854

There was a spike in failure frequency here on May 8 -- likely due to the consolidation of dups (failures are now being reported in just one place).

There are 54 total failures in the last 7 days: https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2019-05-12&endday=2019-05-19&tree=trunk&bug=1093064

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=247189962&repo=autoland&lineNumber=13521

[task 2019-05-18T12:52:42.853Z] 12:52:42 INFO - TEST-START | dom/indexedDB/test/test_file_resurrection_transaction_abort.html
[task 2019-05-18T12:52:42.857Z] 12:52:42 INFO - GECKO(2214) | [Parent 2214, Main Thread] WARNING: Cannot set transparency mode on non-popup windows.: file /builds/worker/workspace/build/src/widget/gtk/nsWindow.cpp, line 4295
[task 2019-05-18T12:52:42.903Z] 12:52:42 INFO - [2189, Main Thread] WARNING: No active window: file /builds/worker/workspace/build/src/js/xpconnect/src/XPCJSContext.cpp, line 662
[task 2019-05-18T12:52:43.000Z] 12:52:43 INFO - GECKO(2214) | [Parent 2214, Main Thread] WARNING: Cannot set transparency mode on non-popup windows.: file /builds/worker/workspace/build/src/widget/gtk/nsWindow.cpp, line 4295
[task 2019-05-18T12:52:43.056Z] 12:52:43 INFO - GECKO(2214) | ++DOMWINDOW == 6 (0x7f13f2a0cc00) [pid = 2294] [serial = 122] [outer = 0x7f13f50d1020]
[task 2019-05-18T12:52:43.568Z] 12:52:43 INFO - GECKO(2214) | --DOMWINDOW == 5 (0x7f13f2a0e000) [pid = 2294] [serial = 120] [outer = (nil)] [url = http://mochi.test:8888/tests/dom/indexedDB/test/test_file_resurrection_delete.html]
[task 2019-05-18T12:52:43.568Z] 12:52:43 INFO - GECKO(2214) | --DOMWINDOW == 4 (0x7f13f2a0c000) [pid = 2294] [serial = 121] [outer = (nil)] [url = http://mochi.test:8888/tests/SimpleTest/iframe-between-tests.html]
[task 2019-05-18T12:52:43.584Z] 12:52:43 INFO - TEST-INFO | started process screentopng
[task 2019-05-18T12:52:43.883Z] 12:52:43 INFO - [2189, Main Thread] WARNING: No active window: file /builds/worker/workspace/build/src/js/xpconnect/src/XPCJSContext.cpp, line 662
[task 2019-05-18T12:52:44.315Z] 12:52:44 INFO - TEST-INFO | screentopng: exit 0
[task 2019-05-18T12:52:44.316Z] 12:52:44 INFO - Buffered messages logged at 12:52:43
[task 2019-05-18T12:52:44.316Z] 12:52:44 INFO - Running
[task 2019-05-18T12:52:44.316Z] 12:52:44 INFO - Pushing preferences
[task 2019-05-18T12:52:44.317Z] 12:52:44 INFO - Pushing permissions
[task 2019-05-18T12:52:44.317Z] 12:52:44 INFO - Clearing old databases
[task 2019-05-18T12:52:44.318Z] 12:52:44 INFO - TEST-FAIL | dom/indexedDB/test/test_file_resurrection_transaction_abort.html | Skipping test in a worker because it's not structured properly
[task 2019-05-18T12:52:44.319Z] 12:52:44 INFO - Running test in main thread
[task 2019-05-18T12:52:44.320Z] 12:52:44 INFO - TEST-PASS | dom/indexedDB/test/test_file_resurrection_transaction_abort.html | Correct byte length
[task 2019-05-18T12:52:44.322Z] 12:52:44 INFO - TEST-PASS | dom/indexedDB/test/test_file_resurrection_transaction_abort.html | Got correct event type
[task 2019-05-18T12:52:44.326Z] 12:52:44 INFO - TEST-PASS | dom/indexedDB/test/test_file_resurrection_transaction_abort.html | Got correct event type
[task 2019-05-18T12:52:44.327Z] 12:52:44 INFO - TEST-PASS | dom/indexedDB/test/test_file_resurrection_transaction_abort.html | Got result
[task 2019-05-18T12:52:44.327Z] 12:52:44 INFO - TEST-PASS | dom/indexedDB/test/test_file_resurrection_transaction_abort.html | Correct db ref count
[task 2019-05-18T12:52:44.329Z] 12:52:44 INFO - TEST-PASS | dom/indexedDB/test/test_file_resurrection_transaction_abort.html | Correct db ref count
[task 2019-05-18T12:52:44.330Z] 12:52:44 INFO - Buffered messages finished
[task 2019-05-18T12:52:44.331Z] 12:52:44 INFO - TEST-UNEXPECTED-FAIL | dom/indexedDB/test/test_file_resurrection_transaction_abort.html | Correct ref count - got 1, expected +0
[task 2019-05-18T12:52:44.331Z] 12:52:44 INFO - SimpleTest.is@SimpleTest/SimpleTest.js:320:16
[task 2019-05-18T12:52:44.331Z] 12:52:44 INFO - testSteps@dom/indexedDB/test/test_file_resurrection_transaction_abort.html:79:5
[task 2019-05-18T12:52:44.332Z] 12:52:44 INFO - continueToNextStep/<@dom/indexedDB/test/helpers.js:189:19
[task 2019-05-18T12:52:44.336Z] 12:52:44 INFO - TEST-PASS | dom/indexedDB/test/test_file_resurrection_transaction_abort.html | Async/await tests shouldn't call finishTest()
[task 2019-05-18T12:52:44.343Z] 12:52:44 INFO - GECKO(2214) | [Parent 2214, QuotaManager IO] WARNING: '!outputStream', file /builds/worker/workspace/build/src/dom/quota/ActorsParent.cpp, line 2046
[task 2019-05-18T12:52:44.343Z] 12:52:44 INFO - GECKO(2214) | [Parent 2214, QuotaManager IO] WARNING: 'NS_FAILED(rv)', file /builds/worker/workspace/build/src/dom/quota/ActorsParent.cpp, line 7081
[task 2019-05-18T12:52:44.344Z] 12:52:44 INFO - GECKO(2214) | [Parent 2214, QuotaManager IO] WARNING: 'NS_FAILED(rv)', file /builds/worker/workspace/build/src/dom/quota/ActorsParent.cpp, line 6952
[task 2019-05-18T12:52:44.345Z] 12:52:44 INFO - GECKO(2214) | [Parent 2214, QuotaManager IO] WARNING: 'NS_FAILED(rv)', file /builds/worker/workspace/build/src/dom/quota/ActorsParent.cpp, line 6862
[task 2019-05-18T12:52:44.346Z] 12:52:44 INFO - GECKO(2214) | MEMORY STAT | vsize 2530MB | residentFast 136MB | heapAllocated 12MB
[task 2019-05-18T12:52:44.346Z] 12:52:44 INFO - TEST-OK | dom/indexedDB/test/test_file_resurrection_transaction_abort.html | took 769ms
[task 2019-05-18T12:52:44.347Z] 12:52:44 INFO - GECKO(2214) | [Parent 2214, Main Thread] WARNING: Cannot set transparency mode on non-popup windows.: file /builds/worker/workspace/build/src/widget/gtk/nsWindow.cpp, line 4295
[task 2019-05-18T12:52:44.349Z] 12:52:44 INFO - GECKO(2214) | ++DOMWINDOW == 5 (0x7f13f2a0d400) [pid = 2294] [serial = 123] [outer = 0x7f13f50d1020]
[task 2019-05-18T12:52:44.349Z] 12:52:44 INFO - GECKO(2214) | [Parent 2214, Main Thread] WARNING: Cannot set transparency mode on non-popup windows.: file /builds/worker/workspace/build/src/widget/gtk/nsWindow.cpp, line 4295

We'll put this near the top of our "after 68 ships" queue :)

Flags: needinfo?(overholt)
Priority: -- → P2

Andrew any updates on this?

Flags: needinfo?(overholt)

68 hasn't shipped yet :)

Flags: needinfo?(overholt)

In the last 7 days, there have been 34 failures on linux64, linux64-qr, linux64-shippable, linux64-shippable-qr, macosx1014-64, windows10-64, windows10-64-shippable platforms, opt and debug build type.

An example of a recent log file:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=257662471&repo=autoland&lineNumber=4015

And the relevant part of the log:
11:13:18 INFO - TEST-PASS | dom/indexedDB/test/test_file_resurrection_transaction_abort.html | Correct db ref count
11:13:18 INFO - Buffered messages finished
11:13:18 INFO - TEST-UNEXPECTED-FAIL | dom/indexedDB/test/test_file_resurrection_transaction_abort.html | Correct ref count - got 1, expected +0
11:13:18 INFO - SimpleTest.is@SimpleTest/SimpleTest.js:320:16
11:13:18 INFO - testSteps@dom/indexedDB/test/test_file_resurrection_transaction_abort.html:79:5
11:13:18 INFO - continueToNextStep/<@dom/indexedDB/test/helpers.js:189:19

:hsinyi, as you are the triage owner of this component, could you please take a look at this?
Thank you!

Flags: needinfo?(htsai)

:hsinyi Did you get the chance to take a look at this? This is still failing, details are in https://bugzilla.mozilla.org/show_bug.cgi?id=1093064#c112

Assignee: nobody → sgiesecke

Thanks Simon!

Flags: needinfo?(htsai)

I was wondering what guarantees scheduleGC could give. While I found no definitive answer to that, I found this comment on https://searchfox.org/mozilla-central/source/testing/specialpowers/content/SpecialPowersAPI.jsm#1222:

  // Due to various dependencies between JS objects and C++ objects, an ordinary
  // forceGC doesn't necessarily clear all unused objects, thus the GC and CC
  // needs to run several times and when no other JS is running.
  // The current number of iterations has been determined according to massive
  // cross platform testing.

a variant of which is ultimately called by scheduleGC.

This sounds as if other than by experimentation, there are no guarantees. So maybe, this is just a counterexample showing that the number of iterations is not sufficient?

Flags: needinfo?(shes050117)

Yeah, we can try to increase the number of iterations. It would be nice if you could verify that locally or on the try server.

(In reply to Simon Giesecke [:sg] from comment #116)

This sounds as if other than by experimentation, there are no guarantees. So maybe, this is just a counterexample showing that the number of iterations is not sufficient?

Yes, it sounds like that. As Jan said, it'd be nice if you can verify that. Maybe, it's a good chance to be familiar with rr chaos mode if you want to verify that in local. Otherwise, you need to re-run the task on try multiple time to see if you really fix the issue.

Flags: needinfo?(shes050117)

I increased the number of GC iterations to 3, and ran the typically failing jobs on Windows and Linux 9 times each on try, and this issue hasn't shown up: https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=259197179&revision=69895fccb178752937441a7032e042cb73eb3896

Unfortunately, I am not yet set up to use rr with my VM.

(In reply to Simon Giesecke [:sg] from comment #119)

Patch looks okay to me, but please test your patch more times.

I increased the number of GC iterations to 3, and ran the typically failing jobs on Windows and Linux 9 times each on try, and this issue hasn't shown up: https://treeherder.mozilla.org/#/jobs?repo=try&selectedJob=259197179&revision=69895fccb178752937441a7032e042cb73eb3896

Could you try to retrigger each job at least 20-40 times? The failure rate is "38 failures in 4298 pushes (0.009 failures/push)". So, 9 times seems to be not enough. (Although 20 times are still not enough to guarantee, it's better to test that more times. So that we have more confidence to land the patch)

If you want to do that by "./mach try syntax", there is an option " --rebuild REBUILD Re-trigger all test jobs (up to 20 times)". Or you can retrigger existing tasks manually. Thanks!

Flags: needinfo?(sgiesecke)

Ok, I retriggered them from the UI for ca. 40 more times.

I triggered each of those 67 times now, and none had this failure.

Flags: needinfo?(sgiesecke)
Attachment #9082207 - Attachment description: Bug 1093064 - Removed obsolete code after 1068975 r=ttung → Bug 1093064 - Removed obsolete code after Bug 1068975 r=ttung

(Attachment #9082208 [details])[https://bugzilla.mozilla.org/attachment.cgi?id=9082208] can be landed, the other one is still pending review.

Keywords: checkin-needed

Pushed by btara@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2c5b36e69041
Increase number of GC iterations r=ttung,janv

Keywords: checkin-needed

I imagine the patch that landed 11 hours ago to increase the number of gc iterations should help, I don't want to review/disable anything if there is a potential fix- lets wait 3 days.

Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla70
See Also: → 1572982

The issue is still occurring after the fix but the failure rate has dropped significantly from 0.025 to 0.001, so this indicates that the approach of increasing the GC iterations is right, but 3 GC iterations are still not sufficient to cover all cases.

In general, no number of iterations can guarantee the expected condition. Ideally, the tests would somehow be changed to be more deterministic. Is there a deterministic mode of the GC for testing purposes? Otherwise, an alternative to statically increasing the number of GC iterations for all test runs, we could dynamically test the expected condition after each GC iteration, and set the maximum number of GC iterations to a much higher limit, to minimize the chance of failures.

Flags: needinfo?(jvarga)

(In reply to Simon Giesecke [:sg] [he/him] from comment #137)

Otherwise, an alternative to statically increasing the number of GC iterations for all test runs, we could dynamically test the expected condition after each GC iteration, and set the maximum number of GC iterations to a much higher limit, to minimize the chance of failures.

This sounds good to me. It's a good compromise between achieving maximum correctness and not slowing down the tests too much.

Flags: needinfo?(jvarga)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Duplicate of this bug: 1572982
Whiteboard: [stockwell disable-recommended]
Attachment #9082207 - Attachment is obsolete: true
Attachment #9084134 - Attachment is obsolete: true
Keywords: checkin-needed

Pushed by csabou@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/845c6cac2212
Repeat garbage collection until ref count becomes 0 r=janv

Keywords: checkin-needed
Status: REOPENED → RESOLVED
Closed: 3 months ago2 months ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.