Closed Bug 1336848 Opened 7 years ago Closed 7 years ago

Intermittent dom/indexedDB/test/test_count.html | Got correct event type - got success, expected upgradeneeded

Categories

(Core :: Storage: IndexedDB, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

RESOLVED DUPLICATE of bug 1333273

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell disabled])

See Also: → 1333273
:hsinyi, I want to make you aware of this bug. Can you find someone to look at it? We are trending at 100+ failures/week (this should be easy to reproduce on a Windows 7 machine). I would expect to have this fixed or disabled in the next 2 weeks; let me know if you need more information.
Flags: needinfo?(htsai)
Hi Jan and Andrew,
I noticed that you've had experience with bug 1333273. Would you be able to help out here?
Flags: needinfo?(jvarga)
Flags: needinfo?(htsai)
Flags: needinfo?(bugmail)
I'm trying to focus on some critical-path multi-e10s service worker efforts[1].  But... I spent some time looking at the family of IndexedDB intermittent failures earlier today, because I was worried: in some of the bugs the trend was noticed to start on Jan 29th, and I landed my fix for bug 1319531 around then.  However, everything seems to suggest that the problem is something different.  I have noticed that in many of the bugs the issue seems to be on win7 opt & PGO, which suggests a timing-related race.

In particular, I looked at/considered:
- Whether there were other obvious changes around that time to IndexedDB or the quota manager.  I didn't see any.  (With Mercurial it's a bit hard to tell when things actually landed without checking the pushlog, and I couldn't figure out how to limit that to the specific directories or otherwise use it locally, but by going back far enough I didn't see anything obvious.)
- Whether the fix for bug 1319531 was uplifted to mozilla-aurora and mozilla-beta; neither of those branches seems to be showing massive IndexedDB intermittent test failures.  If my fix were obviously causing systemic problems, I'd expect there to be an uptick there.
- Whether the test I added, test_file_put_deleted.html, could be causing a problem for later tests.  However, in the few failures I looked at, that test was not run, so it can't be affecting the other tests.
- Whether my removal of the nulling of DatabaseFile's mBlobImpl and mFileInfo on ActorDestroy could have indirect effects; could keeping either of those references around keep other, more important object instances around?  My analysis suggested no: they don't hold any meaningful references that would prevent database closure or anything like that.  Also, I'd analyzed the direct effects in the patch and again when there was the resurgence of failures.
- Whether the now-failing web-platform tests were using Blobs.  (If they were, maybe my analysis was wrong and there was more to look into.)  AFAICT, no web-platform tests use Blobs.  There's one file that could, but that logic is disabled with "if (false)".
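
For reference, "using Blobs" in the items above means a test that stores a Blob into an object store, roughly like this minimal sketch (illustrative names, not taken from any actual test); putting a Blob, rather than a plain JS value, is the path that involves the blob/file machinery (DatabaseFile, mBlobImpl, mFileInfo) discussed above:

  const openRequest = indexedDB.open("hypothetical-blob-test", 1);
  openRequest.onupgradeneeded = event => {
    // First open of a fresh database: create the store.
    event.target.result.createObjectStore("files");
  };
  openRequest.onsuccess = event => {
    const db = event.target.result;
    const tx = db.transaction("files", "readwrite");
    // Store a Blob (no keyPath on the store, so an explicit key is used).
    tx.objectStore("files").put(new Blob(["some data"]), "key");
    tx.oncomplete = () => db.close();
  };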

1: I do understand that having this many IndexedDB intermittents is a major problem for tree health and a major hassle for developers, so I'm not trying to shirk, but I'm hoping with the recent uptick in people who are hacking on IndexedDB they may be able to investigate without delaying multi-e10s.  I believe :qDot has been working on IndexedDB for private-browsing and so may be a good candidate to assist.
Flags: needinfo?(bugmail)
(In reply to Andrew Sutherland [:asuth] from comment #5)
> But... I spent some time looking at the family of IndexedDB
> intermittent failures earlier today because I was worried that in some of
> the bugs the trend was noticed to start on Jan 29th and I landed my fix for
> bug 1319531.

... early on January 30th.


Which also suggests one possible area of investigation: push to try a reversion of the fix from bug 1335054 (a follow-up to bug 1319531) and a reversion of the fix from bug 1319531, and see if that somehow magically makes the problem go away.  (I'm overdue for bed right now, or I would push that... apologies.)
(In reply to Andrew Sutherland [:asuth] from comment #5)
> 1: I do understand that having this many IndexedDB intermittents is a major
> problem for tree health and a major hassle for developers, so I'm not trying
> to shirk, but I'm hoping with the recent uptick in people who are hacking on
> IndexedDB they may be able to investigate without delaying multi-e10s.  I
> believe :qDot has been working on IndexedDB for private-browsing and so may
> be a good candidate to assist.
Thanks, Andrew, for the comment and for sharing your priorities. Let me explore alternatives.
(In reply to Andrew Sutherland [:asuth] from comment #6)
> (In reply to Andrew Sutherland [:asuth] from comment #5)
> > But... I spent some time looking at the family of IndexedDB
> > intermittent failures earlier today because I was worried that in some of
> > the bugs the trend was noticed to start on Jan 29th and I landed my fix for
> > bug 1319531.
> 
> ... early on January 30th.
> 
> 
> Which also does suggest one possible area of investigation is to push-to-try
> a reversion of the fix from bug 1335054 (a follow-up to bug 1319531) and a
> reversion of the fix from bug 1319531 and seeing if that somehow magically
> makes the problem go away.  (I'm overdue for bed right now, or I would push
> that... apologies.)

I'll push it to try and we will see.
Not sure if they are related.
I have done some analysis of another IDB intermittent bug in bug 1300927 comment 11, for your information.
(In reply to Jan Varga [:janv] from comment #8)
> I'll push it to try and we will see.
Hi Jan,
Are you looking into this, or have you already found something suspicious?
I am not sure if bug 1333273 is related to this one.
If not, and you are working on bug 1333273, then I can help follow up on this one if you don't have enough bandwidth for it. :)
Hi Bevis,
feel free to take it.
Flags: needinfo?(jvarga)
After a quick glance, the following 2 test cases fail, in order, every time the symptom happens, with the same problem: the onupgradeneeded callback is not triggered, but the onsuccess one is instead:
[test_complex_keyPaths.html]
[test_count.html]

Maybe we should start from bug 1333273 instead.

It seems that somehow the storage was not cleaned up by the previous test in non-e10s, if non-e10s goes first!?
However, these db names are unique, and the storage of the corresponding origin shall always be cleared before starting a new test.

I'll add more logging in the call paths of IDB.open and QMS.clear to see what has happened when this error arises.
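
For reference, the expectation these tests encode looks roughly like the sketch below (the db name and version are illustrative, not copied from any actual test).  Per the IndexedDB spec, open() fires upgradeneeded only when the existing version (0 for a database that doesn't exist yet) is lower than the requested one, so getting success without a preceding upgradeneeded means a database with that name already existed, i.e. the origin's storage was not cleared:

  // Each test opens a database whose name is unique to that test
  // (e.g. something derived from window.location.pathname), at version 1.
  const request = indexedDB.open(window.location.pathname, 1);
  let sawUpgrade = false;

  request.onupgradeneeded = () => {
    // Expected to fire first when the origin's storage is really empty.
    sawUpgrade = true;
  };

  request.onsuccess = event => {
    if (!sawUpgrade) {
      // The failure mode in this bug: the database already existed,
      // so the storage from a previous run/test was not cleared.
    }
    event.target.result.close();
  };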
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
Whiteboard: [stockwell disabled]