Closed Bug 1336848 Opened 7 years ago Closed 7 years ago

Intermittent dom/indexedDB/test/test_count.html | Got correct event type - got success, expected upgradeneeded

Categories

(Core :: Storage: IndexedDB, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

RESOLVED DUPLICATE of bug 1333273

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell disabled])

See Also: → 1333273
:hsinyi, I want to make you aware of this bug. Can you find someone to look at it? We are trending at 100+ failures/week (this should be easy to reproduce on a Windows 7 machine). I would expect to have this fixed or disabled in the next 2 weeks; let me know if you need more information.
Flags: needinfo?(htsai)
Hi Jan and Andrew,
I noticed that you've had experience with bug 1333273. Would you be able to help out here?
Flags: needinfo?(jvarga)
Flags: needinfo?(htsai)
Flags: needinfo?(bugmail)
I'm trying to focus on some critical-path multi-e10s service worker efforts[1].  But... I spent some time looking at the family of IndexedDB intermittent failures earlier today, because I was worried: in some of the bugs the trend was noticed to start on Jan 29th, and I landed my fix for bug 1319531 around then.  However, everything seems to suggest that the problem is something different.  I have noticed that in many of the bugs the issue seems to be on win7 opt & PGO, which suggests a timing-related race.

In particular, I looked at/considered:
- Whether there were other obvious changes around that time to IndexedDB or the quota manager.  I didn't see any.  (With Mercurial it's a bit hard to tell when things actually landed without checking the pushlog, and I couldn't figure out how to limit that to the specific directories or otherwise use it locally, but by going back far enough I didn't see anything obvious.)
- Whether the fix for bug 1319531 was uplifted to mozilla-aurora and mozilla-beta; neither of those branches seems to be showing massive IndexedDB intermittent test failures.  If my fix were obviously causing systemic problems, I'd expect there to be an uptick there.
- Whether the test I added, test_file_put_deleted.html, could be causing a problem for later tests.  However, in the few failures I looked at, that test was not run, so it can't be affecting the other tests.
- Whether my removal of the nulling of DatabaseFile's mBlobImpl and mFileInfo on ActorDestroy could have indirect effects; could keeping either of those references around keep other, more important object instances around?  My analysis suggested no: they don't hold any meaningful references that would prevent database closure or anything like that.  Also, I'd analyzed the direct effects in the patch and again when there was the resurgence of failures.
- Whether the now-failing web-platform tests were using Blobs.  (If they were, maybe my analysis was wrong and there was more to look into.)  AFAICT, no web-platform tests use Blobs.  There's one file that could, but that logic is disabled with "if (false)".
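
For reference, "using Blobs" in the items above means a test that stores a Blob into an object store, roughly like this minimal sketch (illustrative names, not taken from any actual test); putting a Blob, rather than a plain JS value, is the path that involves the blob/file machinery (DatabaseFile, mBlobImpl, mFileInfo) discussed above:

  const openRequest = indexedDB.open("hypothetical-blob-test", 1);
  openRequest.onupgradeneeded = event => {
    // First open of a fresh database: create the store.
    event.target.result.createObjectStore("files");
  };
  openRequest.onsuccess = event => {
    const db = event.target.result;
    const tx = db.transaction("files", "readwrite");
    // Store a Blob (no keyPath on the store, so an explicit key is used).
    tx.objectStore("files").put(new Blob(["some data"]), "key");
    tx.oncomplete = () => db.close();
  };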

1: I do understand that having this many IndexedDB intermittents is a major problem for tree health and a major hassle for developers, so I'm not trying to shirk, but I'm hoping with the recent uptick in people who are hacking on IndexedDB they may be able to investigate without delaying multi-e10s.  I believe :qDot has been working on IndexedDB for private-browsing and so may be a good candidate to assist.
Flags: needinfo?(bugmail)
(In reply to Andrew Sutherland [:asuth] from comment #5)
> But... I spent some time looking at the family of IndexedDB
> intermittent failures earlier today because I was worried that in some of
> the bugs the trend was noticed to start on Jan 29th and I landed my fix for
> bug 1319531.

... early on January 30th.


Which also suggests one possible area of investigation: push to try a reversion of the fix from bug 1335054 (a follow-up to bug 1319531) and a reversion of the fix from bug 1319531, and see if that somehow magically makes the problem go away.  (I'm overdue for bed right now, or I would push that... apologies.)
(In reply to Andrew Sutherland [:asuth] from comment #5)
> 1: I do understand that having this many IndexedDB intermittents is a major
> problem for tree health and a major hassle for developers, so I'm not trying
> to shirk, but I'm hoping with the recent uptick in people who are hacking on
> IndexedDB they may be able to investigate without delaying multi-e10s.  I
> believe :qDot has been working on IndexedDB for private-browsing and so may
> be a good candidate to assist.
Thanks, Andrew, for the comment and for sharing your priorities. Let me explore alternatives.
(In reply to Andrew Sutherland [:asuth] from comment #6)
> (In reply to Andrew Sutherland [:asuth] from comment #5)
> > But... I spent some time looking at the family of IndexedDB
> > intermittent failures earlier today because I was worried that in some of
> > the bugs the trend was noticed to start on Jan 29th and I landed my fix for
> > bug 1319531.
> 
> ... early on January 30th.
> 
> 
> Which also does suggest one possible area of investigation is to push-to-try
> a reversion of the fix from bug 1335054 (a follow-up to bug 1319531) and a
> reversion of the fix from bug 1319531 and seeing if that somehow magically
> makes the problem go away.  (I'm overdue for bed right now, or I would push
> that... apologies.)

I'll push it to try and we will see.
Not sure if they are related.
I have done some analysis of another IDB intermittent bug in bug 1300927 comment 11, for your information.
(In reply to Jan Varga [:janv] from comment #8)
> I'll push it to try and we will see.
Hi Jan,
Are you looking into this, or have you already found something suspicious?
I am not sure if bug 1333273 is related to this one.
If not, and you are working on bug 1333273, then I can help follow up on this one if you don't have enough bandwidth for it. :)
Hi Bevis,
feel free to take it.
Flags: needinfo?(jvarga)
After a quick glance, the following 2 test cases fail, in order, every time the symptom happens, with the same problem: the onupgradeneeded callback is not triggered, but the onsuccess one is instead:
[test_complex_keyPaths.html]
[test_count.html]

Maybe we should start from bug 1333273 instead.

It seems that somehow the storage was not cleaned up by the previous test in non-e10s, if non-e10s goes first!?
However, these db names are unique, and the storage of the corresponding origin shall always be cleared before starting a new test.

I'll add more logging in the call paths of IDB.open and QMS.clear to see what has happened when this error arises.
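
For reference, the expectation these tests encode looks roughly like the sketch below (the db name and version are illustrative, not copied from any actual test).  Per the IndexedDB spec, open() fires upgradeneeded only when the existing version (0 for a database that doesn't exist yet) is lower than the requested one, so getting success without a preceding upgradeneeded means a database with that name already existed, i.e. the origin's storage was not cleared:

  // Each test opens a database whose name is unique to that test
  // (e.g. something derived from window.location.pathname), at version 1.
  const request = indexedDB.open(window.location.pathname, 1);
  let sawUpgrade = false;

  request.onupgradeneeded = () => {
    // Expected to fire first when the origin's storage is really empty.
    sawUpgrade = true;
  };

  request.onsuccess = event => {
    if (!sawUpgrade) {
      // The failure mode in this bug: the database already existed,
      // so the storage from a previous run/test was not cleared.
    }
    event.target.result.close();
  };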
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
Whiteboard: [stockwell disabled]