Closed Bug 1182961 Opened 4 years ago Closed 4 years ago

Use nsTHashTable::Iterator in netwerk/

Categories

(Core :: Networking, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla42
Tracking Status
firefox42 --- fixed
firefox43 --- fixed

People

(Reporter: njn, Assigned: njn)

References

(Blocks 1 open bug)

Details

Attachments

(5 files, 5 obsolete files)

9.70 KB, patch
michal
: review+
Details | Diff | Splinter Review
10.60 KB, patch
michal
: review+
Details | Diff | Splinter Review
6.59 KB, patch
michal
: review+
Details | Diff | Splinter Review
2.33 KB, patch
michal
: review+
Details | Diff | Splinter Review
16.47 KB, patch
michal
: review+
Details | Diff | Splinter Review
Because iterators are so much nicer than enumerate functions.

There are fiftenn occurrences of EnumerateEntries() in netwerk/ to be dealt with.
Assignee: nobody → n.nethercote
Attachment #8639158 - Flags: review?(michal.novotny)
Attachment #8639159 - Flags: review?(michal.novotny)
Attachment #8639160 - Flags: review?(michal.novotny)
Attachment #8639161 - Flags: review?(michal.novotny)
Attachment #8639164 - Flags: review?(michal.novotny)
(In reply to Nicholas Nethercote [:njn] from comment #1)
> Created attachment 8639158 [details] [diff] [review]
> (part 1) - Use nsTHashtable::Iterator in CacheIndex

There's something wrong with the change I've made to ProcessPendingOperations(). I'm getting assertion failures ('debugStats == mIndexStats') on try, e.g. https://treeherder.mozilla.org/#/jobs?repo=try&revision=9b3ebf57184c

I'm also getting some ASan failures (use after frees) in one of the later patches, as shown by https://treeherder.mozilla.org/#/jobs?repo=try&revision=fde1b4ed4724. I haven't looked closely at these failures yet.

This is surprising. I've done more than 100 of these enumeration-to-iteration conversions now and this is the first one I've had significant trouble with. They're generally very mechanical. There must be something subtle about this code that I'm unaware of.
The ASAN failures helped me understand what was going wrong. The problem was in cases where the CacheIndexEntryAutoManage class was used in conjunction with PL_DHASH_REMOVE. My patches changed those cases to use iter.Remove(), which changed the ordering of things.

- In the old code, CacheIndexEntryAutoManage was destroyed prior to the element being removed.

- In the new code, CacheIndexEntryAutoManage was destroyed after the element was removed.

I've changed my patches to use block-scoping around the CacheIndexEntryAutoManage instances to reinstate the original ordering. I have a try run going and if all goes well I'll post new patches once it finishes.
Attachment #8639158 - Attachment is obsolete: true
Attachment #8639158 - Flags: review?(michal.novotny)
Attachment #8639159 - Attachment is obsolete: true
Attachment #8639159 - Flags: review?(michal.novotny)
Attachment #8639160 - Attachment is obsolete: true
Attachment #8639160 - Flags: review?(michal.novotny)
Attachment #8639161 - Attachment is obsolete: true
Attachment #8639161 - Flags: review?(michal.novotny)
Attachment #8639164 - Attachment is obsolete: true
Attachment #8639164 - Flags: review?(michal.novotny)
Attachment #8640295 - Flags: review?(michal.novotny) → review+
Comment on attachment 8640293 [details] [diff] [review]
(part 2) - Use nsTHashtable::Iterator in CacheIndex

Review of attachment 8640293 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/cache2/CacheIndex.cpp
@@ +2318,5 @@
> +         LOGSHA1(entry->Hash())));
> +
> +    CacheIndexEntry* entry2 = mIndex.GetEntry(*entry->Hash());
> +    {
> +      CacheIndexEntryAutoManage emng(entry->Hash(), this);

iter.Remove() removes the entry from mTmpJournal. CacheIndexEntryAutoManage looks only into mIndex and mPendingUpdates so it's not necessary to put it into a block scope.
Attachment #8640293 - Flags: review?(michal.novotny) → review+
Attachment #8640296 - Flags: review?(michal.novotny) → review+
Michal, you reviewed three patches but did not review (or, did not set a review result) on two. Was this deliberate?
Flags: needinfo?(michal.novotny)
Status: NEW → ASSIGNED
I didn't understand what causes the problem that you described in comment #7. Ordering in the old code was unwanted behavior and methods DoNotSearchInIndex() and DoNotSearchInUpdates() are there to cope with it. It would be great to get rid of them now when the ordering has changed. It took me a while to figure out what causes the wrong statistics. CacheIndexEntryAutoManage hold just a pointer to the hash which is freed when the entry is released, so the hash is no longer valid in the destructor. I would prefer to change it so that CacheIndexEntryAutoManage copies the hash, then methods DoNotSearchIn...() can be removed and no block-scoping is needed.
Flags: needinfo?(michal.novotny)
(In reply to Michal Novotny (:michal) from comment #16)
> I would prefer to change it so that CacheIndexEntryAutoManage
> copies the hash, then methods DoNotSearchIn...() can be removed and no
> block-scoping is needed.

Can we do that as a follow-up? As it stands, these patches are just mechanical refactorings that don't require any netwerk/-specific knowledge. In contrast, changing CacheIndexEntryAutoManage in the way you suggest is scope creep that does require netwerk/-specific knowledge.
Flags: needinfo?(michal.novotny)
Attachment #8640294 - Flags: review?(michal.novotny) → review+
Comment on attachment 8640292 [details] [diff] [review]
(part 1) - Use nsTHashtable::Iterator in CacheIndex

Review of attachment 8640292 [details] [diff] [review]:
-----------------------------------------------------------------

::: netwerk/cache2/CacheIndex.cpp
@@ +1490,5 @@
>  
> +  for (auto iter = mPendingUpdates.Iter(); !iter.Done(); iter.Next()) {
> +    CacheIndexEntryUpdate* update = iter.Get();
> +
> +    LOG(("CacheFile::ProcessPendingOperations() [hash=%08x%08x%08x%08x%08x]",

Please fix the existing typo CacheFile -> CacheIndex.
Attachment #8640292 - Flags: review?(michal.novotny) → review+
(In reply to Nicholas Nethercote [:njn] from comment #17)
> Can we do that as a follow-up? As it stands, these patches are just
> mechanical refactorings that don't require any netwerk/-specific knowledge.
> In contrast, changing CacheIndexEntryAutoManage in the way you suggest is
> scope creep that does require netwerk/-specific knowledge.

OK, I filed bug 1190839.
Flags: needinfo?(michal.novotny)
Blocks: 1190839
Depends on: 1191907
I backed out all five patches due to problems:

https://hg.mozilla.org/integration/mozilla-inbound/rev/d7734e7adeb2

For bug 1191447 it looks like I screwed up ProcessPendingOperations().

For bug 1191907 it looks like I screwed up CacheIndex::WriteRecords().

I'll probably break these patches up into smaller pieces, so there is one enumeration-to-iteration change per patch, and then land them more gradually. I'll also look closely at the screwed-up changes and won't re-land them until I see what went wrong.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
> For bug 1191447 it looks like I screwed up ProcessPendingOperations().

I was able to reproduce the assertion just once, which wasn't enough to work out which patch is at fault. Part 1 definitely looks most likely, though.

> For bug 1191907 it looks like I screwed up CacheIndex::WriteRecords().

I was able to reproduce this fairly reliably, and I'm pretty sure this is right and part 2 is the problem.
(In reply to Pulsebot from comment #25)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/322a318dafa8
> https://hg.mozilla.org/integration/mozilla-inbound/rev/5b6b7472ff5a

I relanded parts 4 and 5 because I'm confident they weren't related to either of the problems.
https://hg.mozilla.org/mozilla-central/rev/322a318dafa8
https://hg.mozilla.org/mozilla-central/rev/5b6b7472ff5a
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Is this supposed to be RESOLVED-FIXED right now?
Flags: needinfo?(n.nethercote)
Status: RESOLVED → REOPENED
Flags: needinfo?(n.nethercote)
Resolution: FIXED → ---
Keywords: leave-open
(In reply to Pulsebot from comment #29)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/2e4b173197e2

That's part 3 re-landing, because I established locally that it isn't related to the problems.
> ::: netwerk/cache2/CacheIndex.cpp
> @@ +2318,5 @@
> > +         LOGSHA1(entry->Hash())));
> > +
> > +    CacheIndexEntry* entry2 = mIndex.GetEntry(*entry->Hash());
> > +    {
> > +      CacheIndexEntryAutoManage emng(entry->Hash(), this);
> 
> iter.Remove() removes the entry from mTmpJournal. CacheIndexEntryAutoManage
> looks only into mIndex and mPendingUpdates so it's not necessary to put it
> into a block scope.

It is necessary. I don't understand why because I don't really understand this code, but I have determined that the lack of the block scope around this CacheIndexEntryAutoManage is what caused bug 1191447.
(In reply to Pulsebot from comment #32)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/087e4e515ebe

That's part 1 re-landing, because I established locally that it isn't related to the problems. In other words, both the problems came from part 2.
> I have determined that the lack of the block scope around
> this CacheIndexEntryAutoManage is what caused bug 1191447.

And now I've confirmed that it also caused by 1191907.

I'll re-land part 2 with the extra scope in the next day or two. I will wait so that it'll end up in a different Nightly to parts 1 and 3, just to be cautious.
Keywords: leave-open
(In reply to Pulsebot from comment #36)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/9091e62b0646

That's part 2. All the patches have now re-landed.
https://hg.mozilla.org/mozilla-central/rev/9091e62b0646
Status: REOPENED → RESOLVED
Closed: 4 years ago4 years ago
Resolution: --- → FIXED
Nathan, how long do you need to verify this is fixed before I land bug 1082735?  It touches the same code, so regression hunt would be harder.
Flags: needinfo?(n.nethercote)
(In reply to Honza Bambas (not reviewing) (:mayhemer) from comment #39)
> Nathan, how long do you need to verify this is fixed before I land bug
> 1082735?  It touches the same code, so regression hunt would be harder.

I'm Nick, not Nathan :)

I'm confident it's fixed. I was able to reproduce the problems locally and the new versions didn't trigger them. Also, both problems were reported pretty much as soon as the code hit Nightly last time. So as long as it's been present in Nightly for a full day without complaint you should be ok. I'm not sure when Nightly is built from mozilla-central, but given that it landed on mozilla-central early yesterday, I think 24 hours from now should be safe, and 48 hours should be super-safe.

Sorry for causing the delay.
Flags: needinfo?(n.nethercote)
(In reply to Nicholas Nethercote [:njn] from comment #40)
> (In reply to Honza Bambas (not reviewing) (:mayhemer) from comment #39)
> > Nathan, how long do you need to verify this is fixed before I land bug
> > 1082735?  It touches the same code, so regression hunt would be harder.
> 
> I'm Nick, not Nathan :)
> 
> I'm confident it's fixed. I was able to reproduce the problems locally and
> the new versions didn't trigger them. Also, both problems were reported
> pretty much as soon as the code hit Nightly last time. So as long as it's
> been present in Nightly for a full day without complaint you should be ok.
> I'm not sure when Nightly is built from mozilla-central, but given that it
> landed on mozilla-central early yesterday, I think 24 hours from now should
> be safe, and 48 hours should be super-safe.
> 
> Sorry for causing the delay.

Sorry for name mismatch :D  Thanks, I'll land my patch in few days.
You need to log in before you can comment on or make changes to this bug.