Closed Bug 1154124 Opened 5 years ago Closed 4 years ago

Stack overflow by a sync call of net::CacheEntry's callbacks in a chain

Categories

(Core :: Networking: Cache, defect, critical)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla46
Tracking Status
firefox40 --- affected
firefox46 --- fixed

People

(Reporter: ws.bugzilla, Assigned: mayhemer)

References

Details

(Keywords: crash)

Crash Data

Attachments

(3 files, 1 obsolete file)

Steps to reproduce:

1. Go to this URL: http://www.geeksofdoom.com/2015/01/13/rockstar-delays-grand-theft-auto-v-pc-release-minimum-specs-4k-screenshots-shared

2. Middle click three of the screenshots at the bottom of this post.

3. Wait. First the browser becomes unresponsive, showing the dialog. "Stop Script" is either unclickable or ineffective. Eventually the browser crashes.

Here are two crash signatures I got for this crash:

https://crash-stats.mozilla.com/report/index/56a058a3-d1aa-4c50-adc6-564392150414
https://crash-stats.mozilla.com/report/index/4f42870b-d66a-452d-bff9-fcf9c2150414

(the latter is a Windows XP VirtualBox VM with Nightly, following the same STR)
OS: Windows 7 → All
Hardware: x86_64 → All
Stack overflow:

mozilla::net::CacheEntryHandle::Release()
nsRefPtr<nsIAtom>::~nsRefPtr<nsIAtom>()
mozilla::net::CacheEntry::InvokeAvailableCallback(mozilla::net::CacheEntry::Callback const&)
mozilla::net::CacheEntry::InvokeCallback(mozilla::net::CacheEntry::Callback&)
mozilla::net::CacheEntry::InvokeCallbacks(bool)
mozilla::net::CacheEntry::InvokeCallbacks()
mozilla::net::CacheEntry::OnHandleClosed(mozilla::net::CacheEntryHandle const*)
mozilla::net::CacheEntryHandle::~CacheEntryHandle()
mozilla::net::CacheEntryHandle::`scalar deleting destructor'(unsigned int)
mozilla::net::CacheEntryHandle::Release()
...
Severity: major → critical
Component: General → Networking: Cache
Keywords: crash, crashreportid
Product: Firefox → Core
Roman, are you using a fresh profile?  If not, can you retry with it?  I was trying to reproduce this bug with Nightly (e10s on and off) and Fx 37 (all fresh profile) but no luck.
Assignee: nobody → honzab.moz
Flags: needinfo?(ws.bugzilla)
nsHttpChannel::OnCacheEntryCheck apparently returns ENTRY_NEEDS_REVALIDATION, hence we go to nsHttpChannel::OnCacheEntryAvailable.  But that doesn't take the entry handle ownership (and returns OK) and also somehow restarts wait for the same entry (ending up added as a callback to the list) so on releasing the handle inside CacheEntry::InvokeAvailableCallback  (line 832, not 802) we do immediately another loop.

How this happens state happens, is absolutely unclear to me.

Anyway, we should be able resilient.  Have to think on a fix.



@Roman, can you please produce an NSPR log while reproducing the bug?  set two environment vars (best in a cmd.exe window):
NSPR_LOG_MODULES=cache2:5,nsHttp:4,timestamp,sync
NSPR_LOG_FILE=\path\to\the\file

and then run firefox with that env.  If you could not reproduce, remove the 'sync' module from the list and retry.

Thanks!
Attached file nspr.7z
The crash reproduces on today's nightly with a clean profile (previously the profile was almost-clean).

Add one step to the STR: close the browser. That seems to be what finally triggers the crash reporter. What I observe when I open the screenshots in new tabs is that the browser becomes very unresponsive, but the crash reporter seems to only be triggered when the browser is closed.

NSPR log (sync module included): attached. This was done on a WinXP VM, because that's where my nightly lives after a couple of incidents with accidentally running the nightly on my main profile. Let me know if you still can't reproduce and require further logs from Win7.
Flags: needinfo?(ws.bugzilla)
Thanks.  Now I know what's going on.  A script schedules an enormous amount of requests to a single URL.  It means there are 100+ consumer waiting for a single entry.  nsHttpChannel::OnCacheEntryCheck returns REVAL_NEEDED but nsHttpChannel::OnCacheEntryAvailable later fails since the channel has been canceled in the mean time.  This leads to processing (no redispatch) of the next callback in chain => stack overflow.

I am in favor of leaving the code as much as possible w/o redispatching (i.e. breaking the stack).  But in this case we need to make an exception, tho there may be more places.

This will not fix the unresponsive script.  e10s turned on would, but still, this is somewhat malicious behavior.
Status: NEW → ASSIGNED
Perfect!

As for e10s, my nightly was pretty unresponsive with e10s enabled - a bit better than v37, but only slightly. Is it supposed to remain perfectly responsive? If so, maybe I should file another bug related to how these pages affect UI responsiveness with e10s enabled?
Duplicate of this bug: 1160824
Crash Signature: [@ choose_arena ] [@ nsACString_internal::Replace(unsigned int, unsigned int, char const*, unsigned int) ]
Keywords: crashreportid
Summary: Reproduceable crash in 37 through nightly, preceded by unresponsive script warning → Reproducible crash preceded by Unresponsive Script warning. high cache hit results in stack overflow
Crash Signature: [@ choose_arena ] [@ nsACString_internal::Replace(unsigned int, unsigned int, char const*, unsigned int) ] → [@ choose_arena ] [@ nsACString_internal::Replace(unsigned int, unsigned int, char const*, unsigned int) ] [@ nsACString_internal::Replace ]
Summary: Reproducible crash preceded by Unresponsive Script warning. high cache hit results in stack overflow → Stack overflow by a sync call of net::CacheEntry's callbacks in a chain
Duplicate of this bug: 1224096
Attached patch v1 (obsolete) — Splinter Review
The fix is actually that easy...
Attachment #8706917 - Flags: review?(michal.novotny)
Comment on attachment 8706917 [details] [diff] [review]
v1

I think we should also prevent the race with closing the output stream that btw invokes (by a dispatch!) CacheEntry::OnOutputClosed() with invokes the callbacks again.

I will update the patch accordingly.
Attachment #8706917 - Flags: review?(michal.novotny)
Attached patch v2Splinter Review
- the phantom output stream can be easily closed and released under the lock
- mOutputStream->Close() invokes callbacks (asynchronously), so no need to do invoke callbacks twice

https://treeherder.mozilla.org/#/jobs?repo=try&revision=6b43e1905e73

-w patch follows
Attachment #8706917 - Attachment is obsolete: true
Attachment #8708032 - Flags: review?(michal.novotny)
Attached patch v2 -wSplinter Review
Attachment #8708032 - Flags: review?(michal.novotny) → review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/85bc8c6ea180
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
Duplicate of this bug: 1033461
You need to log in before you can comment on or make changes to this bug.