Closed Bug 1282163 Opened 8 years ago Closed 8 years ago

att.com leaks windows

Categories

(Core :: DOM: Core & HTML, defect, P2)

defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
e10s - ---
firefox48 + wontfix
firefox49 + fix-optional

People

(Reporter: bkelly, Unassigned)

References

Details

(4 keywords, Whiteboard: [MemShrink:P2])

Attachments

(1 file)

In FF48.0b2 beta in e10s mode I got a leak on the att.com website.  I was navigating through my account pages to try to find a receipt for international data.

The leak looks like this in about:memory:

├───30.89 MB (05.22%) -- window-objects
│   ├──30.57 MB (05.16%) -- top(none)
│   │  ├──30.57 MB (05.16%) -- ghost
│   │  │  ├──12.88 MB (02.17%) ++ window(https://www.att.com/olam/passthroughAction.myworld?actionType=ViewBillHistory)
│   │  │  ├───9.21 MB (01.56%) ++ window(https://www.att.com/scripts/touchcommerce/inqChat.html?IFRAME)
│   │  │  ├───8.34 MB (01.41%) ++ window(https://www.att.com/eos/processOSRequest)
│   │  │  └───0.14 MB (00.02%) ++ window(https://www.att.com/opss/jsp/inProgress.jsp?languageId=E&appId=EOS)
│   │  └───0.00 MB (00.00%) -- detached
│   │      ├──0.00 MB (00.00%) ── window(https://www.att.com/scripts/touchcommerce/inqChat.html?IFRAME)/dom/other [2]
│   │      ├──0.00 MB (00.00%) ── window(https://www.att.com/olam/passthroughAction.myworld?actionType=ViewBillHistory)/dom/other
│   │      └──0.00 MB (00.00%) ── window(https://www.att.com/opss/jsp/inProgress.jsp?languageId=E&appId=EOS)/dom/other

I have the GC and CC logs.  I traced a couple windows and they all seem to be held alive by an HTMLScriptElement root:

$ /c/devel/heapgraph/find_roots.py cc-edges.8412.1466800204.log 00000272B5F57C00
Parsing cc-edges.8412.1466800204.log. Done loading graph.

0000027291223550 [JS Object (Performance)]
    --[UnwrapDOMObject(obj)]--> 00000272BD0DADF0 [DOMEventTargetHelper ]
    --[mParentPerformance]--> 00000272B9779870 [DOMEventTargetHelper https://www.att.com/eos/process
OSRequest]
    --[Preserved wrapper, Preserved wrapper]--> 00000272BF4FB250 [JS Object (Performance)]
    --[global]--> 0000027509A33240 [JS Object (Window)]
    --[CLASS_OBJECT(Object), CLASS_OBJECT(Function), **UNKNOWN SLOT 187**]--> 000002751ABAABE0 [JS O
bject (Proxy)]
    --[private]--> 000002751ABAA8A0 [JS Object (Proxy)]
    --[global, private]--> 0000027509A33600 [JS Object (Window)]
    --[Inq]--> 00000275038E9560 [JS Object (Proxy)]
    --[private]--> 00000275038E9380 [JS Object (Object)]
    --[global]--> 0000027509A339C0 [JS Object (Window)]
    --[UnwrapDOMObject(obj)]--> 00000272B5F57C00 [nsGlobalWindow # 2147488921 inner https://www.att.
com/scripts/touchcommerce/inqChat.html?IFRAME]

    Root 0000027291223550 is a marked GC object.

via persistent-Object :
0000027291223580 [HTMLScriptElement <no private>]
    --[shape]--> 00000272BF487290 [shape]
    --[base]--> 0000027509AB2D30 [base_shape]
    --[global]--> 0000027509A33100 [Window <no private>]
    --[CLASS_OBJECT(Boolean)]--> 0000027291223550 [Performance <no private>]

Olli, do you have any ideas?
Flags: needinfo?(bugs)
I can provide CC logs if anyone wants to look at them.
Seems easily reproduceable.  STR:

1) Open fresh browser
2) Open att.com
3) Log in to your account (sorry for this)
4) Open another content tab to keep the process alive (I used example.com)
5) close att.com
6) In an about:memory minimize and measure memory
7) Note ghost and detached windows in content process

I got this to trigger on the first try following these steps in dev edition.
Better STR:

1) open fresh browser
2) Open att.com
3) Click "myAT&T" in the black navigation bar
4) Open another content tab to keep the process alive (I used example.com)
5) close att.com
6) In an about:memory minimize and measure memory
7) Note ghost and detached windows in content process

There is no need to log in.
Summary: att.com account pages leak windows → att.com leaks windows
We leak through shutdown too.
Keywords: mlk
This appears to be a bad interaction between an async script and a sync XHR.

We end up in nsScriptLoader::OnStreamComplete in the nested event loop of a sync XHR.  We put the nsScriptLoadRequest into mLoadedAsyncRequests.  At the end of OnStreamComplete, we try to ProcessPendingRequests, but it's not ReadyToExecuteScripts() so we don't do anything.

Eventually the sync XHR finishes and we RemoveExecuteBlocker().  But now, there's no mParserBlockingRequest and there's no mPendingChildLoaders, so we don't actually post an event.  And we never appear to call ProcessPendingRequests on this script loader again, so the nsScriptLoadRequest sits in the queue forever and we leak.
Is this a recent regression?
Or would Bug 1281366 help here?
I cannot reproduce in FF45 ESR.
Note, RemoveExecuteBlocker was added in FF49, and this bug was filed on FF48.
I only attempted to reproduce this on Aurora.
I also cannot reproduce in FF47 release.  So seems this was introduced in FF48.
And I reproduced in FF50 from a June 20 build, but cannot reproduce in a June 27 build.  That suggests that bug 1281366 does fix it.
Well, bug 1267989 changed the setup quite a bit for sync XHR.
Flags: needinfo?(bugs)
FWIW, I found some leaked windows on my wife's FF47 which looks similar to comment 0:

$ /c/devel/heapgraph/find_roots.py cc-edges.140.1467123986.log 0x18a879c00
Parsing cc-edges.140.1467123986.log. Done loading graph.

0x1272ad740 [JS Object (Window)]
    --[UnwrapDOMObject(obj)]--> 0x18a879c00 [nsGlobalWindow # 4098 inner https://www.amazon.com/EO-O
rganic-Deodorant-Spray-Citrus/dp/B007763OS0/ref=sr_1_6_a_it?ie=UTF8&qid=1466555485&sr=8-6&keywords=n
atural+spray+deodorant]

    Root 0x1272ad740 is a marked GC object.


bkelly@kosh /c/devel/tmp/cclogs/katylin
$ /c/devel/heapgraph/find_roots.py gc-edges.140.1467123986.log -bro 0x1272ad740
Parsing gc-edges.140.1467123986.log. Done loading graph.

via persistent-Object :
0x1e5da1040 [HTMLScriptElement <no private>]
    --[shape]--> 0x1cb39cb28 [shape]
    --[base]--> 0x1e6d5b830 [base_shape]
    --[global]--> 0x1272ad420 [Window <no private>]
    --[CLASS_OBJECT(Object), CLASS_OBJECT(Function), **UNKNOWN SLOT 184**]--> 0x1d92ee060 [Proxy <no
 private>]
    --[private]--> 0x1d92ee020 [Proxy <no private>]
    --[private]--> 0x1272ad740 [Window <no private>]
Since the leaks seem to get fixed in FF49+ by bug 1281366, lets mark this as a dependency.  Not sure we can fix it for pre-49 since everything changed for sync xhr.
Depends on: 1281366
Yea, it seems amazon also uses a sync xhr to download a script:

     function w(a, b) {
        b.src ? p.ajax({
          url: b.src,
          async: !1,
          dataType: 'script'
        })  : p.globalEval((b.text || b.textContent || b.innerHTML || '').replace(bb, '/*$0*/'));
        b.parentNode && b.parentNode.removeChild(b)
      }
Okay so it seems like this is isolated to 48 (which is beta right now), bug 1281366 has landed on 50 and is nominated for 49. Is there something we can do specifically on beta to work around this?
regression range would be really great here. Since if the regressing change is trivial, perhaps we could back it out rather than taking possibly risky bug 1267989 and 1281366.
Ben it sounds like from comment 18 you can reproduce on Amazon. Can you provide STR for that as that might be easier to repro than on att.com (which requires a log-in).
Flags: needinfo?(bkelly)
I do t have steps for Amazon, but att does not require a log in to trigger.
Flags: needinfo?(bkelly)
(In reply to Ben Kelly [:bkelly] from comment #22)
> I do t have steps for Amazon, but att does not require a log in to trigger.

Ah right, comment 4 indicates no login.
Has STR: --- → yes
Priority: -- → P2
[Tracking Requested - why for this release]:

Seems like a regression in 48. Investigation is ongoing and we may have a patch that could be uplifted in bug 1281366.
Keywords: regression
This would be nice to have, but it's possible we'll be able to uplift bug 1281366 and friends instead. P2 for now.
Keywords: qawanted
Whiteboard: [MemShrink] → [MemShrink:P2]
Tracking for 48/49. The fix from bug 1281366 should make it into aurora at least. The next beta build will be beta 6, next thursday so we have a little time to work it out.
We should not land Bug 1267989 and bug 1281366 to FF48 before we have regression range for this, at least if landing those would happen because of this bug.
100% agreed with comment 27.  Uplifting known-risky patches to beta to probably fix an issue that is not obviously related to them is ... pretty questionable.
I tried to reproduce in current beta with e10s enabled using the steps from comment 4, but can't seem to reproduce.  :(
I'm not able to reproduce either with Fx48b6.
Mozilla/5.0 (Windows NT 10.0; WOW64; rv:48.0) Gecko/20100101 Firefox/48.0
Flags: needinfo?(ryanvm)
We shipped 48 with it.
Happy to take a potential fix (if it is still an actual issue) in 49 but we won't block the release on it
Is this still reproducible?
Flags: needinfo?(bkelly)
I cannot reproduce on FF49.  It appears uplifting bug 1281366 fixed this.
Status: NEW → RESOLVED
Closed: 8 years ago
Flags: needinfo?(bkelly)
Resolution: --- → WORKSFORME
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: