Closed Bug 936784 Opened 11 years ago Closed 10 years ago

Firefox crashes while left unattended, memory climbs, due to webaudio leak

Categories

(Core :: Web Audio, defect)

25 Branch
x86_64
Windows 7
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla29
blocking-b2g koi+
Tracking Status
firefox26 --- wontfix
firefox27 + verified
firefox28 + verified
firefox29 --- verified
firefox-esr24 --- unaffected
b2g-v1.2 --- fixed
b2g-v1.3 --- fixed

People

(Reporter: codacodercodacoder, Assigned: padenot)

References

Details

(Keywords: crash, regression, Whiteboard: [MemShrink:P2])

Attachments

(11 files)

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0 (Beta/Release)
Build ID: 20131025151332

Steps to reproduce:

Like the subject implies, nothing.  DevTools were open too but essentially idle.


Actual results:

https://crash-stats.mozilla.com/report/index/0250b702-e73b-47e0-b73f-c97ac2131109

https://crash-stats.mozilla.com/report/index/e25ede62-ea51-4be8-a974-eb6142131109
Leave desk, make coffee, walk back to desk - 10 mins?

https://crash-stats.mozilla.com/report/index/962cfe4c-f6ff-4954-a085-0f3002131109

FF25 is unusable
Aurora (26) is borked - no profiler, missing scrollbars in debugger...  I'm now using FF24 and advising all customers and clients to do the same.
There is the DLL hooxpot.dll injected in firefox.exe, it's a part of the virtual desktop Dexpot. Could you uninstall this application and test again, please.

NB: providing empty crash reports doesn't help.
Severity: normal → critical
Flags: needinfo?(russgthomas)
Keywords: crash
> There is the DLL hooxpot.dll injected in firefox.exe, it's a part of the virtual desktop Dexpot. Could you uninstall this application and test again, please.

Yes, I *could*, but others in our org also see the same issues since upgrading yesterday - they do not use dexpot.

> NB: providing empty crash reports doesn't help.

Doesn't help who - you? Providing the information that the crash reporter is unable to build a stack *is* information. Asking users to "filter" crashes and reports, IMO, is unwise.

Remember, Moz don't pay users like me, be grateful when some of us choose to help you.

So choose: 
1 - send crash reports. Period.
2 - Send none.
Flags: needinfo?(russgthomas)
I'm not paid too. Just click on the crash link and if it's empty, discard it. 
There is one useful CR anyway:
https://crash-stats.mozilla.com/report/index/e25ede62-ea51-4be8-a974-eb6142131109
Perhaps my point was a little to subtle - I am not going to spend any time filtering crash reports. That is not my job, sounds like you don't want it to be your job either?  A significant part of my day is already taken up by debugging firefox instead of fixing the bugs in my own code (for which I *am* being paid) made no easier by broken firefox DevTools.  Seriously, running firefox with devtools debug logging enabled is not exactly ideal - or efficient.

I repeat the above, take 'em or leave 'em.
Did you try to repro the crash in safe mode and with a new profile? (as it seems to crash even when Firefox is idle, you can test that easily)
Oh boy... I think you'll find I've installed FF24 if you read back.

I can tell you that Aurora does NOT crash when left idle (FF26).  However, since Aurora has so many other issues relating to the DevTools, that's not a viable dev environment at present (and obviously isn't something I can advise my users to use).

All this time we've been swapping comments, both Aurora and FF24 have been sitting idle with my app loaded. My take away from this regarding FF25 is: bad tests, period.
Summary: Firefox crashes while left unattended → Firefox crashes while left unattended, potentially due to hooxpot.dll
Redownloaded FF25, used profilemanager to create a -no-remote for it, ensured all plugins were uptodate (flash and adobe pdf).  

Updated Aurora (FF27).  

Updated dexpot. (1.6.0) (This is a red herring IMO)

Fired up FF24, FF25 and FF27 same app, same URL.

While typing this and the above had been running for around 20 mins, FF25 crashed...

FF25 crash report after uptime of 2242: (EMPTY!)
https://crash-stats.mozilla.com/report/index/aab0982a-7c9e-4d4b-8ba6-617272131110

FF24 and Aurora still running.
cc'ing Nick. The crash report mentioned in C#5 looks like it's coming from our memory system. Do you have anyone who could take a look into this?

I'm suspicious of the reporter's environment as he's been reporting bugs with devtools we're just not seeing in any quantity.

Russ, have you tried running Fx25 with a completely new profile?
I thought that's what I just tried?
NB - does NOT occur in Aurora
Rob, I saw that memory reference too, so, restarted the experiment depicted in #10 and watched firefox.exe in Windows Task Manager...

FF24 stable at 144,860K
FF25 CLIMBING at 844,nnnK approx rate of 1M/sec (now 904nnn at I type)

same URL in both.
[10:08:40.938] Could not read chrome manifest 'file:///C:/Program%20Files%20(x86)/Mozilla%20Firefox%2025/chrome.manifest'.
[10:08:40.981] Could not read chrome manifest 'file:///C:/Program%20Files%20(x86)/Mozilla%20Firefox%2025/browser/extensions/%7B972ce4c6-7e08-4474-a285-3208198ce6fd%7D/chrome.manifest'.
[10:08:40.982] While creating services from category 'profile-after-change', could not create service for entry 'Disk Space Watcher Service', contract ID '@mozilla.org/toolkit/disk-space-watcher;1'
[10:08:51.535] 1384099731535	Services.HealthReport.HealthReporter	WARN	Saved state file does not exist.

[10:08:51.535] 1384099731535	Services.HealthReport.HealthReporter	WARN	No prefs data found.
The same memory climb occurs in safe mode, too.  We can dismiss plugins... anything else that this "means"?

Then I realised, I'm always running inside my test harness which wraps the app in an iframe.  When the app runs outside the test env, memory climb STOPS.

The question becomes (for me) what am I doing in the test env that:
  - has never had an issue before (<FF25),
  - breaks FF25,
  - does not break >FF25 (FF26, 27/Aurora)

Of course, there is an assumption here... that the memory climb is the cause of (or leads to) the crash.  Comments?
The memory climbing is consistent with the non-empty out-of-memory crash in comment 5.  The fact that it happens in safe mode also rules out extensions and (I think) hardware acceleration issues (which are reponsible for a surprising fraction of problems of this nature).

Russ, can you please do the following with FF25:
- Visit about:memory (type it in the address bar) in a separate tab.
- Take two memory measurements using the "Measure and save..." button -- one at the start, and one after memory consumption has been climbing for a while but before it crashes.
- Attach the two files to this bug.

It probably doesn't matter if you do it in safe mode, but using a new (or almost new) profile would be helpful, just keep things simpler.

We can then do a diff of the two files (using about:memory's "Load and diff" button, you can try it too if you want).  That's the most likely way we'll make progress here.  Thanks.

(BTW, the messages in comment 15 are harmless and not relevant, AFAIK.)
Whiteboard: [MemShrink]
Will do.  I had already taken a peek in there (in a previous issue, Rob had asked me to send him memory reports) but to my uneducated eyes, comparing FF24, FF25 and FF27 (Aurora) (measure, measure, measure, over and over) I saw nothing that stood out.  But, as you know, there's a lot to digest in there... :/

And yes, shiny new profile :)

And btw, my laptop behaves the exactly same (non-dev environment, FF25, email, IMs... twitter... ****... you get the idea).  I wanted to rule out as much suspicious tooling as possible (not least any wonderings Rob may be having re comment 11). 

Back soon, hopefully.
Oh, one more thing Nick...
While in about:memory, hitting "Minimize memory usage" has an obvious effect against both FF24 and 27.  No (noticeable) effect at all on 25. So what ever this is, the browser will not GC it.
And Nick, for Aurora, the session described in comment 16 is STILL running, sitting idle, same URL, memory climbing very slowly in 8K chunks (now at 505MB).
Attachment #829944 - Flags: feedback+
Attachment #829945 - Flags: feedback+
Russ, thanks for the data.  Unfortunately, the diff isn't helpful:

400.96 MB (100.0%) -- explicit
├──381.23 MB (95.08%) ── heap-unclassified
├───18.56 MB (04.63%) -- heap-overhead
│   ├──22.22 MB (05.54%) ── waste
│   └──-3.65 MB (-0.91%) ── page-cache
└────1.17 MB (00.29%) ++ (5 tiny)

It's mostly "heap-unclassified", which corresponds to memory consumption that we don't have explicit measurement of within Firefox.

I'm running low on ideas for how to investigate this further.
Frustrating. I guess I'll just "skip" 25.  Thanks for your help, Nick.

Anyone else have any ideas?
If you can be bothered redoing the memory dumps for FF26, it would be good to check if that slow increase in memory consumption is also falling under "heap-unclassified", or somewhere more useful.
Nick, did anything specific change in FF25 wrt our garbage collector. From Russ' emails this weekend, it sounds like his monolothic webapp's testing harness is leaking memory. If this was handled in FF24 and now better in 26, I'm wondering if changes to our generational GC might have missed something in that timeframe.

Of course, without additional debugging access this is going to be very difficult to pin down.

Russ, you said your test harness is running something in an interval. Can you describe what's on those intervals? Are you caching data, dom nodes, ... stuff? Can you give us enough detail so we can write a test based on your information?
Keywords: qawanted
Target Milestone: --- → Firefox 25
(In reply to Rob Campbell [:rc] (:robcee) from comment #26)
> Nick, did anything specific change in FF25 wrt our garbage collector. From
> Russ' emails this weekend, it sounds like his monolothic webapp's testing
> harness is leaking memory. If this was handled in FF24 and now better in 26,
> I'm wondering if changes to our generational GC might have missed something
> in that timeframe.

Our memory management changes all the time, but there's nothing particularly specific to 25 I can think of.  Generational GC isn't enabled anywhere, so that shouldn't be an issue.
Summary: Firefox crashes while left unattended, potentially due to hooxpot.dll → Firefox crashes while left unattended, memory climbs
ok, thanks Andrew.
@Rob
Wow... sheesh... lemme think...

Ok, the TH is NOT executing tests and is therefore NOT running that code I mentioned. It is loaded in the main browser window (ie top).  The "monolithic app" (ha! not far off!) sits in an iframe that occupies (guessing) 95% of the viewport, the TH displaying a strip across the top - make sense?  Graphically...

+ --------------+
|      TH       |  <<== Hitting play/pause does NOT break into running code - hence "idle"
+ --------------+
|               |
|               |
|     APP       |  <<== Hitting play/pause does NOT break into running code - hence "idle"
|               |
|               |
|               |
+ --------------+

So, no tests loaded, TH does its startup stuff and sits waiting on events (button clicks, essentially).

And apart from that stupid brain-fade typo (a clearTimeout that should have been a clearInterval), last bugfix was in Feb - it's pretty stable.  And again, that code has not executed during the lifetime of this conversation (not testing right now - I just happen to dev in the TH).

So, yes, it's an obvious suspect but it has (so far) rock solid alibis ;)

Aside: my Aurora is now 27.0a2 So, ...

@Nick: I will carry out your request, but it will be on FF27 (assuming the small 8K leak is still there).

Rob - I'll email you some bits.
@Nick FF27 is climbing in 16 and 20K chunks.  I'll do the test in a few hours.
> Our memory management changes all the time, but there's nothing particularly
> specific to 25 I can think of.

And our profiling coverage of JS memory consumption is really good, so if JS objects (or strings, or other JS things) were leaking we wouldn't be getting the rise entirely in "heap-unclassified".  We have coverage for orphan DOM nodes as well, so it shouldn't be that either.

The fact that there's such a drastic difference in behaviour between different Firefox versions makes it definitely smell like a bug on our side.
Russ sent me some about:memory snapshots for FF27, where he's seeing the small memory increase.  Again, the increase falls entirely into "heap-unclassified":

41.79 MB (100.0%) -- explicit
├──38.47 MB (92.06%) ── heap-unclassified
├───2.83 MB (06.77%) -- workers/workers()/worker(resource://gre/modules/PageThumbsWorker.js, 0xNNN)
│   ├──1.50 MB (03.59%) -- gc-heap
│   │  ├──1.47 MB (03.51%) ── unused-arenas [+]
│   │  └──0.03 MB (00.07%) ── chunk-admin [+]
│   ├──0.68 MB (01.63%) ++ runtime
│   └──0.65 MB (01.54%) ++ (3 tiny)
├───2.39 MB (05.72%) -- js-non-window
│   ├──2.39 MB (05.72%) -- zones/zone(0xNNN)
│   │  ├──2.18 MB (05.22%) -- compartment([System Principal], resource://gre/modules/TelemetryFile.jsm)
│   │  │  ├──2.12 MB (05.08%) -- objects
│   │  │  │  ├──1.80 MB (04.30%) -- malloc-heap
│   │  │  │  │  ├──1.77 MB (04.23%) ── elements/non-asm.js
│   │  │  │  │  └──0.03 MB (00.07%) ── slots [+]
│   │  │  │  └──0.32 MB (00.78%) ++ gc-heap
│   │  │  └──0.06 MB (00.15%) ++ (2 tiny)
│   │  └──0.21 MB (00.49%) ++ (38 tiny)
│   └──0.00 MB (00.00%) ++ runtime/code
└──-1.90 MB (-4.54%) ++ (7 tiny)

Argh.

Quoting Russ's summary from his email:

FF24 is now my workhorse - no known issues
FF25 - we know about [bad, fast leak]
FF26 - *was* Aurora when I first noticed the 8K creep... that was updated to...
FF27 - I'm assuming 27.0a1? this showed the 8K creep too
FF27.0a2 - current Aurora - showing 16K or 20K or 24K creep

(The diff above is for FF27.0a2)

I asked Russ about running the web app myself, but apparently getting permission to do so is almost impossible.
Whiteboard: [MemShrink] → [MemShrink:P3]
Does it happen after Windows hibernates/sleeps?

Can you try deactivating hibernation/sleep mode?

I reproduced similar issues in the past.
Flags: needinfo?(russgthomas)
Keywords: regression
I cannot say since I don't use hibernate/sleep. Do I think it would occur after either? Yes - it's unlikely to "fix" it but no, sleep is not "the cause".
Flags: needinfo?(russgthomas)
FYI...

On a whim I decided to download nightly: 28.0a1

The bug remains but far less aggressive - relentless 8K increments (approx 16K/second).

@Nick: As before, if I disable the read of the config data in the app, the bug disappears.  And yes, I'm hoping to chop that into pieces in the hope that I can nail it at least a little closer.

Aside: re Australis... I know moz have a thing about protecting privacy and anonymity, but was the plan really to make the browser itself so anonymous? Would joe public recognize it in a movie?
After bug 819839 merges to mozilla-central, it might be possible to detect these leaks in a custom build on Windows.
Emanuel, what are the steps that users should take with a DMD build, now that it's on mozilla-central? The goal is to diagnose the heap-unclassified leak, correct?
Flags: needinfo?(emanuel.hoogeveen)
I'm not the expert, but instructions for using DMD are here:
https://wiki.mozilla.org/Performance/MemShrink/DMD

In particular, see
https://wiki.mozilla.org/Performance/MemShrink/DMD#Desktop
and
https://wiki.mozilla.org/Performance/MemShrink/DMD#Desktop_2
for instructions on how to start Firefox with DMD enabled and use it to get an alaysis. Regardless of how you set the environment variables, you'll want to do this from commandline so you can confirm that it's active.

From there, you probably want to run an analysis just after opening Firefox, and then another one a while later for the sake of comparison. Since this is a slow leak that happens regardless of what you do, it's probably best to leave it be for a while and get a second trace (rather than browsing around).

However, I don't think there are any automated builds that have DMD enabled right now. It might be possible to get a try build, or you could try to build it locally. Nicholas, is that right?
Flags: needinfo?(emanuel.hoogeveen) → needinfo?(n.nethercote)
> However, I don't think there are any automated builds that have DMD enabled
> right now. It might be possible to get a try build, or you could try to
> build it locally. Nicholas, is that right?

Local builds certainly work.

Try builds should be possible;  I just added a paragraph to
https://wiki.mozilla.org/Performance/MemShrink/DMD#Everything_other_than_B2G-device_builds on how to do a try build.  I think it is correct but I haven't actually tested it.
Flags: needinfo?(n.nethercote)
I tried doing a DMD build via try server, but got linking errors:  https://tbpl.mozilla.org/php/getParsedLog.php?id=31460041&tree=Try#error0

glandium, any idea what went wrong?
Flags: needinfo?(mh+mozilla)
Nick, you say local builds certainly work but maybe not all combinations do.

My Windows .mozconfig has:
ac_add_options --enable-debug
ac_add_options --enable-dmd
mk_add_options MOZ_MAKE_FLAGS="-j4"

I don't think I ever tested --enable-dmd without --enable-debug (and for sure not with --enable-release). It didn't occur to me but looking retrospectively I should have done it.

I can try to look at it over the next days but don't promise I'll find time for it.
Ok, the debug try build worked!  https://tbpl.mozilla.org/?tree=Try&rev=cfab371e2020 is the run.

Russ:  your turn now!

Note that Windows support for DMD only landed a few days ago, so this is all
very new.  Fingers crossed that it works.  And debug builds are significantly slower
than optimized builds, so be prepared for that.

The following instructions are mostly from
https://wiki.mozilla.org/Performance/MemShrink/DMD.  I've copied the
Windows-specific parts here to make things easier for you.

GET THE BUILD
-------------
Russ:  please go to http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/nnethercote@mozilla.com-cfab371e2020/try-win32-debug/ ... and I guess download the installer.exe file?  Or maybe the win32.zip file?  Not sure, but it must be one of those two.

RUN
---
To run it, do this from the command line:

  MOZ_REPLACE_MALLOC_LIB=path\\to\\dmd.dll \
  DMD=1 \
  <command>

I'm not sure what "path\\to\\" will be, hopefully you can search for dmd.dll
and work it out.

<command> will be something like "firefox -P <profilename>".  If you give a <profilename> that doesn't already exist, it'll ask if you want to create it.

On start-up, you'll see some commentary on stderr, such as:

  DMD[20523] $DMD = '1'
  DMD[20523] DMD is enabled

The number in brackets is the process ID.

The browser will run a little slower than usual.

ANALYZE
-------
DMD doesn't do anything notable until you ask it to.

You should trigger analysis once Firefox has been running a while and memory
consumption has climbed significantly.  To trigger DMD's analysis phase, enter
this URI:

 javascript:DMDReportAndDump(<filename>)

where <filename> is a string containing the name of the file that output will
be written to. For example:

 javascript:DMDReportAndDump("out.dmd")

The output file will be written to the current working directory.

Note that Firefox doesn't let you do this directly in the address bar, so you
must either (a) create a bookmark for it, or (b) run it from the error console.
The latter is discouraged because opening the error console allocates a lot of
memory that will skew your results. Furthermore, using the bookmark fails from
certain pages such as about:memory; if that happens in a debug build you'll get
a console warning like this:

  WARNING: No principal to execute JS with: file /home/njn/moz/mi2/dom/src/jsurl
/nsJSProtocolHandler.cpp, line 157

This command runs all the memory reporters and then invokes DMD analysis of the
reports, which triggers this commentary:

 DMD[20600] Dump 1 {
 DMD[20600]   gathering live block groups...
 DMD[20600]   creating and sorting double-reported block group array...
 DMD[20600]   creating and sorting unreported block group array...
 DMD[20600]   printing unreported block group array...
 DMD[20600]   creating and sorting unreported frame group array...
 DMD[20600]   printing unreported frame group array...
 DMD[20600]   creating and sorting reported block group array...
 DMD[20600]   printing reported block group array...
 DMD[20600]   creating and sorting reported frame group array...
 DMD[20600]   printing reported frame group array...
 DMD[20600] }

I'm not sure how long this takes on Windows.  On Mac it can take 30+ seconds.

Once that's finished printing, please upload <filename> to the bug report!

You can run DMDReportAndDump multiple times in a single session.
I filed bug 947117 about the DMD link error in non-debug builds.
Flags: needinfo?(mh+mozilla) → needinfo?(russgthomas)
@Nick: that's a 28 build.  Killer bug is in the 25 branch.  Please confirm this is the right target?
Flags: needinfo?(russgthomas)
You said that 28 still exhibits a leak, just a smaller one, right?  That's the version that's currently under development.  25 is past that point now...
NICK > I'm not sure what "path\\to\\" will be, hopefully you can search for dmd.dll
and work it out.

That's a big fat 404.  Not in the zip, and I aborted the .installer.exe since it wants to install nightly again (which I have).
Sure. 28 it is.
>> I'm not sure what "path\\to\\" will be, hopefully you can search for
>> dmd.dll
>> and work it out.
> 
> That's a big fat 404.  Not in the zip, and I aborted the .installer.exe
> since it wants to install nightly again (which I have).

glandium, is dmd.dll not included by whatever does the packaging?
Flags: needinfo?(mh+mozilla)
dmd.dll is not in package-manifest.in, so yes, it's not included.
Flags: needinfo?(mh+mozilla)
It's Friday night here, so it'll have to wait until Monday, but I can try to work out how to add it to package-manifest and do another try run.  Or if someone else wants to have a go in the mean-time, just note that you have to disable trace-malloc when you enable DMD, as I did in https://hg.mozilla.org/try/rev/9b92b8c295a7.
I bet you need to add
58 #ifdef MOZ_DMD
59 @BINPATH@/@DLL_PREFIX@dmd@DLL_SUFFIX@
60 #endif
from b2g/installer/package-manifest.in to browser/installer/package-manifest.in (and possibly the mobile and xulrunner equivalents).
Thanks, nthomas!

I did another try build, the output is here:

http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/nnethercote@mozilla.com-8f49111414e6/try-win32-debug/

I verified that firefox-28.0a1.en-US.win32.zip contains dmd.dll.

Russ, your turn again! :)
Flags: needinfo?(russgthomas)
Attached file out.dmd
out.dmd generated as per Nick Nethercote's instructions in comment 42.  Two invokations, first as soon as the 8K allocations started to appear in windows Task Manager; second "some time later" ;)

Firefox 28.0a1 new profile, Win7/64.
Flags: needinfo?(russgthomas)
Attached file console.txt
Output from windows console during DMD session FWIW.
Attachment #8346715 - Attachment mime type: text/x-vhdl → text/plain
> Two
> invokations, first as soon as the 8K allocations started to appear in
> windows Task Manager; second "some time later" ;)

Thanks, Russ!  I only see output from a single run in that attachment, and it looks like it's probably the earlier one.  Did you forget to attach the second one?
> You can run DMDReportAndDump multiple times in a single session.

No... I assumed the above meant it would keep on appending? If that causes an overwrite of previous data then you have the last invokation.  Let me know - I can easily re-run it.

For clarity: "invoke" means "click the bookmarklet" with URI: javascript:DMDReportAndDump("out.dmd")
Confirmed - it overwrites.  I'll time this one for approx 15 mins.
(In reply to Russ from comment #56)
> > You can run DMDReportAndDump multiple times in a single session.
> 
> No... I assumed the above meant it would keep on appending? If that causes
> an overwrite of previous data then you have the last invokation.  Let me
> know - I can easily re-run it.

Oh, right.  Yes, if you give the same filename to DMDReportAndDump() twice the second invocation will overwrite the file written by the first invocation.

> For clarity: "invoke" means "click the bookmarklet" with URI:
> javascript:DMDReportAndDump("out.dmd")

Yes.

So that means the attached file is from the second invocation.  This line near the end:

> Total:           ~99,507,611 bytes (100.00%) in ~14,353 blocks (100.00%)

Indicates the heap was just under 100MB, which is a lot smaller than I expected.  Hmm.
> Confirmed - it overwrites.  I'll time this one for approx 15 mins.

Yeah... ideally you'd invoke DMD shortly before the crash happens, or at least at a point where memory consumption has climbed greatly.
Not sure we'll reach the crash point (these incs are 8K, the crash was occurring when the incs were of the order of 1M).
Attached file out.dmd.2
Dump after ~15mins
15 minutes isn't that long -- if we're seeing 8KB per second growth (is that right?) that's only about 7 MB.  A long run would be useful... except that the stacks DMD is producing look totally bogus.  Consider the biggest unreported record:

Unreported: 1 block in stack trace record 1 of 1,516
 15,765,504 bytes (15,765,504 requested / 0 slop)
 14.53% of the heap (14.53% cumulative);  23.96% of unreported (23.96% cumulative)
 Allocated at
   moz_xrealloc[mozalloc +0x1123] 0x6a8f1123
   mozilla::detail::GuardObjectNotificationReceiver::operator=[xul +0x2E08] 0x2f72e08
   mozilla::detail::GuardObjectNotificationReceiver::operator=[xul +0x4B7CE] 0x2fbb7ce
   mozilla::detail::GuardObjectNotificationReceiver::operator=[xul +0x4C876] 0x2fbc876
   JSPrincipals::dump[xul +0xF5DDE6] 0x3ecdde6
   JSPrincipals::dump[xul +0xF5E2CB] 0x3ece2cb
   JSPrincipals::dump[xul +0xF5E680] 0x3ece680
   soundtouch::SoundTouch::operator=[xul +0xF83521] 0x3ef3521
   soundtouch::SoundTouch::operator=[xul +0xF87DA2] 0x3ef7da2
   soundtouch::SoundTouch::operator=[xul +0xF881C3] 0x3ef81c3
   XRE_AddJarManifestLocation[xul +0x16B4E3] 0x30db4e3
   mozilla::services::_external_GetHistoryService[xul +0x108615] 0x3078615
   XRE_AddJarManifestLocation[xul +0x16B924] 0x30db924
   PRP_TryLock[nss3 +0x16DE4F] 0x56edde4f
   PR_MD_UNLOCK[nss3 +0x17382F] 0x56ee382f
   beginthreadex[MSVCR100D +0x4A273] 0x5777a273
   beginthreadex[MSVCR100D +0x4A204] 0x5777a204
   BaseThreadInitThunk[kernel32 +0x1336A] 0x7590336a
   RtlInitializeExceptionChain[ntdll +0x39F72] 0x77dd9f72
   RtlInitializeExceptionChain[ntdll +0x39F45] 0x77dd9f45

GuardObjectNotificationReceiverd::operator= doesn't exist, and if it did, I doubt it would call itself.  JSPrincipals::dump exists but doesn't call itself.  SoundTouch::operator= doesn't exist, and it wouldn't call itself if it did.

And these bogus |operator=| entries appear in all the large records.  I don't know what to make of this.  Dammit.
The only other thing I can think of is this:  on Linux and Mac, DMD output needs post-processing to turn the symbol names and offsets into filenames and line numbers.  On Windows, this isn't necessary... at least, if you build and run it locally the filename/line-numbers are obtained automatically.  I don't know how this works, but it's clearly not happening in Russ's run (which came from a try build).  I wonder if we were able to post-process the output to produce filenames/line numbers whether the results would be more sensible.

bsmedberg, do you know about this?
Flags: needinfo?(benjamin)
Attached file out.dmd.3
Something like 90 mins
khuey said ted might also know about windows debug info stuff.
Flags: needinfo?(ted)
Did you do MOZ_CRASHREPORTER_UPLOAD_FULL_SYMBOLS on the try run? I think we need the original .pdb files; I don't think any Windows tools know how to handle .sym.

Plus the automatic PDB hookup only works if the .pdb's are at the same absolute path as the path where they were build (which is usually the case if it's a local build, but usually not otherwise). To set the path manually, you'll want to set the _NT_SYMBOL_PATH environment variable to point to the folder with the .pdb's (and make sure you launch in such a way that the browser inherits this environment variable).
dmajor and I are working on a new build that incorporates the symbols.  We'll test locally before asking Russ to try it!
(In reply to comment #66)
> Plus the automatic PDB hookup only works if the .pdb's are at the same absolute
> path as the path where they were build (which is usually the case if it's a
> local build, but usually not otherwise). To set the path manually, you'll want
> to set the _NT_SYMBOL_PATH environment variable to point to the folder with the
> .pdb's (and make sure you launch in such a way that the browser inherits this
> environment variable).

FWIW we could probably do this within our own code if we detect where the .pdb files are (for example, if they're in the installation directory.)
I confirmed Nick's latest try build works on my machine. Here were the steps I took:

https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/nnethercote@mozilla.com-af379511c3c4/try-win32-debug/
Unzip firefox-29.0a1.en-US.win32.zip to c:\firefox
Unzip firefox-29.0a1.en-US.win32.crashreporter-symbols-full.zip to c:\ffsymbols

Open a command prompt (as administrator since I dumped stuff into c:\)
cd \ffsymbols
for /f %i in ('dir /s /b *.pd_') do @expand -r %i

cd \firefox
mkdir .profile
set MOZ_REPLACE_MALLOC_LIB=dmd.dll
set DMD=1
set _NT_SYMBOL_PATH=c:\ffsymbols
firefox.exe -no-remote -profile .profile

This produced an out.dmd with reasonable symbols. The "for...expand" piece is crucial; I always forget this step and end up wondering where my symbols went.
Flags: needinfo?(benjamin)
FYI you can also unzip the compressed symbols somewhere and use that as an upstream store, as per:
https://wiki.mozilla.org/ReleaseEngineering/TryServer#Debugging_with_debug_symbols

(and make dbghelp do the decompression for you.) But yeah, we could also bake some of this logic in if we're expecting people to use these DMD builds.
Flags: needinfo?(ted)
BTW, I updated https://wiki.mozilla.org/Performance/MemShrink/DMD to explain how to do DMD-enabled try builds and runs on Windows.
Ok, I'm trying this new build and again following instructions in comment 42, specifically, the advice under the heading "ANALYZE" where the suggestion is to use a bookmark.  On my previous runs I did this and it worked just fine (new tab, click bookmarklet) but on this occasion, I get this:

[73356] WARNING: No principal to execute JS with: file c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/dom/src
/jsurl/nsJSProtocolHandler.cpp, line 156

Which is similar to the warning mentioned in comment 42.

I eventually managed to get the tab/bookmark to work by middle-clicking it.  Problem: memory had suddenly more than doubled (from about 210M to 500+M).  So, if I post these results they're going to be falsely inflated by the opening/closing tabs code (which in itself sounds buggy to me but that's for another day, perhaps).

So, question: should I post the out.dmd anyway?  I've chosen to restart the experiment and provide a less polluted output but I have the out.dmd here if it's needed.

@David Major: I followed your steps but chose not to pollute my root and also put the statements in a batch file.  Worked fine except for the for statement - what would be the syntax changes required? I'm guessing there's something up with the quotes on expansion?
(In reply to Russ from comment #72)
> @David Major: I followed your steps but chose not to pollute my root and
> also put the statements in a batch file.  Worked fine except for the for
> statement - what would be the syntax changes required? I'm guessing there's
> something up with the quotes on expansion?

If you use a batch file then you'll need to double the percents: %i -> %%i
Though once you have the symbols expanded, you don't need to do it again on every run.
Thanks David - been a while since I wrote batch code.
Following on from comment 72, I restarted and let the app run as previously described.  I watched memory climb from ~194M to ~260M.  I middle clicked my bookmarklet and...

  "Nightly has stopped working"

Ouch. 

Last line printed to the console was :

Assertion failure: mutationCount == table_->mutationCount, at c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\
obj-firefox\dist\include\js/HashTable.h:872

And no, crash reporter did not appear.

So, I fired up the build again (again from the command prompt)... tried to load my app... BANG!

[64296] ###!!! ASSERTION: Adding a child where we already have a child? This may misbehave: 'Error', file c:/builds/moz2
_slave/try-w32-d-00000000000000000000/build/docshell/shistory/src/nsSHEntry.cpp, line 622

Over to you, Nick.
Flags: needinfo?(n.nethercote)
Ah, the joys of working with immature tools...

> [73356] WARNING: No principal to execute JS with: file
> c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/dom/src
> /jsurl/nsJSProtocolHandler.cpp, line 156
> 
> Which is similar to the warning mentioned in comment 42.

Yes.  What tab did you have open when you clicked on the bookmark?  Was it about:memory or some other non-content tab?  That's the only cause of that error that I'm aware of.

> Assertion failure: mutationCount == table_->mutationCount, at
> c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\
> obj-firefox\dist\include\js/HashTable.h:872

Ugh.  No idea what would cause that.

> [64296] ###!!! ASSERTION: Adding a child where we already have a child? This
> may misbehave: 'Error', file c:/builds/moz2
> _slave/try-w32-d-00000000000000000000/build/docshell/shistory/src/nsSHEntry.
> cpp, line 622

Did that kill the browser?  We have lots of "soft" assertions that cause an error to be printed to the console but don't kill the browser, and that one looks like a soft assertion?
Flags: needinfo?(n.nethercote)
> What tab did you have open when you clicked on the bookmark?  Was it about:memory or some other non-content tab?

New, fresh, virginal tab.

> Did that kill the browser?

Sure.  As before, Bang as in dead as in gone to meet its maker as in this is an ex-parrot... 

Maybe I should be more clear; that last was a restart, followed by me launching my app (via a bookmark) and then an immediate bang.  That line was posted to the console "as it happened" near enough.

Reason for the NEEDSINFO: I really don't know how to proceed - I'd prefer not to guess.  Was that a bad nightly?
> > What tab did you have open when you clicked on the bookmark?  Was it about:memory or some other non-content tab?
> 
> New, fresh, virginal tab.

It appears that empty tabs also don't permit DMD to be run.  I'll update the docs.


> Reason for the NEEDSINFO: I really don't know how to proceed - I'd prefer
> not to guess.  Was that a bad nightly?

Not sure.  Assertions are only enabled in debug builds, which you have, so you might be hitting latent bugs that wouldn't necessarily manifest in an observable way in a release build.

Bugzilla has no record of the mutationCount assertion failure;  it's possible that DMD is involved, because DMD uses those hash tables.  But the "Adding a child" one has a pedigree -- see bug 645229, bug 751208, bug 859156, 864383.

Optimized DMD-enabled builds for Windows have just started working (see bug 947117 and bug 948621).  So my best suggestion is to roll the dice one more time... I'll do such a build, and then you can try it, Russ.  You won't get the assertion failures (because assertions are off) and hopefully it'll work sufficiently that we can get some useful data.  Sorry this has been such a drawn-out process :(
> It appears that empty tabs also don't permit DMD to be run.  I'll update the
> docs.

If you need to open a tab in which to trigger DMD (though wouldn't you have a content tab open somewhere?) then about:home is a good choice -- it's simple enough to not perturb the results much.
> If you need to open a tab in which to trigger DMD 

As I said, middle clicking DMD is working fine.

> (though wouldn't you have a content tab open somewhere?) 

Sure - my app.
Attached file out.zip
> out.zip

Russ, I guess you managed to get some output without crashing?  Great!  If we can get data from a longer run that will be more conclusive (I'm building the optimized DMD build right now), but it looks like webaudio is the issue.

I've attached the relevant part of the DMD output.
ehsan, padenot, karlt:  Russ saw a major memory leak on Windows (~1MB per second) when upgrading from FF24 to FF25.  FF26 and later are better, but still leak ~8KB per second.  His site is proprietary so he's been gallantly doing DMD runs for us.  He just obtained the first usable data and it points the finger at webaudio.  Please see the "partial DMD output" attachment.  We obviously don't have memory reporters for web audio (that's bug 884368), but if you could evaluate this for likelihood of leakiness that would be great!
Flags: needinfo?(paul)
Flags: needinfo?(ehsan)
Attached file out2.zip
Same session Nick, taken moments ago.
Aha!  Comparing the first DMD output to the second, most of the entries I
copied are unchanged.  However, this one has grown from ~8,880,125 bytes to
~66,740,221 bytes:

> Unreported: ~2 blocks in stack trace record 1 of 1,423
>  ~66,740,221 bytes (~66,740,221 requested / ~0 slop)
>  40.88% of the heap (40.88% cumulative);  57.56% of unreported (57.56% cumulative)
>  Allocated at
>    moz_xrealloc (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\memory\mozalloc\mozalloc.cpp:84) 0x583a1123
>    nsTArray_base<nsTArrayInfallibleAllocator,nsTArray_CopyWithMemutils>::EnsureCapacity (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\obj-fi
> refox\dist\include\nstarray-inl.h:170) 0x3152ad7
>    nsTArray_Impl<mozilla::AudioChunk,nsTArrayInfallibleAllocator>::AppendElements (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\obj-firefox\
> dist\include\nstarray.h:1257) 0x319b6a3
>    mozilla::MediaSegmentBase<mozilla::AudioSegment,mozilla::AudioChunk>::AppendChunk (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\content\m
> edia\mediasegment.h:291) 0x319c7a4
>    mozilla::AudioSegment::AppendAndConsumeChunk (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\content\media\audiosegment.h:157) 0x40ca8cb
>    mozilla::AudioNodeStream::AdvanceOutputSegment (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\content\media\audionodestream.cpp:460) 0x40c
> adb0
>    mozilla::AudioNodeStream::ProduceOutput (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\content\media\audionodestream.cpp:450) 0x40cb165
>    mozilla::MediaStreamGraphImpl::ProduceDataForStreamsBlockByBlock (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\content\media\mediastreamg
> raph.cpp:1090) 0x40f0096
>    mozilla::MediaStreamGraphImpl::RunThread (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\content\media\mediastreamgraph.cpp:1190) 0x40f4917
>    mozilla::`anonymous namespace'::MediaStreamGraphInitThreadRunnable::Run (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\content\media\media
> streamgraph.cpp:1344) 0x40f4d38
>    nsThread::ProcessNextEvent (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\xpcom\threads\nsthread.cpp:634) 0x32bcf2d
>    NS_ProcessNextEvent (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\xpcom\glue\nsthreadutils.cpp:263) 0x325957d
>    mozilla::ipc::MessagePumpForNonMainThreads::Run (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\ipc\glue\messagepump.cpp:301) 0x34e7e91
>    MessageLoop::RunInternal (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\ipc\chromium\src\base\message_loop.cc:226) 0x34aa864
>    MessageLoop::RunHandler (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\ipc\chromium\src\base\message_loop.cc:220) 0x34aada1
>    MessageLoop::Run (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\ipc\chromium\src\base\message_loop.cc:194) 0x34ab26c
>    nsThread::ThreadFunc (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\xpcom\threads\nsthread.cpp:267) 0x32bd492
>    _PR_NativeRunThread (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\nsprpub\pr\src\threads\combined\pruthr.c:397) 0x57f9de4f
>    pr_root (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\nsprpub\pr\src\md\windows\w95thred.c:90) 0x57fa382f
>    beginthreadex[MSVCR100D +0x4A273] 0x5815a273
>    beginthreadex[MSVCR100D +0x4A204] 0x5815a204
>    BaseThreadInitThunk[kernel32 +0x1336A] 0x7590336a
>    RtlInitializeExceptionChain[ntdll +0x39F72] 0x77dd9f72
>    RtlInitializeExceptionChain[ntdll +0x39F45] 0x77dd9f45

I think we have our smoking gun!
Additional info pertaining to webaudio:

During a normal working day and my normal dev/run/debug cycle, I'll usually not restart the browser until I see some kind of issue (issue might be anything).  One issue has been, after a number of dev/run/debug cycles, FF will start complaining it can't find the audio files (NB files it has previously been playing just fine).  Then I'll restart the browser.

Nick: Now I understand why my app's config file seems to trigger the issue.  It is responsible for triggering the loading of the sound libs.  I use a 3rd party lib called howler: http://goldfirestudios.com/blog/104/howler.js-Modern-Web-Audio-Javascript-Library
Component: Untriaged → Web Audio
Product: Firefox → Core
Target Milestone: Firefox 25 → ---
@Nick: just for confirmation, I just reran the same nightly, with my config file disabled which means the sound libs are NOT loaded and the 8K leak is gone.

Please note: there is NO sound asked for or heard during the experimental conditions detailed above.  It seems the mere presence of the lib(s) is enough to trigger the issue.
We probably won't need it, but just in case, I did an optimized, DMD-enabled build here:
https://tbpl.mozilla.org/?tree=Try&rev=5bae1a082a30
Summary: Firefox crashes while left unattended, memory climbs → Firefox crashes while left unattended, memory climbs, due to webaudio leak
> @Nick: just for confirmation, I just reran the same nightly, with my config
> file disabled which means the sound libs are NOT loaded and the 8K leak is
> gone.
> 
> Please note: there is NO sound asked for or heard during the experimental
> conditions detailed above.  It seems the mere presence of the lib(s) is
> enough to trigger the issue.

Thanks for the confirmation, that's very helpful.  Can you try again with FF25 to see if webaudio is also responsible for the bad leak in that release?
I've started to track this down. I may have STR:

- Open http://goldfirestudios.com/blog/104/howler.js-Modern-Web-Audio-Javascript-Library
- Click randomly on the buttons
- Close the tab

I instrumented a build by putting something like

>    static uint32_t max = 0;
>    if (mChunks.Length() > max) {
>      max = mChunks.Length();
>    }
>    printf("%p == mChunks.Length(): %u (max: %u)\n", this, mChunks.Length(), max);

and when I close the tab (or maybe on CC), the `max` and `mChunks.Length()` value seem to grow without stopping.
Assignee: nobody → paul
Flags: needinfo?(paul)
Okay, found the issue for the memory leak:

In StreamBuffer::ForgetUpTo (where we are supposed to determine how much buffer space we should remove), we see (numbers are from the gdb session I have at hand, we are here [1]):

(gdb) p aTime
$14 = 41062711
(gdb) p roundTo
$15 = 52428
(gdb) p forget
$16 = 41051124
(gdb) p mForgottenTime
$17 = 41051124

so we return immediately and we never free anything ever again, because aTime does not grow, because the streams are blocked until STREAM_TIME_MAX.

I checked, and the context is shutdown, like it should be, because the tab has been closed for a while.

(gdb) p mStreams[0]->mEngine->mNode->mContext->mIsShutDown
$77 = true

I then went to about:memory, I saw heap-unclassified grow super fast (like if we were leaking real time uncompressed audio data), and I pressed "Minimize memory usage", and then, everything got destroyed properly, and head-unclassified was instantly back to some reasonable value. I think that if one is running a unit-tests suite, AudioContexts would be created and teared down all the time, and we would clearly observe what Russ reported us.

Anyways, the solution here seem to adjust the logic somewhere in the MSG to avoid buffering like crazy, because at this point in the life cycle of the web page, it's useless anyways (the page has been closed, no sound is output anymore). The cycle-collection seem to be working fine, for me, but I'm no expert.

Also that I could repro the problem using my STR in comment 90 every single time I tried. I'm not sure if one has to use this specific js library, though. Also, I forgot to mention, but the code I wrote to instrument the build goes in MediaSegment::AppendChunk.

[1]: http://dxr.mozilla.org/mozilla-central/source/content/media/StreamBuffer.cpp#70
What's the likely risk of a patch here? This is serious enough that if there's a low-risk patch I'd like to get it in 27 beta.
Well, this is less serious imo that what this bug would indicate. One would have
to setup and shutdown a bunch of AudioContexts, and have delayed CC (like, when
you run unit-tests) to have a constant high memory usage.

A normal user would only see a spike on context shutdown.

The problem is, when we ::Shutdown an AudioContext, we increment the blocker
count of the destination node. This is somewhat okay for our purpose, because we
don't output sound anymore, but causes buffering in the graph, and that causes
this bug. If we replace ::Suspend by ::Mute, it's better. afaik, we do the same
amount of work, but we simply discard the data instead of buffering it, which
seems more correct (to me).
Attachment #8348783 - Flags: review?(roc)
Attachment #8348783 - Flags: review?(ehsan)
@Nick

> Can you try again with FF25 to see if webaudio is also responsible for the bad leak in that release?

Yeah, I'm pretty sure we did that already but I did it again anyway.  

Confirmed. FF25 has a huge leak (aggregated rate ~1MB/second).  When the config file (and hence audio/howler) is disabled, leak is NOT present.
> Well, this is less serious imo that what this bug would indicate. One would have
to setup and shutdown a bunch of AudioContexts, and have delayed CC (like, when
you run unit-tests) to have a constant high memory usage.
> A normal user would only see a spike on context shutdown.

I apologize if I've misread or misunderstood you, but...

IMO, FF25 needs an ESR.  Since I posted my previous and read yours, I've watched a FF25 session grow from approx ~400MB to 1,800MB and will (by my experience) crash soon with no helpful stack info in the crash report.

Whatever the cause in FF25, it was improved somewhat in FF26.  Unless you can say why that change occurred, it can easily regress. That, IMO, is why we're pushing on this.  That, IMO, is what makes this serious.

And please explain what demographic is covered by "normal user" ?
(In reply to Russ from comment #95)
> > Well, this is less serious imo that what this bug would indicate. One would have
> to setup and shutdown a bunch of AudioContexts, and have delayed CC (like,
> when
> you run unit-tests) to have a constant high memory usage.
> > A normal user would only see a spike on context shutdown.
> 
> I apologize if I've misread or misunderstood you, but...
> 
> IMO, FF25 needs an ESR.  Since I posted my previous and read yours, I've
> watched a FF25 session grow from approx ~400MB to 1,800MB and will (by my
> experience) crash soon with no helpful stack info in the crash report.

Maybe you meant chemspill / dot release by saying ESR? That's the call of release drivers, but my patch is simple enough that it seems safe to uplift it on Beta. 

> Whatever the cause in FF25, it was improved somewhat in FF26.  Unless you
> can say why that change occurred, it can easily regress. That, IMO, is why
> we're pushing on this.  That, IMO, is what makes this serious.

The change in behavior is likely caused by all the changes Karl Tomlinson landed between 25 and 26, some of which are related to the lifetime of the web audio object. This is my guess, I haven't actually looked in the detail.
 
> And please explain what demographic is covered by "normal user" ?

My expectations, confirmed by the code I could on the internet is that most applications using Web Audio use only one AudioContext, and don't trigger this bug.

That said, rest assured that I'm doing the work that is needed so we are sure that :
(1) This particular bug is fixed
(2) We have memory probes in place to track down further bugs in the area (leaks related to Web Audio). This is tracked in bug 884368, I just bumped its priority.
Also note that Firefox 25 is not supported any more as 26 has been released - and that version at least is better in terms of this leak. For 27, I hope the better fix will be in.
Comment on attachment 8348783 [details] [diff] [review]
Don't overbuffer on AudioContext shutdown. r=

Review of attachment 8348783 [details] [diff] [review]:
-----------------------------------------------------------------

Makes sense.
Attachment #8348783 - Flags: review+ → review?(roc)
(And I do think that we should uplift this patch to Aurora and Beta FWIW.)
Flags: needinfo?(ehsan)
> (And I do think that we should uplift this patch to Aurora and Beta FWIW.)

Me too -- the patch is tiny and we're early in the Beta period.

Speaking of the patch... I don't know anything about this code, but it's disconcerting that such an innocuous-looking change can have such a big effect.  Is it worth having at least a comment on the Mute() call explaining why it's preferred to Suspend(), or something like that?
(In reply to Nicholas Nethercote [:njn] from comment #100)
> Speaking of the patch... I don't know anything about this code, but it's
> disconcerting that such an innocuous-looking change can have such a big
> effect.  Is it worth having at least a comment on the Mute() call explaining
> why it's preferred to Suspend(), or something like that?

Something else should be kicking in as well. Cycle collection should have freed everything and caused MSG shutdown before we consume much memory. So I think we need to investigate that and possibly do a followup fix for that too.
(In reply to Paul Adenot (:padenot) from comment #91)
> In StreamBuffer::ForgetUpTo (where we are supposed to determine how much
> buffer space we should remove), we see (numbers are from the gdb session I
> have at hand, we are here [1]):
> 
> (gdb) p aTime
> $14 = 41062711
> (gdb) p roundTo
> $15 = 52428
> (gdb) p forget
> $16 = 41051124
> (gdb) p mForgottenTime
> $17 = 41051124
> 
> so we return immediately and we never free anything ever again, because
> aTime does not grow, because the streams are blocked until STREAM_TIME_MAX.

Hmm, so something is wrong here too. An AudioNode feeding into a blocked destination should still be able to forget; its MediaStream should not be blocked, so we should just be throwing data away.
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #101)
> (In reply to Nicholas Nethercote [:njn] from comment #100)
> > Speaking of the patch... I don't know anything about this code, but it's
> > disconcerting that such an innocuous-looking change can have such a big
> > effect.  Is it worth having at least a comment on the Mute() call explaining
> > why it's preferred to Suspend(), or something like that?
> 
> Something else should be kicking in as well. Cycle collection should have
> freed everything and caused MSG shutdown before we consume much memory. So I
> think we need to investigate that and possibly do a followup fix for that
> too.

That's true.  Although we can't really predict when the CC kicks in so it's not entirely surprising that we had this bug given that the test case is creating contexts all the time.
Rob, it seems like you added qawanted to this bug a month ago. Do you recall what you needed us to do and if the request is still valid?
Flags: needinfo?(rcampbell)
qawanted is no longer necessary;  we have a good idea of what the problem is and steps to reproduce.
Flags: needinfo?(rcampbell)
Keywords: qawanted
(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #102)
> (In reply to Paul Adenot (:padenot) from comment #91)
> > In StreamBuffer::ForgetUpTo (where we are supposed to determine how much
> > buffer space we should remove), we see (numbers are from the gdb session I
> > have at hand, we are here [1]):
> > 
> > (gdb) p aTime
> > $14 = 41062711
> > (gdb) p roundTo
> > $15 = 52428
> > (gdb) p forget
> > $16 = 41051124
> > (gdb) p mForgottenTime
> > $17 = 41051124
> > 
> > so we return immediately and we never free anything ever again, because
> > aTime does not grow, because the streams are blocked until STREAM_TIME_MAX.
> 
> Hmm, so something is wrong here too. An AudioNode feeding into a blocked
> destination should still be able to forget; its MediaStream should not be
> blocked, so we should just be throwing data away.

I should have said before that it is only the DestinationNode stream that is overbuffering, which makes sense to me.
Status: UNCONFIRMED → NEW
Ever confirmed: true
https://hg.mozilla.org/mozilla-central/rev/fdfd0621d658
https://hg.mozilla.org/mozilla-central/rev/eaa48b9c7094
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
Since the fix just landed on mozilla-central, it should be in the next Nightly build.

Russ can you please try a Nightly build (from http://nightly.mozilla.org/) tomorrow, i.e. Thursday your time, to see if this fixes your problem?  Thanks!
Flags: needinfo?(russgthomas)
FF29.0a1 (2013-12-19) Same STR, PASS (tada!)

Please PLEASE have the release team review this for back-fitting to FF24 and (since it is such an app-killer here) FF25.  Yes, Nick has explained the rules but this bug in FF25 DESTROYS valuable user work and user data (in my world that's CRITICAL).

TFL

Big thanks to Nick Nethercote for his diligence on this, "without whom" and all that.
Flags: needinfo?(russgthomas)
I'm going to upgrade this to MemShrink:P2 by fiat as it turned out to be worse than we initially thought.
Whiteboard: [MemShrink:P3] → [MemShrink:P2]
(In reply to Russ from comment #111)
> Please PLEASE have the release team review this for back-fitting to FF24 and
> (since it is such an app-killer here) FF25.  Yes, Nick has explained the
> rules but this bug in FF25 DESTROYS valuable user work and user data (in my
> world that's CRITICAL).
Is Firefox 24 even affected by this bug? From what you said in comment #14 it didn't seem to be (though that was before we knew the cause).
@Emanuel: You're right.  However, since it *is* an ESR and will receive ongoing updates, there is a chance it could be affected down the line (I thought).  Perhaps the chance of that happening is "vanishingly small" ? I wouldn't know...
(In reply to Russ from comment #111)
> Please PLEASE have the release team review this for back-fitting to FF24 and
> (since it is such an app-killer here) FF25.

25 is not being maintained any more and everyone is being updated to 26. If people do not update, that's their fault and problem, not ours. And 24 ESR is only being maintained for security issues, no other changes will go there (if someone with a large deployment requests it in the enterprise list, it may be discussed but otherwise exclusively security fixes will make it into ESR, ever).
That said, this specific case has been introduced with WebAudio in Firefox 25 and higher only, so 24 cannot be patched because it doesn't even have the feature that exposes the problem. In other words, ESR 24 is already fixed because it never had the feature and the bug.
Marking this verified based on comment 111.
Status: RESOLVED → VERIFIED
Thanks for verifying the fix, Russ.

Just to summarize the version situation.

- FF24/ESR24: unaffected.

- FF25: very bad leak.  But this version is no longer current and thus won't receive any fixes.

- FF26: smaller leak.  This is the current release, but the bug isn't bad enough for a "chemspill" release (reserved for *major* security flaws and the like), so it won't receive the fix.

- FF27: smaller leak.  This is the current Beta, and the fix will be backported (as per the "tracking-firefox27+ flag above).

- FF28: smaller leak.  This is the current Aurora, and the fix will be backported (as per the "tracking-firefox28+ flag above).

- FF29: smaller leak.  This is the current Nightly, and has already been fixed.
padenot, can you please upload a combined patch and nominate it for Aurora and Beta uplift?  Thanks.
Flags: needinfo?(paul)
Depends on: 956201
(In reply to Nicholas Nethercote [:njn] from comment #119)
> padenot, can you please upload a combined patch and nominate it for Aurora
> and Beta uplift?  Thanks.

Ping?  We really want this backported to FF27 and FF28...
Bug 956201 will need to be fixed first.
Blocks: 927044
This is the folded patch for uplift.
Comment on attachment 8357160 [details] [diff] [review]
Folded patch for uplift

[Approval Request Comment]
Bug caused by (feature/regressing bug #): web audio.
User impact if declined: bad memory leaks on some web pages -- memory consumption can grow without bound even with no page interaction.
Testing completed (on m-c, etc.): on m-c for a couple of weeks. And the original reporter confirmed that it fixes his problem.
Risk to taking this patch (and alternatives if risky): low; the patch is tiny.
String or IDL/UUID changes made by this patch: none.
Attachment #8357160 - Flags: approval-mozilla-beta?
Attachment #8357160 - Flags: approval-mozilla-aurora?
Flags: needinfo?(paul)
Attachment #8357160 - Flags: approval-mozilla-beta?
Attachment #8357160 - Flags: approval-mozilla-beta+
Attachment #8357160 - Flags: approval-mozilla-aurora?
Attachment #8357160 - Flags: approval-mozilla-aurora+
Comment on attachment 8357160 [details] [diff] [review]
Folded patch for uplift

NOTE: This flag is now for security issues only. Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.

[Approval Request Comment]
Bug caused by (feature/regressing bug #): web audio
User impact if declined: big memory leak, and then crash for oom
Testing completed: m-c, m-a, original reporter said it fixed the issue
Risk to taking this patch (and alternatives if risky): little risk, cause and fix are well understood
String or UUID changes made by this patch: none
Attachment #8357160 - Flags: approval-mozilla-b2g26?
As the note says, I think you are supposed to use the blocking-b2g flag to get permission to land for non-security issues.
blocking-b2g: --- → koi?
koi? seems to correspond to b2g26.
Attachment #8357160 - Flags: approval-mozilla-b2g26?
blocking-b2g: koi? → koi+
FWIW, following my original STR...

FF27 not fixed (current beta)
FF28 fixed
FF29 fixed
(In reply to Russ from comment #130)
> FF27 not fixed (current beta)

Yes, this will only be fixed in the upcoming beta (internally called "27.0b5") that was built late yesterday.
Thanks Robert.  Just received the update.

FWIW, following my original STR...

FF27 fixed
FF28 fixed
FF29 fixed
Marking verified based on Russ' comments - thanks for testing (and reporting in the first place)!
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: