936784 - Firefox crashes while left unattended, memory climbs, due to webaudio leak

Reporter

Description

•

12 years ago

User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:25.0) Gecko/20100101 Firefox/25.0 (Beta/Release) Build ID: 20131025151332 Steps to reproduce: Like the subject implies, nothing. DevTools were open too but essentially idle. Actual results: https://crash-stats.mozilla.com/report/index/0250b702-e73b-47e0-b73f-c97ac2131109 https://crash-stats.mozilla.com/report/index/e25ede62-ea51-4be8-a974-eb6142131109

Codacoder

Reporter

Comment 1

•

12 years ago

Leave desk, make coffee, walk back to desk - 10 mins? https://crash-stats.mozilla.com/report/index/962cfe4c-f6ff-4954-a085-0f3002131109 FF25 is unusable

Codacoder

Reporter

Comment 2

•

12 years ago

Aurora (26) is borked - no profiler, missing scrollbars in debugger... I'm now using FF24 and advising all customers and clients to do the same.

Loic

Comment 3

•

12 years ago

There is the DLL hooxpot.dll injected in firefox.exe, it's a part of the virtual desktop Dexpot. Could you uninstall this application and test again, please. NB: providing empty crash reports doesn't help.

Severity: normal → critical

Flags: needinfo?(russgthomas)

Keywords: crash

Codacoder

Reporter

Comment 4

•

12 years ago

> There is the DLL hooxpot.dll injected in firefox.exe, it's a part of the virtual desktop Dexpot. Could you uninstall this application and test again, please. Yes, I *could*, but others in our org also see the same issues since upgrading yesterday - they do not use dexpot. > NB: providing empty crash reports doesn't help. Doesn't help who - you? Providing the information that the crash reporter is unable to build a stack *is* information. Asking users to "filter" crashes and reports, IMO, is unwise. Remember, Moz don't pay users like me, be grateful when some of us choose to help you. So choose: 1 - send crash reports. Period. 2 - Send none.

Flags: needinfo?(russgthomas)

Loic

Comment 5

•

12 years ago

I'm not paid too. Just click on the crash link and if it's empty, discard it. There is one useful CR anyway: https://crash-stats.mozilla.com/report/index/e25ede62-ea51-4be8-a974-eb6142131109

Codacoder

Reporter

Comment 6

•

12 years ago

Perhaps my point was a little to subtle - I am not going to spend any time filtering crash reports. That is not my job, sounds like you don't want it to be your job either? A significant part of my day is already taken up by debugging firefox instead of fixing the bugs in my own code (for which I *am* being paid) made no easier by broken firefox DevTools. Seriously, running firefox with devtools debug logging enabled is not exactly ideal - or efficient. I repeat the above, take 'em or leave 'em.

Loic

Comment 7

•

12 years ago

Did you try to repro the crash in safe mode and with a new profile? (as it seems to crash even when Firefox is idle, you can test that easily)

Codacoder

Reporter

Comment 8

•

12 years ago

Oh boy... I think you'll find I've installed FF24 if you read back. I can tell you that Aurora does NOT crash when left idle (FF26). However, since Aurora has so many other issues relating to the DevTools, that's not a viable dev environment at present (and obviously isn't something I can advise my users to use). All this time we've been swapping comments, both Aurora and FF24 have been sitting idle with my app loaded. My take away from this regarding FF25 is: bad tests, period.

Andre Klapper

Updated

•

12 years ago

Summary: Firefox crashes while left unattended → Firefox crashes while left unattended, potentially due to hooxpot.dll

Codacoder

Reporter

Comment 9

•

12 years ago

http://dexpot.de/forum/viewtopic.php?f=12&t=4864

Codacoder

Reporter

Comment 10

•

12 years ago

Redownloaded FF25, used profilemanager to create a -no-remote for it, ensured all plugins were uptodate (flash and adobe pdf). Updated Aurora (FF27). Updated dexpot. (1.6.0) (This is a red herring IMO) Fired up FF24, FF25 and FF27 same app, same URL. While typing this and the above had been running for around 20 mins, FF25 crashed... FF25 crash report after uptime of 2242: (EMPTY!) https://crash-stats.mozilla.com/report/index/aab0982a-7c9e-4d4b-8ba6-617272131110 FF24 and Aurora still running.

Rob Campbell [:rc] (:robcee)

Comment 11

•

12 years ago

cc'ing Nick. The crash report mentioned in C#5 looks like it's coming from our memory system. Do you have anyone who could take a look into this? I'm suspicious of the reporter's environment as he's been reporting bugs with devtools we're just not seeing in any quantity. Russ, have you tried running Fx25 with a completely new profile?

Codacoder

Reporter

Comment 12

•

12 years ago

I thought that's what I just tried?

Codacoder

Reporter

Comment 13

•

12 years ago

NB - does NOT occur in Aurora

Codacoder

Reporter

Comment 14

•

12 years ago

Rob, I saw that memory reference too, so, restarted the experiment depicted in #10 and watched firefox.exe in Windows Task Manager... FF24 stable at 144,860K FF25 CLIMBING at 844,nnnK approx rate of 1M/sec (now 904nnn at I type) same URL in both.

Codacoder

Reporter

Comment 15

•

12 years ago

[10:08:40.938] Could not read chrome manifest 'file:///C:/Program%20Files%20(x86)/Mozilla%20Firefox%2025/chrome.manifest'. [10:08:40.981] Could not read chrome manifest 'file:///C:/Program%20Files%20(x86)/Mozilla%20Firefox%2025/browser/extensions/%7B972ce4c6-7e08-4474-a285-3208198ce6fd%7D/chrome.manifest'. [10:08:40.982] While creating services from category 'profile-after-change', could not create service for entry 'Disk Space Watcher Service', contract ID '@mozilla.org/toolkit/disk-space-watcher;1' [10:08:51.535] 1384099731535 Services.HealthReport.HealthReporter WARN Saved state file does not exist. [10:08:51.535] 1384099731535 Services.HealthReport.HealthReporter WARN No prefs data found.

Codacoder

Reporter

Comment 16

•

12 years ago

The same memory climb occurs in safe mode, too. We can dismiss plugins... anything else that this "means"? Then I realised, I'm always running inside my test harness which wraps the app in an iframe. When the app runs outside the test env, memory climb STOPS. The question becomes (for me) what am I doing in the test env that: - has never had an issue before (<FF25), - breaks FF25, - does not break >FF25 (FF26, 27/Aurora) Of course, there is an assumption here... that the memory climb is the cause of (or leads to) the crash. Comments?

Nicholas Nethercote [inactive]

Comment 17

•

12 years ago

The memory climbing is consistent with the non-empty out-of-memory crash in comment 5. The fact that it happens in safe mode also rules out extensions and (I think) hardware acceleration issues (which are reponsible for a surprising fraction of problems of this nature). Russ, can you please do the following with FF25: - Visit about:memory (type it in the address bar) in a separate tab. - Take two memory measurements using the "Measure and save..." button -- one at the start, and one after memory consumption has been climbing for a while but before it crashes. - Attach the two files to this bug. It probably doesn't matter if you do it in safe mode, but using a new (or almost new) profile would be helpful, just keep things simpler. We can then do a diff of the two files (using about:memory's "Load and diff" button, you can try it too if you want). That's the most likely way we'll make progress here. Thanks. (BTW, the messages in comment 15 are harmless and not relevant, AFAIK.)

Whiteboard: [MemShrink]

Codacoder

Reporter

Comment 18

•

12 years ago

Will do. I had already taken a peek in there (in a previous issue, Rob had asked me to send him memory reports) but to my uneducated eyes, comparing FF24, FF25 and FF27 (Aurora) (measure, measure, measure, over and over) I saw nothing that stood out. But, as you know, there's a lot to digest in there... :/ And yes, shiny new profile :) And btw, my laptop behaves the exactly same (non-dev environment, FF25, email, IMs... twitter... ****... you get the idea). I wanted to rule out as much suspicious tooling as possible (not least any wonderings Rob may be having re comment 11). Back soon, hopefully.

Codacoder

Reporter

Comment 19

•

12 years ago

Oh, one more thing Nick... While in about:memory, hitting "Minimize memory usage" has an obvious effect against both FF24 and 27. No (noticeable) effect at all on 25. So what ever this is, the browser will not GC it.

Codacoder

Reporter

Comment 20

•

12 years ago

And Nick, for Aurora, the session described in comment 16 is STILL running, sitting idle, same URL, memory climbing very slowly in 8K chunks (now at 505MB).

Codacoder

Reporter

Comment 21

•

12 years ago

Attached file memory-report-1.json.gz — Details

Attachment #829944 - Flags: feedback+

Codacoder

Reporter

Comment 22

•

12 years ago

Attached file memory-report-2.json.gz — Details

Attachment #829945 - Flags: feedback+

Nicholas Nethercote [inactive]

Comment 23

•

12 years ago

Russ, thanks for the data. Unfortunately, the diff isn't helpful: 400.96 MB (100.0%) -- explicit ├──381.23 MB (95.08%) ── heap-unclassified ├───18.56 MB (04.63%) -- heap-overhead │ ├──22.22 MB (05.54%) ── waste │ └──-3.65 MB (-0.91%) ── page-cache └────1.17 MB (00.29%) ++ (5 tiny) It's mostly "heap-unclassified", which corresponds to memory consumption that we don't have explicit measurement of within Firefox. I'm running low on ideas for how to investigate this further.

Codacoder

Reporter

Comment 24

•

12 years ago

Frustrating. I guess I'll just "skip" 25. Thanks for your help, Nick. Anyone else have any ideas?

Nicholas Nethercote [inactive]

Comment 25

•

12 years ago

If you can be bothered redoing the memory dumps for FF26, it would be good to check if that slow increase in memory consumption is also falling under "heap-unclassified", or somewhere more useful.

Rob Campbell [:rc] (:robcee)

Comment 26

•

12 years ago

Nick, did anything specific change in FF25 wrt our garbage collector. From Russ' emails this weekend, it sounds like his monolothic webapp's testing harness is leaking memory. If this was handled in FF24 and now better in 26, I'm wondering if changes to our generational GC might have missed something in that timeframe. Of course, without additional debugging access this is going to be very difficult to pin down. Russ, you said your test harness is running something in an interval. Can you describe what's on those intervals? Are you caching data, dom nodes, ... stuff? Can you give us enough detail so we can write a test based on your information?

Rob Campbell [:rc] (:robcee)

Updated

•

12 years ago

Keywords: qawanted

Target Milestone: --- → Firefox 25

Andrew McCreight [:mccr8]

Comment 27

•

12 years ago

(In reply to Rob Campbell [:rc] (:robcee) from comment #26) > Nick, did anything specific change in FF25 wrt our garbage collector. From > Russ' emails this weekend, it sounds like his monolothic webapp's testing > harness is leaking memory. If this was handled in FF24 and now better in 26, > I'm wondering if changes to our generational GC might have missed something > in that timeframe. Our memory management changes all the time, but there's nothing particularly specific to 25 I can think of. Generational GC isn't enabled anywhere, so that shouldn't be an issue.

Rob Campbell [:rc] (:robcee)

Updated

•

12 years ago

Summary: Firefox crashes while left unattended, potentially due to hooxpot.dll → Firefox crashes while left unattended, memory climbs

Rob Campbell [:rc] (:robcee)

Comment 28

•

12 years ago

ok, thanks Andrew.

Codacoder

Reporter

Comment 29

•

12 years ago

@Rob Wow... sheesh... lemme think... Ok, the TH is NOT executing tests and is therefore NOT running that code I mentioned. It is loaded in the main browser window (ie top). The "monolithic app" (ha! not far off!) sits in an iframe that occupies (guessing) 95% of the viewport, the TH displaying a strip across the top - make sense? Graphically... + --------------+ | TH | <<== Hitting play/pause does NOT break into running code - hence "idle" + --------------+ | | | | | APP | <<== Hitting play/pause does NOT break into running code - hence "idle" | | | | | | + --------------+ So, no tests loaded, TH does its startup stuff and sits waiting on events (button clicks, essentially). And apart from that stupid brain-fade typo (a clearTimeout that should have been a clearInterval), last bugfix was in Feb - it's pretty stable. And again, that code has not executed during the lifetime of this conversation (not testing right now - I just happen to dev in the TH). So, yes, it's an obvious suspect but it has (so far) rock solid alibis ;) Aside: my Aurora is now 27.0a2 So, ... @Nick: I will carry out your request, but it will be on FF27 (assuming the small 8K leak is still there). Rob - I'll email you some bits.

Codacoder

Reporter

Comment 30

•

12 years ago

@Nick FF27 is climbing in 16 and 20K chunks. I'll do the test in a few hours.

Nicholas Nethercote [inactive]

Comment 31

•

12 years ago

> Our memory management changes all the time, but there's nothing particularly > specific to 25 I can think of. And our profiling coverage of JS memory consumption is really good, so if JS objects (or strings, or other JS things) were leaking we wouldn't be getting the rise entirely in "heap-unclassified". We have coverage for orphan DOM nodes as well, so it shouldn't be that either. The fact that there's such a drastic difference in behaviour between different Firefox versions makes it definitely smell like a bug on our side.

Nicholas Nethercote [inactive]

Comment 32

•

12 years ago

Russ sent me some about:memory snapshots for FF27, where he's seeing the small memory increase. Again, the increase falls entirely into "heap-unclassified": 41.79 MB (100.0%) -- explicit ├──38.47 MB (92.06%) ── heap-unclassified ├───2.83 MB (06.77%) -- workers/workers()/worker(resource://gre/modules/PageThumbsWorker.js, 0xNNN) │ ├──1.50 MB (03.59%) -- gc-heap │ │ ├──1.47 MB (03.51%) ── unused-arenas [+] │ │ └──0.03 MB (00.07%) ── chunk-admin [+] │ ├──0.68 MB (01.63%) ++ runtime │ └──0.65 MB (01.54%) ++ (3 tiny) ├───2.39 MB (05.72%) -- js-non-window │ ├──2.39 MB (05.72%) -- zones/zone(0xNNN) │ │ ├──2.18 MB (05.22%) -- compartment([System Principal], resource://gre/modules/TelemetryFile.jsm) │ │ │ ├──2.12 MB (05.08%) -- objects │ │ │ │ ├──1.80 MB (04.30%) -- malloc-heap │ │ │ │ │ ├──1.77 MB (04.23%) ── elements/non-asm.js │ │ │ │ │ └──0.03 MB (00.07%) ── slots [+] │ │ │ │ └──0.32 MB (00.78%) ++ gc-heap │ │ │ └──0.06 MB (00.15%) ++ (2 tiny) │ │ └──0.21 MB (00.49%) ++ (38 tiny) │ └──0.00 MB (00.00%) ++ runtime/code └──-1.90 MB (-4.54%) ++ (7 tiny) Argh. Quoting Russ's summary from his email: FF24 is now my workhorse - no known issues FF25 - we know about [bad, fast leak] FF26 - *was* Aurora when I first noticed the 8K creep... that was updated to... FF27 - I'm assuming 27.0a1? this showed the 8K creep too FF27.0a2 - current Aurora - showing 16K or 20K or 24K creep (The diff above is for FF27.0a2) I asked Russ about running the web app myself, but apparently getting permission to do so is almost impossible.

Nicholas Nethercote [inactive]

Updated

•

12 years ago

Whiteboard: [MemShrink] → [MemShrink:P3]

videoclocknet

Comment 33

•

12 years ago

Does it happen after Windows hibernates/sleeps? Can you try deactivating hibernation/sleep mode? I reproduced similar issues in the past.

Wayne Mery (:wsmwk)

Updated

•

12 years ago

Flags: needinfo?(russgthomas)

Keywords: regression

Codacoder

Reporter

Comment 34

•

12 years ago

I cannot say since I don't use hibernate/sleep. Do I think it would occur after either? Yes - it's unlikely to "fix" it but no, sleep is not "the cause".

Flags: needinfo?(russgthomas)

Codacoder

Reporter

Comment 35

•

12 years ago

FYI... On a whim I decided to download nightly: 28.0a1 The bug remains but far less aggressive - relentless 8K increments (approx 16K/second). @Nick: As before, if I disable the read of the config data in the app, the bug disappears. And yes, I'm hoping to chop that into pieces in the hope that I can nail it at least a little closer. Aside: re Australis... I know moz have a thing about protecting privacy and anonymity, but was the plan really to make the browser itself so anonymous? Would joe public recognize it in a movie?

Emanuel Hoogeveen [:ehoogeveen]

Comment 36

•

12 years ago

After bug 819839 merges to mozilla-central, it might be possible to detect these leaks in a custom build on Windows.

Benjamin Smedberg

Comment 37

•

12 years ago

Emanuel, what are the steps that users should take with a DMD build, now that it's on mozilla-central? The goal is to diagnose the heap-unclassified leak, correct?

Flags: needinfo?(emanuel.hoogeveen)

Emanuel Hoogeveen [:ehoogeveen]

Comment 38

•

12 years ago

I'm not the expert, but instructions for using DMD are here: https://wiki.mozilla.org/Performance/MemShrink/DMD In particular, see https://wiki.mozilla.org/Performance/MemShrink/DMD#Desktop and https://wiki.mozilla.org/Performance/MemShrink/DMD#Desktop_2 for instructions on how to start Firefox with DMD enabled and use it to get an alaysis. Regardless of how you set the environment variables, you'll want to do this from commandline so you can confirm that it's active. From there, you probably want to run an analysis just after opening Firefox, and then another one a while later for the sake of comparison. Since this is a slow leak that happens regardless of what you do, it's probably best to leave it be for a while and get a second trace (rather than browsing around). However, I don't think there are any automated builds that have DMD enabled right now. It might be possible to get a try build, or you could try to build it locally. Nicholas, is that right?

Flags: needinfo?(emanuel.hoogeveen) → needinfo?(n.nethercote)

Nicholas Nethercote [inactive]

Comment 39

•

12 years ago

> However, I don't think there are any automated builds that have DMD enabled > right now. It might be possible to get a try build, or you could try to > build it locally. Nicholas, is that right? Local builds certainly work. Try builds should be possible; I just added a paragraph to https://wiki.mozilla.org/Performance/MemShrink/DMD#Everything_other_than_B2G-device_builds on how to do a try build. I think it is correct but I haven't actually tested it.

Flags: needinfo?(n.nethercote)

Nicholas Nethercote [inactive]

Comment 40

•

12 years ago

I tried doing a DMD build via try server, but got linking errors: https://tbpl.mozilla.org/php/getParsedLog.php?id=31460041&tree=Try#error0 glandium, any idea what went wrong?

Flags: needinfo?(mh+mozilla)

Catalin Iacob

Comment 41

•

12 years ago

Nick, you say local builds certainly work but maybe not all combinations do. My Windows .mozconfig has: ac_add_options --enable-debug ac_add_options --enable-dmd mk_add_options MOZ_MAKE_FLAGS="-j4" I don't think I ever tested --enable-dmd without --enable-debug (and for sure not with --enable-release). It didn't occur to me but looking retrospectively I should have done it. I can try to look at it over the next days but don't promise I'll find time for it.

Nicholas Nethercote [inactive]

Comment 42

•

12 years ago

Ok, the debug try build worked! https://tbpl.mozilla.org/?tree=Try&rev=cfab371e2020 is the run. Russ: your turn now! Note that Windows support for DMD only landed a few days ago, so this is all very new. Fingers crossed that it works. And debug builds are significantly slower than optimized builds, so be prepared for that. The following instructions are mostly from https://wiki.mozilla.org/Performance/MemShrink/DMD. I've copied the Windows-specific parts here to make things easier for you. GET THE BUILD ------------- Russ: please go to http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/nnethercote@mozilla.com-cfab371e2020/try-win32-debug/ ... and I guess download the installer.exe file? Or maybe the win32.zip file? Not sure, but it must be one of those two. RUN --- To run it, do this from the command line: MOZ_REPLACE_MALLOC_LIB=path\\to\\dmd.dll \ DMD=1 \ <command> I'm not sure what "path\\to\\" will be, hopefully you can search for dmd.dll and work it out. <command> will be something like "firefox -P <profilename>". If you give a <profilename> that doesn't already exist, it'll ask if you want to create it. On start-up, you'll see some commentary on stderr, such as: DMD[20523] $DMD = '1' DMD[20523] DMD is enabled The number in brackets is the process ID. The browser will run a little slower than usual. ANALYZE ------- DMD doesn't do anything notable until you ask it to. You should trigger analysis once Firefox has been running a while and memory consumption has climbed significantly. To trigger DMD's analysis phase, enter this URI: javascript:DMDReportAndDump(<filename>) where <filename> is a string containing the name of the file that output will be written to. For example: javascript:DMDReportAndDump("out.dmd") The output file will be written to the current working directory. Note that Firefox doesn't let you do this directly in the address bar, so you must either (a) create a bookmark for it, or (b) run it from the error console. The latter is discouraged because opening the error console allocates a lot of memory that will skew your results. Furthermore, using the bookmark fails from certain pages such as about:memory; if that happens in a debug build you'll get a console warning like this: WARNING: No principal to execute JS with: file /home/njn/moz/mi2/dom/src/jsurl /nsJSProtocolHandler.cpp, line 157 This command runs all the memory reporters and then invokes DMD analysis of the reports, which triggers this commentary: DMD[20600] Dump 1 { DMD[20600] gathering live block groups... DMD[20600] creating and sorting double-reported block group array... DMD[20600] creating and sorting unreported block group array... DMD[20600] printing unreported block group array... DMD[20600] creating and sorting unreported frame group array... DMD[20600] printing unreported frame group array... DMD[20600] creating and sorting reported block group array... DMD[20600] printing reported block group array... DMD[20600] creating and sorting reported frame group array... DMD[20600] printing reported frame group array... DMD[20600] } I'm not sure how long this takes on Windows. On Mac it can take 30+ seconds. Once that's finished printing, please upload <filename> to the bug report! You can run DMDReportAndDump multiple times in a single session.

Nicholas Nethercote [inactive]

Comment 43

•

12 years ago

I filed bug 947117 about the DMD link error in non-debug builds.

Nicholas Nethercote [inactive]

Updated

•

12 years ago

Flags: needinfo?(mh+mozilla) → needinfo?(russgthomas)

Codacoder

Reporter

Comment 44

•

12 years ago

@Nick: that's a 28 build. Killer bug is in the 25 branch. Please confirm this is the right target?

Flags: needinfo?(russgthomas)

Nicholas Nethercote [inactive]

Comment 45

•

12 years ago

You said that 28 still exhibits a leak, just a smaller one, right? That's the version that's currently under development. 25 is past that point now...

Codacoder

Reporter

Comment 46

•

12 years ago

NICK > I'm not sure what "path\\to\\" will be, hopefully you can search for dmd.dll and work it out. That's a big fat 404. Not in the zip, and I aborted the .installer.exe since it wants to install nightly again (which I have).

Codacoder

Reporter

Comment 47

•

12 years ago

Sure. 28 it is.

Nicholas Nethercote [inactive]

Comment 48

•

12 years ago

>> I'm not sure what "path\\to\\" will be, hopefully you can search for >> dmd.dll >> and work it out. > > That's a big fat 404. Not in the zip, and I aborted the .installer.exe > since it wants to install nightly again (which I have). glandium, is dmd.dll not included by whatever does the packaging?

Flags: needinfo?(mh+mozilla)

Mike Hommey [:glandium]

Comment 49

•

12 years ago

dmd.dll is not in package-manifest.in, so yes, it's not included.

Flags: needinfo?(mh+mozilla)

Nicholas Nethercote [inactive]

Comment 50

•

12 years ago

It's Friday night here, so it'll have to wait until Monday, but I can try to work out how to add it to package-manifest and do another try run. Or if someone else wants to have a go in the mean-time, just note that you have to disable trace-malloc when you enable DMD, as I did in https://hg.mozilla.org/try/rev/9b92b8c295a7.

Nick Thomas [:nthomas] (UTC+12)

Comment 51

•

12 years ago

I bet you need to add 58 #ifdef MOZ_DMD 59 @BINPATH@/@DLL_PREFIX@dmd@DLL_SUFFIX@ 60 #endif from b2g/installer/package-manifest.in to browser/installer/package-manifest.in (and possibly the mobile and xulrunner equivalents).

Nicholas Nethercote [inactive]

Comment 52

•

12 years ago

Thanks, nthomas! I did another try build, the output is here: http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/nnethercote@mozilla.com-8f49111414e6/try-win32-debug/ I verified that firefox-28.0a1.en-US.win32.zip contains dmd.dll. Russ, your turn again! :)

Flags: needinfo?(russgthomas)

Codacoder

Reporter

Comment 53

•

12 years ago

Attached file out.dmd — Details

out.dmd generated as per Nick Nethercote's instructions in comment 42. Two invokations, first as soon as the 8K allocations started to appear in windows Task Manager; second "some time later" ;) Firefox 28.0a1 new profile, Win7/64.

Flags: needinfo?(russgthomas)

Codacoder

Reporter

Comment 54

•

12 years ago

Attached file console.txt — Details

Output from windows console during DMD session FWIW.

Nicholas Nethercote [inactive]

Updated

•

12 years ago

Attachment #8346715 - Attachment mime type: text/x-vhdl → text/plain

Nicholas Nethercote [inactive]

Comment 55

•

12 years ago

> Two > invokations, first as soon as the 8K allocations started to appear in > windows Task Manager; second "some time later" ;) Thanks, Russ! I only see output from a single run in that attachment, and it looks like it's probably the earlier one. Did you forget to attach the second one?

Codacoder

Reporter

Comment 56

•

12 years ago

> You can run DMDReportAndDump multiple times in a single session. No... I assumed the above meant it would keep on appending? If that causes an overwrite of previous data then you have the last invokation. Let me know - I can easily re-run it. For clarity: "invoke" means "click the bookmarklet" with URI: javascript:DMDReportAndDump("out.dmd")

Codacoder

Reporter

Comment 57

•

12 years ago

Confirmed - it overwrites. I'll time this one for approx 15 mins.

Nicholas Nethercote [inactive]

Comment 58

•

12 years ago

(In reply to Russ from comment #56) > > You can run DMDReportAndDump multiple times in a single session. > > No... I assumed the above meant it would keep on appending? If that causes > an overwrite of previous data then you have the last invokation. Let me > know - I can easily re-run it. Oh, right. Yes, if you give the same filename to DMDReportAndDump() twice the second invocation will overwrite the file written by the first invocation. > For clarity: "invoke" means "click the bookmarklet" with URI: > javascript:DMDReportAndDump("out.dmd") Yes. So that means the attached file is from the second invocation. This line near the end: > Total: ~99,507,611 bytes (100.00%) in ~14,353 blocks (100.00%) Indicates the heap was just under 100MB, which is a lot smaller than I expected. Hmm.

Nicholas Nethercote [inactive]

Comment 59

•

12 years ago

> Confirmed - it overwrites. I'll time this one for approx 15 mins. Yeah... ideally you'd invoke DMD shortly before the crash happens, or at least at a point where memory consumption has climbed greatly.

Codacoder

Reporter

Comment 60

•

12 years ago

Not sure we'll reach the crash point (these incs are 8K, the crash was occurring when the incs were of the order of 1M).

Codacoder

Reporter

Comment 61

•

12 years ago

Attached file out.dmd.2 — Details

Dump after ~15mins

Nicholas Nethercote [inactive]

Comment 62

•

12 years ago

15 minutes isn't that long -- if we're seeing 8KB per second growth (is that right?) that's only about 7 MB. A long run would be useful... except that the stacks DMD is producing look totally bogus. Consider the biggest unreported record: Unreported: 1 block in stack trace record 1 of 1,516 15,765,504 bytes (15,765,504 requested / 0 slop) 14.53% of the heap (14.53% cumulative); 23.96% of unreported (23.96% cumulative) Allocated at moz_xrealloc[mozalloc +0x1123] 0x6a8f1123 mozilla::detail::GuardObjectNotificationReceiver::operator=[xul +0x2E08] 0x2f72e08 mozilla::detail::GuardObjectNotificationReceiver::operator=[xul +0x4B7CE] 0x2fbb7ce mozilla::detail::GuardObjectNotificationReceiver::operator=[xul +0x4C876] 0x2fbc876 JSPrincipals::dump[xul +0xF5DDE6] 0x3ecdde6 JSPrincipals::dump[xul +0xF5E2CB] 0x3ece2cb JSPrincipals::dump[xul +0xF5E680] 0x3ece680 soundtouch::SoundTouch::operator=[xul +0xF83521] 0x3ef3521 soundtouch::SoundTouch::operator=[xul +0xF87DA2] 0x3ef7da2 soundtouch::SoundTouch::operator=[xul +0xF881C3] 0x3ef81c3 XRE_AddJarManifestLocation[xul +0x16B4E3] 0x30db4e3 mozilla::services::_external_GetHistoryService[xul +0x108615] 0x3078615 XRE_AddJarManifestLocation[xul +0x16B924] 0x30db924 PRP_TryLock[nss3 +0x16DE4F] 0x56edde4f PR_MD_UNLOCK[nss3 +0x17382F] 0x56ee382f beginthreadex[MSVCR100D +0x4A273] 0x5777a273 beginthreadex[MSVCR100D +0x4A204] 0x5777a204 BaseThreadInitThunk[kernel32 +0x1336A] 0x7590336a RtlInitializeExceptionChain[ntdll +0x39F72] 0x77dd9f72 RtlInitializeExceptionChain[ntdll +0x39F45] 0x77dd9f45 GuardObjectNotificationReceiverd::operator= doesn't exist, and if it did, I doubt it would call itself. JSPrincipals::dump exists but doesn't call itself. SoundTouch::operator= doesn't exist, and it wouldn't call itself if it did. And these bogus |operator=| entries appear in all the large records. I don't know what to make of this. Dammit.

Nicholas Nethercote [inactive]

Comment 63

•

12 years ago

The only other thing I can think of is this: on Linux and Mac, DMD output needs post-processing to turn the symbol names and offsets into filenames and line numbers. On Windows, this isn't necessary... at least, if you build and run it locally the filename/line-numbers are obtained automatically. I don't know how this works, but it's clearly not happening in Russ's run (which came from a try build). I wonder if we were able to post-process the output to produce filenames/line numbers whether the results would be more sensible. bsmedberg, do you know about this?

Flags: needinfo?(benjamin)

Codacoder

Reporter

Comment 64

•

12 years ago

Attached file out.dmd.3 — Details

Something like 90 mins

Nicholas Nethercote [inactive]

Comment 65

•

12 years ago

khuey said ted might also know about windows debug info stuff.

Flags: needinfo?(ted)

(Away)

Comment 66

•

12 years ago

Did you do MOZ_CRASHREPORTER_UPLOAD_FULL_SYMBOLS on the try run? I think we need the original .pdb files; I don't think any Windows tools know how to handle .sym. Plus the automatic PDB hookup only works if the .pdb's are at the same absolute path as the path where they were build (which is usually the case if it's a local build, but usually not otherwise). To set the path manually, you'll want to set the _NT_SYMBOL_PATH environment variable to point to the folder with the .pdb's (and make sure you launch in such a way that the browser inherits this environment variable).

Nicholas Nethercote [inactive]

Comment 67

•

12 years ago

dmajor and I are working on a new build that incorporates the symbols. We'll test locally before asking Russ to try it!

(no longer active)

Comment 68

•

12 years ago

(In reply to comment #66) > Plus the automatic PDB hookup only works if the .pdb's are at the same absolute > path as the path where they were build (which is usually the case if it's a > local build, but usually not otherwise). To set the path manually, you'll want > to set the _NT_SYMBOL_PATH environment variable to point to the folder with the > .pdb's (and make sure you launch in such a way that the browser inherits this > environment variable). FWIW we could probably do this within our own code if we detect where the .pdb files are (for example, if they're in the installation directory.)

(Away)

Comment 69

•

12 years ago

I confirmed Nick's latest try build works on my machine. Here were the steps I took: https://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/nnethercote@mozilla.com-af379511c3c4/try-win32-debug/ Unzip firefox-29.0a1.en-US.win32.zip to c:\firefox Unzip firefox-29.0a1.en-US.win32.crashreporter-symbols-full.zip to c:\ffsymbols Open a command prompt (as administrator since I dumped stuff into c:\) cd \ffsymbols for /f %i in ('dir /s /b *.pd_') do @expand -r %i cd \firefox mkdir .profile set MOZ_REPLACE_MALLOC_LIB=dmd.dll set DMD=1 set _NT_SYMBOL_PATH=c:\ffsymbols firefox.exe -no-remote -profile .profile This produced an out.dmd with reasonable symbols. The "for...expand" piece is crucial; I always forget this step and end up wondering where my symbols went.

Benjamin Smedberg

Updated

•

12 years ago

Flags: needinfo?(benjamin)

(not currently active) Ted Mielczarek

Comment 70

•

12 years ago

FYI you can also unzip the compressed symbols somewhere and use that as an upstream store, as per: https://wiki.mozilla.org/ReleaseEngineering/TryServer#Debugging_with_debug_symbols (and make dbghelp do the decompression for you.) But yeah, we could also bake some of this logic in if we're expecting people to use these DMD builds.

Flags: needinfo?(ted)

Nicholas Nethercote [inactive]

Comment 71

•

12 years ago

BTW, I updated https://wiki.mozilla.org/Performance/MemShrink/DMD to explain how to do DMD-enabled try builds and runs on Windows.

Codacoder

Reporter

Comment 72

•

12 years ago

Ok, I'm trying this new build and again following instructions in comment 42, specifically, the advice under the heading "ANALYZE" where the suggestion is to use a bookmark. On my previous runs I did this and it worked just fine (new tab, click bookmarklet) but on this occasion, I get this: [73356] WARNING: No principal to execute JS with: file c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/dom/src /jsurl/nsJSProtocolHandler.cpp, line 156 Which is similar to the warning mentioned in comment 42. I eventually managed to get the tab/bookmark to work by middle-clicking it. Problem: memory had suddenly more than doubled (from about 210M to 500+M). So, if I post these results they're going to be falsely inflated by the opening/closing tabs code (which in itself sounds buggy to me but that's for another day, perhaps). So, question: should I post the out.dmd anyway? I've chosen to restart the experiment and provide a less polluted output but I have the out.dmd here if it's needed. @David Major: I followed your steps but chose not to pollute my root and also put the statements in a batch file. Worked fine except for the for statement - what would be the syntax changes required? I'm guessing there's something up with the quotes on expansion?

(Away)

Comment 73

•

12 years ago

(In reply to Russ from comment #72) > @David Major: I followed your steps but chose not to pollute my root and > also put the statements in a batch file. Worked fine except for the for > statement - what would be the syntax changes required? I'm guessing there's > something up with the quotes on expansion? If you use a batch file then you'll need to double the percents: %i -> %%i Though once you have the symbols expanded, you don't need to do it again on every run.

Codacoder

Reporter

Comment 74

•

12 years ago

Thanks David - been a while since I wrote batch code.

Codacoder

Reporter

Comment 75

•

12 years ago

Following on from comment 72, I restarted and let the app run as previously described. I watched memory climb from ~194M to ~260M. I middle clicked my bookmarklet and... "Nightly has stopped working" Ouch. Last line printed to the console was : Assertion failure: mutationCount == table_->mutationCount, at c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\ obj-firefox\dist\include\js/HashTable.h:872 And no, crash reporter did not appear. So, I fired up the build again (again from the command prompt)... tried to load my app... BANG! [64296] ###!!! ASSERTION: Adding a child where we already have a child? This may misbehave: 'Error', file c:/builds/moz2 _slave/try-w32-d-00000000000000000000/build/docshell/shistory/src/nsSHEntry.cpp, line 622 Over to you, Nick.

Flags: needinfo?(n.nethercote)

Nicholas Nethercote [inactive]

Comment 76

•

12 years ago

Ah, the joys of working with immature tools... > [73356] WARNING: No principal to execute JS with: file > c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/dom/src > /jsurl/nsJSProtocolHandler.cpp, line 156 > > Which is similar to the warning mentioned in comment 42. Yes. What tab did you have open when you clicked on the bookmark? Was it about:memory or some other non-content tab? That's the only cause of that error that I'm aware of. > Assertion failure: mutationCount == table_->mutationCount, at > c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\ > obj-firefox\dist\include\js/HashTable.h:872 Ugh. No idea what would cause that. > [64296] ###!!! ASSERTION: Adding a child where we already have a child? This > may misbehave: 'Error', file c:/builds/moz2 > _slave/try-w32-d-00000000000000000000/build/docshell/shistory/src/nsSHEntry. > cpp, line 622 Did that kill the browser? We have lots of "soft" assertions that cause an error to be printed to the console but don't kill the browser, and that one looks like a soft assertion?

Flags: needinfo?(n.nethercote)

Codacoder

Reporter

Comment 77

•

12 years ago

> What tab did you have open when you clicked on the bookmark? Was it about:memory or some other non-content tab? New, fresh, virginal tab. > Did that kill the browser? Sure. As before, Bang as in dead as in gone to meet its maker as in this is an ex-parrot... Maybe I should be more clear; that last was a restart, followed by me launching my app (via a bookmark) and then an immediate bang. That line was posted to the console "as it happened" near enough. Reason for the NEEDSINFO: I really don't know how to proceed - I'd prefer not to guess. Was that a bad nightly?

Nicholas Nethercote [inactive]

Comment 78

•

12 years ago

> > What tab did you have open when you clicked on the bookmark? Was it about:memory or some other non-content tab? > > New, fresh, virginal tab. It appears that empty tabs also don't permit DMD to be run. I'll update the docs. > Reason for the NEEDSINFO: I really don't know how to proceed - I'd prefer > not to guess. Was that a bad nightly? Not sure. Assertions are only enabled in debug builds, which you have, so you might be hitting latent bugs that wouldn't necessarily manifest in an observable way in a release build. Bugzilla has no record of the mutationCount assertion failure; it's possible that DMD is involved, because DMD uses those hash tables. But the "Adding a child" one has a pedigree -- see bug 645229, bug 751208, bug 859156, 864383. Optimized DMD-enabled builds for Windows have just started working (see bug 947117 and bug 948621). So my best suggestion is to roll the dice one more time... I'll do such a build, and then you can try it, Russ. You won't get the assertion failures (because assertions are off) and hopefully it'll work sufficiently that we can get some useful data. Sorry this has been such a drawn-out process :(

Nicholas Nethercote [inactive]

Comment 79

•

12 years ago

> It appears that empty tabs also don't permit DMD to be run. I'll update the > docs. If you need to open a tab in which to trigger DMD (though wouldn't you have a content tab open somewhere?) then about:home is a good choice -- it's simple enough to not perturb the results much.

Codacoder

Reporter

Comment 80

•

12 years ago

> If you need to open a tab in which to trigger DMD As I said, middle clicking DMD is working fine. > (though wouldn't you have a content tab open somewhere?) Sure - my app.

Codacoder

Reporter

Comment 81

•

12 years ago

Attached file out.zip — Details

Nicholas Nethercote [inactive]

Comment 82

•

12 years ago

Attached file Partial DMD output that suggests audio is the issue — Details

> out.zip Russ, I guess you managed to get some output without crashing? Great! If we can get data from a longer run that will be more conclusive (I'm building the optimized DMD build right now), but it looks like webaudio is the issue. I've attached the relevant part of the DMD output.

Nicholas Nethercote [inactive]

Comment 83

•

12 years ago

ehsan, padenot, karlt: Russ saw a major memory leak on Windows (~1MB per second) when upgrading from FF24 to FF25. FF26 and later are better, but still leak ~8KB per second. His site is proprietary so he's been gallantly doing DMD runs for us. He just obtained the first usable data and it points the finger at webaudio. Please see the "partial DMD output" attachment. We obviously don't have memory reporters for web audio (that's bug 884368), but if you could evaluate this for likelihood of leakiness that would be great!

Flags: needinfo?(paul)

Flags: needinfo?(ehsan)

Codacoder

Reporter

Comment 84

•

12 years ago

Attached file out2.zip — Details

Same session Nick, taken moments ago.

Nicholas Nethercote [inactive]

Comment 85

•

12 years ago

Aha! Comparing the first DMD output to the second, most of the entries I copied are unchanged. However, this one has grown from ~8,880,125 bytes to ~66,740,221 bytes: > Unreported: ~2 blocks in stack trace record 1 of 1,423 > ~66,740,221 bytes (~66,740,221 requested / ~0 slop) > 40.88% of the heap (40.88% cumulative); 57.56% of unreported (57.56% cumulative) > Allocated at > moz_xrealloc (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\memory\mozalloc\mozalloc.cpp:84) 0x583a1123 > nsTArray_base<nsTArrayInfallibleAllocator,nsTArray_CopyWithMemutils>::EnsureCapacity (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\obj-fi > refox\dist\include\nstarray-inl.h:170) 0x3152ad7 > nsTArray_Impl<mozilla::AudioChunk,nsTArrayInfallibleAllocator>::AppendElements (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\obj-firefox\ > dist\include\nstarray.h:1257) 0x319b6a3 > mozilla::MediaSegmentBase<mozilla::AudioSegment,mozilla::AudioChunk>::AppendChunk (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\content\m > edia\mediasegment.h:291) 0x319c7a4 > mozilla::AudioSegment::AppendAndConsumeChunk (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\content\media\audiosegment.h:157) 0x40ca8cb > mozilla::AudioNodeStream::AdvanceOutputSegment (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\content\media\audionodestream.cpp:460) 0x40c > adb0 > mozilla::AudioNodeStream::ProduceOutput (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\content\media\audionodestream.cpp:450) 0x40cb165 > mozilla::MediaStreamGraphImpl::ProduceDataForStreamsBlockByBlock (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\content\media\mediastreamg > raph.cpp:1090) 0x40f0096 > mozilla::MediaStreamGraphImpl::RunThread (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\content\media\mediastreamgraph.cpp:1190) 0x40f4917 > mozilla::`anonymous namespace'::MediaStreamGraphInitThreadRunnable::Run (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\content\media\media > streamgraph.cpp:1344) 0x40f4d38 > nsThread::ProcessNextEvent (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\xpcom\threads\nsthread.cpp:634) 0x32bcf2d > NS_ProcessNextEvent (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\xpcom\glue\nsthreadutils.cpp:263) 0x325957d > mozilla::ipc::MessagePumpForNonMainThreads::Run (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\ipc\glue\messagepump.cpp:301) 0x34e7e91 > MessageLoop::RunInternal (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\ipc\chromium\src\base\message_loop.cc:226) 0x34aa864 > MessageLoop::RunHandler (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\ipc\chromium\src\base\message_loop.cc:220) 0x34aada1 > MessageLoop::Run (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\ipc\chromium\src\base\message_loop.cc:194) 0x34ab26c > nsThread::ThreadFunc (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\xpcom\threads\nsthread.cpp:267) 0x32bd492 > _PR_NativeRunThread (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\nsprpub\pr\src\threads\combined\pruthr.c:397) 0x57f9de4f > pr_root (c:\builds\moz2_slave\try-w32-d-00000000000000000000\build\nsprpub\pr\src\md\windows\w95thred.c:90) 0x57fa382f > beginthreadex[MSVCR100D +0x4A273] 0x5815a273 > beginthreadex[MSVCR100D +0x4A204] 0x5815a204 > BaseThreadInitThunk[kernel32 +0x1336A] 0x7590336a > RtlInitializeExceptionChain[ntdll +0x39F72] 0x77dd9f72 > RtlInitializeExceptionChain[ntdll +0x39F45] 0x77dd9f45 I think we have our smoking gun!

Codacoder

Reporter

Comment 86

•

12 years ago

Additional info pertaining to webaudio: During a normal working day and my normal dev/run/debug cycle, I'll usually not restart the browser until I see some kind of issue (issue might be anything). One issue has been, after a number of dev/run/debug cycles, FF will start complaining it can't find the audio files (NB files it has previously been playing just fine). Then I'll restart the browser. Nick: Now I understand why my app's config file seems to trigger the issue. It is responsible for triggering the loading of the sound libs. I use a 3rd party lib called howler: http://goldfirestudios.com/blog/104/howler.js-Modern-Web-Audio-Javascript-Library

Andrew McCreight [:mccr8]

Updated

•

12 years ago

Component: Untriaged → Web Audio

Product: Firefox → Core

Target Milestone: Firefox 25 → ---

Codacoder

Reporter

Comment 87

•

12 years ago

@Nick: just for confirmation, I just reran the same nightly, with my config file disabled which means the sound libs are NOT loaded and the 8K leak is gone. Please note: there is NO sound asked for or heard during the experimental conditions detailed above. It seems the mere presence of the lib(s) is enough to trigger the issue.

Nicholas Nethercote [inactive]

Comment 88

•

12 years ago

We probably won't need it, but just in case, I did an optimized, DMD-enabled build here: https://tbpl.mozilla.org/?tree=Try&rev=5bae1a082a30

Nicholas Nethercote [inactive]

Updated

•

12 years ago

Summary: Firefox crashes while left unattended, memory climbs → Firefox crashes while left unattended, memory climbs, due to webaudio leak

Nicholas Nethercote [inactive]

Comment 89

•

12 years ago

> @Nick: just for confirmation, I just reran the same nightly, with my config > file disabled which means the sound libs are NOT loaded and the 8K leak is > gone. > > Please note: there is NO sound asked for or heard during the experimental > conditions detailed above. It seems the mere presence of the lib(s) is > enough to trigger the issue. Thanks for the confirmation, that's very helpful. Can you try again with FF25 to see if webaudio is also responsible for the bad leak in that release?

Paul Adenot (:padenot)

Assignee

Comment 90

•

12 years ago

I've started to track this down. I may have STR: - Open http://goldfirestudios.com/blog/104/howler.js-Modern-Web-Audio-Javascript-Library - Click randomly on the buttons - Close the tab I instrumented a build by putting something like > static uint32_t max = 0; > if (mChunks.Length() > max) { > max = mChunks.Length(); > } > printf("%p == mChunks.Length(): %u (max: %u)\n", this, mChunks.Length(), max); and when I close the tab (or maybe on CC), the `max` and `mChunks.Length()` value seem to grow without stopping.

Assignee: nobody → paul

Flags: needinfo?(paul)

Paul Adenot (:padenot)

Assignee

Comment 91

•

12 years ago

Okay, found the issue for the memory leak: In StreamBuffer::ForgetUpTo (where we are supposed to determine how much buffer space we should remove), we see (numbers are from the gdb session I have at hand, we are here [1]): (gdb) p aTime $14 = 41062711 (gdb) p roundTo $15 = 52428 (gdb) p forget $16 = 41051124 (gdb) p mForgottenTime $17 = 41051124 so we return immediately and we never free anything ever again, because aTime does not grow, because the streams are blocked until STREAM_TIME_MAX. I checked, and the context is shutdown, like it should be, because the tab has been closed for a while. (gdb) p mStreams[0]->mEngine->mNode->mContext->mIsShutDown $77 = true I then went to about:memory, I saw heap-unclassified grow super fast (like if we were leaking real time uncompressed audio data), and I pressed "Minimize memory usage", and then, everything got destroyed properly, and head-unclassified was instantly back to some reasonable value. I think that if one is running a unit-tests suite, AudioContexts would be created and teared down all the time, and we would clearly observe what Russ reported us. Anyways, the solution here seem to adjust the logic somewhere in the MSG to avoid buffering like crazy, because at this point in the life cycle of the web page, it's useless anyways (the page has been closed, no sound is output anymore). The cycle-collection seem to be working fine, for me, but I'm no expert. Also that I could repro the problem using my STR in comment 90 every single time I tried. I'm not sure if one has to use this specific js library, though. Also, I forgot to mention, but the code I wrote to instrument the build goes in MediaSegment::AppendChunk. [1]: http://dxr.mozilla.org/mozilla-central/source/content/media/StreamBuffer.cpp#70

Benjamin Smedberg

Comment 92

•

12 years ago

What's the likely risk of a patch here? This is serious enough that if there's a low-risk patch I'd like to get it in 27 beta.

tracking-firefox27: --- → ?

tracking-firefox28: --- → ?

Paul Adenot (:padenot)

Assignee

Comment 93

•

12 years ago

Attached patch Don't overbuffer on AudioContext shutdown. r= — Details — Splinter Review

Well, this is less serious imo that what this bug would indicate. One would have to setup and shutdown a bunch of AudioContexts, and have delayed CC (like, when you run unit-tests) to have a constant high memory usage. A normal user would only see a spike on context shutdown. The problem is, when we ::Shutdown an AudioContext, we increment the blocker count of the destination node. This is somewhat okay for our purpose, because we don't output sound anymore, but causes buffering in the graph, and that causes this bug. If we replace ::Suspend by ::Mute, it's better. afaik, we do the same amount of work, but we simply discard the data instead of buffering it, which seems more correct (to me).

Attachment #8348783 - Flags: review?(roc)

Attachment #8348783 - Flags: review?(ehsan)

Codacoder

Reporter

Comment 94

•

12 years ago

@Nick > Can you try again with FF25 to see if webaudio is also responsible for the bad leak in that release? Yeah, I'm pretty sure we did that already but I did it again anyway. Confirmed. FF25 has a huge leak (aggregated rate ~1MB/second). When the config file (and hence audio/howler) is disabled, leak is NOT present.

Codacoder

Reporter

Comment 95

•

12 years ago

> Well, this is less serious imo that what this bug would indicate. One would have to setup and shutdown a bunch of AudioContexts, and have delayed CC (like, when you run unit-tests) to have a constant high memory usage. > A normal user would only see a spike on context shutdown. I apologize if I've misread or misunderstood you, but... IMO, FF25 needs an ESR. Since I posted my previous and read yours, I've watched a FF25 session grow from approx ~400MB to 1,800MB and will (by my experience) crash soon with no helpful stack info in the crash report. Whatever the cause in FF25, it was improved somewhat in FF26. Unless you can say why that change occurred, it can easily regress. That, IMO, is why we're pushing on this. That, IMO, is what makes this serious. And please explain what demographic is covered by "normal user" ?

Paul Adenot (:padenot)

Assignee

Comment 96

•

12 years ago

(In reply to Russ from comment #95) > > Well, this is less serious imo that what this bug would indicate. One would have > to setup and shutdown a bunch of AudioContexts, and have delayed CC (like, > when > you run unit-tests) to have a constant high memory usage. > > A normal user would only see a spike on context shutdown. > > I apologize if I've misread or misunderstood you, but... > > IMO, FF25 needs an ESR. Since I posted my previous and read yours, I've > watched a FF25 session grow from approx ~400MB to 1,800MB and will (by my > experience) crash soon with no helpful stack info in the crash report. Maybe you meant chemspill / dot release by saying ESR? That's the call of release drivers, but my patch is simple enough that it seems safe to uplift it on Beta. > Whatever the cause in FF25, it was improved somewhat in FF26. Unless you > can say why that change occurred, it can easily regress. That, IMO, is why > we're pushing on this. That, IMO, is what makes this serious. The change in behavior is likely caused by all the changes Karl Tomlinson landed between 25 and 26, some of which are related to the lifetime of the web audio object. This is my guess, I haven't actually looked in the detail. > And please explain what demographic is covered by "normal user" ? My expectations, confirmed by the code I could on the internet is that most applications using Web Audio use only one AudioContext, and don't trigger this bug. That said, rest assured that I'm doing the work that is needed so we are sure that : (1) This particular bug is fixed (2) We have memory probes in place to track down further bugs in the area (leaks related to Web Audio). This is tracked in bug 884368, I just bumped its priority.

Robert Kaiser

Comment 97

•

12 years ago

Also note that Firefox 25 is not supported any more as 26 has been released - and that version at least is better in terms of this leak. For 27, I hope the better fix will be in.

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

12 years ago

Attachment #8348783 - Flags: review?(roc)

Attachment #8348783 - Flags: review?(ehsan)

Attachment #8348783 - Flags: review+

(no longer active)

Comment 98

•

12 years ago

Comment on attachment 8348783 [details] [diff] [review] Don't overbuffer on AudioContext shutdown. r= Review of attachment 8348783 [details] [diff] [review]: ----------------------------------------------------------------- Makes sense.

Attachment #8348783 - Flags: review+ → review?(roc)

(no longer active)

Comment 99

•

12 years ago

(And I do think that we should uplift this patch to Aurora and Beta FWIW.)

Flags: needinfo?(ehsan)

Nicholas Nethercote [inactive]

Comment 100

•

12 years ago

> (And I do think that we should uplift this patch to Aurora and Beta FWIW.) Me too -- the patch is tiny and we're early in the Beta period. Speaking of the patch... I don't know anything about this code, but it's disconcerting that such an innocuous-looking change can have such a big effect. Is it worth having at least a comment on the Mute() call explaining why it's preferred to Suspend(), or something like that?

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

12 years ago

Attachment #8348783 - Flags: review?(roc) → review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 101

•

12 years ago

(In reply to Nicholas Nethercote [:njn] from comment #100) > Speaking of the patch... I don't know anything about this code, but it's > disconcerting that such an innocuous-looking change can have such a big > effect. Is it worth having at least a comment on the Mute() call explaining > why it's preferred to Suspend(), or something like that? Something else should be kicking in as well. Cycle collection should have freed everything and caused MSG shutdown before we consume much memory. So I think we need to investigate that and possibly do a followup fix for that too.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 102

•

12 years ago

(In reply to Paul Adenot (:padenot) from comment #91) > In StreamBuffer::ForgetUpTo (where we are supposed to determine how much > buffer space we should remove), we see (numbers are from the gdb session I > have at hand, we are here [1]): > > (gdb) p aTime > $14 = 41062711 > (gdb) p roundTo > $15 = 52428 > (gdb) p forget > $16 = 41051124 > (gdb) p mForgottenTime > $17 = 41051124 > > so we return immediately and we never free anything ever again, because > aTime does not grow, because the streams are blocked until STREAM_TIME_MAX. Hmm, so something is wrong here too. An AudioNode feeding into a blocked destination should still be able to forget; its MediaStream should not be blocked, so we should just be throwing data away.

(no longer active)

Comment 103

•

12 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #101) > (In reply to Nicholas Nethercote [:njn] from comment #100) > > Speaking of the patch... I don't know anything about this code, but it's > > disconcerting that such an innocuous-looking change can have such a big > > effect. Is it worth having at least a comment on the Mute() call explaining > > why it's preferred to Suspend(), or something like that? > > Something else should be kicking in as well. Cycle collection should have > freed everything and caused MSG shutdown before we consume much memory. So I > think we need to investigate that and possibly do a followup fix for that > too. That's true. Although we can't really predict when the CC kicks in so it's not entirely surprising that we had this bug given that the test case is creating contexts all the time.

u279076

Comment 104

•

12 years ago

Rob, it seems like you added qawanted to this bug a month ago. Do you recall what you needed us to do and if the request is still valid?

Flags: needinfo?(rcampbell)

Nicholas Nethercote [inactive]

Comment 105

•

12 years ago

qawanted is no longer necessary; we have a good idea of what the problem is and steps to reproduce.

Flags: needinfo?(rcampbell)

Keywords: qawanted

Paul Adenot (:padenot)

Assignee

Comment 106

•

12 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #102) > (In reply to Paul Adenot (:padenot) from comment #91) > > In StreamBuffer::ForgetUpTo (where we are supposed to determine how much > > buffer space we should remove), we see (numbers are from the gdb session I > > have at hand, we are here [1]): > > > > (gdb) p aTime > > $14 = 41062711 > > (gdb) p roundTo > > $15 = 52428 > > (gdb) p forget > > $16 = 41051124 > > (gdb) p mForgottenTime > > $17 = 41051124 > > > > so we return immediately and we never free anything ever again, because > > aTime does not grow, because the streams are blocked until STREAM_TIME_MAX. > > Hmm, so something is wrong here too. An AudioNode feeding into a blocked > destination should still be able to forget; its MediaStream should not be > blocked, so we should just be throwing data away. I should have said before that it is only the DestinationNode stream that is overbuffering, which makes sense to me.

Paul Adenot (:padenot)

Assignee

Comment 107

•

12 years ago

Pushed with a little comment. https://hg.mozilla.org/integration/mozilla-inbound/rev/fdfd0621d658

Paul Adenot (:padenot)

Assignee

Comment 108

•

12 years ago

Followup, because trying to mute an OfflineAudioContext is not going to be super useful: https://tbpl.mozilla.org/php/getParsedLog.php?id=32156755&tree=Mozilla-Inbound#error0 https://hg.mozilla.org/integration/mozilla-inbound/rev/eaa48b9c7094

:Benjamin Peterson

Updated

•

12 years ago

Status: UNCONFIRMED → NEW

Ever confirmed: true

Ryan VanderMeulen [:RyanVM]

Comment 109

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/fdfd0621d658 https://hg.mozilla.org/mozilla-central/rev/eaa48b9c7094

Status: NEW → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla29

Nicholas Nethercote [inactive]

Comment 110

•

12 years ago

Since the fix just landed on mozilla-central, it should be in the next Nightly build. Russ can you please try a Nightly build (from http://nightly.mozilla.org/) tomorrow, i.e. Thursday your time, to see if this fixes your problem? Thanks!

Flags: needinfo?(russgthomas)

Lukas Blakk [:lsblakk] use ?needinfo

Updated

•

12 years ago

status-firefox26: --- → affected

status-firefox27: --- → affected

status-firefox28: --- → affected

status-firefox29: --- → fixed

tracking-firefox27: ? → +

tracking-firefox28: ? → +

Codacoder

Reporter

Comment 111

•

12 years ago

FF29.0a1 (2013-12-19) Same STR, PASS (tada!) Please PLEASE have the release team review this for back-fitting to FF24 and (since it is such an app-killer here) FF25. Yes, Nick has explained the rules but this bug in FF25 DESTROYS valuable user work and user data (in my world that's CRITICAL). TFL Big thanks to Nick Nethercote for his diligence on this, "without whom" and all that.

Flags: needinfo?(russgthomas)

Andrew McCreight [:mccr8]

Comment 112

•

12 years ago

I'm going to upgrade this to MemShrink:P2 by fiat as it turned out to be worse than we initially thought.

Whiteboard: [MemShrink:P3] → [MemShrink:P2]

Emanuel Hoogeveen [:ehoogeveen]

Comment 113

•

12 years ago

(In reply to Russ from comment #111) > Please PLEASE have the release team review this for back-fitting to FF24 and > (since it is such an app-killer here) FF25. Yes, Nick has explained the > rules but this bug in FF25 DESTROYS valuable user work and user data (in my > world that's CRITICAL). Is Firefox 24 even affected by this bug? From what you said in comment #14 it didn't seem to be (though that was before we knew the cause).

Codacoder

Reporter

Comment 114

•

12 years ago

@Emanuel: You're right. However, since it *is* an ESR and will receive ongoing updates, there is a chance it could be affected down the line (I thought). Perhaps the chance of that happening is "vanishingly small" ? I wouldn't know...

Robert Kaiser

Comment 115

•

12 years ago

(In reply to Russ from comment #111) > Please PLEASE have the release team review this for back-fitting to FF24 and > (since it is such an app-killer here) FF25. 25 is not being maintained any more and everyone is being updated to 26. If people do not update, that's their fault and problem, not ours. And 24 ESR is only being maintained for security issues, no other changes will go there (if someone with a large deployment requests it in the enterprise list, it may be discussed but otherwise exclusively security fixes will make it into ESR, ever).

Robert Kaiser

Comment 116

•

12 years ago

That said, this specific case has been introduced with WebAudio in Firefox 25 and higher only, so 24 cannot be patched because it doesn't even have the feature that exposes the problem. In other words, ESR 24 is already fixed because it never had the feature and the bug.

(not currently active) Ted Mielczarek

Updated

•

12 years ago

status-firefox-esr24: --- → unaffected

u279076

Comment 117

•

12 years ago

Marking this verified based on comment 111.

Status: RESOLVED → VERIFIED

status-firefox29: fixed → verified

Nicholas Nethercote [inactive]

Comment 118

•

12 years ago

Thanks for verifying the fix, Russ. Just to summarize the version situation. - FF24/ESR24: unaffected. - FF25: very bad leak. But this version is no longer current and thus won't receive any fixes. - FF26: smaller leak. This is the current release, but the bug isn't bad enough for a "chemspill" release (reserved for *major* security flaws and the like), so it won't receive the fix. - FF27: smaller leak. This is the current Beta, and the fix will be backported (as per the "tracking-firefox27+ flag above). - FF28: smaller leak. This is the current Aurora, and the fix will be backported (as per the "tracking-firefox28+ flag above). - FF29: smaller leak. This is the current Nightly, and has already been fixed.

Nicholas Nethercote [inactive]

Comment 119

•

12 years ago

padenot, can you please upload a combined patch and nominate it for Aurora and Beta uplift? Thanks.

Flags: needinfo?(paul)

Karl Tomlinson (:karlt)

Updated

•

12 years ago

Depends on: 956201

Nicholas Nethercote [inactive]

Comment 121

•

12 years ago

(In reply to Nicholas Nethercote [:njn] from comment #119) > padenot, can you please upload a combined patch and nominate it for Aurora > and Beta uplift? Thanks. Ping? We really want this backported to FF27 and FF28...

Karl Tomlinson (:karlt)

Comment 122

•

12 years ago

Bug 956201 will need to be fixed first.

Karl Tomlinson (:karlt)

Updated

•

12 years ago

Blocks: 927044

Paul Adenot (:padenot)

Assignee

Comment 123

•

12 years ago

Attached patch Folded patch for uplift — Details — Splinter Review

This is the folded patch for uplift.

Nicholas Nethercote [inactive]

Comment 124

•

12 years ago

Comment on attachment 8357160 [details] [diff] [review] Folded patch for uplift [Approval Request Comment] Bug caused by (feature/regressing bug #): web audio. User impact if declined: bad memory leaks on some web pages -- memory consumption can grow without bound even with no page interaction. Testing completed (on m-c, etc.): on m-c for a couple of weeks. And the original reporter confirmed that it fixes his problem. Risk to taking this patch (and alternatives if risky): low; the patch is tiny. String or IDL/UUID changes made by this patch: none.

Attachment #8357160 - Flags: approval-mozilla-beta?

Attachment #8357160 - Flags: approval-mozilla-aurora?

Flags: needinfo?(paul)

bhavana bajaj [:bajaj]

Updated

•

12 years ago

Attachment #8357160 - Flags: approval-mozilla-beta?

Attachment #8357160 - Flags: approval-mozilla-beta+

Attachment #8357160 - Flags: approval-mozilla-aurora?

Attachment #8357160 - Flags: approval-mozilla-aurora+

Paul Adenot (:padenot)

Assignee

Comment 125

•

12 years ago

Comment on attachment 8357160 [details] [diff] [review] Folded patch for uplift NOTE: This flag is now for security issues only. Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings. [Approval Request Comment] Bug caused by (feature/regressing bug #): web audio User impact if declined: big memory leak, and then crash for oom Testing completed: m-c, m-a, original reporter said it fixed the issue Risk to taking this patch (and alternatives if risky): little risk, cause and fix are well understood String or UUID changes made by this patch: none

Attachment #8357160 - Flags: approval-mozilla-b2g26?

Andrew McCreight [:mccr8]

Comment 126

•

12 years ago

As the note says, I think you are supposed to use the blocking-b2g flag to get permission to land for non-security issues.

blocking-b2g: --- → koi?

Andrew McCreight [:mccr8]

Comment 127

•

12 years ago

koi? seems to correspond to b2g26.

Paul Adenot (:padenot)

Assignee

Updated

•

12 years ago

Attachment #8357160 - Flags: approval-mozilla-b2g26?

Ryan VanderMeulen [:RyanVM]

Comment 128

•

12 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/969a4a48c64e https://hg.mozilla.org/releases/mozilla-beta/rev/3f97da3df622

status-b2g-v1.2: --- → affected

status-b2g-v1.3: --- → fixed

status-firefox26: affected → wontfix

status-firefox27: affected → fixed

status-firefox28: affected → fixed

Jason Smith [:jsmith]

Updated

•

12 years ago

blocking-b2g: koi? → koi+

Ryan VanderMeulen [:RyanVM]

Comment 129

•

12 years ago

https://hg.mozilla.org/releases/mozilla-b2g26_v1_2/rev/5c47ebb33bdd

status-b2g-v1.2: affected → fixed

Codacoder

Reporter

Comment 130

•

12 years ago

FWIW, following my original STR... FF27 not fixed (current beta) FF28 fixed FF29 fixed

Robert Kaiser

Comment 131

•

12 years ago

(In reply to Russ from comment #130) > FF27 not fixed (current beta) Yes, this will only be fixed in the upcoming beta (internally called "27.0b5") that was built late yesterday.

Codacoder

Reporter

Comment 132

•

12 years ago

Thanks Robert. Just received the update. FWIW, following my original STR... FF27 fixed FF28 fixed FF29 fixed

Nicholas Nethercote [inactive]

Comment 133

•

12 years ago

Hooray!

Robert Kaiser

Comment 134

•

12 years ago

Marking verified based on Russ' comments - thanks for testing (and reporting in the first place)!

status-firefox27: fixed → verified

status-firefox28: fixed → verified

memory-report-1.json.gz 12 years ago Codacoder 31.82 KB, application/gzip	codacodercodacoder : feedback+	Details
memory-report-2.json.gz 12 years ago Codacoder 32.64 KB, application/gzip	codacodercodacoder : feedback+	Details
out.dmd 12 years ago Codacoder 4.64 MB, text/plain		Details
console.txt 12 years ago Codacoder 35.08 KB, text/plain		Details
out.dmd.2 12 years ago Codacoder 4.64 MB, text/plain		Details
out.dmd.3 12 years ago Codacoder 4.64 MB, text/plain		Details
out.zip 12 years ago Codacoder 905.16 KB, application/x-zip-compressed		Details
Partial DMD output that suggests audio is the issue 12 years ago Nicholas Nethercote [inactive] 17.05 KB, text/plain		Details
out2.zip 12 years ago Codacoder 902.92 KB, application/x-zip-compressed		Details
Don't overbuffer on AudioContext shutdown. r= 12 years ago Paul Adenot (:padenot) 894 bytes, patch	roc : review+	Details \| Diff \| Splinter Review
Folded patch for uplift 12 years ago Paul Adenot (:padenot) 1.19 KB, patch	bajaj : approval-mozilla-aurora+ bajaj : approval-mozilla-beta+	Details \| Diff \| Splinter Review