Closed Bug 845842 Opened 12 years ago Closed 12 years ago

Firefox Health Report is exhausting JS stack [too much recursion]

Categories

(Firefox Health Report Graveyard :: Client: Desktop, defect, P1)

Product:

Component:

Platform:

x86

macOS

Type:

defect

Priority:

P1

Severity:

major

Tracking

(firefox20 unaffected, firefox21+ fixed)

Status:

RESOLVED FIXED

Milestone:

Firefox 22

Tracking Flags:

Tracking

Status

firefox20

---

unaffected

firefox21

+

fixed

People

(Reporter: Irving, Assigned: gps)

References

Details

(Whiteboard: [fhr])

Attachments

(3 files, 1 obsolete file)

Shutdown log with gps's patch 12 years ago :Irving Reid (No longer working on Firefox) 88.75 KB, text/plain		Details
Safe mode shutdown hang log 12 years ago :Irving Reid (No longer working on Firefox) 93.80 KB, text/plain		Details
Insert next tick promises, v1 12 years ago Gregory Szorc [:gps] 5.40 KB, patch		Details \| Diff \| Splinter Review
Don't resolve promises on the current tick, v1 12 years ago Gregory Szorc [:gps] 11.72 KB, patch	rnewman : review+ bajaj : approval-mozilla-aurora+	Details \| Diff \| Splinter Review

:Irving Reid (No longer working on Firefox)

Reporter

Description

•

12 years ago

Shutting down my debug/trunk build of Firefox is hanging most, if not all, of the time for the last couple of days. I caught it in the debugger. I'll try to hold onto the debugger session in case we can extract additional useful info... Unfortunately the stack is hard to decode because of #includes in the HealthReport.jsm module; here's the raw stack: 0 waitForSyncCallback() ["resource://gre/modules/HealthReport.jsm":121] this = [object Object] 1 anonymous() ["resource://gre/modules/HealthReport.jsm":163] this = function callback(error, ret) { "use strict"; if (error) cb.throw(error); cb(ret); } 2 anonymous() ["resource://gre/modules/HealthReport.jsm":3789] this = [object Object] 3 anonymous() ["resource://gre/modules/HealthReport.jsm":3671] this = [object Object] and here's what I think it corresponds to in the source files: // Keep waiting until our callback is triggered (unless the app is quitting). while (Async.checkAppReady() && callback.state == CB_READY) { thread.processNextEvent(true); } at http://dxr.mozilla.org/mozilla-central/services/common/async.js#l98 called from http://dxr.mozilla.org/mozilla-central/services/common/async.js#l141 http://dxr.mozilla.org/mozilla-central/services/healthreport/healthreporter.jsm#l336 http://dxr.mozilla.org/mozilla-central/services/healthreport/healthreporter.jsm#l218

Richard Newman [:rnewman]

Updated

•

12 years ago

Component: General → Metrics and Firefox Health Report

Product: Firefox → Mozilla Services

Version: Trunk → unspecified

:Irving Reid (No longer working on Firefox)

Reporter

Comment 1

•

12 years ago

To clarify, it's not stuck in thread.processNextEvent() - it's receiving periodic callbacks from MozStorageConnection (and possibly others), handling them and then looping back to wait in processNextEvent() again.

Gregory Szorc [:gps]

Assignee

Comment 2

•

12 years ago

Fun times. First, some background. FHR creates a nested event loop on shutdown if and only if it has pending I/O operations that haven't completed. This *should* be very rare. It's very interesting to me that you are apparently able to reproduce this reliably. We know nested event loops are evil. However, until Gecko provides a way to delay shutdown until previously-initiated async events have completed, our hands are tied. See my comments in bug 722648, especially comment #27. Anyway... Are you seeing full "beach ball" hangs or browser-still-appears-to-be-somewhat-operational-but-never-shuts-down hangs? Please apply the patch at http://gps.pastebin.mozilla.org/2181016. Let me know if you need it refreshed against your tree. I'd love to see all of FHR's terminal output, if possible. Verbose logging from FHR should reveal the culprit here. I wonder if we regressed something in bug 843816...

Flags: needinfo?(irving)

:Irving Reid (No longer working on Firefox)

Reporter

Comment 3

•

12 years ago

This is a debug build, run from a terminal. Main window is closed, activity monitor says "not responding", but the process is periodically doing things; now and again it prints WARNING: 1 sort operation has occurred for the SQL statement '0x110058010'. See https://developer.mozilla.org/En/Storage/Warnings details.: file /Users/ireid/tbird/mozilla-central/storage/src/mozStoragePrivateHelpers.cpp, line 110 to stdout/stderr. I'll try running with gps's patch.

Flags: needinfo?(irving)

:Irving Reid (No longer working on Firefox)

Reporter

Comment 4

•

12 years ago

Attached file Shutdown log with gps's patch — Details

This has a bunch of additional noise from add-ons and also caught the add-on manager trying to do an up-to-date check after FF started shutting down.

:Irving Reid (No longer working on Firefox)

Reporter

Comment 5

•

12 years ago

Attached file Safe mode shutdown hang log — Details

Reproduce the hang in safe mode.

Gregory Szorc [:gps]

Assignee

Comment 6

•

12 years ago

Comment on attachment 719089 [details] Safe mode shutdown hang log \o/ \o/ \o/ I think we found bug 842360! 1361986876753 Services.Metrics.Collector WARN Provider failed to initialize: org.mozilla.appInfo: too much recursion JS Stack trace: Logger.prototype.level@log4moz.js:155 < Logger.prototype.level@log4moz.js:156 < Logger_log@log4moz.js:216 < Logger_debug@log4moz.js:251 < configureFields@HealthReport.jsm:895 < TaskImpl_run@Task.jsm:192 < effort@promise.js:55 < resolved@promise.js:117 < then@promise.js:37 < then@promise.js:123 < TaskImpl_run@Task.jsm:207 < effort@promise.js:55 < resolved@promise.js:117 < then@promise.js:37 < then@promise.js:123 < TaskImpl_run@Task.jsm:207 < effort@promise.js:55 < resolved@promise.js:117 < then@promise.js:37 < then@promise.js:123 < TaskImpl_run@Task.jsm:207 < effort@promise.js:55 < resolved@promise.js:117 < then@promise.js:37 < then@promise.js:123 < TaskImpl_run@Task.jsm:207 < effort@promise.js:55 < resolved@promise.js:117 < then@promise.js:37 < then@promise.js:123 < TaskImpl_run@Task.jsm:207 < effort@promise.js:55 < resolved@promise.js:117 < then@promise.js:37 < then@promise.js:123 < TaskImpl_run@Task.jsm:207 < effort@promise.js:55 < resolved@promise.js:117 < then@promise.js:37 < then@promise.js:123 < TaskImpl_run@Task.jsm:207 < effort@promise.js:55 < resolved@promise.js:117 < then@promise.js:37 < then@promise.js:123 < TaskImpl_run@Task.jsm:207 < effort@promise.js:55 < resolved@promise.js:117 < then@promise.js:37 < then@promise.js:123 < TaskImpl_run@Task.jsm:207 < effort@promise.js:55 < resolved@promise.js:117 < then@promise.js:37 < then@promise.js:123 < TaskImpl_run@Task.jsm:207 < effort@promise.js:55 < resolved@promise.js:117 < then@promise.js:37 < then@promise.js:123 < TaskImpl_run@Task.jsm:207 < effort@promise.js:55 < resolved@promise.js:117 < then@promise.js:37 < then@promise.js:123 < TaskImpl_run@Task.jsm:207 < effort@promise.js:55 < resolved@promise.js:117 < then@promise.js:37 < then@promise.js:123 < TaskImpl_run@Task.jsm:207 < TaskImpl@Task.jsm:163 < Task_spawn@Task.jsm:135 < Measurement.prototype<._configureStorage@HealthReport.jsm:900 < init@HealthReport.jsm:1261 < TaskImpl_run@Task.jsm:192 < effort@promise.js:55 < resolved@promise.js:117 < then@promise.js:37 < then@promise.js:123 < TaskImpl_run@Task.jsm:207 < TaskImpl@Task.jsm:163 < Task_spawn@Task.jsm:135 < Provider.prototype<.init@HealthReport.jsm:1249 < initProvider@HealthReport.jsm:587 < TaskImpl_run@Task.jsm:192 < TaskImpl@Task.jsm:163 < Task_spawn@Task.jsm:135 < Collector.prototype<._popAndInitProvider@HealthReport.jsm:606 < Collector.prototype<.registerProvider@HealthReport.jsm:559 < AbstractHealthReporter.prototype<.registerProvider@HealthReport.jsm:3860 < registerPullProviders@HealthReport.jsm:3979 < TaskImpl_run@Task.jsm:192 < TaskImpl@Task.jsm:163 < Task_spawn@Task.jsm:135 < AbstractHealthReporter.prototype<.ensurePullOnlyProvidersRegistered@HealthReport.jsm:3985 < doUpload@HealthReport.jsm:4500 < TaskImpl_run@Task.jsm:192 < TaskImpl@Task.jsm:163 < Task_spawn@Task.jsm:135 < HealthReporter.prototype<.requestDataUpload@HealthReport.jsm:4511 < DataReportingService.prototype<.onRequestDataUpload@DataReportingService.js:84 < _dispatchSubmissionRequest@DataReportingService.js:1392 < checkStateAndTrigger@DataReportingService.js:1266 < notify@DataReportingService.js:1160 And it appears more of the same causes the shutdown hang: 1361988168762 Services.HealthReport.HealthReporter WARN Collector is in progress of initializing. Waiting to finish. This log will be responsible for fixing the major FHR issue plaguing us. I owe you a drink or something. mconnor can see that happen :)

Gregory Szorc [:gps]

Assignee

Comment 7

•

12 years ago

Oh, this log is a trove of bugs: 1361986826492 Services.HealthReport.HealthReporter INFO Request to shut down. 1361986826492 Services.HealthReport.HealthReporter WARN Collector is in progress of initializing. Waiting to finish. 1361986876724 Services.DataReporting.Policy INFO Requesting data submission. Will expire at Wed Feb 27 2013 12:51:16 GMT-0500 (EST) 1361986876728 Services.Metrics.Collector INFO Initializing provider with storage: org.mozilla.appInfo

(dormant account)

Comment 8

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #2) > Fun times. > > First, some background. FHR creates a nested event loop on shutdown if and > only if it has pending I/O operations that haven't completed. This *should* > be very rare. It's very interesting to me that you are apparently able to > reproduce this reliably. > > We know nested event loops are evil. However, until Gecko provides a way to > delay shutdown until previously-initiated async events have completed, our > hands are tied. See my comments in bug 722648, especially comment #27. No. 'Your hands are tied' by insisting on doing things in JS. Any shutdown system that runs code that freezes is going to result in this kind of behavior. How many bad corner cases does it take to stop collecting data in risky fashion?

Gregory Szorc [:gps]

Assignee

Comment 9

•

12 years ago

(In reply to Taras Glek (:taras) from comment #8) > (In reply to Gregory Szorc [:gps] from comment #2) > > Fun times. > > > > First, some background. FHR creates a nested event loop on shutdown if and > > only if it has pending I/O operations that haven't completed. This *should* > > be very rare. It's very interesting to me that you are apparently able to > > reproduce this reliably. > > > > We know nested event loops are evil. However, until Gecko provides a way to > > delay shutdown until previously-initiated async events have completed, our > > hands are tied. See my comments in bug 722648, especially comment #27. > > No. 'Your hands are tied' by insisting on doing things in JS. Any shutdown > system that runs code that freezes is going to result in this kind of > behavior. It sounds like you are saying "don't write browser/Gecko features in JS." Quite frankly, I find that a ludicrous opinion and one that doesn't match the reality we live in nor the inevitability of the future.

Gregory Szorc [:gps]

Assignee

Comment 10

•

12 years ago

Oh, this log is a trove of bugs: 1361986826492 Services.HealthReport.HealthReporter INFO Request to shut down. 1361986826492 Services.HealthReport.HealthReporter WARN Collector is in progress of initializing. Waiting to finish. 1361986876724 Services.DataReporting.Policy INFO Requesting data submission. Will expire at Wed Feb 27 2013 12:51:16 GMT-0500 (EST) 1361986876728 Services.Metrics.Collector INFO Initializing provider with storage: org.mozilla.appInfo Filed bug 845935.

Gregory Szorc [:gps]

Assignee

Comment 11

•

12 years ago

OK, so the logging looks like: 1361986826461 Services.Metrics.Collector INFO Initializing provider with storage: org.mozilla.searches 1361986826464 Services.Metrics.Measurement.counts DEBUG Registering field: amazon.com.abouthome daily-counter 1361986826466 Services.Metrics.Measurement.counts DEBUG Registering field: amazon.com.contextmenu daily-counter 1361986826467 Services.Metrics.Measurement.counts DEBUG Registering field: amazon.com.searchbar daily-counter 1361986826468 Services.Metrics.Measurement.counts DEBUG Registering field: amazon.com.urlbar daily-counter 1361986826470 Services.Metrics.Measurement.counts DEBUG Registering field: bing.abouthome daily-counter 1361986826471 Services.Metrics.Measurement.counts DEBUG Registering field: bing.contextmenu daily-counter 1361986826472 Services.Metrics.Measurement.counts DEBUG Registering field: bing.searchbar daily-counter 1361986826474 Services.Metrics.Measurement.counts DEBUG Registering field: bing.urlbar daily-counter 1361986826475 Services.Metrics.Measurement.counts DEBUG Registering field: google.abouthome daily-counter 1361986826482 Services.Metrics.Collector WARN Provider failed to initialize: org.mozilla.searches: too much recursion JS Stack trace... 1361986826484 Services.HealthReport.HealthReporter INFO Attempting to load provider from category manager: SessionsProvider from resource://gre/modules/HealthReport.jsm 1361986826485 Services.HealthReport.HealthReporter INFO Provider is pull-only. Deferring initialization: org.mozilla.appSessions 13 The FHR code that calls into log4moz is HealthReport.jsm:895: _configureStorage: function () { return Task.spawn(function configureFields() { for (let [name, info] in Iterator(this.fields)) { 895 this._log.debug("Registering field: " + name + " " + info.type); let id = yield this.storage.registerField(this.id, name, info.type); this._fields[name] = [id, info.type]; } }.bind(this)); }, The top frame on the stack is log4moz.js:155: get level() { if (this._level != null) return this._level; 155 if (this.parent) return this.parent.level; dump("log4moz warning: root logger configuration error: no level defined\n"); return Log4Moz.Level.All; }, this.parent is a simple getter: get parent() this._parent, The previous calls to that line are working. And, I don't see any obvious recursion in the stack trace. And the top frame is pretty simple. So, it certainly looks like we are hitting the JS recursion limit! Huh? I was told the stack limit for chrome JS is like 20k or something ridiculous. I assumed this meant frames. Maybe "recursion limit" is different from "stack depth?" CC'ing JS engine brain trust.

Summary: Shutdown hang in FHR → Shutdown hang in FHR [too much recursion JS Stack]

Gregory Szorc [:gps]

Assignee

Comment 12

•

12 years ago

I fear this bug is related to Task.jsm usage.

Gregory Szorc [:gps]

Assignee

Comment 13

•

12 years ago

Why don't I see this in my personal builds? Why isn't the world blowing up in test infra? Irving: Do you have anything special about your build? What platform and arch are you on?

Gregory Szorc [:gps]

Assignee

Comment 14

•

12 years ago

1361986876753 Services.Metrics.Collector WARN Provider failed to initialize: org.mozilla.appInfo: too much recursion And that explains bug 842360! I never would have been able to find this without the log. Where do we stand? I need input from SpiderMonkey/Platform people on what can be done to avoid the recursion limit. I can move forward on bug 845431 to record provider init errors in the payload. This should give us an idea of how widespread this is.

David Teller [:Yoric] - still alive but not very active

Comment 15

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #12) > I fear this bug is related to Task.jsm usage. Both promise/core.js and Task.jsm consume lots of stack. I have experimented with an extension of Task.jsm that calls nsITimer every so often to yield time back to the VM and to empty the stack. I probably still have it somewhere, if you are interested. Not tested, though.

Gregory Szorc [:gps]

Assignee

Comment 16

•

12 years ago

Do we have Telemetry recording how often we see stack exception events? That would be really helpful to diagnose if landing FHR caused an uptick...

:Irving Reid (No longer working on Firefox)

Reporter

Comment 17

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #13) > Why don't I see this in my personal builds? Why isn't the world blowing up > in test infra? > > Irving: Do you have anything special about your build? What platform and > arch are you on? Mac OS X 10.7, Apple's stock clang, DEBUG, profiling: ac_add_options --enable-application=browser ac_add_options --enable-debug ac_add_options --disable-optimize ac_add_options --enable-chrome-format=symlink ac_add_options --enable-tests ac_add_options --enable-profiling export MOZ_DEBUG_SYMBOLS=1 CC=clang CXX=clang++ Irvings-MacBook-Pro% clang --version Apple LLVM version 4.2 (clang-425.0.24) (based on LLVM 3.2svn) Target: x86_64-apple-darwin11.4.2 Thread model: posix Not sure if it makes a difference, but the page open on launch is about:telemetry (actually, it might - that could force things to initialize in a different order)

Gregory Szorc [:gps]

Assignee

Comment 18

•

12 years ago

jimb has been helping in #jsapi. Thanks, Jim! Anyway, here is a pretty-printed stack: 1361986826482Services.Metrics.CollectorWARNProvider failed to initialize: org.mozilla.searches: too much recursion JS Stack trace: Logger.prototype.level@log4moz.js:155 Logger_log@log4moz.js:216 Logger_debug@log4moz.js:251 configureFields@HealthReport.jsm:895 TaskImpl_run@Task.jsm:192 effort@promise.js:55 resolved@promise.js:117 then@promise.js:37 then@promise.js:123 TaskImpl_run@Task.jsm:207 effort@promise.js:55 resolved@promise.js:117 then@promise.js:37 then@promise.js:123 TaskImpl_run@Task.jsm:207 effort@promise.js:55 resolved@promise.js:117 then@promise.js:37 then@promise.js:123 TaskImpl_run@Task.jsm:207 effort@promise.js:55 resolved@promise.js:117 then@promise.js:37 then@promise.js:123 TaskImpl_run@Task.jsm:207 effort@promise.js:55 resolved@promise.js:117 then@promise.js:37 then@promise.js:123 TaskImpl_run@Task.jsm:207 effort@promise.js:55 resolved@promise.js:117 then@promise.js:37 then@promise.js:123 TaskImpl_run@Task.jsm:207 effort@promise.js:55 resolved@promise.js:117 then@promise.js:37 then@promise.js:123 TaskImpl_run@Task.jsm:207 effort@promise.js:55 resolved@promise.js:117 then@promise.js:37 then@promise.js:123 TaskImpl_run@Task.jsm:207 effort@promise.js:55 resolved@promise.js:117 then@promise.js:37 then@promise.js:123 TaskImpl_run@Task.jsm:207 TaskImpl@Task.jsm:163 Task_spawn@Task.jsm:135 Measurement.prototype._configureStorage@HealthReport.jsm:900 init@HealthReport.jsm:1261 TaskImpl_run@Task.jsm:192 effort@promise.js:55 resolved@promise.js:117 then@promise.js:37 then@promise.js:123 TaskImpl_run@Task.jsm:207 TaskImpl@Task.jsm:163 Task_spawn@Task.jsm:135 Provider.prototype.init@HealthReport.jsm:1249 initProvider@HealthReport.jsm:587 TaskImpl_run@Task.jsm:192 TaskImpl@Task.jsm:163 Task_spawn@Task.jsm:135 Collector.prototype._popAndInitProvider@HealthReport.jsm:606 Collector.prototype.registerProvider@HealthReport.jsm:559 AbstractHealthReporter.prototype.registerProvider@HealthReport.jsm:3860 AbstractHealthReporter.prototype.registerProviderFromType@HealthReport.jsm:3884 AbstractHealthReporter.prototype.registerProvidersFromCategoryManager@HealthReport.jsm:3935 AbstractHealthReporter.prototype._initializeCollector@HealthReport.jsm:3636 TaskImpl_run@Task.jsm:192 TaskImpl@Task.jsm:163 Task_spawn@Task.jsm:135 AbstractHealthReporter.prototype._onStorageCreated@HealthReport.jsm:3619 effort@promise.js:55 resolved@promise.js:117 then@promise.js:37 resolve@promise.js:143 TaskImpl_run@Task.jsm:217 effort@promise.js:55 resolved@promise.js:117 then@promise.js:37 resolve@promise.js:143 TaskImpl_run@Task.jsm:220 effort@promise.js:55 resolved@promise.js:117 then@promise.js:37 resolve@promise.js:143 onResult@Sqlite.jsm:494 effort@promise.js:55 resolved@promise.js:117 then@promise.js:37 resolve@promise.js:143 OpenedConnection.prototype._executeStatement/pending.handleCompletion@Sqlite.jsm:772 handleCompletion at the bottom of the stack seems legit. This is what gets called when Storage has finished a SQL statement. If it weren't the first frame, I think that would mean we weren't using async SQLite! Here are the non-promise/Task frames: Logger.prototype.level@log4moz.js:155 Logger_log@log4moz.js:216 Logger_debug@log4moz.js:251 configureFields@HealthReport.jsm:895 Measurement.prototype._configureStorage@HealthReport.jsm:900 Provider.prototype.init@HealthReport.jsm:1249 initProvider@HealthReport.jsm:587 Collector.prototype._popAndInitProvider@HealthReport.jsm:606 Collector.prototype.registerProvider@HealthReport.jsm:559 AbstractHealthReporter.prototype.registerProvider@HealthReport.jsm:3860 AbstractHealthReporter.prototype.registerProviderFromType@HealthReport.jsm:3884 AbstractHealthReporter.prototype.registerProvidersFromCategoryManager@HealthReport.jsm:3935 AbstractHealthReporter.prototype._initializeCollector@HealthReport.jsm:3636 AbstractHealthReporter.prototype._onStorageCreated@HealthReport.jsm:3619 onResult@Sqlite.jsm:494 OpenedConnection.prototype._executeStatement/pending.handleCompletion@Sqlite.jsm:772 This seems legit from FHR's perspective. Storage finishes initializing then we move on to instantiate the collector (a manager of providers) and then actually instantiating a provider instance. What does seem weird is the massive number of promise and task.jsm frames between configureFields@HealthReport.jsm:895 and Measurement.prototype._configureStorage@HealthReport.jsm:900. I hypothesize these are left over from the generator function registering individual fields? Does this mean that every time we yield a promise from a Task.jsm that the stack will grow?

Gregory Szorc [:gps]

Assignee

Comment 19

•

12 years ago

Morphing this to be about the stack size issue. I wouldn't be surprised if other parts of FHR were susceptible to this. Filed bug 845966 to track the issue of shutdown hang due to infinite timeout.

Summary: Shutdown hang in FHR [too much recursion JS Stack] → Firefox Health Report is exhausting JS stack [too much recursion]

Luke Wagner [:luke]

Comment 20

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #11) > I was told the stack limit for chrome JS is like 20k or something > ridiculous. I assumed this meant frames. Maybe "recursion limit" is > different from "stack depth?" We can recurse pretty deep on the JS stack (20K sounds about right), but not if the recursion involves reentering the engine in C++ (which quickly eats up system stack space). I'm guessing that Task.jsm uses generators? If so that would explain it: generator calls recursively call the C++ interpreter. At some point, especially if we ever want to make generators fast at all, we'll want to avoid reentering C++ and make generator calls more like normal calls. That is a good amount of work, though, and not a priority atm so I wouldn't block this bug on it. To fix the problem, try to limit the maximum generator recursion depth.

Gregory Szorc [:gps]

Assignee

Comment 21

•

12 years ago

(In reply to Luke Wagner [:luke] from comment #20) > (In reply to Gregory Szorc [:gps] from comment #11) > > I was told the stack limit for chrome JS is like 20k or something > > ridiculous. I assumed this meant frames. Maybe "recursion limit" is > > different from "stack depth?" > > We can recurse pretty deep on the JS stack (20K sounds about right), but not > if the recursion involves reentering the engine in C++ (which quickly eats > up system stack space). I'm guessing that Task.jsm uses generators? If so > that would explain it: generator calls recursively call the C++ interpreter. Yes, Task.jsm uses generators. So, well, crap. So this sounds like "use of generators calling into C++ considered harmful [until the JS engine can be improved]" to me. This translates to "use Task.jsm with extreme care." During the course of writing the code for FHR, we did discuss the issue of generators and stacks and the worry it would explode the stack (I can't find the exact comments right now but I'm sure rnewman or somebody will prove they exist by posting links). I talked with a few people in #jsapi about the general issue of generators in the current engine. I was told that for performance reasons we should avoid generators in high volume or performance sensitive loops. We did. I was told the stack limit was really high. However, the caveat that calling into C++ would explode stack use was not communicated AFAIK. Theorizing that we would never reach more than a few hundred frames - nowhere close to the stack limit - we went ahead and liberally employed Task.jsm because in our opinion the resulting procedural looking code and corresponding better code readability (due to no callback spaghetti) and lower long-term maintenance costs outweighed the only known drawback - slightly worse function call performance (not relevant for a background feature). Look at https://hg.mozilla.org/mozilla-central/rev/5060a8ce0be4 for an example of before and after and tell me what you would rather maintain! Anyway, that's how we got here. > At some point, especially if we ever want to make generators fast at all, > we'll want to avoid reentering C++ and make generator calls more like normal > calls. That is a good amount of work, though, and not a priority atm so I > wouldn't block this bug on it. To fix the problem, try to limit the maximum > generator recursion depth. Whatever you say! I also heard mumblings that baseline might be more intelligent about things as well. I will gladly lend my voice behind any support to make this better: I think the code maintainability improvements from using Task.jsm are worth investing some time in improving the engine. But that's just my mostly ignorance perspective :) Now, next steps for this bug. I see the following options: 1) Change FHR code to use less Task.jsm. 2) Change Gecko to not be so wasteful with generators. 3) Increase the stack limit. #2 is out of the question for the short term, I think. What's the feasibility of #3? This is for chrome-only JS. Could the change be backported to Aurora/21? Who would need to make this call?

Luke Wagner [:luke]

Comment 22

•

12 years ago

comment 0 mentions a debug build; is this problem *only* observed on debug builds? Last time I looked, js::Interpret's stack frame was much much larger on debug builds.

Gregory Szorc [:gps]

Assignee

Comment 23

•

12 years ago

(In reply to Luke Wagner [:luke] from comment #22) > comment 0 mentions a debug build; is this problem *only* observed on debug > builds? Last time I looked, js::Interpret's stack frame was much much > larger on debug builds. We also have bug 842360 on file. I've poured over the code and the only time appInfo should be missing is if it fails to initialize or collect. The log in this bug is the only lead I have as to what's causing the issue. Bug 845431 could confirm. Perhaps I should prioritize landing that...

Gregory Szorc [:gps]

Assignee

Comment 24

•

12 years ago

Check this out: _configureStorage: function () { return Task.spawn(function configureFields() { let error = new Error(); let stack = error.stack; for (let [name, info] in Iterator(this.fields)) { this._log.debug("Registering field: " + name + " " + info.type); let id = yield this.storage.registerField(this.id, name, info.type); this._fields[name] = [id, info.type]; let error = new Error(); let stackDiff = error.stack.substr(0, error.stack.length - stack.length); stack = error.stack; this._log.debug("Stack diff: " + stackDiff); } }.bind(this)); }, 1362008148420 Services.Metrics.Measurement.active DEBUG Registering field: addons last-text 1362008148420 Services.Metrics.Measurement.active DEBUG Stack diff: configureFields@resource://gre/modules/HealthReport.jsm:911 TaskImpl_run@resource://gre/modules/Task.jsm:192 effort@resource://gre/modules/commonjs/sdk/core/promise.js:55 resolved@resource://gre/modules/commonjs/sdk/core/promise.js:117 then@resource://gre/modules/commonjs/sdk/core/promise.js:37 1362008148420 Services.Metrics.Measurement.counts DEBUG Registering field: theme daily-last-numeric 1362008148420 Services.Metrics.Measurement.counts DEBUG Stack diff: configureFields@resource://gre/modules/HealthReport.jsm:911 TaskImpl_run@resource://gre/modules/Task.jsm:192 effort@resource://gre/modules/commonjs/sdk/core/promise.js:55 resolved@resource://gre/modules/commonjs/sdk/core/promise.js:117 then@resource://gre/modules/commonjs/sdk/core/promise.js:37 If you look at the string lengths, it comes out to about 300 bytes per iteration. If you look at registerField (https://hg.mozilla.org/mozilla-central/file/0a91da5f5eab/services/metrics/storage.jsm#l1023) and know that I was running this on an existing profile, you figure the "field already exists" path was taken and only 1 promise was involved. I'm not sure if that is the promise showing up in the stack or what. The important takeaway is the stack grows with each iteration of the generator loop. Is this expected? Or is my code or Task.jsm implemented in such a way that it is "leaking" a frame?

Gregory Szorc [:gps]

Assignee

Comment 25

•

12 years ago

inbound now contains patches to submit more errors in payloads and to be more robust around detecting initialization errors. The shutdown hang should be gone. Once people start picking up tomorrow's Nightly, we should be able to gauge how many people are seeing stack exhaustion as part of submission. This may answer the mystery behind bug 842360! We still need to figure out the underlying problem of stack exhaustion.

Gregory Szorc [:gps]

Assignee

Comment 26

•

12 years ago

Looking at the stack frames from comment #24, the appending chain of promises on the stack do make some sense. As I mentioned in that comment, since my profile already had these fields registered from a previous run, no DB operations needed to occur and thus the returned promise was resolved immediately, in the same tick. This allowed the generator function to continue in the same tick. No C++ was involved. Irving's log exhibit the same behavior. No SQL operations were performed. Thus, no C++ was involved. This troubles me. From what the JS people have been saying, it's the C++ that chews up the stack. But here, we are accumulating frames purely within JS. Yet we still trigger stack exhaustion. Huh? Surely this rabbit hole goes deeper. I'd really like to instrument the code to record the current stack size. What's the easiest way to do that? While I'm here, I have an interesting idea: change the implementation of Promise.resolve() and Promise.reject() to schedule the promise fulfillment on the next tick. I /think/ this will help keep stack size small since it will effectively chop off frames by moving on to a new tick. Or, do generators negate that optimization? Also, I'm not sure if excessive "reticking" is frowned upon. Another benefit of the change would be that it would break consumers using promises incorrectly - forcing them to be async instead of possibly relying on synchronous fulfillment. I don't think we should change for that reason alone, but I won't complain if we get better conformance testing for free.

David Teller [:Yoric] - still alive but not very active

Comment 27

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #26) > While I'm here, I have an interesting idea: change the implementation of > Promise.resolve() and Promise.reject() to schedule the promise fulfillment > on the next tick. I /think/ this will help keep stack size small since it > will effectively chop off frames by moving on to a new tick. Or, do > generators negate that optimization? Also, I'm not sure if excessive > "reticking" is frowned upon. Another benefit of the change would be that it > would break consumers using promises incorrectly - forcing them to be async > instead of possibly relying on synchronous fulfillment. I don't think we > should change for that reason alone, but I won't complain if we get better > conformance testing for free. Initially, I wanted this very much (as well as error-reporting for promises). Unfortunately, this was rejected by the original author of promise/core.js, according to some Jetpack-specific criteria that I don't remember right now. It might still be time to change this policy.

Jim Blandy :jimb

Comment 29

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #24) > Check this out: > > _configureStorage: function () { > return Task.spawn(function configureFields() { > let error = new Error(); > let stack = error.stack; > > for (let [name, info] in Iterator(this.fields)) { > this._log.debug("Registering field: " + name + " " + info.type); > > let id = yield this.storage.registerField(this.id, name, info.type); > this._fields[name] = [id, info.type]; > > let error = new Error(); > let stackDiff = error.stack.substr(0, error.stack.length - > stack.length); > stack = error.stack; > this._log.debug("Stack diff: " + stackDiff); > } > }.bind(this)); > }, This code, itself, will all run in a single JS stack frame. It's the caller of this generator --- the thing receiving the yielded values, and passing the id to the generator's 'next' method --- that's accumulating stack frames.

Gregory Szorc [:gps]

Assignee

Comment 30

•

12 years ago

(In reply to Jim Blandy :jimb from comment #29) > This code, itself, will all run in a single JS stack frame. It's the caller > of this generator --- the thing receiving the yielded values, and passing > the id to the generator's 'next' method --- that's accumulating stack frames. You said the caller of this generator will accumulate stack frames. OK. Can you clarify: * If the caller is accumulating stack frames and the stack is of finite size, doesn't that mean that generators must be finite? That seems wrong. * Is this dependent on using .send() to send values back to the generator? * Is this caused by any way our code is implemented or is this just how things work? I'm just trying to figure out what exactly is accumulating stack frames. If there is a bug or suboptimal behavior in Task.jsm or in promises, we should fix that. See also bug 842347 to track changing promise.js to make it less abusive of the stack. I still don't have an answer on the best way for me to probe my code to record/log stack size. Essentially, I want to know what parts of my code are growing the stack so I can triage those first. Can you also comment on scheduling on a later tick as a strategy for combating stack growth.

Flags: needinfo?(jimb)

Jim Blandy :jimb

Comment 31

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #30) > You said the caller of this generator will accumulate stack frames. ... Well, what I meant was, it does appear that stack frames are accumulating - but this generator won't. So it must be its caller. It's not some characteristic of _configureFields that's causing the stack growth, so I'm suspicious of the other stuff. > Can you clarify: > > * If the caller is accumulating stack frames and the stack is of finite > size, doesn't that mean that generators must be finite? That seems wrong. I can't parse this question. "Generators must be finite"??? (I'm sorry I missed you on IRC just now.) > * Is this dependent on using .send() to send values back to the generator? Is what dependent on using .send()? _configureFields is counting on someone to use .send() to send it a value for 'id'. But such usage doesn't necessarily cause unbounded stack growth. > * Is this caused by any way our code is implemented or is this just how > things work? I strongly suspect that the JS code that interacts with the iterator built by configureFields is written such that each resumption of the generator happens on a deeper stack than the prior resumption. So it's just that the JS is written badly; the engine is using no more stack space than necessary. If it's Task.jsm that's driving that iterator, then I would be suspicious of Task.jsm. > I still don't have an answer on the best way for me to probe my code to > record/log stack size. Essentially, I want to know what parts of my code are > growing the stack so I can triage those first. I'm not experienced with event-driven code, but... isn't that simply the functions that appear most often on the very-deep stacks? That is, Task.jsm and promise.js? > Can you also comment on scheduling on a later tick as a strategy for > combating stack growth. That could work. It could also work to rewrite the recursion in Task.jsm using iterative JS structures. It might be the case that TaskImpl_run is trying to permit laziness at Task.jsm:207, but then promise.js is actually doing the evaluation immediately, before returning, and thereby causing the next iteration to start, five frames deeper.

Flags: needinfo?(jimb)

Gregory Szorc [:gps]

Assignee

Comment 32

•

12 years ago

I had a very extensive and informative conversation with jimb earlier tonight in #jsapi. Some takeaways: 1) The main "run some JS" function in the JS engine is js::Interpret(). Its frames take up a significant amount of C++ stack space - 37k or so in jimb's build! For most JS function calls in JS the C++ performs an inline function call so we don't keep pushing js::Interpret frames on the stack and thus the C++ stack size remains in control. i.e. there is typically a 1 to many between js::Interpret frames and JS function frames. However, some JS operations do cause a new js::Interpret to be pushed (growing the C++ stack)! 2) When you finish a JS tick, all the js::Interpret frames on the stack are popped. The next tick makes a new js::Interpret call and you go from there. 3) JS generators result in a new js::Interpret being pushed for at least each new generator instance. 4) FHR makes use of a number of JS generators, some nested. These all involve promises and Task.jsm. 5) Promise.resolve(), Promise.reject() (the obtain-a-promise-that-is-immediately-finished functions) and parts of the Task.jsm implementation perform function chaining on the same tick, quickly growing the JS tack. 6) The patch in bug 841074 caused us to emit a lot more immediately-resolved promises than before. 7) We speculate that something in Promise.resolve() or Task.jsm is causing js::Interpret frame accumulation. We will need to run under a debugger to see exactly what is going on. This, combined with use of nested generators, is causing stack blow out. So, that's the summary. Now, where do we go from here? Some options: a) FHR should wean off promises and/or Task.jsm. b) We should modify promises and/or Task.jsm to work better with SpiderMonkey or hack around it in FHR. c) Increase stack limit. d) Change the JS engine so it doesn't churn through stack so quickly. Short term, I think we should go with a combination of a and b. Long term, perhaps d can be improved on. The good news is we have a relatively easy workaround for promises and Tasks: rescheduling operations for a future tick. The problem with the stack trace Irving posted is that we have a number of immediately-resolved promises all being chained on the same tick. Each iteration of a loop in a generator or growing the JS tack and presumably also the C++ stack. "All we need to do" is periodically insert "pauses" into the Tasks by way of scheduling resumption on a future tick. For example, instead of having Promise.resolve() create a deferred, resolve it, and return the already resolved promise, we have it create the deferred, return the promise, and schedule the resolving on a future tick. When the current tick is finished, it takes the js::Interpret frames and stack size with it. On the next tick, we start with a fresh stack. Even if we have lots of promise chaining, the stack size remains in check. We can do something similar for promises. Instead of sending a result into the generator right away, we can schedule the sending for a future tick. Theoretically there is some performance penalty associated with this. However, I was getting the vibe we wouldn't notice unless this were in a tight loop. There are no such usages in FHR (max iteration count is maybe a few dozen). IMO the bigger risk is impact on other code. If you start changing the behavior of Task.jsm or promises in their implementations, other consumers (including tests) may break. I'm not very comfortable with that change, especially if we need to uplift to Aurora. I'd feel much better about creating a new promise and/or Task implementation (even if it's a fork) and switching FHR to use that. We can also add workarounds in FHR itself. In its generators we could yield a future-tick-resolving promise between each real promise. I like this as a quick and uplift-friendly solution. We can also look into removing some generators from FHR. As for how this impacts FHR, I'd like to see some more data first. FHR is enabled in Aurora starting with the 2013-03-02 build. I'd like to see what the submission count numbers look like and whether there are any oddities in the payload (like missing app info sections - which are presumed to be caused by this issue). In parallel, I'd like to work on bug 846083 so FHR manages to submit a payload even if the storage and/or collector systems fail to initialize. The report would hopefully contain enough details so we can ascertain whether this issue is impacting people in the wild. I would like to land that in Aurora, if possible. Finally, there are some other bugs on file related to the general issue of promises, tasks, and the stack: * Bug 810490 tracks a second promise implementation in the tree free from "defects" of the current. * Bug 842347 tracks reducing stack depth caused by the current promise implementation. * Bug 846979 tracks adding telemetry so we know how often the JS engine blows out of stack in the wild (we currently don't know). * Bug 846985 tracks adding a feature to the JS engine and/or Gecko to allow us to see what the stack is doing [so developers can find and fix stack explosion]. * Bug 846987 tracks changing the promise and/or task implementation to use more ticks.

Joe Walker [:jwalker]

Comment 33

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #32) > 3) JS generators result in a new js::Interpret being pushed for at least > each new generator instance. Not saying that I disagree in any way with your analysis, but the frequent stack blow-ups that we see don't involve generators, to my knowledge. They do seem to happen consistently at 204 stack frames.

Jim Blandy :jimb

Comment 34

•

12 years ago

To echo what Joe said: I don't think there are any nested generator resumptions going on here. (To be clear: generator resumption does create a new js::Interpret frame in the C++ - but since we're not nesting them, that's probably not the problem.) I would say that the same-tick resolving of promises whose value is immediately available is what's causing the stack to grow here. The first solution I'd want try is ensuring that the promise provides its value on a later tick, not immediately. Joe, as the author of the Promises code, what do you think of that?

Jim Blandy :jimb

Comment 35

•

12 years ago

(I think it would be really sad if we stopped using generators. They're perfect for event-driven programming.)

Joe Walker [:jwalker]

Comment 36

•

12 years ago

(In reply to Jim Blandy :jimb from comment #34) > I would say that the same-tick resolving of promises whose value is > immediately available is what's causing the stack to grow here. The first > solution I'd want try is ensuring that the promise provides its value on a > later tick, not immediately. > > Joe, as the author of the Promises code, what do you think of that? /me gratuitously CCs the real author of the promises code Irakli and I have debated this a number of times. I think there are a number of reasons for including a setTimeout or similar in resolve/reject before the 'then' actions are called (which I think is what you said) and a number against: - You could be surprised by something that is normally synchronous becoming asynchronous at a later date, but if it's always async then you can't be surprised. - The Promises/A+ spec says you should do that, and that's what most of the other Promise implementations do, so: - That's what users will expect, and may be surprised - If (as seems likely from the wiki) a future version of Ecmascript uses async promises then we'll have a hard time converting But on the other hand: - You can manually convert a synchronous promise to an asynchronous promise, but you can't do the reverse - Some events (like clipboard related keyboard events) have magic powers specific to that dispatch from the event loop, preventing the use of asynchronous promises in that context - Async code is viral, in contexts with some degree of polymorphism it can be a handy shortcut to reason. "I know this promise can only happen synchronously, so I can save the complexity and use the value without waiting" - i.e. preventing the unnecessary spread of asynchronisity Personally, I'd like our promise implementation to be default asynchronous, with an optional synchronous setting, however Irakli rejects this suggestion.

Joe Walker [:jwalker]

Comment 37

•

12 years ago

(In reply to Jim Blandy :jimb from comment #34) > I would say that the same-tick resolving of promises whose value is > immediately available is what's causing the stack to grow here. The first > solution I'd want try is ensuring that the promise provides its value on a > later tick, not immediately. > > Joe, as the author of the Promises code, what do you think of that? Also, I've thought of manually adding setTimeout all over our code, but I think we'd need to add it in enough places that it becomes nasty, and I think we're likely to find that we have code that breaks unpredictably when you combine it in new ways. "Ah you called Foo when your stack depth was already X, you'll need to insert a setTimeout there". I think I'd prefer to have a second separate Promise implementation somewhere than this.

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 38

•

12 years ago

Given the summary of the arguments in comment 36 and the other context in this bug, I think we should switch our promises to be always-async. But I guess we should continue that particulary discussion in bug 842347. To address this bug in the near term, it seems like we should just work around the Promise limitations in FHR code to the extent that that's possible (by avoiding them, or adjusting use of them). It seems unlikely to me that JS engine changes (either an increased stack limit or stack use optimization) are required to address this, even in the longer term.

Jim Blandy :jimb

Comment 39

•

12 years ago

I feel like there's a rumor here getting started that I would really like to strangle in the crib (thus the cross-bug repetition): Generator expression is not any worse for stack growth than any number of other commonly used JS constructs. The troubles in bug 845842 - the immediate motivation for this bug - are due to our promises being sometimes asynchronous and sometimes synchronous, and can arise without any generator use at all. In the deep stacks we saw there, there were no generator frames on the stack. The stack consumption for a generator resumption is the same as that consumed for a getter or setter property invocation, or invoking a proxy handler, or a valueOf or toString function call. I *would* agree that generator use makes it easier to trip over the bad case in our promise code. Caution combining Task.jsm with our unrevised promise implementation makes sense. But I'm persuaded, by the list of four reasons to be always asynchronous that Joe cites in bug 845842 comment 36, that we need an always-async promises implementation. Such an implementation would be fine for use with Task.jsm.

Joe Walker [:jwalker]

Comment 40

•

12 years ago

(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #38) > Given the summary of the arguments in comment 36 and the other context in > this bug, I think we should switch our promises to be always-async. But I > guess we should continue that particulary discussion in bug 842347. I replied in bug 842347 comment 16. Summary: I'm suggesting a simple change to promise.js which would avoid the need for hacks to FHR.

Jim Blandy :jimb

Comment 41

•

12 years ago

Compulsive correction to comment 39: s/Generator expression is/Generator expressions are/

Gregory Szorc [:gps]

Assignee

Comment 42

•

12 years ago

Attached patch Insert next tick promises, v1 (obsolete) — Details — Splinter Review

This patch a) creates a promise that is resolvable on some future tick b) hooks it up in FHR code. Normally, I would patch code in Toolkit and/or add-on SDK. However, given the global implications of such a change and the potential desire to uplift this patch, I think it's best to have a local solution for now. We can clean this up when the dust settles in promise/Task land.

Assignee: nobody → gps

Status: NEW → ASSIGNED

Attachment #720811 - Flags: review?(rnewman)

Richard Newman [:rnewman]

Comment 43

•

12 years ago

Comment on attachment 720811 [details] [diff] [review] Insert next tick promises, v1 Review of attachment 720811 [details] [diff] [review]: ----------------------------------------------------------------- My preference is to have an analogue of this logic live in a Task class, rather than be arbitrarily inserted into the bodies of calling code. That is, have a version of Task.spawn that returns a delayed promise, or an iterator whose final (or every) promise is delayed. This also moves us closer to a reusable solution to the problem: we could move that code into Task.jsm.

Gregory Szorc [:gps]

Assignee

Comment 44

•

12 years ago

(In reply to Richard Newman [:rnewman] from comment #43) > That is, have a version of Task.spawn that returns a delayed promise, or an > iterator whose final (or every) promise is delayed. > > This also moves us closer to a reusable solution to the problem: we could > move that code into Task.jsm. So, I guess we fork Task.jsm until our changes are "upstreamed" :/

Richard Newman [:rnewman]

Comment 45

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #44) > So, I guess we fork Task.jsm until our changes are "upstreamed" :/ Doesn't strictly need to be a fork; the changes can go into toolkit/content/Task.jsm. We just need to preserve the existing behavior in parallel. Task.jsm is pretty compact; I'd rather not see task/promise-related scheduling workarounds start to bud elsewhere in the tree.

Richard Newman [:rnewman]

Comment 46

•

12 years ago

Comment on attachment 720811 [details] [diff] [review] Insert next tick promises, v1 Clearing review flag. Greg, this is back to you.

Attachment #720811 - Flags: review?(rnewman)

Richard Newman [:rnewman]

Comment 47

•

12 years ago

Also needs measurement of errors.

Priority: -- → P1

Gregory Szorc [:gps]

Assignee

Comment 48

•

12 years ago

This bug has been quiet and I wanted to give an update. tl;dr We don't (yet?) have sufficient data from error reports to indicate this is a major issue. FHR self-reports errors it encounters as part of the uploaded payload. We've been using this on the Nightly and Aurora channels to identify and triage bugs. Of the literally hundreds of thousands of payloads on the servers, 6 contain the dreaded "too much recursion" error. If that number were accurate, it would be far below my radar and this bug would be maybe a P4 and definitely not a beta blocker. Until recently, FHR only reported errors that occurred as part of collection or payload generation. With bug 846083, FHR now reports errors that occur as part of initialization. This could potentially uncover a lot of clients experiencing this error. I would like to remain in a holding pattern on this bug until we have more data to indicate this occurs in the wild and not just on debug builds and/or an extremely small number of users. Assuming Nightly updates get turned back on, we should have enough data by EOD Wednesday to determine whether this is truly an issue. If the error counts remain in the same ballpark, I would be comfortable attributing the issue to debug builds and switching this to a P4.

Gregory Szorc [:gps]

Assignee

Comment 49

•

12 years ago

Attached patch Don't resolve promises on the current tick, v1 — Details — Splinter Review

This patch updates nearly every occurrence of Promise.resolve() with our new version provided by CommonUtils. We should no longer have excessive stack accumulation due to same-tick resolved promises within generators.

Attachment #720811 - Attachment is obsolete: true

Attachment #726381 - Flags: review?(rnewman)

Richard Newman [:rnewman]

Comment 50

•

12 years ago

I feel like we're building a utils wrapper to do something that should be built-in. promise.js is small, and it's also filled with module boilerplate and var-spew. How about we just bring promise.js (and maybe Task.jsm?) into the modern strict-mode world, and make sure they offer this kind of functionality?

Gregory Szorc [:gps]

Assignee

Comment 51

•

12 years ago

(In reply to Richard Newman [:rnewman] from comment #50) > I feel like we're building a utils wrapper to do something that should be > built-in. promise.js is small, and it's also filled with module boilerplate > and var-spew. > > How about we just bring promise.js (and maybe Task.jsm?) into the modern > strict-mode world, and make sure they offer this kind of functionality? Long-term, sure. I don't really feel like making Mossop and Co. go through a fire drill to update the add-on SDK and backport things to 21. We'll create a self-contained solution now and figure out a long-term solution in one of the many bugs referenced by this one. i.e. I don't want to make them clean up our mess.

Richard Newman [:rnewman]

Comment 52

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #51) > Long-term, sure. I don't really feel like making Mossop and Co. go through a > fire drill to update the add-on SDK and backport things to 21. We'll create > a self-contained solution now and figure out a long-term solution in one of > the many bugs referenced by this one. i.e. I don't want to make them clean > up our mess. See Bug 810490. I argue that a self-contained solution is "land a different promise implementation that doesn't suck, rather than landing wrapper utils". I agree that updating the add-on SDK should be an explicit non-goal; I don't want to make work for other teams. That raises the question "why are we using a promise implementation from the add-on SDK that blows the stack"? I expect the answer is "because that was the first implementation that landed, and nobody took the time to move it into toolkit". I'll accept the approach in this patch so long as we have a firm plan to replace it *soon*. I really don't want us long-term maintaining a pile of utilities wrapping code that we just want to get rid of, because without that firm plan, that's exactly what will happen. See services/common/utils.js, which is full of broken-upstream-API wrappers like "makeURI" (without throwing an exception), "encodeUTF8" (without acting like we're C++), "safeAtoB" (without producing incorrect padding), etc. etc. ad nauseum.

Gregory Szorc [:gps]

Assignee

Comment 53

•

12 years ago

I don't like it any more than you, Richard. If we didn't have to uplift this, I'd be advocating for doing it a better way. We already have a number of bugs on file to make promises and tasks suck less, so I'm not too worried that this will be yanked out as soon as a better alternative is available.

Richard Newman [:rnewman]

Comment 54

•

12 years ago

OK, so let's track this for 21, block the "do it better" bug I just filed (Bug 852411), and land this.

Blocks: 852411

tracking-firefox21: --- → ?

Richard Newman [:rnewman]

Updated

•

12 years ago

Attachment #726381 - Flags: review?(rnewman) → review+

Gregory Szorc [:gps]

Assignee

Comment 55

•

12 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/bc6c0983cec5 I'm trying to think if there's a reason to keep this [leave open]. I don't think so. I think all the related bugs have been filed and am content with letting this close out with the next merge.

Target Milestone: --- → mozilla22

Ryan VanderMeulen [:RyanVM]

Comment 56

•

12 years ago

https://hg.mozilla.org/mozilla-central/rev/bc6c0983cec5

Status: ASSIGNED → RESOLVED

Closed: 12 years ago

Resolution: --- → FIXED

bhavana bajaj [:bajaj]

Updated

•

12 years ago

status-firefox20: --- → unaffected

status-firefox21: --- → affected

tracking-firefox21: ? → +

bhavana bajaj [:bajaj]

Comment 57

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #55) > https://hg.mozilla.org/integration/mozilla-inbound/rev/bc6c0983cec5 > Can you please nominate for uplift on aurora given this is tracking and status is affected on Fx21 . Looks like its baked on m-c for a few days as well so should be stable enough for uplift unless we are waiting on anything else here ?

Gregory Szorc [:gps]

Assignee

Comment 58

•

12 years ago

Comment on attachment 726381 [details] [diff] [review] Don't resolve promises on the current tick, v1 [Approval Request Comment] Bug caused by (feature/regressing bug #): FHR User impact if declined: Possible stack exhaustion Testing completed (on m-c, etc.): It's baked for a few days. Risk to taking this patch (and alternatives if risky): String or UUID changes made by this patch: None

Attachment #726381 - Flags: approval-mozilla-aurora?

bhavana bajaj [:bajaj]

Comment 59

•

12 years ago

(In reply to Gregory Szorc [:gps] from comment #58) > Comment on attachment 726381 [details] [diff] [review] > Don't resolve promises on the current tick, v1 > > [Approval Request Comment] > Bug caused by (feature/regressing bug #): FHR > User impact if declined: Possible stack exhaustion > Testing completed (on m-c, etc.): It's baked for a few days. > Risk to taking this patch (and alternatives if risky): Can you please call out any risk this may have outside FHR ? Thanks

Gregory Szorc [:gps]

Assignee

Comment 60

•

12 years ago

There should be no risk outside of FHR. All the code is self-contained to FHR.

bhavana bajaj [:bajaj]

Updated

•

12 years ago

Attachment #726381 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Gregory Szorc [:gps]

Assignee

Comment 61

•

12 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/569a5ca67a56

status-firefox21: affected → fixed

Mike Connor [:mconnor]

Updated

•

12 years ago

Component: Metrics and Firefox Health Report → Client: Desktop

Product: Mozilla Services → Firefox Health Report

Target Milestone: mozilla22 → Firefox 22

Updated

•

6 years ago

Product: Firefox Health Report → Firefox Health Report Graveyard

You need to log in before you can comment on or make changes to this bug.