Last Comment Bug 531886 - [regression] *.mfasl fastload caches not invalidated after build / changing XUL/JS code
: [regression] *.mfasl fastload caches not invalidated after build / changing X...
Status: RESOLVED FIXED
: regression
Product: Core
Classification: Components
Component: XPCOM (show other bugs)
: Trunk
: All All
: -- major with 2 votes (vote)
: ---
Assigned To: (dormant account)
:
:
Mentors:
: 551813 (view as bug list)
Depends on: packedxpi 556644 594058 789346
Blocks: 511761 551813
  Show dependency treegraph
 
Reported: 2009-11-30 10:43 PST by Ben Bucksch (:BenB)
Modified: 2012-09-06 17:40 PDT (History)
38 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
final+


Attachments
make in objdir/browser causes invalidation, adds pref to turn off cache (147.53 KB, patch)
2010-05-13 17:27 PDT, Benedict Hsieh [:benedict]
no flags Details | Diff | Splinter Review
make in objdir/browser causes invalidation, adds pref to turn off cache (9.43 KB, patch)
2010-05-13 18:03 PDT, Benedict Hsieh [:benedict]
no flags Details | Diff | Splinter Review
make in objdir/browser, adds CL option to invalidate cache (6.98 KB, patch)
2010-05-18 17:43 PDT, Benedict Hsieh [:benedict]
benjamin: review-
Details | Diff | Splinter Review
fixes above comments (5.66 KB, patch)
2010-05-25 13:16 PDT, Benedict Hsieh [:benedict]
no flags Details | Diff | Splinter Review
make invalidation is debug-only, adds environment variable (5.89 KB, patch)
2010-05-25 15:01 PDT, Benedict Hsieh [:benedict]
benjamin: review+
Details | Diff | Splinter Review
make invalidation is debug-only, adds environment variable, fixes special build environments (5.81 KB, patch)
2010-06-22 16:29 PDT, Benedict Hsieh [:benedict]
no flags Details | Diff | Splinter Review
453230: make invalidation is debug-only, adds environment variable, fixes special build environments (5.91 KB, patch)
2010-06-22 17:34 PDT, Benedict Hsieh [:benedict]
benjamin: review+
Details | Diff | Splinter Review
unbitrotted, make invalidation is debug-only, adds environment variable, fixes special build environments (5.96 KB, patch)
2010-07-02 11:47 PDT, Benedict Hsieh [:benedict]
no flags Details | Diff | Splinter Review
Workaround from comment 46 as installable toolkit extension (3.27 KB, application/x-xpinstall)
2010-07-15 01:11 PDT, Nils Maier [:nmaier]
no flags Details

Description Ben Bucksch (:BenB) 2009-11-30 10:43:23 PST
This is an inherent regression caused by bug 511761. That bug changed the behavior to no longer stat() jar files and components and only reply on the version in compatibility.ini. That causes severe, practical problems for XUL development, both core developers and XUL developers.

bsmedberg offered only 3 workarounds for XUL devs:
* Use debug builds
  That's out for me, because they are *way* too slow (2-3 times slower),
  and way too spammy lots of assertions (on console and sometimes as popups)
  which nobody fixes, and I can't see my own output anymore.
  Debug builds are unusable for me.
* Change version/buildID in compatibility.ini.
  That's good for releases, but not during development.
  make -C mail must work and be sufficient (if I use jar files.
  If I don't use jars, just changing the XUL/JS file must be sufficient).
* Delete *.mfasl in my profile
  That works, but
  a) is unintuitive and will be a problem for new XUL developers (comment 38).
     Even many Mozilla developers (e.g. dmose) are/were not aware of this
     and accounted this as "strange errors".
  b) is error-prone. I know of this requirement, and script it /
     use bash history, but that is cumbersome, and I need to change this
     *every time* I create a new profile for testing, and I have already
     forgotten about this several times, leaved to lots of wasted time
     hunting on the wrong ends (my code, build, ...).
  This is extremely annoying to have to have to *manually* delete files
  in my profile, given the long profile and strange path, between
  *every* change and test (app start).

We need a better solution, it can't stay like this. This is unusable for
development.

Severity:
I lost a lot of time several times, because my changes didn't take effect. I
already learned to delete *.mfasl before starting TB, but I script that,
because it's a long path, and I changed the profile, and was wondering why
|make| is broken.

This makes XUL development a pain. There's a tradeoff between dev and runtime, and we already make a huge tradeoff by using JS and XUL instead of C++. Do we have to let new XUL developers suffer severe pain, and even cut experienced devs, for the few dozen milliseconds we gained there?

It's stuff like this which makes people drop a solution like XUL. This is observation: I have seen people get very frustrated and walk away from XUL, because they couldn't get their attempts to even start. They burn an hour sticking around in the dark and walk away. Most of the time, if I diagnosed it, turned out to be component registration issues. This is a prime reason why people don't even start using XUL, from what I could see.
Comment 1 Ben Bucksch (:BenB) 2009-11-30 10:50:07 PST
Another workaround: Disable fastload with pref "nglayout.debug.disable_xul_fastload" = false.
I don't know how much this will slow the build down, but I'd need to do that in each profile, and I often create new profiles for testing, so that doesn't work.
Comment 2 Benjamin Smedberg [:bsmedberg] 2009-11-30 10:53:28 PST
If your requirement is "I want something to work out of the box without setting a pref", I believe that is unreasonable. I don't believe that disabling fastload significantly affects non-startup performance time.
Comment 3 Ben Bucksch (:BenB) 2009-11-30 10:58:05 PST
I think the expectation that just |make| (including depend builds) works is quite reasonable. And |make -C mail| makes the changes in mail/ apply, out of the box.
Comment 4 Brendan Eich [:brendan] 2009-11-30 11:19:34 PST
Indeed, make(1), with help from the compiler, tracks dependencies such as .h files so developers don't have to. The situation with fastload is analogous (or consider ccache as another, closer analogue). I implemented fastload depdency tracking in the old days precisely to make fastload transparent to XUL developers.

Benjamin: it is "unreasonable" to break this without reasons that address the usability bug filed here, including all the details (multiple profiles, out of box confusion). Assertions about what you think developers should do, instead of dealing with what they actually do, do not nullify this bug report.

Consider a better fix. I don't believe that it must require slowing down release builds measurably.

/be
Comment 5 Brendan Eich [:brendan] 2009-11-30 11:29:23 PST
Bug 511761 comment 52 proposes an imprecise, conservative scheme to force regen of fastload/startup caches. I couldn't tell if it was necessarily only for DEBUG builds, though.

/be
Comment 6 Benjamin Smedberg [:bsmedberg] 2009-11-30 11:36:33 PST
No, not only for DEBUG builds. That solves the problem in comment #3 for developers who are developing from a tree, which is the easy problem. Comment #0 sounded like it was talking about extension development using shipped release builds, which is a different kind of problem (and one where disabling the fastload cache has often been the only solution, even prior to the latest change).
Comment 7 Benedict Hsieh [:benedict] 2009-12-02 14:54:04 PST
I think we need some more clarity on what the behavior should be. Seems that a pref is not acceptable because of the multiple-profiles problem (is this a problem that many XUL developers experience?) Is it acceptable to require that developers issue a |make|? 

One concern that's been brought up is that this system is very confusing to new XUL developers. If that's the intended audience this fix is supposed to help, it's worth considering whether it's easier to set a pref for each profile or issue a make (which requires development from a tree, like bsmedberg says above).
Comment 8 Benedict Hsieh [:benedict] 2009-12-02 14:57:32 PST
Sorry, premature commit on comment above.

Continuing: is it easier for XUL developers to set a pref per profile, or to issue a make each time a change is made? Or is neither of these solutions acceptable?

If it's the case that changing a file needs to work without changing any prefs or requiring any additional input on the developer's part (no make, or removal or creation of any signal file), then it seems like statting is the only solution; I don't think there's any other way to tell if someone just opened up a file in an editor and mucked around with it.
Comment 9 Brendan Eich [:brendan] 2009-12-02 15:11:50 PST
Ben Bucksch and others should comment, but we may never hear from XUL devs for whom the status quo is confusing and mysterious, and frustrating. This is not a case of "no bug reports, no problem!".

The XUL model is web-like and web developers benefit from shift-reload or restart fetching the latest filesystem or served content. Caching is supposed to be an optimization, and optimizations are supposed to be transparent, optional. I'm not saying this must continue, but it has been the case for a long while.

Benjamin, what problems were you referring to in comment 6 when you wrote "(and one where disabling the fastload cache has often been the only solution, even prior to the latest change)"?

/be
Comment 10 Ben Bucksch (:BenB) 2009-12-02 16:49:04 PST
Yes, the expectation is that it "just works", changes apply instantly.
It used to work before. That's what the change broke. That's what needs to be restored.

You should find a solution which works just as well as before. Of course it's a matter of viewpoint, but IMHO, the solution is not to ask people to do something new, but to fix the code to work as well as it used to do.

I think we had this discussion about 10 years ago, and ended up with the solution before the change.
Comment 11 Ben Bucksch (:BenB) 2009-12-02 16:58:42 PST
Another way to approach it is to see how XUL devs actually modify code.
1. One obvious way is the Mozilla build system, and starting |make|. That *must* work, breaking |make| is a clear bug. It currently doesn't.

2. jar files: People might make a new jar file and drop it in. That should be noticed. Firefox has 4 jar files, Thunderbird has 8, that shouldn't be a problem to stat() (check the last modification time).

3. Individual chrome files: There's an option to use not jar files, but plain files for chrome (chrome.manifest specifies a directory, not jar file). I guess many XUL devs work this way, at least in some cases, especially to be able to edit the file in a text editor and see the changes immediately, and it's critical to keep this working. You'd need to stat() them all, or not cache them.

There might be other cases which I have missed.
Comment 12 (dormant account) 2009-12-02 17:04:55 PST
Sounds the problem is psychological. Users like Ben Bucksch see files laying around and expect that modifying them will produce some result.

Afaik best way to fight that problem is to stick all of the js files into a jar.  Then
a) Users wont be temped modify raw files
b) If they do want to modify the jar, checking the jar modified date is much better than than stat()ing each individual component.
Comment 13 Ben Bucksch (:BenB) 2009-12-02 17:12:40 PST
Do you need to get insulting? I'm not a user, but a developer in this project since 10 years.

> best way to fight that problem is to stick all of the js files into a jar.

No, it doesn't help. TB build uses jar files, and I see the bug there.
Did you even read comment 11?
Comment 14 (dormant account) 2009-12-02 17:22:03 PST
(In reply to comment #13)
> > best way to fight that problem is to stick all of the js files into a jar.
> 
> No, it doesn't help. TB build uses jar files, and I see the bug there.
> Did you even read comment 11?

b) is just a proposed solution, so yes you'd see the bug there.

I think there is merit to your idea where in flat chrome builds we should stat() individual files. Clearly in those builds the user doesn't care about startup performance.

Looks to me like we are reaching a consensus: 
i) Naked js/xul files are misleading because we do not stat() them. In this case setting a pref should be enough.
ii) Flat builds should stat()
iii) Old way was a performance hole, we aren't going back to that.
Comment 15 Ben Bucksch (:BenB) 2009-12-02 17:29:21 PST
> in flat chrome builds

I am not only talking about a Firefox chrome built as flat chrome, but a Firefox which has an extension which uses flat chrome.

Basically, stat() *all* files that the chrome depends on. In normal Firefox builds, that's exactly 4 files in 1 directory (chrome/), plus possibly extensions (typically 1 file per extension), as far as I can see. That shouldn't be a problem.
Comment 16 Ben Bucksch (:BenB) 2009-12-02 17:32:58 PST
> Looks to me like we are reaching a consensus: 
> iii) Old way was a performance hole, we aren't going back to that.

FYI, that is not true, we do not agree on that.
Comment 17 (dormant account) 2009-12-02 17:35:49 PST
(In reply to comment #15)
> > in flat chrome builds
> 
> I am not only talking about a Firefox chrome built as flat chrome, but a
> Firefox which has an extension which uses flat chrome.
> 
> Basically, stat() *all* files that the chrome depends on. In normal Firefox
> builds, that's exactly 4 files in 1 directory (chrome/), plus possibly
> extensions (typically 1 file per extension), as far as I can see. That
> shouldn't be a problem.

I'm not an extension expert, but this sounds like it may be a decent compromise. If we can assume that normally extensions are packaged, then this could work. Either that or set some flag in the manifest to blacklist extension files from fasl.
Comment 18 Ben Bucksch (:BenB) 2009-12-02 17:50:09 PST
Great.

I just realized that that this was only chrome, i.e. XUL.mfasl, though, unfortunately.

XPC.mfasl seems to cache components written in JavaScript and Components.import. For these, I think the right solution seems to be to allow to package them in the jar file together with chrome. That's the same optimization pattern that we used for chrome files (jars), xpt files, DLLs (libxul) etc. (And it would make sense anyways, it has always annoyed me that a JS module lives in a different dir than the chrome that uses it. There's a bug about that, can't find it.) I think we should keep the old behavior until this is implemented and deployed.
Comment 19 Ben Bucksch (:BenB) 2009-12-02 17:55:06 PST
I was thinking of bug 398579, just that we need chrome: instead of resource: or file:. Maybe that works by now. If not it must be solvable.
Comment 20 Robert Kaiser 2009-12-02 18:02:11 PST
(In reply to comment #17)
> If we can assume that normally extensions are packaged, then this
> could work.

Actually, if this isn't the case, I guess we can warn extension developers that packaging up improves speed a lot, which I guess the major extension devs will follow eagerly. And for those that don't follow, we won't be worse than we are now (e.g. in FF 3.5), right?
Comment 21 Ben Bucksch (:BenB) 2009-12-02 18:07:14 PST
We have a "bit" of control over extensions via AMO, too.
Comment 22 Brendan Eich [:brendan] 2009-12-02 19:34:35 PST
This is a regression bug. It is not just a psychological problem, of course -- but I don't be scold-y. I do want an owner and Ben gets the prize by default. But if there's a better owner, please reassign. It sounds like we may have consensus on the fix, but I'm not 100% certain.

/be
Comment 23 Benedict Hsieh [:benedict] 2009-12-08 11:55:58 PST
Suggestion from shaver, what do you guys think? For now, s/startup_cache/fastload in his comments below.

[5:58pm]shaver: benedict: here's my idea
[5:58pm] shaver: benedict: NO_STARTUP_CACHE=1 in the env, means we don't use the cache
[5:58pm] shaver: similarly -no-startup-cache
[5:59pm] shaver: and then they can start FF that way if they want
...
[5:59pm] shaver: it's not similar to the pref idea
[5:59pm] shaver: because it doesn't require changing profiles
[5:59pm] taras: we have prefs that live outside of profile dir
[6:00pm] shaver: it doesn't require changing anything
[6:00pm] shaver: like, editing anything
Comment 24 Ben Bucksch (:BenB) 2009-12-08 12:26:23 PST
It's a nice idea. But it doesn't fix either of the cases above: |make| is still broken (I can just manually work around it), and newbies still run into it. I'd prefer the solution in comment 15 and 18.
Comment 25 Nickolay_Ponomarev 2010-01-09 17:10:41 PST
So the original bug 511761 made many changes at once, breaking different workflows, and now they're all discussed (and confused) together.

I tried to list the changes, the workflows I could think of, and the regressions in each workflow at http://docs.google.com/View?id=dfs226xt_14hkzngff7 - I can move to wiki.m.o or give edit access if there's interest.

Perhaps we should update the lists above first, then come to agreement about the relative importance of various broken workflows compared to the startup win.

I personally think that |make| should just work and there should be a pref, preferably auto-set if another cache-disabling pref is set, that allows to revert to stat-checking (so that changes to the XPCOM components just work - since I guess running auto-registration each time would be too slow).

BTW Brendan in comment 9 wrote - and I'm curious too:
> Benjamin, what problems were you referring to in comment 6 when you wrote "(and
> one where disabling the fastload cache has often been the only solution, even
> prior to the latest change)"?
Comment 26 Dave Townsend [:mossop] 2010-01-20 15:14:46 PST
Let me just see if I have a full understanding of the effects we are looking at here, from the point of view of the affected developers. Are these statements incorrect or missing anything?

For extension developers (and we probably only need be concerned with those using the pointer file install technique), any changes they make to JS components, XUL or JS are ignored.

For large scale XUL app developers (those with a build system like Firefox/Thunderbird), changes made to JS components, XUL or JS in the source are ignored unless the app is fully rebuilt (which would change the app's build ID), buolding just the part of the app that is changed doesn't help.

For small scale XUL app developers who commonly run XULRunner directly on their source and so don't use JAR files at all, their changes are ignored unless they manually change the build ID in application.ini.

In all these cases deleting the caches in the profile folder or enabling nglayout.debug.disable_xul_cache to disable the cache will allow the changes to be picked up.
Comment 27 Nickolay_Ponomarev 2010-01-20 16:09:03 PST
> or enabling nglayout.debug.disable_xul_cache to disable the cache will allow 
> the changes to be picked up.
Unfortunately no, that pref only affects the XUL cache, and not the JS components / modules cache.
Comment 28 Ted Mielczarek [:ted.mielczarek] 2010-01-21 11:12:31 PST
I hit this hacking on a JS module, FWIW, and I can't find any way to work around it, which is quite a pain. I guess I have to rm the fastload files from my profile every time I make a change? I'd happily set a pref or run with an environment variable or whatever I have to do, but we don't actually have that option for JS components/modules.
Comment 29 Mike Shaver (:shaver -- probably not reading bugmail closely) 2010-01-21 12:35:05 PST
Environment variable would work for components, no?
Comment 30 Ted Mielczarek [:ted.mielczarek] 2010-01-21 15:51:26 PST
Yeah, if we had one, I would totally use it!
Comment 31 Ben Bucksch (:BenB) 2010-01-21 16:20:14 PST
Dave, yes, correct (modulo JS components as Nickolay said).
Comment 32 Benedict Hsieh [:benedict] 2010-01-22 16:15:29 PST
Mossop, what is your suggestion on this? I think you mentioned offline that some previous helpful behavior that people were relying on might actually have been unintended/unreliable, maybe you can outline what the real "expected behavior" was and what it should be.
Comment 33 Dave Townsend [:mossop] 2010-01-26 10:49:49 PST
So I think we only need to do a couple of small things here.

I suspect that most extension developers already have the pref nglayout.debug.disable_xul_cache set, so we should extend that to also disable caching components/modules. The name sucks but it's what is always in use so I think we should just go for it.

Large scale XUL app developers should for the most part be able to get by by either doing full builds, or building the bit that sets the app buildid as well as whatever small part they need building (OSX already has a similar problem since as long as I can remember). However I think I agree that adding an environment variable to disable caching would make this even easier, developers can just set it in their development environment, we should probably just add it to MozillaBuild to hit the majority of the windows developers at once.

Small scale XUL app developers can then use either a pref or environment variable as suits them. I wonder though if it would also be helpful to add a flag to application.ini that would disable the cache too. Probably not difficult and would mean apps can opt out of it until they decide they are doing a release.
Comment 34 Robert Kaiser 2010-01-26 10:55:34 PST
More use of the "disable cache" pref is surely a good idea for add-on and rapid UI dev.
We should make sure that doing "make chrome" anywhere always triggers something that makes us rebuild all caches/fastload.
And an env var or even application.ini setting for XUL app dev is a nice addition for sure, yes.
Comment 35 Ben Bucksch (:BenB) 2010-01-26 11:16:53 PST
Dave, all of that requires devs to do something, for each dev session and/or profile. And it requires the dev to be aware of the problem, failing for new XUL devs.

My proposals in comment 15 and 18 require no changes from devs and should restore old behaviour, without severe performance penalities.
Comment 36 Dave Townsend [:mossop] 2010-01-26 11:36:17 PST
(In reply to comment #35)
> Dave, all of that requires devs to do something, for each dev session and/or
> profile. And it requires the dev to be aware of the problem, failing for new
> XUL devs.

I suspect it will be harder to get the information out to existing XUL devs than new ones tbh, but still.

> My proposals in comment 15 and 18 require no changes from devs and should
> restore old behaviour, without severe performance penalities.

But they do still introduce perf penalties that grow with every add-on installed and my understanding is that it would be hard to implement your suggestion in comment 18.
Comment 37 Ben Bucksch (:BenB) 2010-01-26 11:50:47 PST
Stating even 20 files for 20 extensions will be completely insignificant compared to the startup slowdown that the extensions itself cause.

Yes, the suggestion in comment 18 is harder to implement (but has significant other, unrelated positive results). But the main use case that XUL devs run into are XUL and JS files, not JS components, so I think comment 15 (stat the 4-10 JAR files) is most important.

Comment 18 (JS components in JAR files), which I think will come sooner or later anyways, will then directly profit from that, without additional overhead (they're in the same JAR files).
Comment 38 Ben Bucksch (:BenB) 2010-01-26 11:51:31 PST
An alternative for JS components is to continue stating them, until comment 18 is implemented. We don't have that many of them either.
Comment 39 Dave Townsend [:mossop] 2010-01-26 11:59:23 PST
(In reply to comment #37)
> Stating even 20 files for 20 extensions will be completely insignificant
> compared to the startup slowdown that the extensions itself cause.

All the more reason to try to reduce the amount of that slowdown where we can.

(In reply to comment #38)
> An alternative for JS components is to continue stating them, until comment 18
> is implemented. We don't have that many of them either.

We have 50-60 of them, It's pretty likely that not statting these was what gave us the performance improvements in bug 511761 in the first place.
Comment 40 Ben Bucksch (:BenB) 2010-01-26 12:05:08 PST
> All the more reason

No, if the slowdown is already 3 seconds, and we'll have little chance to improve that, because it's JS code of extension writers, it makes no sense to optimize away ~10ms (that's what we're talking about), if it causes significant problems, as it does here. It's just not rational.

> We have 50-60 of [JS components]

> It's pretty likely that not statting these was what gave
> us the performance improvements in bug 511761 in the first place.

If that's the case, why don't you just back this change out and implement comment 18 instead?
Comment 41 Dave Townsend [:mossop] 2010-01-26 12:19:11 PST
(In reply to comment #40)
> > All the more reason
> 
> No, if the slowdown is already 3 seconds, and we'll have little chance to
> improve that, because it's JS code of extension writers, it makes no sense to
> optimize away ~10ms (that's what we're talking about), if it causes significant
> problems, as it does here. It's just not rational.

Wins should always be weighed against their costs of course.

> > We have 50-60 of [JS components]
> 
> > It's pretty likely that not statting these was what gave
> > us the performance improvements in bug 511761 in the first place.
> 
> If that's the case, why don't you just back this change out and implement
> comment 18 instead?

I should really have said "the majority of the performance improvements", bug 511761 did more than that and more than perf wins and was far simpler to do than what you propose in comment 18. I don't agree that with the simpler fixes proposed this is as significant an issue as you believe
Comment 42 Ben Bucksch (:BenB) 2010-01-26 13:01:55 PST
A pref and/or env var don't really help any of the core problems I mentioned:
* It doesn't fix |make|
* It doesn't help newcomers (nobody reads documentation)
* I need to do stuff whenever I'm at a new site, machine, new profile etc.
Any of that means for me that the solution is no good.

I believe the costs of the change of bug 511761 outweigh the benefit of 50ms or how much it was, so I think it should be backed out.

How about this for a compromise:
1) (Comment 15) Continue to |stat| any files that chrome depends on
   (i.e. are in chrome manifest). For Firefox release builds, that's
   4 files in 1 directory, plus typically 1 file per extension, so
   not a big perf problem to stat 4-10 files.
   Easy to implement, small perf cost, high dev usability gain.
2) (Your suggestion) Implement a pref and ".autoreg" file or similar
   for JS components to be |stat|ed.
3) |make| creates a file ".autoreg" or similar in the install dir, which
   triggers the same |stat| as the pref.
4) (comment 18) I expect JS components to be able to live in JAR files
   one day anyways, for the other significant advantages that gives, e.g.
   source code and install files are more logically ordered, and perf
   improvements in other situations like runtime and installer.
   If that happens, 1) causes it to just work and workarounds 2) and 3)
   are not longer needed.
Comment 43 Ben Bucksch (:BenB) 2010-02-10 06:05:05 PST
Found the bug for 4): bug 406538.
Comment 44 Dave Townsend [:mossop] 2010-03-09 15:27:50 PST
(In reply to comment #43)
> Found the bug for 4): bug 406538.

That is for JS modules not components.
Comment 45 Michael Kraft [:morac] 2010-03-12 09:15:51 PST
*** Bug 551813 has been marked as a duplicate of this bug. ***
Comment 46 Michael Kraft [:morac] 2010-03-12 10:45:47 PST
I don't know if this helps with application development, but for extensions development I worked around this by creating an add-on that simply executes the following two lines at "app-startup" for branch 1.9.3a1 or later builds:

let XRE = Cc["@mozilla.org/xre/app-info;1"].getService(Ci.nsIXULRuntime);
XRE.invalidateCachesOnRestart();


This was posted in bug 511761 comment 39 and works.  I just install it into my development profile and the problem goes away since the fastload cache files are automatically invalidated every time I run.  If one has multiple development profiles, it can be installed globally.

It would be nice if there was a way to do this without having to install an add-on though, sort of like comment 23.
Comment 47 Benedict Hsieh [:benedict] 2010-05-07 18:50:48 PDT
bsmedberg and I have come up with the following fix for this bug.

-Bring back the '.autoreg' file in some form, so issuing make in /browser and perhaps some other directories will cause a cache invalidation on the next run.
-Create a more universal pref to turn off the startup caches, so that both js component cache and xul cache are disabled together. This could also be an environment variable.

This patch will be on top of work in bug 520309.
Comment 48 Benedict Hsieh [:benedict] 2010-05-13 17:27:58 PDT
Created attachment 445244 [details] [diff] [review]
make in objdir/browser causes invalidation, adds pref to turn off cache

Kind of a WIP. The problem is that JS Components get loaded early, and I think the pref service isn't really functional at that point. Specifically, they are loaded before the "profile-do-change" notification.

I'm not sure what to do about this. Going to look into the pref service a bit more, but looking for prefs when mozJSComponentLoader is loading things doesn't return anything, and the same lookups work later when xulPrototypeCache is loading things (after the notification). Maybe this means that a pref isn't the right solution for turning off caching of JS components, though.
Comment 49 Michael Kraft [:morac] 2010-05-13 17:41:38 PDT
Well then how about a command line argument?
Comment 50 Benedict Hsieh [:benedict] 2010-05-13 18:03:03 PDT
Created attachment 445249 [details] [diff] [review]
make in objdir/browser causes invalidation, adds pref to turn off cache

Sorry, fixed patch.
Comment 51 Ben Bucksch (:BenB) 2010-05-14 03:08:10 PDT
Commandline argument would be a good idea and would work for me. I need that anyways during development, for -no-remote, -P and sometimes -jsconsole.

(Nevertheless, I still think that 1) and 4) in comment 42 is the right fix, and the rest are just workarounds)
Comment 52 Benedict Hsieh [:benedict] 2010-05-18 17:43:32 PDT
Created attachment 446107 [details] [diff] [review]
make in objdir/browser, adds CL option to invalidate cache

I removed the pref, because it seems like some clients (currently, mozJSComponentLoader) will need us before the pref service is ready (before profile-do-change notification). This would cause inconsistent behavior between clients. I added a commandline option, -dumpcaches, that causes removal of startup cache caches and forces autoregistration. Doing a make in /browser or /xulrunner does the same.

It would be nice if there was a way to make a pref like the one to disable the xul cache, but not sure how to resolve the timing issue.
Comment 53 Benjamin Smedberg [:bsmedberg] 2010-05-21 10:40:52 PDT
Comment on attachment 446107 [details] [diff] [review]
make in objdir/browser, adds CL option to invalidate cache

Is there a particular reason you only touch .dumpcaches in browser/app and xulrunner/app, rather than every make invocation?

I think you're using XRE_GetBinaryPath incorrectly here: we want to look for .dumpcaches in the application directory, not the XULRunner directory, in case they are different, so you almost certainly want gAppData->directory instead.

It's also not clear to me why you have to have flagFile->Remove in three different locations. Can't we just check for existence of that file up-front and remove it immediately?
Comment 54 Benedict Hsieh [:benedict] 2010-05-21 20:36:28 PDT
>Is there a particular reason you only touch .dumpcaches in browser/app and
>xulrunner/app, rather than every make invocation?
These are the makes that used to touch the .autoreg file, so I thought it would be reasonable to just do them there. Do we want this to happen with every make invocation? I'll probably have to find some implementation other than modifying every Makefile, in that case... (I'm just using the two Makefile.ins currently).

>I think you're using XRE_GetBinaryPath incorrectly here: we want to look for
>.dumpcaches in the application directory, not the XULRunner directory, in case
>they are different, so you almost certainly want gAppData->directory instead.
Ah okay, I will fix this in my next patch.

>It's also not clear to me why you have to have flagFile->Remove in three
>different locations. Can't we just check for existence of that file up-front
>and remove it immediately?
I was thinking about what would happen if the program was killed after the removal of the flag file but before we actually removed the cache files. I think this was part of the reason I rewrite the compat.ini file in those locations as well, although there were other reasons for that also. This is definitely a very edge case though, so do you think I should just move the Remove() up for the sake of cleanness?
Comment 55 Brendan Eich [:brendan] 2010-05-21 20:57:28 PDT
Super-nit, maybe not meaningful to everyone: "dump" in .dumpcaches means like a dumptruck at the dump, but it often also means "list" or "display" or "log". How about "purge"? Unambiguous!

/be
Comment 56 Robert Kaiser 2010-05-22 06:41:12 PDT
(In reply to comment #54)
> Do we want this to happen with every make invocation?

That may be reasonable, in which case it should be done in config.mk or rules.mk (not sure where it fits better, Benjamin or ted would know).
Comment 57 Benjamin Smedberg [:bsmedberg] 2010-05-23 08:07:54 PDT
Yes, every make invocation. You can probably tie it to the default target so that we don't do it recursively by default.

The reason we need to do this is because previously we checked timestamps for things in the fastload file and component files. Now we don't.
Comment 58 Benedict Hsieh [:benedict] 2010-05-25 13:16:00 PDT
Created attachment 447348 [details] [diff] [review]
fixes above comments

Changed name of flag file to .purgecaches, CL option to -purgecaches.

Fixed to use gAppData->directory and unified the flagFile->Removal()s to happen one place.

This can actually land without startupcache.
Comment 59 Benedict Hsieh [:benedict] 2010-05-25 15:01:44 PDT
Created attachment 447385 [details] [diff] [review]
make invalidation is debug-only, adds environment variable

There was some concern about doing a stat of the .purgecaches file in release builds, so that stat will now only happen in debug builds. (So doing an incremental make will not trigger cache invalidation in release builds.)

I also added environment variable MOZ_PURGE_CACHES, which does the same thing as the -purge-caches CL option. These are the two ways to trigger cache invalidation in a release build.
Comment 60 Robert Kaiser 2010-05-25 17:29:24 PDT
(In reply to comment #59)
> There was some concern about doing a stat of the .purgecaches file in release
> builds, so that stat will now only happen in debug builds. (So doing an
> incremental make will not trigger cache invalidation in release builds.)

So how does someone not doing any damn debug build (which is slow as hell usually) do reasonable add-on or application development? I understand why we want to save us from taking the toll of doing an additional stat in what we release to normal users, but it would be really awesome if people could develop add-ons or app UI in a reasonable manner (i.e. without manually cleaning random startup cache files on every run). Doing this debug-only does IMHO not fix this bug at all.
Comment 61 (dormant account) 2010-05-25 17:30:59 PDT
(In reply to comment #60)

> So how does someone not doing any damn debug build (which is slow as hell
> usually) do reasonable add-on or application development? I understand why we
> want to save us from taking the toll of doing an additional stat in what we
> release to normal users, but it would be really awesome if people could develop
> add-ons or app UI in a reasonable manner (i.e. without manually cleaning random
> startup cache files on every run). Doing this debug-only does IMHO not fix this
> bug at all.

Changed the firefox shortcut to do -purgecaches?
Comment 62 Robert Kaiser 2010-05-25 17:34:50 PDT
(In reply to comment #61)
> Changed the firefox shortcut to do -purgecaches?

Oh, sorry, right, forgot we're adding that option as well. That one perfectly should do it, as long as it work on opt/release builds! Sorry for the bugmail!
Comment 63 Ben Bucksch (:BenB) 2010-05-25 18:09:24 PDT
Please note, however, that not checking the .purgecaches file in opt builds will make the make file change useless, which means that |make| is still broken on release builds. I think that's reasonable, but 1) and 4) in comment 42 are the real solutions necessary for new XUL developers. This patch is a great step forward, but not a complete fix of this bug.
Comment 64 Benjamin Smedberg [:bsmedberg] 2010-05-25 18:26:30 PDT
#1 is not something I'm willing to fix
Comment 65 Ben Bucksch (:BenB) 2010-05-25 18:35:42 PDT
Can you give some numbers to justify? A stat() of 5-10 files is not significant. Esp. if 4) is also fixed, which would OTOH remove a considerable number of stat()s and open()s, so probably a net win, probably significantly.
Comment 66 Ben Bucksch (:BenB) 2010-05-25 19:00:42 PDT
Code:
  var start = new Date();
  let dir = profileDir();
  dir.append("foo");
  var i = 0;
  for (let entries = dir.directoryEntries; entries.hasMoreElements(); )
  {
    i++;
    let file = entries.getNext().QueryInterface(Ci.nsIFile);
    let mod = file.lastModifiedTime;
    //dump(file.path + " " + mod + "\n");
  }
  dump("Time to stat() " + i + " files: " + (new Date() - start) + "ms\n");
Output (Linux):
Time to stat() 11 files: 1ms
Time to stat() 11 files: 1ms
Time to stat() 11 files: 1ms
Time to stat() 10 files: 0ms
Time to stat() 10 files: 0ms
Time to stat() 10 files: 0ms
Time to stat() 10 files: 0ms
Time to stat() 11 files: 0ms
Time to stat() 11 files: 0ms
echo 7 > /proc/sys/vm/drop_caches
Time to stat() 11 files: 0ms
Comment 67 Benjamin Smedberg [:bsmedberg] 2010-05-25 19:04:46 PDT
On mobile devices a stat of small numbers of files is significant. But more importantly to me, it means that you need code which tracks which JAR files individual fastload items are loaded from, which is complex and error prone and not worth it.
Comment 68 Ben Bucksch (:BenB) 2010-05-26 03:10:55 PDT
> importantly to me, it means that you need code which tracks which JAR files
> individual fastload items are loaded from, which is complex and error prone and
> not worth it.

I don't understand. If any JAR file changes, you reload all caches which may contain items from JAR files, i.e. *.mfasl currently. It should only happen on installs and during development, i.e. when needed.
Comment 69 Brendan Eich [:brendan] 2010-05-26 08:07:19 PDT
We used to stat jar files, not items in jar files. That was lost. Ben is right: it is all that's needed.

It's kinda obvious to me (but maybe I'm mistaken) that mobile devices are a different configuration and front end -- which can and should turn off all of this desktop developer iterative stuff.

It would be faster to restore the lost stats than argue about it. What principle is at stake here that justifies this (don't get me wrong, there could be an imp. principle at stake justifying arguing over coding)?

The principle can't be that startup time needs to be insignificantly faster on desktops. The stat(2) system call was not meant to be avoided even at 10 or 11 calls per startup because it is sooo slow.

/be
Comment 70 Benjamin Smedberg [:bsmedberg] 2010-05-26 10:08:51 PDT
It's not about the stats! It's about the code which creates the fastload data: if any stats are required, it has to be able to walk backwards from whatever channel/buffer it currently has to the file from which that data came. That code is noticeably and undesirably complex, and shows up in the profile even without any stats.

Re-adding the per-item stats is not faster, and not acceptable.
Comment 71 Brendan Eich [:brendan] 2010-05-26 10:15:05 PDT
(In reply to comment #70)
> It's not about the stats! It's about the code which creates the fastload data:
> if any stats are required, it has to be able to walk backwards from whatever
> channel/buffer it currently has to the file from which that data came. That
> code is noticeably and undesirably complex, and shows up in the profile even
> without any stats.

I wrote v1 of the code. It was not complex of time-consuming. There was no walking backwards. Please cite the bug where this was measured.

> Re-adding the per-item stats is not faster, and not acceptable.

This is a red herring -- please drop it. You're the only one re-raising it.

/be
Comment 72 (dormant account) 2010-05-26 11:22:34 PDT
(In reply to comment #71)
> (In reply to comment #70)
> > It's not about the stats! It's about the code which creates the fastload data:
> > if any stats are required, it has to be able to walk backwards from whatever
> > channel/buffer it currently has to the file from which that data came. That
> > code is noticeably and undesirably complex, and shows up in the profile even
> > without any stats.
> 
> I wrote v1 of the code. It was not complex of time-consuming. There was no
> walking backwards. Please cite the bug where this was measured.

a) Stat()s are a slipper slope.

Fastload is one of the more (likely needlessly) complex parts of startup. I am not sure of "walking backwards", but there is high startup cost to stat()ing. It's not just a matter of stat()ing the application jars. That would be cheap since we are going to open it anyway.  The problem is that firefox ships too many files that are nakedly laying around on disk(ie not in jars). To make matters worse, extensions add their own naked files which leads to the problem of ever-slower startup. So instead of a fixed O(1) cost of checking in one place whether the cache needs to be invalided we have O(N) checks for N files that are fastloaded. Since any sort of disk access is much slower than cpu, this hurts a lot. 
stat()ing every single extension jar is bad enough, stat()ing files outside of jars is silly given that normal users will be paying the startup price even if they have no interest in extension dev.

Having these just laying around makes it tempting to modify them and expect firefox to respond to those changes. That's what I refereed to this as a psychological issue earlier. I think moving Firefox to a single-application-file model(ie omnijar in bug 552121) will address this problem. In the ideal world to modify stock firefox a dev would have to modify the omnijar, which would then be a cheap check to make. It's unfortunate that relying on the filesystem is such a liability.

If we ever want to not startup slowly on realistic profiles we need to move away from the O(N) cost to startup where N is the number of files shipped by firefox + added via extensions.

b) Ben's comment on stat()s being cheap.
You are measurements are wrong, your kernel obviously isn't dropping the cache(I know /proc method is currently broken on fedora). If you look at disk specs, expecting a disk to do anything in <10ms is unrealistic. In real life stat()s that result in disk io cost 20-40ms minimum.
> 
> > Re-adding the per-item stats is not faster, and not acceptable.
> 
> This is a red herring -- please drop it. You're the only one re-raising it.

No, he is speaking for number of people who have tried to make this code less of a startup hog. I don't have the bugno handy but if my memory serves me right benh shaved off ~10% of startup on OSX by getting rid of typically pointless stat()s. Users shouldn't pay that so that developers can avoid passing a single commandline flag.
Comment 73 Ben Bucksch (:BenB) 2010-05-26 11:27:33 PDT
I don't see what's complex here. We stat() the JAR files (which you say is no problem). We take the most recent last modification time of them. We stat() the XUL/XPC.mfasl file(s). If the mfasl is older than the newest JAR file, we need to re-generate it mfasl.
Alternatively, you can write a line into the mfasl file with the last mod time of the newest JAR file at the time of mfasl generation, and when reading the mfasl file, compare that (instead of stat()ing the mfasl file) with the current last mod time of the JAR files.
Comment 74 (dormant account) 2010-05-26 11:30:02 PDT
(In reply to comment #73)
> I don't see what's complex here. We stat() the JAR files (which you say is no
> problem). We take the most recent last modification time of them. We stat() the
> XUL/XPC.mfasl file(s). If the mfasl is older than the newest JAR file, we need
> to re-generate it mfasl.
> Alternatively, you can write a line into the mfasl file with the last mod time
> of the newest JAR file at the time of mfasl generation, and when reading the
> mfasl file, compare that (instead of stat()ing the mfasl file) with the current
> last mod time of the JAR files.

The problem is that what you are asking for is stat()ing files on disk, not jar files. Not sure how jars even got dragged into this.
Comment 75 Ben Bucksch (:BenB) 2010-05-26 11:37:58 PDT
b) Ben's comment on stat()s being cheap.
> You are measurements are wrong, your kernel obviously isn't dropping the
> cache(I know /proc method is currently broken on fedora).

It is. I made other tests, and the numbers are dramatically different after
echo 7 > /proc/sys/vm/drop_caches
Your mileage may vary, but that's what I got and actually measured.

Also, please note that this is after Firefox started. At the time we do the stat(), the OS has already read a number of files from these dirs, so these dirs are likely to be in cache when we stat().

> Not sure how jars even got dragged into this.

I mean a file on disk with .jar extension.
I am talking about the following files:
firefox/chrome/
browser.jar
en-US.jar
pippki.jar
toolkit.jar
Plus typically one JAR file per extension. That's our whole chrome (excluding JS modules and JS components).
Comment 76 (dormant account) 2010-05-26 11:54:25 PDT
(In reply to comment #75)
> b) Ben's comment on stat()s being cheap.
> > You are measurements are wrong, your kernel obviously isn't dropping the
> > cache(I know /proc method is currently broken on fedora).
> 
> It is. I made other tests, and the numbers are dramatically different after
> echo 7 > /proc/sys/vm/drop_caches
> Your mileage may vary, but that's what I got and actually measured.
> 
> Also, please note that this is after Firefox started. At the time we do the
> stat(), the OS has already read a number of files from these dirs, so these
> dirs are likely to be in cache when we stat().

I'm not exactly sure what you are benchmarking here, but it's isn't raw cold stat()s. It may well be that some filesystems cache dirs more aggressively than others, but we can't rely on that. It could also be that some other piece of the code did a stat(), we'll be getting rid of those if we spot em :)

> 
> > Not sure how jars even got dragged into this.
> 
> I mean a file on disk with .jar extension.
> I am talking about the following files:
> firefox/chrome/
> browser.jar
> en-US.jar
> pippki.jar
> toolkit.jar
> Plus typically one JAR file per extension. That's our whole chrome (excluding
> JS modules and JS components).

Aside: pippki.jar shouldn't even be a problem as I've never seen it used.

Sounds like you are ok with no stat()ing files compromise now. Earlier you said that most of your problems are due to components, now you want jars stat()ed.

Currently jars are initialized after prefs are loaded, so disable_xul_fastload pref should work for files loaded from jars you listed. Is that not enough? 

Once we move components into a jar, we can then stat() that particular jar since components are loaded before prefs. Alternatively, I filed bug 562406 which if implemented would only store components in binary form so this whole idea of a component cache would go away(but I still haven't investigated the feasibility of that). Is that reasonable? If the components parts of this bug could be resolved by bumping up the priority on moving .js stuff to jars.
Comment 77 Nickolay_Ponomarev 2010-05-26 12:36:25 PDT
> Sounds like you are ok with no stat()ing files compromise now. Earlier you said
> that most of your problems are due to components, now you want jars stat()ed.

We're going in circles. Trying to clarify, I hope I don't contradict BenB with this comment:

1) Any files that are cached must be stat()-ed to ensure cache freshness (comment 15)
2) To help performance there should be few files to stat() (omnijar) in production.
3) To simplify development, release builds should support flat file layout (chrome, components, modules, etc.). (And stat every one of them.)
4) Any change in any source file (any JAR in production, any flat file in dev.) can invalidate all of the cache, no need to track dependencies (comment 68-69)
Comment 78 Brendan Eich [:brendan] 2010-05-26 12:38:45 PDT
(In reply to comment #73)
> I don't see what's complex here. We stat() the JAR files (which you say is no
> problem). We take the most recent last modification time of them. We stat() the
> XUL/XPC.mfasl file(s). If the mfasl is older than the newest JAR file, we need
> to re-generate it mfasl.
> Alternatively, you can write a line into the mfasl file with the last mod time
> of the newest JAR file at the time of mfasl generation, and when reading the
> mfasl file, compare that (instead of stat()ing the mfasl file) with the current
> last mod time of the JAR files.

This all assumes time is monotonic; it isn't in general. The original design I settled on after some testing back in the day used saved mtimes and checked for any difference. It did not try to stat archive members, though -- whole .jar file mtime only.

I remember benh fixing bugs, but no stat profile or timing data. I'd still like someone to cough up the bug number to review that data.

If we can bound the stats to a handful and show them not to cost significant time, the slippery slope argument is not particularly strong. We have slippery slopes all over, and Ts metrics to keep from sliding further. The real wins are still hiding in costs we pay at startup, as Taras knows well (Firefox Mobile on a desktop starts significantly faster than Firefox, today -- nothing to do with stat(2)).

Why does this matter? Here's a reductio ad absurdum: we won't switch from edit/(shift-)reload/debug cycle for XUL devs to "you must use Eclipse" or "you must run the Flex compiler" first. That would suck.

We know there's a line not to cross, we're simply arguing about where it lies. I'm still on the edit/(shift-)reload/debug side, but extra envars or cmd line parameters might not be fatal. There'll be a learning curve; some will fail to learn and give up after bizarre behavior. It seems worth considering some stat cost, if it is in the noise as measured on cold startup.

/be
Comment 79 Ben Bucksch (:BenB) 2010-05-26 12:44:50 PDT
> This all assumes time is monotonic; it isn't in general.

Sounds funny, but you're right, when considering wrong clocks, bad filesystems, remote filesystems etc.

> The original design I settled on after some testing back in the day
> used saved mtimes and checked for any difference.

Yup, that makes sense, I ended up doing the same in roaming/syncing.
That's actually pretty exactly what my "Alternative" proposal goes to.
(Just save all last mod times, not just the newest, fine.)
Comment 80 Benedict Hsieh [:benedict] 2010-05-26 14:02:42 PDT
https://bugzilla.mozilla.org/show_bug.cgi?id=510991#c8 is the result of landing both bug 511761 and bug 510991 together. We believe that perf wins from stat reductions are more than the sum of their parts, so either of those patches applied individually show much smaller wins. (Because of stat caching, killing ALL of the stats of some file is necessary to show a significant win.)

I only just now noticed Nickolay's comment on documentation in bug 511761, so once this patch lands I'll go through and update the docs he pointed to. Definitely would appreciate pointers to other places we need to update documentation as well, to make the learning curve as easy as possible.
Comment 81 (dormant account) 2010-05-26 14:09:38 PDT
(In reply to comment #78)
> (In reply to comment #73)
> This all assumes time is monotonic; it isn't in general. The original design I
> settled on after some testing back in the day used saved mtimes and checked for
> any difference. It did not try to stat archive members, though -- whole .jar
> file mtime only.

stat()-counts are a good indicator of overhead, counting potentially-expensive events is the best way to keep track of cold startup. I filed bug 559663 to productize this notion.

> 
> If we can bound the stats to a handful and show them not to cost significant
> time, the slippery slope argument is not particularly strong. We have slippery
> slopes all over, and Ts metrics to keep from sliding further. The real wins are
> still hiding in costs we pay at startup, as Taras knows well (Firefox Mobile on
> a desktop starts significantly faster than Firefox, today -- nothing to do with
> stat(2)).

You just optimized away my slippery slope argument with your giant if :)

> 
> Why does this matter? Here's a reductio ad absurdum: we won't switch from
> edit/(shift-)reload/debug cycle for XUL devs to "you must use Eclipse" or "you
> must run the Flex compiler" first. That would suck.

All I'm saying is that when given a choice between the number of users affected by slower startup and developers inconvenienced by an extra cmdline flag/pref I am firmly on the user's side.
 
> We know there's a line not to cross, we're simply arguing about where it lies.
> I'm still on the edit/(shift-)reload/debug side, but extra envars or cmd line
> parameters might not be fatal. There'll be a learning curve; some will fail to
> learn and give up after bizarre behavior. It seems worth considering some stat
> cost, if it is in the noise as measured on cold startup.

I would consider this if it does not regress our cold startup gains. Here is my proposal:
a) Move fastloaded application stuff into jars.
b) Allow checking of jar timestaps
c) Police extensions such that extension developers are aware of performance perils of naked files. bug 551714

I think BenB agrees with this plan of attack. This matches all of Nickolay's concerns except for 1) when extensions introduce non-jarred fastloaded files. I think once AMO has a no-raw files policy this can be relaxed too.

Does this sound reasonable?
Comment 82 Brendan Eich [:brendan] 2010-05-26 14:17:33 PDT
(In reply to comment #81)
> You just optimized away my slippery slope argument with your giant if :)

Why would we do otherwise? Because we once had stat storms is no reason to never use stat. We have had lots of bugs, many now fixed, but their infection surface, the code or pattern they inhabited, lives on. A bad pattern should be removed, of course, but stat-based dependency tracking (make) is not on its face a "bad pattern".

> > Why does this matter? Here's a reductio ad absurdum: we won't switch from
> > edit/(shift-)reload/debug cycle for XUL devs to "you must use Eclipse" or "you
> > must run the Flex compiler" first. That would suck.
> 
> All I'm saying is that when given a choice between the number of users affected
> by slower startup and developers inconvenienced by an extra cmdline flag/pref I
> am firmly on the user's side.

And (for the last time) I'm suggesting this is a false dilemma if there are only ~10 stats.

If a stat-storm is inevitable, I'd like to know why. I'd side with users too, but unlike my big "if", it seems to me you've concluded something that is not yet in evidence.

/be
Comment 83 (dormant account) 2010-05-26 14:26:29 PDT
(In reply to comment #82)
> (In reply to comment #81)
> > You just optimized away my slippery slope argument with your giant if :)
> 
> Why would we do otherwise? Because we once had stat storms is no reason to
> never use stat. We have had lots of bugs, many now fixed, but their infection
> surface, the code or pattern they inhabited, lives on. A bad pattern should be
> removed, of course, but stat-based dependency tracking (make) is not on its
> face a "bad pattern".

It's a terrible pattern when you consider cold cache performance implications of it. This is getting off topic.

> 
> > > Why does this matter? Here's a reductio ad absurdum: we won't switch from
> > > edit/(shift-)reload/debug cycle for XUL devs to "you must use Eclipse" or "you
> > > must run the Flex compiler" first. That would suck.
> > 
> > All I'm saying is that when given a choice between the number of users affected
> > by slower startup and developers inconvenienced by an extra cmdline flag/pref I
> > am firmly on the user's side.
> 
> And (for the last time) I'm suggesting this is a false dilemma if there are
> only ~10 stats.

This is a misconception. There are not 10 stat()s and we aren't just stat()ing jar files.

There are 110 modules/components that would've been stat()ed multiple times if it were not for benh's work. So at minimum the overhead would be 11x higher than ~10 stats.

> 
> If a stat-storm is inevitable, I'd like to know why. I'd side with users too,
> but unlike my big "if", it seems to me you've concluded something that is not
> yet in evidence.

I think stat()ing every module+component can be described as a stat()-storm.

Taras
Comment 84 Brendan Eich [:brendan] 2010-05-26 14:28:13 PDT
Taras, sorry -- I didn't mean to sound so fight-y. No one wants 110 stats, so I agree on the plan of attack you outline in comment 81.

/be
Comment 85 Ben Bucksch (:BenB) 2010-05-26 14:45:43 PDT
Taras, nobody is suggesting that we stat() every JS component and JS module. See comment 42. To clarify, I suggest:
1) Introduce commandline flag -purgecaches, as the last patch does.
   Cost for users: minimal
   Cost for developers: minimal. Easy to do for developers aware of the
   problem, but still an extra thing to do and what they need to *know* to do.
   |make|, at least in opt builds, is still broken.
   Benefit for developers: Cache is sure to work (not get in the way),
   if you use the flag.
2) stat() all chrome listed in .manifest files.
   That currently is 4 files for Firefox, plus typically 1 file per extension.
   Cost for users: Around 1ms.
   Cost for developers: None
   Benefit for developers: All XUL and chrome-JS just works. (Cache does
   not get into the way, without having to do anything.)
3) Later: Allow JS modules (Components.utils.import) to live in chrome JAR
   files (bug 406538).
   Cost for users: None. Actually, startup speed should *increase* notably,
   because that's a lot less stat() and open()s.
   Cost for developers: Small. JS modules work just like chrome, including
   URLs/manifest/location/build. One-time cost to move all JS module files
   to a different source code location and load URL.
   Benefit for developers: Together with 2), all JS modules just work.
   (Cache does not get into the way, without having to do anything.)
   Also, JS module files can have a much more logical source code location.
4) Later: Allow JS components to live in JAR files (bug 562406 or similar).
   Cost/Benefit: similar to 3)

Taras, sounds like a plan?
Comment 86 Ben Bucksch (:BenB) 2010-05-26 15:17:59 PDT
As for stat() cost: Taras was right in one way: When posting the numbers, I forgot that I use an Intel SSD, so it's fast. Numbers with HDDs are much worse. However, I was also right that the OS gets the relevant dirs (chrome/, particularly) in cache just by us starting up, so the stat() cost on them is small. The numbers:
- echo 7 > /proc/sys/vm/drop_caches
- App start (Firefox as XULRunner)
- Time to stat() 10 files in chrome/ ("AChrom") on HDD: 2ms
- Time to stat() 6 files in other dir on HDD: 35ms
- Time to stat() 10 files in chrome/ ("AChrom") on HDD: 2ms
- Time to stat() 6 files in other dir on HDD: 1ms
- echo 7 > /proc/sys/vm/drop_caches (not restarting app)
- Time to stat() 10 files in chrome/ ("AChrom") on HDD: 48ms
- Time to stat() 6 files in other dir on HDD: 35ms
- Time to stat() 10 files in chrome/ ("AChrom") on HDD: 2ms
- Time to stat() 6 files in other dir on HDD: 3ms
- Time to stat() 10 files in chrome/ ("AChrom") on HDD: 2ms
- Time to stat() 6 files in other dir on HDD: 1ms
- Time to stat() 10 files in chrome/ ("AChrom") on HDD: 2ms
- Time to stat() 6 files in other dir on HDD: 1ms
- Time to stat() 10 files in chrome/ ("AChrom") on HDD: 2ms
- Time to stat() 6 files in other dir on HDD: 1ms

So, the cost to stat() the .jar files in chrome/ is 2ms for users.
Comment 87 Ben Bucksch (:BenB) 2010-05-26 15:21:11 PDT
System: Athlon X2, 4 GB RAM, Ubuntu 10.04, Ubuntu kernel, Firefox trunk, Seagate HDD 320 GB. Other systems and esp. OS may vary a lot of course. Feel free to test yourself, with the code in comment 66, with
let dir = Components.classes["@mozilla.org/file/directory_service;1"]
               .getService(Components.interfaces.nsIProperties)
               .get("AChrom", Ci.nsIFile);
Comment 88 Nickolay_Ponomarev 2010-05-28 14:42:27 PDT
Please don't ignore comment 85. IMHO it's the most sensible solution to the problem, satisfying all of my items in comment 77 without introducing extra stat()s for the users.
Comment 89 (dormant account) 2010-05-28 14:54:39 PDT
(In reply to comment #88)
> Please don't ignore comment 85. IMHO it's the most sensible solution to the
> problem, satisfying all of my items in comment 77 without introducing extra
> stat()s for the users.

Sorry forgot to respond. I agree. Will look into addressing those soon.
Comment 91 Alexander L. Slovesnik 2010-06-13 12:17:40 PDT
Could this patch cause Bug 571771?
Comment 92 Armen Zambrano [:armenzg] (EDT/UTC-4) 2010-06-14 07:10:32 PDT
(In reply to comment #90)
> http://hg.mozilla.org/mozilla-central/rev/f7323647892a

This change has caused a regression on L10n repackages for mozilla-central.
We currently don't have L10n repackages since this landed.

> make
>  in dir /builds/moz2_slave/mozilla-central-linux-l10n-nightly/build/mozilla-central/config (timeout 1200 secs)
>  
> touch ../dist/bin/.purgecaches
> touch: cannot touch `../dist/bin/.purgecaches': No such file or directory
> make: *** [default] Error 1
> program finished with exit code 2
> elapsedTime=0.038792
Comment 93 Benedict Hsieh [:benedict] 2010-06-14 11:20:47 PDT
http://hg.mozilla.org/mozilla-central/rev/e64c04523139

I asked khuey to back out for me, I'll try to fix and get relanded. Might wait until bsmedberg is back from vacation, though.
Comment 94 Jonathan Protzenko [:protz] 2010-06-16 05:19:58 PDT
I just bumped into this bug, because I didn't understand why Thunderbird 3.2 did not pick up changes I made into .jsm files. I'm very supportive of comment #0, and I feel compelled to insist on the fact that when starting with XUL, it's a major pain to go through all MDC to find out you have to set some obscure prefs just to get your files to update properly. BTW, nglayout.debug.disable_xul_fastload is not even listed here https://developer.mozilla.org/en/Setting_up_extension_development_environment...

So, to say the least, I'd be very glad if there was a way for this to work out of the box :).
Comment 95 Ben Bucksch (:BenB) 2010-06-17 15:29:02 PDT
Comment 85:
> 3) Later: Allow JS modules (Components.utils.import) to live in chrome JAR
>    files (bug 406538).

mwu just fixed this. Weee!
Comment 96 Ben Bucksch (:BenB) 2010-06-18 01:57:47 PDT
> 4) Later: Allow JS components to live in JAR files (bug 562406 or similar).
>    Cost/Benefit: similar to 3)

Apparently, mwu fixed that, too.

Now, we'd need to actually put the JS modules and components into JARs. Seems like he's on that, too, by allowing resource: to be in the JAR, and allowing components to be loaded from the JAR.

In fact, his goal is for almost everything to be in one JAR file. If that's the case, all we need to do to fix this bug here is to make a very small number of stat()s on the JARs on startup.
Comment 97 (dormant account) 2010-06-18 09:38:47 PDT
(In reply to comment #96)
> > 4) Later: Allow JS components to live in JAR files (bug 562406 or similar).
> >    Cost/Benefit: similar to 3)
> 
> Apparently, mwu fixed that, too.
> 
> Now, we'd need to actually put the JS modules and components into JARs. Seems
> like he's on that, too, by allowing resource: to be in the JAR, and allowing
> components to be loaded from the JAR.
> 
> In fact, his goal is for almost everything to be in one JAR file. If that's the
> case, all we need to do to fix this bug here is to make a very small number of
> stat()s on the JARs on startup.

Wu rocks. Once desktop omnijar(blocker on this bug) lands, should be pretty easy to close this.
Comment 98 Benedict Hsieh [:benedict] 2010-06-22 16:29:43 PDT
Created attachment 453230 [details] [diff] [review]
make invalidation is debug-only, adds environment variable, fixes special build environments

Should fix issuing makes where ($dist)/bin isn't accessible for whatever reason, (bug 571771). Going to submit to try-server for a test run, but unfortunately I think tryserver doesn't test the l10n packaging anyways. Will ask for review if try comes back ok.
Comment 99 Benedict Hsieh [:benedict] 2010-06-22 17:34:28 PDT
Created attachment 453248 [details] [diff] [review]
453230: make invalidation is debug-only, adds environment variable, fixes special build environments

Edge-case: previous patch didn't touch the file on top-level make, this one does.
Comment 100 Nickolay_Ponomarev 2010-06-29 12:56:54 PDT
> Wu rocks. Once desktop omnijar(blocker on this bug) lands, should be
> pretty easy to close this.

Could the order be reversed please? I think this is really hurting developers trying out the latest builds. Here's the latest example: http://groups.google.com/group/mozilla.dev.extensions/browse_frm/thread/5cf8d1201420d902/47ea3bedd1eb90fe?hl=en&tvc=1&q=New+Add-Manager#47ea3bedd1eb90fe
Comment 101 Benedict Hsieh [:benedict] 2010-07-02 11:47:38 PDT
Created attachment 455732 [details] [diff] [review]
unbitrotted, make invalidation is debug-only, adds environment variable, fixes special build environments

Unbitrotted, I'll look for someone to push in a few hours. 

Nickolay or others, do you have any recommendations on where to add documentation for the -purgecaches flag, as well as reminding people about the nglayout.debug.disable_xul_cache pref?
Comment 102 Nickolay_Ponomarev 2010-07-02 14:21:37 PDT
See bug 511761 comment 54. I still think it would be much better to just continue stat'ing files and reduce the number of stat()s by fixing Firefox packaging (omnijar) and forcing extensions on AMO to use JARs for everything.
Comment 103 Brendan Eich [:brendan] 2010-07-02 17:33:14 PDT
From comment 95, comment 96, and comment 97, I too thought mwu fixed everything so we can do ~4 stats max, always. Not true?

/be
Comment 104 Dão Gottwald [:dao] 2010-07-03 01:11:41 PDT
dwitte landed this: http://hg.mozilla.org/mozilla-central/rev/4779f720159d
Comment 105 Benedict Hsieh [:benedict] 2010-07-13 11:57:37 PDT
Brendan, desktop omnijar (blocker for remaining work in this bug) is not done yet.

Should I get my patch landed on a branch?
Comment 106 Brendan Eich [:brendan] 2010-07-13 12:23:16 PDT
What's the schedule for omnijar?

We have blocking bugs all the time, any reason this can't wait for the omnijar blocker? Is it a patch-rot hazard situation?

/be
Comment 107 Nils Maier [:nmaier] 2010-07-15 01:11:51 PDT
Created attachment 457512 [details]
Workaround from comment 46 as installable toolkit extension

Since landing this might take some time and deleting files manually again and again is annoying:
Here is a minimal toolkit extension implementing the workaround from comment 46.

Since I'm commenting anyway:
I agree with all the folks saying that implementing this stuff this particular way just feels wrong. It will certainly affect lots of extension developers.
Comment 108 Brendan Eich [:brendan] 2010-07-15 08:06:06 PDT
Nils, are you referring to the temporary workaround(s) required until mwu's omnijar work lands (at which point we can have a few stats to automate the invalidation process for extension developers, and avoid any workarounds)? I am definitely supportive of mwu's work, stat-based automatic invalidation, etc. Is this still not the final fix everyone is anticipating?

If so, then is the workaround problem only temporary, and only for nightly build users? Could still be bad, I'm not minimizing its effects -- just making sure I am following the facts here.

/be
Comment 109 Nils Maier [:nmaier] 2010-07-15 09:23:14 PDT
Brendan,
yes, it is meant as a workaround until *something* that re-introduces stat'ing of at least the extension files gets landed. And for releases where not even purgecaches is present, i.e. the FX 4.0b1 release (yep, the only way to invalidate stuff is to delete files there, fiddle with compatiblity.ini or use the workaround my extension implements).

AMO began notifying people about 4.0b1, saying that it might be a good time to start updating their addons... purgecaches (or the lack of it in b1) actually hinders sane addon development here. (I just started a discussion about that on amo-editors)
I figure some people will be discouraged to use javascript modules, because "they don't work" and revert back to putting everything in overlay scripts.
Others will think that there is some bug (and there is: this one) and postpone updating their extension until it is fixed. Actually I was one of them, postponing updating my stuff for some month because I was waiting for a real fix.
What makes this even worse is that the purgecaches documentation is not very "discoverable" at this point (and doesn't work with 4.0b1)

Actually, at this point, I'm not that confident that purgecaches and bug 511761 get backed out again once onmijar is ready => or why else is that stuff still present after more than 6 months of messing around with workarounds of all kinds?
Please, back it out already and wait for omnijar to fix stuff.
Comment 110 Benjamin Smedberg [:bsmedberg] 2010-07-15 09:58:05 PDT
omnijar will help reduce stats on Firefox JARs, but it will not help at all with reducing stats on extension JARs, which is equally important during startup. And I firmly object to doing the tracking (URI to JARfile) which was removed in bug 511761: it will not be backed out as is.

We could certainly add a "slow mode" to the extension manager which stats extension files on startup (or maybe it would be sufficient to stat extension directories, which change mtime when any of their files change), and blows away the startup/fastload caches in that case.
Comment 111 Dave Townsend [:mossop] 2010-07-15 10:10:32 PDT
(In reply to comment #110)
> We could certainly add a "slow mode" to the extension manager which stats
> extension files on startup (or maybe it would be sufficient to stat extension
> directories, which change mtime when any of their files change), and blows away
> the startup/fastload caches in that case.

Note that we already stat extension's directories on startup and if their mtime has changed we consider it a re-install/upgrade and flush the caches. This is not reliable as some filesystems do not alter the directory's mtime when files change (FAT for example).
Comment 112 Brian Marshall 2010-07-26 02:05:52 PDT
(In reply to comment #111)
> Note that we already stat extension's directories on startup and if their mtime
> has changed we consider it a re-install/upgrade and flush the caches. This is
> not reliable as some filesystems do not alter the directory's mtime when files
> change (FAT for example).

And on ext3 at least, file changes only update the mtime of their immediate directory, not any parent directories. So for extension development, unless you edit chrome.manifest, install.rdf or some other file in the root directory, it's not going to trigger a cache update.
Comment 113 Nickolay_Ponomarev 2010-08-14 14:19:17 PDT
Feedback from the newsgroups about the -purgecaches flag:
> The "-purgecaches" option doesn't work when I restart the browser using
> the QuickRestart add-on. It only works when I manually restart the browser.
> It was very annoying. I first thought that "-purgecaches" option wasn't
> working.
http://groups.google.com/group/mozilla.dev.extensions/browse_thread/thread/116702878c962a64/d72bc1260f1f6b97?lnk=raot&pli=1

If we don't get stats back, what would be the recommended way to clear caches and restart in one step (from chrome JS)?
Comment 114 Dão Gottwald [:dao] 2010-09-01 05:00:46 PDT
What's the status of this bug? This keeps wasting developer resources (bug 520309 comment 129).
Comment 115 Benedict Hsieh [:benedict] 2010-09-01 10:51:40 PDT
We're working on a fix. The idea is to get jarred extensions (bug 533038) in place, then we can stat the things we need to in the extensions directory without losing too much on startup performance.
Comment 116 Benjamin Smedberg [:bsmedberg] 2010-09-16 14:11:49 PDT
The build-tree part of this bug landed, marking FIXED. The other part (extension stuff) is now in bug 594058.
Comment 117 Ted Mielczarek [:ted.mielczarek] 2011-02-01 05:06:38 PST
Did the environment variable / command line flag that were added here ever get documented anywhere?
Comment 118 Benedict Hsieh [:benedict] 2011-02-01 09:47:53 PST
I confess that I still have no idea where things like this ought to be documented. Someone other than myself has helpfully added notes here:
https://developer.mozilla.org/en/setting_up_extension_development_environment
https://developer.mozilla.org/En/Updating_extensions_for_Firefox_4

But I'd be happy to document elsewhere as well if you'd point me to the appropriate location.
Comment 119 Benedict Hsieh [:benedict] 2011-02-01 09:51:09 PST
Also, the hope is that the flag/envvar aren't needed by the majority of developers anymore, since we turned statting back on (bug 594058).
Comment 120 Michael Kraft [:morac] 2011-02-01 10:31:16 PST
(In reply to comment #119)
> Also, the hope is that the flag/envvar aren't needed by the majority of
> developers anymore, since we turned statting back on (bug 594058).

I just tested and it correctly picked up changes in my development directory without the flag.
Comment 121 Ben Bucksch (:BenB) 2011-02-01 10:35:23 PST
Benedict Hsieh, MDC setting_up_extension_development_environment is a very good place for documenting this. mozjonathan added the commandline flag, eric jung mentioned nglayout.debug.disable_xul_fastload and this bug here, in 11/12 june 2010.

Thanks a lot to everybody who helped with this bug and bug 594058, or made it possible!

Note You need to log in before you can comment on or make changes to this bug.