Closed Bug 40848 (thread) Opened 24 years ago Closed 17 years ago

each docshell should run on its own thread (one thread per frame)

Categories

(Core :: XUL, enhancement, P3)

RESOLVED WONTFIX
Future

People

(Reporter: bmh_ca, Assigned: brendan)

Details

(Keywords: arch, highrisk, perf, Whiteboard: [Hixie-3.0] p-iewin)

There would be substantial improvement in the quality of the UI if the general
UI and the gecko UI (as in web pages) were threaded.  This would also protect
against locked pages.

It would also be helpful if downloads could be through, say, wget or equivalent
external program (very *nix'ish), particularly for long downloads or unstable
connections where a program like wget is sensible.
I don't think we're going to be able to consider such changes for a while.
targeting for 'future', accepting.
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Target Milestone: --- → Future
Blocks: 26502
This has been discussed in the past. It would be Very Very Hard (tm) but would be
a help all round for UI responsiveness and general goodness and so on. (Ideally
we could maybe even catch crashes on a per-thread basis so that we don't crash the
entire app when one <frame> dies...)

I think this should be a Mozilla 3.0 blocker. ;-)
No longer blocks: 26502
Severity: trivial → enhancement
Keywords: perf
Summary: thread UI and gecko interfaces → each docshell should run on its own thread
Whiteboard: [Hixie-3.0]
Keywords: arch, highrisk
Summary: each docshell should run on its own thread → each docshell should run on its own thread (one thread per frame)
Keywords: 4xp
Whiteboard: [Hixie-3.0] → [Hixie-3.0] p-iewin
Reassigning to someone who is slightly more likely to eventually fix this.
Assignee: trudelle → hyatt
Status: ASSIGNED → NEW
QA Contact: jrgm → shrir
*** Bug 152691 has been marked as a duplicate of this bug. ***
bug 67752 comment 21 from Robert O'Callahan:
> ...there is one docshell for the chrome and another for content, so if you 
> could have separate UI threads for those then you would have a responsive UI 
> while reflowing. Unfortunately we'll never be able to have per-docshell threads
> because there's a lot of communication between docshells in the DOM, layout and
> views.

bug 67752 comment 23 from Ian 'Hixie' Hickson:
> The fact that the threads communicate should not be a problem, messages being 
> posted from one thread to another (e.g. status updates) can be buffered until 
> the target thread is ready to poll for them (e.g. using a message pool thread).
> I can't really think of any inter-docshell message which needs to be 
> synchronous/blocking, but then I don't really know the architecture that well.
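
For illustration only, the buffered posting described in the quoted comment could look roughly like the sketch below. The Mailbox class and its post/drain methods are hypothetical names, not anything in the Mozilla tree, and the sketch covers only one-way notifications such as status updates; it does not address calls that need an answer before the caller can continue.

```javascript
// Hypothetical sketch: a buffered, non-blocking mailbox between two workers.
// "Mailbox", "post" and "drain" are illustrative names, not Mozilla APIs.
class Mailbox {
  constructor() {
    this.pending = [];
  }
  // The sender never blocks: it appends and returns immediately.
  post(message) {
    this.pending.push(message);
  }
  // The receiver calls this when it is ready; messages come out in FIFO order.
  drain() {
    const batch = this.pending;
    this.pending = [];
    return batch;
  }
}

const chromeInbox = new Mailbox();
chromeInbox.post({ type: "status", text: "Loading..." });
chromeInbox.post({ type: "status", text: "Done" });

// Later, when the chrome side is idle, it polls for what has accumulated.
const received = chromeInbox.drain().map((m) => m.text);
```

The design choice here is that the sender pays nothing for a slow receiver, at the cost of the receiver seeing state that may already be stale by the time it polls.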

bug 67752 comment 24 from Robert O'Callahan:
> 1) Painting

Why would painting require synchronous inter-docshell communication? The only
case I can think of is when the docshell is transparent, and that's actually
quite rare (I thought most documents were opaque). I guess it would be a problem
in those cases though, since you really don't want painting to be blocked. But
isn't that a problem anyway? I mean, Windows can post WM_PAINT events to a
window at any time, it doesn't block for them. If that works, we could do
similar things for painting transparent docshells -- post a paint event to the
parent docshell and ask it to notify the child docshell again once it has painted.


> 2) Inter-doc DOM calls

These should be relatively rare.


> 3) Event handling

Should this really be synchronous across docshells? Wouldn't you just pass the
event up and wait for it to come back?


bug 67752 comment 25 from Brendan Eich:
> Our chrome gets lots and lots of DOM property values from content. Ask caillon,
> or study up before you make bold claims about how easy asynchronous messaging 
> is as a solution for docshells in threads needing to communicate.

Well, if you think it is too hard then fair enough. I'd be the first to admit
that I know very little about docshells and what they say to each other.
> Why would painting require synchronous inter-docshell communication?

The most common case is when you have one docshell with content that overlaps
some contained docshell, e.g. a document containing an IFRAME and a DIV where
the DIV overlaps the IFRAME. The DIV may be transparent of course. The way we
handle this is that when we paint the contained docshell's widget, it scans the
entire view tree (subject to optimizations, of course) to see if any parent or
sibling content needs to be painted on top of itself.
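
A rough sketch of the overlap scan described above, assuming each view is a plain rectangle with a z-order. The record shape ({name, x, y, w, h, z}) and the function names are made up for illustration and are not Gecko's view tree.

```javascript
// Hypothetical view records: {name, x, y, w, h, z}. Not Gecko's real view tree.
function intersects(a, b) {
  return a.x < b.x + b.w && b.x < a.x + a.w &&
         a.y < b.y + b.h && b.y < a.y + a.h;
}

// Return the views that must be painted on top of `target` after it paints,
// ordered from lowest to highest z.
function viewsToPaintOnTop(target, allViews) {
  return allViews
    .filter((v) => v !== target && v.z > target.z && intersects(v, target))
    .sort((a, b) => a.z - b.z);
}

const iframe = { name: "iframe", x: 0, y: 0, w: 100, h: 100, z: 1 };
const overlappingDiv = { name: "div", x: 50, y: 50, w: 100, h: 100, z: 2 };
const farAwayDiv = { name: "far", x: 500, y: 500, w: 10, h: 10, z: 3 };

const onTop = viewsToPaintOnTop(iframe, [iframe, overlappingDiv, farAwayDiv])
  .map((v) => v.name);
```

Even in this toy form, painting the IFRAME has to read geometry that belongs to the parent document, which is the cross-docshell dependency being discussed.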

[DOM calls]
> These should be relatively rare.

We'd still have to have a scheme for handling them. Note that tooltips and
context menus (at least) depend on DOM calls from the chrome into the content
document.

> Should this really be synchronous across docshells?

There's a problem with complex documents which overlap, basically the same
problem as for painting.
Oh well. :-(
Alias: thread
Would it make sense to have a thread per tab/window rather than a thread per
frame?  That would take care of the transparency problem, at least...
*** Bug 228374 has been marked as a duplicate of this bug. ***
(In reply to comment #8)
> Would it make sense to have a thread per tab/window rather than a thread per
> frame?  That would take care of the transparency problem, at least...

I like the idea of a thread per tab.
I may be wrong, but I think this would certainly help the issue of one slow
page totalling the UI until it is part loaded.
Just something that i feel should interest people is this comment regarding IE
7's tab design. From http://blogs.msdn.com/ie/archive/2005/05/26/422103.aspx


"One design decision worth calling out is that our current implementation is
fully multithreaded. Each tab is on a separate thread, and the frame is also on
its own thread. This has some impact on the overall footprint of IE, but we
believe this will allow IE7 to feel faster and provide an overall better user
experience. Internally this creates some additional complexity as we have to
deal with a lot of cross-thread communication, but it also gives us a way to do
things we wouldn’t otherwise be able to do with a single-threaded approach."

I think this gives a really good reason to fix what I and I am guessing many
others see as a weak point of firefox before the competition gets out with a
version of their browser that works in this way.
Unless you are going to work on this, or have a constructive technical comment,
do not add to this bug.

Here's a free clue: we are not going to have this bug fixed before the likely
beta release of IE7.  Will that matter to most end users?  Of course not.

Taking this bug, so it can get some technical attention.  Don't spam it, or I'll
do something drastic.

/be
Assignee: hyatt → brendan
(In reply to comment #12)
> Unless you are going to work on this, or have a constructive technical comment,
> do not add to this bug.
I was trying to point out that this is something that 'needs' fixing.
I would happily have a go at fixing it, but since my multithreaded C is at best
sh*te I really didn't figure it was a good idea.

I do however feel that you are wrong regarding users not caring.
When a user's browser appears to hang when loading a slow page the users I know
curse the browser and often kill it before it has recovered.

Anyway that's my 2p's worth, so I'll leave it there.
Indeed, most users do very much care about this, except that they don't know it - they're just cursing their browsers for slowing down / blocking when one of the tabs is hogging the CPU. YMMV on this assumption though.
*** Bug 358905 has been marked as a duplicate of this bug. ***
Blocks: 30942
Any progress on this? Could this get into firefox 3?
Would love to have this feature.
That's a good question, Ludootje. It's been just over seven years since I reported this bug.

It might look to the unwary eye as if nothing has been done on this bug. However, I'm certain a great many of the fine people working under the umbrella of Mozilla have considered the implications of a multi-process(or) universe, and how a project such as Firefox might, and ought to, benefit by taking advantage of modern multiple-processor hardware.

I hate to SPAM this list (one email every seven years is not so bad), but I think that in the age of multi-core processors, this bug ought to be addressed, one way or another. I suggest that what I have proposed (or the manner in which I have proposed it) may not be the best way to take advantage of multiple processes and/or threading, in which case this bug may be best marked WONTFIX. Perhaps the plethora of supported platforms precludes the viability of a fix, and again, it ought to be marked WONTFIX. However, it ought not to linger in limbo.

I suspect this is an especially complex problem, with a solution requiring either a great deal of brute force, nailing down an elegant design, or some combination thereof. Perhaps some brainstorming ought to go into how the Firefox/Mozilla et al platforms take advantage of multiple processors, through multi-threaded programming or otherwise.

In any event, it is the underlying premise of this bug that the user experience would be better by somehow taking advantage of threading and/or multiple processes. Concomitant and adjunct to that, this platform would significantly benefit from being able to take advantage of the extra processing power available in the now common multiple-core machines. As a result, this is a bug whose fix shall be of substantial benefit, and therefore a bug deserving attention.
This is not the best way to take advantage of multiple cores and has significant downsides. WONTFIX.
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → WONTFIX
... but this bug is not about taking advantage of multiple cores - that's just a useful side-effect. Read comment 0 again: It's not a performance bug essentially. While this is a complex problem, I believe threading is something Mozilla sorely misses, and I'm sure many many others share this view. Please don't mark WONTFIX just because "this is not the best way to take advantage of multiple cores".
Benjamin, kindly don't close my bugs for me. I was leaving this "attractive nuisance" around for a reason.

Brian: you can tell by reading my blog and roc's that we've thought about the many-core future:

http://weblogs.mozillazine.org/roadmap/archives/2007/02/threads_suck.html
http://weblogs.mozillazine.org/roc/archives/2005/12/night_of_the_living_threads.html
http://weblogs.mozillazine.org/roc/archives/2006/12/parallelism_1.html
http://weblogs.mozillazine.org/roc/archives/2007/05/performance_obs.html

Eyal: threads are a means to an end, they are *not* an end in themselves, especially not for "users". No one "misses" threads as an implementation technique; indeed Mozilla uses threads for several purposes and has for years.

Users and developers may miss lack of concurrency or hardware parallelism, but concurrency is not provided only by threading, and the parallelism has nothing to do with what this bug says "should" happen.

The many-core solution has nothing to do with docshells.

Comment 8 proposes a thread per tab, which is better than the misbegotten summary idea for this bug of one thread per docshell. But one thread per tab is all three of overkill, insufficient, and just wrong depending on the problem allegedly being solved:

- Tabs may reference one another's variables including DOMs using window.open, and any pair of communicating tabs must serialize all of their scripts to uphold the run-to-completion execution model that is part of the browser JS standard.

- Tabs usually finish layout quickly. Exceptions other than layout bugs to be fixed without threads include runaway scripts, which we police fairly well AFAICT, and runaway plugins.

- Runaway plugins need special treatment. See bug 156493 (and bug 230017, but it adds overhead without memory isolation).

I won't bother to repeat the main problems with threads mentioned in my blog post.

If we are to utilize hardware parallelism in the near term ignoring plugins, we should focus on the places where data is processed independently and intensively. Rendering disjoint regions of a pixel buffer or higher level "scene" is already being optimized using threads in game engines such as Valve's Steam engine:

https://www.cmpevents.com/GD07/a.asp?option=G&V=3&id=92357
https://www.cmpevents.com/sessions/GD/S3728i1.pdf

I'm going to re-open this bug, make a new non-nuisance meta-bug summarizing the separable issues, link other bugs as blockers of it, cite the new bug here, and *then* WONTFIX this bug. Benjamin, feel free to verify it at that point.

/be
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
(In reply to comment #23)
> Users and developers may miss lack of concurrency or hardware parallelism, but
> concurrency is not provided only by threading, and the

"lack of hardware"

> parallelism has nothing to do with what this bug says "should" happen.

/be
Status: REOPENED → ASSIGNED
New mega-meta-bug: Bug 384115.

/be
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → WONTFIX
(In reply to comment #23)
Ok, point taken about threads being means to an end, although I said 'threading' rather than 'the use of threads'; I guess I should have said 'concurrency in operation of the browser'.

> - Tabs may reference one another's variables including DOMs using window.open,
> and any pair of communicating tabs must serialize all of their scripts to
> uphold the run-to-completion execution model that is part of the browser JS
> standard.

To the extent that they reference each other's variables, then yeah, they should serialize, but this is rare AFAICT, and done on purpose when it's done, AFAIK, plus the main issue that concurrency or use of threads (say per tab) would solve is the ability to use with unrelated tabs without the work being done in one of them blocking you from interacting with the other one.

> - Tabs usually finish layout quickly. Exceptions other than layout bugs to be
> fixed without threads include runaway scripts, which we police fairly well
> AFAICT, and runaway plugins.
> - Runaway plugins need special treatment. See bug 156493 (and bug 230017, but
> it adds overhead without memory isolation).

'Quickly' is a relative term. Plus, sometimes plugins aren't 'runaway' but just take their time, like initializing Java.

Finally, remember that many bugs remain unresolved for many years and some new ones surface (especially when it comes to plugins), so the situation in which a tab-related hang forces you to close the browser altogether is something I would like to see solved rather than supposed-to-be-avoided-eventually - and it seems the use of threads is a way to do this.

(have not had time to read the blog entries yet although I know some people dislike threads).
(In reply to comment #26)
> (In reply to comment #23)
> Ok, point taken about threads being means to an end, although I said
> 'threading' rather than 'the use of threads'; I guess I should have said
> 'concurrency in operation of the browser'.

Ok, that is better but still abstract. The way to proceed, at least in bugzilla, is from symptoms to causes.

So the new bug 384115 can have comments naming bugs that might block it, in the sense that they are bugs to-do with UI starvation or other symptoms that we believe can be fixed by taking advantage of hardware parallelism where it exists.

Other bugs that involve starvation or suboptimal scheduling of ideally concurrent operations may not want or need parallelism, and they may be real bugs -- they just don't need to block bug 384115. We could have another metabug (there may be one already) blocked by these, or all, "responsiveness" or "scheduling" bugs.

> > - Tabs may reference one another's variables including DOMs using window.open,
> > and any pair of communicating tabs must serialize all of their scripts to
> > uphold the run-to-completion execution model that is part of the browser JS
> > standard.
> 
> To the extent that they reference each other's variables, then yeah, they
> should serialize, but this is rare AFAICT,

It doesn't matter how rare it is. The problem is not solved by saying how often it happens. Consider:

Window A:
  i = 1
  document.write(i)
  i = 2

Window B:
  w = window.open("", "A")
  w.i = 3

The two windows race to load. What is written in A by the document.write call? It must be 1, not 3 (and not 2, of course).
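
The example above can be modelled without a browser. In this sketch each window is a plain object and each script is a single task; windowA, runToCompletion and interleaved are hypothetical names, and the object lookup merely stands in for window.open("", "A").

```javascript
// Hypothetical model: window A is a plain object of globals, and each script
// is one task that runs to completion before the next task starts.
function runToCompletion(order) {
  const windowA = { written: [] };
  const tasks = {
    A: () => {
      windowA.i = 1;
      windowA.written.push(windowA.i); // stands in for document.write(i)
      windowA.i = 2;
    },
    B: () => {
      const w = windowA; // stands in for window.open("", "A")
      w.i = 3;
    },
  };
  for (const name of order) tasks[name](); // no interleaving inside a task
  return { written: windowA.written, finalI: windowA.i };
}

// What a naive thread per window could allow: B runs between A's statements.
function interleaved() {
  const windowA = { written: [] };
  windowA.i = 1;                   // A, statement 1
  windowA.i = 3;                   // B preempts A here
  windowA.written.push(windowA.i); // A, statement 2 now writes 3
  windowA.i = 2;                   // A, statement 3
  return windowA.written;
}

const aFirst = runToCompletion(["A", "B"]);
const bFirst = runToCompletion(["B", "A"]);
const broken = interleaved();
```

Whichever window wins the race, the run-to-completion model writes 1; only the interleaved schedule writes 3, which is the outcome the execution model forbids.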

If you try to multithread the two windows (which could be tabs, iframes, frames in frameset, or top-level windows), you'll have to join the threads somehow when B executes window.open, before it returns the reference to A that's assigned to w. But if you make B wait for A's script to complete, you will leave B vulnerable to having its currently-running script's invariants violated by script in another window C that calls window.open("", "B") and messes with mutable state reachable via the returned ref to B.

What's more, if A opens B while running, and B opens A, you could have a deadlock. It could be fixed by making one script lose and be preempted by the other, possibly involving invariants varying due to the winner mutating the loser's variables. Such a rule would be an incompatible change to the de-facto standard execution model.
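
The deadlock case can be pictured as a cycle in a wait-for graph. The sketch below is an assumption-laden illustration: waitsFor and findDeadlock are hypothetical names, and each window is assumed to wait on at most one other window.

```javascript
// Hypothetical wait-for graph: waitsFor.get("B") === "A" means the script in B
// is blocked until the script in A completes.
function findDeadlock(waitsFor, start) {
  const seen = new Set();
  let current = start;
  while (waitsFor.has(current)) {
    if (seen.has(current)) return true; // back at a window already visited
    seen.add(current);
    current = waitsFor.get(current);
  }
  return false; // the chain ended at a window that is not waiting
}

// A opened B while running, and B opened A: each waits on the other.
const mutual = new Map([["A", "B"], ["B", "A"]]);
// Only B waits on A: A can finish, then B proceeds.
const oneWay = new Map([["B", "A"]]);

const mutualDeadlocks = findDeadlock(mutual, "A");
const oneWayDeadlocks = findDeadlock(oneWay, "B");
```

Detecting the cycle is the easy part; as the comment says, breaking it means picking a loser to preempt, and that is where the incompatibility with the existing execution model comes in.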

Note that windows may load sequences of documents, some of which run scripts that address other windows (and so join with their threads) while others do not address any other windows.

So while scripts from different (unrelated, cf. document.domain) domains may not address one another's variables, a window might in the future load a doc from the same domain as another window. So you cannot assign windows to threads _a priori_ based only on current document domains.

Windows are addressed by name, and even an unnamed window may be given a name after it has been created (and so referenced by script from another window).

> and done on purpose when it's done,

That's irrelevant too, because the APIs do not require cooperating windows to declare their intentions up front. Note that the *windows*, not documents that may or may not be presently loaded in them, must join or leave "threads" in order to preserve run-to-completion.

> AFAIK, plus the main issue that concurrency or use of threads (say per tab)
> would solve is the ability to use with unrelated tabs without the work being
> done in one of them blocking you from interacting with the other one.

That's one way to implement things. It has some serious drawbacks that I've already blogged about. There are browsers that claim better responsiveness without a thread per tab. There are many ways to skin this cat, and picking one based on feelings could lead to very bad outcomes. So please, stop with the cart before the horse already! :-/

> > - Tabs usually finish layout quickly. Exceptions other than layout bugs to be
> > fixed without threads include runaway scripts, which we police fairly well
> > AFAICT, and runaway plugins.
> > - Runaway plugins need special treatment. See bug 156493 (and bug 230017, but
> > it adds overhead without memory isolation).
> 
> 'Quickly' is a relative term.

I was talking about layout, which is code we control. See roc's first blog post linked above.

> Plus, sometimes plugins aren't 'runaway' but just
> take their time, like initializing Java.

Now you are changing the subject. We've been over plugins, which are a distinct issue from scheduling of our layout code; see bug 156493.

> Finally, remember that many bugs remain unresolved for many years and some new
> ones surface (especially when it comes to plugins), so the situation in which a
> tab-related hang forces you to close the browser altogether is something I
> would like to see solved rather than supposed-to-be-avoided-eventually

That's fine, but who said otherwise? I think this is a straw man...

> - and it
> seems the use of threads is a way to do this.

... a straw man riding in a cart in front of the horse (and so going nowhere).

> (have not had time to read the blog entries yet although I know some people
> dislike threads).

It's not a matter of "like" or "dislike" -- although you seem enthusiastic enough to jump to the "thread" implementation-conclusion. Please, enough with the feelings (this ain't "Oprah"! ;-). There are serious trade-offs to assess here based on experience, real data, sound theory, and least-cost-to-fix judgments.

/be
Please de-emphasize the "hardware parallelism".

The crucial point here is that an OS can always do better multitasking than the browser. Regardless of how many CPU cores are on that machine.

My main complaint was that one un-responsive tab / window / plugin / whatever seems to block the entire browser. I am just a dumb user (OK, maybe not, but let's pretend), I don't care about those fine programming points you smart rocket scientists are talking about. I just care that the whole browser is waiting for a single part of it, and sometimes it waits forever.

"Splitting" the browser in smaller execution units is one way to solve this problem. There might be other ways too, but the important thing is to make 100% sure that one small part of the browser does not bog down the whole enchilada. How that is accomplished matters less, but just look at the problem from a dumb user's point of view - so Java has issues loading, therefore my whole browser got stuck? What is this, nineteen eigthy five or something? :-)

Running one thread (or process, or in general one execution unit) per docshell was just an idea, stemming from the fact that the modern OSes are pretty good at multitasking, regardless of how many cores are on the system. Even with just one core, the OS does a good job at distributing CPU cycles to the various execution units.
(In reply to comment #28)
> Please de-emphasize the "hardware parallelism".

Please stop spamming a RESOLVED WONTFIX bug.

> The crucial point here is that an OS can always do better multitasking than the
> browser.

False.

> Regardless of how many CPU cores are on that machine.

*plonk*

/be
(In reply to comment #29)
> (In reply to comment #28)
> > Please de-emphasize the "hardware parallelism".
> 
> Please stop spamming a RESOLVED WONTFIX bug.
> 
> > The crucial point here is that an OS can always do better multitasking than the
> > browser.
> 
> False.

Ok, that was mean. Sorry. If you insist on defining the problem to dictate the solution, then of course "multitasking" is the OS's job.

But responsive browser UI with windows and tabs galore is not "multitasking". I dissent. Many browsers are responsive (modulo plugins, separate issue, dealt with via processes in Konq, e.g.) without threads.

I really do object to putting the thread cart before the various horses (lack of UI responsiveness, lack of CPU utilization on multicores, other throughput and latency complaints) whose best solutions *may or may not* have anything to do with threads.

/be 

 
> The crucial point here is that an OS can always do better multitasking than the
---
Old topic, but FYI -- the above is usually false.

The OS doesn't know the work load of the applications nor does it know the problem space.  The OS has no way of knowing how a problem should be "partitioned" (or broken into chunks that can be run efficiently in parallel).

Part of the key is "efficiently".  Unfortunately, it also depends on architecture to some extent.  In a worst case, think of a problem space that won't fit in memory and needs to be paged off of disk.   A given partition may have dependencies on other partitions -- say you are computing some environmental condition which involves adding in partial components from the surrounding (in space) partitions.  In a graphical sense it's like anti-aliasing.  You can't compute the value of a single dot and display it.  You have to compute, then look at surrounding pixels and add in their effect to determine the final value of the current pixel.  An OS could likely provide a "draw-line" function that anti-aliases more effectively than average applications, but if I'm computing ocean or weather patterns by breaking down 3-D space into adjacent partitions, there's no way the OS could begin to know enough about the problem to intelligently partition the problem.  The application is in the position to know how to lay out the data, to allow, perhaps, adjacent cells to be processed in parallel, so the different processors are all accessing values that are likely in memory.
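
A small sketch of the partitioning point, using a 1-D three-point average as a stand-in for the anti-aliasing and weather examples. The function names and the one-cell "halo" scheme are assumptions chosen for illustration; the chunks are processed in a plain loop here rather than on separate cores.

```javascript
// Three-point average of a 1-D grid; edge cells reuse their own value.
function smooth(grid) {
  return grid.map((v, i) => {
    const left = i > 0 ? grid[i - 1] : v;
    const right = i < grid.length - 1 ? grid[i + 1] : v;
    return (left + v + right) / 3;
  });
}

// Split into chunks, giving each chunk one "halo" cell from each neighbour so
// that every chunk could be processed independently of the others.
function smoothPartitioned(grid, chunkSize) {
  const out = [];
  for (let start = 0; start < grid.length; start += chunkSize) {
    const end = Math.min(start + chunkSize, grid.length);
    const lo = Math.max(start - 1, 0);
    const hi = Math.min(end + 1, grid.length);
    const local = smooth(grid.slice(lo, hi));
    // Keep only the cells this chunk owns; halo results are discarded.
    out.push(...local.slice(start - lo, start - lo + (end - start)));
  }
  return out;
}

const grid = [3, 6, 9, 12, 15, 18, 21];
const whole = smooth(grid);
const partitioned = smoothPartitioned(grid, 3);
```

Only the application knows that one halo cell per side is enough for this computation; a scheduler that sees nothing but threads cannot infer that.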

On a more finely tuned 'note', the same thing happens with L2 caches.  In a 2 socket x 2 core layout, the 2-cores on the same socket may be sharing 8MB of cache.  I believe the 1st generation quad chips had the same setup -- 2 pairs of processors with each pair sharing a cache.
Is there a possibility of this getting higher attention now that IE8 has this feature? It would suck to have a major feature advantage go to IE8. 
Matt: IE8 has a thread per tab, or a process? But based on the answer, please don't go over to the open bug on my list and spam there.

/be
IE8 runs tabs in separate processes, here's their "crash recovery" whitepaper

http://code.msdn.microsoft.com/Release/ProjectReleases.aspx?ProjectName=ie8whitepapers&ReleaseId=582
When Firefox feels unresponsive because of the activity in one tab, it is almost always because the webpage is badly written and Firefox is following the bad instructions it is given.  Plugins, and bad JavaScript - those are the causes mentioned in the Mozillazine KB for high CPU usage, and they are the truth.  Both of those issues can much more easily be fixed by addressing them directly (not allowing JavaScript to run busy loops, and perhaps a sandboxing method for Plugins that limits their max CPU).  Threading will add major complexity and is a bad solution to these problems.  As a user, for now, if you have these problems, the best solution is the NoScript extension and to uninstall all your plugins :)
While this might be true - threads add complexity - the answer seems another of those well-known 'it is not our problem' escapes for which the Firefox project has been criticized in the past (deserved or not). It is clear that the problem exists: Firefox and Seamonkey both stall when something time-consuming happens in one of the tabs or windows. Sometimes it is true that the something can be pinpointed at a specific piece of bad code or a sloppily programmed plugin. Sometimes it can not be pinpointed at all, and sometimes it is clearly Firefox or Seamonkey which misbehaves. Running noscript (or a script-blocking proxy) and plugin-free does not solve the latter situations. A solution to all these problems would be to rectify all errors in scripts everywhere, fix all bugs in the plugins and all those in the browsers and rendering engines. This solution can be found somewhere over the rainbow - just hop on your flying pig and before you know it you're there.

Meanwhile here in the imperfect world we're still stuck - sometimes literally - with a browser which won't browse. What can we do about that? Maybe threads are not the answer, maybe they are. To just state 'WONTFIX' sounds like escapism.
This bug is about threads; if threads are not the answer, WONTFIX isn't escapism, it's realism.  (And I don't think that threads are the answer -- they weren't for IE either, as seen above.)

Unsourced and vague criticisms (that you point out may not be deserved) are not germane to an engineering solution, only to posturing and armchair architecture.  Do some research and see where thread boundaries could be placed, and under what circumstances they would need to be crossed.  Isolate the workloads that cause you to be unable to browse (!), and see how they map to your proposed thread-based decomposition.  Or, if you won't do such research -- perhaps because you think that understanding the problem or helping to solve it is 'not your problem' -- then at least be silent while you wait for someone else to do the work.

App responsiveness is our problem, and we work all the time to improve it.  That the application is not multi-threaded is _not_ the problem, it's just a characteristic.
Then may I propose a new bug report to clear up the obvious bias ('app is unresponsive - use threads to solve this') of this report and return to the core of the issue: FF and SM (and others) inability to keep misbehaving windows/tabs/frames/iframes/... in check. Not a 'bug report' as such but a place to discuss potential solutions to this (real) problem? And then make this bug a duplicate of that one? That way the discussion (and potential solution) of the issue can continue without it being doomed up front for being biased wrt threads. I'm no fan of abuse of threads either but I also would like to see these problems with FF/SM/.. solved.

Or have I missed a bug like this already active in bugzilla?
You could start a thread on a newsgroup if you want to explore options, or you could "propose a new bug" through the usual Bugzilla mechanisms.  Bugzilla isn't really the best place for exploratory research, since it tends towards task orientation: if you want to _do_research_, you can do that and post results in .platform; as far as simply advocating that "someone do something" to improve non-specific cases of poor responsiveness, I don't think that market niche needs filling.  I don't think that the application suffers from a lack of bugs filed about its responsiveness, so I do strongly recommend starting a discussion somewhere on an appropriate newsgroup, and I do strongly recommend starting it with some data, or concrete plans that you will be following.

This bug would not be a duplicate of that other bug, and this bug does not need any other resolution than the appropriate one it already has.  Nor does this bug need more people explaining the importance of responsiveness, or using it to learn how to use bugzilla.
Research on a solution should not be a prerequisite to a bug documenting a problem.  Frank, see bug 384115 and bug 156493.  

This bug isn't going anywhere, so let's not continue to spam it.  If we want a new bug "UI should not hang when pages load or misbehave", that should be independent of this bug and possibly hooked into 384115.  I'd be shocked if there weren't a dozen bugs about this already, so finding those, figuring out why they were closed, and possibly re-opening one, would seem appropriate.
> Then may I propose a new bug report to clear up the obvious bias ('app is
> unresponsive - use threads to solve this') of this report and return to
> the core of the issue: ... inability to keep misbehaving
> windows/tabs/frames/iframes/... in check

That's exactly what I did with bug 384323 ("UI responsiveness - core/platform - meta bug"). Please don't spam it with comments "This needs to be fixed", but with concrete proposals along the line that shaver pointed out in comment 39.

FWIW, bug 384115 ("exploiting hardware parallelism") is not necessarily about responsiveness; performance and responsiveness are related only from a superficial view. Even humans (one or a team) can work faster on one task when they are not interrupted, but then they are unresponsive to other tasks.
> concrete proposals

(I meant: They'd be new bugs and only blocking the meta-bug. General "we could do this" chat should be on the newsgroup, as shaver said.)
I run noscript all the time as a matter of course, so I'm reasonably sure 99% of my problems are not due to bad scripts.

I understand the 'not wanting' to rewrite everything to be "fully pre-emptable", but there are things to do that are 'short' of that mark.

An example of a good application that isn't very multi-threaded is "squid" (it has a few threads, but the main thread wasn't one of them last time I checked), which does things pretty much off of a 'select' type arrangement.  

Things that can take 'excess' time -- ESPECIALLY DNS LOOKUPS, are relegated to one or more 'helper' threads.  Anything that can cause any sort of I/O (including network I/O) is a "candidate" for getting its own thread depending on the 'installation'.  Example - with some file systems, deleting files is slow (in CPU time, sometimes, noticeable in human-time).  Depending on what your base FS is, or what caching 'backend' you use, there is the option of using a separate "delete thread", so the main-loop doesn't block.

I'm sorry I don't know more of the exact detail and structure of FF to make exact examples, but since it is a graphics application, it is likely to run off of some "mainloop"-type structure where things are dispatched one at a time.  The key is to not have any of the dispatched calls hold things up.  If one of them gets caught doing a DNS lookup, the entire application hangs.  

It isn't desirable (at this time) to try to rewrite all the windowing systems that FF runs on to make sure all of them are thread-safe (I'm presuming).  But that doesn't mean that each "dispatch" can't be broken down into finer dispatches.  That means, for example, the ability to save state at certain allowed pre-emption points.

Having code-decided pre-emption points isn't the same as being fully pre-emptible w/multitasking, but it can help.  Remember, linux didn't get rid of the 'big-lock' usage overnight.  It took years.

But as an example, DNS lookups need a way to check to see if they are already cached (internal to FF), -- if so we know that a "local-within-FF" DNS lookup won't hang.  But outside of that -- we need a way to suspend this "command"
(where command is currently 1 indivisible call from the main-window event loop)
and not synchronously wait on the DNS lookup which can go into multiple seconds.

But if a way to "save state" can be added, covering "just DNS lookups" as a stage 1 fix, so that the command can later be re-dispatched to continue, I suspect that
many or most of the 'hangs', _in_ _non-script_ _cases_ would disappear.
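
A rough sketch of that stage 1 idea, in Python and under stated assumptions: the Loop class, its dns_cache, and the resolve/lookup_finished names are all hypothetical and are not FF internals. A cache hit continues immediately; a miss parks the rest of the command as a continuation and returns to the dispatcher instead of waiting.

```python
from collections import deque

class Loop:
    """Toy single-threaded dispatcher (hypothetical, not Firefox's event loop)."""

    def __init__(self):
        self.ready = deque()   # callables that are ready to run
        self.dns_cache = {}    # host -> address
        self.waiting = {}      # host -> continuations parked on a lookup

    def resolve(self, host, continuation):
        if host in self.dns_cache:
            # Cache hit: this cannot block, so continue right away.
            continuation(self.dns_cache[host])
        else:
            # Cache miss: save the rest of the command and return to the loop.
            self.waiting.setdefault(host, []).append(continuation)

    def lookup_finished(self, host, addr):
        # Called when the slow lookup completes (for example from a helper thread's result).
        self.dns_cache[host] = addr
        for cont in self.waiting.pop(host, []):
            self.ready.append(lambda c=cont: c(addr))

    def run_ready(self):
        # Re-dispatch every command whose lookup has completed.
        while self.ready:
            self.ready.popleft()()

log = []
loop = Loop()
loop.resolve("example.org", lambda addr: log.append(("page-a", addr)))  # miss: parked
log.append("ui-event")                       # the loop is free to do other work
loop.lookup_finished("example.org", "192.0.2.1")
loop.run_ready()                             # page-a is re-dispatched and continues
loop.resolve("example.org", lambda addr: log.append(("page-b", addr)))  # hit: runs at once
```

The "saved state" here is just whatever the continuation closes over; a real implementation would have to decide what state each kind of command needs to carry across the wait.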

So it's too much work to go 'multi-threaded', "now".
Is it too much work to add a way to programmatically save state (in a dispatched call), so it can return to the main dispatcher and give "responsiveness" to the UI or other non-blocked dispatchable "work-units"?

Maybe Firefox needs to become more like an OS, in that it "manages" the tasks that are dispatched off the main dispatch loop -- and if more than a "timeslice" has run updating 1 dispatched-command, set a flag that can be checked at 
"pre-emptable points" to stop updating the page (or dispatchable command) that is hogging the dispatch loop, save its state (at some *SAFE* pre-emption point) and relinquish control to the main-dispatch loop...give the other commands waiting on dispatch their chance to run?  Is this type of
"safe-pre-emption-point" technology possible to implement and migrate toward as a means of helping "responsiveness"?

LindaW
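
One way to picture the "safe pre-emption point" idea from the comment above is with Python generators, where each yield marks a point at which the task may be suspended and its local state is kept automatically. This is only a sketch under assumptions: render_page, dispatch and the timeslice parameter are hypothetical names, and nothing here reflects how FF actually schedules work.

```python
import time
from collections import deque

def render_page(name, steps, log):
    # Each `yield` is a safe pre-emption point; the generator keeps its local state.
    for i in range(steps):
        log.append((name, i))   # one unit of work
        yield

def dispatch(tasks, timeslice=0.005):
    pending = deque(tasks)
    while pending:
        task = pending.popleft()
        deadline = time.monotonic() + timeslice
        for _ in task:                         # run to the next safe point
            if time.monotonic() >= deadline:   # timeslice used up
                pending.append(task)           # re-dispatch this task later
                break
        # If the for loop ends without `break`, the task has finished.

log = []
# timeslice=0.0 forces a switch at every safe point, so the order is easy to see.
dispatch([render_page("a", 2, log), render_page("b", 2, log)], timeslice=0.0)
print(log)
```

With a zero timeslice the two tasks simply alternate one step at a time; with a larger one, a task keeps the loop until its slice is used up and only then gives way at the next safe point. A task that never reaches a safe point can still hog the loop, which is the main limit of this approach compared with true pre-emption.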

Component: XP Toolkit/Widgets: XUL → XUL
QA Contact: shrir → xptoolkit.widgets
what Linda is describing above might be a well known design approach
covered by the term "asynchronous IO operation".

from all i've read here, there is not even much proven knowledge of where
those hangs actually originate - the given guesses might be valid, but
there can still be other things we would never think about.

so the major task before going into any sort of a change/fix process
is probably the analysis of the problem using tools that provide a deeper
understanding of the problem itself - that's what is probably missing.

in a classical program you would use a stock debugger; in such a big system
you probably have to come up with something more advanced, or even newly
developed and tailored for the single purpose of the target application.

i just know that i don't know much about the true reasons for the problem,
so my only decision can be to measure, test, probe, manipulate and
inspect the system. of course a valid blind-guess approach would be to
just drop most existing code and replace it with something else that is
supposed to be a bit different. it might work, or it might not.

designing one's own set of diagnostic tools is more valuable when the
intention is to keep a good part of them even if some major parts of the
code get changed after the first insights. it's probably much more valuable
if such diagnostics are usable from the first line of beta
coding to the very last moment of code execution at the end user's desk.

the whole thing is sort of a dilemma of not knowing what needs to be done
when the next stage of evolution steps in. but not starting any research
at all is probably the "best chance" of never fixing the problem as well.

mankind, for example, can only see the surface of the sun, yet it still
can create more or less detailed models to get an understanding
of the interior of that big star in the center of our solar system.
with software like FF it's different, because we can tap it anywhere we want
whilst it is in true operation and get first-grade authentic results
from doing so. at least that was some theory some FS/OS guru once expressed.
Depends on: supersnappy
WONTFIX!! will this be reconsidered after the recent changes to OMTC and E10S?