Closed Bug 371505 Opened 15 years ago Closed 14 years ago

OS/2 repaint issues with Thebes

Categories

(Core Graveyard :: Widget: OS/2, defect)

x86
OS/2
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mozilla, Assigned: mozilla)

References

Details

Attachments

(2 files, 9 obsolete files)

At least for me cairo-os2 builds mess up the screen quite a bit. During page scrolling the invalidated area always gets filled again with the bits that were there before. I suspect that we need some extra code in widget to adapt repaints for Thebes.
Attached patch Better solve size=0 problem (obsolete) — Splinter Review
This doesn't really change anything with the behavior but reminded by Andy's patch in the newsgroup I thought cleaning up the handling of an empty surface (width or height of zero) might be a good idea.
Comment on attachment 266288 [details] [diff] [review]
Better solve size=0 problem

Hmm, not only does this not solve anything, it also causes dialogs (like security alert boxes or the Options dialog) to not display/paint at all.
Attachment #266288 - Attachment is obsolete: true
Attached patch adds invalidate (obsolete) — Splinter Review
This is not a fix as it uses to much CPU but does allow scrolling and I can now see what I am typing into the URL bar (though it still hard to tell when I have the cursor in the bar).  Hopefully, this will help point to what we need to do.
I was able to work with Doodle over IRC and we have a working solution (enough so that I am working from it here, though it is slow).  Here are some of his comments:
<Doodle>	what I don't like is that the cairo surface is created and destroyed every time a window painting is needed.
I'd create a cairo surface when the window itself is created, and would destroy it with the window.
Then the OnPaint() would be much simpler,
it would only need the thebesContext to draw into the cairo surface, and in the current version all the redrawn stuffs would automatically appear on the screen.
What we have right now does all these for every window part redrawing:
- Creates a cairo surface (allocating a big chunk of memory)
- Copies the pixels from the window to the surface
- Draws on the surface
- Copies back the pixels from surface to window
- Destroys surface
that's why it's slow.
What I don't understand in the current code is that what is a gfxOS2Surface needed for? All I can see is that it creates a Cairo surface. That piece of code could be easily moved into nsWindow class.
well, the nsWindow implementation is already platform specific, so I'd move the Cairo surface creation in there. It would make a lot of thing much more easy.
1, the constructor of nsWindow would create a cairo surface for the window
2, the OnResize() would also resize the cairo surface
3, the OnPaint() would be very simple, only the real drawing would have to be done in there.
4, the destructor would destroy the cairo surface
This passes the whole window area to gfxOS2Surface.  It fixes the ugliness that was but is quite slow this way.
Attachment #268191 - Attachment is obsolete: true
Comment on attachment 268218 [details] [diff] [review]
[checked in] whole area

This just repaints the whole window instead of just the invalidated region, no wonder it is slow, especially when typing.

I will check this in anyway, once the tree reopens, as a preliminary solution (with fixed indents and a comment). But we should definitely leave the bug open while we continue to search for the real solution.
Attachment #268218 - Flags: review+
Regarding Doodle's comments in comment 4, yes, we should find a way to do things faster. I don't think we can get rid of the gfxOS2Surface class, though. I am pretty sure that we need a subclass of gfxASurface for Thebes to work. Or have you tried to remove it and it built?

I don't think that the pixel copying is the real problem, as long as only the invalidated part of the window changes for most things. It works fine on the other platforms.
The problem is the way Cairo/2 is implemented I think.  Doodle said he could probably change it fairly easily but it assumes a 1:1 mapping of HPS to Cairo surface.  Because we are creating a new surface with the invalidated portion it isn't just updating but that area but rather creating a new surface that is smaller than the window as a whole (something like that, it was late, or rather early, and I missed copying that part of what Doodle said).  I don't know if it is possible to do what he suggested about creating just the one surface that is then used throughout in gfxOS2Surface or not.
I asked Doodle about the necessity of the gfxOS2Surface.
<Doodle>	but then I would do it that way that the cairo surface would still be created for the nsWindow class, and a "stub" gfxOS2Surface would be created then, which would get the cairo surface handle from the nsWindow class. So, we would still have the gfxOS2Surface class, but the cairo surface would be tightly glued to the window itself. but it's just an idea, without knowing deeply the internals.

Commenting out     cairo_surface_mark_dirty(surf);
greatly lowers the CPU usage.
And then you don't get any other artifacts? OS/2 cairo normally needs to initialize the whole surface otherwise random bits present in RAM will be visible. Hmm, unless Thebes fills the whole surface with something anyway...
OK, I confirmed that this seems to do no harm, so I checked it in, too, along with the workaround from attachment 268218 [details] [diff] [review].
Since this check in I have forced myself to use a trunk Firefox more often. It seems to work surprisingly well regarding the painting, just a scrollbar that isn't repainted every now and then and a lot of flickering.

What I find funny is that animated GIFs only animate if I move the mouse of press a key (seen on http://tinderbox.mozilla.org/showbuilds.cgi?tree=Firefox). Also tooltips of links (e.g. of links in Bugzilla) only appear when I hover the link and press a key, Shift or Ctrl.

Andy, have you been able to work more with Doodle on the cairo topics?
Being in MDT US and him in Hungary puts us mostly out of sync, I stayed up for a couple nights in a row to work with him before and haven't had time to do that again yet.  He strongly suggests linking the nswindow to the cairo surface creation so that just one surface is created.   Considering how much removing cairo_surface_mark_dirty() sped things up I think just a repaint would speed it up quite a bit and get rid of the flicker.  He said he could change Cairo/2 to remove the 1:1 mapping of HPS to Cairo surface.  He said we still wouldn't be quite as efficient and it would add quite a bit to the complexity.  I've looked a bit at trying to tie the Cairo surface creation into nswindow but I haven't gotten anywhere so far.
I've been thinking about this, though I haven't come up with how to implement it yet.  I think that we could make the call to gfxOS2Surface to create the surface and return the surf to nswindow.  We would need to add to gfxOS2Surface a section for updating that surface where we would have to pass that surf back to it.  Then we could create the single surface and not keep destroying it and then for the update section we would be able to just let it update the invalidated regions.
Attached patch use one cairo surface (obsolete) — Splinter Review
This is not working yet, just get a frame with a menu bar and buttons and even the tabs but no content is rendered. I am putting the patch up just as a concept of trying to create just the one cairo surface and update it.
Interesting, thanks for posting that, Andy. I haven't tried it out, but I still have a few comments/questions:

- why do you add the new methods in gfxOS2Surface() as constructors? Should they not just be normal functions acting on the surface? Like gfxOS2Surface::surfaceRefresh() and gfxOS2Surface::surfaceSetSize().

- Why surf to surf1 change in gfxOS2Surface::gfxOS2Surface(HWND aWnd)?

- Global variables (like cSurface, rSurface) should be avoided. I guess it's a hack for now (I don't really understand what you are doing with them) but if you need something like that, better add them to the class definition in nsWindow.h (and use the "m" prefix used for class variables).

- Please also use the "a" prefix for function arguments.
>why do you add the new methods in gfxOS2Surface() as constructors? 

Because I don't know C++ and was copying the style used from the others, in fact would rather have done the normal functions if I had been sure I could do that.

>Why surf to surf1 change in gfxOS2Surface::gfxOS2Surface(HWND aWnd)?

Because it is never called from onPaint, best I could tell it is used for dialogs?  So it wouldn't necessarily use the same surface from what I could glean.

>Global variables (like cSurface, rSurface) should be avoided.

I am using them to determine if 1) a surface was already created (I could use surf but originally I was planning on using one variable for 1) and 2). 2)determine if this is a resize operation and so call the set size.  

What I am trying to accomplish, overall, is to use just the one surface.  If I can get it to work at all then I can look at a better way to handle it than to use global variables.

At this point I don't know if I am missing something or if Thebes design just won't work with a single surface that is updated.  I am thinking it has to be something I am missing but I've played with presentation spaces and rects to no better success than I have with this version of the patch.

Hmm, back on the first point... maybe I need to use them as constructors and build a new targetSurface (at least for refresh).
Attached patch poking in the dark (obsolete) — Splinter Review
I think this may be close to what Doodle meant. Couple mThebesSurface in nsWindow to mWnd and whenever mWnd is resized, you call the cairo function to resize the respective gfxOS2Surface. And when mWnd is painted only really draw the surface the first time, otherwise just refresh it.

Problem is this doesn't work. For some reason a WM_PAINT message is never generated. Too tired now to find out why.
Attached patch newest attempt at single surface (obsolete) — Splinter Review
Here is my newest approach.  This tries to take into account that gfxos2platform is creating offscreen surfaces.  Probably this needs to be changed to create just one offscreen surface as well, though that may not be as critical... I don't know.  It was awhile since I last played with this... it may actually be crashing.
Comment on attachment 268218 [details] [diff] [review]
[checked in] whole area

This was actually checked in on 2007-06-14...
Attachment #268218 - Attachment description: whole area → [checked in] whole area
A more successful try at implementing Doodle's suggestions. This uses a single cairo surface in nsWindow (the mThebesSurface variable). When painting, the cairo context is derived from that surface in OnPaint(), and that is only done when the same surface has not been painted before. Otherwise just refresh is called.
I have added two methods to the gfxOS2Surface class to call the resize and refresh functions and in the constructors I set cairo_os2_surface_set_manual_window_refresh() to true, so that painting can be handled in nsWindow::OnPaint().

I say "more successful" because with this it actually runs and paints something. But it still has similar painting problems as the one without Andy's workaround that is in the tree since June.
Attachment #271532 - Attachment is obsolete: true
Attachment #273183 - Attachment is obsolete: true
Comment on attachment 274439 [details] [diff] [review]
newest attempt at single surface

While I am here I will obsolete this patch.
Peter, I see you found this:
-    if (!mPS) {
-        // it must have been passed to us from somwhere else
-        mOwnsPS = PR_TRUE;
-        mPS = WinGetPS(aWnd);
-    } else {
-        mOwnsPS = PR_FALSE;
-    }
+    mPS = WinGetPS(aWnd);

which I had been meaning to come in here and mention for the last couple of days while I have been playing with the code.  This alone made you last patch to at least paint though not scroll or resize.  The Presentation Space had been deleted and that destroyed the Cairo surface I believe.  Along these lines... shouldn't the PS be passed to the refresh class?  I had been playing with trying to find a way to pass the cairo surface to nswindow on creation and then pass it back to resize and refresh (I am not sure how CairoSurface() would pull the right surface).
Attachment #274439 - Attachment is obsolete: true
Refresh in my implementation is not a class but a class method (or function) of the existing gfxOS2Surface class. That class also has the class variable mPS (class variables are denoted with the leading "m" in Mozilla code). These variables are accessible directly by any class method, so you can just ask for mPS in Refresh, Resize or any other function.

I just changed the part that you commented on because I realized that mPS only gets set in a constructor (the class method that is called by "new gfxOS2Surface" and denoted by the same name as the class itself). While we have two possible constructors -- one, with the HWND argument for calls from nsWindow, one with the PS and size arguments to be called from gfxOS2Platform::CreateOffscreenSurface -- only one of them can be called for any given surface...

[One could also add an extra class for refresh but that should then be a subclass of gfxOS2Surface, to be able to access the class variables like mPS, and really only contain the method that does the refresh. So I think that would be overkill...]
Attached patch add invalidate to scroll (obsolete) — Splinter Review
This is patch should be the same as Peter's last patch except it adds an invalidate to the Scroll function.  It still has some oddness in the menu bar and such but scroll works with it and even the menu bar works just sometimes goes all grey.  
I tried to change the refresh to be passed hPS from WinBeginPaint and it made no appreciable difference.  I also tried for refresh to change the rectl to NULL so that is should refresh the entire surface but that made no difference either. 
It would clean up the scroll if I resized after the scroll so I finally noticed it seemed to add an Invalidate.  I initially just added my invalidate patch in here and it worked OK but was pretty slow.  I then moved the invalidate to the scroll and it works fairly OK but still some cleanup needed somewhere.  This is usable, though not desirable as it is.
Attachment #278320 - Attachment is obsolete: true
(In reply to comment #25)
> Created an attachment (id=278635) [details]
> add invalidate to scroll
> 
> This is patch should be the same as Peter's last patch except it adds an
> invalidate to the Scroll function.  It still has some oddness in the menu bar
> and such but scroll works with it and even the menu bar works just sometimes
> goes all grey.  

I came to a similar conclusion, but my fix is a bit different. Instead of the extra invalidate, add a line "mWasPainted = PR_FALSE;" at the beginning of the Update function. I also think that the mWasPainted = PR_FALSE;" lines in the three Invalidate functions are not all necessary, but I am not sure yet which of those one could remove.

With this, the content is rendered and refreshed correctly but scrollbars and other chrome are not. There seems to be something special about chrome that we haven't found out yet.

> I tried to change the refresh to be passed hPS from WinBeginPaint and it made
> no appreciable difference.  I also tried for refresh to change the rectl to
> NULL so that is should refresh the entire surface but that made no difference
> either. 

Same experience here.

I also found that the extra WM_SIZE lines that I added in the previous patch can be removed.
(In reply to comment #26)
> (In reply to comment #25)
> > I tried to change the refresh to be passed hPS from WinBeginPaint and it made
> > no appreciable difference.  I also tried for refresh to change the rectl to
> > NULL so that is should refresh the entire surface but that made no difference
> > either. 
> 
> Same experience here.

But it seems to be the right thing to do and should also be faster.
Here is the correct placement for mWasPainted:
        case WM_PAINT:
            mWasPainted = PR_FALSE;
            result = OnPaint();
            break;

This fixes it for both chrome and scroll and some pictures I also noticed weren't rendering correctly on OS2World forums.
Attached patch better than last patch (obsolete) — Splinter Review
This one is much better than the last patch.  This one puts the mWasPainted=PRFalse; under the WM_Paint function which seems to clean up everything.  It is better than we currently have checked in, it doesn't flash everything on the screen a whole lot of times like it does and this code is at least as fast if not faster than what we currently have.
This patch also passes the hPS from WinBeginPaint to Refresh.
Attachment #278635 - Attachment is obsolete: true
Well, if we need to set it before calling OnPaint then this means that it always goes through painting the whole window and we can as well completely remove that variable. But there have to be cases where calling the refresh function without going through painting is enough, that should also be quite a bit faster. I am also quite sure that I had tried this kind of patch before checking for mWasPainted in OnPaint, and that did not work. Something else must have changed since then.

Nightly builds are not back yet, so we can take some more time to solve this properly before checking something in.
Andy, does your patch do anything else different that I am missing? Because when I try to add mWasPainted=PRFalse; as first line under the WM_PAINT: I get repeated painting of the content window and 100% CPU usage. I tried that with SeaMonkey of a few days ago.
Other than the mWasPainted=False; line this patch does pass hPS from WinBeginPaint to Refresh... I haven't tried with this configuration without passing the hPS but would have expected it to work because every other configuration it made no difference.  I don't think there was any other change, though I was making lots of changes and may have left something else but don't see it.  I am using an existing profile... if you are creating a new one maybe this code doesn't play well with that?
OK, I applied Andy's version of the patch to cleaned directories of widget and gfx/thebes and now I got a running version. With that I still have redrawing problems of scrollbars and content when switching tabs, but it doesn't flicker as much any more. So yes, we should clean this up and get checked in. With this it may even be worth to produce an M8 release build for wider testing.

(Loosely connected with this bug I just filed bug 395301 because I had a bad feeling about the offscreen surface handling...)
Attached patch cleaned up patch (obsolete) — Splinter Review
OK, this cleaned up version now comes without all those printfs and without mWasPainted. I guess we just have to paint every time mEventCallback is defined. Even in the debug version that I tested this with it is pretty fast now.
As this is about Thebes I guess I can go without a review but if someone can test this final version, I would feel more comfortable...
Assignee: mozilla → mozilla
Attachment #278693 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Looks fine here... as well as the NS_GFX patch that I had to use to build it.
As Andy noted in bug 395301 comment 8 there is quite a bit of memory increase with the current set of patches. I tried three versions, one with just plain trunk CVS code, one with this patch (attachment 280181 [details] [diff] [review]), and one with both this patch and the one from bug 395301. With each of these I go through the same routine to load it with one blank tab, then load my homepage, then maximize and restore, load a bookmark group in tabs, maximize and restore, switch between tabs, create a new window, and close the old one, etc.

The shocking fact is that with this patch the RAM usage of SeaMonkey is more than 30% larger than without this patch, in the end it uses 36% more RAM than the plain trunk (141MB vs. 89MB). Using the patch from bug 395301 in addition improves matters a little bit, but only the level of a few percent (in the end it uses 134MB).

Maybe the problem is that we now keep (reference to) two cairo surfaces, one in nsWindow and one in Thebes, but I haven't really investigated this.

[ By the way, SeaMonkey 1.1.3 uses a lot less RAM, at the end of the same routine it uses just 54MB or 64% less with respect to "plain trunk". ]
I have made some progress with the repaint problem itself. When loading tab groups (or tabs in the background) the chances were very high that when switching to the new tab even with the patch from attachment 280181 [details] [diff] [review], the content of the tab or its scrollbars were not painted or painted black. I now discovered that this was because the window event was not handled and so the "we'll just end up painting with black" comment in nsWindow::OnPaint was true.

A workaround is to replace the single DispatchWindowEvent call with a loop, so that it tries more often to dispatch it. The few times I have observed it, only a second try was necessary, so a loop with a max of 10 tries is plenty. Like this:

         event.renderingContext = context;
-        rc = DispatchWindowEvent(&event, eventStatus);
+        // try to dispatch a few times
+        for (int i = 0; i < 10; i++) {
+          rc = DispatchWindowEvent(&event, eventStatus);
+          if (rc) {
+            // this was handled, so we can stop trying
+            break;
+          }
+        }
         event.renderingContext = nsnull;

I have now tried many tab groups and they all display fine, including their scroll bars. :-)
I didn't get any responses after asking in for hints in mozilla.dev.tech.gfx, so the trick with the small loop until the window paint event gets _successfully_ dispatched is our best solution. So I would like to get this in.

Mike, as this contains stuff in nsWindow, you should perhaps take a quick look.
Attachment #280181 - Attachment is obsolete: true
Attachment #284877 - Flags: review?(mozilla)
Comment on attachment 284877 [details] [diff] [review]
[checked in] further improved patch

OK, I have checked this in. (As it is basically cairo-os2 only I did that without review as per agreement.)

Mike, if you have comments later on, please tell me. :-)
Attachment #284877 - Attachment description: further improved patch, hopefully final → [checked in] further improved patch
Attachment #284877 - Flags: review?(mozilla)
I guess this finally fixes this bug.

Now we just need to get out a build to get wider testing.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
(In reply to comment #40)
> I guess this finally fixes this bug.
Peter, you are really cool. Even tigerdirect.com looks perfectly.
> 
> Now we just need to get out a build to get wider testing.
> 
I might be of in Nov for a week, if Mike's machine is still broken and the beta release date is not in this time (Murphy) I can contribute a build and host it on my webpage if needed
(In reply to comment #41)
> Even tigerdirect.com looks perfectly.

If that is so, I can hardly take credit for it. You just happen to have set the "right" font size...

> > Now we just need to get out a build to get wider testing.
> > 
> I might be of in Nov for a week, if Mike's machine is still broken and the beta
> release date is not in this time (Murphy) I can contribute a build and host it
> on my webpage if needed

Well, until I have upload permissions for Mozilla FTP I thought I just put them up on my sf.net page. SM and FX are already built.
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.