Closed Bug 178292 Opened 22 years ago Closed 19 years ago

Simplify printing without splitting frames

Categories

(Core :: Printing: Output, defect, P3)

x86
Windows 2000
defect

Tracking

RESOLVED WONTFIX
mozilla1.4beta

People

(Reporter: karnaze, Assigned: john)

Attachments

(3 files, 1 obsolete file)

Our current implementation of intelligent page breaking by splitting frames is
very complex and can't handle cases where absolutely positioned frames start on
pages prior to where they are first encountered. An alternative approach would
be to reflow the entire document in galley mode and then clip to achieve pages.
It needs to be intelligent and not cut across certain things (e.g. text,
borders) and also honor page-break-before/after etc.
Status: NEW → ASSIGNED
Priority: -- → P3
Target Milestone: --- → mozilla1.3beta
Repeated headers/footers on tables is going to make this a lot more complicated
than I thought.
Target Milestone: mozilla1.3beta → mozilla1.4beta
Blocks: 191033
I am working on this.  A current summary exists at
http://www.johnkeiser.com/mozilla/newprinting.html.
Assignee: karnaze → jkeiser
Status: ASSIGNED → NEW
Depends on: 206347
I have no idea how this is really going to work. Can you really get printing
working with page breaks and all just using clipping and not using next-in-flow?
Somehow I don't think you can.

Suppose you have markup like this lying over a page break:

<div style="float:left; width:100px; border:2px solid blue;">hello kitty hello
kittyhello kitty hello kittyhello kitty hello kittyhello kitty hello kittyhello
kitty hello kittyhello kitty hello kittyhello kitty hello kittyhello kitty hello
kittyhello kitty hello kitty</div>
<div style="width:200px;">hello kitty hello kitty hello kitty hello kittyhello
kitty hello kittyhello kitty hello kittyhello kitty hello kittyhello kitty hello
kittyhello kitty hello kittyhello kitty hello kittyhello kitty hello kittyhello
kitty hello kittyhello kitty hello kitty</div>

With clipping, you can't place the page break without horizontally chopping a
line of text either in the float or the body.
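A minimal sketch of why a single straight cut fails here. It assumes both columns are stacks of contiguous line boxes with an 18px line height, and that the float's 2px border offsets its lines by 2px; the helper names and these numbers are illustrative only, not taken from Gecko.

```python
def line_boxes(first_top, line_height, count):
    """Return contiguous (top, bottom) line boxes for one column."""
    return [(first_top + i * line_height, first_top + (i + 1) * line_height)
            for i in range(count)]


def clean_straight_cuts(columns, y_range):
    """Return every y in y_range that lies strictly inside no line box."""
    return [y for y in y_range
            if not any(top < y < bottom
                       for boxes in columns
                       for top, bottom in boxes)]


# Assumed geometry: the float's lines start 2px lower because of its border.
float_lines = line_boxes(first_top=2, line_height=18, count=10)
body_lines = line_boxes(first_top=0, line_height=18, count=10)

misaligned = clean_straight_cuts([float_lines, body_lines], range(20, 160))
aligned = clean_straight_cuts([body_lines, body_lines], range(20, 160))
```

With the 2px offset no horizontal line in that range avoids both columns, while perfectly aligned columns leave a clean cut at every line boundary.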
Yeah, I was discussing this with jkeiser earlier today, and I posted some testcases:

http://dbaron.org/mozilla/tests/printing/

(although using the border trick in roc's testcase in comment 3 will have
problems more reliably than some of mine, where I used different font sizes)
roc: page breaks will not be simple lines, they will be a series of jagged line
segments at the bottom of a page.  When you say "I want the page broken along
this line" you are saying "I don't want content physically below this point to
show on the previous page."  If we have aaa with a baseline at 1200px at the
bottom of the page and there is bbb with a baseline at 1210px and a height of
20px, we will clip to the bottom of the aaa and have another clip region that
goes to the top of bbb.  Then on the next page we will start the page at 1190px,
*remove* the bottom of aaa with clipping, and not clip bbb at all.  In this way
we can avoid the chopping of text.
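The aaa/bbb case above can be sketched roughly as follows. This is only an illustration under simplifying assumptions: each piece of text is a box with a top and a bottom, the quoted baselines are treated as box bottoms, both boxes are 20px tall, and the nominal break sits at 1200px. The `TextBox` type and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TextBox:
    name: str
    column: str
    top: int
    bottom: int


def column_cuts(boxes, break_y):
    """Per column, the y at which the first page stops painting.

    A column cuts at break_y unless a text box straddles the break,
    in which case the cut moves up to the top of that box.
    """
    cuts = {}
    for box in boxes:
        cut = cuts.setdefault(box.column, break_y)
        if box.top < break_y < box.bottom:
            cuts[box.column] = min(cut, box.top)
    return cuts


def next_page_start(cuts):
    """The next page starts scrolling at the highest cut of any column."""
    return min(cuts.values())


boxes = [TextBox("aaa", "left", 1180, 1200),
         TextBox("bbb", "right", 1190, 1210)]
cuts = column_cuts(boxes, break_y=1200)
start = next_page_start(cuts)
```

Here the left column is painted down to 1200px on the first page and the right column only to 1190px. The second page starts at 1190px, and anything in the left column above its own cut (the bottom of aaa) is clipped out there.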

In your example, on the first page.  This is hard to draw in ASCII text, but
I'll post an example in a bit with MS Paint.

Simple tables would have the problem you are describing as well.
Attached image Page Break Example
I changed the example to have a 5px border so that the skew would be more
apparent.  The gray line is the place where the page break is set at, and the
black line is where the clipping will actually occur.  Therefore we will first
print the page from the top to the gray line, and clip out the stuff under the
black lines, and then we will scroll to the top of the black lines, clip out
from *above* the black lines, and print to the bottom.
Using negative margins or relative positioning or many other tricks, aaa could
overlap bbb. Actually this happens in my example if you play with nonbreaking
words, font sizes, and the width of the float. Then what would you do? You can't
separate aaa and bbb by drawing a jagged line.

Can you say a bit more about how this would work with print preview? If I have a
very tall block frame, in print preview it's going to be displayed as a series
of disconnected rectangles. How are you going to do that without breaking it
into continuation frames? From your web page it looks like you're willing to
give up the ability to display more than one print preview page at once. Is that
accurate?
Attached patch overlap testcase (obsolete)
How well this demonstrates the point depends on fonts and default sizes. But at
some settings you will see kittyhello overlapping the main text. If that
happens on a page break you won't be able to 'cut around' the kittyhello to
move it to the next page.
Attached file another testcase
Here's another testcase. Here, your approach basically requires that after the
<p> misaligns the lines in the two columns, on subsequent pages the
misalignment will be preserved. There'll be a small vertical gap at the top of
the page at each left-hand column. Currently, after the page containing the
<p>, subsequent pages are reflowed so the lines are aligned again.

If you agree that the latter is correct and the former is not, then there's no
way to format pages correctly without getting reflow deeply involved.
Er, sorry about that.
Attachment #123748 - Attachment is obsolete: true
I don't think the re-alignment of the paragraphs to the top of the page is
really the issue here.  It's the clipping of text.  And yes, my approach will
cause us not to re-align the text to the page boundary, which I think is *more*
correct: as much as possible, it should look on the printed page the way it
looks on the screen.

That said, your case is not complex enough to evidence the problem you're aiming
at.  If a page break falls through both overlapping pieces of text then there is
still a boundary that runs on top of both of them and does not clip through any
other text.  However, it is possible to construct an overlapping chain of lines
such that the entire page from bottom to top is filled with overlapping text. 
In that case it is impossible to clip around the text and still be reasonable. 
Thus, if you work hard enough to make a testcase, you are right: it is possible
to make the algorithm cut through text.
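A rough sketch of the "walk up until the cut is safe" idea and how an overlapping chain defeats it. It assumes text boxes are plain (top, bottom) intervals and ignores horizontal position entirely; the function name is hypothetical.

```python
def safe_cut_at_or_above(boxes, break_y, page_top):
    """Move the cut up past every box it would slice through.

    Returns the first y at or above break_y that is strictly inside no
    box, or None if the walk reaches the top of the page.
    """
    y = break_y
    moved = True
    while moved:
        moved = False
        for top, bottom in boxes:
            if top < y < bottom:
                y = top  # jump above the box we would have cut
                moved = True
    return y if y > page_top else None


# Two overlapping boxes: a boundary still exists above both of them.
pair = [(1180, 1205), (1195, 1215)]
pair_cut = safe_cut_at_or_above(pair, break_y=1200, page_top=0)

# A chain of 20px boxes, each overlapping the next by 10px, covering the page.
chain = [(top, top + 20) for top in range(-5, 1200, 10)]
chain_cut = safe_cut_at_or_above(chain, break_y=1200, page_top=0)
```

For the pair the cut settles at 1180px. For the chain every candidate is inside some box, so the walk runs off the top of the page and no clean cut exists.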

Now the question is, is the occasional cutting through text so bad that we want
to deny the huge complexity (and probably footprint) win of removing all
paginated frame-splitting from layout?  Are these cases going to happen so often
that this will be seen all the time, or are we talking "let's make our engine do
absolutely perfect printing in all cases"?  Do we want to keep the existing huge
set of crashers, hangs and bad behavior that result from printing using a reflow
codepath that is so rarely used and tested?  Many people print, but most of them
do it for a really really small fraction of the pages they view; and only a
really small fraction of the pages I have seen have any overlapping text at all,
let alone enough to fall on page boundaries enough to be noticeable.

I say out with complexity, and shed a tear or two for the few features that we
lose.  Prodigious proportions are a repulsive price to pay for perfect printing.

Features we lose:
- overlapping text can cut in the middle if you try hard enough
- content cannot flow around table repeated headers and footers (though it flows
around the first header and footer)

Features we don't have yet that get harder or impossible:
- allow CSS to have different page width on different pages and actually reflow
the content (this may be possible as long as the content transformed is not a
child of an actual visual element).  In the current world, this will mean boxes
can have different widths, which I doubt we really want to do.
- allow floats to move to completely different places depending on whether they
fit on the page (like TeX figures and tables)

Things we gain:
- a *lot* fewer reflow bugs, crashes and hangs for printing
- significantly less complexity and code in layout
- an understandable and maintainable printing engine
- an owner for printing :)
> I don't think the re-alignment of the paragraphs to the top of the page is
> really the issue here.

Yes, there are many separate issues here. I apologize if I have confused them.

> as much as possible, it should look on the printed page the way it looks on
> the screen.

I totally disagree. We should try to make the printed document look as
attractive as possible. If that means reformatting it for printing, fine. If you
want to provide screendumps as a separate feature, fine. But I think most people
who print want to see a document that looks good on paper, not one that would
look good if it had been on the screen.

Perhaps we should settle our goals before we spend too much time talking about
methods.

> If a page break falls through both overlapping pieces of text then there is
> still a boundary that runs on top of both of them and does not clip through
> any other text.

So your algorithm would walk up the page until it found a safe place to cut? How
far up would you be willing to go?

Your jigsaw-puzzle algorithm sounds like it could be at least as complex as
anything it might replace. Have you sketched out how it will actually work?

One not-uncommon situation where it will have problems is when line-height or
other features are being used in such a way that the inline frames of adjacent
lines overlap. Either you will chop text or you will have very poor page breaks.

> the huge complexity (and probably footprint) win of removing all paginated
> frame-splitting from layout?

How much complexity is this, and where is it? We will need to retain general
frame splitting and next-in-flow infrastructure for inlines. Last I looked at
nsSimplePageSequence and friends it looked under control. Is there a whole lot
of logic in nsBlockFrame that your approach would get rid of? The printing hacks
that I see in layout are mostly to do with unit hacks that would be gotten rid
of by units rework.

> Are these cases going to happen so often that this will be seen all the time,
> or are we talking "let's make our engine do absolutely perfect printing in all
> cases"?

I don't know and probably neither will you until it's implemented. But I fear a
"90% design" because I've seen many such designs come and go in Gecko. Suppose
you take this approach and decide there are cases you refuse to handle. First of
all you need a very clear spec describing exactly what you will do and what you
won't do. Otherwise when bugs come in you don't know if it's WONTFIX or a real
bug in your implementation. Even if you can come up with such a spec, typically
what happens is that you, or your successors, or your assistants, either forget
the spec or else decide that you really do want to get that case right after all
(when it shows up on yahoo.com or your employer's favourite site, for example).
Then you are forced to go in and mangle your 90% design, bring back the bloat
(and more), and not even get it very right. This happened a lot with z-index
until I rewrote the view manager. (Well, reality never quite follows this script
because in reality the person with the 90% design doesn't have a clear idea of
what cases they're refusing to handle.)

To put it another way, arguments of the form "oh, that case will never be
important" are not convincing. I'm not just talking about overlapping frames.
I'm talking about odd gaps at the top and bottom of pages, multipage print
preview, table headers and footers not flowing properly, and other things that I
haven't thought of yet. (OK, I just thought of one more: we should probably fix
block borders so block continuations get borders on all four sides on a new
page. Your approach makes that impossible.)

Note that I am not saying we need to get it 100% perfect today, or even ever. We
do need to be able to cleanly fix any particular case if it happens to become
important.

> I say out with complexity

Me too, but I think there are plenty of places we can reduce complexity without
cutting features and introducing bugs.

> Features we lose:

Your list is not complete, see above.

> - significantly less complexity and code in layout
> - an understandable and maintainable printing engine

I'm not convinced, especially if we look at the long term when we might have to
fix some of the issues you're deliberately punting.

> - an owner for printing :)

That would be nice. However, it is likely that your code will survive your
tenure as owner. And we all have considerable experience at dealing with the
legacy of departed owners @netscape.com. (Not that I lump you into that group, I
have far more respect for you than some of your predecessors.)
roc, whether intentional or not, such comments may discourage predecessors (who
may even be participating currently) from future participation.
I agree with roc.  I think all the cases that the new code will need to handle
need to be considered much more carefully before proceeding with this, and I
think any algorithm to find the "clip line" using our current data structures
will be a messy and extremely complicated hack that will quickly become
unmaintainable.

Just because the current printing code in layout is too complicated doesn't mean
that the basic design is wrong.  That part of our code could be simplified
greatly, and I'm not entirely sure where such simplifications would lead us.
I apologize, Chris. My comment was out of line. But I assure you that I was
thinking of people long gone.
> Perhaps we should settle our goals before we spend too much time talking about
> methods.

The goals for printing as I see them:
(1) readable output
(2) maintainable code with few bugs
(3) output that is not unexpected to the reader or designer, given especially
    that the designer is generally not testing for printing.

Granted, we can do without 3 and just tell designers to go stuff themselves,
they should have thought about printing when they designed the page if they
wanted it to print in a particular way--but the new printing architecture makes
us meet the goal for those designers.  Still, I'll focus on 1 and 2.

Right now we achieve (1) in most instances and there are many more complexities
we could add that would get us closer.  What Karnaze and I propose is, we believe, a
method of achieving maintainability and reducing bugs hugely while sacrificing a
little bit of output.  (See below for how much.)

If we can achieve maintainability without removing splitting from layout, I'd
love to hear a proposal.  I think it's not possible, though.  The fact that no
one cares about printing is in direct opposition to having printing code mixed
in with our layout code.  Layout hackers regularly break printing, as karnaze
and rods can attest.  The more modular, the better.  When you make printing use
normal layout you remove these headaches.

I'm trying to think how we could make the splitting architecture easier to deal
with but coming up blank so far.  Nonetheless, let's move on to the model we are
proposing.

The argument goes:
- If you use a trusted codepath for reflow, you get fewer bugs
- Much code can be removed:
  * All paginated splitting code, including block, absolute, fixed, float, and
    table.  (Karnaze, am I missing any?)
  * nsSplittableFrame can be removed and next/prev-in-flow can be moved to text
    frame (actually it is my hope that it can be removed altogether and we can
    use the sibling pointer in the text frame instead).
  * Probably inline splitting can be simplified a bit; have not investigated
    this.
  * The existing print engine can be removed and replaced with something smaller
    and closer to a common codepath (with bug 206347 as its base).
- This allows us to separate printing almost entirely out of the layout engine.
  Modularity and ease of maintenance of layout and printing come out of it.
- Module owners and maintainers become *possible*.  Current code is cumbersome
  and difficult to fix bugs in.
- The cases that may break (that I have heard and thought of so far) do not
  justify the complex architecture we have now.  See below for more thorough
  discussion of this.
- We can make absolute-positioned elements work when their placeholder is on
  page 3 but they should appear on page 1.  That doesn't work now due to the way
  reflow works.  It's possible to fix this by various methods including separate
  reflow passes for absolute positioned elements.

> First of all you need a very clear spec describing exactly what you will do
> and what you won't do.

I will be happy to write a very clear spec describing this, but I do not wish to
lock myself or future owners into "you cannot implement this feature"--in fact,
I cannot do that, because the owner has discretion.  That is really the point of
a module owner.  So I will simply say "the goal is maintainability--if you add
new codepaths and special cases for printing then think long and hard about how
often they will be used and tested and whether future maintainers will spit on
your name for the monster you have created."

The problems with this architecture fall under three points:
- extra jagged whitespace at the top and bottom of a page due to clipping from
  the previous page.  This will *not* be dealt with presently.  (ed: given that
  the whitespace clipping only happens for text and maybe a few pixels of
  images, this can only be noticeable when there are huge fonts; <h1>
  way down on a page with left and right navbars is one place it could happen.
  We *may* consider an algorithm to move the entire page break line up a bit to
  smooth out the jaggedness if this causes major problems.)
  * this also means that if the things that are clipped have colored
    backgrounds, the jaggedness will be more noticeable.  This could be a real
    problem, and the hack around it would probably be of loathsome difficulty.
    Fortunately, printing generally does not print backgrounds to save ink, so
    the problem in practice will be seldom seen.
- Some objects that we do not want to split across pages get split
  * Overlapping text hitting a page boundary may cause one of the two
    overlapping text frames to get clipped.  This problem will remain.  (ed:
    there are not many pages I have seen that have this feature, nor do I
    currently see the effectiveness of excessive overlapping text as a design
    feature.  Still, the possibility remains that it could be used effectively,
    even if we have not seen it happen.)
  * text with its bottom few pixels overlapping the top few pixels of an image
    may get its bottom few pixels clipped off if a page break appears *just so*.
- We cannot layout based on what page we are on and where we are on the page
  * This means floaters cannot act like in TeX, moving to previous or later
    pages.  I can see real arguments for having something like this, and the
    only thing I can say is it seems like it will be hard (though not
    impossible) in our *current* layout system.  We are a browser, not a book
    printer.
  * We cannot repeat things like block borders and table headers and footers at
    the top of the page.  We do not do this in the current system either.
    However, in the future we reserve the option to create a blank space at the
    top and bottom of the page to do just that if it proves a worthwhile feature
    to implement.  This will increase the amount of jagged whitespace on the
    page, especially for table headers and footers.
  * Per-page margins become slightly more difficult: if the left and right
    margins have different *sizes* per page, we cannot reflow the page to
    different sizes at different points.  This may be doable if content attached
    to a different page width is top-level (i.e. child of body for html).  Note
    that this is difficult for our current layout system as well, because a
    single box will have to reflow to different widths.

> The printing hacks that I see in layout are mostly to do with unit hacks that
> would be gotten rid of by units rework.

The unit hacks make things complex but the total size of them is small.  The
paginated layout hacks are complex and large as well.  If you can honestly say
you understand the table splitting code then I will buy you a steak dinner.

> Me too, but I think there are plenty of places we can reduce complexity
> without cutting features and introducing bugs.

OK.  Let me rephrase.  I say out with *unnecessary complexity.*  If the features
are unnecessary and we can live with the bugs we know we will create, then
complexity occurs.

I think we're going to find the bug count goes down over the long run, not up. 
We're reducing code and using common codepaths here.  We *are* going to create
more bugs initially due to the replacement of the print engine, but the new
engine is much simpler than the old.  (I'll post the initial non-clipping
version of the engine once I deal with my font scaling problems.)

> So your algorithm would walk up the page until it found a safe place to cut?
> How far up would you be willing to go?
>
> Your jigsaw-puzzle algorithm sounds like it could be at least as complex as
> anything it might replace. Have you sketched out how it will actually work?

The algorithm for clipping basically just takes all frames that cross the page
break and sets a clip over them if they are text.  Another thing we can do (for
example) is clip over images that have only a few pixels of the image sitting at
the bottom of the page.  That's not complex at all, and handles most cases.
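A minimal sketch of that rule, under stated assumptions: frames are (kind, top, bottom) tuples, "defer" means the frame is clipped off this page and shown whole on the next, and `sliver_px` is a made-up threshold for "only a few pixels". None of these names come from the actual print engine.

```python
def plan_page_break(frames, break_y, sliver_px=4):
    """Decide what happens to each frame that straddles break_y.

    Text is always deferred. An image is deferred only when just a
    sliver of it (at most sliver_px) would land above the break.
    Everything else is cut straight through at break_y.
    """
    plan = {}
    for frame in frames:
        kind, top, bottom = frame
        if not (top < break_y < bottom):
            continue  # frame does not cross the break, nothing to decide
        if kind == "text":
            plan[frame] = "defer"
        elif kind == "image" and break_y - top <= sliver_px:
            plan[frame] = "defer"
        else:
            plan[frame] = "cut"
    return plan


frames = [("text", 1190, 1210),
          ("image", 1197, 1400),
          ("image", 900, 1300),
          ("text", 1100, 1120)]
plan = plan_page_break(frames, break_y=1200)
```

In this example the straddling text and the image with a 3px sliver are deferred, the tall image is cut at the break, and the text that ends above the break is left alone.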

Even if the algorithm gets more complex (I honestly can't see it getting
anywhere near the complexity of our block, absolute, float, fixed and table
splitting code), the fact that it is *outside* of layout is worth a lot in terms
of complexity.  It's a layered rather than monolithic approach.  Decoupling is a
happy thing.

> However, it is likely that your code will survive your tenure as owner. And we
> all have considerable experience at dealing with the legacy of departed owners 
> @netscape.com.

I presume you are talking about unreadable, unmaintainable code legacy.  If this
code is either of the two, then I expect a thrashing and a very heavy review-
and super-review-.  The major point of this rewrite is maintainability.  The
review system was built in part to deal with the problem of unmaintainable code
that only one guy can understand.  So I just don't think your comments apply
anymore.
> Granted, we can do without 3 and just tell designers to go stuff themselves,
> they should have thought about printing when they designed the page if they
> wanted it to print in a particular way

If they wanted it to print in a particular way, then surely that implies they
thought about printing and maybe even tested it?

How many real examples are there where we're not achieving goal (3) today and
someone actually cares?

> this can only be noticeable when there are huge fonts

This is only because you're willing to cut images arbitrarily, which is
something I only just realized. That's going to make a lot of pages with images
look very ugly on paper. Your average news article with a large floated picture
of Osama bin Laden is likely as not going to have that picture chopped up.

> If we can achieve maintainability without removing splitting from layout, I'd
> love to hear a proposal.

There are several proposals for improving layout maintainability in general.
They would help printing along with everything else. The units plan for example.

> I do not wish to lock myself or future owners into "you cannot implement this
> feature"--in fact, I cannot do that, because the owner has discretion.

In theory yes. But your design will certainly constrain the module owner. If a
bug gets filed regarding one of the things that you're deliberately breaking
here, and it gets marked topembed+ or something similar, the module owner is in
a tough situation: WONTFIX and take the heat, or do some monstrous hack which
violates your design. In my experience the monstrous hack wins every time. I'm
pleading with you to not put the future owner in this situation.

If you do anyway, a document describing exactly what cannot be expected of the
current design will be invaluable to the future owner in case he or she needs to
explain why someone's favourite bug should not be fixed. I'm very serious about
this.

> * All paginated splitting code, including block, absolute, fixed, float, and
>   table.  (Karnaze, am I missing any?)

This doesn't look like a lot of code to me, except table. But there you have
table headers and footers so tables are 'hard' any way you cut it.

>  * nsSplittableFrame can be removed and next/prev-in-flow can be moved to text
>    frame (actually it is my hope that it can be removed altogether and we can
>    use the sibling pointer in the text frame instead).

This could be a win, I guess, but again it doesn't look like much code.

> nor do I currently see the effectiveness of excessive overlapping text as a
> design feature.

Sometimes the text doesn't visibly overlap but the inline frames do because of
ascent and descent. I see this reasonably often (it leads to slightly strange
effects when you select text, because the selection background of the
most-recently-painted frames overpaints the text of less recently painted
frames). Maybe it's because I don't have the fonts Web designers expect. But
then, neither do printers.

[block borders and table headers and footers]
> in the future we reserve the option to create a blank space at the top and
> bottom of the page to do just that if it proves a worthwhile feature
> to implement.

But how would you paint block borders at the top and bottom of pages, since no
block actually has a border there in your world? Looks like you'll have to
duplicate block border painting code.

> * Per-page margins become slightly more difficult:

I have no concerns about this. I have no idea how anyone would even configure
per-page margins.

> If you can honestly say you understand the table splitting code

Oh, I wouldn't say that :-). But I wouldn't necessarily jump from "this code is
nasty" to "this whole approach is wrong". We often have been able to avoid that.

> I think we're going to find the bug count goes down over the long run, not up.

Perhaps so, but I fear we're going to move from bugs that can be fixed to bugs
that can't (well, not without undoing what you hope to achieve here).

> We're reducing code and using common codepaths here.

I'm all in favour of this. I just ripped out native scrollbars for form controls
and replaced them with standard Gfx scrollbars. But in that case there was
nothing the native scrollbars were doing for us that Gfx scrollbars couldn't do.
In fact, Gfx scrollbars do more, so it was a win. There are many places where
these no-brainer wins exist, but I don't think this is one of them.

> The algorithm for clipping basically just take all frames that cross the page
> break and sets clip over them if they are text.

So inline images will not be clipped? They'll appear alone at the bottom of the
page? How about bullets?

> I presume you are talking about unreadable, unmaintainable code legacy.

Only partly. A few specific things that have been done that I don't want to see
repeated:
-- legacy code designed without providing a path to address possible needs
(e.g., the view manager was designed with no way to provide for content which
overlaps sibling IFRAMEs. Presumably wasn't "important" when the view manager
was designed, became superimportant during the DIG era circa Moz1.0. Required a
view manager rewrite.)
-- legacy code designed to work in a way that doesn't resemble the way CSS or
HTML is organized and described (even though the design may actually meet a
certain level of CSS compatibility). This has bitten us in various ways.
Typically CSS is revised or extended and it turns out to have diverged from what
we're doing and we have to get back to it by circuitous paths.
-- legacy code which was prematurely optimized. Not a problem here as far as I
can tell.
-- legacy code which is just a hack to solve the #1 bug of the day but which
would need to be removed by a real patch to fix the underlying issue. Its
presence makes the real fix harder and reduces the chance of a real fix being
produced. Probably not a problem here either.

You can write beautiful code that commits all of the above sins. You might even
get super-review and review. But I don't want to rely on r/sr to filter out the
above problems. That's a waste of everyone's time.

> The major point of this rewrite is maintainability.

I've already expounded at far too much length on why I think this could easily
lead to a new kind of maintenance nightmare. Time for me to leave the stage.
This needs more discussion, in a newsgroup -- m.layout is the obvious place.

"Rewrite" should set off alarm bells in every Mozilla developer's brain.

/be
> That's a waste of everyone's time.

I meant especially a waste of time for the person who's doing all this work.

You complained about infrequently tested code paths. I sympathize. But whatever
we do we'll have a lot of infrequently tested code paths in layout. The solution
is to test better or provide new features which exercise those code paths. For
example, a sidebar which provided paginated thumbnails of the current document
would be very useful for testing layout patches.
Multi-column text could be another future feature that you're foreclosing here.
http://www.w3.org/TR/css3-multicol
> I have no idea how anyone would even configure per-page margins.

I believe it could be done using the CSS2 @page construct, if we ever support it.
Multicolumn support would make it much easier to exercise block and table
splitting. What's the status of that W3C proposal, Hixie?
OK, after long and hard thinking I have come up with a model that allows us to
accomplish a simplification of the printing architecture, do everything we can
do currently, *and* not stymie future attempts to implement good features like
multi-column layout.  This architecture still removes splitting, but keeps
layout "page-aware".  Basically you lay out to an unconstrained canvas, but
layout is aware of where page breaks are and moves things to work around them
when it is necessary.  In the case of text, this means that when a block notices
text is about to be placed on a page break, it moves it down below the page
break and continues with the algorithm.
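A small sketch of the page-aware idea, assuming a single column of lines with known heights, a fixed page height, and no margins or header/footer reservations. The function name is hypothetical and the linked design document may differ in detail.

```python
def place_lines(line_heights, page_height):
    """Lay lines out on one tall canvas, pushing any line that would
    straddle a page boundary down to the start of the next page.

    Returns the top y of each line on the unconstrained canvas.
    """
    tops = []
    y = 0
    for height in line_heights:
        page_bottom = (y // page_height + 1) * page_height
        # A line taller than a page cannot be saved by pushing it down.
        if y + height > page_bottom and height <= page_height:
            y = page_bottom
        tops.append(y)
        y += height
    return tops


tops = place_lines([40, 40, 40], page_height=100)
```

With three 40px lines on 100px pages, the third line would cross the boundary at 100px, so it is placed at 100px and a 20px gap is left at the bottom of the first page.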

This architecture, and how it works, is described at
http://www.johnkeiser.com/mozilla/newprinting/index.html .  There are more
features I need to describe in there, but the architecture should be flexible
enough to handle *anything* we need to do in the future, including splitting (I
have come up with a simpler splitting algorithm in this document that could be
implemented in the future).

Borders around boxes aren't handled with this algorithm, though they are not
precluded and could be handled with the simpler splitting in the future.  I do
not believe we *should* handle these (it would look bad IMO) but since your
point was that the architecture makes it impossible to do so, I think this
addresses those concerns.

Further comment welcomed, either here or (probably preferable) email.
Status: NEW → ASSIGNED
I'd rather have comments here than in email, because this is more permanent and
more accessible for others in the future.
So here are a few comments:

> People who hack on Gecko layout should not have to know about printing for it
> to work: new types of frame should print for free.

This is already true; most frame types do not have any printing-specific code
(except possibly scaling code, which you're not addressing).

Do you agree that the current architecture supports all your goals, except for
the goal that isn't listed, "simpler code"? (I think "print fixed position
frames on all pages" can be accommodated; for fixed frames whose placeholder is
not on the first page, we just have to run backwards and create new frames for
the previous pages, then incrementally reflow them. They're out of flow so this
ought to converge.)

> when text is about to be placed on a page break, block moves that text down
> far enough to place it on the next page and continues with its algorithm.

This requires the block to calculate margins for the pages (and header/footer
reservations, etc) so it can know how far down to push the child frames. To me
this seems like you're pushing more printing knowledge into "core layout", not
less. IIRC the way we have things now, the block just splits off a continuation
and the page frames use their knowledge of the page environment to reflow the
continuation.

> When images are placed on a page break boundary and look bad (it has not been
> determined what specifically the criteria is for this, my suggestion is when a
> very small part of the image will show up on one or the other page)

So you're not going to try to prevent large floating images from being cut up? I
think this is a big lose for anyone who prints magazine-style articles.

[columns]
> This will be done by having such blocks aware of where the page break is and
> flowing to the next column.

OK, so here (and possibly for table headers/footers) you'd bring back frame
splitting. If we reach this point, won't this new architecture be superfluous,
since we could then use frame splitting to handle normal page breaks again?
> This is already true; most frame types do not have any printing-specific code
> (except possibly scaling code, which you're not addressing).

Not true; all frames have to support splitting.  If they manage to inherit from
another frame that supports splitting (nsBlockFrame) then that is a lucky thing.

> Do you agree that the current architecture supports all your goals, except for
> the goal that isn't listed, "simpler code"?

Given the above, "no."  It is very difficult (though admittedly not completely
impossible; you can make major changes to accommodate them) to do some of the
things listed, such as printing absolutely positioned frames whose placeholders
are in the wrong place (this is the requirement that everything that can show up
on the screen must be able to print as well).

I was trying to avoid the goals of "simpler code" and more importantly "code
that runs through the normal codepath as much as possible".  Those are not goals
of a printing architecture as much as they are of getting from this printing
architecture to the new one.

Either architecture, with *further* architectural changes, can be made to
support anything.  I think the architecture we're developing now is a
whole lot more conducive to those things.  I am glad that you popped in here; it
forced me to give the design flexibility that it didn't have before.

> (I think "print fixed position frames on all pages" can be accomodated; for

Note that absolute-positioned frames do not split right now at all; there is a
patch for it that needs work.

Both absolute- and fixed-positioned frames have this problem, and the way we
reflow things currently would require major hacks like that, probably involving
lots of code, definitely involving bizarre code.  Which is OK, we can do
anything we want.  We are coders; with big enough changes you can support
anything.  You could even say that changing from the original model in this bug
to the new model would have been just another major change.

> This requires the block to calculate margins for the pages (and header/footer
> reservations, etc) so it can know how far down to push the child frames. To me
> this seems like you're pushing more printing knowledge into "core layout", not
> less.

This is handled when you set the size of the viewport.  Nobody knows anything
about page margins.  The viewport is the size of the page after margins.  The
page breaks are spaced vertically the size of the page.  When we scroll and
paint, we paint with an offset on the page.  Am I explaining this clearly?  (I'm
not sure I am, it's a bit late :)

> IIRC the way we have things now, the block just splits off a continuation
> and the page frames use their knowledge of the page environment to reflow the
> continuation.

The way we have it now nobody knows about margins either, because the blocks you
reflow in are page content frames the size of the page after margins are
applied.  Same deal with respect to margins, pretty much, except the proposed
architecture is simpler.

> So you're not going to try to prevent large floating images from being cut up?
> I think this is a big lose for anyone who prints magazine-style articles.

(a) we don't do this now
(b) this will be just as easy in the new architecture if we decide to do it

I don't plan to do it now; I called it a "controversial feature" because I don't
currently believe web authors will expect or want it unless we put a specific
CSS property in there like "float-cohesion: stay-together" for people that are
designing for print.  I wouldn't have a problem with that, and it should be just
as easy (read: really hard) to do in this architecture as the old one.

> OK, so here (and possibly for table headers/footers) you'd bring back frame
> splitting. If we reach this point, won't this new architecture be superfluous,
> since we could then use frame splitting to handle normal page breaks again?

Depends whether you do splitting in the new way described or not.  The new way
does not require every frame to be printing-aware.  Just the frames that
actually are interested in splitting.  The new philosophy is, "if you want to
split, go ahead, but don't force everyone else to bend to it."  Do not force
blocks to pay for the sins of table cells.

Finally, even if you do splitting in the *old* way, it's *still* not
superfluous.  You don't have to split the parent containers of the thing being
split, just the children ... and absolutely positioned frames and fixed
positioned frames do not have to split, which is a ton of code (some of which is
not yet checked in!).  And you have a much simpler and more maintainable print
*engine.*
Aha, I thought of a way to explain the concept of the margins better.  Pretend
we're laying out on the long sheet of paper, so to speak, and then we cut each
page off and *frame* it (the frame being the margins).  Layout itself knows
nothing of the page margins.
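
The "cut each page off and frame it" picture can be sketched as a coordinate mapping. This is an illustrative Python fragment, not Gecko code; `canvas_to_paper` and its parameters are hypothetical names, and `page_height` is taken to mean the page height after margins, i.e. the spacing of the page breaks on the canvas.

```python
def canvas_to_paper(y, page_height, top_margin):
    """Map a canvas y coordinate to (page index, y on the physical sheet).

    Layout only ever sees canvas coordinates; the margin is added here,
    at paint time, when the page is "framed".
    """
    page, offset = divmod(y, page_height)
    return page, top_margin + offset
```

For instance, with 100-unit pages and a 10-unit top margin, canvas y = 250 lands on the third page (index 2) at 60 units from the top of the sheet.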
> Not true; all frames have to support splitting.

Well, currently they don't. The only frames with splitting code are the ones
that actually get split. Images and form controls and HRs and most other frames
don't think about splitting at all.

In the current architecture, who else has splitting code other than
nsBlockFrame, tables, and nsAbsoluteContainingBlock?

I can see two advantages of your approach as far as simplifying coding goes:
1) containers for page-break-aware frames don't necessarily need to be
page-break-aware
2) any frame can be split just by cutting it along an arbitrary horizontal line
Is this an accurate summary?

But I don't see 1) as a big win since we want blocks and tables to have
intelligent page breaking anyway. And the only containers for blocks and tables
are other blocks and tables, plus nsAbsoluteContainingBlock. So all you really
get from 1) is making nsAbsoluteContainingBlock easier to handle.

For 2), I don't see why we would ever want frames other than blocks and tables
to span page boundaries.

> Am I explaining this clearly?

Yes... but this implies that table header/footer reservations must be applied to
all pages since they'll determine the viewport size. So when there are multiple
consecutive tables, or tables side-by-side, with different header and footer
heights, you'll have to reflow the document once to compute the maximum size of
headers and footers and hence the viewport height. Then you'll have to reflow it
again so the blocks can get the right page breaks. You'll have to reserve space
for headers and footers even if no table actually spans a page boundary because
you won't know where the page breaks are until the second reflow. Is this all
correct?
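
A rough sketch of the first pass as described in this comment, in Python for illustration only. The function name and the `(header_height, footer_height)` tuples are hypothetical, and whether the proposal really has to work this way is exactly the open question above.

```python
def viewport_height(page_content_height, tables):
    """Height left for content once the largest table header and footer
    are reserved on every page.

    tables: list of (header_height, footer_height) pairs gathered by a
    first reflow. A second reflow would then place page breaks using the
    returned height.
    """
    max_header = max((h for h, _ in tables), default=0)
    max_footer = max((f for _, f in tables), default=0)
    return page_content_height - max_header - max_footer
```

Note that the reservation is paid on every page, even one that no table reaches, which is the cost this comment is pointing at.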

> (a) we don't do this now
> (b) this will be just as easy in the new architecture if we decide to do it
We don't do this now for floats, but we do do it for block and inline images. I
guess it's easy enough to get block and inline images to work in your new
architecture.

> I called it a "controversial feature" because I don't currently believe web
> authors will expect or want it
Perhaps for floats in general, but I think that people will not expect things
like images or form elements to be split across page boundaries. That's why I
don't think your advantage #2 I mentioned above is very helpful.

> Do not force blocks to pay for the sins of table cells.
But you would be splitting blocks to make columns work. Right? And even before
that blocks will be "page break aware" so they can stop lines from spanning a
page boundary.

> And you have a much simpler and more maintainable print *engine.*

I'm not sure what you mean by this. You'd have made nsAbsoluteContainingBlock
simpler than it would otherwise have to be. On the other hand you'd have added a
lot of new code and still not achieved very satisfactory results for things like
table headers/footers, and print preview showing multiple pages at once. Your
plan does include many other worthwhile improvements in the "engine" (offline
printing, grid printing and N-up printing, better printing API) but they could
be achieved without changing the way layout does splitting. Note "screendump
printing" would be a special case of grid printing where you reflow the document
to the page width but with arbitrarily high page height, and then cut like mad.
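
The "cut like mad" step amounts to slicing one tall canvas at fixed intervals. A minimal Python sketch, with the hypothetical name `cut_pages` and integer units assumed:

```python
def cut_pages(canvas_height, page_height):
    """Return (top, bottom) canvas ranges, one per printed page.

    No attempt is made to avoid cutting through content; that is what
    distinguishes screendump printing from page-aware layout.
    """
    return [(top, min(top + page_height, canvas_height))
            for top in range(0, canvas_height, page_height)]
```

A 250-unit canvas cut into 100-unit pages yields three slices, the last one only 50 units tall.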
nsImageFrame inherits from nsSplittableFrame, the CSSFrameConstructor has code
for splitting nsImageFrame, and the code in nsImageFrame::Paint most certainly
knows about frame splitting....
Ah yes, I stand corrected. I'm not sure why we really want to split images so
maybe we could just take all that out.
Oh, I see from bug 105166 that some pages have tall spacer images that were
causing bad page breaks, so we do want to split images that are larger than a
certain height. The new architecture would eliminate the need for this splitting
code in nsImageFrame, but the code doesn't seem like a big deal to me. Also, it
would have to be brought back for columns.
We're not doing this.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → WONTFIX