Open Bug 1381823 Opened 7 years ago Updated 2 years ago

Keep attachment header and inline attachments together when printing

Categories

(Thunderbird :: General, defect)

defect

Tracking

(Not tracked)

People

(Reporter: thomas8, Assigned: thomas8)

References

Details

(Keywords: testcase-wanted)

Attachments

(1 file, 2 obsolete files)

+++ This bug was initially created as a clone of Bug #1381338 +++

STR

With "View > Display Attachments Inline", certain attachments will be printed with the message.

Actual result:

Unfortunately, esp. with big images etc., printouts often end up like this (header and content on different pages):

Page 1:
...
-- attached-image.jpg ---------

Page 2:
{img src=attached-image.jpg}

Expected result:
We really want to keep attachment header together with its attachment:

Page 1:
...

Page 2:
-- attached-image.jpg ---------
{img src=attached-image.jpg}
No longer blocks: 699000
Keywords: checkin-needed
Here's a very rough proof of concept patch. 

CSS property {page-break-inside:avoid;} on a surrounding div works, tested with FF browser.

{page-break-*after*:avoid} would make things a lot easier if I understand it correctly, but unfortunately: No support (bug 132035).

So we need a div container where everything we don't want to be separated by page-break is *inside*. That's where the problem starts.

CAVEAT: This patch is NOT RIGHT in many ways (i.e. it's just experimental to get better ideas from others!):

0) I can't test this, so it's a blind flight and may not work at all.

1) For images, it should work quite well (I hope).

2) For inline text attachments, we'd want the file name header <fieldset> to be together with the first lines of text, but not with the entire text. Bad.
Also, I think my closing </div> gets applied to context where I didn't add the opening <div>, because opening div is only added for inline attachments, but I think that text parser is used elsewhere, too. Wrong.

3) List of attachments table: I think it makes sense to keep the entire list together. We're not forcing page-break, but avoiding. I.e. if the entire list fits on last page, it'll stay there. However if it doesn't fit under the last inline attachment, I'd want the entire list to be printed on new page. Some of the attachment list would end up on that page anyway, so it's better to keep it together, for better overview. Which also conveniently allows to throw the printed page with the list away if you don't need it, or to not print that page with the list in the first place. If the list is longer than one page, that might add an extra page, but I think especially in that case the list is probably important enough to have its own dedicated pages, rather than starting somewhere in the middle behind some other printed attachment.

4) Other attachment types. This patch only covers images, text, and attachment list. I'm not sure if we have other types of attachments which might end up without the closing </div>, like calendar invites etc.

5) Scalability. Closing each type in its own routine looks like a very unscalable way of doing this. I think on the starting <div> we might be good, because it's created on the generic attachment header (I think). But the closing </divs> are not in a good place; if we could find a more generic place in code which applies to all types of inline attachments, that would be much better.

I don't think I can push this much further, but maybe somebody else with more knowledge and build environment can.
Attachment #8888042 - Attachment description: Proof of concept patch v.1 → Proof of concept patch v.1 (untested, may not work correctly or not at all, see comment 1)
Attachment #8888042 - Flags: feedback?(jorgk)
Some test.eml's would certainly help.
Keywords: testcase-wanted
Comment on attachment 8888042 [details] [diff] [review]
Proof of concept patch v.1 (untested, may not work correctly or not at all, see comment 1)

Review of attachment 8888042 [details] [diff] [review]:
-----------------------------------------------------------------

I didn't try it since it has obvious problems.

::: mail/themes/linux/mail/messageBody.css
@@ +100,5 @@
>  .moz-attached-image-container {
>    text-align: center;
>  }
>  
> +.moz-mimeAttachmentContainer {

Why use camel case when snake case is used right above?

::: mailnews/mime/src/mimei.cpp
@@ +1698,5 @@
>      opt->state->separator_queued_p = false;
>      if (opt->state->separator_suppressed_p)
>        opt->state->separator_suppressed_p = false;
>      else {
> +      const char *sep = "<DIV CLASS=\"moz-mimeAttachmentContainer\"><FIELDSET CLASS=\"mimeAttachmentHeader\">";

So you put another div around the fieldset. Can't you just use CSS on the existing class? Maybe page-break-after: avoid.

That div should then be closed at
https://dxr.mozilla.org/comm-central/source/mailnews/mime/src/mimei.cpp#1727 which is a few lines down.

::: mailnews/mime/src/mimemoz2.cpp
@@ +1092,5 @@
>    const char *prefix;
>    /* Wouldn't it be nice if attributes were case-sensitive? */
>    const char *scaledPrefix = "<DIV CLASS=\"moz-attached-image-container\"><IMG CLASS=\"moz-attached-image\" shrinktofit=\"yes\" SRC=\"";
>    const char *unscaledPrefix = "<DIV CLASS=\"moz-attached-image-container\"><IMG CLASS=\"moz-attached-image\" SRC=\"";
> +  const char *suffix = "\"></DIV></DIV>";

This is wrong, as well as the following.

::: mailnews/mime/src/mimetpla.cpp
@@ +247,5 @@
>    {
>        MimeInlineTextPlain *text = (MimeInlineTextPlain *) obj;
>        if (text->mIsSig && !quoting)
>        {
> +        status = MimeObject_write(obj, "</div></div>", 6, false);  // .moz-txt-sig

This is wrong, and it won't have any effect since you didn't increase the 6 to 12 ;-)
Attachment #8888042 - Flags: feedback?(jorgk) → feedback-
(In reply to Jorg K (GMT+2) from comment #3)
> Comment on attachment 8888042 [details] [diff] [review]
> Proof of concept patch v.1 (untested, may not work correctly or not at all,
> see comment 1)
> 
> Review of attachment 8888042 [details] [diff] [review]:
> -----------------------------------------------------------------

Thanks for rapid feedback! :)

> I didn't try it since it has obvious problems.

Remember that psychological trick where starting on a positive note (e.g. "Thanks! :)") makes people feel so much more appreciated...? ;)

> ::: mail/themes/linux/mail/messageBody.css
> @@ +100,5 @@
> >  .moz-attached-image-container {
> >    text-align: center;
> >  }
> >  
> > +.moz-mimeAttachmentContainer {
> 
> Why use camel case when snake case is used right above?

Because there was camel case somewhere close in the final print markup.

> ::: mailnews/mime/src/mimei.cpp
> @@ +1698,5 @@
> >      opt->state->separator_queued_p = false;
> >      if (opt->state->separator_suppressed_p)
> >        opt->state->separator_suppressed_p = false;
> >      else {
> > +      const char *sep = "<DIV CLASS=\"moz-mimeAttachmentContainer\"><FIELDSET CLASS=\"mimeAttachmentHeader\">";
> 
> So you put another div around the fieldset.

Hmm, no, not really... (see below)

> Can't you just use CSS on the
> existing class? Maybe page-break-after: avoid.

Kindly refer to my comment 1 on the patch (just reading the first 4 sentences will suffice...) to see why most of your review comments unfortunately don't apply...

> That div should then be closed at
> https://dxr.mozilla.org/comm-central/source/mailnews/mime/src/mimei.cpp#1727
> which is a few lines down.
> 
> ::: mailnews/mime/src/mimemoz2.cpp
> @@ +1092,5 @@
> >    const char *prefix;
> >    /* Wouldn't it be nice if attributes were case-sensitive? */
> >    const char *scaledPrefix = "<DIV CLASS=\"moz-attached-image-container\"><IMG CLASS=\"moz-attached-image\" shrinktofit=\"yes\" SRC=\"";
> >    const char *unscaledPrefix = "<DIV CLASS=\"moz-attached-image-container\"><IMG CLASS=\"moz-attached-image\" SRC=\"";
> > +  const char *suffix = "\"></DIV></DIV>";
> 
> This is wrong, as well as the following.

See above. Or try it. It might be right.

> ::: mailnews/mime/src/mimetpla.cpp
> @@ +247,5 @@
> >    {
> >        MimeInlineTextPlain *text = (MimeInlineTextPlain *) obj;
> >        if (text->mIsSig && !quoting)
> >        {
> > +        status = MimeObject_write(obj, "</div></div>", 6, false);  // .moz-txt-sig
> 
> This is wrong, and it won't have any effect since you didn't increase the 6
> to 12 ;-)

Good catch.
(In reply to Jorg K (GMT+2) from comment #3)
> Comment on attachment 8888042 [details] [diff] [review]

> That div should then be closed at
> https://dxr.mozilla.org/comm-central/source/mailnews/mime/src/mimei.cpp#1727
> which is a few lines down.

For the avoidance of doubt: Unfortunately, that won't have the desired effect (see comment 1).
(In reply to Thomas D. (currently busy elsewhere; needinfo?me) from comment #4)
> Kindly refer to my comment 1 on the patch (just reading the first 4
> sentences will suffice...) to see why most of your review comments
> unfortunately don't apply...
Right, I realised that myself when I was away from the desk (on a moving train) and couldn't withdraw them :-(

What my comment shows - and you also mention it - is that opening a <div> in one place and closing it elsewhere is extremely error prone and confusing. Surely we could locate the code at the same level that inserts the <div> to also insert the </div> and not push that into the children.

I was wrong to suggest closing it after the closing of the fieldset, but I think it could be closed after opt->output_fn():
https://dxr.mozilla.org/comm-central/source/mailnews/mime/src/mimei.cpp#1738
since this is what writes the child.

It's a pity that page-break-after:avoid doesn't work according to bug 132035.

Browsing that bug one gets the impression that page-break-inside:avoid also doesn't work but then you said you've tested it in FF. Perhaps you can attach that HTML page here.
This appears to be working ;-)
Attachment #8888042 - Attachment is obsolete: true
(In reply to Jorg K (GMT+2) from comment #7)
> Created attachment 8888358 [details] [diff] [review]
> 1381823_-_inlineAttachPrint-noPageBreak.patch
> 
> This appears to be working ;-)

That's great, and much better as it's more generic as I was hoping for in comment 1.

Working along my comment 1 (I really wrote that comment for a reason), we've now covered 1), 3), and 5).
However, I think we still fail for 2): Inline plaintext attachments should not always be pushed to a new page competely, we only want to keep their header and the first few lines together. Not sure if that's doable at all.
About 4), I'm not sure if there are any other types of attachments which we're now preventing from page-breaking (but probably should not).
I've restored the <BR> that was removed before the fieldset since that extra space was needed unless we want to add space with CSS.

I don't have time to check all the up's and downs of the approach, so here's a try build. Instead of putting "testcase-wanted" the author should do his own testing and supply test cases.

https://treeherder.mozilla.org/#/jobs?repo=try-comm-central&revision=9efc6649571139e8278b5eccc740247639d88922

Builds and logs will be available at:
https://archive.mozilla.org/pub/thunderbird/try-builds/mozilla@jorgk.com-9efc6649571139e8278b5eccc740247639d88922/
Attachment #8888358 - Attachment is obsolete: true
(In reply to Thomas D. (currently busy elsewhere; needinfo?me) from comment #8)
> However, I think we still fail for 2): Inline plaintext attachments should
> not always be pushed to a new page competely, ...
That's what happens with the patch.
(In reply to Jorg K (GMT+2) from comment #10)
> (In reply to Thomas D. (currently busy elsewhere; needinfo?me) from comment
> #8)
> > However, I think we still fail for 2): Inline plaintext attachments should
> > not always be pushed to a new page competely, ...
> That's what happens with the patch.

I understand from Jörg's comment that the *unwanted* behaviour happens, i.e. that with the patch, we'll always push the complete text of an inline plaintext attachment unto a new page.

I've done some more research and I can't see any way of controlling that behaviour for texts with out-of-the-box css. Anything page-break has very little support in browsers. There's break-after:avoid-page without support. There's orphans/widows without support. And MDN documentation doesn't even list all of these in their overview page (https://developer.mozilla.org/en-US/docs/Web/CSS/paged_media), break-* currently missing.

Hence including inline plaintext attachments into our page-breaking trickery doesn't look like a good idea, as we can end up creating large amounts of whitespace without need. Also, the problem which we're trying to solve here is much less likely/frequently to occur with inline plaintext, because for many/most cases (before the patch) where the attachment header doesn't end up totally at the bottom edge of the page (which is an edge-case quite literally), at least some lines of its text will be printed on same page with the header. Otoh, the problem is much more frequent/significant with images which can easily be bigger (think DCIM), and as opposed to text, we can't break up the data content, so we really always want the attachment header to go on the next page together with its image child.

So in terms of expected behaviour, we're looking for a way to *exclude* inline plaintext attachments from the page-break trick, or only to *include* images and the attachment list at the end (see comment 1, section 3). I'm still not sure if we're showing any other types of attachments inline which we'd want to handle here, like calendar invitations?

Ways of doing this with the current css limitations from our current location in c++ code seem limited.

This is the function in which current patch is doing the trick:
https://dxr.mozilla.org/comm-central/source/mailnews/mime/src/mimei.cpp#1683
> MimeOptions_write(MimeDisplayOptions *opt, nsCString &name, const char *data,
>                   int32_t length, bool user_visible_p)


1) For attachment list, this might be easy: the string inside <legend class="mimeAttachmentHeaderName"> is always "Attachments:", so assuming we can access that localized string from inside c++, we can just do string comparison with the name we get inside the c++ function. Can we?

2) For images, I considered :has pseudo-class to only apply our break-avoiding css on the outer <div> *if* there's an <img> somewhere inside. Unfortunately, :has is not allowed in style sheets, only with functions like document.querySelector. But afasics, even those functions are not available here where we are, in C++ code which generates the markup which is later rendered by browser. No chance to use :has from inside c++, or is there? (see next point wrt javascript)

3) :has selector reloaded, for images: perhaps we could have some *javascript* code which adds/modifies the class attribute of our outer anti-page-break div, let's say in DOMContentLoaded or some other suitable event handler of the generated HTML document or even of our attachment-related HTML elements? Like if we add class="moz-mimeAttachmentContainer" for all attachments here in c++, then later in javascript we change that to class="moz-mimeAttachmentContainer image" with a javascript function which checks for image with class="moz-attached-image-container"  inside our moz-mimeAttachmentContainer? That looks like a possible and relatively safe and easy strategy to me, what do you think?

4) Alternatively for images, any chance that we can compare against &name, e.g. an exhaustive list of inline images file name extensions? We're not an image viewer, so I guess that list might be short?
I would not consider extension-less images to be a problem; first, they're unlikely in real life; hence second, we wouldn't care or lose much if our page-break trick doesn't work for extention-less images, as long as it works for all "regular" images.

5) Alternatively for images: is there anything available inside our c++ function parameters, or can we pass in a parameter to positively identify *image* content?
E.g., can we compare against the beginning of *data parameter, is there anything consistent which marks images?

6) Alternatively for inline plaintext: is there anything available inside our c++ function parameters, or can we pass in a parameter to positively identify *plaintext* content?
E.g., can we compare against the beginning of *data parameter, is there anything consistent which marks inline plaintext attachments?

7) Any other ideas?
Flags: needinfo?(jorgk)
I have no good ideas, it's unfortunate that there is no CSS selector to identify elements which have certain children.

Other then that I'd try to work out up-front in the MIME code that an image will be placed into the container and in this case apply the page-break-inside:avoid;

In the scheme of things, not the most important bug we have and should be spending time on.
Flags: needinfo?(jorgk)
Regarding the comment above that this cannot be easily tested, presumably because of not having a printer, you don't actually need a printer as you can do a Print Preview in Thunderbird. You'll need to tick to show the menu bar, then find it in the File menu.
(In reply to John Veness from comment #13)
> Regarding the comment above that this cannot be easily tested, presumably
> because of not having a printer, you don't actually need a printer as you
> can do a Print Preview in Thunderbird. You'll need to tick to show the menu
> bar, then find it in the File menu.

Thanks John, but no, it's not about having a printer (which can be mimicked by print preview as you correctly point out). It's about creating an appropriate email message as a testcase for this bug, with image and plaintext attachments, and they must have the right dimensions / text length so that in the inline preview, some of the attachment headers (both for at least one image and one text) would be orphaned on one page, and the actual preview alone on the next page. Then that message must be tested against the trybuild of comment 9 with the current patch applied, to see if that fixes the issue (which it currently does but only for images, not for texts, see my comment 8). With images, we just ensure that the image header is with the image on same page. With texts, we want to keep only the first few lines of text together with the text header, but not the entire text as that would result in lots of unused space on pages.
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: