Closed Bug 288261 Opened 20 years ago Closed 19 years ago

"Save as Text" inserts a lot of extraneous stuff

Categories

(Firefox :: File Handling, defect)

x86
Windows ME
defect
Not set
minor

RESOLVED DUPLICATE of bug 131166

People

(Reporter: jonlwhite35, Assigned: bugs)

Details

User-Agent:       Mozilla/5.0 (Windows; U; Win 9x 4.90; en-US; rv:1.7.6) Gecko/20050317 Firefox/1.0.2
Build Identifier: Mozilla/5.0 (Windows; U; Win 9x 4.90; en-US; rv:1.7.6) Gecko/20050317 Firefox/1.0.2

The "SavePageAs" protocol, when the SaveAsType "TextFiles" is selected, seems to
insert a lot more extraneous stuff than the other browsers do. For example,
text which is in a bold or accented font (as rendered from HTML), or is somehow
highlighted, gets multiple asterisks surrounding it. The top of the page has a
spurious "</>" line. Horizontal lines which are displayed in the page (and which
are NOT text) are simulated by a full line of text dashes. Email addresses
which are seen (in the HTML rendering) merely as e.g. foo@bar.com are
embellished with a mailto: phrase to show the invisible (hence, non-text) part.
I expect always "foo@bar.com", but often get "foo@bar.com <mailto: foo@bar>"
when the HTML has a clickable link.
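
To make the difference concrete, here is a minimal sketch in Python of the two behaviours described above. It is not Firefox's serializer and makes no claim about how Gecko implements "Save as Text"; the class `TextExtractor`, the function `to_text`, the `decorate` flag, and the 20-dash rule are all hypothetical names and choices made up for illustration. With `decorate=True` it mimics the reported output (asterisks around bold text, a line of dashes for a horizontal rule, the link target appended after the link text); with `decorate=False` it keeps only the visible text.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the text of a page; optionally add the decorations reported above."""

    def __init__(self, decorate=False):
        super().__init__()
        self.decorate = decorate  # hypothetical switch, not a Firefox option
        self.parts = []
        self.href = None

    def handle_starttag(self, tag, attrs):
        if tag in ("b", "strong") and self.decorate:
            self.parts.append("*")
        elif tag == "hr":
            # A rule is not text: either fake it with dashes or just break the line.
            rule = "\n" + "-" * 20 + "\n"
            self.parts.append(rule if self.decorate else "\n")
        elif tag == "a":
            self.href = dict(attrs).get("href")

    def handle_endtag(self, tag):
        if tag in ("b", "strong") and self.decorate:
            self.parts.append("*")
        elif tag == "a":
            if self.decorate and self.href:
                # Append the otherwise invisible link target after the link text.
                self.parts.append(" <" + self.href + ">")
            self.href = None

    def handle_data(self, data):
        self.parts.append(data)

    def text(self):
        return "".join(self.parts)


def to_text(html, decorate=False):
    parser = TextExtractor(decorate)
    parser.feed(html)
    parser.close()
    return parser.text()


sample = '<p>Mail <b>now</b>: <a href="mailto:foo@bar.com">foo@bar.com</a></p><hr>'
print(repr(to_text(sample)))                 # text only
print(repr(to_text(sample, decorate=True)))  # text plus simulated formatting
```

The first call corresponds to what this report asks for; the second corresponds to the kind of output being complained about.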

Reproducible: Always

Steps to Reproduce:
1. Go to a typical site with substantial "text" in it (e.g., an online newspaper).
2. Select File->Save As and save to some disk file; in the file selection
dialogue, select "TextFiles" (as opposed to HTML, etc.).
3. Observe a lot more than the text from the website page.
Actual Results:  
As mentioned in the details above, lots of non-text stuff is added, presumably to
simulate the effect of certain minimal graphics, like font changes, horizontal
lines, headers, embedded mail links, etc.

Expected Results:  
The facts, ma'am, just the facts, ma'am.  (where "facts" is "text", and apologies
to Sgt. Joe Friday)  None of the other three common browsers adds this unwanted
detritus.
dupeme 255657

1) Reporter's URL invalid (requires registration). Testing on:
http://www.mozilla.org/projects/security/security-bugs-policy.html
2) Confirm reported behaviour.
3) Think reported behaviour is by design.
4) Bug 255657 is currently handling the "save as text" issue. This is either a
dupe or a dependency of 255657.

(In reply to comment #1)
. . .

>2) Confirm reported behaviour.

Then perhaps you should change the Status Field to "CONFIRMED"?


> 4) Bug 255657 is currently handling the "save as text" issue. This is either a
> dupe or a dependency of 255657.

Not quite.  255657 recommends flushing the "Textfiles" mode entirely; the
present request is to prune down the objective of this mode to mean scraping all
and only "text" from the page, without trying to mimic non-text formatting, etc.
Check out Opera, for example; even the hated MSIE 6.x gets it "right" (except
for some spurious tab indentation).

By the bye, I gave a somewhat incorrect version of the URL; the original
offering was the news story in its full html/advertising/bells-and-whistles
glory, but my intent was to point to the URL of the so-called "printer friendly"
version; hence I have updated the URL field to that one.

True, most newspapers require a "login", but almost all of them---including the
San Jose Mercury News---freely provide new accounts.   You could even use a
"throwaway" email address just to test it out.

You might be surprised to see how many web pages offer a "printer friendly"
version of the page; and this version is indeed more easily seen to be a
candidate for the "Textfiles" mode.
This is an automated message, with ID "auto-resolve01".

This bug has had no comments for a long time. Statistically, we have found that
bug reports that have not been confirmed by a second user after three months are
highly unlikely to be the source of a fix to the code.

While your input is very important to us, our resources are limited and so we
are asking for your help in focussing our efforts. If you can still reproduce
this problem in the latest version of the product (see below for how to obtain a
copy) or, for feature requests, if it's not present in the latest version and
you still believe we should implement it, please visit the URL of this bug
(given at the top of this mail) and add a comment to that effect, giving more
reproduction information if you have it.

If it is not a problem any longer, you need take no action. If this bug is not
changed in any way in the next two weeks, it will be automatically resolved.
Thank you for your help in this matter.

The latest beta releases can be obtained from:
Firefox:     http://www.mozilla.org/projects/firefox/
Thunderbird: http://www.mozilla.org/products/thunderbird/releases/1.5beta1.html
Seamonkey:   http://www.mozilla.org/projects/seamonkey/
Dup of bug 146951 or bug 131166 or bug 138568.

*** This bug has been marked as a duplicate of 131166 ***
Status: UNCONFIRMED → RESOLVED
Closed: 19 years ago
Resolution: --- → DUPLICATE
Summary: The "SavePageAs" protocol, when the SaveAsType "TextFiles" is selected, seem to insert a lot more extraneous stuff than all the other browsers. → "Save as Text" inserts a lot of extraneous stuff