Open Bug 134457 Opened 23 years ago Updated 2 years ago

Relative URLs in text output

Categories

(Core :: DOM: Serializers, defect)

People

(Reporter: BenB, Unassigned)

Details

(Whiteboard: needs review)

Attachments

(1 file)

Reproduction:
1. Run "<a href="/foo/index.html">a link</a>" (in the document <http://www.example.com/base/url/somedoc.html>) through the HTML->TXT converter.
Actual result: "a link </foo/index.html>"
Expected result: "a link <http://www.example.com/base/url/foo/index.html>"
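For reference, the resolution step the report asks for can be sketched with Python's standard `urllib.parse.urljoin`; this is only an illustration, not the serializer's code, and the `resolve` helper is a hypothetical name. Note that under standard reference resolution an href with a leading slash resolves against the host root, so the expected URL quoted in the report corresponds to `foo/index.html` without the leading slash.

```python
from urllib.parse import urljoin

# Document URL taken from the report.
BASE = "http://www.example.com/base/url/somedoc.html"


def resolve(href: str, base: str = BASE) -> str:
    """Resolve an href against the document URL (hypothetical helper)."""
    return urljoin(base, href)


# Leading slash: resolved against the host root.
print(resolve("/foo/index.html"))  # http://www.example.com/foo/index.html

# No leading slash: resolved against the document's directory.
print(resolve("foo/index.html"))   # http://www.example.com/base/url/foo/index.html
```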
Expected result: 'a link'
You're asking for a text file. The link in the HTML page is invisible unless present in the status bar or you are viewing the HTML source, and so is not expected from/in a text file. If you want a link, you save as html. If you need the link elsewhere, you copy the link from the html page. There is no need to redefine text. Text has been text a very long time. If you want something different from what is currently defined as a text file, give it a new name instead of emulating M$ and stealing an established name.
> There is no need to redefine text.
That's what bug 131166 is about; please keep the discussion where it belongs.
--> Tanu
Assignee: harishd → tmutreja
To me the most sensible solution is this: when doing a copy & paste, the behavior should be WYSIWYG, but when saving a file as text, we should try to retain as much of the important information as possible. So when pasting content as plain text, the URL need not be retained, but when saving as text it must be. I know this is the kind of bug where there can always be counterarguments, but I'm trying to fix it on the above basis.
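The policy proposed above can be sketched as a single flag on a toy HTML-to-text converter. This is a minimal illustration built on Python's standard `html.parser`, not the actual PlainTextSerializer; the `TextSketch` class, the `to_text` function and the `keep_links` flag are all hypothetical names.

```python
from html.parser import HTMLParser


class TextSketch(HTMLParser):
    """Toy HTML-to-text converter (hypothetical, for illustration only)."""

    def __init__(self, keep_links: bool):
        super().__init__()
        self.keep_links = keep_links  # True: "save as text", False: copy & paste
        self.parts = []
        self._hrefs = []              # stack of hrefs for open <a> elements

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._hrefs.append(dict(attrs).get("href"))

    def handle_endtag(self, tag):
        if tag == "a" and self._hrefs:
            href = self._hrefs.pop()
            if self.keep_links and href:
                # Append the link target after the link text.
                self.parts.append(f" <{href}>")

    def handle_data(self, data):
        self.parts.append(data)


def to_text(html: str, keep_links: bool) -> str:
    parser = TextSketch(keep_links)
    parser.feed(html)
    parser.close()
    return "".join(parser.parts)


snippet = '<a href="/foo/index.html">a link</a>'
print(to_text(snippet, keep_links=False))  # a link
print(to_text(snippet, keep_links=True))   # a link </foo/index.html>
```

With `keep_links=True` the sketch reproduces the unresolved output from the report; resolving the href against a base URL would be a separate step.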
Status: NEW → ASSIGNED
Keywords: patch
Whiteboard: needs review
Akkana, I need to confirm my understanding of the situations in which PlainTextSerializer gets parser nodes as input. I think it happens only for mail sent as plain text, in which case I do not see any sensible value for the base URL of the document. Are there other places where PlainTextSerializer gets parser nodes? In the patch I have passed the value to BaseURI only from DocumentEncoder; I'm not sure whether this is required in other places too.
Does every document have a URL? Is that URL always meaningful? What if I copy from an HTML mail in which someone has written a relative URL? What if the HTML file is opened from the file system?
The plaintext serializer gets parser nodes as input whenever there isn't a document created. For some reason mail does this (serializes to HTML, then re-parses, and then serializes to plaintext), but copy/paste will also go through this path, because copy saves the HTML, and if the pasting application requests plaintext then we parse the HTML to convert it to plaintext. Plus, of course, the output tests. There may be a few other places where this happens; those are all I can think of offhand, but you definitely can't assume that only mail does this.
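One way to picture the concern raised in the last few comments: when the serializer is fed parser nodes, there may be no document and therefore no usable base URI. A hedged sketch of a fallback, again using Python's `urllib.parse.urljoin` rather than any Mozilla API, would resolve only when a base is known and otherwise leave the href as written. The `resolve_or_keep` name and its `base_uri` parameter are assumptions made for this example.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve_or_keep(href: str, base_uri: Optional[str]) -> str:
    """Resolve href if a base URI is available, else return it unchanged.

    Hypothetical helper: `base_uri` stands in for whatever the caller knows
    about the source document (None when only parser nodes are available).
    """
    if not base_uri:
        return href  # no meaningful base, so keep the relative URL as is
    return urljoin(base_uri, href)


# Document-backed path: a base URL is known.
print(resolve_or_keep("foo/index.html",
                      "http://www.example.com/base/url/somedoc.html"))

# Parser-node path (e.g. paste or mail): no base, href is left untouched.
print(resolve_or_keep("foo/index.html", None))
```

Whether leaving the URL unresolved is the right behavior in that case is exactly the open question in this thread; the sketch only shows one option.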
Once again we are close to knowing "how" to do it, but considering comments #5 and #6, we first need to decide "what" to do.
I need some input to move forward.
QA Contact: sujay → dom-to-text

The bug assignee didn't login in Bugzilla in the last 7 months.
:hsinyi, could you have a look please?
For more information, please visit auto_nag documentation.

Assignee: t_mutreja → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(htsai)
Flags: needinfo?(htsai)
Severity: normal → S3