Kathy has a fix for this...
Status: NEW → ASSIGNED
OS: Windows 98 → All
Hardware: PC → All
Summary: publishing doesn't put in the right cr/lf endings; it does "platform" endings → publishing doesn't put in both cr/lf; it does only platform-specific setting
Target Milestone: --- → mozilla0.9.9
Created attachment 70543
patch to use persist flags and always publish with cr/lf

Note: the specifications indicate that http should be able to accept all cr/lf variations, but that applies only to http. 4.x sends both cr and lf, so we should do the same for 6.x.
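For readers without the patch in front of them, here is a minimal sketch of what "always publish with cr/lf" means for the output text. It is not the patch itself (which sets persist flags inside Mozilla); the function name `to_crlf` is hypothetical and used only for illustration.

```python
import re

def to_crlf(text: str) -> str:
    """Rewrite every line break (CRLF, lone CR, or lone LF) as CRLF."""
    # CRLF is listed first in the alternation so an existing pair is
    # matched as one unit and not expanded into two line breaks.
    return re.sub(r"\r\n|\r|\n", "\r\n", text)

if __name__ == "__main__":
    sample = "mac line\runix line\nwindows line\r\n"
    print(repr(to_crlf(sample)))
```

Input that already uses CRLF passes through unchanged, so applying the conversion twice is harmless.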
Comment on attachment 70543
patch to use persist flags and always publish with cr/lf

It would be nicer if we could identify the platform we are publishing to and adjust the output flags accordingly, but I understand that part of the problem is that we have to set these flags before the channels are opened, which means that information isn't available yet. CR/LF is readable on all platforms, so I'll give a thumbs up on this. Could we add a comment noting why we are using CR/LF?
Attachment #70543 - Flags: superreview+
Kathy asked me to review, but I can't in good conscience. This will make me avoid using publishing for my own pages; it would bother me very much if my files all ended up with Windows linebreaks when they'd never been touched by a Windows machine, so that I had to run a script on them before I edited them with something other than composer. Mapping linebreaks to something that doesn't match the platform native line break on either the user's machine or the server is evil. Sniffing the server first to determine its platform would be worth it, IMO, to avoid mis-mapping (mapping to the linebreaks that are correct for the server makes a lot of sense to me).

Kathy and I discussed it at some length and I said I would clarify that I'm not trying to block checkin of this, just make sure everybody understands that it's evil and it's going to piss some people off. If the only audience we care about is people who use composer exclusively and will never try to edit the source themselves, then perhaps it doesn't matter. Kathy asked if I would prefer that files on the server be mismatched, some with mac linebreaks, some with windows, etc. depending on who edited the file -- and yes, that doesn't bother me, since it's what normally would have happened before publishing, when multiple people edit files locally and ftp them.

And yes, if you check this in, please do add a comment explaining it (perhaps referring back to this bug) so someone like me doesn't come along later and try to "fix" it.
What does a standard FTP program do? I bet it simply moves the bits and doesn't touch CR/LF, so the server gets files with different OS cr/lf styles. So I'm leaning toward Akkana's opinion not to do any substitutions during publishing.
Kathy says FTP programs do appropriate cr/lf conversions, so I think that is what we should do. But for first release, it seems to be non-trivial to figure out server OS, so I'm ok with just a particular convention for now. I don't have a strong opinion on which one.
Here are the results of my testing:

ftp file with LF linebreaks from linux to linux: linebreaks remain LF
ftp file with CRLF linebreaks from linux to linux: linebreaks remain CRLF
ftp file with CR linebreaks from mac OS9.04 to linux using fetch: linebreaks change to LF
LF linebreaks from linux to OSX: remain LF
CRLF linebreaks from linux to OSX: remain CRLF
CR linebreaks from linux to OSX: remain CR
CRLF linebreaks from Windows (ftp in DOS prompt under Win ME) to linux: linebreaks change to LF

I don't know how to test ftp to windows or Mac OS9 -- how does one run a server on those OSes? Maybe Charley can help.

Conclusion: Windows and mac ftp translate linebreaks appropriately for the server (i.e. they don't translate blindly to Windows). Linux ftp doesn't translate at all. None of the clients tested mapped to CRLF when uploading to a linux server.
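For anyone who wants to repeat this kind of test, a small sketch of how the line-break style of a transferred file could be checked. The helper name `count_line_endings` is hypothetical; it is not what was used for the results above, just one way to inspect the bytes before and after a transfer.

```python
import re
from collections import Counter

# Map each raw line-break sequence to the name used in this bug.
_NAMES = {b"\r\n": "CRLF", b"\r": "CR", b"\n": "LF"}

def count_line_endings(data: bytes) -> Counter:
    """Count CRLF, lone CR, and lone LF line breaks in raw file bytes."""
    counts = Counter()
    # CRLF comes first in the pattern so a pair is counted once,
    # not as a CR followed by an LF.
    for match in re.finditer(rb"\r\n|\r|\n", data):
        counts[_NAMES[match.group()]] += 1
    return counts

if __name__ == "__main__":
    print(count_line_endings(b"one\r\ntwo\r\nthree\n"))
```

Running it on the local copy and on the copy fetched back from the server shows whether the client translated anything on the way.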
Can you clarify what ftp client software you were using? What about http?

Important note about ftp: ftp should do translation (mozilla doesn't work correctly because it always sends data in binary format instead of text). This issue is a separate bug from what http should do.
Note that this means that windows/mac clients, when downloading a file with LF from a unix server, will also do appropriate substitution back to the current platform's line breaks (verified with mac, not tested with windows).

One relevant question is: what happens if you have a file that uses CRLF residing on a Unix server? Obviously it will be right on windows and wrong on unix. Will the mac translate this file correctly, or will it end up with double linebreaks (CR translated from the Unix LF, plus the extra CR that didn't belong on Unix)? Likewise, what happens if you have a file with linux line breaks on a windows server, and you ftp download it? It'll be right on linux, but what will it look like on windows and mac?

Last study I read, something like 22% of servers were Windows, and that number may be declining (or so the news sites say) due to fears over security issues. So if we're going to guess based on what type of server is more likely, Unix is a better guess than Windows. Looks like aol.com and netscape.com are both running solaris, so Unix linebreaks are consistent with our own servers.

What's the argument for using CRLF?
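The double-linebreak worry can be illustrated with a short sketch. This assumes a hypothetical client that blindly maps each LF to CR on download; whether Fetch or any other real client behaves this way has not been tested in this bug. Both function names are made up for the example.

```python
import re

def naive_lf_to_cr(data: bytes) -> bytes:
    """Hypothetical client: assumes a Unix server and maps each LF to CR."""
    return data.replace(b"\n", b"\r")

def any_break_to_cr(data: bytes) -> bytes:
    """Treats CRLF, CR, and LF each as one line break and emits CR."""
    return re.sub(rb"\r\n|\r|\n", b"\r", data)

if __name__ == "__main__":
    crlf_file = b"one\r\ntwo\r\n"
    # The stray CR from each CRLF survives next to the translated LF.
    print(naive_lf_to_cr(crlf_file))
    print(any_break_to_cr(crlf_file))
```

With the naive mapping each CRLF becomes two CRs, which a mac editor would show as a blank line between every pair of lines; treating the pair as a single break avoids that.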
Here's the list of ftp clients again: mac=fetch, windows=ftp from dos shell, linux=ftp from command line.
The arguments for CRLF include:

4.x parity (tested with Linux 4.x Composer publishing to Linux server)
text/html line ending canonical form is CRLF
ftp clients that send text (which we don't at the moment) should send CRLF (servers "fix" line endings if necessary)

Relevant RFCs include:

text/html (http://ietf.org/rfc/rfc2854.txt) section 4: 'As with all MIME text subtypes, the canonical form of "text/html" must always represent a line break as a sequence of a CR byte value (0x0D) followed by an LF (0x0A) byte value.'

http 1.1 (http://ietf.org/rfc/rfc2616.txt) section 3.7.1: 'When in canonical form, media subtypes of the "text" type use CRLF as the text line break. HTTP relaxes this requirement and allows the transport of text media with plain CR or LF alone representing a line break when it is done consistently for an entire entity-body.'

FTP (http://ietf.org/rfc/rfc0959.txt) section 3.1.1.1: "In accordance with the NVT standard, the <CRLF> sequence should be used where necessary to denote the end of a line of text."

My conclusions:

FTP issue is a non-issue at the moment due to the bug where ftp data is always uploaded in binary form. If we ever fix that bug, we need to always send CRLF (unless I misunderstand what I've read).
HTTP doesn't really care what format you use as long as the whole document uses the same technique.
CRLF is the canonical form for text/html files.

I prefer we go with the patch in this bug for now. It is the same as 4.x behavior, so our users won't be surprised. I'd be ok with just sending LF based on the percentage argument Akkana posed, but we'll be wrong some day if/when ftp upload is fixed to send only binary data as binary.
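The HTTP consistency requirement quoted above can be expressed as a simple check. This is only a sketch of the rule as I read it from section 3.7.1; the names `line_break_styles` and `is_consistent` are hypothetical and not part of any Mozilla code.

```python
import re

def line_break_styles(body: bytes) -> set:
    """Return the distinct line-break sequences found in an entity-body."""
    # CRLF is matched first so a pair is reported as CRLF, not CR plus LF.
    return set(re.findall(rb"\r\n|\r|\n", body))

def is_consistent(body: bytes) -> bool:
    """True when the body uses at most one line-break convention."""
    return len(line_break_styles(body)) <= 1

if __name__ == "__main__":
    print(is_consistent(b"<p>a</p>\r\n<p>b</p>\r\n"))
    print(is_consistent(b"<p>a</p>\r\n<p>b</p>\n"))
```

A body that always uses CRLF passes this check and is also in canonical text/html form, which is one way to see why the patch is safe for http.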
I believe that CRLF is the correct way to publish files though it would be nice to see an ascii mode for uploading text on ftp channels.
ftp binary (always) issue is now in bug 127292
Status: ASSIGNED → RESOLVED
Last Resolved: 17 years ago
Resolution: --- → FIXED
Status: RESOLVED → VERIFIED