Closed Bug 62189 Opened 20 years ago Closed 20 years ago

Clean up plain text converter


(Core :: DOM: Serializers, defect, P3)






(Reporter: bratell, Assigned: bratell)



(1 file)

Known "issues":
We set mCacheLine everywhere. Set it at one place or maybe remove it completely. 
(Do we need the code that handles mCacheLine == false or can the normal code 
be used after some tweak? Would reduce code size.)

What's the relationship between mPreformatted and the preformatted flag in the 
flag field? Only use mPreformatted instead of looking at both locations.

Fix comment "wcwidth() and wcswidth()" and maybe rename one of the methods.

mColPos is used as a bool (0 or not 0). Do it a bool.

Move the wrap code to a routine of it's own.
And remove the old converter nsHTMLToTxtSinkStream if it's not used anymore.
The evolution of the HTML->TXT converter has been quite organic and some things
in it make it really hard to understand. There is also things that are done
differently at different places in the code and things that are done
unnessicarily and too complicated. That makes the code hard to maintain and
makes it harder for new people to understand it. I've made a pass over it last
week and would like a review now. It's mostly name changes and very little real
programmatic changes to make the patch easier to review. There shouldn't be any
deep analysis necessary.

So what have I done:

* Convert all runtime conversions to unicode SPACE with a constant. It can't be
made const static because that causes problems with OpenBSD and module
unloading. (performance, clarity)

* Only handle SELECT once in DoAddLeaf (performance, clarity)

* Only create a linewrapper if we are going to do line wraps (performance)

* Only get data from preferences if we are going to use it. (performance)

* Reformulated some comments, and added function comments. (clarity)

* Changed test for -1 to test for kNotFound. (clarity)

* Renamed the unichar width functions to GetUnicharWidth and
GetUnicharStringWidth. (clarity)

* Removed the mCacheLine member variable and all code that was used when
mCacheLine was false. The code that is used when mCacheLine is true handles that
data too now. (code size, clarity, maintainability)

* Renamed |WriteSimple| |Output| which is a descriptive name. (clarity)

* Renamed |WriteQuotesAndIndent| |OutputQuotesAndIndent| which is a more
descriptive name. (clarity)

* Made the two almost identical code paths in EndLine one codepath. (code size,

* Added check for preformatted text before stripping spaces in EndLine (correctness)

* Made two small functions inlined. One because it's used so much and the other
because it is so small. 

* Unified use of string functions by replacing explicit conversions with

* Renamed arguments to functions so that they starts with an 'a'.

* mColPos was used as a bool. Made it a bool with the name mAtFirstColumn.
Sorry for the delay. Here's the patch, Anthony. I hope you can look at it soon, 
because it will make yours and everyone else's future contacts with 
nsPlainTextSerializer so much less painful.

For the content of the patch see my comment of 2000-12-18 10:59.
Keywords: patch, review
adding jfrancis to the cc list in the off-chance this would effect his code, but 
it looks good to me, and it runs well, so...
For the linebreak assignment code in the Init method (and in a few other
places): its mildly cheaper to do a PRUnichar() cast of single characters
instead of doing a NS_LITERAL_STRING wrapping when doing Assigns and Appends to
strings. Other than that nit, sr=vidur.
This is checked in with single chars using a PRUnichar instead of a literal 
string as mentioned by vidur.
Closed: 20 years ago
Resolution: --- → FIXED
Bratell, please verify and mark VERIFIED-FIXED...thanks!
The code is in. There hasn't been any reports of regressions, but since the bug 
is already VERIFIED, I won't do anything more.
You need to log in before you can comment on or make changes to this bug.