Closed
Bug 348901
Opened 18 years ago
Closed 17 years ago
Extra whitespace from HTML source in accessible text
Categories
(Core :: Disability Access APIs, defect)
Core
Disability Access APIs
Tracking
()
RESOLVED
FIXED
People
(Reporter: parente, Assigned: aaronlev)
References
(Blocks 2 open bugs)
Details
(Keywords: access)
Attachments
(2 files, 10 obsolete files)
224 bytes,
text/html
|
Details | |
50.46 KB,
patch
|
surkov
:
review+
dbaron
:
approval1.9+
|
Details | Diff | Splinter Review |
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a1) Gecko/20060816 Minefield/3.0a1 1) Run firefox. 2) Navigate to http://www.mozilla.org/access/unix/new-atk.html 3) Use at-poke. 4) Look at the third from the bottom accessible in the document (i.e. the table cell in the first column of the last row of the big table on the page). 5) Notice the leading whitespace in the static text that is not visible on the page. The whitespace appears to be coming from the raw HTML source.
Assignee | ||
Updated•18 years ago
|
OS: Linux → All
Hardware: PC → All
Assignee | ||
Updated•18 years ago
|
Summary: Extra whitespace in accessible text → Extra whitespace from HTML source in accessible text
Assignee | ||
Comment 1•18 years ago
|
||
This is going to make things more quite a bit more complicated in nsHyperTextAccessible.cpp
Comment 2•18 years ago
|
||
Comment 3•18 years ago
|
||
i think first we should have a list showing all scenerio which leading whitespace is omitted.
Assignee | ||
Comment 4•18 years ago
|
||
We probably want to expose the transformed text, whatever it is. For example, CSS rules can make test that is exposed as all caps or lowercase, even though the original content is not. No doubt the text transformer has all sorts of interesting smarts.
Assignee | ||
Comment 5•18 years ago
|
||
After the new text frame rewrite in bug 333659, roc will add a method for us to get the transformed text directly. That will include giving us all caps if CSS text transform to uppercase is used, and giving us the rendered whitespace instead of source whitespace. So we need to wait until that lands and ask for a public method to get that.
Depends on: 333659
*** Bug 356369 has been marked as a duplicate of this bug. ***
Assignee | ||
Updated•18 years ago
|
Comment 7•18 years ago
|
||
I'm curious if anyone has any suggestions for how an assistive technology can identify the difference between extra whitespace that supposed to be shown versus whitespace that isn't supposed to be shown.
Comment 8•18 years ago
|
||
Sorry, I pressed "Commit" too soon - what I meant to type was "how an assistive technology can determine if whitespace is erroneously exposed by Gecko."
Assignee | ||
Comment 9•18 years ago
|
||
Will, I worked on this some today. It's dependent on some core layout changes which will allow us to query for the actual rendered text, as well as turn a DOM offset into a rendered offset into that text. It's in the works. As soon as bug 375760 is checked in, I can start trying different things. If you need a quick fix now, I suggest mapping multiple spaces to one space, but this will not help you with <pre> etc. And of course you can't do it in entry fields. If we for some reason have to get this done in the next few weeks, we'll have to expose an object attribute to tell you whether to compress multiple spaces to 1 or not.
Comment 10•18 years ago
|
||
(In reply to comment #9) > Will, I worked on this some today. It's dependent on some core layout changes > which will allow us to query for the actual rendered text, as well as turn a > DOM offset into a rendered offset into that text. Thanks Aaron! The main impact this has is on our braille users. If we can detect that we're getting bad whitespace, we can compress it, but we'd really like to also present good whitespace. :-) BTW, things are starting to look a lot better these days. The fix for bug 363214 is especially useful.
Assignee | ||
Comment 11•18 years ago
|
||
We need to call nsTextFrame::EnsureTextRun and get the returned gfxSkipCharsIterator and the mTextRun. This means new a API to call that externally from layout. Once we have those pieces of info we can use: gfxTextRun::GetText8Bit/GetTextUnicode -- get the rendered text without extra whitespace. gfxSkipCharsIterator::ConvertOriginalToSkipped -- convert content offset into trimmed text run offset gfxSkipCharsIterator::ConvertSkippedToOriginal -- convert content offset into trimmed text run offset
No, you should add a textframe method that returns the string you want. You should probably implement this in a new nsITextFrame interface implemented as a tear-off so it doesn't bloat textframes.
Assignee | ||
Comment 13•18 years ago
|
||
Assignee | ||
Comment 14•18 years ago
|
||
Unfortunately this is a bit inconistent with nsIFrame in that callers will leak unless they use an nsCOMPtr<nsITextFrame> or explictly release the nsITextFrame. They might not realize that because most places no longer do that with nsIFrame -- in fact Release() doesn't do anything for it. I guess if the frame is destroyed but nsITextFrame still exists, that is also a problem. But, I'm not sure how to prevent that or if it would really happen. This old comment in nsIFrame would worry me if it wasn't Hyatt's from 2001. Should it be removed? /* nsIFrame is in the process of being deCOMtaminated, i.e., this file is eventually going to be eliminated, and all callers will use nsFrame instead. At the moment we're midway through this process, so you will see inlined functions and member variables in this file. -dwh */
Alright, how about creating nsTextFrameBase, making it a subclass of nsFrame, make nsTextFrameThebes subclass that, put your APIs in there.
Hmm, I could use nsTextFrameBase too :-)
Assignee | ||
Comment 17•18 years ago
|
||
The new text frame is turned on in trunk so we can work on this again.
Assignee | ||
Comment 18•18 years ago
|
||
nsTextFrameBase seems like a bad idea, because either: 1) We cannot use directly as an interface in mozilla/accessible because it depends on nsFrame or 2) We need to add a new interface that it can support, in which case we don't need the extra class at all, and I think we're adding 4 bytes per text frame instance because of multiple inheritance. Robert, what am I missing?
Attachment #270568 -
Flags: review?(roc)
Assignee | ||
Comment 19•18 years ago
|
||
Attachment #270568 -
Attachment is obsolete: true
Attachment #270569 -
Flags: review?(roc)
Attachment #270568 -
Flags: review?(roc)
Assignee | ||
Comment 20•18 years ago
|
||
Per our email, putting the new methods directly in nsIFrame.
Attachment #264502 -
Attachment is obsolete: true
Attachment #270569 -
Attachment is obsolete: true
Attachment #270732 -
Flags: superreview?(roc)
Attachment #270732 -
Flags: review?(roc)
Attachment #270569 -
Flags: review?(roc)
+ virtual PRUint32 GetRenderedTextLength() { return 0; } + + /** + * Convert a text content (original) offset to a skipped (rendered text) offset + */ + virtual PRUint32 OriginalToSkippedOffset(PRInt32 aOriginalOffset) { return aOriginalOffset; } + + /** + * Convert a skipped (rendered text) offset to a text content (original) offset + */ + virtual PRInt32 SkippedToOriginalOffset(PRUint32 aSkippedOffset) { return aSkippedOffset; } Why do you need these three methods? The first two are trivial to implement with AppendRenderedTextTo. The last one can be implemented with it using binary search.
Assignee | ||
Comment 22•17 years ago
|
||
> The first two are trivial to implement with AppendRenderedTextTo. I'm trying to avoid the string copies. There will be a lot more of them if I use AppendRenderedTextTo() to implement. > The last one can be implemented with it using binary search. That sounds extremely slow and overcomplicated, compared with adding a method here. I can buy maybe removing GetRenderedTextLength() and OriginalToSkippedOffset(), but it seems a little extreme to do remove this method by implementing it using a binary search.
Is there any reason to believe this code is performance sensitive? How many times per second can it be called?
Assignee | ||
Comment 24•17 years ago
|
||
> Is there any reason to believe this code is performance sensitive? It depends on the particular AT that's being used. Assistive technologies often walk the entire document or selection. At the moment, Firefox's superior performance with Windows screen readers is one of the reasons blind users switch to it, and I'd like it to stay that way :) > How many times per second can it be called? Again, it depends on the particular AT. What do we gain by whittling down the nsIFrame methods down to just 1? Is it worth the extra work? I don't feel like a11y really asks the main codebase for all that much, but I can work to try and remove 1 or 2 methods if you require it.
Assignee | ||
Comment 25•17 years ago
|
||
One more detail on the perf issue. The problem is that these methods are cumulative. When dealing with text offsets at the end of a large paragraph there can be many nodes etc. in between.
OK ok. How about a single method that returns a string and a gfxSkipCharsIterator representing the mapping between content offsets and string offsets? We'd have to move gfxSkipCharsIterator to gfxSkipChars.h/.cpp but that's OK. That gives you the ultimate interface. Do you want to do this frame by frame? We could have the method return a string + iterator for the entire text node (you'd then only call it once on the primary frame for the node). That could be faster for you if performance does matter. You should be aware though that that could represent a mix of RTL and LTR text runs which may not be visually contiguous. If you like that idea I can write the implementation for you. Keeping the number of methods down does matter. Each one costs 4 bytes of vtable data per frame type, there's >120 frame types, so that's 1.4K of code if we can save three methods. Worth doing if it's not too painful. BTW your patch doesn't take account of text-transform.
Assignee | ||
Comment 27•17 years ago
|
||
(In reply to comment #26) > How about a single method that returns a string and a > gfxSkipCharsIterator representing the mapping between content offsets and > string offsets? We'd have to move gfxSkipCharsIterator to gfxSkipChars.h/.cpp > but that's OK. That gives you the ultimate interface. That sounds good. > We could have the method return a string > + iterator for the entire text node (you'd then only call it once on the > primary frame for the node). I do like it, but I'm not sure I want to hold up the patch more for it. > You should be aware though that that could represent a mix of RTL and > LTR text runs which may not be visually contiguous. How can text direction switch without a change of DOM nodes? > there's >120 frame types, so that's 1.4K of code if > we can save three methods. Wow, that's true. I understand. > BTW your patch doesn't take account of text-transform. I realized that after I tested it, but didn't want to hold it up for that right now. Is there any easy way to take text-transform into account?
> How can text direction switch without a change of DOM nodes? Embedded characters with a different natural direction. For example, the DOM text english HEBREW (where capital letters are Hebrew characters) would be rendered as english WERBEH with two text frames, one mapping "english " and one mapping "HEBREW > Is there any easy way to take text-transform into account? Not really. I'll do it.
Assignee | ||
Comment 29•17 years ago
|
||
Problem: If I include gfxSkipCharsIterator inside of nsIFrame, then we need to update tons of Makefile.in's to add thebes to REQUIRES.
Assignee | ||
Comment 30•17 years ago
|
||
Actually a worse problem is linker errors -- I can't use gfxSkipCharsIterator from inside mozilla/accessible.
The idea was to move gfxSkipCharsIterator to gfxSkipChars.h/.cpp. Then you just need to link to thebes from accessible. I assume you can do that...
> If I include gfxSkipCharsIterator inside of nsIFrame, then we need to update
> tons of Makefile.in's to add thebes to REQUIRES.
just take a gfxSkipCharsIterator* to fill in, and declare "class gfxSkipCharsIterator;" at the top, so you don't have to include gfxSkipChars.h from nsIFrame.h.
Assignee | ||
Comment 32•17 years ago
|
||
> gfxSkipCharsIterator to gfxSkipChars.h/.cpp
It's already there, or am I missing something
I like the pointer to gfxSkipCharsIterator.
Assignee | ||
Comment 33•17 years ago
|
||
Attachment #271039 -
Flags: superreview?(roc)
Attachment #271039 -
Flags: review?(roc)
Assignee | ||
Updated•17 years ago
|
Attachment #270732 -
Attachment is obsolete: true
Attachment #270732 -
Flags: superreview?(roc)
Attachment #270732 -
Flags: review?(roc)
Assignee | ||
Comment 34•17 years ago
|
||
Attachment #271039 -
Attachment is obsolete: true
Attachment #271045 -
Flags: superreview?(roc)
Attachment #271045 -
Flags: review?(roc)
Attachment #271039 -
Flags: superreview?(roc)
Attachment #271039 -
Flags: review?(roc)
How do you want to handle line breaks? If we trim spaces at the start and end of lines, AND we return the text for an entire text node at once, you'll get things like <p> Hello there -> Hello there -> "Hellothere" Should we replace trimmed whitespace at the end of the line with a single \n, perhaps?
Try this. I haven't tested it, but it builds. I had to add a gfxSkipChars* parameter because I'm constructing a whole new offset-mapping.
Attachment #271045 -
Attachment is obsolete: true
Attachment #271045 -
Flags: superreview?(roc)
Attachment #271045 -
Flags: review?(roc)
Assignee | ||
Comment 37•17 years ago
|
||
Soft line breaks should come out as a single " ". We only want \n for forced line breaks in white-space: pre or via <br>.
Assignee | ||
Comment 38•17 years ago
|
||
In other words, I only meant to trim extra spaces at the start and end of hard line breaks like block level elements, because it's redundant If that's too much extra logic for now we can deal with it later.
Assignee | ||
Comment 39•17 years ago
|
||
Haven't tried it yet, but looking at the code, I would expect these impls to change: +PRUint32 nsHyperTextAccessible::OriginalToSkippedOffset(nsIFrame *aFrame, PRInt32 aOriginalOffset) +PRInt32 nsHyperTextAccessible::SkippedToOriginalOffset(nsIFrame *aFrame, PRUint32 aSkippedOffset) They haven't changed, but the amount of text at the start of the string can be less now, because whitespace is trimmed off.
Assignee | ||
Comment 40•17 years ago
|
||
Hi Roc, the GetRendererText() is completely different. Are you walking through the text continuations like this because of the line trimming? Again, I did not mean that I wanted space trimmed there. Only at the start or end of a DOM node where multiple spaces should have disappeared. The kind of extra space i was talking about is like this: 1. <p> Hello </p> 2. <p>Hello</p> These are rendered the same but my previous code gives me: 1. " Hello " 2. "Hello" I wanted just "Hello" in both cases.
Assignee | ||
Comment 41•17 years ago
|
||
Sorry for the noise. Now that I'm looking at the code it seems it may be right. But, I ran out of time for testing. Back after next week.
Assignee | ||
Comment 42•17 years ago
|
||
Had a quick moment to run it. It gets stuck in an infinite loop -- keeps coming back line in GetRenderedText() skipChars.KeepChar(); -- maybe stuck on the same char? Will have to work more on it with you when I return. I can come up with a Javascript testcase to help us work together more easily.
+ TrimmedOffsets offsets = f->GetTrimmedOffsets(frag, PR_TRUE); Change PR_TRUE to PR_FALSE here to not trim the space from the ends of lines. You still might get a space at the end of a block. I don't see an easy way to suppress that. Infinite loop --- yes! Add "iter.AdvanceOriginal(1);" to the end of the while-loop body.
Assignee | ||
Comment 44•17 years ago
|
||
One question: we're appending 1 character at a time. Is setting the string size and then using an iterator a lot better?
Attachment #272771 -
Flags: superreview?(roc)
+ PRBool isLastInContinuation = !textFrame->GetNextContinuation(); + TrimmedOffsets trimmedContentOffsets = textFrame->GetTrimmedOffsets(textFrag, isLastInContinuation); This means that if you have text like "<span>Foo bar baz </span>qux" and there happens to be a line break at the end of the span, the text nodes returned will be "Foo bar baz" and "qux" ... i.e. you lose the space. But if the line break goes after "bar", then you won't lose any spaces. That inconsistency bothers me; I think GetRenderedText should probably not depend on the exact layout. Is it so bad to just possibly have a trailing space at the end of a block? Appending one character at a time is OK I think. Other approaches might be a little faster but they're not going to be hugely better; we should just go for the simpler code unless we discover this is a critical path for some important scenario.
Assignee | ||
Comment 46•17 years ago
|
||
Note: this also fixes bug 387418.
Assignee | ||
Comment 47•17 years ago
|
||
Attachment #272771 -
Attachment is obsolete: true
Attachment #273403 -
Flags: superreview?(roc)
Attachment #272771 -
Flags: superreview?(roc)
Attachment #273403 -
Flags: superreview?(roc)
Attachment #273403 -
Flags: superreview+
Attachment #273403 -
Flags: review+
Assignee | ||
Updated•17 years ago
|
Attachment #273403 -
Flags: review?(surkov.alexander)
Comment 48•17 years ago
|
||
you forgot to update uuid of nsPIAccessible interface.
Assignee | ||
Comment 49•17 years ago
|
||
Attachment #273403 -
Attachment is obsolete: true
Attachment #273748 -
Flags: review?(surkov.alexander)
Attachment #273403 -
Flags: review?(surkov.alexander)
Comment 50•17 years ago
|
||
Probably 'skiped' is much more for layout but for accessibility is it better 'rendered'?
Comment 51•17 years ago
|
||
(In reply to comment #50) > Probably 'skiped' is much more for layout but for accessibility is it better > 'rendered'? > I mean probably OriginalToSkippedOffset -> ContentToRenderedOffset.
Comment 52•17 years ago
|
||
Also since we have two concepts of offsets and texts that are mixed through accessibility module then every our internal method should mention what type of concept it uses. For example. AppendTextTo(string, offset, length) - appends rendered text to a string, offset - rendered offset. and nsIntRect nsHyperTextAccessible::GetBoundsForString(nsIFrame *aFrame, PRInt32 aStartOffset, PRInt32 aEndOffset) - aStartOffset and aEndOffset are content offsets. Though it adds some complexity to code. But I assume we can't avoid this and we guarantee only that public interface methods works with rendered offsets/text. Right?
Comment 53•17 years ago
|
||
Also if our internal methods will use two concepts of offsets (i.e if it's impossible to work with one concept in method arguments) then method shouldn't have 'offset' word, it should have either 'renderedOffset' or 'contentOffset'.
Comment 54•17 years ago
|
||
Sometime in code you work with gfxSkipChars, gfxSkipCharsIterator, GetRenderedText() (I look at nsIFrame* nsHyperTextAccessible::GetPosA), can we put their usage to separate methods to hide unusual structures from every day accessibility code like you did in methods SkippedToOriginalOffset/OriginalToSkippedOffset?
Comment 55•17 years ago
|
||
Comment on attachment 273748 [details] [diff] [review] Update uuid of nsPIAccessible interface. > /** >+ * Append the rendered text to the passed-in string. >+ * The appended text will often not contain all the whitespace from source, >+ * depending on whether the CSS rule white-space: pre is active for this fram e. nit, please quote "white-space: pre" for easier reading. >+static PRUnichar TransformChar(const nsStyleText* aStyle, gfxTextRun* aTextRun, >+ PRUint32 aSkippedOffset, PRUnichar aChar) >+ case NS_STYLE_TEXT_TRANSFORM_CAPITALIZE: >+ if (aTextRun->CanBreakLineBefore(aSkippedOffset)) { >+ nsContentUtils::GetCaseConv()->ToTitle(aChar, &aChar); >+ } >+ break; >+ default: >+ break; >+ } nit, you don't need default statement I didn't dig deeper in the patch yet ;)
Assignee | ||
Comment 56•17 years ago
|
||
> public interface methods works with rendered offsets/text. Right? Right. (In reply to comment #54) > Sometime in code you work with gfxSkipChars, gfxSkipCharsIterator, > GetRenderedText() (I look at nsIFrame* nsHyperTextAccessible::GetPosA), can we > put their usage to separate methods to hide unusual structures from every day > accessibility code like you did in methods > SkippedToOriginalOffset/OriginalToSkippedOffset? In that case it's an optimization to avoid calling layout to calculate those structures twice.
Assignee | ||
Comment 57•17 years ago
|
||
As far as changing the name of all the offsets to be rendered/content: All offsets into the hypertext accessible are rendered offsets. Should we use the name hypertextOffset instead of renderedOffset? Otherwise, in the case of DOMPointToOffset we would get DOMPointToRenderedOffset(), but somehow it's not clear to me from that it's an offset into the hypertext itself. Does that make sense?
Assignee | ||
Comment 58•17 years ago
|
||
If you want to find an easy way to fix it for Firefox 3 that's fine. We can always file a follow-up bug for after Firefox 3.
Comment 59•17 years ago
|
||
I have also seen a related problem for the IAccessible.description field in ARIA widgets. I use to see the problem for the button on http://www.mozilla.org/access/dhtml/button but the problem is no longer there. Has this been fixed or is it a transient problem?
Assignee | ||
Comment 60•17 years ago
|
||
Comment 58 was for a different bug, my mistake.
Assignee | ||
Comment 61•17 years ago
|
||
Scott, I don't see the extra whitespace in description even in Firefox 2, unless it's via describedby. In any case, all of those codepaths will now go through the rendered text method instead, so description should be fine.
Assignee | ||
Comment 62•17 years ago
|
||
Attachment #271166 -
Attachment is obsolete: true
Attachment #273748 -
Attachment is obsolete: true
Attachment #274443 -
Flags: review?(surkov.alexander)
Attachment #273748 -
Flags: review?(surkov.alexander)
Comment 63•17 years ago
|
||
Comment on attachment 274443 [details] [diff] [review] Change some offset names in methods to make it clear what kinds of offsets they are r=me
Attachment #274443 -
Flags: review?(surkov.alexander) → review+
Assignee | ||
Comment 64•17 years ago
|
||
Without this Braille display support of Firefox content is very difficult. Patch is ready to go (has r+sr=roc, r=surkov)
Flags: blocking1.9?
Assignee | ||
Updated•17 years ago
|
Attachment #274443 -
Flags: approval1.9?
Updated•17 years ago
|
Attachment #274443 -
Flags: superreview?(roc)
Comment on attachment 274443 [details] [diff] [review] Change some offset names in methods to make it clear what kinds of offsets they are a1.9=dbaron. But what's the point of passing a null aAppendToString?
Attachment #274443 -
Flags: approval1.9? → approval1.9+
Assignee | ||
Comment 66•17 years ago
|
||
Pavlov, the changes I made for surkov after roc's sr= were only changes to names and comments, and an added UUID. Also, I didn't make any more changes to the layout bits. Are you sure I need the extra sr=? David, passing the null string avoids extra string copies.
Assignee | ||
Updated•17 years ago
|
Attachment #274443 -
Flags: superreview?(roc)
Assignee | ||
Updated•17 years ago
|
Status: NEW → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•17 years ago
|
Flags: blocking1.9?
Comment 67•17 years ago
|
||
Building Camino Trunk is failing with --enable-accessibility (built at approx. 05:00GMT). Removing that line from .mozconfig results in a successful build. I was pointed here by phiw13, I posted my error at pastebin: http://pastebin.mozilla.org/181393
Comment 68•17 years ago
|
||
(In reply to comment #67) > Building Camino Trunk is failing with --enable-accessibility (built at approx. > 05:00GMT). Removing that line from .mozconfig results in a successful build. I > was pointed here by phiw13, I posted my error at pastebin: > > http://pastebin.mozilla.org/181393 > This is due to bug 390280, not this one.
You need to log in
before you can comment on or make changes to this bug.
Description
•