Closed
Bug 177047
Opened 22 years ago
Closed 22 years ago
use regexes when testing attribute values
Categories
(SeaMonkey :: Page Info, defect)
SeaMonkey
Page Info
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: db48x, Assigned: db48x)
References
()
Details
Attachments
(2 files, 9 obsolete files)
12.79 KB,
patch
|
Details | Diff | Splinter Review | |
12.55 KB,
patch
|
Details | Diff | Splinter Review |
I mainly need to do this so that odd whitespace doesn't cause page info to miss
things.
Assignee | ||
Comment 1•22 years ago
|
||
Assignee | ||
Comment 2•22 years ago
|
||
mozilla should have given me a syntax error.
Attachment #104343 -
Attachment is obsolete: true
Comment 3•22 years ago
|
||
Comment on attachment 104368 [details] [diff] [review]
thanks timeless for catching that :)
>Index: xpfe/browser/resources/content/pageInfo.js
>===================================================================
>+const XHTMLNSre = "http\:\/\/www\.w3\.org\/1999\/xhtml";
>+const XHTML2NSre = "http\:\/\/www\.w3\.org\/2002\/06\/xhtml2";
>+const XHTMLre = RegExp(XHTMLNSre + "|" + XHTML2NSre, "i");
A namespace is an opaque string (in other words, we can't say anything about
what's in it), "A" and " A " are two different namespaces, as are "A" and "a",
so the only valid comparison is |rawnamespace1 == rawnamespace2|. No
toLowerCase, no case insensitive regexp matching, no stripping of whitespace.
Then for the attribute values you shouldn't have to strip whitespace in most
cases, the parser should do that for you. Where it is needed (this is where
you'll wanna ask someone else) you don't want to do /bar/.test(str), since that
will return true if str = "XXXbarXXX", not what you want in most cases.
Attachment #104368 -
Flags: needs-work+
Comment 4•22 years ago
|
||
Comment on attachment 104368 [details] [diff] [review]
thanks timeless for catching that :)
+const XLinkNS = "http://www.w3.org/1999/xlink";
+const XULNS =
"http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul";
+const XMLNS = "http://www.w3.org/XML/1998/namespace";
+const XHTMLNS = "http://www.w3.org/1999/xhtml";
+const XHTML2NS = "http://www.w3.org/2002/06/xhtml2"
These should not be duplicated here, they are already in at least one place in
the C++, can't we avoid code duplication? This is going to get out of hand --
where's the SVG and MathML namespaces, for example -- as people add namespaces
in one place and not the other.
+ if (ns) // if !ns then it's html, and won't have a namespace
What about unnamespaced elements? e.g. the root element in the following XML
document:
<test/>
+ if (!XHTMLre.test(elem.namespaceURI)) // otherwise, it has to be XHTML
This would match "http://www.w3.org/1999/xhtml-this-is-not", as well as "
http://www.w3.org/1999/xhtml ", neither of which are XHTML.
+ if (/icon/i.test(rel))
This is extremely wrong, as are several other tests here. It matches "lexicon",
for example.
"rel" is a space separated list. It should be treated as such. "next icon" is
an "icon". "stylesheet alternate" is an alternate stylesheet, as is
"stylesheet appendix alternate".
If there is whitespace in CDATA attributes it should be dealt with by the
parser, not the page info code.
Comment 5•22 years ago
|
||
See also bug 177069, which could provide a single API for handling <link>
elements. I think there may be value in moving a lot of the page info code out
of the page info UI and into a separate component, and simply making the page
info code be a consumer of that component.
Assignee | ||
Comment 6•22 years ago
|
||
ok, the namespace checks should be more correct, duplication of effort not
withstanding. If the namespace is null the only thing left to check is the mime
type, which wouldn't be bad I suppose. Also, I changed the checks to test for
equality with uppercase element names, since html documents allways seem to
have uppercase nodenames, whereas xml doc do not. It's faster than using
.toLowerCase() too.
I changed the expression to correctly match space seperated lists when they
should, and single words only in the other cases by testing for word boundaries
and beginning/end of line.
bug 177069 will be great, but until then...
Also, you'll have to forgive me, but I went and fixed another bug in this file
while I was waiting. I should seperate the two diffs, but I'm running short on
time this morning. I'll see what I can do in a few hours at work.
Attachment #104368 -
Attachment is obsolete: true
Comment 7•22 years ago
|
||
HTML element names use capital letters, XHTML elements are lower case. XML in
general is case sensitive, so it could be either upper or lower case or both.
That is to say, in X(HT)ML, <img> and <IMG> are very different elements.
- if ((nn == "link" || nn == "input" || nn == "img" || isBG) &&
- isProtocolAllowed)
+ if ((/LINK|INPUT|IMG/.test(nn) || isBG) && isProtocolAllowed)
This will also catch, e.g., <BLINK>. You need to dot "^...$" around a lot of you
regexps, I think. Why use regular expressions though? I'd have thought direct
string comparisons were faster. (They feel more right to me too.)
Assignee | ||
Comment 8•22 years ago
|
||
Suprisingly enough, case insensitive regexes are around 7x faster than using
.toLowerCase() to make the equality test case insensitive. If you make them
case sensitive the regex doesn't speed up, and the equality test is 35x faster
than it was before. My theory is that .toLowerCase() has to go through some
unicode crap to find out what letters to substitute, while regexes assume it's
ANSI or the C locale or whatever.
The sad thing is, XHTML documents are XML, so mozilla has lowercase .nodeNames
for them, which means this last patch breaks page info for XHTML documents.
This means that the best way to do it is to test for both cases: (nn == "foo"
|| nn == "FOO" || nn == "bar" ...) It's faster than regexes because it avoids
the call to .toLowerCase();
I'm assuming we don't normalize attribute values, so those I'll leave as
regexes. Also, I'm assuming that since XHTML is supposed to use lowercase nodes
(<img> vs <iMg> or <IMG>), missing an <iMg> or similar won't upset anyone.
Attachment #104490 -
Attachment is obsolete: true
Assignee | ||
Comment 9•22 years ago
|
||
http://db48x.dyndns.org/pageInfo/tests/cmptest.html has the 'benchmarks' I used
should anyone care.
Assignee | ||
Comment 10•22 years ago
|
||
suprisingly enough, it doesn't even really slow it down more than half a second
or so, which isn't very much compared to the amount to time it generally takes
to begin with.
Provided I've gotten everything just right, this will prevent a node like
<html:LINK> from showing up in the links tab for an xml document, since it's
not a valid xhtml link element (it should be <html:link>).
In fact, initial tests prove this to be the case. (see
http://db48x.dyndns.org/pageInfo/tests/case.xml)
Attachment #104537 -
Attachment is obsolete: true
Comment 11•22 years ago
|
||
You use .nodeName, I think you were right when you said you had to use localName.
Note: this code, I think, still treats unnamespaced XML as HTML, so:
<foo xmlns="">
<LINK href="fail"/>
</foo>
...will find a link when it shouldn't. No?
Also, is there a bug on the analyse-in-the-background issue?
Assignee | ||
Comment 12•22 years ago
|
||
yes, but I don't think there's anything I can do about it.
and I'm not sure there's a bug for the background processing, but I can look and
then file one.
that and I just noticed that that's not the right diff, I'll correct that.
Assignee | ||
Comment 13•22 years ago
|
||
the previous one didn't have the very last change I made, which fixed the calls
to getAbsoluteURL(). I'm adding those calls in this patch rather than the one
for bug 153545 because the two conflict badly. I'm still fixing
getAbsoluteURL() in bug 153545 though.
as for <LINK> in an xml doc, this code thinks it's an html link node, but then
goes to get .rel, but that doesn't exist. The same will happen with other node
types as well, because rather than being HTMLLinkNodes or whatever, they're
just Elements. So, maybe the correct thing to do is call toString() on the node
to get it's class name, and match on that as well? Seems silly, but it does
expose the information we need that way...
Attachment #104608 -
Attachment is obsolete: true
Comment 14•22 years ago
|
||
Can't you check to see if the document is HTML or XHTML? There really should be
a way. I know there is at the C++ level, because I've done it.
<?xml version="1.0" ?>
<LINK rel="stylesheet" href="fail"/>
...must not be recognised as any kind of link
Assignee | ||
Comment 15•22 years ago
|
||
well, with this patch it isn't really recognized as a rel link. In fact, it
shows up as a rev link, because someone else determined that it wasn't a real
link, so it's just an Element, not an nsIHTMLLinkElement or whatever, therefore
the test for link.rel fails, and it's inserted in as a link.rev, but with no
value for any of the columns except type. Similar things happen for other kinds
of nodes.
I was thinking of fixing this by calling .toString() on the object, and greping
that for HTML and the element name, but this doesn't work for anchors, which
return the url rather than a string like [object Foo]. I could explicitly test
for the presence of certain properties, and just not add to the tree if they're
missing. In the case of link nodes, I could do something like this:
if ("rel" in elem && elem.rel)
// it's real
else if (elem.rev)
// it's real but reverse
else
// it's fake so do nothing
but that's a pain, and really doesn't solve the problem of false positives from
isa(). Is there a way we can possibly add an attribute to the idl for Element
that indicates if this is an html or xhtml node? That would clear this right up.
Assignee | ||
Comment 16•22 years ago
|
||
looks like |elem instanceof Components.interfaces.nsIDOMHTMLAnchorElement| would
be a valid way to test a node to see if it's an html anchor element. I can just
remove isa() and use that for all of the tests too.
Comment 17•22 years ago
|
||
Excellent.
Assignee | ||
Comment 18•22 years ago
|
||
Hixie, sure you don't want to review it, or at least test it? You're qualified.
Attachment #104673 -
Attachment is obsolete: true
Comment 19•22 years ago
|
||
Comment on attachment 104869 [details] [diff] [review]
well, it works
- //XXX When Java is enabled, the DOM model for <APPLET> is broken. Bug
#59686. ... get a real fix on the trunk.
how's that going?
useless style nit, you seemed to use if (cond), and then there was an if(cond)
in @@ -595,15 +595,+14 @@
else after return in +function FormControlFilter(node)
+ if (!item instanceof Components.interfaces.nsIDOMHTMLInputElement)
i don't think !a instanceof b behaves the way you want (so use parens):
js> ({}) instanceof Function
false
js> !({}) instanceof Function
false
^^ this is the only real bug i found in your code. everything else in this
comment is some sort of minor comment.
you might consider doing something like this:
const inputElement = Components.interfaces.nsIDOMHTMLInputElement;
it would make things easier to read/write, but it might cost more.
> imageView.addRow([getAbsoluteURL(elem.code || elem.object, elem), gStrings.mediaApplet, "",elem]);
please add a space after "",
Attachment #104869 -
Flags: review+
Comment 20•22 years ago
|
||
I'm afraid I don't have a build environment at the moment. And I'm definitely
not qualified, I hardly know JS and I've never seen this area of code before.
Assignee | ||
Comment 21•22 years ago
|
||
the applet stuff seems to have been dormant for some time now, with talks of
rewriting something or other. not a good sign, as far as I'm concerned.
Attachment #104869 -
Attachment is obsolete: true
Comment 22•22 years ago
|
||
Comment on attachment 104874 [details] [diff] [review]
fix stuff
>Index: xpfe/browser/resources/content/pageInfo.js
>===================================================================
>RCS file: /cvsroot/mozilla/xpfe/browser/resources/content/pageInfo.js,v
>retrieving revision 1.44
>diff -u -r1.44 pageInfo.js
>--- xpfe/browser/resources/content/pageInfo.js 1 Nov 2002 02:12:10 -0000 1.44
>+++ xpfe/browser/resources/content/pageInfo.js 1 Nov 2002 19:35:48 -0000
>@@ -212,11 +212,30 @@
> // do nothing, later code will handle the error
> }
>
>+// interfaces for the different html elements
>+const anchorElement = Components.interfaces.nsIDOMHTMLAnchorElement
...
So typically to make it harder to clash with variable names, and to make the
code a little easier to read, I use const nsIDOMHTMLAnchorElement =
Components.interfaces.nsIDOMHTMLAnchorElement.
That way I know I'm comparing against an interface when I'm reading the code
below.
Other than that, I like this. sr=jag
Attachment #104874 -
Flags: superreview+
Assignee | ||
Comment 23•22 years ago
|
||
Attachment #104874 -
Attachment is obsolete: true
Comment 24•22 years ago
|
||
- var formfields = form.elements;
+ var formfields = findFormControls(form);
This will not find the controls on many HTML pages where the parser fixes up the
HTML and moves the controls outside the form... (it keeps form.elements in sync,
though). Simple testcase:
<table><form><tr><td>
<input>
</td></tr></form></table>
Assignee | ||
Comment 25•22 years ago
|
||
I'm rather inclined to leave that as an evang issue.
Comment 26•22 years ago
|
||
I think you underestimate the number of sites doing that... it's the only way to
keep <form> from screwing with layout in NS4. There's a reason we go to such
pains to support it....
But it's your call, of course. ;) I can land this as-is if you want to keep it
as-is
Assignee | ||
Comment 27•22 years ago
|
||
Go ahead and check it in, I'll see what I can do later today. Does form.elements
always come out in a particular order? The document order maybe?
Comment 28•22 years ago
|
||
Yeah, it comes out in document order. I'll land this when I get home this
afternoon.
Comment 29•22 years ago
|
||
Good to see instanceof being put to good use.
Comment 30•22 years ago
|
||
hmm.. .that patch does not apply to a clean pageInfo.js....
Assignee | ||
Comment 31•22 years ago
|
||
where did it fail? I cannot get a clean copy, since cvs-mirror is gone and lxr
is broken. maybe you could mail me one :P
Assignee | ||
Comment 32•22 years ago
|
||
actually, lxr seems to have caught up, how does this patch look?
Comment 33•22 years ago
|
||
That applies. ;) I assume you tested it and such? If so, I'll land it....
Assignee | ||
Comment 34•22 years ago
|
||
Attachment #105750 -
Attachment is obsolete: true
Assignee | ||
Comment 35•22 years ago
|
||
Yes, I was getting to that. It does apear that I've made one small mistake,
nsiAppletElement on line 222 should be changed to nsIAppletElement.
Then there's just bug 122125... :)
Comment 36•22 years ago
|
||
> var physWidth = physHeight = 0;
I changed this to
var physWidth = 0, physHeight = 0;
to avoid the strict warning. Checked in with that.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Updated•20 years ago
|
Product: Browser → Seamonkey
You need to log in
before you can comment on or make changes to this bug.
Description
•