Closed Bug 177047 Opened 22 years ago Closed 22 years ago

use regexes when testing attribute values

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: db48x, Assigned: db48x)

References

(
URL
)

Details

Attachments

(2 files, 9 obsolete files)

I think this fits the bill 22 years ago Daniel Brooks [:db48x] 5.52 KB, patch		Details \| Diff \| Splinter Review
thanks timeless for catching that :) 22 years ago Daniel Brooks [:db48x] 5.21 KB, patch		Details \| Diff \| Splinter Review
address issues 22 years ago Daniel Brooks [:db48x] 9.41 KB, patch		Details \| Diff \| Splinter Review
seperate the fixes, change those two regexes back to straight equality tests 22 years ago Daniel Brooks [:db48x] 8.35 KB, patch		Details \| Diff \| Splinter Review
include great suggestions by Hixie. 22 years ago Daniel Brooks [:db48x] 11.13 KB, patch		Details \| Diff \| Splinter Review
correct diff 22 years ago Daniel Brooks [:db48x] 10.84 KB, patch		Details \| Diff \| Splinter Review
well, it works 22 years ago Daniel Brooks [:db48x] 11.72 KB, patch	timeless : review+	Details \| Diff \| Splinter Review
fix stuff 22 years ago Daniel Brooks [:db48x] 13.39 KB, patch	jag+mozilla : superreview+	Details \| Diff \| Splinter Review
change varible names 22 years ago Daniel Brooks [:db48x] 12.79 KB, patch		Details \| Diff \| Splinter Review
clean diff? 22 years ago Daniel Brooks [:db48x] 12.55 KB, patch		Details \| Diff \| Splinter Review
clean diff 22 years ago Daniel Brooks [:db48x] 12.55 KB, patch		Details \| Diff \| Splinter Review

Daniel Brooks [:db48x]

Assignee

Description

•

22 years ago

I mainly need to do this so that odd whitespace doesn't cause page info to miss things.

Daniel Brooks [:db48x]

Assignee

Comment 1

•

22 years ago

Attached patch I think this fits the bill (obsolete) — Details — Splinter Review

Daniel Brooks [:db48x]

Assignee

Comment 2

•

22 years ago

Attached patch thanks timeless for catching that :) (obsolete) — Details — Splinter Review

mozilla should have given me a syntax error.

Attachment #104343 - Attachment is obsolete: true

jag (Peter Annema)

Comment 3

•

22 years ago

Comment on attachment 104368 [details] [diff] [review] thanks timeless for catching that :) >Index: xpfe/browser/resources/content/pageInfo.js >=================================================================== >+const XHTMLNSre = "http\:\/\/www\.w3\.org\/1999\/xhtml"; >+const XHTML2NSre = "http\:\/\/www\.w3\.org\/2002\/06\/xhtml2"; >+const XHTMLre = RegExp(XHTMLNSre + "|" + XHTML2NSre, "i"); A namespace is an opaque string (in other words, we can't say anything about what's in it), "A" and " A " are two different namespaces, as are "A" and "a", so the only valid comparison is |rawnamespace1 == rawnamespace2|. No toLowerCase, no case insensitive regexp matching, no stripping of whitespace. Then for the attribute values you shouldn't have to strip whitespace in most cases, the parser should do that for you. Where it is needed (this is where you'll wanna ask someone else) you don't want to do /bar/.test(str), since that will return true if str = "XXXbarXXX", not what you want in most cases.

Attachment #104368 - Flags: needs-work+

Hixie (not reading bugmail)

Comment 4

•

22 years ago

Comment on attachment 104368 [details] [diff] [review] thanks timeless for catching that :) +const XLinkNS = "http://www.w3.org/1999/xlink"; +const XULNS = "http://www.mozilla.org/keymaster/gatekeeper/there.is.only.xul"; +const XMLNS = "http://www.w3.org/XML/1998/namespace"; +const XHTMLNS = "http://www.w3.org/1999/xhtml"; +const XHTML2NS = "http://www.w3.org/2002/06/xhtml2" These should not be duplicated here, they are already in at least one place in the C++, can't we avoid code duplication? This is going to get out of hand -- where's the SVG and MathML namespaces, for example -- as people add namespaces in one place and not the other. + if (ns) // if !ns then it's html, and won't have a namespace What about unnamespaced elements? e.g. the root element in the following XML document: <test/> + if (!XHTMLre.test(elem.namespaceURI)) // otherwise, it has to be XHTML This would match "http://www.w3.org/1999/xhtml-this-is-not", as well as " http://www.w3.org/1999/xhtml ", neither of which are XHTML. + if (/icon/i.test(rel)) This is extremely wrong, as are several other tests here. It matches "lexicon", for example. "rel" is a space separated list. It should be treated as such. "next icon" is an "icon". "stylesheet alternate" is an alternate stylesheet, as is "stylesheet appendix alternate". If there is whitespace in CDATA attributes it should be dealt with by the parser, not the page info code.

Hixie (not reading bugmail)

Comment 5

•

22 years ago

See also bug 177069, which could provide a single API for handling <link> elements. I think there may be value in moving a lot of the page info code out of the page info UI and into a separate component, and simply making the page info code be a consumer of that component.

Daniel Brooks [:db48x]

Assignee

Comment 6

•

22 years ago

Attached patch address issues (obsolete) — Details — Splinter Review

ok, the namespace checks should be more correct, duplication of effort not withstanding. If the namespace is null the only thing left to check is the mime type, which wouldn't be bad I suppose. Also, I changed the checks to test for equality with uppercase element names, since html documents allways seem to have uppercase nodenames, whereas xml doc do not. It's faster than using .toLowerCase() too. I changed the expression to correctly match space seperated lists when they should, and single words only in the other cases by testing for word boundaries and beginning/end of line. bug 177069 will be great, but until then... Also, you'll have to forgive me, but I went and fixed another bug in this file while I was waiting. I should seperate the two diffs, but I'm running short on time this morning. I'll see what I can do in a few hours at work.

Attachment #104368 - Attachment is obsolete: true

Hixie (not reading bugmail)

Comment 7

•

22 years ago

HTML element names use capital letters, XHTML elements are lower case. XML in general is case sensitive, so it could be either upper or lower case or both. That is to say, in X(HT)ML, <img> and <IMG> are very different elements. - if ((nn == "link" || nn == "input" || nn == "img" || isBG) && - isProtocolAllowed) + if ((/LINK|INPUT|IMG/.test(nn) || isBG) && isProtocolAllowed) This will also catch, e.g., <BLINK>. You need to dot "^...$" around a lot of you regexps, I think. Why use regular expressions though? I'd have thought direct string comparisons were faster. (They feel more right to me too.)

Daniel Brooks [:db48x]

Assignee

Comment 8

•

22 years ago

Attached patch seperate the fixes, change those two regexes back to straight equality tests (obsolete) — Details — Splinter Review

Suprisingly enough, case insensitive regexes are around 7x faster than using .toLowerCase() to make the equality test case insensitive. If you make them case sensitive the regex doesn't speed up, and the equality test is 35x faster than it was before. My theory is that .toLowerCase() has to go through some unicode crap to find out what letters to substitute, while regexes assume it's ANSI or the C locale or whatever. The sad thing is, XHTML documents are XML, so mozilla has lowercase .nodeNames for them, which means this last patch breaks page info for XHTML documents. This means that the best way to do it is to test for both cases: (nn == "foo" || nn == "FOO" || nn == "bar" ...) It's faster than regexes because it avoids the call to .toLowerCase(); I'm assuming we don't normalize attribute values, so those I'll leave as regexes. Also, I'm assuming that since XHTML is supposed to use lowercase nodes (<img> vs <iMg> or <IMG>), missing an <iMg> or similar won't upset anyone.

Attachment #104490 - Attachment is obsolete: true

Daniel Brooks [:db48x]

Assignee

Comment 9

•

22 years ago

http://db48x.dyndns.org/pageInfo/tests/cmptest.html has the 'benchmarks' I used should anyone care.

URL: http://db48x.dyndns.org/pageInfo/test...

Daniel Brooks [:db48x]

Assignee

Comment 10

•

22 years ago

Attached patch include great suggestions by Hixie. (obsolete) — Details — Splinter Review

suprisingly enough, it doesn't even really slow it down more than half a second or so, which isn't very much compared to the amount to time it generally takes to begin with. Provided I've gotten everything just right, this will prevent a node like <html:LINK> from showing up in the links tab for an xml document, since it's not a valid xhtml link element (it should be <html:link>). In fact, initial tests prove this to be the case. (see http://db48x.dyndns.org/pageInfo/tests/case.xml)

Attachment #104537 - Attachment is obsolete: true

Hixie (not reading bugmail)

Comment 11

•

22 years ago

You use .nodeName, I think you were right when you said you had to use localName. Note: this code, I think, still treats unnamespaced XML as HTML, so: <foo xmlns=""> <LINK href="fail"/> </foo> ...will find a link when it shouldn't. No? Also, is there a bug on the analyse-in-the-background issue?

Daniel Brooks [:db48x]

Assignee

Comment 12

•

22 years ago

yes, but I don't think there's anything I can do about it. and I'm not sure there's a bug for the background processing, but I can look and then file one. that and I just noticed that that's not the right diff, I'll correct that.

Daniel Brooks [:db48x]

Assignee

Comment 13

•

22 years ago

Attached patch correct diff (obsolete) — Details — Splinter Review

the previous one didn't have the very last change I made, which fixed the calls to getAbsoluteURL(). I'm adding those calls in this patch rather than the one for bug 153545 because the two conflict badly. I'm still fixing getAbsoluteURL() in bug 153545 though. as for <LINK> in an xml doc, this code thinks it's an html link node, but then goes to get .rel, but that doesn't exist. The same will happen with other node types as well, because rather than being HTMLLinkNodes or whatever, they're just Elements. So, maybe the correct thing to do is call toString() on the node to get it's class name, and match on that as well? Seems silly, but it does expose the information we need that way...

Attachment #104608 - Attachment is obsolete: true

Hixie (not reading bugmail)

Comment 14

•

22 years ago

Can't you check to see if the document is HTML or XHTML? There really should be a way. I know there is at the C++ level, because I've done it. <?xml version="1.0" ?> <LINK rel="stylesheet" href="fail"/> ...must not be recognised as any kind of link

Daniel Brooks [:db48x]

Assignee

Comment 15

•

22 years ago

well, with this patch it isn't really recognized as a rel link. In fact, it shows up as a rev link, because someone else determined that it wasn't a real link, so it's just an Element, not an nsIHTMLLinkElement or whatever, therefore the test for link.rel fails, and it's inserted in as a link.rev, but with no value for any of the columns except type. Similar things happen for other kinds of nodes. I was thinking of fixing this by calling .toString() on the object, and greping that for HTML and the element name, but this doesn't work for anchors, which return the url rather than a string like [object Foo]. I could explicitly test for the presence of certain properties, and just not add to the tree if they're missing. In the case of link nodes, I could do something like this: if ("rel" in elem && elem.rel) // it's real else if (elem.rev) // it's real but reverse else // it's fake so do nothing but that's a pain, and really doesn't solve the problem of false positives from isa(). Is there a way we can possibly add an attribute to the idl for Element that indicates if this is an html or xhtml node? That would clear this right up.

Daniel Brooks [:db48x]

Assignee

Comment 16

•

22 years ago

looks like |elem instanceof Components.interfaces.nsIDOMHTMLAnchorElement| would be a valid way to test a node to see if it's an html anchor element. I can just remove isa() and use that for all of the tests too.

Hixie (not reading bugmail)

Comment 17

•

22 years ago

Excellent.

Daniel Brooks [:db48x]

Assignee

Comment 18

•

22 years ago

Attached patch well, it works (obsolete) — Details — Splinter Review

Hixie, sure you don't want to review it, or at least test it? You're qualified.

Attachment #104673 - Attachment is obsolete: true

timeless

Comment 19

•

22 years ago

Comment on attachment 104869 [details] [diff] [review] well, it works - //XXX When Java is enabled, the DOM model for <APPLET> is broken. Bug #59686. ... get a real fix on the trunk. how's that going? useless style nit, you seemed to use if (cond), and then there was an if(cond) in @@ -595,15 +595,+14 @@ else after return in +function FormControlFilter(node) + if (!item instanceof Components.interfaces.nsIDOMHTMLInputElement) i don't think !a instanceof b behaves the way you want (so use parens): js> ({}) instanceof Function false js> !({}) instanceof Function false ^^ this is the only real bug i found in your code. everything else in this comment is some sort of minor comment. you might consider doing something like this: const inputElement = Components.interfaces.nsIDOMHTMLInputElement; it would make things easier to read/write, but it might cost more. > imageView.addRow([getAbsoluteURL(elem.code || elem.object, elem), gStrings.mediaApplet, "",elem]); please add a space after "",

Attachment #104869 - Flags: review+

Hixie (not reading bugmail)

•

22 years ago

Attached patch change varible names — Details — Splinter Review

Attachment #104874 - Attachment is obsolete: true

Boris Zbarsky [:bzbarsky]

Comment 24

•

22 years ago

- var formfields = form.elements; + var formfields = findFormControls(form); This will not find the controls on many HTML pages where the parser fixes up the HTML and moves the controls outside the form... (it keeps form.elements in sync, though). Simple testcase: <table><form><tr><td> <input> </td></tr></form></table>

Daniel Brooks [:db48x]

Assignee

Comment 25

•

22 years ago

I'm rather inclined to leave that as an evang issue.

Boris Zbarsky [:bzbarsky]

Comment 26

•

22 years ago

I think you underestimate the number of sites doing that... it's the only way to keep <form> from screwing with layout in NS4. There's a reason we go to such pains to support it.... But it's your call, of course. ;) I can land this as-is if you want to keep it as-is

Daniel Brooks [:db48x]

Assignee

Comment 27

•

22 years ago

Go ahead and check it in, I'll see what I can do later today. Does form.elements always come out in a particular order? The document order maybe?

Boris Zbarsky [:bzbarsky]

Comment 28

•

22 years ago

Yeah, it comes out in document order. I'll land this when I get home this afternoon.

neil@parkwaycc.co.uk

•

22 years ago

Attached patch clean diff — Details — Splinter Review

Attachment #105750 - Attachment is obsolete: true

Daniel Brooks [:db48x]

Assignee

Comment 35

•

22 years ago

Yes, I was getting to that. It does apear that I've made one small mistake, nsiAppletElement on line 222 should be changed to nsIAppletElement. Then there's just bug 122125... :)

Boris Zbarsky [:bzbarsky]

Comment 36

•

22 years ago

> var physWidth = physHeight = 0; I changed this to var physWidth = 0, physHeight = 0; to avoid the strict warning. Checked in with that.

Status: NEW → RESOLVED

Closed: 22 years ago

Resolution: --- → FIXED

Myk Melez [:myk] [@mykmelez]

Updated

•

20 years ago

Product: Browser → Seamonkey

You need to log in before you can comment on or make changes to this bug.