Closed Bug 311681 Opened 19 years ago Closed 19 years ago

Google Reader can't be loaded in newest nightly build of Mac OS X

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.9alpha1

People

(Reporter: wwzhang, Assigned: bzbarsky)

References

()

Details

(Keywords: regression)

Attachments

(2 files, 6 obsolete files)

User-Agent:       Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.9a1) Gecko/20051007 Firefox/1.6a1
Build Identifier: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.9a1) Gecko/20051007 Firefox/1.6a1

The page just showing "loading", while nothing happens in the end. Safari is OK

Reproducible: Always

Steps to Reproduce:
1. Subscribe some feeds on Google Reader
2. Try to read the entries
3.

Actual Results:  
The page just showing "loading", while nothing happens in the end.

Expected Results:  
I want to read the entries, not staring at the bubbles.
I'm seeing this also in current windows trunk build.
I see this error in the js console:
Error: g(a) has no properties
Source File: http://www.google.com/reader/ui/196534137-main.js
Line: 529

this regressed between 2005-09-07 and 2005-09-08. It doesn't happen on the 1.8
branch:
http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2005-09-07+06%3A00%3A00&maxdate=2005-09-08+09%3A00%3A00&cvsroot=%2Fcvsroot
Only thing I can come up with is bug 205778?
Assignee: nobody → darin
Status: UNCONFIRMED → NEW
Component: General → Networking
Ever confirmed: true
Keywords: regression
OS: MacOS X → All
Product: Firefox → Core
QA Contact: general → benc
Version: unspecified → Trunk
Flags: blocking1.9a1?
Could also be a regression from bug 299689... could someone with an account on
this site check?
(In reply to comment #2)
> Could also be a regression from bug 299689... could someone with an account on
> this site check?
Yes, apparantly the patch for that bug is responsible for the regression, I've
tested this with backing out that patch.
Blocks: 299689
Attached file Sorta testcase
This isn't what the website does, but it shows a bug in the ID hashtable code.
Attached patch Possible patch (obsolete) — Splinter Review
This should fix it, I hope -- the problem is that if the table is live we don't
walk the DOM, so can only find things that have been notified on.  While the
table is not live we can actually find nodes that haven't been notified on yet,
which is what saves us if mIdContent comes up null in the entry.

I guess the other option would be to flush in the "IdTableIsLive" case when e
is null, but that would be basically duplicating this whole chunk of code...

Martijn, does this fix the site for you?
Attachment #199031 - Flags: superreview?(jst)
Attachment #199031 - Flags: review?(jst)
(In reply to comment #5)
> Martijn, does this fix the site for you?
No, I still get the same error (I've rebuilt layout and content for the
'possible patch')
I hope you mean content and then layout, right?  ;)

Do we have a test account of some sort of google reader?  Or should I just
create an account for myself?
(In reply to comment #7)
> I hope you mean content and then layout, right?  ;)
Oops, yes, I meant that.

> Do we have a test account of some sort of google reader?  Or should I just
> create an account for myself?
Probably best to create an account for yourself. It's not that hard.
OK, I think we (or reader's developers, depending on viewpoint) have a slight
problem here.  The only question is how to deal.

The code on the page that breaks looks something like this (numbering the lines
for ease of referring to them):

1) var listNode = document.getElementById("something");
2) var node = document.getElementById("x").cloneNode(true);
3) node.style.visibility = "hidden";
4) listNode.appendChild(node);
5) this.itemHeight = this.listNode.offsetHeight/10;
6) listNode.innerHTML = "";

Now at step 2 there is clearly a node in the document with id "x".  It's stored
in our ID hashtable.  In step 4 a _second_ node with the same id is inserted
into the document.  It clobbers the first one in our hashtable.  In step 6, this
second node is removed from the document, and we note that it's in our hashtable
and remove it from there.  Then later on the code calls getElementById("x")
again.  We look up in the hashtable and find nothing.

All this works the same in builds both with and without the patch for bug 299689.

Now comes the difference.  In current builds, since we're maintaining hashtable
state all the time, we "know" that no hashtable entry means no nodes with that
id in the document.  So we just return null.  In older builds we'd search the
whole document.  Unfortunately, said "know" is only valid if people don't
violate the rules by having multiple nodes with the same id in the same document
at the same time.

I think I see three ways forward from here (short of backing out bug 299689), in
increasing order of work on our part:

1)  Do nothing to our code and evangelize google to fix reader to not be broken
    like this.  Might leave other sites broken; hard to tell.
2)  Change UpdateIdTableEntry to do the same thing as AddToIdTable does -- only
    set the content in the entry if it's not already set.  That would fix this
    specific case, but not the one where the _first_ node is what's removed from 
    the document.  Might also break sites which depend on the current behavior,
    of course.
3)  Change the ID table entry to store a list of all nodes with that id instead
    of a single one.  Return just the first one, but maintain the entire list.
    This will guarantee that as long as there is _a_ node in the document with
    that id, we'll return it.  Properly adjusting how we insert into the list in
    various places means we can ensure we change no behavior as far as the
    existing inconsistency between UpdateIdTableEntry and AddToIdTable goes.

The obvious drawback of #3 is the increased memory requirement, of course.

Thoughts?
Assignee: darin → general
Component: Networking → DOM
QA Contact: benc → ian
Hardware: Macintosh → All
Attached patch Patch that does "solution" 2 (obsolete) — Splinter Review
Note that we still need the first patch I posted in this bug, no matter what...
I think 3 is the way to go. It is too easy for a user to screw up and accidently
insert two nodes with the same id into the tree so I don't think we should start
behaving erratic in the face of it.

By keeping a list and always returning the first (in document order) node we
have consistent and logical behaviour always. It might also prepare us for a
schema world where, iirc, multiple nodes with same id is allowed.

I doubt it will be much of a bloat issue if we optimize for the case where only
one node correspons to an id.


Alternativly we could start enforcing only-one-of-same-id. Either by throwing
when the user tries to insert a node with an id that already exists, or by
removing the id-attribute of the newly inserted node. But I think this would
break too much content.
Attached patch Draft solution (obsolete) — Splinter Review
I tried to preserve our current behavior wrt which nodes win out in which cases
when ids are duplicated, but if you really think we should do document order I
can do that.
Attachment #199255 - Flags: review?(bugmail)
Attached patch Slightly better (obsolete) — Splinter Review
Attachment #199256 - Flags: review?(bugmail)
Attachment #199255 - Attachment is obsolete: true
Attachment #199255 - Flags: review?(bugmail)
If there are multiple nodes we should return whichever is the quickest to
return, assuming that is a stable state. But it has to be reliably reproducable
and predictable, otherwise authors will go nuts.

(Note that what you describe in comment 9 as being a possibly faulty document is
IMHO quite legitimate. There should not be a requirement that the document be
conformant _while script is running_. Sure, between script executions I agree
that it should be conformant (though you'll never get the Web to actually be
that way anyway), but while the script is running all bets are off. IMHO.

When there is only one node with a particular ID in a document, we should always
return that node when gEBI is called on that ID, regardless of the history of
the document.)
> But it has to be reliably reproducable and predictable, otherwise authors will
> go nuts.

I don't really see them going nuts now, when it's not very predictable.  In
fact, my patch aims to make sure that we return whatever we would have returned
before bug 299689.  If someone likes, I can log a nice warning to the JS console
whenever we know we have multiple elements with that id, but sometimes we won't
even know that -- we'll know we have one and never bother looking for the other,
and which one we know about could depend on insertion order, whether id
attributes got changed, what other getElementById calls happened when, etc.

In other words, the "quickest" and "predictable" requirements are contradictory,
imo.

> _while script is running_.

What's special about "while script is running"?  I see nothing in the DOM that
says other threads can't access the DOM while it's doing so.  I see nothing that
says layout and rendering cannot happen while script is running (and indeed
Opera seems to do just that; Gecko sometimes does layout, and may do rendering
if the slow script dialog happens).  If a script does a sync XMLHttpRequest is
it running while it's waiting on the return?  Because during this time the user
can interact with the page in Gecko.

So any DOM access that can happen while script is not running can also happen
while it's running, basically....

> When there is only one node with a particular ID in a document, we should
> always return that node when gEBI is called on that ID, regardless of the
> history of the document.)

That I will buy, and my patch does that.
Comment on attachment 199031 [details] [diff] [review]
Possible patch

No matter how lame it is that we need to deal with this case, I think #3 is the
way to ho here, so r- on this patch...
Attachment #199031 - Flags: superreview?(jst)
Attachment #199031 - Flags: superreview-
Attachment #199031 - Flags: review?(jst)
Attachment #199031 - Flags: review-
Attached patch Updated to tip (obsolete) — Splinter Review
Attachment #200085 - Flags: superreview?(jst)
Attachment #200085 - Flags: review?(bugmail)
Attachment #199256 - Attachment is obsolete: true
Attachment #199256 - Flags: review?(bugmail)
Comment on attachment 200085 [details] [diff] [review]
Updated to tip

sr=jst
Attachment #200085 - Flags: superreview?(jst) → superreview+
Comment on attachment 200085 [details] [diff] [review]
Updated to tip

>-#define ID_NOT_IN_DOCUMENT ((nsIContent *)1)
>+#define ID_NOT_IN_DOCUMENT ((nsIContent *)2)

Might be a good idea to comment on why 1 can't be used.

>+  PRBool AddIdContent(nsIContent* aContent, PRBool aForce);

Please document what the |aForce| argument does.

>   nsString mKey;
>-  nsIContent *mIdContent;
>   nsBaseContentList *mContentList;
>+private:
>+  nsSmallVoidArray mIdContentList;
> };

I wouldn't mind seing mContentList being renamed mNameContentList. Or does it contain both things with id mKey as well as name mKey?

>@@ -2328,23 +2384,23 @@ nsHTMLDocument::GetElementById(const nsA
...
>-  nsIContent *e = entry->mIdContent;
>+  nsIContent *e = entry->GetIdContent();
> 
>-  if (e == ID_NOT_IN_DOCUMENT) {
>+  if (!e || e == ID_NOT_IN_DOCUMENT) {

Just trying to understand the code: Wasn't the !e check needed before too?

> nsHTMLDocument::UpdateIdTableEntry(const nsAString& aId, nsIContent *aContent)
...
>+  if (PL_DHASH_ENTRY_IS_BUSY(entry) &&
>+      NS_UNLIKELY(!entry->AddIdContent(aContent, PR_TRUE))) {

Why aForce=PR_TRUE here? 

>+    // failed to update...
>+    rv = NS_ERROR_OUT_OF_MEMORY;
>   }
> 
>-  return NS_OK;
>+  return rv;

I'd really prefer if you keep return NS_OK here and use an early-return for OOM. Ending functions with return rv has a tendency to break if code is added (or removed).

If aForce doesn't need to be true in nsHTMLDocument::UpdateIdTableEntry then the argument can be removed since the only other place where it is true the list should be empty making aForce not matter.

r=me
Attachment #200085 - Flags: review?(bugmail) → review+
(In reply to comment #20)
> Might be a good idea to comment on why 1 can't be used.

Done.

> Please document what the |aForce| argument does.

Done.

> I wouldn't mind seing mContentList being renamed mNameContentList.

Done.

> Just trying to understand the code: Wasn't the !e check needed before too?

Yes.  See comment 5 and the first patch I posted in this bug.

> Why aForce=PR_TRUE here? 

That's what the existing code did -- overwrote mIdContent even if it was already set.

> I'd really prefer if you keep return NS_OK here

Done.

> If aForce doesn't need to be true in nsHTMLDocument::UpdateIdTableEntry then
> the argument can be removed since the only other place where it is true the
> list should be empty making aForce not matter.

That arg only exists because I was preserving what existing code did.  I agree it's silly; I'll file a followup to remove it.
Assignee: general → bzbarsky
Attachment #199031 - Attachment is obsolete: true
Attachment #199137 - Attachment is obsolete: true
Attachment #199138 - Attachment is obsolete: true
Attachment #200085 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Fixed.   Filed bug 315770 on removing aForce and bug 315771 on using atoms in all this code.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9alpha
Flags: blocking1.9a1?
Added mochitest.
Flags: in-testsuite+
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: