Closed Bug 897904 Opened 6 years ago Closed 6 years ago only partially loaded


(Core :: Networking, defect, major)

Not set



Tracking Status
firefox23 --- unaffected
firefox24 --- unaffected
firefox25 + verified
firefox26 + verified


(Reporter: glazou, Assigned: sworkman)




(3 files, 2 obsolete files)

Tested with Nightly 25.0a1 (2013-07-24) OS X.

1. open

EXPECTED RESULT: the whole page appears, with a textarea, see second screenshot attached (made with Safari)

ACTUAL RESULT: the page is empty and grey. If I shift-reload, I get little more content but the source view still shows a truncated document. See first screenshot attached (made with Nightly)

On IRC, smaug said he sees the bug on Linux (Fedora 18 / 64bit) and Six
said he does not see it on linux64b.
Smaug is  having this issue on Fedora 18 / 64bit
OS: Mac OS X → All
Hardware: x86 → All
FF23/beta seems to work here.
this is wfm so I can't be that much help.

but from the pushlog as a swag you might try a build right before 497003 and right after it (I think it landed in 2 csets).. that patch is going to change a lot of timing. but its more of a guess than a suspicion.
Hmm, I can't reproduce this anymore with my local builds, but can with the latest nightly.
It seems easier to reproduce the issue after clearing the browser's history, especially if the page was already correctly loaded on non-affected versions.
Blocks: 497003
No longer blocks: 497003
Depends on: 497003
Steve, what's the plan here?
Flags: needinfo?(sworkman)
So, uplift has already happened. Let me take some time today to look at it and see if I can find the problem. If not, I should be able to disable the patch from Bug 497003 by adding a pref or disabling it directly. I'll post an update by the end of the day.
Flags: needinfo?(sworkman)
I should add that I'm seeing the issue locally on OSX 10.8, so that should help with some diagnosis.
I might have a fix. I had a couple of patches in my local repo to fix some intermittent issues on the try server, and it looks like they might fix the issue. I was able to reproduce it with a local build without the patches, and with them it seems to go away.

I've pushed the patches to try, and builds should be appearing there soon:

Daniel, can you try it out and post the results, please? Anyone else who could reproduce it, please try also.
Flags: needinfo?(daniel)
(In reply to Steve Workman [:sworkman] from comment #12)

> Daniel, can you try it out and post the results, please? Anyone else who
> could reproduce it, please try also.

These builds seem to fix the issue. I could test only the OS X version since
I'm away from my office these days.
Flags: needinfo?(daniel)
Thanks Daniel - fixed on OS X is a good start.

Smaug, could you try too please?
Flags: needinfo?(petruta.rasa)
Flags: needinfo?(bugs)
Tryserver build seems to work on 64bit linux.
Flags: needinfo?(bugs)
Jason, this is the combination of two patches I merged together. Both are fixes I was going to add to support ODA off main thread for imagelib. Note that I'm not 100% sure what the issue was here; basically I had observed racing issues with OnStopRequest and imagelib code, and seeing truncated HTML here made me think to try the patches. I want to spend some more time figuring out exactly what is happening, but the patch seems to work.

1 - Set nsInputStreamPump::mStatus only if EnsureWaiting fails. This is in the context of nsInputStreamPump::OnInputStreamReady. This is to prevent the case where mStatus can be incorrectly set to SUCCESS after a call to Cancel was made where it should have been set to a FAILURE. This allows OnStateStop to run with the correct mStatus.

2 - It looks like EnsureWaiting should only call AsyncWait at the end of the loop in OnInputStreamReady. Calls to Resume and Cancel should not result in a call to AsyncWait in this context, otherwise they can spawn parallel calls to OnInputStreamReady on different threads.

Note, I discovered this while working on the imagelib code which has an nsIEventTarget mapping to a threadpool not a single thread. Here is the flow of events involving the main thread and two decode threads:

- On a decode thread, in the context of OnStateTransfer/ODA, Cancel is called. This results in an AsyncWait, and OnInputStreamReady is called on ANOTHER decode thread, since the target is a threadpool.

- The second decode thread starts running OnInputStreamReady BEFORE the first thread continues. Basically, it changes mState to STATE_STOP but the first thread is not expecting this. The second thread then calls AsyncWait to push OnStateStop to the main thread, which it should.

- The first thread, however, just knows that its current state is STATE_STOP, so it tries to run OnStateStop too. This fires an assertion and tests fail intermittently.

- On the main thread, a call to OnStateStop is made - this is correct behavior.

By forcing mWaiting to true at the start of the loop in OnInputStreamReady, calls to AsyncWait are only allowed AFTER OnStateStart and OnStateTransfer have completed. Previously, this was not needed since the calls were all serialized events on the main thread.

The case here is a little different - I don't think Cancel is called (I could be wrong), and there are only two threads, the HTML parser thread and the main thread. So, really this is somewhat speculative. My guess is that OnStopRequest was being called on the main thread before all the ODAs had completed for the HTML, but I'm not sure why. I'll do some more investigating tomorrow to see if I can come up with a plausible case.
Assignee: nobody → sworkman
Attachment #786666 - Flags: review?(jduell.mcbugs)
The issue doesn't reproduce using the try builds on Ubuntu 13.04 x86, Win 8 32 bit and OS 10.7.5.
Flags: needinfo?(petruta.rasa)
(In reply to petruta.rasa from comment #17)
> The issue doesn't reproduce using the try builds on Ubuntu 13.04 x86, Win 8
> 32 bit and OS 10.7.5.

Awesome - thanks for testing!
Figured out what's happening here, and only one part of the patch should be needed.

When the HTML parser starts parsing the meta tag, it sees that it needs to switch char sets. So, it tells nsHttpChannel to reschedule the request, which should cancel the nsInputStreamPump. Now, on a good day, the channel should NOT save this data to the cache; it should just reload from the network. BUT, nsInputStreamPump::mStatus which is passed to the channel in OnStopRequest was set to NS_OK instead of NS_BINDING_ABORTED.

How did it get set to NS_OK? Well, in nsInputStreamPump we changed the behavior to effectively dispatch OnStopRequest back to the main thread. This is done by calling EnsureWaiting at the end of the loop in OnInputStreamReady. Cancel was called during OnStateTransfer (OnDataAvailable's caller) setting mStatus to BINDING_ABORTED, but when EnsureWaiting was called to dispatch OnStopRequest back to the main thread, its return value was NS_OK - which is then used to set mStatus. Finally, mStatus=NS_OK is passed on to nsHttpChannel::OnStopRequest, in which the channel decides that it is ok to cache the current data, even though it's incomplete. So, when it does the reload for the charset switch, it loads from the incomplete cache entry and not from the network.

So, only the first part of the patch is needed, i.e. that part which only sets mStatus if EnsureWaiting fails, thus preserving any value set by Cancel.
Attachment #786666 - Attachment is obsolete: true
Attachment #786666 - Flags: review?(jduell.mcbugs)
Attachment #787807 - Flags: review?(jduell.mcbugs)
I've tested this locally to confirm, and it looks good. One or two verifications would be appreciated. Try builds should be coming in here:
I've tested the try builds on Ubuntu 13.04 x86, Win 7 64 bit and OS X 10.8.4.
The page opens correctly.
thanks steve!
(In reply to petruta.rasa from comment #21)
> I've tested the try builds on Ubuntu 13.04 x86, Win 7 64 bit and OS X 10.8.4.
> The page opens correctly.

Thanks for testing again!
Duplicate of this bug: 902014
Updated based on chat with Jason.

mStatus should only be updated if EnsureWaiting fails AND if it is NOT already in a failure state. This way, if Cancel or some other function has set mStatus to some failure value, it will not be overwritten by the failure value of EnsureWaiting.
Attachment #787807 - Attachment is obsolete: true
Attachment #787807 - Flags: review?(jduell.mcbugs)
Attachment #788431 - Flags: review?(jduell.mcbugs)
Attachment #788431 - Flags: review?(jduell.mcbugs) → review+
Thanks Jason!

Landed on inbound.

I'll request aurora approval in a couple of days after it's landed and baked a little on central.
Whiteboard: [leave open]
Comment on attachment 788431 [details] [diff] [review]
v1.2 Set nsInputStreamPump::mStatus only if EnsureWaiting fails

[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 497003 - Support OnDataAvailable off the main thread.

User impact if declined: and other sites will not load.

Testing completed (on m-c, etc.): Reproduced and verified locally; fix verified by several commenters on the bug as well; landed on m-c several days ago.

Risk to taking this patch (and alternatives if risky): Low risk. No alternatives.

String or IDL/UUID changes made by this patch: None.
Attachment #788431 - Flags: approval-mozilla-aurora?
Attachment #788431 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Whiteboard: [leave open]
Assuming this can be marked resolved fixed since [leave open] was removed from the whiteboard. Flagging for verification.
Closed: 6 years ago
Keywords: verifyme
Resolution: --- → FIXED
Note that [leave open] should be used for when a bug has an incomplete landing on m-c, not for branch uplifts. We use the bug resolution and target milestone to track landing on m-c and the status flags for branch uplifts. Leaving the bug open can interfere with that process.
Target Milestone: --- → mozilla26
Verified fixed on Aurora 25.0a2 (20130913004001):
Mozilla/5.0 (X11; Linux i686; rv:25.0) Gecko/20100101 Firefox/25.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.7; rv:25.0) Gecko/20100101 Firefox/25.0
Mozilla/5.0 (Windows NT 6.2; rv:25.0) Gecko/20100101 Firefox/25.0
QA Contact: petruta.rasa
Mozilla/5.0 (X11; Linux x86_64; rv:26.0) Gecko/20100101 Firefox/26.0
Mozilla/5.0 (Windows NT 6.0; rv:26.0) Gecko/20100101 Firefox/26.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:26.0) Gecko/20100101 Firefox/26.0

Verified as fixed on latest Aurora 26.0a2 (buildID: 20131003004003).
Keywords: verifyme
You need to log in before you can comment on or make changes to this bug.