Race condition loading source content on refresh

RESOLVED WORKSFORME

Status

P2
normal
RESOLVED WORKSFORME
6 years ago
4 months ago

People

(Reporter: bdahl, Assigned: fitzgen)

Tracking

20 Branch
x86
Mac OS X

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 2 obsolete attachments)

(Reporter)

Description

6 years ago
The debugger never finishes loading the script source on refresh when there is a 'debugger' keyword used.

Steps to reproduce:
1) Open the debugger
2) Open http://people.mozilla.com/~bdahl/sandbox/debugger_hang/web/viewer.html#disableWorker=true
3) Observe that the debugger breaks on the 'debugger' keyword line in font.js
4) Refresh the page
5) Where the script source is located will say 'Loading' but it never loads the source and eventually turns on my laptop fan.

FF19 seems to work fine with the above steps.  Further, if I set a breakpoint in the debugger instead of using the 'debugger' keyword everything works as expected.
could be toolbox related. Need to spend some time figuring out where this delay is coming from. We have a race.
Assignee: nobody → past
Priority: -- → P2
I can't reproduce this on either Nightly (FF21) or the latest Aurora (FF20). The source always loads for me. Even in debug builds, where I get a slow script dialog, no matter what option I choose (stop, continue, debug) I always get the script afterwards. Can you check if this now works for you, too?
I am not exactly sure why this happens, but here goes what actually happens :

1) While loading a script, some exception comes in Error Console.
2) After that even when the source of the file gets loaded it is instantly *replaced* with the "Loading..." text.
3) Now choose any script, it *will* load, but instantly the text of the script will get replaced with "Loading..."

I see this intermittently (more often in chrome debugger) and will try to capture the exception next time.
May be related to bug 785704, on which I'm working on (relatively in vain).
(Reporter)

Comment 5

6 years ago
(In reply to Girish Sharma [:Optimizer] from comment #3)
> I am not exactly sure why this happens, but here goes what actually happens :
> 
> 1) While loading a script, some exception comes in Error Console.
> 2) After that even when the source of the file gets loaded it is instantly
> *replaced* with the "Loading..." text.
> 3) Now choose any script, it *will* load, but instantly the text of the
> script will get replaced with "Loading..."
> 
> I see this intermittently (more often in chrome debugger) and will try to
> capture the exception next time.

I can still reproduce.  I did just notice that the bug only seems to come up if you are doing a full refresh(on osx command+shift+r).
(In reply to Brendan Dahl from comment #5)
> (In reply to Girish Sharma [:Optimizer] from comment #3)
> > I am not exactly sure why this happens, but here goes what actually happens :
> > 
> > 1) While loading a script, some exception comes in Error Console.
> > 2) After that even when the source of the file gets loaded it is instantly
> > *replaced* with the "Loading..." text.
> > 3) Now choose any script, it *will* load, but instantly the text of the
> > script will get replaced with "Loading..."
> > 
> > I see this intermittently (more often in chrome debugger) and will try to
> > capture the exception next time.
> 
> I can still reproduce.  I did just notice that the bug only seems to come up
> if you are doing a full refresh(on osx command+shift+r).

Can you open the error console and copy paste the error related to debugger (if any)
(Reporter)

Comment 7

6 years ago
This message repeats in the error console:

Timestamp: 2/20/13 10:09:18 AM
Error: Error loading http://people.mozilla.com/~bdahl/sandbox/debugger_hang/src/fonts.js
noSuchActor
Source File: chrome://browser/content/debugger-controller.js
Line: 1259
(In reply to Brendan Dahl from comment #7)
> This message repeats in the error console:
> 
> Timestamp: 2/20/13 10:09:18 AM
> Error: Error loading
> http://people.mozilla.com/~bdahl/sandbox/debugger_hang/src/fonts.js
> noSuchActor
> Source File: chrome://browser/content/debugger-controller.js
> Line: 1259

Yes, I was also talking about this error, it keeps on repeating in the error console, indefinitely.
I am 90% sure this is a dup of bug 785704, can you confirm Panos?
Flags: needinfo?(past)
I can confirm that the WIP patch from bug 785704 fixes this for me, so let's retest after that fix lands.
Status: NEW → ASSIGNED
Depends on: 785704
Flags: needinfo?(past)
The first part of bug 785704 that has already landed seems to have fixed this. Feel free to reopen if you can still reproduce it.
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 785704
(Reporter)

Comment 12

5 years ago
This seem to have come back and I can see this issue in the current Aurora(v24) and Nightly(v25).

It seems easier to reproduce if on step 4 I do a full refresh (cmd+shift+r).
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
(Reporter)

Comment 13

5 years ago
Created attachment 806223 [details]
minimal test case

Here's a tiny test case that illustrates the issue (Remember to do a full refresh cmd+shift+r).

I also now get the error:
Error loading source:
No such actor for ID: conn0.source2264
(Reporter)

Comment 14

5 years ago
I should mention it seems to be a timing issue since it's hard to reproduce just refreshing the link above. I can consistently reproduce though if I serve the file with a local server.
(In reply to Brendan Dahl [:bdahl] from comment #14)
> I should mention it seems to be a timing issue since it's hard to reproduce
> just refreshing the link above. I can consistently reproduce though if I
> serve the file with a local server.

That might be bug 916454 in action.
(In reply to Victor Porof [:vp] from comment #15)
> (In reply to Brendan Dahl [:bdahl] from comment #14)
> > I should mention it seems to be a timing issue since it's hard to reproduce
> > just refreshing the link above. I can consistently reproduce though if I
> > serve the file with a local server.
> 
> That might be bug 916454 in action.

No, I think this is different. That is a general slowness, this is a source loading race condition.
I think I have a solution.

Don't clear ThreadSources on clearDebuggees, instead send it a refresh notification that tells the existing actors to throw away their cached source text (if using Debugger.Source) and that way the actors will actually live as long as the thread actor, like the RDP says they sould (and the way the frontend kind of expects them to).
Assignee: past → nfitzgerald
Created attachment 806299 [details] [diff] [review]
wip

This patch fixes this bug, but every now and then I will refresh and have no sources listed despite being at the breakpoint. Not too much better.

I'm kind of hesitant to really dig into this because

a) this patch depends on the Debugger.Source change, and if that gets backed out, I don't want to ahve to back out this patch too

b) it is becoming very clear to me that the way we handle sources right now is crazy, and we need to transition from using URLs as unique identifiers to using Debugger.Source, but I'm not sure we are ready yet.
Attachment #806299 - Flags: feedback?(past)
Comment on attachment 806299 [details] [diff] [review]
wip

Review of attachment 806299 [details] [diff] [review]:
-----------------------------------------------------------------

I can't reproduce the issue from comment 13, neither when loading the file from bmo, nor when I use a file:// URL. And that is in both debug and optimized OS X builds, without your patch applied, so I don't know what to make of it. I haven't tried using a local web server, because in my experience file:// URLs expose such races even easier.

The good news is that it looks like we won't have to back out the Debugger.Source patch, so it's fine if you want to pursue bug 917579. And since this bug is apparently hard to reproduce, I believe it would be also fine to work on other bugs that we think might fix it indirectly.
Attachment #806299 - Flags: feedback?(past)
Blocks: 905700
I can repro maybe 1/5 refreshes via file:// URL with an opt build. Basically when we navigate, webbrowser.js calls TA_clearDebuggees, which clears the threadLiftetimePool and sources. The frontend asks for source content of the old source actor from the last threadLifetimePool before it gets the newSource notification for the new actor. This all seems unnecessary to me: I was under the impression that the threadLifetimePool was *supposed* to have the same lifetime as the thread actor. In which case the problem goes away.

However, there are some issues that would remain when we switch to D.S. I've talked with Jim before, and we decided it made sense to have one source actor for each D.S. However, reloads would result in new D.S instances so we would basically have this same race condition again. The above fix only works when we have a source actor per URL. I think we can fix this by making sure that newSource packets are always sent before pause packets, but I thought that was the case right now anyways.

Will revisit this once we have D.S properly integrated.
No longer blocks: 905700
Depends on: 905700
(Reporter)

Comment 21

5 years ago
(In reply to Panos Astithas [:past] from comment #19)
> I can't reproduce the issue from comment 13, neither when loading the file
> from bmo, nor when I use a file:// URL. And that is in both debug and
> optimized OS X builds, without your patch applied, so I don't know what to
> make of it. I haven't tried using a local web server, because in my
> experience file:// URLs expose such races even easier.

File urls seem to be pretty inconsistent for me, using apache locally it seems to be reproducible ~100%.
Summary: Debugger Hangs Forever on Refresh → Race condition loading source content on refresh
(In reply to Victor Porof [:vp] from comment #15)
> (In reply to Brendan Dahl [:bdahl] from comment #14)
> > I should mention it seems to be a timing issue since it's hard to reproduce
> > just refreshing the link above. I can consistently reproduce though if I
> > serve the file with a local server.
> 
> That might be bug 916454 in action.

Copy paste mixup, I meant the old bug 785704.
Duplicate of this bug: 912018
Depends on: 917982
No longer depends on: 785704, 905700
I tried to reproduce this again today, serving attachment 806223 [details] from a local python server with:

python -m SimpleHTTPServer

Everything is working fine for me in both debug and optimized builds. Brendan, does it now work for you, too?
Flags: needinfo?(bdahl)
I suspect that now that bug 917982 has landed, the source actor will still be around, but it is still the *old* source actor. I suspect this bug will now manifest itself as not seeing the latest changes to a file, instead of not getting the source. Basically, I don't think the race has gone away, but we probably have a different (hopefully better in the meantime) symptom.
(Reporter)

Comment 26

5 years ago
It appears to be fixed for my minimal test case, but the original test in comment 1 still doesn't work.
Flags: needinfo?(bdahl)
(Reporter)

Comment 27

5 years ago
Actually I still can replicate the issue with the minimal test case if I use apache to server the file locally.  Not sure if it's faster or slower than python SimpleHTTPServer.
I set up a local apache server, but still couldn't reproduce it. Tried both debug and optimized builds and in the optimized build I even tried keeping Shift-Cmd-R pressed for 10 seconds, to no avail. I also tried the test in comment 0 in both builds, still no issue.

I've run out of ideas on how to reproduce it. I'm testing on a quad-core i7 MBP with 10.9, using the stock apache server and a local build of fx-team tip. Brendan, is your hardware significantly different?

More importantly, does Nick's patch fix this for you? If you or Nick can reliably reproduce it and can verify that this patch fixes it, then that's good enough for me.
Created attachment 8346537 [details] [diff] [review]
wip rebased

Rebased on fx-team tip.
Attachment #806299 - Attachment is obsolete: true
Assignee: nfitzgerald → past
Status: REOPENED → ASSIGNED
Assignee: past → nfitzgerald
(Reporter)

Comment 30

5 years ago
I'm unable to reproduce with 29.0a1 (2013-12-12) for either test case.  I think it would be good to call this bug resolved (I'm not closing since I'm not sure if you still want to land the patch).

On another note I'm still seeing the debugger occasionally have issues with debugger statements.  I see it stop then a moment later the debugger line is gone and all the variable/stack info is gone.  This is is really intermittent though, I'll try to file a new bug when I can get a better test case.
Nick should we land this as is, or do you want to make any adjustments?
Flags: needinfo?(nfitzgerald)
Comment on attachment 8346537 [details] [diff] [review]
wip rebased

This patch is obsolete. It was basically a work around for bug 917982 by making sources live in their own actor pool. Now that that bug is fixed, the sources pool is unnecessary.

If you think it is valuable, we could lift the test out and commit that. /me shrugs
Attachment #8346537 - Attachment is obsolete: true
Flags: needinfo?(nfitzgerald)
Oh right, I forgot about that. We still seem to have some races around reloads, like bug 907440 and bug 948061, but let's deal with them in those bugs.

Brendan I'm closing this bug, but do file a new one for the intermittent problem when you've got STR.
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago5 years ago
Resolution: --- → WORKSFORME

Updated

4 months ago
Product: Firefox → DevTools
You need to log in before you can comment on or make changes to this bug.