Closed Bug 800708 Opened 8 years ago Closed 6 years ago

Test failure "All downloads have been finished" in /testPrivateBrowsing/testPrivateDownloadPanel.js and in /testPrivateBrowsing/testDownloadManagerClosed.js

Categories

(Mozilla QA Graveyard :: Mozmill Tests, defect, P2)

defect

Tracking

(firefox25 wontfix, firefox26 disabled, firefox27 disabled, firefox-esr10 wontfix, firefox-esr17 wontfix)

RESOLVED WONTFIX
Tracking Status
firefox25 --- wontfix
firefox26 --- disabled
firefox27 --- disabled
firefox-esr10 --- wontfix
firefox-esr17 --- wontfix

People

(Reporter: AndreeaMatei, Unassigned)

References

()

Details

(Whiteboard: [mozmill-test-failure])

Attachments

(3 files, 3 obsolete files)

This happened today on Windows Vista x86, with ERS branch 10.0.9.
It seems the download took very long so this waitFor has failed:

// Wait until all downloads have been finished
controller.waitFor(function () {
  return dm.activeDownloadCount === 0;
}, "All downloads have been finished");

I will continue to monitor this over the weekend.
Please add a link to the failure list and not to a single report. Thanks.
Priority: -- → P2
It's happening for releases on esr10. Lets get this figured out and fixed.
Whiteboard: s=121022 u=failure c=private_browsing p=1
Assignee: nobody → andreea.matei
Status: NEW → ASSIGNED
This it's not reproducing for me and I also haven't seen it fail since October 13th.
So it appears that this has occurred only six times since June. Four of these are on Windows NT 6.0.6002 and all of these were on different versions of Firefox. Given how infrequent this occurs I might suggest adding some debug to the test so that when it next occurs we can find out what's still downloading. The files are locally hosted so the downloads should be very quick.
I got it to reproduce, like one time on 300 runs. I found out that in the current waitFor, when we fail, the first 4500ms the activeDownloadCount = 2, then for the next 5000 is 1. Somehow it's not enough time to get to none actively downloading.
It's great that you're able to reproduce this! The downloads should be quick because they're served locally, but maybe there is some delay in starting to download them. I wonder if we should either increase the timeout, or have additional waits. For example, we could wait the default amount of time for each download rather than for all downloads.
Andrea, can you please increase the timeout and check how long it takes for the other remaining files to download? I wonder if the initial download is only affected here.
When increased the timeout (even until 15000), that waitFor passes, but fails later at the dm.close(). 
Need to investigate further to see if that happens in pb mode or right at the end. It's hard to catch it, today I runned tests all day on vista VM.
Still, the first download takes the longest time.
Please use as many as possible dump statements which also print the current time to the console. So we can see what's happening.
Looks like is at the last line on the test. 
  dm.close();

This should have to be the other failure from the reports.

I'm checking now what timeout is the most appropriate for the two waitFor() out and in PB mode. But is always the first file that takes a lot longer to download (unknown_type.mtdl)
I kinda would like to see some console output for this problem. Please attach it to the bug.
Attached file waitForTimeout Error (obsolete) —
This is the output for the "All downloads have been finished" error, outside PB.
Dump doesn't work on the windows console, so I send the output to a txt file.

There's also another case where in the first waitFor it takes long, but manages to pass and in the second one inside PB fails (got that even with 8000 timeout).
Attached file DownloadManagerClose Error (obsolete) —
This is the output for "Download Manager has been closed" error.

Used dump before and after the last close of download manager and only the one before appears.
(In reply to Andreea Matei [:AndreeaMatei] from comment #13)
> Created attachment 679119 [details]
> DownloadManagerClose Error

That one should be attached to bug 802126.
(In reply to Andreea Matei [:AndreeaMatei] from comment #12)
> Created attachment 679117 [details]
> waitForTimeout Error

This does not contain useful information. Dump states with timestamps as I already have mentioned in comment 6 yesterday.
Attached file waitForTimeout Error (obsolete) —
Here are the timestamps. I've used 8000 timeout, for the first one it passed at 5600 and the second in PB mode failed.

If that's the case, I feel that separating the downloads the first time, with waitFor() for each of them will not solve the issue in PB mode where is only one file.
Attachment #679117 - Attachment is obsolete: true
Again please add more context. Dump which file its starting to download and  when it is ready. As best log each step, which would give us really helpful information.
Both files are starting at the beginning in the for, I will add a dump for which one is finished first.

In PB mode where also fails is just one file: unknown_type_pb.stbf.
Attached file waitForTimeout Error
So here are the results, showing each download file, it's state and the timestamp.
state 7 means scanning according to downloads.js and state 1 is finished.

Still, I have another one where both are still downloading at 8000.
Even if we increase timeout even further, it's odd it takes that long.

I think we could indeed separate the downloads, start the next one only when the first is done and wait for each of them. Couldn't get it to fail like this.
Attachment #679119 - Attachment is obsolete: true
Attachment #679658 - Attachment is obsolete: true
Do you have an antivirus software installed on your box? That could be the cause. Our CI boxes don't have. If that's the case I wonder if we should try to disable the scan for our test files.
This is still failing sometimes (1 in 20 runs) with just downloading each file separately, with default timeout. 
If we increase timeout, even double it, it stops failing, but this is not a wanted fix since it shouldn't be taking this long to download a local file, which indicates a possible firefox bug.
Failing under which conditions? Normal CPU load? The VM shouldn't have a high load usually, so I would say you should test it directly on it which probably gives us better results.
I've tested under high load CPU, but I will check this morning how it behaves with normal load. Thanks! I assume it will be fine by downloading the files synchronously.
Attached patch patch v1Splinter Review
Downloading each file and waiting for it to finish.
Attachment #682403 - Flags: review?(hskupin)
Attachment #682403 - Flags: review?(dave.hunt)
Comment on attachment 682403 [details] [diff] [review]
patch v1

Review of attachment 682403 [details] [diff] [review]:
-----------------------------------------------------------------

Before fully reviewing this patch I would like to know how the downloading times are looking now? You haven't given any numbers for this patch.

::: tests/functional/testPrivateBrowsing/testDownloadManagerClosed.js
@@ +67,4 @@
>      downloads.downloadFileOfUnknownType(controller, DOWNLOADS[i]);
> +
> +    // Save information of currently downloaded file
> +    downloadedFiles = dm.getAllDownloads();

I think that this should not be called within the loop. if it also returns already downloaded files we are good, otherwise we will leave files around on the file system.
Attachment #682403 - Flags: review?(hskupin)
Attachment #682403 - Flags: review?(dave.hunt)
Attachment #682403 - Flags: review-
This is the highest report, 600 ms for one file.
As you can see in the file, the getAllDownloads() method returns those already downloaded as well (when the first is done, is still kept in downloadedFiles with the currently downloading one).
Wow, this smells like a real bug in Firefox! Can you please exchange the files we download from local ones to some remote files? If that also takes that long from state 7 to state 1 we should deeper investigate it and file a bug against Firefox.
With remote files (the same, from mozqa.com), in 20 runs, mostly they were also about 600ms, one case with 900ms and 1200ms but also one failed:
First download reached the timeout (5000), second download started and was finished at 1500 while the first was still downloading.
The important thing here is the transition from state 7 to state 1. It's not relevant how long it takes to download the files. So can you give me that information for remote files being downloaded synchronously and asynchronously?
For remote files, downloading separately, the transition in as shown in the file attached, state 7 persists until it's 1.
Downlowding both in the for loop, we have the following:
* states 7 and 1: http://pastebin.mozilla.org/1960791 - one gets downloaded very quickly and the other stays at 7 for a time, then gets to 1
* states 5 and 1: http://pastebin.mozilla.org/1960789 - 5 stands for "The download is in the queue but is not presently downloading." and is strange that is not changing to another state but suddenly into 1
* states 1 and 0: http://pastebin.mozilla.org/1960840 - 0 stands for "The download is in the process of being downloaded." - one file is downloaded quick, the other is 0 from the beginning
(In reply to Andreea Matei [:AndreeaMatei] from comment #30)
> * states 7 and 1: http://pastebin.mozilla.org/1960791 - one gets downloaded
> very quickly and the other stays at 7 for a time, then gets to 1

Given that pastebin links make it hard for us to query for and get obsolete please always paste content in the bug. I would prefer a combined list of events not separated by type. Please attach such a file.

Something is wrong here

downloaded Files: [object Object],[object Object]
active download count: 0
download http://mozqa.com/data/firefox/downloading/unknown_type.fmtd state 1
download http://mozqa.com/data/firefox/downloading/unknown_type.mtdl state 7 

Why does it give 0 for active download count while there is still one download in progress?

> * states 5 and 1: http://pastebin.mozilla.org/1960789 - 5 stands for "The
> download is in the queue but is not presently downloading." and is strange
> that is not changing to another state but suddenly into 1

As I have suspected above it probably takes a long time to initialize the scanning module. For follow-up downloads it's way shorter. But why does that only happen for parallel downloads? That's something we have to figure out.

> * states 1 and 0: http://pastebin.mozilla.org/1960840 - 0 stands for "The
> download is in the process of being downloaded." - one file is downloaded
> quick, the other is 0 from the beginning

Not sure how I should read it because I miss the overall picture of that test. Please provide the full log as mentioned above.
Assignee: andreea.matei → andrei.eftimie
Have not had any more failures in a full month:  http://mozmill-ci.blargon7.com/#/functional/top?branch=All&platform=All&from=2013-01-25&to=2013-02-25

This test has been refactored as part of bug 823064 (applied to all branches except esr17 and release).

I'd say we should close it for now and reopen if it still occurs.
Sounds good. Please close it yourself next time.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME
Happened again on Win 8, with 17.0.4esrpre:
http://mozmill-ci.blargon7.com/#/functional/report/2a6536e9db9f5f44ed48c58510585b14
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Same issue happening on the rewritten test (actual download code is the same):
http://mozmill-ci.blargon7.com/#/functional/report/db924715258381629c26ddbad6e6eb29
Summary: Test failure "All downloads have been finished" in /testPrivateBrowsing/testDownloadManagerClosed.js → Test failure "All downloads have been finished" in /testPrivateBrowsing/testPrivateDownloadPanel.js and in /testPrivateBrowsing/testDownloadManagerClosed.js
Happened today on Windows 8 x64 with Nightly en-US build:
http://mozmill-ci.blargon7.com/#/functional/report/206c36485a655b168e74264237372d2e
OS: Windows Vista → All
Whiteboard: s=121022 u=failure c=private_browsing p=1 → [mozmill-test-failure] s=121022 u=failure c=private_browsing p=1
Happened today on Windows 7 with Firefox 25.0a2 (25.0a2, en-US, 20130807004004)
http://mozmill-daily.blargon7.com/#/functional/report/bccc2ea9ac1e478fa074c699d8bb01be
Whiteboard: [mozmill-test-failure] s=121022 u=failure c=private_browsing p=1 → [mozmill-test-failure]
Happened again on WinXP with Firefox 26.0a1 (26.0a1, de, 20130831030224):
http://mozmill-daily.blargon7.com/#/functional/report/3c3cd991de01b704f3f65783a18850b6
That failure is not related since the test is skipped.

Might be a machine problem?
Agreed, I have no idea why this report have been filed as comment here. It's clearly not related also because it's a disconnect and not the failure as covered on this bug.
We no longer have testDownloadManagerClosed.js in other branches besides esr17 and testPrivateDownloadPanel.js is skipped on aurora and default due to the Downloads API changes.

Still fails on Beta with Windows 8 intermittently.
Depends on: 908649
Assignee: andrei.eftimie → nobody
Status: ASSIGNED → NEW
This failed again on ESR24:
http://mozmill-daily.blargon7.com/#/functional/report/2f982f72826307fed840a3b11cbd121d

Due to the fact that on all newer branches we'll have the new Download Library implementation, this will not fail _probably_ the same way.

We won't fix this for ESR24, so I'm closing this issue now.

If something similarly fails after bug 908649 lands, please file a different bug.
Status: NEW → RESOLVED
Closed: 8 years ago6 years ago
Resolution: --- → WONTFIX
Product: Mozilla QA → Mozilla QA Graveyard
You need to log in before you can comment on or make changes to this bug.