Closed Bug 764640 Opened 8 years ago Closed 7 years ago

Sending data too quickly through the JSBridge socket causes an overflow and lots of event data can get lost (expect_assert.js, screenshot.js)

Categories

(Testing Graveyard :: Mozmill, defect, critical)

defect
Not set
critical

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: whimboo, Assigned: whimboo)

References

Details

(Keywords: dataloss, Whiteboard: [mozmill-2.0+][ateamtrack: p=mozmill q=2013q3 m=1])

Attachments

(4 files, 3 obsolete files)

When I run mozmill master's mutt tests for the assertions lib, I get zero test results returned.

ERROR | ProcessManager UNABLE to use job objects to manage child processes
ERROR |
INFO | ProcessManager NOT managing child processes
INFO |
DEBUG | mozmill.startRunner | true
DEBUG | mozmill.setModule | "currentModule"
DEBUG | mozmill.setState | "currentState"
TEST-START | z:/data/code/tools/mozmill/mutt/mutt/tests/js/assertions/expect_assert.js | testExpect
[..]
RESULTS | Passed: 0
RESULTS | Failed: 0
RESULTS | Skipped: 0
All tests were successful.  Ship it!

Flagging for 2.0 for now until this has been investigated.
FWIW, the ERROR | output comes from mozprocess writing to stderr, which mozlogger converts to an error line:

https://github.com/mozautomation/mozmill/blob/master/mozmill/mozmill/logger.py#L86

Unrelated to the 0 results, but worth noting.
This is related to the throws/doesNotThrow tests in the test module.
Blocks: 637880
+'ing for investigation.
Whiteboard: [mozmill-2.0?] → [mozmill-2.0+]
Assignee: nobody → hskupin
Status: NEW → ASSIGNED
I have re-set up a new Windows XP machine and I can always see this failure. Other versions of Windows seem to be fine. At least on Windows 7 I'm not able to reproduce.
This is clearly a failure in the framework. In some cases we fail to send the endTest event so the Python code doesn't pick up the pass/fail/skip count. More details later.
Attached file broken json data
This attachment contains the JSON code which we fail to send via JSBridge. I can get rid of the failure with my patch on bug 781376, which means that something invalid is in this object. I will further investigate tomorrow.
Interesting. Now that bug 781376 has been resolved I assume we'd see this issue much less often, but it would be great to find out the cause.
We would see it if an assertion fails with the specific data in the details. So we can simply negate the test to make it happen; it should still be reproducible 100% of the time. I will continue to work on this bug after bug 761564 is fixed. If you want to check, feel free to take some work off my plate. Thanks.
Whiteboard: [mozmill-2.0+] → [mozmill-2.0+] s=130408 u=failure c=mozmill p=1
Assignee: hskupin → andrei.eftimie
Trying to figure this one out.

When running python tests we get something like this:
> RESULTS | Passed: 1
> RESULTS | Failed: 0
> RESULTS | Skipped: 0

Running any JS tests fails to print any "RESULTS"; the whole block is missing.

Is this intended?
If so, this bug might not be relevant anymore.
If not, we should file another bug for re-enabling the RESULTS block for JS tests.
Flags: needinfo?(hskupin)
If we don't print the final results, that's certainly worth another bug, which we should fix first.
Flags: needinfo?(hskupin)
Depends on: 859746
I've set up a WinXP box with mozmill 2.0, but I am unable to reproduce this.

According to comment 8 we should be able to reproduce the problem by 'negating the test':
> So we can simply negate the test to make it happen. So it should still be reproducible 100% of the time.

While running the assertions/expect_assert.js test I always get 2 tests logged.
I've tried changing the TEST_DATA objects or the conditions from within the tests to make them fail.
So I either get 2 PASSED tests or 2 FAILED tests.

Henrik, is there something particular that needs "negating" here?
Flags: needinfo?(hskupin)
I'm still seeing this with latest Mozmill code against the Nightly build by running

mozmill -b ../bin/nightly/firefox.exe -m mutt/mutt/tests/js/assertions/tests.ini
Flags: needinfo?(hskupin)
This time I see this on Windows 7, so it's not XP only.
Thanks Henrik, I can also see it on Win7; XP works fine for me.
Finally a starting point.
Whiteboard: [mozmill-2.0+] s=130408 u=failure c=mozmill p=1 → [ateamtrack: p=mozmill q=2013q2 m=1][mozmill-2.0+]
Whiteboard: [ateamtrack: p=mozmill q=2013q2 m=1][mozmill-2.0+] → [ateamtrack: p=mozmill q=2013q2 m=0][mozmill-2.0+]
An update to what I have found so far.

frame.js is firing the mozmill.endTest event here:
https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/extension/resource/modules/frame.js#L442

but sometimes the endTest method, which would append the results to the results list here: https://github.com/mozilla/mozmill/blob/master/mozmill/mozmill/__init__.py#L69, is not run.

It might also be that this is indeed fired, but too late (the test run finishes and the handler gets destroyed before receiving all data?), but I have no proof to support this.

If this is the case we might need to correctly wait for all reporting to finish before generating the final report.
You should add debugging statements to both sides to check which events are getting fired at which times. That helped me a lot in the past.
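As a sketch, a timestamped dump helper along these lines (the helper name is hypothetical; the real statements were ad hoc additions to frame.js and __init__.py) makes output from the two sides comparable:

```javascript
// Prefix dump() output with an Hour:Minute:Second:Millisecond timestamp
// so events on the JS and Python sides can be correlated by time.
function timedDump(label) {
  const d = new Date();
  const pad = (n, w) => String(n).padStart(w, "0");
  const stamp = `${pad(d.getHours(), 2)}:${pad(d.getMinutes(), 2)}:` +
                `${pad(d.getSeconds(), 2)}:${pad(d.getMilliseconds(), 3)}`;
  return ` ${stamp} ${label}\n`;
}
```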
I've added timestamps to my print/dump statements, formatted as Hour:Minute::Second:Millisecond.
Of the 2 tests, randomly, none, one, or both would be logged.

Flow:

frame.js => fires the endTest event
frame.js => takes the event and fires it through Events.jsm
mozmill/__init__.py :: endTest_listener => receives the event and appends test results

When our problem occurs, the Python side never triggers the endTest_listener method,
which might indicate a problem in Events.jsm

The times look a bit off, but I suspect JS and Python dates are not completely in sync?

Here is a dump where no test was logged:
> TEST-START | expect_assert.js | testExpect
> 
>  11:16::27:410 FIRE endTest EVENT (frame.js)
> 
>  11:16::27:410 mozmill.endTest (frame.js => Events.jsm)
>  11:16::27:428 FIRE endTest EVENT (frame.js)
> 
> 
> 11:16::27:817000 final reporting
> 
> RESULTS | Passed: 0
> RESULTS | Failed: 0
> RESULTS | Skipped: 0
>  11:16::27:428 mozmill.endTest (frame.js => Events.jsm)


Here is a dump where one test was logged:
> TEST-START | expect_assert.js | testExpect
> 
>  11:18::26:882 FIRE endTest EVENT (frame.js)
> 
>  11:18::26:882 mozmill.endTest (frame.js => Events.jsm)
>  11:18::26:916 FIRE endTest EVENT (frame.js)
> 
> TEST-PASS | expect_assert.js | testAssert
> 
>  11:18::26:910000 APPEND TEST RESULTS (mozmill/__init__.py => endTest_listener)
> 
> 
> TEST-END | expect_assert.js | finished in 64ms
> 
> 11:18::27:331000 final reporting
> 
> RESULTS | Passed: 1
> RESULTS | Failed: 0
> RESULTS | Skipped: 0
>  11:18::26:918 mozmill.endTest (frame.js => Events.jsm)

I am not sure how we should proceed further.
If the Python side doesn't get the endTest event, you will have to add debugging statements to jsbridge code. That will show you in detail what's getting transferred or where we bail out with a failure.
I've dug deeper and reached the point where we open a socket, create a buffer with the message, and send it:
https://github.com/mozilla/mozmill/blob/master/jsbridge/jsbridge/extension/resource/modules/Sockets.jsm#L64

In both cases (when we fail and when we succeed) all parameters are equal (except for the timestamp in the message and - probably - the equivalent in the buffer).

Attaching output with dump statements for when we see both tests correctly logged, and when we see only 1.

I'm not sure how to monitor what gets sent through the socket.

**Note: I have noticed that the more stuff I dumped to the console, the harder it was to get a failure. With the dumps seen in the attachments I had no runs with 0 results and one out of 20 runs with 1 result.
Without any dumps the ratios would have been roughly 1 in 6 and 1 in 3.

I could infer that the extra output adds some delay that keeps the tests from failing to report. But nothing conclusive.
Henrik, any idea on how we could proceed?
Based on what I've found the problem seems to lie within the Socket communication.

I'm thinking of setting up some sort of proxy to see all communication, but I'm not sure that it would be feasible - or even possible. I have not tried to eavesdrop on a socket until now.
Flags: needinfo?(hskupin)
Would you mind reducing the assertion test? It might be that a specific check in this test and its result cause this awkward behavior. If there is such a check, it would be easier for us to track down why it's failing.
Flags: needinfo?(hskupin)
(In reply to Andrei Eftimie from comment #21)
> Henrik, any idea on how we could proceed?
> Based on what I've found the problem seems to lie within the Socket
> communication.
> 
> I'm thinking of setting up some sort of proxy to see all communication, but
> I'm not sure that it would be feasible - or even possible. I have not tried
> to eavesdrop on a socket until now.
If you put dump statements in jsbridge's recv/send code you can see all the information going over the socket. jsbridge/extension/resource/modules/Sockets.jsm
Andrei, have you seen my comment 22? Can you please re-check by reducing the assertion test?
Flags: needinfo?(andrei.eftimie)
I am unable to reproduce this anymore. Same machine, updated FF and Mozmill.
Older FF versions still have this problem.

With tinderbox builds I have found the following pushlog which fixes the problem: http://hg.mozilla.org/mozilla-central/pushloghtml?changeset=eeb89381c7a3

This is a merge from birch.
Not sure what could have fixed this. We have some changes to a socket, but they are advertised as Unix only (and we only had this problem on Windows): http://hg.mozilla.org/mozilla-central/rev/6c52203a47b8

Since our problem appeared to be in a Socket connection, might this have fixed it?

Henrik, can you please see if you can reproduce this with the latest Nightly?
Flags: needinfo?(andrei.eftimie) → needinfo?(hskupin)
(In reply to Andrei Eftimie from comment #25)
> With tinderbox builds I have found the following pushlog which fixes the
> problem:
> http://hg.mozilla.org/mozilla-central/pushloghtml?changeset=eeb89381c7a3
>
> This is a merge from birch.

That doesn't make any sense given that it was a merge from mozilla-central to birch and not the other way around. Nothing in here should have changed the behavior for Nightly or a tinderbox build for Nightly.

I tried to run the test in my Win7 VM but it was not successful given the failure:

    "message": "redeclaration of var Cu",
    "fileName": "resource://mozmill/stdlib/securable-module.js -> resource://mozmill/modules/assertions.js",
    "name": "TypeError",
    "lineNumber": 7

Do you see the same? It's only happening for me on Windows with latest code of Mozmill but not on Linux.
Flags: needinfo?(hskupin)
I have no failures on Win 7.
Double checked that I am running latest versions of mozmill-env, mozmill, Nightly.

> $ mozmill -b /c/Mozilla/Nightly/firefox-24.0a1.en-US.win32/firefox/firefox.exe -m /c/work/mozmill/mutt/mutt/tests/js/assertions/tests.ini
> TEST-START | expect_assert.js | testExpect
> TEST-PASS | expect_assert.js | testExpect
> TEST-START | expect_assert.js | testAssert
> TEST-PASS | expect_assert.js | testAssert
> TEST-END | expect_assert.js | finished in 154ms
> RESULTS | Passed: 2
> RESULTS | Failed: 0
> RESULTS | Skipped: 0
> 
> svuser@PXXXWIN7VM ~
> $ mutt -b /c/Mozilla/Nightly/firefox-24.0a1.en-US.win32/firefox/firefox.exe -m /c/work/mozmill/mutt/mutt/tests/js/assertions/tests.ini
> Running python tests
> 
> ----------------------------------------------------------------------
> Ran 0 tests in 0.000s
> 
> OK
> Running JS Tests
> TEST-START | expect_assert.js | testExpect
> TEST-PASS | expect_assert.js | testExpect
> TEST-START | expect_assert.js | testAssert
> TEST-PASS | expect_assert.js | testAssert
> TEST-END | expect_assert.js | finished in 64ms
> ===========================================================================
> Python Failures:
> 
> ===========================================================================
> Javascript Failures:
> 
> Total passed: 2
> Total failed: 0
> Total skipped: 0

================

I got that pushlog by opening "about:buildconfig" in the first tinderbox build that didn't reproduce the problem and clicking the "built from" link.
If we can't repro anymore, then mark as WFM?
Hopefully yes.
I want someone who has previously seen this to confirm that it isn't happening anymore.
I've come across something similar, which might in fact be the same issue, but with a 100% reproducibility rate.

The latest patch on bug 840487 (we should probably wait for that to land to make debugging this easier) passes nicely under mozmill 2.0 with an en-US build.
However, with a jp or fr build I _never_ get a PASS statement and the final output looks like:
> RESULTS | Passed: 0
> RESULTS | Failed: 0
> RESULTS | Skipped: 0

de builds work fine.
Preliminary info might suggest encoding problems.

Previous research showed the problem lying somewhere in the socket communication layer. 
So UTF or encoding problems related to special characters _might_ be a cause.

Will investigate more deeply.
Same happens on OS X when running:

mozmill -b /mozilla/bin/nightly/firefox -t mutt/mutt/tests/js/frame/user_shutdown/multiple/test5.js

So this is most likely a shutdown problem we might be able to fix with the landing of bug 865690.
Depends on: 865690
Oh, I meant Linux not OS X.
OS: Windows XP → All
Hardware: x86 → All
That sounds very reasonable.

We send information through the Socket, but it looks like we never receive it.
Our connection shutting down before all messages come through looks like a very plausible explanation.
I also noticed variances from run to run in both expect_assert.js and screenshot.js in the course of evaluating whimboo's patch for bug 865690. Simply running clean master multiple times showed similar variances from run to run. Some runs would contain a TEST-END or TEST-START, some not.

Configuration:

Windows XP32 SP3
Latest Nightly 25.0a1 20130625031238

I can provide logs if required, but won't initially so I don't clutter up the attachments list.
Summary: Mozmill returns PASS/SKIP/FAIL == 0 for test_assertions.js → Mozmill returns PASS/SKIP/FAIL == 0 for test_assertions.js, screenshot.js
And for context, whimboo has already received those logs via email.
Summary: Mozmill returns PASS/SKIP/FAIL == 0 for test_assertions.js, screenshot.js → Mozmill returns PASS/SKIP/FAIL == 0 for expect_assert.js, screenshot.js
Disregard the failure mentioned in comment 30.
Created bug 889817 for it since it's an entirely different issue.
Most of this bug is no longer reproducible.

Let's wait for bug 865690 to land.
We'll probably be able to close this one as well then.
I retested with the latest mozmill 2.0 application-shutdown changes landed from bug 865690, out of interest. 

I did about a dozen `mutt testjs` runs.

With respect to this open bug, as would be expected, I am still seeing the same problem and run-to-run variances with expect_assert.js and screenshot.js, as described in Comment 34.

Config:
Windows XP32 SP3
mozmill-env w/mozmill 2.0rc4
latest Nightly 25.0a1 20130704031323
Thanks Jonathan for the update!

I'll try to replicate your results on a Win XP box.
I haven't been able to see the problem in Windows7 anymore.
I have not been able to reproduce this on WinXP (50 clean runs).
But I see it on Win7 now.

Will try to dig it up again.
Reproducing this with --debug shows that no Events are triggered after shutdown.
Doesn't seem related to bug 865690

The only visible thing from running with --debug is that the TEST-START event is logged later than when it passes.
(In reply to Andrei Eftimie from comment #40)
> Doesn't seem related to bug 865690

Indeed/agreed.
Ok, I will take this bug so we can get this finished off. Otilia, please find a replacement for Andrei.

I took a look at those tests and I'm not exactly sure yet what's causing this problem. It might be the size or the content of the data we are trying to send via JSBridge. I have to dig deeper into it. At least I have seen a strange functionName for one of the checks in expect_assert.js:

{"fileName":"file:///c:/mozilla/mozmill/mutt/mutt/tests/js/assertions/expect_assert.js","functionName":"testAssert/<","lineNumber":73,"message":" (Error). Doesn't throw on named exception - Got unwanted exception"}

Please see the 'testAssert/<' part. It might not be related, but I will check this first and fix it. I will keep you informed about updates.
Assignee: andrei.eftimie → hskupin
Flags: needinfo?(otilia.anica)
Ok, after digging around a lot without success, I finally picked up on the content size and mentioned that to Clint. He replied that we might have an overflow while sending the data. That immediately sparked some ideas, and I tried a short sleep in wrapper()@frame.js. This let the tests pass! Woohoo! So yes, we send data too quickly.

So I looked into the NSS and Sockets modules of JSBridge. I tried to get a result for the PR_Send call, but no exception gets raised. So I checked the return value of the method and noticed that it should be the length of the data sent. In the failure case I was able to see a lot of -1 values. So I checked MXR for PR_Send usage and found out that we have to retry sending if -1 gets returned.

The patch I have implemented does exactly that, and I can't see the failure anymore. All affected tests are passing now.
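To illustrate the failure mode (everything here is a hypothetical mock, not the real Sockets.jsm code): a sender that ignores PR_Send's -1 return silently drops a message whenever the send buffer is full, which is exactly how the endTest events went missing.

```javascript
// Mock of a non-blocking socket: send() returns -1 (nothing written)
// while the OS send buffer is "full", mirroring PR_Send's behavior.
function makeBusySocket(busyCalls) {
  let busy = busyCalls;
  const delivered = [];
  return {
    delivered,
    send(msg) {
      if (busy > 0) { busy--; return -1; } // buffer full, nothing sent
      delivered.push(msg);
      return msg.length;
    },
  };
}

// Buggy sender: fires each message once and never checks the return
// value, so anything hitting a full buffer is simply lost.
function fireAndForget(socket, messages) {
  for (const msg of messages) {
    socket.send(msg);
  }
}
```

With one "busy" call, the first event vanishes while later ones arrive, matching the dumps above where frame.js fired mozmill.endTest but the Python listener never ran.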
Summary: Mozmill returns PASS/SKIP/FAIL == 0 for expect_assert.js, screenshot.js → Sending data too quickly through the JSBridge socket causes an overflow and lots of event data can get lost (expect_assert.js, screenshot.js)
Attached patch WIP v1 (obsolete) — Splinter Review
This patch fixes the issue for me and does not introduce new problems on any platform.

Andrei and Jonathan, can you both please test this on your machines, given that you were also able to see it? Thanks.
Attachment #775049 - Flags: review?(ctalbert)
Attachment #775049 - Flags: feedback?(tojonmz)
Attachment #775049 - Flags: feedback?(andrei.eftimie)
Attachment #775049 - Attachment description: Patch v1 → WIP v1
I don't seem to see the original problem anymore, at least not to the full extent.

I'm running:
> mozmill -m mutt/mutt/tests/js/assertions/tests.ini -b /c/Mozilla/Nightly/firefox-25.0a1.en-US.win32/firefox/firefox.exe

And with the WIP patch I get 5 out of 6 runs:
> TEST-START | expect_assert.js | testExpect
> TEST-PASS | expect_assert.js | testAssert
> TEST-END | expect_assert.js | finished in 48ms
> RESULTS | Passed: 1
> RESULTS | Failed: 0
> RESULTS | Skipped: 0

It should instead print (I'm getting this correct result with the new patch in 1 out of 6 runs):
> TEST-START | expect_assert.js | testExpect
> TEST-PASS | expect_assert.js | testExpect
> TEST-START | expect_assert.js | testAssert
> TEST-PASS | expect_assert.js | testAssert
> RESULTS | Passed: 2
> RESULTS | Failed: 0
> RESULTS | Skipped: 0

We still have some issues to solve.
Some runs with debug statements:
http://pastebin.com/h7DEwJzW // 1 passed
http://pastebin.com/A79ccKda // 0 passed
Previous links were with --console-level=DEBUG

This one is with --debug: http://pastebin.com/nPErkkVW
Sorry, those should have been run with --debug so we can see which messages get sent over the bridge. As said on IRC, I have never seen it failing anymore. Are you sure the patch has been applied correctly?
Are you sure you are on the latest master? Usually you should get more information with --debug and not only "* Fire event: 'mozmill.pass'". What I see is:

* Sending message: '{"eventType":"mozmill.pass","result":{"pass":{"fileName":"file:///c:/mozilla/mozmill/mutt/mutt/tests/js/assertions/expect_assert.js","functionName":"testAssert","lineNumber":118,"message":" (AssertionError). assert.deepEqual for [[object Object], [object Object], Objects are equal]"}}}'

Please closely check that. I assume that's the reason why it does not work for you.
I am on the latest master + this WIP patch applied.
Edit: the problem was with my mozmill-env; after re-initializing it completely it works now. Something was off there.

I don't get anymore failures!!
Your fix appears to be working nicely.

We should probably log this message only in debug
> Failed sending message through the socket. Retrying.
(In reply to Henrik Skupin (:whimboo) from comment #44)
> Created attachment 775049 [details] [diff] [review]
> WIP v1
> Andrei and Jonathan, can you both please test this on your machines, given
> that you were also able to see it? Thanks.

Just back from several days off, trying it now also. Will report back.
(In reply to Andrei Eftimie from comment #50)
> I am on the lastest master + this WIP patch applied.
> Edit: the problem was with my mozmill-env, re-initialized it completely and
> it works now, seems somthing was off there.

Great. So my assumption was right.

> We should probably log this message only in debug
> > Failed sending message through the socket. Retrying.

Correct. That's why it's a WIP patch. The final patch will also include a Python test.
Comment on attachment 775049 [details] [diff] [review]
WIP v1

Review of attachment 775049 [details] [diff] [review]:
-----------------------------------------------------------------

I missed the feedback flag; as stated above, this works fine for me :)
Great find and fix Henrik!

::: jsbridge/jsbridge/extension/resource/modules/Sockets.jsm
@@ +69,5 @@
> +      buffer = new NSS.Sockets.buffer(message.substring(sent));
> +      count = NSS.Sockets.PR_Send(this.fd, buffer, buffer.length,
> +                                  0, NSS.Sockets.PR_INTERVAL_MAX);
> +      if (count < 0) {
> +        dump('Failed sending message through the socket. Retrying.\n');

We should only log this when debug is active. (or don't log it at all)
Attachment #775049 - Flags: feedback?(andrei.eftimie) → feedback+
Comment on attachment 775049 [details] [diff] [review]
WIP v1

I did a dozen runs with latest master to reproduce the problem variances with expect_assert.js and screenshot.js. Then I did 20 runs with this WIP patch applied; the missing result lines are now correctly present in all 20 patched runs:

TEST-PASS | js\assertions\expect_assert.js | testAssert
TEST-END | js\assertions\expect_assert.js

TEST-PASS | js\controller\screenshot.js | test
TEST-PASS | js\controller\screenshot.js | testChrome

I diffed all the 20 output files against each other, and grep'd them with wc to be absolutely sure.

Config:
Windows XP32 SP3
mozmill-env w/mozmill 2.0rc4 latest
latest Nightly 25.0a1 20130716030202 138524

I too see many of the intended 'Failure sending message through socket' diagnostics in the output. I gather the volume of them is expected: an amount ranging from about 500 to 1,500 in an output file, for each `mutt testjs` run.
Attachment #775049 - Flags: feedback?(tojonmz) → feedback+
Thanks Jonathan! That's great to hear that it is also fixed for you. And as stated above, this output is wanted to show how much information we were leaking before. So by tomorrow I will update the patch and also include a Python test so we can be sure this data loss will never happen again under those circumstances.
Severity: normal → critical
Keywords: dataloss
QA Contact: hskupin
Whiteboard: [ateamtrack: p=mozmill q=2013q2 m=0][mozmill-2.0+] → [mozmill-2.0+]
Flags: needinfo?(otilia.anica)
Comment on attachment 775049 [details] [diff] [review]
WIP v1

I will re-ask for feedback once the patch is ready.
Attachment #775049 - Flags: review?(ctalbert)
Attached patch Patch v1 (obsolete) — Splinter Review
I know that I said a test would be good to have, but I was not able to create a reliable one which always fails with the old implementation; I mostly got passes. So I don't think I should spend more time investigating that more deeply. We should get this patch in and be happy. All of our tests are still working, and others are even fixed with this patch landed.
Attachment #775049 - Attachment is obsolete: true
Attachment #777394 - Flags: review?(ctalbert)
I will do some new runs with the latest patch and report back.
Another dozen runs without the patch, just to confirm the test environment was valid. And 20 runs with the final patch v1 from Comment 57 applied; the missing result lines are still correctly present in all 20 runs.

Pass: 61
Fail: 0
Skip: 2

Config:
Windows XP32 SP3
mozmill-env w/mozmill 2.0rc4 latest
latest Nightly 25.0a1 20130717030207 138720
Fantastic. That's really great to hear!
Comment on attachment 777394 [details] [diff] [review]
Patch v1

Review of attachment 777394 [details] [diff] [review]:
-----------------------------------------------------------------

This looks great. r+ with the one comment addressed.  I tested it on windows 7 64, and couldn't get anything to fail.  So I think we might have finally nailed this one.  Good job Henrik, Andrei and Jonathan!

::: jsbridge/jsbridge/extension/resource/modules/Sockets.jsm
@@ +70,5 @@
> +      buffer = new NSS.Sockets.buffer(message.substring(sent));
> +      count = NSS.Sockets.PR_Send(this.fd, buffer, buffer.length,
> +                                  0, NSS.Sockets.PR_INTERVAL_MAX);
> +      if (count < 0) {
> +        Log.dump('PR_Send', 'Failure sending data. Retrying...');

If we lose the socket then this is going to retry forever. Let's put something in here to prevent that case. The safest is probably an upper limit on retries. Often when the socket goes away our handle to it (this.fd) is still valid, so we can't simply test for that being null. I'd recommend putting something in where we retry some number of times and break out of the loop if we hit max retries. I'd say 100 retries should be plenty (pulling a number out of thin air).
Attachment #777394 - Flags: review?(ctalbert) → review+
So I was trying to find something better and found it. If count is -1, an error is expected, and you can retrieve it via PR_GetError(). So I have implemented this method in addition to the other Socket methods. The error value in those cases is PR_WOULD_BLOCK_ERROR. After some searching I found the following webpage, which describes fantastically what's going on:

http://www.mail-archive.com/dev-tech-crypto@lists.mozilla.org/msg09750.html

So a better solution here is to check for that error and retry sending the data. At some point the send buffer will be empty again and we can send our message. If another error occurs, we break the loop. That way we do not need a hardcoded limit, which I never like. :)

The patch is coming up in a bit with some refactoring of NSS.jsm.
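A minimal sketch of that control flow (the mock PR_Send/PR_GetError pair stands in for the real NSPR calls reached via js-ctypes; function names and the mock's internals are illustrative only):

```javascript
const PR_WOULD_BLOCK_ERROR = -5998; // NSPR's would-block error code

// Mock of the two NSPR calls: PR_Send fails while the send buffer is
// "full" and PR_GetError reports why the last call failed.
function makeMockSockets(busyCalls) {
  let busy = busyCalls;
  let lastError = 0;
  const delivered = [];
  return {
    delivered,
    PR_Send(data) {
      if (busy > 0) {
        busy--;
        lastError = PR_WOULD_BLOCK_ERROR;
        return -1; // nothing was written
      }
      delivered.push(data);
      return data.length;
    },
    PR_GetError() { return lastError; },
  };
}

// Keep sending the unsent remainder (plus the trailing '\0' message
// terminator); retry on PR_WOULD_BLOCK_ERROR, bail out on anything else.
function sendAll(sockets, message) {
  message += "\0";
  let sent = 0;
  while (sent < message.length) {
    const count = sockets.PR_Send(message.substring(sent));
    if (count < 0) {
      if (sockets.PR_GetError() !== PR_WOULD_BLOCK_ERROR) return false;
      continue; // send buffer full: retry with the same data
    }
    sent += count;
  }
  return true;
}
```

No retry counter is needed: the loop ends either when the whole message is out or when a non-would-block error shows the socket is really gone.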
Attachment #777394 - Attachment is obsolete: true
Attachment #778028 - Flags: review?(ctalbert)
The last version of the patch had a double quote due to Sublime Text's autocomplete feature. This is fixed now.
Attachment #778028 - Attachment is obsolete: true
Attachment #778028 - Flags: review?(ctalbert)
Attachment #778034 - Flags: review?(ctalbert)
Comment on attachment 778034 [details] [diff] [review]
0001-Bug-764640-Sending-data-too-quickly-through-the-JSBr.patch

Review of attachment 778034 [details] [diff] [review]:
-----------------------------------------------------------------

I think this looks good. Tested it again on win 7 64, everything came through with flying colors.

::: jsbridge/jsbridge/extension/resource/modules/Sockets.jsm
@@ +65,5 @@
> +    var buffer;
> +    var count;
> +    var sent = 0;
> +
> +    // We have to send a final '\0' so we need one more char

Good addition.

@@ +73,5 @@
> +                                  0, NSS.Sockets.PR_INTERVAL_MAX);
> +      if (count < 0) {
> +        var error = NSS.Sockets.PR_GetError();
> +        if (error !== NSS.Types.PR_WOULD_BLOCK_ERROR) {
> +          Log.dump("PR_Send", "Failed with error " + error);

Much better than my idea. Thanks.
Attachment #778034 - Flags: review?(ctalbert) → review+
Landed on master:
https://github.com/mozilla/mozmill/commit/42aa3cb84ba201b2fc6368cf193302311e4a78cb
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Whiteboard: [mozmill-2.0+] → [mozmill-2.0+][ateamtrack: p=mozmill q=2013q3 m=1]
Product: Testing → Testing Graveyard