Closed Bug 855325 Opened 11 years ago Closed 11 years ago

Missing nspr4 module causes jsbridge to fail when creating the network socket

Categories

(Testing Graveyard :: Mozmill, defect)

defect
Not set
blocker

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: andrei, Assigned: andrei)

References

Details

(Keywords: regression, Whiteboard: [ateamtrack: p=mozmill q=2013q2 m=1][mozmill-2.0+])

Attachments

(3 files, 4 obsolete files)

Possible regression in FF.
Started failing on the 18th March 2013.

Only affects Mozmill 2.0

Traceback
---------
Traceback (most recent call last):
  File "/Users/andrei.eftimie/work/mozilla/mozmill/src/mozmill/mozmill/mozmill/__init__.py", line 758, in run
    mozmill.run(tests, self.options.restart)
  File "/Users/andrei.eftimie/work/mozilla/mozmill/src/mozmill/mozmill/mozmill/__init__.py", line 395, in run
    frame = self.run_test_file(frame or self.start_runner(),
  File "/Users/andrei.eftimie/work/mozilla/mozmill/src/mozmill/mozmill/mozmill/__init__.py", line 315, in start_runner
    self.create_network()
  File "/Users/andrei.eftimie/work/mozilla/mozmill/src/mozmill/mozmill/mozmill/__init__.py", line 275, in create_network
    self.jsbridge_port)
  File "/Users/andrei.eftimie/work/mozilla/mozmill/src/mozmill/jsbridge/jsbridge/__init__.py", line 44, in wait_and_create_network
    raise Exception("Cannot connect to jsbridge extension, port %s" % port)
Exception: Cannot connect to jsbridge extension, port 62010

Pushlog (using tinderbox builds)
--------------------------------
http://hg.mozilla.org/mozilla-central/pushloghtml?changeset=09f72f45a0b7
Andrei, can you please continue with inbound builds so we get a better time frame? There are way too many changesets. Does it fail on each platform?
Whiteboard: [mozmill-2.0+]
More relevant pushlog (using inbound builds):
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=4d31a3c8b067

Mike,
do you know what changed that could affect the jsbridge connection?
Maybe a pointer where to look to fix this.
Thanks
Flags: needinfo?(mh+mozilla)
Is something involved in mozmill using one of the following libraries from a binary component or ctypes?
- nspr4
- plds4
- plc4
- sqlite3
- nssutil3
- smime3
- ssl3
Flags: needinfo?(mh+mozilla)
Yes, we are using nspr4 to establish the network connection between the Python side and the extension.

https://github.com/mozilla/mozmill/blob/master/jsbridge/jsbridge/extension/resource/modules/NSPR.jsm#L23
(In reply to Henrik Skupin (:whimboo) from comment #4)
> Yes, we are using nspr4 to establish the network connection between the
> Python side and the extension.
> 
> https://github.com/mozilla/mozmill/blob/master/jsbridge/jsbridge/extension/
> resource/modules/NSPR.jsm#L23

That's your problem, then. On all platforms except linux, nspr4 is gone, it's folded into nss3.
Will it stay in nspr4 on linux or will it also be folded into nss3 later? As I read it we will not change it there. So we would have to special-case Linux here.
Blocks: 648407
Severity: normal → blocker
Keywords: regression
(In reply to Henrik Skupin (:whimboo) from comment #6)
> Will it stay in nspr4 on linux or will it also be folded into nss3 later? As
> I read it we will not change it there. So we would have to special-case
> Linux here.

I'd recommend trying nspr4 and if it fails, try nss3. That would work on all platforms, and would work across versions (the same code would run on 21 and 22, for instance)
(In reply to Mike Hommey [:glandium] from comment #7)
> (In reply to Henrik Skupin (:whimboo) from comment #6)
> > Will it stay in nspr4 on linux or will it also be folded into nss3 later? As
> > I read it we will not change it there. So we would have to special-case
> > Linux here.
> 
> I'd recommend trying nspr4 and if it fails, try nss3.

Actually, the opposite might be better. nss3 first, and nspr4 if it fails.
Yes, that sounds fine to me. Thanks. 

Andrei, if you want, you can already start to work on this bug. If we don't have a patch by Tuesday next week I will take over.
Summary: Mozmill fails with "Exception: Cannot connect to jsbridge extension, port 62010" → Missing nspr4 module causes jsbridge to fail when creating the network socket
Assignee: nobody → andrei.eftimie
Whiteboard: [mozmill-2.0+] → [mozmill-2.0+] s=130401 u=failure c=mozmill p=1
Speaking with bsmedberg on IRC, an alternative that would be both future-proof (because we may fold libraries even more and nss3 could end up going away too) and easier on addons would be to expose some constant or function that allows to request library x (nspr4, nss3, smime3, you name it) and get back a ctypes library (as would be returned by ctypes.open) for the right library, whatever it is on the given platform.

There is already something along these lines with OS.Constants.Path.libxul, although it returns a path (but it's only used with ctypes.open).

Yoric, what do you think?
Flags: needinfo?(dteller)
That sounds like a good idea to me.
We could either return a path, as is done now, or a lazily instantiated per-thread library object.
Flags: needinfo?(dteller)
Attached patch Patch v1Splinter Review
Replaced NSPR with NSS.

Disclaimer: what these crypto libraries do is way over my head.
All I did was to use NSS instead of NSPR. Their API's as far as we're concerned seem to be compatible.

In regards to the discussion about Linux, for our use NSS seems to work with Mozmill 2.0 on all linux branches.

The reported failure is gone, tested on OSX, Windows and Linux.
Attachment #731794 - Flags: review?(hskupin)
Attachment #731794 - Flags: review?(dave.hunt)
Comment on attachment 731794 [details] [diff] [review]
Patch v1

Review of attachment 731794 [details] [diff] [review]:
-----------------------------------------------------------------

Comment 8 suggested falling back to nspr4 if nss3 fails, which I don't see in this patch. Also, the latest comments imply that this may change again, so this patch would likely be a temporary fix.
Attachment #731794 - Flags: review?(hskupin)
Attachment #731794 - Flags: review?(dave.hunt)
Attachment #731794 - Flags: review-
Blocks: 803492
I don't see absolutely no reason why this will block bug 803492.
No longer blocks: 803492
Mike I have a followup question for you:

We only use Sockets and Type from NSPR, and changing to NSS seems to work fine across all platforms (linux, osx, win) and branches (from esr17 to 23).

We are thinking on ditching NSPR altogether from Mozmill and only use NSS.
Do you know if we're safe doing this?
Are there any caveats we should be aware of?

Thanks
Flags: needinfo?(mh+mozilla)
It /should/ be fine, but don't quote me on this ;)
If you want to be future proof, you might as well start from OS.Constants.Path.libxul... the only caveat is that it's not supported on esr17. Not sure when it landed either (that is, maybe release and/or beta don't have it), but it's not very hard to fallback to a custom implementation of it...
Flags: needinfo?(mh+mozilla)
Comment on attachment 731794 [details] [diff] [review]
Patch v1

Based on our discussion in yesterdays Ask an Expert and on Mike's comments I'm asking again for a review.

Retested on all platforms.
Works good.
Attachment #731794 - Flags: review?(hskupin)
Attachment #731794 - Flags: review?(dave.hunt)
Comment on attachment 731794 [details] [diff] [review]
Patch v1

Review of attachment 731794 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me then. If modules are changing again in the future we might use a custom name so we don't have to change it each time.
Attachment #731794 - Flags: review?(hskupin)
Attachment #731794 - Flags: review?(dave.hunt)
Attachment #731794 - Flags: review+
https://github.com/mozilla/mozmill/commit/13e8fef8847063f5cf2fa8c08c089871f94e3496
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [mozmill-2.0+] s=130401 u=failure c=mozmill p=1 → [ateamtrack: p=mozmill q=2013q2 m=1][mozmill-2.0+]
As talked about in yesterdays ask an expert meeting, this bug should have been reopened given that it is not fixed. It breaks compatibility with builds on older branches even commented otherwise in comment 15. We should implement a fallback mechanism instead.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch Fallback v1 (obsolete) — Splinter Review
Prior to having NSPR folded into NSS we had both files available.
After the fold NSPR is no more.

Thus: if NSPR exists we use it (older versions) else we use NSS (newer versions).

I thought about renaming the file (and internal used name) once again to a more generic name, but since NSS will arrive on all branches shortly and NSPR is only used as a fallback for older versions, leaving the name to NSS seems appropriate.

Works like a charm on OSX.
I've had problems getting a full testrun on Windows and Linux due to crashes (which are most probably from other sources, but I am not sure) :(

I'm uploading this now, and will try tomorrow again to run more tests.

Windows
=======
Nightly: http://mozmill-crowd.blargon7.com/#/functional/report/c8094a822ef568b588c5eddacae9c9f2


OSX
===
Nightly: http://mozmill-crowd.blargon7.com/#/functional/report/c8094a822ef568b588c5eddacad08f7d
Release: http://mozmill-crowd.blargon7.com/#/functional/report/c8094a822ef568b588c5eddacacee41d
ESR17: http://mozmill-crowd.blargon7.com/#/functional/report/c8094a822ef568b588c5eddacacfde8d
Attachment #740844 - Flags: review?(andreea.matei)
Comment on attachment 740844 [details] [diff] [review]
Fallback v1

Review of attachment 740844 [details] [diff] [review]:
-----------------------------------------------------------------

Looks fine. Please make sure to add a comment which describes why we are using NSS3 as fallback and not the no longer used NSPR.
Comment on attachment 740844 [details] [diff] [review]
Fallback v1

Review of attachment 740844 [details] [diff] [review]:
-----------------------------------------------------------------

Waiting for the requested comment.
Attachment #740844 - Flags: review?(andreea.matei) → review-
Attached patch Fallback v2 (obsolete) — Splinter Review
Added more details into the comment.
Also managed to get reports on Linux and one on windows.
Windows is very flaky atm, on lots of the failures we have it crashes completely. Only managed to get a complete run on Nightly.

Linux
=====
Nightly: http://mozmill-crowd.blargon7.com/#/functional/report/49feb8c5f96a46e9037cffdde11e26ff
Release: http://mozmill-crowd.blargon7.com/#/functional/report/49feb8c5f96a46e9037cffdde1200556

Windows
=======
Nightly: http://mozmill-crowd.blargon7.com/#/functional/report/49feb8c5f96a46e9037cffdde1259022
-- there's a dashboard js error while trying to show the report :(
Attachment #740844 - Attachment is obsolete: true
Attachment #741281 - Flags: review?(hskupin)
Attachment #741281 - Flags: review?(andreea.matei)
(In reply to Andrei Eftimie from comment #24)
> Windows is very flaky atm, on lots of the failures we have it crashes
> completely. Only managed to get a complete run on Nightly.

Tell us those crash reports, so we can have a look at and if it is our fault. I don't think that this patch is ready for review in that state. Please reconsider when to ask for review and when for needinfo.
Comment on attachment 741281 [details] [diff] [review]
Fallback v2

Review of attachment 741281 [details] [diff] [review]:
-----------------------------------------------------------------

Leaving open for Henrik due to landing matters. I assume it will need a pull request on github.
Attachment #741281 - Flags: review?(andreea.matei) → review+
Comment on attachment 741281 [details] [diff] [review]
Fallback v2

Review of attachment 741281 [details] [diff] [review]:
-----------------------------------------------------------------

r- mainly because I'm still missing links to the crash reports. Nothing will be landed here unless we know that such crashes are not the fault of this patch.

::: jsbridge/jsbridge/extension/resource/modules/NSS.jsm
@@ +22,5 @@
>      let file = Services.dirsvc.get("GreD", Ci.nsILocalFile);
> +    file.append(ctypes.libraryName("nspr4"));
> +
> +    // Firefox folded nspr4 into nss3 from 22.0 onwards;
> +    // for older versions we still need to use nspr4

The comment at this position is not clear to me and only starts confusion. So something you might want to use as comment is:

"Even we would have to use nss3 by default, we don't do it because for versions older than 22.0 nss3 already exists but doesn't offer the necessary ctypes methods to us. So we try nspr4 first.
Attachment #741281 - Flags: review?(hskupin) → review-
Attached patch Fallback v3Splinter Review
Updated patch with updated comment.

> r- mainly because I'm still missing links to the crash reports. Nothing will be 
> landed here unless we know that such crashes are not the fault of this patch.

I'm not sure what you mean by "crash reports". Attaching now a screenhsot, and the error message with one crash with our patch applied, and with a mozmill version pre any NSS changes.
Attachment #741281 - Attachment is obsolete: true
Attachment #741739 - Flags: review?(hskupin)
Attached file Crash log (pre NSS changes) (obsolete) —
This is not the only place it crashes (and this is not happening every run).

To get a testrun to finish you need to manually intervene and close a window or a modal dialog in a few places. I am _not_ 100% sure _none_ of the failures are related to this nss/nspr change.

Since not all of them happen every run, and they happen in different places (pre and pot the nss change) I do not have an accurate count. Working on that.
Andrei, not sure what's your definition of a crash is, but that's not a crash at all for Firefox. It's just a hanging modal dialog at the end of a test, which blocks us from restarting the browser correctly. That's exactly the same thing as the current failures for update tests (bug 862748). If that is not fixed in Mozmill 2.0 please file a new bug for it. We should always be able to close the browser even with a modal dialog running.
Attachment #741744 - Attachment is obsolete: true
Attachment #741748 - Attachment is obsolete: true
Attachment #741742 - Attachment description: Crash screenshot → shutdown hang (screenshot)
Comment on attachment 741739 [details] [diff] [review]
Fallback v3

Review of attachment 741739 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good now and pushed to master:
https://github.com/mozilla/mozmill/commit/738246a64a560eddaa5b63e07967bf1f4be1c286
Attachment #741739 - Flags: review?(hskupin) → review+
Status: REOPENED → RESOLVED
Closed: 11 years ago11 years ago
Resolution: --- → FIXED
Product: Testing → Testing Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: