Spotify web player is broken in Linux [google too]

RESOLVED FIXED in Firefox 42

Status

defect
RESOLVED FIXED
4 years ago
4 years ago

People

(Reporter: billm, Assigned: mcmanus)

Tracking

({regression})

Trunk
mozilla43

Firefox Tracking Flags

(firefox41 unaffected, firefox42+ fixed, firefox43+ fixed)

Details

Attachments

(4 attachments)

Using Nightly on Linux with e10s enabled. It looks like the Spotify web player no longer works since bug 1191253 landed.

STR:
1. Visit https://play.spotify.com/browse while logged into a Spotify account.
2. (Sometimes not needed) Click around a bit.

Expected: Can play songs and stuff.
Actual: Spotify quickly pops up a spinner dialog that says "Unable to connect to Spotify".

I've had to use Chrome to play music for the last few days. I finally bisected it to bug 1191253 using the try push in that bug.
Flags: needinfo?(daniel)
That's interesting as I run (an updated) Nightly on Linux and I play music using the Spotify web player quite a lot, even right now, without noticing any problems.

Are you getting these problems all the time?

Tell me more about your network situation when these problems trigger. That change made sure the "link monitor" runs, which is used to detect when the network changes and/or goes online/offline.
Assignee: nobody → daniel
Flags: needinfo?(daniel)
I'm in the Mozilla Mountain View office hooked up to ethernet. There isn't any proxy or anything.
Also, yes, I see the problem consistently. It should be pretty easy to debug if you give me some pointers.
Enabling logging for the "nsNotifyAddr" module should give you a good start. In a normal, stable network situation it should generate basically nothing. The key code that is running here is in netwerk/system/linux/nsNotifyAddrListener_Linux.cpp.

It fires up a new thread that reads from a netlink socket, which the kernel feeds with information about IP and routing updates. It then uses that information to decide whether the network has changed and, if it has, it also checks whether any network interfaces are "online".
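
As a rough illustration of the mechanism described above, here is a minimal Python sketch of a netlink listener. It is not the Firefox C++ code. The numeric constants are copied from `<linux/rtnetlink.h>`, while the set of multicast groups subscribed to and the names `parse_messages` and `listen` are assumptions made for this example.

```python
import struct

# Netlink message header: length, type, flags, sequence number, sender port id.
NLMSGHDR = struct.Struct("=IHHII")

# rtnetlink message types, values from <linux/rtnetlink.h>.
RTM_NAMES = {
    16: "RTM_NEWLINK",
    17: "RTM_DELLINK",
    20: "RTM_NEWADDR",
    21: "RTM_DELADDR",
    24: "RTM_NEWROUTE",
    25: "RTM_DELROUTE",
}

# Multicast group bits, values from <linux/rtnetlink.h>.
RTMGRP_LINK = 0x1
RTMGRP_IPV4_IFADDR = 0x10
RTMGRP_IPV4_ROUTE = 0x40
RTMGRP_IPV6_IFADDR = 0x100
RTMGRP_IPV6_ROUTE = 0x400


def parse_messages(data):
    """Return the rtnetlink message type names found in one datagram."""
    names = []
    offset = 0
    while offset + NLMSGHDR.size <= len(data):
        length, msg_type, _flags, _seq, _pid = NLMSGHDR.unpack_from(data, offset)
        if length < NLMSGHDR.size or offset + length > len(data):
            break  # truncated or malformed message
        names.append(RTM_NAMES.get(msg_type, "type %d" % msg_type))
        offset += (length + 3) & ~3  # messages are aligned to 4 bytes
    return names


def listen():
    """Print every link, address and route event (Linux only, runs forever)."""
    import socket

    # Assumption: which groups to join is a choice made for this sketch.
    groups = (RTMGRP_LINK | RTMGRP_IPV4_IFADDR | RTMGRP_IPV4_ROUTE
              | RTMGRP_IPV6_IFADDR | RTMGRP_IPV6_ROUTE)
    with socket.socket(socket.AF_NETLINK, socket.SOCK_RAW,
                       socket.NETLINK_ROUTE) as sock:
        sock.bind((0, groups))
        while True:
            for name in parse_messages(sock.recv(65536)):
                print(name)
```

Calling `listen()` on a Linux machine and then changing the network (for example, unplugging a cable) should print the events the kernel reports. On a stable network it would be expected to stay mostly silent.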
Duplicate of this bug: 1193796
Summary: Spotify web player is broken in Linux → Spotify web player is broken in Linux [google too]
daniel - I suggest you back 1191253 out of both 43 and 42 and work on relanding.
Assignee: daniel → mcmanus
Status: NEW → ASSIGNED
[tracking reason]

from https://bugzilla.mozilla.org/show_bug.cgi?id=1193796#c17 (dup of this)

Still unclear what's going wrong here, but the user impact is that this appears to break connections to (at least) Google properties for some network configurations for Linux users.
Attachment #8647123 - Flags: review+
Keywords: checkin-needed
Comment on attachment 8647123 [details] [diff] [review]
Revert Bug 1191253 - Start the Link Monitor

Approval Request Comment

This bug fix was made on 42 and then merged to aurora, but it's not ready for prime time and needs to be backed out. The m-c backout is just pending an open tree.

[Feature/regressing bug #]:1191253
[User impact if declined]: Linux users have network connectivity problems of undetermined severity
[Describe test coverage new/current, TreeHerder]: none - backout
[Risks and why]: low
[String/UUID change made/needed]:none
Attachment #8647123 - Flags: approval-mozilla-aurora?
Ok, this is a tiny patch that only adds 3 more lines of logging. The log from bug 1193796 (https://bugzilla.mozilla.org/attachment.cgi?id=8647198) showed a range of rapid events but doesn't reveal what those events were. With this patch applied, I think we'll learn more. (I suspect we need some sort of rate limiting or filtering.)
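
To make the rate-limiting idea concrete, here is a small Python sketch that lets at most one "network changed" notification through per time window. It is only one possible approach and not what any Firefox patch does. The class name `ChangeCoalescer`, the method `should_notify` and the one-second default are all hypothetical.

```python
import time


class ChangeCoalescer:
    """Let at most one "network changed" notification through per interval."""

    def __init__(self, min_interval=1.0, clock=time.monotonic):
        self.min_interval = min_interval  # seconds; hypothetical default
        self.clock = clock                # injectable so it can be tested
        self.last_sent = None
        self.suppressed = 0

    def should_notify(self):
        """Return True if this event should be forwarded, False to drop it."""
        now = self.clock()
        if self.last_sent is not None and now - self.last_sent < self.min_interval:
            self.suppressed += 1
            return False
        self.last_sent = now
        return True
```

A listener would call `should_notify()` for each incoming event and only send the notification when it returns True. One caveat of this simple form is that it drops the trailing events of a burst, so a real implementation would probably also want to re-check the network state once the burst has ended.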
An alternative route for figuring this out is to build and run this stand-alone tool alongside Firefox to see what it reports: https://github.com/bagder/afnetlink. This tool listens to the same events Firefox listens to, but could be slightly easier to debug or play around with.
Keywords: checkin-needed
(In reply to Daniel Stenberg [:bagder] from comment #10)
> Created attachment 8647360 [details] [diff] [review]
> 0001-nsNotifyAddr-add-more-logging-to-aid-Linux-debugging.patch
> 
> Ok, this is a tiny patch that only adds 3 more lines of logging. The log
> from bug 1193796 (https://bugzilla.mozilla.org/attachment.cgi?id=8647198)
> showed a range of rapid events but doesn't reveal what those events were.
> With this patch applied, I think we'll learn more. (I suspect we need some
> sort of rate limiting or filtering.)

I don't have my Linux machine with me today; I'll get this logging tomorrow. Needinfoing myself so I don't forget.
Flags: needinfo?(botond)
https://hg.mozilla.org/mozilla-central/rev/3c8f6736b07c
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
Here's the logging with the custom patch applied.
Flags: needinfo?(botond)
Attachment #8648089 - Flags: feedback?(daniel)
Comment on attachment 8648089 [details]
Logging with custom patch applied

Yep, thanks a lot. It shows lots of events coming in and "network change" events being sent. No wonder that causes problems. Now the question is _why_ those events come and what events they are, so that I can filter them appropriately...
Attachment #8648089 - Flags: feedback?(daniel) → feedback+
(In reply to Daniel Stenberg [:bagder] from comment #11)
> An alternative route for figuring this out is to build and run this
> stand-alone tool alongside with Firefox to see what it reports:
> https://github.com/bagder/afnetlink This tool listens to the same events
> Firefox listens to, but could be slightly easier to debug or play around
> with.

I ran this tool, but got no output, even with the latest patch [1] applied.

[1] https://github.com/bagder/afnetlink/commit/7ab43291f48969e1505dc25513c83f7c63011134
I've got more info on the symptoms in bug 1194940 if you need it.
Comment on attachment 8647123 [details] [diff] [review]
Revert Bug 1191253 - Start the Link Monitor

Fix a recent regression, taking it.
Attachment #8647123 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
The latest Nightly update [43.0a1 (2015-08-17)] is working for Google on my Debian system, both with and without e10s enabled.
Tracking for 42+ since it's a recent regression - even though this is already fixed in nightly.
I can confirm GMail now works on Debian 64-bit with Firefox Developer Edition 42.0a2 (2015-08-18) with e10s enabled!
I'm still struggling to understand what the differences are between the Firefox and the afnetlink tool versions since one detects lots of (imaginary?) things and the other one doesn't.

This patch removes the setting of the socket to non-blocking mode to see if that makes a difference. It struck me it really isn't necessary to have it non-blocking anyway and the afnetlink tool runs in blocking mode.

I still haven't been able to trigger this weird behavior on any machine of mine.
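
For readers unfamiliar with the distinction, the following Python sketch shows the difference between a blocking and a non-blocking read on a socket. It uses an ordinary socket pair rather than a netlink socket, and the helper name `try_read` is made up for this example.

```python
import socket


def try_read(sock, blocking):
    """Read once from sock; return None if a non-blocking read finds no data."""
    sock.setblocking(blocking)
    try:
        return sock.recv(4096)
    except BlockingIOError:
        # Non-blocking mode: nothing was waiting, so recv() returned at once.
        return None


if __name__ == "__main__":
    reader, writer = socket.socketpair()
    print(try_read(reader, blocking=False))  # no data has been sent yet
    writer.send(b"event")
    print(try_read(reader, blocking=True))   # waits until data is available
    reader.close()
    writer.close()
```

In blocking mode the reading thread simply sleeps in `recv()` until the kernel has something to deliver, which is why a dedicated listener thread does not strictly need non-blocking mode.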
Attachment #8650442 - Flags: feedback?(botond)
(In reply to Daniel Stenberg [:bagder] from comment #24)
> I still haven't been able to trigger this weird behavior in any machine of
> mine.

3 of us working at Mozilla Toronto experienced this so if setting up remote access would help, just ask :)
(In reply to Andrew Overholt [:overholt] from comment #25)

> 3 of us working at Mozilla Toronto experienced this so if setting up remote
> access would help, just ask :)

Ah yes that'd be great! What do I need to do?
Flags: needinfo?(overholt)
(In reply to Andrew Overholt [:overholt] from comment #25)
> (In reply to Daniel Stenberg [:bagder] from comment #24)
> > I still haven't been able to trigger this weird behavior in any machine of
> > mine.
> 
> 3 of us working at Mozilla Toronto experienced this so if setting up remote
> access would help, just ask :)

I've only seen it on Debian "stretch" - it's fine on Windows 8.1. One of the reports on a related bug report also saw it on Debian. I don't have any other Linux systems at the moment.
(In reply to znmeb from comment #27)
> I've only seen it on Debian "stretch" - it's fine on Windows 8.1. One of the
> reports on a related bug report also saw it on Debian. I don't have any
> other Linux systems at the moment.

I notice you mentioned in bug 1193796 comment 26 that the system you saw it on had Docker installed, and that Docker "does things to networking". The system where I saw it (Debian testing) also had Docker installed. Perhaps the cause could be related to Docker?
I was not using Docker.
(In reply to Botond Ballo [:botond] from comment #28)
> (In reply to znmeb from comment #27)
> > I've only seen it on Debian "stretch" - it's fine on Windows 8.1. One of the
> > reports on a related bug report also saw it on Debian. I don't have any
> > other Linux systems at the moment.
> 
> > I notice you mentioned in bug 1193796 comment 26 that the system you saw it
> on had Docker installed, and that Docker "does things to networking". The
> system where I saw it (Debian testing) also had Docker installed. Perhaps
> the cause could be related to Docker?

I can easily remove Docker and see if the problem goes away. Which versions of Firefox still have this problem?
(In reply to Bill McCloskey (:billm) from comment #29)
> I was not using Docker.

Neither was I nor do I have it installed.
(In reply to znmeb from comment #30)

> I can easily remove Docker and see if the problem goes away. Which versions
> of Firefox still have this problem?

No version still has this bug; the change was backed out from 42 and Nightly. The patch that introduced this problem, and that I still want to land, is in bug 1191253. It is a one-line change that just makes sure that the helper thread actually runs.

Just to be clear: backing out the patch took away the immediate regression but brought back the bug where Firefox on Linux cannot properly detect and act on network changes and online/offline situations.
(In reply to Daniel Stenberg [:bagder] from comment #24)
> This patch removes the setting of the socket to non-blocking mode to see if
> that makes a difference. It struck me it really isn't necessary to have it
> non-blocking anyway and the afnetlink tool runs in blocking mode.

I applied the patch from bug 1191253, the logging patch from comment 10, and this patch. The problem of not being able to access Google properties continued to occur.

The log is showing the same thing as before: lots of "route update" messages arriving.
Comment on attachment 8650442 [details] [diff] [review]
0001-switch-off-non-blocking-socket-use.patch

See comment 33.
Attachment #8650442 - Flags: feedback?(botond)
Duplicate of this bug: 1192532
(In reply to Daniel Stenberg [:bagder] from comment #26)
> (In reply to Andrew Overholt [:overholt] from comment #25)
> 
> > 3 of us working at Mozilla Toronto experienced this so if setting up remote
> > access would help, just ask :)
> 
> Ah yes that'd be great! What do I need to do?

Botond is likely in a better place to hook you up with this.
Flags: needinfo?(overholt) → needinfo?(botond)
Yeah, I can set you up with an account on a machine that repros this problem.

What would you need in that account? Would an m-c checkout and build that repros the problem be sufficient?
Flags: needinfo?(botond) → needinfo?(daniel)
Flags: needinfo?(daniel)