Closed Bug 1635637 Opened 4 years ago Closed 4 years ago

[Urgent] Our Firefox extension, Amazon Assistant (official Amazon.com extension), is broken for customers after the Version 76 release.

Categories

(WebExtensions :: Compatibility, defect, P1)

76 Branch
defect

Tracking

(relnote-firefox 76+, firefox-esr68 unaffected, firefox75 unaffected, firefox76 blocking verified, firefox77+ verified, firefox78+ verified)

VERIFIED FIXED
mozilla78
Tracking Status
relnote-firefox --- 76+
firefox-esr68 --- unaffected
firefox75 --- unaffected
firefox76 blocking verified
firefox77 + verified
firefox78 + verified

People

(Reporter: gaosally, Assigned: zombie)

References

(Regression)

Details

(Keywords: rca-needed, regression)

Attachments

(4 files)

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36

Steps to reproduce:

  1. Install Amazon Assistant for Firefox on Firefox version 76 (reproduced on Windows and Mac machines).
  2. To the right of the URL bar, there should be a teal-colored 'a' icon. Click on the icon.
  3. A panel should open up. The panel will be grayed out and a loading icon appears.

Actual results:

The loading icon continues for a couple of seconds, followed by the message "Sorry about this, I'm having trouble loading". The issue was 100% reproducible on a variety of machines.
We have gotten multiple customer complaints so far after the release today, and given the reproducibility, it seems like it may become a widespread issue as more customers get the latest update.
Our extension makes heavy use of IFrames and IFrame post-messaging for feature components. Nothing in particular looked especially related from the release note summary for version 76, but the issue may be related to some change that can affect IFrame communication. Another possibility is anything that specifically affects browser action APIs, which is how this feature is generated.

Expected results:

The Amazon Assistant Home feed should appear, with deals, customer recommendations, search bar, and other features. See an image of the Home feed attached.
We tried the same operation on Firefox 68 and 75 and the same issue did not occur, so it seems highly likely that it's related to the new version.

Group: firefox-core-security

A couple clarifications:

  1. I was trying to update the privacy settings on this issue and it got set to security-sensitive. I do not necessarily know if the issue is security-sensitive, so this is a false alert.
  2. The "Sorry" message I referenced is part of our extension logic when something goes wrong with loading the Home feed. This is normal fallback behavior, but should happen very rarely. The 100% reproducibility for this error is what's concerning.
Group: firefox-core-security
Component: Untriaged → Compatibility
Product: Firefox → WebExtensions

I can confirm the issue with Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0 ID:20200505213825

Status: UNCONFIRMED → NEW
Ever confirmed: true

Background updates to 76 are disabled while we look into this.

I see some breakage starting after the changes from bug 1316748.

Regressed by: 1316748
Has Regression Range: --- → yes
Flags: needinfo?(tomica)

Hi Sally, the extension has over 1 MB of minified JavaScript. Would it be possible for one of the engineers/developers to help speed up the investigation by finding a minimal STR or root cause?

Or at least provide a non-minified version that we could use for debugging? (I'm requesting access to see if one was submitted to AMO.)

Flags: needinfo?(tomica) → needinfo?(gaosally)

Hello,

I have performed a regression range on the issue and narrowed it down to https://bugzilla.mozilla.org/show_bug.cgi?id=1316748 (2020-05-06T15:15:48: DEBUG : Found commit message: Bug 1316748 - Move Port messaging off MessageChannel r=mixedpuppy) with this differential revision https://phabricator.services.mozilla.com/D67302.

This is the corresponding pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=ee3eb7735df344ffc218933002bd8b0b0a9ae1c1&tochange=bbb78f71a5de9849f6697da61a062f59ec960f97

Also, I have managed to reproduce the issue on the latest Nightly (78.0a1/20200505213825), for comparison purposes.

Given comment 3, this sounds like a blocker to unthrottling the Fx76 release.

Shane or zombie could you prioritize this bug?

Flags: needinfo?(mixedpuppy)

It's pretty clearly a P1 while releases are throttled. Rob and zombie are actively investigating...

Flags: needinfo?(mixedpuppy)
Priority: -- → P1
Assignee: nobody → tomica

This bug is tracked by a release manager but has a low severity, so the severity is being changed to major.
For more information, please visit auto_nag documentation.

Severity: normal → major

This bug is caused by runtime.onConnect unexpectedly triggering in the browser action popup panel. (EDIT: not just browser action popups but any other extension page, such as extension tabs and background pages).
This should not happen: the runtime.onConnect event should not be triggered when the listener is registered in the same location as the runtime.connect call.
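To illustrate the intended rule, here is a minimal in-process sketch. It is a hypothetical model, not Firefox's implementation: `FakeRuntimeHub`, `runtimeFor`, and the `contextId` values are names invented for this example, and real ports are delivered asynchronously and as separate endpoints.

```javascript
// Hypothetical model of port dispatch between extension contexts.
// This is NOT Firefox's code; it only demonstrates the rule described above.
class FakeRuntimeHub {
  constructor() {
    this.listeners = []; // entries of the form { contextId, callback }
  }

  // Returns a runtime-like object bound to one context (popup, background, ...).
  runtimeFor(contextId) {
    const hub = this;
    return {
      onConnect: {
        addListener(callback) {
          hub.listeners.push({ contextId, callback });
        },
      },
      connect({ name }) {
        const port = { name, sender: { contextId } };
        for (const listener of hub.listeners) {
          // Skip listeners that were registered in the sender's own context.
          if (listener.contextId === port.sender.contextId) continue;
          listener.callback(port);
        }
        return port;
      },
    };
  }
}

const hub = new FakeRuntimeHub();
const popup = hub.runtimeFor("popup");
const background = hub.runtimeFor("background");

const received = [];
popup.onConnect.addListener(() => received.push("popup"));
background.onConnect.addListener(() => received.push("background"));

popup.connect({ name: "panel" });
console.log(received); // only the background listener fires
```

In this model, the regression corresponds to the `continue` check being absent, so the popup's own listener would also receive the port.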

We will fix this, but meanwhile extensions can try to work around this by ignoring unexpected onConnect calls.
In the Amazon extension, this could be done by returning early from _panelConnectHandler when _panelPort has already been set (by createConnectionToExtension).

STR:

  1. Load the attached extension at about:debugging
  2. Click on the extension button.
  3. Look at the content of the extension panel.

Expected:

  • "This should be empty:"

Actual:

  • "This should be empty: Unexpected onConnect"
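The attached test extension is not reproduced in this bug, so the following is only a guess at what such a check might look like. `checkSelfConnect`, `makeFakeRuntime`, and `panelText` are hypothetical names; the fake runtime exists only so the sketch can run outside a browser, and it delivers ports synchronously, unlike the real API.

```javascript
// Hypothetical reconstruction of a self-connect check; the real attachment may differ.
// In an extension page, `runtime` would be `browser.runtime`.
function checkSelfConnect(runtime, report) {
  runtime.onConnect.addListener(() => report("Unexpected onConnect"));
  runtime.connect({ name: "self-check" });
}

// Stand-in for browser.runtime so the check can be exercised without a browser.
// `firesInSameContext: true` imitates the regressed behavior.
function makeFakeRuntime({ firesInSameContext }) {
  const listeners = [];
  return {
    onConnect: { addListener: (fn) => listeners.push(fn) },
    connect(info) {
      const port = { name: info.name };
      if (firesInSameContext) listeners.forEach((fn) => fn(port));
      return port;
    },
  };
}

// Builds the text that the panel would display.
function panelText(runtime) {
  let text = "This should be empty:";
  checkSelfConnect(runtime, (msg) => {
    text += " " + msg;
  });
  return text;
}

console.log(panelText(makeFakeRuntime({ firesInSameContext: false })));
console.log(panelText(makeFakeRuntime({ firesInSameContext: true })));
```

The first call models the expected result and the second models the actual result listed above.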
Flags: needinfo?(gaosally)

Sally, I verified that applying the work-around that I suggested in comment 11 unbreaks the Amazon Assistant add-on. Could you submit an update of your add-on with this proposed work-around?

I added the following to the _panelConnectHandler method, right before the received port is assigned to ._panelPort:

    if (this._panelPort) {
        // Work-around for https://bugzilla.mozilla.org/show_bug.cgi?id=1635637
        console.warn("_panelPort already exists, ignoring port");
        return;
    }
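For context, here is a self-contained sketch of how that guard could sit inside a class. Only `_panelConnectHandler`, `_panelPort`, and `createConnectionToExtension` are names taken from this bug; `PanelController`, `ignoredPorts`, and `fakeRuntime` are hypothetical, and the real Amazon Assistant code is certainly structured differently.

```javascript
// Hypothetical sketch; only the three names mentioned in this bug come from the extension.
class PanelController {
  constructor() {
    this._panelPort = null;
    this.ignoredPorts = 0; // hypothetical counter, for illustration only
  }

  // Opens the outgoing connection and remembers its port.
  createConnectionToExtension(runtime) {
    this._panelPort = runtime.connect({ name: "panel" });
  }

  // Handler registered for runtime.onConnect.
  _panelConnectHandler(port) {
    if (this._panelPort) {
      // Work-around: a port already exists, so ignore the unexpected one.
      this.ignoredPorts += 1;
      return;
    }
    this._panelPort = port;
  }
}

// Stand-in for browser.runtime so the sketch runs outside a browser.
const fakeRuntime = {
  connect: (info) => ({ name: info.name, origin: "connect" }),
};

const controller = new PanelController();
controller.createConnectionToExtension(fakeRuntime);
// Simulate the unexpected same-context onConnect event.
controller._panelConnectHandler({ name: "panel", origin: "onConnect" });

console.log(controller._panelPort.origin, controller.ignoredPorts); // connect 1
```

With the guard in place, the port created by the outgoing connection is kept and the unexpected one is dropped.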
Flags: needinfo?(gaosally)

Huge thanks for the thorough investigation. I will test out the proposed fix now and confirm if the change works.

Flags: needinfo?(gaosally)

Comment on attachment 9146237 [details]
Bug 1635637 - Add contextId to MessageSender, prevent onConnect in same Context

Beta/Release Uplift Approval Request

  • User impact if declined: Amazon Assistant and likely other extensions seeing multiple onConnect events, resulting in broken functionality.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: See comment 0, use the specific version 10.2003.31.11559 from https://addons.mozilla.org/en-US/firefox/addon/amazon-browser-bar/versions/ (and/or using the minimal test case from comment 12).
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Small and well understood change, regression test included, and the rest of messaging api is covered by existing tests.
  • String changes made/needed: None
Attachment #9146237 - Flags: approval-mozilla-release?
Attachment #9146237 - Flags: approval-mozilla-beta?
Flags: qe-verify+

Thanks for the prompt fix on your end as well. We have tested the work-around in our codebase, and everything is working correctly after regression testing. We also confirmed against an older version of Firefox that was previously working (74) that the change does not cause any problems there. I am working with the team that typically does these deployments to get this change through our pipelines quickly.

A couple of additional questions as we try to release this to customers as quickly as possible:

  1. Once we upload to the extension store, is there a way we can work with you to expedite the approval process to get the change out to customers faster? After approval, how much longer does it take typically for customers to start picking up the change?
  2. To better understand your process, when would the bug fix on your end be pushed to customers?

> 1. Once we upload to the extension store, is there a way we can work with you to expedite the approval process to get the change out to customers faster? After approval, how much longer does it take typically for customers to start picking up the change?

The add-on review team is on standby to take a look as soon as we receive the submission. If you can limit the changes to this fix only, we would greatly appreciate it and could approve the add-on on very short notice.

Pushed by tjovanovic@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5a06f8fa888e
Add contextId to MessageSender, prevent onConnect in same Context r=robwu
Pushed by btara@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/ec6c8f37b1fa
Add contextId to MessageSender, prevent onConnect in same Context r=robwu a=RyanVM

> 2. To better understand your process, when would the bug fix on your end be pushed to customers?

The fix should be in our next Nightly, in the next 12 hours. If everything goes well with this and another (unrelated) fix, a dot release should be built tomorrow and released 12-24h after that.

Got it. We currently have a build in progress to add the change directly on top of the last published version. Once that is done, we will do a final sanity check and we should be able to upload within the next hour.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla78

Quick update on our end - rebuild process has taken longer than expected. We are aiming for 1 hr from now.

The extension changes have been rebuilt and reviewed. We should be uploading to the store very shortly through our primary account.

Thanks Sally! It's currently the middle of the night for our review team; they'll take a look tomorrow morning CET.

We just saw a message saying "Multiple add-ons violating our policies have been submitted from your location. The IP address has been blocked" before submitting. Would you be able to unblock us?

Can you use a VPN to submit? Looks like you need to submit from a non-AWS IP.

You'll need to try uploading your XPI from a different IP address. There are some IP addresses that have been blocked for spamming (I think related to AWS) and the workaround is to upload from a different location.

If that doesn't work, you can attach the XPI for the new version in this bug and one of the reviewers will upload it for you and review it. Given the time of day, this will probably happen early tomorrow EU time.

We just re-uploaded from a different IP and it looks like it was successful. Really sorry to keep you up; we were not aware of the time zone difference. Is there any way we can bypass the normal review process? It's important to us to get this change in as soon as possible given the widespread customer impact. We've seen ~20k Home open failures today already, and waiting another 12 hours will affect many more customers.

Hi Sally, unfortunately we can't bypass the review process. The review team will take a look at the update as soon as they are able to do so.

Hi Sally, quick update -- one of our reviewers was still awake and able to approve the update. Your users should start getting the updated version soon.

Hi Caitlin, that's awesome news that you were able to review for us despite the late hour! Thank you all for doing a great job identifying the issue, being responsive, and working with us to solve the problem. We will check back here to let you know once we've started to see some metrics shift, but I think we are all set for now. Hope you get some sleep!

Hello,

Verified the fix using the latest Nightly (78.0a1/20200506215114) under Windows 10 Pro 64-bit and macOS Catalina 10.15.

The Amazon Assistant Home feed is now properly displayed in the add-on panel, confirming the fix.
For further details, please see the attached screenshot.

Status: RESOLVED → VERIFIED
Flags: qe-verify+

Comment on attachment 9146237 [details]
Bug 1635637 - Add contextId to MessageSender, prevent onConnect in same Context

Approved for 77 beta 3, thanks.

Attachment #9146237 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Comment on attachment 9146237 [details]
Bug 1635637 - Add contextId to MessageSender, prevent onConnect in same Context

Approved for 76.0.1

Attachment #9146237 - Flags: approval-mozilla-release? → approval-mozilla-release+

Hello,

Please note that the results from Comment 35 were obtained using the latest version (10.2005.6.12051) of the Amazon Assistant add-on.

I’ve also re-checked the fix using the penultimate version (10.2003.31.11559) of the add-on, and the issue is resolved; the bug no longer occurs.

Great to hear that, Alex. One additional question: given the success of the fix on your end in Nightly, how long would you typically expect it to take for close to 100% of Firefox users to get the update? Now that the quick fix in our extension is out, we are working through whether we also want to add this fix to our next normal release (expected to be submitted sometime next week) as a precaution, and I believe that mostly depends on how quickly your change will make it to the majority of users.

Wanted to also add an update that we are starting to see a gradual increase in our success rates on Firefox, indicating the fix is working and propagating to our customers. During the low point yesterday for 5-6 hours we saw a success rate of only 30% (possibly due to some customers still being on the older version of FF) but now we are averaging around 50% for the past few hours. It looks like typically the addon updates go out on a daily cycle so hopefully in the next couple of hours we'll see a continual increase.

Our plan at the moment is to begin gradually rolling out a 76.0.1 point release with this fix included starting tomorrow with an aim to go to 100% rollout by Monday. Note that we are not currently automatically updating users to 76.0 and will skip right to 76.0.1 once automatic updates are re-enabled.

Got it. Is there typically also a propagation delay between the 100% rollout to customers and 100% of customers having the updated version? I.e., when would it be expected that close to 0% of customers have version 76.0 because they are all on 76.0.1+? If there is a good chance that a reasonable number of customers will still not have the update by the end of next week, we'll opt for the safer option of keeping the fix in.

I think it would be wise to assume there will still be a non-zero size population of users on 76.0 who haven't updated to 76.0.1 by the end of next week.

Makes sense, that was my assumption as well. We'll keep the change in for our next update, thanks for the info!

Added to the Firefox 76.0.1 release notes:

Fixed a bug causing some add-ons such as Amazon Assistant to see multiple onConnect events, impairing functionality

Feel free to suggest any changes to the wording.

Hello,

Verified the fix on the latest Beta (77.0b3/20200507233245) under Windows 10 Pro 64-bit and macOS Catalina 10.15 using the penultimate version (10.2003.31.11559) of the add-on.

The Amazon Assistant Home feed is now properly displayed in the add-on panel, confirming the fix.

Hi again,

Also verified the fix on the latest Release (76.0.1/20200507114007) under Windows 10 Pro 64-bit and macOS Catalina 10.15 using the penultimate version (10.2003.31.11559) of the add-on.

The Amazon Assistant Home feed is now properly displayed in the add-on panel, confirming the fix.

We have verified that success rates for Home render metrics have been normal for 1 full day now. We've built the workaround into our next release as well.

Thanks again to your team for the quick turnaround on debugging the issue, responsiveness on updates, and working with us to approve the fix as quickly as possible. From our end, we should be good to go unless anything new comes up.

Tom to handle rca

Flags: needinfo?(tomica)

removing old ni?

Flags: needinfo?(tomica)