[Urgent] Our Firefox extension, Amazon Assistant (official Amazon.com extension), is broken for customers after the Version 76 release.
Categories
(WebExtensions :: Compatibility, defect, P1)
Tracking
Flag | Tracking | Status
---|---|---
relnote-firefox | --- | 76+
firefox-esr68 | --- | unaffected
firefox75 | --- | unaffected
firefox76 | blocking | verified
firefox77 | + | verified
firefox78 | + | verified
People
(Reporter: gaosally, Assigned: zombie)
References
(Regression)
Details
(Keywords: rca-needed, regression)
Attachments
(4 files)
- 235.60 KB, image/png
- 884 bytes, application/zip
- 47 bytes, text/x-phabricator-request (pascalc: approval-mozilla-beta+; pascalc: approval-mozilla-release+)
- 63.71 KB, image/png
User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_14_6) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.129 Safari/537.36
Steps to reproduce:
- Install Amazon Assistant for Firefox on Firefox version 76 (reproduced on Windows and Mac machines).
- To the right of the URL bar, there should be a teal-colored 'a' icon. Click on the icon.
- A panel should open up. The panel will be grayed out and a loading icon appears.
Actual results:
The loading icon continues for a couple of seconds, followed by the message "Sorry about this, I'm having trouble loading". The issue was 100% reproducible on a variety of machines.
We have received multiple customer complaints since today's release, and given the reproducibility, this may become a widespread issue as more customers get the latest update.
Our extension makes heavy use of iframes and iframe post-messaging for feature components. Nothing in the version 76 release notes looked obviously related, but the issue may be caused by a change that affects iframe communication. Another possibility is a change that specifically affects the browser action APIs, which this feature is built on.
Expected results:
The Amazon Assistant Home feed should appear, with deals, customer recommendations, search bar, and other features. See an image of the Home feed attached.
We tried the same operation on Firefox 68 and 75 and the same issue did not occur, so it seems highly likely that it's related to the new version.
A couple of clarifications:
- I was trying to update the privacy settings on this issue, and it ended up marked as security-sensitive. I don't know whether the issue is actually security-sensitive, so this is likely a false alarm.
- The "Sorry" message I referenced is part of our extension logic when something goes wrong with loading the Home feed. This is normal fallback behavior, but should happen very rarely. The 100% reproducibility for this error is what's concerning.
Comment 2•5 years ago
I can confirm the issue with Mozilla/5.0 (X11; Linux x86_64; rv:78.0) Gecko/20100101 Firefox/78.0 ID:20200505213825
Comment 3•5 years ago
Background updates to 76 are disabled while we look into this.
Comment 4•5 years ago
I see some breakage starting after the changes from bug 1316748.
Comment 5•5 years ago (Assignee)
Hi Sally, the extension has 1 MB+ of minified JavaScript. Would it be possible for one of your engineers/developers to help speed up the investigation by finding minimal steps to reproduce (STR) or the root cause?
Or at least provide a non-minified version that we could use for debugging? (I'm requesting access to see if one was submitted to AMO.)
Comment 6•5 years ago
Hello,
I have run a regression-range search on the issue and narrowed it down to https://bugzilla.mozilla.org/show_bug.cgi?id=1316748 (2020-05-06T15:15:48: DEBUG : Found commit message: Bug 1316748 - Move Port messaging off MessageChannel r=mixedpuppy) with this differential revision: https://phabricator.services.mozilla.com/D67302.
This is the corresponding pushlog: https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=ee3eb7735df344ffc218933002bd8b0b0a9ae1c1&tochange=bbb78f71a5de9849f6697da61a062f59ec960f97
Also, I have managed to reproduce the issue on the latest Nightly (78.0a1/20200505213825), for comparison purposes.
Comment 7•5 years ago
Given comment 3, this sounds like a blocker to unthrottling the Fx76 release.
Comment 8•5 years ago
Shane or zombie, could you prioritize this bug?
Comment 9•5 years ago
It's pretty clearly a P1 while releases are throttled. Rob and zombie are actively investigating...
Comment 10•5 years ago
This bug is tracked by a release manager but has a low severity, so changing it to major.
For more information, please visit auto_nag documentation.
Comment 11•5 years ago
This bug is caused by `runtime.onConnect` unexpectedly triggering in the browser action popup panel. (EDIT: not just in browser action popups, but in any other extension page, such as extension tabs and background pages.)
This should not happen: the `runtime.onConnect` event should not be triggered in the same context (extension page) that called `runtime.connect`.
We will fix this, but in the meantime extensions can try to work around it by ignoring unexpected `onConnect` calls.
In the Amazon extension, this could be done by returning early from `_panelConnectHandler` when `_panelPort` has already been set (by `createConnectionToExtension`).
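The expected rule can be illustrated with a small in-memory model. This is a hypothetical sketch, not Firefox's implementation: `ToyRuntime`, `addOnConnectListener`, and the string context ids are invented for illustration. Each extension page is treated as a context with its own `onConnect` listeners, and a `connect()` call notifies listeners in every context except the caller's, which is roughly what the later patch title ("Add contextId to MessageSender, prevent onConnect in same Context") describes.

```javascript
// Toy model (not Firefox code): each extension context (popup, background
// page, extension tab) has its own onConnect listeners.
class ToyRuntime {
  constructor() {
    this.listeners = new Map(); // contextId -> array of onConnect listeners
  }

  // Register an onConnect listener that lives in the given context.
  addOnConnectListener(contextId, listener) {
    if (!this.listeners.has(contextId)) this.listeners.set(contextId, []);
    this.listeners.get(contextId).push(listener);
  }

  // Simulate runtime.connect() called from `senderContextId`.
  // Returns how many listeners were notified.
  connect(senderContextId, name) {
    let notified = 0;
    for (const [contextId, fns] of this.listeners) {
      // Expected rule: never deliver onConnect back to the calling context.
      if (contextId === senderContextId) continue;
      for (const fn of fns) {
        fn({ name, sender: { contextId: senderContextId } });
        notified++;
      }
    }
    return notified;
  }
}
```

Under this rule, a popup that both listens for and opens connections is never notified of its own connection; the regression in Firefox 76 broke exactly that assumption.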
Comment 12•5 years ago
STR:
- Load the attached extension at about:debugging.
- Click on the extension button.
- Look at the content of the extension panel.
Expected:
- "This should be empty:"
Actual:
- "This should be empty: Unexpected onConnect"
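The contents of the attached test extension are not reproduced in this bug, so the following is only an assumed approximation of its popup script. A single page registers a `runtime.onConnect` listener and then calls `runtime.connect()`; on a correct browser the listener never fires, so the page shows only "This should be empty:". `runSelfConnectCheck` is a hypothetical helper, and the runtime is passed in so the logic can be exercised outside the browser.

```javascript
// Hypothetical popup script approximating the STR above.
// `runtime` is browser.runtime in a real extension page; `report` displays text.
function runSelfConnectCheck(runtime, report) {
  let text = "This should be empty:";

  // On a correct browser, this page's own connect() below must NOT reach here.
  runtime.onConnect.addListener(() => {
    text += " Unexpected onConnect";
    report(text);
  });

  // Connect from the same page that registered the listener.
  runtime.connect({ name: "self-check" });
  report(text);
}

// Only runs inside an extension page, where the `browser` global exists.
if (typeof browser !== "undefined") {
  runSelfConnectCheck(browser.runtime, (t) => {
    document.body.textContent = t;
  });
}
```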
Comment 13•5 years ago
Sally, I verified that applying the work-around that I suggested in comment 11 unbreaks the Amazon Assistant add-on. Could you submit an update of your add-on with this proposed work-around?
I added the following to the `_panelConnectHandler` method, right before the received port is assigned (`._panelPort =`):
```javascript
if (this._panelPort) {
  // Work-around for https://bugzilla.mozilla.org/show_bug.cgi?id=1635637
  console.warn("_panelPort already exists, ignoring port");
  return;
}
```
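For context, the guard might sit in a handler like the one below. This is a hypothetical sketch: only `_panelPort`, `_panelConnectHandler`, and `createConnectionToExtension` come from this bug; the `PanelController` class, its constructor, and the `"panel"` port name are assumptions, since the add-on's real code is minified and not shown here. It also assumes, as comment 11 implies, that `createConnectionToExtension` stores its port before the spurious same-page `onConnect` arrives, so that event is ignored instead of replacing the port.

```javascript
// Hypothetical sketch; only _panelPort, _panelConnectHandler and
// createConnectionToExtension are names taken from the bug.
class PanelController {
  constructor(runtime) {
    this._runtime = runtime; // browser.runtime in a real extension page
    this._panelPort = null;
    // Bind so the handler keeps `this` when passed to addListener.
    this._panelConnectHandler = this._panelConnectHandler.bind(this);
    runtime.onConnect.addListener(this._panelConnectHandler);
  }

  // Outgoing connection: this is the port the panel should keep using.
  createConnectionToExtension() {
    this._panelPort = this._runtime.connect({ name: "panel" });
    return this._panelPort;
  }

  // Incoming connection handler with the guard from comment 13.
  _panelConnectHandler(port) {
    if (this._panelPort) {
      // Work-around for https://bugzilla.mozilla.org/show_bug.cgi?id=1635637
      console.warn("_panelPort already exists, ignoring port");
      return;
    }
    this._panelPort = port;
  }
}
```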
Comment 14•5 years ago (Reporter)
Huge thanks for the thorough investigation. I will test out the proposed fix now and confirm if the change works.
Comment 15•5 years ago (Assignee)
Comment 16•5 years ago (Assignee)
Comment 17•5 years ago (Assignee)
Comment on attachment 9146237
Bug 1635637 - Add contextId to MessageSender, prevent onConnect in same Context
Beta/Release Uplift Approval Request
- User impact if declined: Amazon Assistant and likely other extensions seeing multiple onConnect events, resulting in broken functionality.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: Yes
- If yes, steps to reproduce: See comment 0, use the specific version 10.2003.31.11559 from https://addons.mozilla.org/en-US/firefox/addon/amazon-browser-bar/versions/ (and/or using the minimal test case from comment 12).
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Small and well understood change, regression test included, and the rest of messaging api is covered by existing tests.
- String changes made/needed: None
Comment 18•5 years ago (Reporter)
Thanks for the prompt fix on your end as well. We have tested the workaround in our codebase, and everything appears to work correctly after regression testing. We also confirmed against a previously working Firefox version (74) that the change doesn't cause any issues there. I am working with the team that typically handles these deployments to get this change through our pipelines quickly.
A couple of additional questions as we try to release this to customers as quickly as possible:
- Once we upload to the extension store, is there a way we can work with you to expedite the approval process to get the change out to customers faster? After approval, how much longer does it take typically for customers to start picking up the change?
- To better understand your process, when would the bug fix on your end be pushed to customers?
Comment 19•5 years ago
- Once we upload to the extension store, is there a way we can work with you to expedite the approval process to get the change out to customers faster? After approval, how much longer does it take typically for customers to start picking up the change?
The add-on review team is on standby to take a look as soon as we receive the submission. If you can limit the changes to this fix only, we would greatly appreciate it and could approve the add-on on very short notice.
Comment 20•5 years ago
Comment 21•5 years ago
Comment 22•5 years ago (Assignee)
- To better understand your process, when would the bug fix on your end be pushed to customers?
The fix should be in our next Nightly, within the next 12 hours. If everything goes well with this and another (unrelated) fix, a dot release should be built tomorrow and released 12-24h after that.
Comment 23•5 years ago (Reporter)
Got it. We currently have a build in progress to add the change directly on top of the last published version. Once that is done, we will do a final sanity check and we should be able to upload within the next hour.
Comment 24•5 years ago (bugherder)
Comment 25•5 years ago (Reporter)
Quick update on our end - rebuild process has taken longer than expected. We are aiming for 1 hr from now.
Comment 26•5 years ago (Reporter)
The extension changes have been rebuilt and reviewed. We should be uploading to the store very shortly through our primary account.
Comment 27•5 years ago
Thanks Sally! It's currently the middle of the night for our review team; they'll take a look tomorrow morning CET.
Comment 28•5 years ago (Reporter)
We just saw a message saying "Multiple add-ons violating our policies have been submitted from your location. The IP address has been blocked" before submitting. Would you be able to unblock us?
Comment 29•5 years ago
Can you use a VPN to submit? Looks like you need to submit from a non-AWS IP.
Comment 30•5 years ago
You'll need to try uploading your XPI from a different IP address. There are some IP addresses that have been blocked for spamming (I think related to AWS) and the workaround is to upload from a different location.
If that doesn't work, you can attach the XPI for the new version in this bug and one of the reviewers will upload it for you and review it. Given the time of day, this will probably happen early tomorrow EU time.
Comment 31•5 years ago (Reporter)
We just re-uploaded from a different IP, and it looks like it was successful. Really sorry to keep you up; we weren't aware of the time-zone difference. Is there any way we can bypass the normal review process? It's important to us to get this change out as soon as possible given the widespread customer impact. We've seen ~20k Home open failures today already, and waiting another 12 hours will affect many more customers.
Comment 32•5 years ago
Hi Sally, unfortunately we can't bypass the review process. The review team will take a look at the update as soon as they are able to do so.
Comment 33•5 years ago
Hi Sally, quick update -- one of our reviewers was still awake and able to approve the update. Your users should start getting the updated version soon.
Comment 34•5 years ago (Reporter)
Hi Caitlin, that's awesome news that you were able to get this reviewed for us despite the late hour! Thank you for doing a great job identifying the issue, being responsive, and working with us to solve the problem. I'll check back here once we've started to see some shifts in our metrics, but I think we are all set for now. Hope you get some sleep!
Comment 35•5 years ago
Hello,
Verified the fix using the latest Nightly (78.0a1/20200506215114) under Windows 10 Pro 64-bit and macOS Catalina 10.15.
The Amazon Assistant Home feed is now properly displayed in the add-on panel, confirming the fix.
For further details, please see the attached screenshot.
Comment 36•5 years ago
Comment 37•5 years ago
Comment on attachment 9146237
Bug 1635637 - Add contextId to MessageSender, prevent onConnect in same Context
Approved for 77 beta 3, thanks.
Comment 38•5 years ago (bugherder uplift)
Comment 39•5 years ago
Comment on attachment 9146237
Bug 1635637 - Add contextId to MessageSender, prevent onConnect in same Context
Approved for 76.0.1
Comment 40•5 years ago
Hello,
Please note that the results from Comment 35 were obtained using the latest version (10.2005.6.12051) of the Amazon Assistant add-on.
I’ve also re-checked the fix using the penultimate version (10.2003.31.11559) of the add-on, and the issue is resolved; the bug no longer occurs.
Comment 41•5 years ago (bugherder uplift)
Comment 42•5 years ago (Reporter)
Great to hear that, Alex. One additional question: given the success of the fix in Nightly, how long would you typically expect it to take for close to 100% of Firefox users to get the update? Now that the quick fix in our extension is out, we are deciding whether to also include this fix in our next regular release (expected to be submitted sometime next week) as a precaution, and I believe that mostly depends on how quickly your change will reach the majority of users.
I also wanted to add that we are starting to see a gradual increase in our success rates on Firefox, indicating the fix is working and propagating to our customers. At the low point yesterday, for 5-6 hours, we saw a success rate of only 30% (possibly due to some customers still being on the older version of FF), but now we are averaging around 50% for the past few hours. Add-on updates typically seem to go out on a daily cycle, so hopefully we'll see a continued increase over the next couple of hours.
Comment 43•5 years ago
Our plan at the moment is to begin gradually rolling out a 76.0.1 point release with this fix included starting tomorrow with an aim to go to 100% rollout by Monday. Note that we are not currently automatically updating users to 76.0 and will skip right to 76.0.1 once automatic updates are re-enabled.
Comment 44•5 years ago (Reporter)
Got it. Is there typically also a propagation delay between a 100% rollout and 100% of customers actually having the updated version? I.e., when would close to 0% of customers still be on version 76.0 because they have all moved to 76.0.1+? If there is a good chance that a reasonable number of customers will still be on the tail end without the update by the end of next week, we'll opt for the safer option of keeping the fix in.
Comment 45•5 years ago
I think it would be wise to assume there will still be a non-zero size population of users on 76.0 who haven't updated to 76.0.1 by the end of next week.
Comment 46•5 years ago (Reporter)
Makes sense, that was my assumption as well. We'll keep the change in for our next update, thanks for the info!
Comment 47•5 years ago
Added to the Firefox 76.0.1 release notes:
Fixed a bug causing some add-ons such as Amazon Assistant to see multiple onConnect events, impairing functionality
Feel free to suggest any changes to the wording.
Comment 48•5 years ago
Hello,
Verified the fix on the latest Beta (77.0b3/20200507233245) under Windows 10 Pro 64-bit and macOS Catalina 10.15 using the penultimate version (10.2003.31.11559) of the add-on.
The Amazon Assistant Home feed is now properly displayed in the add-on panel, confirming the fix.
Comment 49•5 years ago
Hi again,
Also verified the fix on the latest Release (76.0.1/20200507114007) under Windows 10 Pro 64-bit and macOS Catalina 10.15 using the penultimate version (10.2003.31.11559) of the add-on.
The Amazon Assistant Home feed is now properly displayed in the add-on panel, confirming the fix.
Comment 50•5 years ago (Reporter)
We have verified that success rates for Home render metrics have been normal for 1 full day now. We've built the workaround into our next release as well.
Thanks again to your team for the quick turnaround on debugging the issue, responsiveness on updates, and working with us to approve the fix as quickly as possible. From our end, we should be good to go unless anything new comes up.