Closed Bug 1470083 Opened 6 years ago Closed 3 years ago

Save website complete does not rewrite srcset to local file

Categories

(Core :: DOM: Serializers, defect, P3)

61 Branch
defect

Tracking

()

RESOLVED DUPLICATE of bug 1668414

People

(Reporter: nachtigall, Unassigned)

Details

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:62.0) Gecko/20100101 Firefox/62.0 Build ID: 20180621013659 Steps to reproduce: Save a website that contains `srcset` attribute on `<img>` elements (and not only `src`) For example: 1. go to https://www.flyingroasters.de/de/ 2. Ctrl+S to save website completely Actual results: 3. Load this locally saved html file in Firefox 4. Watch the Network tab and see that image files that are referenced in the `srcset` still got downloaded from the remote website flyingroasters.de Looking at the source it seems that the `src` attributes are rewritten/changed by Firefox to point to the local files. But the `srcset` still point the the remote website. Expected results: The `srcset` should be rewritten/changed by Firefox in the saved HTML. Same like already done for `src`
Has STR: --- → yes
Component: Untriaged → Serializers
Product: Firefox → Core
Priority: -- → P3

Hello.

It's been a very (very) long time since I did any form of code contribution to Mozilla, so this is going to take a bit of getting back into!

This seems simple enough but the syntax for the attribute isn't a simple URL but a URL, descriptor pair. Fixing it up would therefore be slightly more involved.

Requesting some information from the module owner, since I cannot find a peer assigned to this module on the wiki.

The only place I've found for parsing of this attribute is in ResponsiveImageSelector::SetCandidatesFromSourceSet. I'm not sure it'd be worth splitting the parsing logic out and reusing it here, especially since we don't care about the logic beyond extracting and fixing up the attribute string. I'm looking for some feedback on that before I go ahead and seeing if I can fix this.

Also, (somewhat related) are there any save document tests?

Flags: needinfo?(htsai)
Flags: needinfo?(htsai) → needinfo?(hsivonen)

nsHTMLContentSerializer::SerializeHTMLAttributes needs to be taught about srcset.

Flags: needinfo?(hsivonen)

Also, (somewhat related) are there any save document tests?(In reply to Henri Sivonen (:hsivonen) from comment #2)

nsHTMLContentSerializer::SerializeHTMLAttributes needs to be taught about srcset.

Thanks for the confirmation, that is what I suspected would need to be done based on what I've done in this area before. Good to know that not everything has changed! That would be simple enough.

I do have two other questions:

  1. There is parsing logic for this in ResponsiveImageSelector::SetCandidatesFromSourceSet. Since we would not need any of the logic here (other than splitting the URL from the descriptor) it seems to make sense to just perform a basic 'parse' within nsHTMLContentSerializer instead of splitting that code out and reusing it. Simpler seems to be the better option here. What is your take on this?
  2. Are there any save as file tests that need to have a test case added for this? There weren't when last I worked on this code, but that was over a decade ago. Tempus fugit.

Thanks for responding.

Flags: needinfo?(hsivonen)
  1. I think code duplication is OK in this case.
  2. See https://searchfox.org/mozilla-central/source/dom/base/test/unit/test_serializers_entities.js for an example of a serializer test.
Flags: needinfo?(hsivonen)
Assignee: nobody → sciguyryan
Status: NEW → ASSIGNED

I have a few questions, if I can pick your brain. Henri indicated that you might know since he's going to be away.

If you don't know but can point me in the direction of anyone that might, that would be great too.!

From what I can see there actually seem to be two issues at play here:

  1. Firefox does not (as far as I can see) download all of the assets for images referenced by srcset. It seems to only download the "selected" image that is currently displayed on the page. If this is intended behaviour then that's fine but then updating the attribute is pointless. I am unsure as to where to amend the code in order to allow it to download all of the assets. I've tried following the chain of execution but didn't succeed in finding where that happens. If anyone knows where this code sits I'd love a pointer. That should probably be fixed in a separate bug anyway as it is related to the issue here, but can be resolved separately I suspect.

  2. The second is that by the time nsHTMLContentSerializer::SerializeHTMLAttributes is called the process of adjusting the URLs seems to have already taken place. In my testcase example the src attribute of "img.png" is already converted into "Testcase_files\img.png" by the time . As with the above I've failed to determine where specifically this occurs.

I need a newer computer, building Firefox takes way too long. 😂

Flags: needinfo?(mbrodesser)

(In reply to Ryan Jones-Ward [:sciguyryan] from comment #5)

I have a few questions, if I can pick your brain. Henri indicated that you might know since he's going to be away.

If you don't know but can point me in the direction of anyone that might, that would be great too.!

From what I can see there actually seem to be two issues at play here:

  1. Firefox does not (as far as I can see) download all of the assets for images referenced by srcset. It seems to only download the "selected" image that is currently displayed on the page. If this is intended behaviour then that's fine but then updating the attribute is pointless.

Chrome doesn't download all the assets. I'm not sure what the majority of users wants in this case. Thinking globally, in areas with restricted bandwidth or data volume, this might be undesirable. I wonder how often users download a website and want to view it on another machine.
Jens: do you have an actual use case for this?

I am unsure as to where to amend the code in order to allow it to download all of the assets. I've tried following the chain of execution but didn't succeed in finding where that happens. If anyone knows where this code sits I'd love a pointer. That should probably be fixed in a separate bug anyway as it is related to the issue here, but can be resolved separately I suspect.

  1. The second is that by the time nsHTMLContentSerializer::SerializeHTMLAttributes is called the process of adjusting the URLs seems to have already taken place. In my testcase example the src attribute of "img.png" is already converted into "Testcase_files\img.png" by the time . As with the above I've failed to determine where specifically this occurs.

The fixup code could be relevant.

I need a newer computer, building Firefox takes way too long. 😂

Flags: needinfo?(mbrodesser) → needinfo?(nachtigall)

Jens: do you have an actual use case for this?

I cannot really remember. 99,99% use case is probably saving a web page (which might go offline / vanish, so you still have it later locally) and then opening it on the same computer, also using the same viewport size. So only downloading the assets that are needed for the current viewport is probably 99% enough.

(otoh, saving bandwidth is not really that important to me. I would consider accuracy higher, that is even if I change the viewport, that is, resize window, flip portrait/landside or attach to a bigger remote monitor - a saved website should still be a local only version. If you decide to "download" a website you taking an action where it is okay to take a little bandwith, and srcset images are not that big too)

Flags: needinfo?(nachtigall)

The bug assignee didn't login in Bugzilla in the last 7 months.
:hsinyi, could you have a look please?
For more information, please visit auto_nag documentation.

Assignee: sciguyryan → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(htsai)

I fixed this in bug 1668414.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → DUPLICATE
Flags: needinfo?(htsai)
You need to log in before you can comment on or make changes to this bug.