Closed Bug 1311399 Opened 8 years ago Closed 7 years ago

Fetch and save images associated with page metadata to Swift database

Categories

(Firefox for iOS :: Data Storage, defect, P2)

Other
iOS
defect

Tracking

()

RESOLVED FIXED
Iteration:
1.15

People

(Reporter: sleroux, Assigned: fluffyemily)

References

(Blocks 1 open bug)

Details

(Whiteboard: [MobileAS])

Attachments

(1 file)

55 bytes, text/x-github-pull-request
farhan: review+
A large part of the metadata we grab from a page consists of the various images we want to extract. Ideally we should extract the image contents that the web page has already downloaded and pass them through to the native side, so that we avoid downloading each image twice. The last thing we want is to roughly double users' bandwidth usage while browsing.

Once we are able to extract the image, we'll need to store the image and its metadata on disk alongside the textual metadata, where it can be queried.
Moving to new, actual user story for metadata related work.
Blocks: 1314387
No longer blocks: 1311081
If I am going to pick this up, who is the best person to talk to about the current thinking on metadata? The whole metadata effort happened while I was on NMX, so I've missed all of the conversation about what this is.
Flags: needinfo?(sleroux)
Flags: needinfo?(fpatel)
I wrote the current metadata stuff for textual content so I'd be able to help with that. Farhan has been doing some work on the JS parser side of things so he would be able to help there as well.
Flags: needinfo?(sleroux)
The future desktop model is for all images to be stored in a separate ATTACHed database. I think that pattern would be wise to follow here.
So my PR [1] just gets fathom working and stores the metadata in the table that Steph created. We just store the URLs to the images, not the actual images themselves.

The only way to get data out of the WKWebView is through JSON message passing, which would mean we'd have to Base64-encode large images. Because we don't actually load every single image we get from the metadata table, only the ones we need, I think it's okay to just redownload the few images we need.
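For a rough sense of what Base64-encoding images for the JS bridge would cost, here is a minimal Swift sketch. The 300 KB payload is a made-up figure, and `base64Length(forByteCount:)` is a hypothetical helper written for this illustration, not something from the Firefox for iOS codebase.

```swift
import Foundation

/// Length of the padded Base64 string that encodes `byteCount` raw bytes.
/// Every group of up to 3 input bytes becomes 4 output characters.
func base64Length(forByteCount byteCount: Int) -> Int {
    return ((byteCount + 2) / 3) * 4
}

// Hypothetical 300 KB image payload (all zero bytes), used only to show the overhead.
let imageBytes = Data(count: 300 * 1024)
let encoded = imageBytes.base64EncodedString()

// The encoded string is about 4/3 the size of the raw bytes,
// before any JSON quoting or escaping added by the message bridge.
let overhead = Double(encoded.utf8.count) / Double(imageBytes.count)
print("raw: \(imageBytes.count) bytes, Base64: \(encoded.utf8.count) characters, ratio: \(overhead)")
```

So each image sent this way would cross the bridge as a string roughly a third larger than the image itself, which is the cost being weighed against simply redownloading the few images that are needed.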

I don't know exactly what the user story with Sync is, Emily, but that'll probably need to be figured out. Right now I think it's okay to just start using the data to get the highlights/provider name set up the way we want. If we need to wipe the table and switch to a separate DB, it won't matter because it's in beta for now anyway.
[1] https://github.com/mozilla-mobile/firefox-ios/pull/2341
Flags: needinfo?(fpatel)
:farhan

> Because we don't actually load every single image we get from the metadata table, only
> the ones we need, I think it's okay to just redownload the few images we
> need.

How do we know which ones we need? What are we using the metadata for? What is the purpose of storing them? 

> 
> I don't know exactly what the user story with Sync is Emily but that'll
> probably need to be figured out. 

How does this story tie up with Sync at all? Is the expectation that we will eventually be syncing metadata? 

> Right now I think it's okay to just start using the data to get the highlights/provider name set up the way we want.

How do we want the highlights/provider name set up? What is our goal here?

I have _no_ background at all for the metadata stories. I would like to understand what we are trying to achieve here and why.
Flags: needinfo?(fpatel)
:sleroux just jumped on a call and gave me my required context.

* We use fathom injected as JS into the WKWebView to extract metadata based on rules defined by Desktop
* We take the results of the rules and store them in a SQLite table in the main DB called page_metadata and managed by SQLiteMetadata.swift
* We don't currently store images associated with the metadata. It is not yet clear how best to extract image data from within the web page, but this probably involves Base64-encoding the data and passing large strings over the JS bridge
* Images should probably be stored as part of the current image caching strategy; look at Favicons for an example. URLs pointing to each image's location in the cache should be stored in the DB
* We might want to create a separate table mapping image URLs to their associated page metadata
* An upcoming bug will involve retrieving the image data and displaying it as part of Highlights
* We want to think about storing all metadata in a new DB linked to the existing DB using SQLite's ATTACH syntax, rather than just a separate table, in order to match the way desktop stores it and to make any upcoming syncing of metadata easier.
* Don't worry about the Sync bit yet - this is not on the immediate horizon
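As a rough illustration of the ATTACH idea and the image mapping table from the list above, here is a minimal Swift sketch using the SQLite C API (`import SQLite3`, available on Apple platforms). Everything about the schema is an assumption: the trimmed-down `page_metadata` columns, the `images` schema name, the `page_images` table, and the `cache_key` column are hypothetical and do not reflect what SQLiteMetadata.swift actually defines. Both databases are in-memory here; a real app would attach a file path.

```swift
import Foundation
import SQLite3

/// Runs one or more SQL statements; prints the error and returns false on failure.
@discardableResult
func run(_ sql: String, on db: OpaquePointer?) -> Bool {
    var message: UnsafeMutablePointer<CChar>?
    guard sqlite3_exec(db, sql, nil, nil, &message) == SQLITE_OK else {
        if let message = message {
            print("SQL error: \(String(cString: message))")
            sqlite3_free(message)
        }
        return false
    }
    return true
}

/// Builds the two databases and returns the cache key found by a cross-database join.
func attachedImageStoreDemo() -> String? {
    var db: OpaquePointer?
    guard sqlite3_open(":memory:", &db) == SQLITE_OK else { return nil }
    defer { sqlite3_close(db) }

    // Hypothetical stand-in for the existing page_metadata table in the main DB.
    run("CREATE TABLE page_metadata (id INTEGER PRIMARY KEY, site_url TEXT NOT NULL, title TEXT)", on: db)

    // Second database, attached under the schema name "images".
    run("ATTACH DATABASE ':memory:' AS images", on: db)

    // Mapping table: one row per image, pointing back at a page_metadata row.
    // SQLite foreign keys cannot span databases, so the link is by convention only.
    run("""
        CREATE TABLE images.page_images (
            metadata_id INTEGER NOT NULL,
            image_url   TEXT NOT NULL,
            cache_key   TEXT NOT NULL
        )
        """, on: db)

    run("INSERT INTO page_metadata VALUES (1, 'https://example.com/', 'Example')", on: db)
    run("INSERT INTO images.page_images VALUES (1, 'https://example.com/hero.png', 'hero-cache-key')", on: db)

    // A single query can join tables across the main and attached databases.
    let query = """
        SELECT i.cache_key
        FROM page_metadata m
        JOIN images.page_images i ON i.metadata_id = m.id
        WHERE m.site_url = 'https://example.com/'
        """
    var statement: OpaquePointer?
    guard sqlite3_prepare_v2(db, query, -1, &statement, nil) == SQLITE_OK else { return nil }
    defer { sqlite3_finalize(statement) }

    guard sqlite3_step(statement) == SQLITE_ROW,
          let text = sqlite3_column_text(statement, 0) else { return nil }
    return String(cString: text)
}

print(attachedImageStoreDemo() ?? "query failed")
```

Keeping the image rows in an attached database means that file could be wiped or replaced without touching the main DB, at the cost of losing enforced foreign keys between the two.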
Flags: needinfo?(fpatel)
(In reply to Farhan Patel from comment #5)
> We just store the URLs to the images not the actual
> images themselves. 

Bear in mind that some sites use data: URIs for images, so these URLs might be quite large, and might not actually lead anywhere.
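One possible guard, sketched in Swift under assumptions: filter the extracted image URLs before storing them, keeping only reasonably short http(s) URLs. The function name `storableImageURL` and the 2048-character cap are hypothetical choices for this example, not existing project code or an agreed limit.

```swift
import Foundation

/// Hypothetical cap on stored URL length, chosen only for illustration.
let maxStoredURLLength = 2048

/// Returns the trimmed URL string if it looks like a fetchable http(s) image URL
/// of reasonable length, or nil for data: URIs, other schemes, and oversized values.
func storableImageURL(_ raw: String) -> String? {
    let trimmed = raw.trimmingCharacters(in: .whitespacesAndNewlines)

    // Check the length first so a multi-megabyte data: URI is rejected cheaply.
    guard trimmed.utf8.count <= maxStoredURLLength else { return nil }

    // URL schemes are case-insensitive, so compare against a lowercased copy.
    let lower = trimmed.lowercased()
    guard lower.hasPrefix("http://") || lower.hasPrefix("https://") else { return nil }

    return trimmed
}

// A data: URI embeds the image bytes and does not lead anywhere that can be refetched.
print(storableImageURL("data:image/png;base64,iVBORw0KGgo=") ?? "skipped")
print(storableImageURL("https://example.com/hero.png") ?? "skipped")
```

Whether data: URIs should be skipped entirely or decoded and cached directly is a separate decision; this sketch only keeps them out of the URL column.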
Assignee: nobody → etoop
Status: NEW → ASSIGNED
Iteration: --- → 1.15
Attached file Pull request
Attachment #8837638 - Flags: review?(fpatel)
Attachment #8837638 - Flags: review?(fpatel) → review+
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
