Closed Bug 1995618 Opened 5 months ago Closed 1 month ago

Pass the links present on the page as a list in the PageExtractor

Categories

(Core :: Machine Learning: On Device, enhancement, P2)

Tracking

RESOLVED FIXED
149 Branch
Tracking Status
firefox149 --- fixed

People

(Reporter: gregtatum, Assigned: thasan)

References

(Blocks 1 open bug)

Details

(Whiteboard: [genai])

Attachments

(2 files)

The page's text content is extracted, but the links are not captured explicitly. We should capture them, perhaps as optional behavior. The output format still needs to be specified, but rendering the links inline as markdown would probably make sense.

Here is an example test:

Added to:
toolkit/components/pageextractor/tests/browser/browser_dom_extractor.js

add_task(async function test_dom_extractor_links() {
  // Fixture with one relative link and one absolute link.
  const { actor, cleanup } = await html`
    <article>
      <h1>Example of Links</h1>
      <ul>
        <li>Here is the <a href="./example-1.html">First link</a></li>
        <li>
          Now this is an <a href="https://example.com/link">external link</a>
        </li>
      </ul>
    </article>
  `;

  const { text, links } = await actor.getText();

  // Links appear inline as markdown, with the relative href resolved
  // against the page URL.
  is(
    text,
    "Example of Links\n" +
      "Here is the [First link](https://localhost:7372/example-1.html)\n" +
      "Now this is an [external link](https://example.com/link)",
  );
  // The hrefs are also returned as a separate list, as written in the markup.
  Assert.deepEqual(
    links,
    ["./example-1.html", "https://example.com/link"]
  );

  return cleanup();
});
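
For illustration only, the output shape that the test above expects could be sketched roughly as follows. This is not the PageExtractor implementation: `renderLinesWithLinks`, the segment objects, and the base URL are hypothetical names chosen for the sketch, and plain objects stand in for DOM nodes so the snippet runs outside the browser. It assumes relative hrefs are resolved only for the inline markdown, while the separate list keeps them as authored, which is what the assertions above suggest.

```javascript
// Hypothetical sketch, not the actual PageExtractor code.
// Each line is an array of segments: a plain string, or a
// { text, href } object standing in for an <a> element.
function renderLinesWithLinks(lines, baseURL) {
  const links = [];
  const text = lines
    .map(segments =>
      segments
        .map(segment => {
          if (typeof segment === "string") {
            return segment;
          }
          // Assumption: the list keeps the href as written in the markup.
          links.push(segment.href);
          // Assumption: the inline markdown uses the resolved, absolute URL.
          const resolved = new URL(segment.href, baseURL).href;
          return `[${segment.text}](${resolved})`;
        })
        .join("")
    )
    .join("\n");
  return { text, links };
}

const result = renderLinesWithLinks(
  [
    ["Example of Links"],
    ["Here is the ", { text: "First link", href: "./example-1.html" }],
    [
      "Now this is an ",
      { text: "external link", href: "https://example.com/link" },
    ],
  ],
  "https://localhost:7372/"
);
```

Returning the text and the link list together would let a caller use the markdown as-is or read the links without re-parsing the text.
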
Priority: -- → P3
Component: Machine Learning: General → Machine Learning: On Device

The component has been changed since the backlog priority was decided, so we're resetting it.
For more information, please visit BugBot documentation.

Priority: P3 → --
Whiteboard: [genai]
Priority: -- → P2
Assignee: nobody → thasan
Attachment #9532488 - Attachment description: WIP: Bug 1995618 - Pass page links as list to PageExtractor → Bug 1995618 - Pass page links as list to PageExtractor
Status: NEW → ASSIGNED
Pushed by imoraru@mozilla.com:
https://github.com/mozilla-firefox/firefox/commit/efbcc321b8d1
https://hg.mozilla.org/integration/autoland/rev/891741892645
Revert "Bug 1995618 - Pass page links as list to PageExtractor r=ai-ondevice-reviewers,gregtatum" for causing bc failures on browser_dom_extractor.js.

Revert for causing bc failures on browser_dom_extractor.js and browser_get_page_content.js.

Flags: needinfo?(thasan)
Flags: needinfo?(thasan)
Attachment #9537959 - Attachment description: WIP: Bug 1995618 - Update GetPageContent to use new links format → Bug 1995618 - Update GetPageContent to use new links format r=gregtatum
Attachment #9532488 - Attachment description: Bug 1995618 - Pass page links as list to PageExtractor → Bug 1995618 - Pass page links as list to PageExtractor r=gregtatum
Status: ASSIGNED → RESOLVED
Closed: 1 month ago
Resolution: --- → FIXED
Target Milestone: --- → 149 Branch
QA Whiteboard: [qa-triage-done-c150/b149]