Closed Bug 1995618 Opened 5 months ago Closed 1 month ago

Pass the links present on the page as a list in the PageExtractor

Status:

RESOLVED FIXED

Milestone:

149 Branch

Tracking Flags:

Tracking

Status

firefox149

---

fixed

People

(Reporter: gregtatum, Assigned: thasan)

References

(Blocks 1 open bug)

Details

(Whiteboard: [genai])

Attachments

(2 files)

Bug 1995618 - Pass page links as list to PageExtractor r=gregtatum (3 months ago, Taimur, 48 bytes, text/x-phabricator-request)
Bug 1995618 - Update GetPageContent to use new links format r=gregtatum (2 months ago, Taimur, 48 bytes, text/x-phabricator-request)

Greg Tatum [:gregtatum]

Reporter

Description

•

5 months ago

The text content is extracted from the page, but we don't grab the links explicitly. We should do this, perhaps as optional behavior. We'll need to specify the format, but writing the links as markdown would probably make sense.

Here is an example test:

Added to:
toolkit/components/pageextractor/tests/browser/browser_dom_extractor.js

add_task(async function test_dom_extractor_links() {
  const { actor, cleanup } = await html`
    <article>
      <h1>Example of Links</h1>
      <ul>
        <li>Here is the <a href="./example-1.html">First link</a></li>
        <li>
          Now this is an <a href="https://example.com/link">external link</a>
        </li>
      </ul>
    </article>
  `;

  const { text, links } = await actor.getText();

  is(
    text,
    "Example of Links\n" +
      "Here is the [First link](https://localhost:7372/example-1.html)\n" +
      "Now this is an [external link](https://example.com/link)",
  );
  Assert.deepEqual(
    links,
    ["./example-1.html", "https://example.com/link"]
  );

  return cleanup();
});
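For illustration only, here is a minimal sketch of how link records could be turned into both outputs the test above checks: markdown link text and a flat list of links. `formatLinks` and its `{ href, text }` input shape are hypothetical names made up for this sketch, not the PageExtractor API, and it works on plain objects rather than DOM nodes so it can run outside the browser. It also assumes every href is resolved against the page URL in both outputs; the test above lists the raw `./example-1.html` in `links`, so the format that landed may differ.

```javascript
// Hypothetical helper, not the PageExtractor implementation.
// Input: plain { href, text } records and the URL of the page they came from.
function formatLinks(anchors, baseURL) {
  const links = [];
  const markdown = [];

  for (const { href, text } of anchors) {
    let resolved;
    try {
      // Resolve relative hrefs (e.g. "./example-1.html") against the page URL.
      resolved = new URL(href, baseURL).href;
    } catch {
      // Skip hrefs that cannot be parsed as a URL.
      continue;
    }
    links.push(resolved);
    markdown.push(`[${text.trim()}](${resolved})`);
  }

  return { links, markdown };
}

const result = formatLinks(
  [
    { href: "./example-1.html", text: "First link" },
    { href: "https://example.com/link", text: "external link" },
  ],
  "https://localhost:7372/"
);
```

Resolving with the standard `URL` constructor keeps the markdown text and the list consistent with each other; if the raw hrefs are wanted in the list instead, push `href` rather than `resolved`.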

Greg Tatum [:gregtatum]

Reporter

Updated

•

5 months ago

Priority: -- → P3

Ed Lee :Mardak

Updated

•

4 months ago

Component: Machine Learning: General → Machine Learning: On Device

BugBot [:suhaib / :marco/ :calixte]

Comment 1

•

4 months ago

The component has been changed since the backlog priority was decided, so we're resetting it.
For more information, please visit BugBot documentation.

Priority: P3 → --

jgauf

Updated

•

3 months ago

Whiteboard: [genai]

Jira Integration Bot

Updated

•

3 months ago

See Also: → https://mozilla-hub.atlassian.net/browse/GENAI-2524

Greg Tatum [:gregtatum]

Reporter

Updated

•

3 months ago

Priority: -- → P2

Taimur

Assignee

Comment 2

•

3 months ago

Attached file: Bug 1995618 - Pass page links as list to PageExtractor r=gregtatum

Phabricator Automation

Updated

•

3 months ago

Assignee: nobody → thasan

Attachment #9532488 - Attachment description: WIP: Bug 1995618 - Pass page links as list to PageExtractor → Bug 1995618 - Pass page links as list to PageExtractor

Status: NEW → ASSIGNED

Pulsebot

Comment 3

•

2 months ago

Pushed by gtatum@mozilla.com:
https://github.com/mozilla-firefox/firefox/commit/5433b8bb8f2d
https://hg.mozilla.org/integration/autoland/rev/f145e47d11fe
Pass page links as list to PageExtractor r=ai-ondevice-reviewers,gregtatum

Pulsebot

Comment 4

•

2 months ago

Pushed by imoraru@mozilla.com:
https://github.com/mozilla-firefox/firefox/commit/efbcc321b8d1
https://hg.mozilla.org/integration/autoland/rev/891741892645
Revert "Bug 1995618 - Pass page links as list to PageExtractor r=ai-ondevice-reviewers,gregtatum" for causing bc failures on browser_dom_extractor.js.

Iulian Moraru

Comment 5

•

2 months ago

Revert for causing bc failures on browser_dom_extractor.js and browser_get_page_content.js.

Flags: needinfo?(thasan)

Taimur

Assignee

Comment 6

•

2 months ago

Attached file: Bug 1995618 - Update GetPageContent to use new links format r=gregtatum

Taimur

Assignee

Updated

•

2 months ago

Flags: needinfo?(thasan)

Phabricator Automation

Updated

•

1 month ago

Attachment #9537959 - Attachment description: WIP: Bug 1995618 - Update GetPageContent to use new links format → Bug 1995618 - Update GetPageContent to use new links format r=gregtatum

Phabricator Automation

Updated

•

1 month ago

Attachment #9532488 - Attachment description: Bug 1995618 - Pass page links as list to PageExtractor → Bug 1995618 - Pass page links as list to PageExtractor r=gregtatum

Pulsebot

Comment 7

•

1 month ago

Pushed by thasan@mozilla.com:
https://github.com/mozilla-firefox/firefox/commit/6686036b198b
https://hg.mozilla.org/integration/autoland/rev/0a795970d1b3
Pass page links as list to PageExtractor r=ai-ondevice-reviewers,gregtatum
https://github.com/mozilla-firefox/firefox/commit/98c32ffaf198
https://hg.mozilla.org/integration/autoland/rev/e4d57db08628
Update GetPageContent to use new links format r=gregtatum,ai-ondevice-reviewers

Atila Butkovits

Comment 8

•

1 month ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/0a795970d1b3
https://hg.mozilla.org/mozilla-central/rev/e4d57db08628

Status: ASSIGNED → RESOLVED

Closed: 1 month ago

status-firefox149: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 149 Branch

Camelia Badau [:cbadau], Desktop Test Engineering

Updated

•

24 days ago

QA Whiteboard: [qa-triage-done-c150/b149]
