Closed Bug 374648 Opened 18 years ago Closed 16 years ago

Provide access to page source via AppleScript

Categories

(Camino Graveyard :: OS Integration, enhancement)

All
macOS
enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: alqahira, Assigned: chris)

References

Details

(Whiteboard: p-safari)

Attachments

(3 files, 2 obsolete files)

Safari does this, and it seems like a basic thing a browser should provide.

tell application "Safari"
	set my_html to source of document 1
end tell

We'd have to either fix bug 314061 or change the syntax, of course.
I can't imagine a convincing argument for not doing this, confirming.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Blocks: 380580
Blocks: 156078
No longer blocks: 380580
Oops, forgot this was already filed.  Here's what I wrote about the issue in a duplicate bug:

It would be great to be able to get the source of a tab via AppleScript.  I
wanted to make this part of my SoC project, but it looks like it's gong to be
too involved and beyond the scope of the project.

I can't find a simple BrowserWrapper call to get the browser's source.  I
haven't looked into the actual Gecko calls, but I get the feeling there's
nothing useful there either.  It looks like the way we view source is by
opening "view-source:<url>" and hopefully hitting the cache.  That's ugly
enough for viewing source; it's uglier for AS, since it means replicating that
"request" and diverting the output to a string we can pass back to the script.
In bug 196704 comment 8, smfr indicates that the mechanism our Services impl uses can get either text or HTML from a page (and either selection or the entire document); any chance we could call that on the whole page and hook that up to AppleScript to get text/source of page?

[3:01pm] peeja: ooh, interesting
[3:02pm] peeja: I definitely don' have time to look now, but that's a good thought
[3:02pm] sauron: sometimes random comments in old bugs produce serendipitous results
[3:02pm] sauron: i'll comment in the relevant bugs, then
[3:02pm] sauron: and keep it in my head to poke you about it periodically ;)
[3:03pm] peeja: cmd_getContents.  well there you go.
[3:05pm] sauron: a way around the black box of view-source
Attached patch Access to text and source, v1.0 (obsolete) — Splinter Review
Adds AppleScript access to both the text and source of both the page and selection. Covers Bug 394582 as well. Adds the |pageTextForSelection: inFormat:| method in CHBrowserView, the name of which I'm not 100% happy with. We want to be able to get the (plaintext or html) of the (page or selection). The AppleScript property descriptions aren't the best, either.
Assignee: nobody → trendyhendy2000
Status: NEW → ASSIGNED
Attachment #367712 - Flags: review?
Attached patch v1.1 (obsolete) — Splinter Review
Chris Lawson suggested making the MIME types constants, so we don't have to remember what exactly to write for the format.
Attachment #367712 - Attachment is obsolete: true
Attachment #367731 - Flags: review?
Attachment #367712 - Flags: review?
I wouldn't call it enough to count as an r+, but I took a look at the .sdef.  That much looks good.  (I can't build source, though, so I haven't tested it.)

hendy: You rock for taking this on.
Comment on attachment 367731 [details] [diff] [review]
v1.1

>+			<property name="source" code="pSrc" description="The tab&apos;s current HTML source" type="text" access="r">
>+				<cocoa key="pageSource"/>
>+			</property>
>+			<property name="text" code="pTxt" description="The tab&apos;s current text" type="text" access="r">
>+				<cocoa key="pageText"/>
>+			</property>

Is there any reason we're using "current text"/"current source" here?  It's not like we can give people the past or future {source|text} of the tab.

By comparison, Safari uses

"The HTML source of the web page currently loaded in the tab." and "The text of the web page currently loaded in the tab. [...]"

which sounds better to my ear, and less likely to imply we can give past/future source for the page.

OmniWeb doesn't have this feature at all!  Can't believe that.

(I'd maybe even suggest we take it out of the description for URL, too, though Safari does have it--or maybe switch to a more OW style, "the URL [of the page] currently being displayed in this tab".)

>+			<property name="selected source" code="pSSr" description="The currently selected text&apos;s HTML source" type="text" access="r">
>+				<cocoa key="selectionSource"/>
>+			</property>

I think maybe I'd prefer "The HTML source of the currently selected text" to parallel the description of the "selected text" property that follows.

>+			<property name="selected text" code="pSTx" description="The currently selected text" type="text" access="r">
>+				<cocoa key="selectionText"/>
>+			</property>

I think cl agreed to take a swing at the actual code, so tossing him in the box.
Attachment #367731 - Flags: review? → review?(cl-bugs-new)
Attached patch Fix v1.2Splinter Review
Tweaked the SDEF descriptions a bit. Still needs a code review.
Attachment #367731 - Attachment is obsolete: true
Attachment #368839 - Flags: review?(cl-bugs-new)
Attachment #367731 - Flags: review?(cl-bugs-new)
Attached file basic AppleScript test
Run this in Script Editor to exercise the new functions added in this patch.
Comment on attachment 368839 [details] [diff] [review]
Fix v1.2

>Index: camino/src/embedding/CHBrowserView.h
>===================================================================

>+// access to page text
>+- (NSString*)pageTextForSelection:(BOOL)selection inFormat:(const char*)format;

I know the other methods here aren't exactly commented in great detail either, but it'd be nice to at least tell people that inFormat is a MIME type specifier.

>Index: camino/src/embedding/CHBrowserView.mm
>===================================================================

> //
> // Returns the currently selected text as a NSString. 

Not your fault, but might as well make this correct and say "an NSString" since we're changing things here anyway.

>+// - pageTextForSelection: inFormat:
>+//
>+// Returns the text of the the currently selected text (if selection is YES), or

"the text of the currently selected text" sounds weird :-p How about "the text of the current selection" instead?

>+// the whole page (if selection is NO). Choose "text/plain" as the format to get
>+// the text as it appears in the browser, or "text/html" to get the source code.

To be clear, this is "the text as it appears in the browser without any formatting attributes", lest anyone think this could be used to get formatted (i.e., fonts/styles) text.

r=me with those nits addressed in some manner. The patch works great with the test in comment 10. In retrospect, I'm not sure why this was so difficult in the first place.
Attachment #368839 - Flags: superreview?(mikepinkerton)
Attachment #368839 - Flags: review?(cl-bugs-new)
Attachment #368839 - Flags: review+
Hardware: PowerPC → All
Attached patch Fixe v1.3Splinter Review
Patch with the updated comments. Code is unchanged.
Comment on attachment 368839 [details] [diff] [review]
Fix v1.2

sr=pink
Attachment #368839 - Flags: superreview?(mikepinkerton) → superreview+
Comment on attachment 369225 [details] [diff] [review]
Fixe v1.3

>Index: camino/src/embedding/CHBrowserView.mm
>===================================================================
>RCS file: /cvsroot/mozilla/camino/src/embedding/CHBrowserView.mm,v
>retrieving revision 1.116
>diff -u -8 -r1.116 CHBrowserView.mm
>--- camino/src/embedding/CHBrowserView.mm	11 Mar 2009 04:01:58 -0000	1.116
>+++ camino/src/embedding/CHBrowserView.mm	25 Mar 2009 03:09:45 -0000

[...]

>+//
>+// - pageTextForSelection: inFormat:
>+//
>+// Returns the text of the the current selection (if selection is YES), or

"Returns the text of the current selection"

Need to get rid of this double-the on checkin; someone please check to make sure I do.
Landed on cvs trunk with the double-the removed; multas gratias, hendy!

I have a pseudo-automated test for this in development; I'll attach it once it's done.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: