Closed Bug 630696 Opened 13 years ago Closed 13 years ago

Create tests to verify cfx sdocs output

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: wbamberg, Assigned: wbamberg)

Details

Attachments

(1 file)

Test links in documentation 13 years ago Will Bamberg [:wbamberg] 22.20 KB, patch	warner : review+	Details \| Diff \| Splinter Review

Will Bamberg [:wbamberg]

Assignee

Description

•

13 years ago

As a response to bug 629922, we should have tests for the output of cfx sdocs. This could run cfx sdocs, and for all links in the output verify that there is a page on the end of the link.

Brian Warner [:warner :bwarner]

Comment 1

•

13 years ago

I did a quick search for python web-spidering tools, and asked Ian Bicking,
and here were a few contenders:

  - twill: http://twill.idyll.org/
  - http://pypi.python.org/pypi/linkchecker
  - http://arthurdejong.org/webcheck/
  - http://pypi.python.org/pypi/spider.py/0.5
  - http://pypi.python.org/pypi/spydey/0.3
    - uses httplib2.googlecode.com

Ian points out that the 'lxml' module is probably the best
HTML-parser/link-extractor tool around, and a number of those tools use it.
Most of those tools are aimed at the command line, and/or have some
inconvenient dependencies which we wouldn't want to add to the SDK, but we
might be able to extract some ideas or techniques from them.

Running 'spydey' against the 'cfx docs' server made me realize that our docs
isn't actually very static: I don't think the HTML of the front page has
links to any of the other pages (it's all being driven by scripts). So we
need some way to enumerate the links that need to be crawled. Maybe we should
add a secondary page (/all-links.html?) that contains links to enough pages
to bootstrap the crawl.

Will Bamberg [:wbamberg]

Assignee

Comment 2

•

13 years ago

Thanks Brian. I will have a look at these.

> Running 'spydey' against the 'cfx docs' server made me realize that our docs
> isn't actually very static: I don't think the HTML of the front page has
> links to any of the other pages (it's all being driven by scripts). So we
> need some way to enumerate the links that need to be crawled. Maybe we should
> add a secondary page (/all-links.html?) that contains links to enough pages
> to bootstrap the crawl.

Well, after bug 560008 lands they will be much, much more static, so this ought to get easier.

Brian Warner [:warner :bwarner]

Comment 3

•

13 years ago

Oh, also BeautifulSoup is good, http://www.crummy.com/software/BeautifulSoup , for parsing HTML to find the links.

Brian Warner [:warner :bwarner]

Comment 4

•

13 years ago

also our own Ian Bicking wrote http://blog.ianbicking.org/2010/04/02/webtest-http-testing/ , which can run in the same process as the WSGI app it's testing. I'm not sure if it has any scan-for-links features, though, but maybe we could glue it to a parser for that.

Myk Melez [:myk] [@mykmelez]

Updated

•

13 years ago

Priority: -- → P3

Hardware: x86 → All

Target Milestone: --- → 1.0

Myk Melez [:myk] [@mykmelez]

Comment 5

•

13 years ago

(automatic reprioritization of 1.0 bugs)

Priority: P3 → P2

Target Milestone: 1.0 → 1.1

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 6

•

13 years ago

(Pushing all open bugs to the --- milestone for the new triage system)

Target Milestone: 1.1 → ---

Will Bamberg [:wbamberg]

Assignee

Comment 7

•

13 years ago

Attached patch Test links in documentation — Details — Splinter Review

Here's a patch that generates the doc set, and looks through all the HTML files looking at the href attribute in each <a> element, and validates it by:
- checking it isn't a file:// URL
- if it's a relative URL, adding the baseurl to it and checking there's a file at the end of the resulting absolute URL

This test found a few bugs, which are also fixed in the patch.

First, just running HTMLParser on the code found a few problems. The only interesting one is the presence of "<" and ">" in the "author" key in package.json files, which we're now escaping using cgi.escape. I'm not sure whether we should use that for all the input files, but I was a bit worried about possible side-effects of that, so I'm only doing it in one place. Really, I think we should remove the "author" field from the package pages, as it's not really maintained, and we have a "Credits" page which works better.

Second, there are a few broken links. The most interesting are the e10s guide and API reference documents, which I decided to remove on the understanding that they are no longer at all accurate.

Finally, although this test runs fine on Windows, I am seeing (only on Windows) the intermittent "directory is not empty" error from shutil.rmtree, which is tracked in bug 659478. I don't think it occurs any more often with this patch. I also don't know what's causing that error. My best guess is that a file handle is sometimes not getting closed in time, but I did try going through generate_static_docs and explicitly closing files, but that did not make it go away. Maybe I missed one. I'll have another look tomorrow.

Attachment #575105 - Flags: review?(warner-bugzilla)

Brian Warner [:warner :bwarner]

Comment 8

•

13 years ago

Comment on attachment 575105 [details] [diff] [review]
Test links in documentation

looks great! The only suggestion I could think of in get_base_url() would be to go the other way: strip any leading "/", then always add one back in:

  base_url_path = get_base_url_path().lstrip("/")
  return "file://"+"/"+"/".join(base_url_path.split(os.sep))+"/"

but I'm not sure that's a whole lot better.

You might also consider building some of these strings fewer times: get_base_url() is going to be called for every href in the whole tree, and will always return the same value. (you might call it once in Generate_Docs_Tests and then pass the value into your Link_Checker() constructor).

It might also be time to rename these test methods: they're more than a mere smoketest now. Maybe just test_generate_static_docs and test_generate_docs ?

But it looks perfectly fine as-is.. r+!

Attachment #575105 - Flags: review?(warner-bugzilla) → review+

[github robot]

Comment 9

•

13 years ago

Commit pushed to https://github.com/mozilla/addon-sdk

https://github.com/mozilla/addon-sdk/commit/0e7415e739c1c06e38eb98937d4374892624595d
Bug 630696 - Create tests to verify cfx sdocs output; r=@warner

Will Bamberg [:wbamberg]

Assignee

Updated

•

13 years ago

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

[github robot]

Comment 10

•

13 years ago

Commit pushed to https://github.com/mozilla/addon-sdk

https://github.com/mozilla/addon-sdk/commit/0e7415e739c1c06e38eb98937d4374892624595d
Bug 630696 - Create tests to verify cfx sdocs output; r=@warner

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

Create tests to verify cfx sdocs output

Categories

(Add-on SDK Graveyard :: Documentation, defect, P2)

Tracking

(Not tracked)

People

(Reporter: wbamberg, Assigned: wbamberg)

References

Details

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Comment 1

Comment 2

Comment 3

Comment 4

Updated

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Updated

Comment 10

Attachment

General

Description

File Name

Content Type