Closed Bug 73992 (dublinCore) Opened 24 years ago Closed 16 years ago

Page Info dialog should support Dublin Core metadata

Categories

(SeaMonkey :: Page Info, enhancement)

Type: enhancement
Priority: Not set
Severity: normal

Tracking

(Not tracked)

Status: RESOLVED WORKSFORME
Target Milestone: Future

People

(Reporter: karl, Assigned: db48x)

Details

(Keywords: helpwanted)

The View Page Info dialog should support Dublin Core metadata <URL: http://dublincore.org/documents/1999/07/02/dces/ >. Dublin Core defines 15 elements (these are not elements in the SGML/XML sense):

Title: A name given to the resource.
Creator: An entity primarily responsible for making the content of the resource.
Subject: The topic of the content of the resource.
Description: An account of the content of the resource.
Publisher: An entity responsible for making the resource available.
Contributor: An entity responsible for making contributions to the content of the resource.
Date: A date associated with an event in the life cycle of the resource.
Type: The nature or genre of the content of the resource.
Format: The physical or digital manifestation of the resource.
Identifier: An unambiguous reference to the resource within a given context.
Source: A reference to a resource from which the present resource is derived.
Language: A language of the intellectual content of the resource.
Relation: A reference to a related resource.
Coverage: The extent or scope of the content of the resource.
Rights: Information about rights held in and over the resource.

This metadata is included in HTML as defined in RFC 2731 <URL: http://www.ietf.org/rfc/rfc2731.txt >, like this:

<meta name = "DC.Creator" content = "Engels, F.">
<meta name = "DC.Title" content = "Capital">
<link rel = "schema.DC" href = "http://purl.org/DC/elements/1.1/">

Dublin Core also has a set of qualifiers <URL: http://dublincore.org/documents/dcmes-qualifiers/ >, which "narrow" the meaning of different elements. Example:

<meta name = "DC.Date.Created" content = "1998-05-14">
<meta name = "DC.Date.Available" content = "1998-05-21">
<meta name = "DC.Date.Valid" content = "1998-05-28">

More examples can be found in <URL: http://dublincore.org/documents/2000/07/16/usageguide/qualified-html.shtml > (not normative). Note that an element can be repeated several times (e.g., when there are several authors).
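As a rough illustration of the RFC 2731 syntax and the repetition rule above, here is a minimal Python sketch (standard library only) that splits each `<meta name>` into element and optional qualifier and keeps every value, so repeated elements such as several creators are not lost. The names `DCMetaCollector` and `collect_dc_meta` are hypothetical. It assumes the prefix is literally `DC` (checking the `schema.DC` link is left out) and that element and qualifier names can be compared case-insensitively; both are simplifications, not part of the request.

```python
from collections import defaultdict
from html.parser import HTMLParser


class DCMetaCollector(HTMLParser):
    """Collects <meta name="DC.Element[.Qualifier]" content="..."> values."""

    def __init__(self):
        super().__init__()
        # (element, qualifier) -> list of values; a list keeps repeated
        # elements such as several creators
        self.values = defaultdict(list)

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = attributes.get("name")
        content = attributes.get("content")
        if not name or content is None:
            return
        parts = name.split(".")
        # Assumption: the prefix is literally "DC"; checking the
        # <link rel="schema.DC"> declaration is left out of this sketch
        if len(parts) < 2 or parts[0].upper() != "DC":
            return
        # Assumption: element and qualifier names compared case-insensitively
        element = parts[1].lower()
        qualifier = parts[2].lower() if len(parts) > 2 else None
        self.values[(element, qualifier)].append(content)


def collect_dc_meta(html):
    """Return {(element, qualifier): [values...]} for a document's DC meta tags."""
    parser = DCMetaCollector()
    parser.feed(html)
    parser.close()
    return dict(parser.values)
```

For the qualifier example above, the `("date", "created")` key would hold `["1998-05-14"]`, and an unqualified `DC.Creator` lands under `("creator", None)`.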
All elements should be displayed in the UI, possibly using several tabs/categories, and all of them should be supported in the Page Info dialog. Here's a *complete* "walk-through" of what we can expect to find in HTML documents, including all qualifiers (we should support these, and no others) and schemes:

TITLE:

<meta name = "DC.Title" content = "Hamlet in Iceland; being the Icelandic romantic Ambales saga">
<meta name = "DC.Title.Alternative" content = "Ambales saga">
<meta name = "DC.Title" lang = "nn" content = "Hamlet på Island&#xa0;&ndash; Ambales saga">

Note the 'Alternative' qualifier. This should be marked as such in the UI. The language of the first two titles is defined by the document, i.e. by one of:

<html xml:lang="en"> (or <head ...> or another parent)
<html lang="en">
HTTP header 'Content-Language'
<meta http-equiv="Content-Language" content="en">

xml:lang overrides lang, which overrides the HTTP header, which overrides meta http-equiv. (This is the normal way of getting the language of an element/attribute -- inheritance. I don't know if this information is available in Mozilla, but it *should* be, as CSS 2 requires it.) Language can be defined explicitly on each 'meta' element, or implicitly, by inheritance from the parent or the HTTP header.

CREATOR:

<meta name = "DC.Creator" content = "Hufthammer, Karl Ove">

The creator name is usually written in the form 'Last Name, First Name', but not always, e.g.:

<meta name = "DC.Creator" content = "Mao Tse Tung">

Names should *always* be displayed as 'First Name Last Name', e.g. 'Hufthammer, Karl Ove' should be displayed as 'Karl Ove Hufthammer'.

SUBJECT:

<meta name = "DC.Subject" content = "heart attack">
<meta name = "DC.Subject" scheme = "MeSH" content = "Myocardial Infarction; Pericardial Effusion">
<meta name = "DC.Subject" content = "Vietnam War">
<meta name = "DC.Subject" scheme = "LCSH" content = "Vietnamese Conflict, 1961-1975">
<meta name = "DC.Subject" content = "Friendship">

Note the 'scheme' attribute.
The 'scheme' attribute can take one of the values below. When presented in the UI, the name of the scheme should also be shown, expanded as follows (not including the text in []):

LCSH: Library of Congress Subject Headings
MeSH: Medical Subject Headings [See <URL: http://www.nlm.nih.gov/mesh/meshhome.html >]
DDC: Dewey Decimal Classification [See <URL: http://www.oclc.org/dewey/index.htm >]
LCC: Library of Congress Classification [See <URL: http://lcweb.loc.gov/catdir/cpso/lcco/lcco.html >]
UDC: Universal Decimal Classification

DESCRIPTION:

<meta name = "DC.Description" content = "A tutorial and reference manual for Java.">
<meta name = "DC.Description.TableofContents" lang = "en" content = "The Author gives some Account of Himself and Family -- His First Inducements to Travel -- He is Shipwrecked, and Swims for his Life -- Gets safe on Shore in the Country of Lilliput -- Is made a Prisoner, and carried up the Country">
<meta name = "DC.Description.Abstract" content = "The kinematics of the jaws and hyolingual apparatus in Caiman crocodilus were examined by cineradiography and electromyography. After catching, caimans position their prey between the teeth by a series of inertial bites and then kill and crush it by a forceful bite.">

Note the 'TableofContents' and 'Abstract' qualifiers. These should be marked as such in the UI.

PUBLISHER:

<meta name = "DC.Publisher" content = "O'Reilly">
<meta name = "DC.Publisher" content = "Digital Equipment Corporation">

This is pretty straightforward. A document can have more than one publisher.

CONTRIBUTOR:

<meta name = "DC.Contributor" content = "Curie, Marie">

Again, pretty straightforward.
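The two display rules so far, showing 'Last, First' creator names as 'First Last' and expanding subject scheme codes, could look roughly like this Python sketch. The function names are hypothetical, and the name inversion is only a heuristic: it splits at the first comma, so values such as 'Acme, Inc.' or 'King, Martin Luther, Jr.' would be mishandled. Unknown schemes are shown as written rather than guessed.

```python
SUBJECT_SCHEMES = {
    "LCSH": "Library of Congress Subject Headings",
    "MeSH": "Medical Subject Headings",
    "DDC": "Dewey Decimal Classification",
    "LCC": "Library of Congress Classification",
    "UDC": "Universal Decimal Classification",
}


def display_creator(name):
    """Show 'Last, First' as 'First Last'; names without a comma are kept as written."""
    name = name.strip()
    if "," not in name:
        # e.g. "Mao Tse Tung" is already in display order
        return name
    # Heuristic: split at the first comma only
    last, _, first = name.partition(",")
    first, last = first.strip(), last.strip()
    return f"{first} {last}" if first else last


def scheme_label(scheme):
    """Expand a subject scheme code (case-insensitive); unknown schemes are unchanged."""
    if not scheme:
        return None
    for code, label in SUBJECT_SCHEMES.items():
        if code.lower() == scheme.lower():
            return label
    return scheme


def display_subject(value, scheme=None):
    """One UI line: the subject, followed by the expanded scheme name if there is one."""
    label = scheme_label(scheme)
    return f"{value} ({label})" if label else value
```

With these, `display_creator("Hufthammer, Karl Ove")` gives "Karl Ove Hufthammer", while "Mao Tse Tung" passes through unchanged.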
DATE:

<meta name = "DC.Date" scheme = "W3CDTF" content = "1998-05-14">
<meta name = "DC.Date.Created" scheme = "W3CDTF" content = "1998-05-14">
<meta name = "DC.Date.Available" content = "1998-05-21">
<meta name = "DC.Date.Valid" scheme = "W3CDTF" content = "1998">
<meta name = "DC.Date.Valid" scheme = "W3CDTF" content = "1999-09-25T14:20+10:00/">
<meta name = "DC.Date.Issued" scheme = "W3CDTF" content = "1998-05-29">
<meta name = "DC.Date.Modified" scheme = "W3CDTF" content = "1998-05-29">

Note the qualifiers 'Created', 'Available', 'Valid', 'Issued' and 'Modified'. If the values of 'Created' and 'Modified' aren't available, they can be taken from the HTTP headers. The W3CDTF scheme is basically ISO 8601, and is specified in <URL: http://www.w3.org/TR/NOTE-datetime >. It is the default if no scheme is specified. There is also a scheme="Period" defined in <URL: http://dublincore.org/documents/dcmi-period/ >, though (as far as I can see) it can't be used as an attribute value.

TYPE:

<meta name = "DC.Type" scheme = "DCMIType" content = "Software">
<meta name = "DC.Type" scheme = "DCMIType" content = "Dataset">
<meta name = "DC.Type" scheme = "DCMIType" content = "Event">
<meta name = "DC.Type" scheme = "DCMIType" content = "Service">

The DCMIType scheme is defined in <URL: http://dublincore.org/documents/dcmi-type-vocabulary/ >. There are nine different DCMI types. There could be a button to get a description of a type, or the description could be presented as a tooltip (localizable, of course). For 'Service': "A service is a system that provides one or more functions of value to the end-user. Examples include: a photocopying service, a banking service, an authentication service, interlibrary loans, a Z39.50 or Web server."
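Since W3CDTF is the default for DC.Date when no scheme is given, the UI has to decide whether a value is a W3CDTF date (and how precise it is) or just text. A minimal Python sketch of that check follows; `parse_dc_date` is a hypothetical name. It only checks the shape of the value against the W3CDTF profile (YYYY, YYYY-MM, YYYY-MM-DD, or a date plus hh:mm[:ss[.s]] with a required time zone designator), not whether the month or day is in range, and it does not handle other schemes such as Period. Under this check the value with a trailing '/' in the examples above would fall back to plain text.

```python
import re

# W3CDTF profile of ISO 8601: YYYY, YYYY-MM, YYYY-MM-DD, or a date with
# hh:mm[:ss[.s]] and a required time zone designator (Z or +hh:mm / -hh:mm)
W3CDTF = re.compile(
    r"(?P<year>\d{4})"
    r"(?:-(?P<month>\d{2})"
    r"(?:-(?P<day>\d{2})"
    r"(?:T(?P<hour>\d{2}):(?P<minute>\d{2})"
    r"(?::(?P<second>\d{2})(?:\.\d+)?)?"
    r"(?P<tzd>Z|[+-]\d{2}:\d{2}))?)?)?"
)


def parse_dc_date(value, scheme=None):
    """Return the W3CDTF fields of a DC.Date value, or None to show it as plain text."""
    # No scheme means W3CDTF for DC.Date; other schemes (e.g. Period) are not handled
    if scheme is not None and scheme.upper() != "W3CDTF":
        return None
    match = W3CDTF.fullmatch(value.strip())
    if match is None:
        return None
    # Keep only the parts that were present, so the UI can show the precision
    return {k: v for k, v in match.groupdict().items() if v is not None}
```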
FORMAT:

<meta name = "DC.Format.Medium" scheme = "IMT" content = "text/xml">
<meta name = "DC.Format.Extent" content = "14 minutes">
<meta name = "DC.Format" content = "A text file with mono-spaced tables and diagrams.">
<meta name = "DC.Format" content = "video/mpeg; 14 minutes">

The IMT scheme is defined in <URL: http://www.isi.edu/in-notes/iana/assignments/media-types/media-types >.

IDENTIFIER:

<meta name = "DC.Identifier" scheme = "URI" content = "http://catalog.loc.gov/67-26020">

The URI scheme is defined in <URL: http://www.ietf.org/rfc/rfc2396.txt >. All URIs should be clickable. (An identifier is *not*, and should not be treated as, a URI unless the 'URI' scheme is used, even if it has the form of a valid URI. We should always honor the scheme and never assume a particular scheme if it isn't explicitly given in the 'meta' element; the exception is 'DC.Date', which is W3C Date/Time if no scheme is given.)

SOURCE:

<meta name = "DC.Source" content = "Shakespeare's Romeo and Juliet">
<meta name = "DC.Source" scheme = "URI" content = "http://a.b.org/manon/">

With the 'URI' scheme the value is a URI; the default is plain text.

LANGUAGE:

<meta name = "DC.Language" scheme = "rfc1766" content = "en">
<meta name = "DC.Language" scheme = "ISO639-2" content = "eng">
<meta name = "DC.Language" scheme = "rfc1766" content = "en-US">

ISO639-2: <URL: http://lcweb.loc.gov/standards/iso639-2/langhome.html >.
RFC 1766: <URL: http://www.ietf.org/rfc/rfc1766.txt >.

The name of the language, not its code, should be displayed in the UI. Mozilla already has a built-in list of language code/language name pairs (see 'Preferences' | 'Language'). Also, for backwards compatibility, 'ISO639-1' should be treated as a synonym for 'rfc1766'. (The Nordic Metadata Template uses this.)
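The "always honor the scheme" rule can be pictured as a small dispatch on the scheme attribute. In this Python sketch (hypothetical names throughout), only the 'URI' scheme yields a clickable link, language schemes yield a language name, and everything else, including a value that merely looks like a URI, is plain text. The `LANGUAGE_NAMES` table is a tiny illustrative stand-in, not Mozilla's built-in list; unknown codes are shown as written.

```python
# Hypothetical, tiny code -> name table for illustration only; a real patch
# would reuse the list Mozilla already ships (Preferences | Languages)
LANGUAGE_NAMES = {
    "en": "English",
    "eng": "English",
    "en-us": "English (United States)",
    "nn": "Norwegian Nynorsk",
}

# 'ISO639-1' is treated as a synonym for 'rfc1766', as requested above
SCHEME_ALIASES = {"iso639-1": "rfc1766"}


def render_value(value, scheme=None):
    """Return (kind, text): 'link' for the URI scheme, 'language' for language
    codes, otherwise 'text', even when the value happens to look like a URI."""
    key = (scheme or "").lower()
    key = SCHEME_ALIASES.get(key, key)
    if key == "uri":
        return ("link", value)
    if key in ("rfc1766", "iso639-2"):
        # Unknown codes are shown as written rather than guessed
        return ("language", LANGUAGE_NAMES.get(value.lower(), value))
    return ("text", value)
```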
When no language is specified, it should be taken from the HTTP header, an http-equiv meta element, or lang="xx" or xml:lang="xx" on the 'html' element (it should not be taken from any other element -- only a language specified on the top-level element defines the document language).

RELATION:

<meta name = "DC.Relation.IsVersionOf" scheme = "URI" content = "http://foo.bar.org/draft9.4.4.2">
<meta name = "DC.Relation.HasVersion" scheme = "URI" content = "http://foo.bar.org/draft9.4.4.2">
<meta name = "DC.Relation.IsReplacedBy" scheme = "URI" content = "http://foo.bar.org/draft9.4.4.2">
<meta name = "DC.Relation.Replaces" scheme = "URI" content = "http://foo.bar.org/draft9.4.4.2">
<meta name = "DC.Relation.IsRequiredBy" scheme = "URI" content = "http://foo.bar.org/draft9.4.4.2">
<meta name = "DC.Relation.Requires" content = "LWP::UserAgent; HTML::Parse; URI::URL; Net::DNS; Tk::Pixmap; Tk::Bitmap; Tk::Photo">
<meta name = "DC.Relation.IsPartOf" scheme = "URI" content = "http://foo.bar.org/abc/proceedings/1998/">
<meta name = "DC.Relation.HasPart" scheme = "URI" content = "http://foo.bar.org/abc/proceedings/1998/">
<meta name = "DC.Relation.IsFormatOf" scheme = "URI" content = "http://foo.bar.org/cd145.sgml">
<meta name = "DC.Relation.IsReferencedBy" scheme = "URI" content = "http://foo.bar.org/cd145.sgml">
<meta name = "DC.Relation.References" content = "urn:isbn:1-56592-149-6">
<meta name = "DC.Relation.IsFormatOf" content = "Shakespeare's Romeo and Juliet">
<meta name = "DC.Relation.HasFormat" scheme = "URI" content = "Shakespeare's Romeo and Juliet">

With the 'URI' scheme the value is a URI; the default is plain text. I *think* I remembered all qualifiers. They are described at <URL: http://dublincore.org/documents/dcmes-qualifiers/#relation >.
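The language rules given under TITLE and LANGUAGE (an explicit lang on the 'meta' element wins; otherwise the document language applies, taken from the 'html' element, then the HTTP header, then the http-equiv meta element) can be sketched in Python as below. The function names are hypothetical, and the inputs are assumed to be plain dicts of attributes plus the two header strings. Inheritance from <head> or other intermediate parents is not modeled, and a Content-Language header listing several languages is not split.

```python
def document_language(html_attrs, http_content_language=None, meta_content_language=None):
    """Document language, using only the <html> element, the HTTP header and
    <meta http-equiv="Content-Language">, in that order of precedence:
    xml:lang > lang > HTTP Content-Language > meta http-equiv."""
    for candidate in (
        html_attrs.get("xml:lang"),
        html_attrs.get("lang"),
        http_content_language,
        meta_content_language,
    ):
        if candidate and candidate.strip():
            return candidate.strip()
    return None


def value_language(meta_attrs, doc_language):
    """Language of one DC value: an explicit xml:lang/lang on the <meta>
    element wins; otherwise it is inherited from the document."""
    return meta_attrs.get("xml:lang") or meta_attrs.get("lang") or doc_language
```

For the TITLE examples, the first two titles would get the document language, while the one with lang = "nn" keeps "nn".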
COVERAGE:

<meta name = "DC.Coverage.Temporal" content = "US civil war era; 1861-1865">
<meta name = "DC.Coverage.Temporal" scheme = "W3CDTF" content = "1998">
<meta name = "DC.Coverage.Spatial" content = "Columbus, Ohio, USA; Lat: 39 57 N Long: 082 59 W">
<meta name = "DC.Coverage.Spatial" scheme = "TGN" content = "Columbus (C,V)">

Note to author: this describes the spatial or temporal features of the intellectual content. A document about the Eiffel Tower, written in English by a Norwegian living in Turkey and stored on a server in Brazil, should have a coverage of 'Paris' or 'France' or the equivalent geographical coordinates. Coverage has the qualifiers 'Temporal' and 'Spatial', and there are tons of schemes for them. See <URL: http://dublincore.org/documents/dcmes-qualifiers/#coverage >.

RIGHTS:

<meta name = "DC.Rights" lang = "en" content = "Copyright Acme 1999 - All rights reserved.">
<meta name = "DC.Rights" scheme = "URI" content = "http://foo.bar.org/cgi-bin/terms">

*** IMPORTANT ***

The 'DC' "elements" should *only* be recognized if one of the following lines is included in the HTML document:

<link rel = "schema.DC" href = "http://purl.org/DC/elements/1.1/">
<link rel = "schema.DC" href = "http://purl.org/DC/elements/1.0/">
<link rel = "schema.DC" href = "http://purl.org/metadata/dublin_core_elements">
<link rel = "schema.DC" href = "http://purl.org/DC/elements/1.0/#fragment-identifier">

(You can compare these to 'namespaces' in XML.) 'http://purl.org/metadata/dublin_core_elements' should only be supported for backwards compatibility, and its use is discouraged. *All* URLs can contain fragment identifiers, e.g.:

<link rel = "schema.DC" href = "http://purl.org/DC/elements/1.1/#date">

Here, only the 'date' element should be supported.
I'm not completely sure of this, but I *think* using

<link rel = "schema.TEST" href = "http://purl.org/DC/elements/1.1/">

should enable

<meta name = "TEST.Description" content = "A tutorial and reference manual for Java.">

to work (i.e., the prefix has no meaning in itself, only when connected to a "namespace").
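The schema-link rules above (only recognized schema URLs count, a fragment identifier limits the link to one element, and the prefix is just a name bound to the "namespace") could be sketched in Python like this. `dc_prefixes` and `is_dc_name` are hypothetical names, and the inputs are assumed to be (rel, href) pairs already pulled from the <link> elements. Two further assumptions: URLs are compared case-insensitively, since pages also write "http://purl.org/dc/elements/1.1/", and prefixes are compared case-insensitively too.

```python
from urllib.parse import urldefrag

# Schema URLs from the list above, lower-cased. Assumption: compare them
# case-insensitively, since pages also write "http://purl.org/dc/elements/1.1/"
DC_SCHEMA_URLS = {
    "http://purl.org/dc/elements/1.1/",
    "http://purl.org/dc/elements/1.0/",
    "http://purl.org/metadata/dublin_core_elements",  # backwards compatibility only
}


def dc_prefixes(links):
    """Map each prefix declared by <link rel="schema.X" href="..."> to the DC
    elements it enables: None means all elements, a set means only those named
    by fragment identifiers.  `links` is a list of (rel, href) pairs."""
    prefixes = {}
    for rel, href in links:
        if not rel or not rel.lower().startswith("schema."):
            continue
        base, fragment = urldefrag(href.strip())
        if base.lower() not in DC_SCHEMA_URLS:
            continue
        prefix = rel[len("schema."):].lower()
        current = prefixes.get(prefix, set())
        if not fragment or current is None:
            prefixes[prefix] = None  # the whole element set is enabled
        else:
            prefixes[prefix] = current | {fragment.lower()}
    return prefixes


def is_dc_name(name, prefixes):
    """True if a <meta name> such as 'TEST.Description' is a recognized DC element."""
    prefix, _, rest = name.partition(".")
    if not rest or prefix.lower() not in prefixes:
        return False
    allowed = prefixes[prefix.lower()]
    element = rest.split(".")[0].lower()
    return allowed is None or element in allowed
```

With a `schema.TEST` link to ".../1.0/#date", `TEST.Date.Created` is accepted but `TEST.Description` is not, which matches the fragment rule above.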
Blocks: 68410
Blocks: 52730
Karl: db48x is rewriting Page Info, as far as I know. You may want to contact him. The infobot in #mozillazine has his email address. Gerv
What if someone has:

<link rel = "schema.DC" href = "http://purl.org/DC/elements/1.2/">

[and that URL exists]
> What if someone has:
> <link rel = "schema.DC"
> href = "http://purl.org/DC/elements/1.2/">

Hmm, you have a point. OK, I think we can safely assume that all schemas beginning with "http://purl.org/DC/elements/" are part of the DC. (The DC is pretty stable, and I doubt it will change much.)

> [and that url exists]

404?
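The relaxed rule proposed in this comment (accept any schema URL beginning with "http://purl.org/DC/elements/") amounts to a prefix test. A minimal Python sketch, with hypothetical names, that also keeps the legacy URL from the original list; as before, the case-insensitive comparison is an assumption. The check is purely lexical and never dereferences the URL.

```python
from urllib.parse import urldefrag

DC_ELEMENTS_BASE = "http://purl.org/dc/elements/"
LEGACY_DC_URL = "http://purl.org/metadata/dublin_core_elements"


def is_dc_schema_url(href):
    """Relaxed rule: accept any schema URL under http://purl.org/DC/elements/
    (plus the legacy URL), so an unlisted version such as 1.2 is honored."""
    base, _fragment = urldefrag(href.strip())
    # Assumption: compare case-insensitively ("DC" vs "dc")
    base = base.lower()
    # Purely lexical: the URL is never fetched, so it does not matter
    # whether it currently resolves
    return base.startswith(DC_ELEMENTS_BASE) or base == LEGACY_DC_URL
```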
Well, I picked 1.2 because you didn't list it, but I'm assuming it doesn't exist yet (or you would have). If someone references 1.2 and it returns 404, do we still honor it? [I was more concerned w/ honoring a present file that matched the naming convention but not your list -- thanks for your revised answer, it's much more acceptable]
Keywords: helpwanted
This information will already be picked up by the new Page Info code when it lists the contents of all meta tags. I'm just displaying the contents of the name and content/http-equiv attributes, so it won't pretty-print anything. Will this be enough for the shipping Mozilla? Anything further is easily included as an extension with an overlay. That overlay could modify the contents of the tree I'm showing meta tags in, or add a completely separate tab for displaying the DC metadata. I'd recommend the latter.

db48x

Now if I could just get Mozilla to stop crashing on form submission...
> I'm just displaying the contents of the
> name and content/http-equiv attributes, so it won't
> pretty print anything. Will
> this be enough for the shipping mozilla?

Well, it will be better than nothing, but it won't be enough for marking this bug as 'fixed' (any more than displaying a DOM tree of an HTML document can be seen as *supporting* HTML).
Dublin Core metadata should also be accessible through the W3C's RDF recommendation, like so:

<html><head>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/metadata/dublin_core#">
  <rdf:Description about="http://www.dlib.org">
    <dc:Title>D-Lib Program - Research in Digital Libraries</dc:Title>
    <dc:Description>The D-Lib program supports the community of people with research interests in digital libraries and electronic publishing.</dc:Description>
    <dc:Publisher>Corporation For National Research Initiatives</dc:Publisher>
    <dc:Date>1995-01-07</dc:Date>
    <dc:Subject>
      <rdf:Bag>
        <rdf:li>Research; statistical methods</rdf:li>
        <rdf:li>Education, research, related topics</rdf:li>
        <rdf:li>Library use Studies</rdf:li>
      </rdf:Bag>
    </dc:Subject>
    <dc:Type>World Wide Web Home Page</dc:Type>
    <dc:Format>text/html</dc:Format>
    <dc:Language>en</dc:Language>
  </rdf:Description>
</rdf:RDF>
</head></html>

or the RDF abbreviated syntax:

<html><head>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
  xmlns:dc="http://purl.org/metadata/dublin_core#">
  <rdf:Description about="http://www.dlib.org"
    dc:Title="D-Lib Program - Research in Digital Libraries"
    dc:Description="The D-Lib program supports the community of people with research interests in digital libraries and electronic publishing."
    dc:Publisher="Corporation For National Research Initiatives"
    dc:Date="1995-01-07"/>
</rdf:RDF>
</head></html>

or through links to external RDF files:

<link rel="meta" href="mydocMetadata.DC.RDF">

These examples were taken from the W3C recommendation at http://www.w3.org/TR/REC-rdf-syntax/ -- see that document for more details. I know that Mozilla has support for RDF datasources, but I don't know how different this usage of RDF is from the current implementation. As support for RDF as a vehicle for metadata and the "Semantic Web" grows, Mozilla needs to be able to put it to use.
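To show that both RDF forms above carry the same information, here is a Python sketch using the standard `xml.etree.ElementTree` module that collects DC properties from `rdf:Description` elements, reading child elements (with `rdf:Bag` items flattened into separate values) as well as the attributes of the abbreviated syntax. `dc_from_rdf` is a hypothetical name. It assumes the `rdf:RDF` fragment is available as well-formed XML on its own (for example an external file referenced by `<link rel="meta">`) rather than extracted from the surrounding HTML, and it accepts both the namespace used in these examples and the later `http://purl.org/dc/elements/1.1/`.

```python
import xml.etree.ElementTree as ET

RDF_NS = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
# The first namespace is the one used in the examples above; the second is
# the one current DC documents use
DC_NAMESPACES = {
    "http://purl.org/metadata/dublin_core#",
    "http://purl.org/dc/elements/1.1/",
}


def split_qname(qname):
    """ElementTree writes qualified names as '{namespace}local'."""
    if qname.startswith("{"):
        ns, local = qname[1:].split("}", 1)
        return ns, local
    return None, qname


def dc_from_rdf(xml_text):
    """Collect DC properties from rdf:Description elements, in both the full
    syntax (child elements, rdf:Bag lists) and the abbreviated syntax (attributes)."""
    root = ET.fromstring(xml_text)
    result = {}
    for desc in root.iter(f"{{{RDF_NS}}}Description"):
        # Abbreviated syntax: dc:Title="..." as attributes; the unqualified
        # 'about' attribute has no namespace and is skipped
        for qname, value in desc.attrib.items():
            ns, local = split_qname(qname)
            if ns in DC_NAMESPACES:
                result.setdefault(local, []).append(value)
        # Full syntax: <dc:Title>...</dc:Title>, possibly wrapping an rdf:Bag
        for child in desc:
            ns, local = split_qname(child.tag)
            if ns not in DC_NAMESPACES:
                continue
            items = [li.text.strip() for li in child.iter(f"{{{RDF_NS}}}li") if li.text]
            if items:
                result.setdefault(local, []).extend(items)
            elif child.text and child.text.strip():
                result.setdefault(local, []).append(child.text.strip())
    return result
```

Applied to either example, the result is keyed by the local property names ("Title", "Date", ...), which could then go through the same display code as the `<meta>` values.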
No longer blocks: 52730
Status: NEW → ASSIGNED
Target Milestone: --- → Future
mass moving open bugs pertaining to page info to pmac@netscape.com as qa contact. to find all bugspam pertaining to this, set your search string to "BigBlueDestinyIsHere".
QA Contact: sairuh → pmac
Component: XP Apps: GUI Features → Page Info
Alias: dublinCore
Assignee: bugs → db48x
Status: ASSIGNED → NEW
QA Contact: pmac
Product: Browser → Seamonkey
Bug 268343 is about Live Bookmarks better supporting Dublin Core metadata. Is it related?
The test cases at http://www.codestyle.org/test/DCTestCases.shtml work for me (WFM) with the current Page Info implementation in build Mozilla/5.0 (Windows; U; Windows NT 5.1; rv:1.9.1b4pre) Gecko/20090422 SeaMonkey/2.0b1pre. All the attributes show up in the General->Meta list.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → WORKSFORME