Bug 241438 - please make history.dat easier to parse (i.e., not Mork)
Status: RESOLVED WONTFIX
WFM by Places for Firefox 3.0 (uses S...
Product: Core
Classification: Components
Component: History: Global
Version: Trunk
Platform: All All
Importance: -- normal with 85 votes
Assigned To: Nobody; OK to take it and work on it

Reported: 2004-04-23 03:30 PDT by Jamie Zawinski
Modified: 2012-01-17 09:04 PST
CC List: 60 users


Attachments
Bookmarklet to export history to RDF/XML (1.23 KB, text/plain)
2004-10-17 22:41 PDT, neomjp
Python script to convert mork to valid XML (12.51 KB, text/plain)
2005-02-21 10:28 PST, Mike Hoye
Utility (Win32) to export the HISTORY.DAT file to a tab delimited text file. (225.35 KB, application/octet-stream)
2006-03-17 09:10 PST, Keith Anderson

Description Jamie Zawinski 2004-04-23 03:30:35 PDT
User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040116
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040116

It is impossible for non-Mozilla programs to extract data from
~/.mozilla/*/*.slt/history.dat because it uses Mork, which is --
and I do not use these words lightly -- the single most braindamaged
file format that I have ever seen in my nineteen year career.

Please make history.dat contain something sane, like tab-delimited
fields, or XML.

I have tried to write a parser for Mork in Perl, and it will never
work right.  The depths of depravity to which this format sinks are
too great.  You can see it here: http://www.jwz.org/hacks/mork.pl

Safari does the following.  It is sane and beautiful.

    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE plist PUBLIC "-//Apple Computer//DTD PLIST 1.0//EN"
       "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
    <plist version="1.0">
    <dict>
      <key>WebHistoryDates</key>
      <array>
        <dict>
          <key></key>
          <string>http://www.livejournal.com/users/jwz/312657.html</string>
          <key>lastVisitedDate</key>
          <string>104399495.8</string>
          <key>title</key>
          <string>jwz - when the database worms eat into your brain</string>
          <key>visitCount</key>
          <integer>1</integer>
        </dict>
        <dict>
          <key></key>
          <string>http://www.jwz.org/hacks/mork.pl</string>
          <key>lastVisitedDate</key>
          <string>104399393.0</string>
          <key>title</key>
          <string>mork.pl</string>
          <key>visitCount</key>
          <integer>1</integer>
        </dict>
        ...etc.
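
For comparison, here is a rough sketch of how little code it takes to read a file in that shape with Python's standard plistlib module. The SAMPLE data is a trimmed copy of the excerpt above, read_history is a made-up helper name, and the 2001-01-01 epoch for lastVisitedDate is an assumption, not something stated in this report.

```python
import plistlib
from datetime import datetime, timedelta, timezone

# A tiny, complete plist in the same shape as the excerpt above.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
  <key>WebHistoryDates</key>
  <array>
    <dict>
      <key></key>
      <string>http://www.jwz.org/hacks/mork.pl</string>
      <key>lastVisitedDate</key>
      <string>104399393.0</string>
      <key>title</key>
      <string>mork.pl</string>
      <key>visitCount</key>
      <integer>1</integer>
    </dict>
  </array>
</dict>
</plist>
"""

# Assumption: lastVisitedDate counts seconds from 2001-01-01 00:00 UTC.
EPOCH_2001 = datetime(2001, 1, 1, tzinfo=timezone.utc)

def read_history(plist_bytes):
    """Return (url, title, visit_count, last_visited) tuples."""
    entries = plistlib.loads(plist_bytes)["WebHistoryDates"]
    rows = []
    for entry in entries:
        last = EPOCH_2001 + timedelta(seconds=float(entry["lastVisitedDate"]))
        # The URL is stored under the empty-string key.
        rows.append((entry[""], entry["title"], entry["visitCount"], last))
    return rows

rows = read_history(SAMPLE)
```

Each entry comes back as an ordinary dictionary, so no format-specific parser is needed.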



Reproducible: Always
Steps to Reproduce:
Comment 1 Pete Prodoehl 2004-04-26 13:56:39 PDT
I'm with jwz on this one, mork is just plain goofy. Make it rdf, or xml, or
something trivial to parse like most of the other files Mozilla stores data in.
Comment 2 Jed Brown 2004-04-27 11:30:25 PDT
Agreed, you got my vote
Comment 3 Robert Accettura [:raccettura] 2004-04-27 11:32:34 PDT
Is there even documentation on mork?
Comment 4 Jamie Zawinski 2004-04-27 13:08:02 PDT
> Is there even documentation on mork?

Extremely little, and nearly incomprehensible:

    http://www.mozilla.org/mailnews/arch/mork/primer.txt
    http://www.mozilla.org/mailnews/arch/mork/grammar.txt

As I said above, I tried to reverse-engineer it in Perl; my analysis of it,
from http://www.jwz.org/hacks/mork.pl --

#  In Netscape Navigator 1.0 through 4.0, the history.db file was just a
#  Berkeley DBM file.  You could trivially bind to it from Perl, and
#  pull out the URLs and last-access time.  In Mozilla, this has been
#  replaced with a "Mork" database for which no tools exist.  In brief,
#  let's count its sins:
#
#    - Two different numerical namespaces that overlap.
#
#    - It can't decide what kind of character-quoting syntax to use:
#      Backslash?  Hex encoding with dollar-sign?
#
#    - C++ line comments are allowed sometimes, but sometimes // is just
#      a pair of characters in a URL.
#
#    - It goes to all this serious compression effort (two different 
#      string-interning hash tables) and then writes out Unicode strings
#      without using UTF-8: writes out the unpacked wchar_t characters!
#
#    - Worse, it hex-encodes each wchar_t with a 3-byte encoding,
#      meaning the file size will be 3x or 6x (depending on whether
#      wchar_t is 2 bytes or 4 bytes.)
#
#    - It masquerades as a "textual" file format when in fact it's just
#      another binary-blob file, except that it represents all its magic
#      numbers in ASCII.  It's not human-readable, it's not hand-editable,
#      so the only benefit there is to the fact that it uses short lines
#      and doesn't use binary characters is that it makes the file bigger.
#      Oh wait, my mistake, that isn't actually a benefit at all.
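
To make the two quoting styles concrete, here is a minimal Python sketch of a decoder for a single cell value. It assumes that a dollar sign followed by two hex digits stands for one byte and that a backslash makes the next byte literal; the function name decode_mork_value and the choice of output encoding are illustrative guesses, not taken from the Mork documentation.

```python
import re

# Assumption: '$' plus two hex digits is one byte, and a backslash makes the
# next byte literal. Control-character escapes are not handled in this sketch.
_ESCAPE = re.compile(rb"\\(.)|\$([0-9A-Fa-f]{2})", re.DOTALL)

def decode_mork_value(raw, encoding="latin-1"):
    """Undo both escape styles, then decode the resulting bytes."""
    def unescape(match):
        if match.group(1) is not None:
            return match.group(1)                 # backslash-quoted byte
        return bytes([int(match.group(2), 16)])   # $XX hex-encoded byte

    return _ESCAPE.sub(unescape, raw.encode("latin-1")).decode(encoding)

url = decode_mork_value(r"http:$2F$2Fexample.org$2F\)")
word = decode_mork_value("caf$C3$A9", encoding="utf-8")
```

The caller still has to guess the character encoding of the decoded bytes, which is part of the complaint above.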

For comedic value:

    http://www.livejournal.com/users/jwz/312657.html
Comment 6 Richard Klein 2004-06-18 13:33:21 PDT
Bug 245745 is to replace history mork database with sqlite.
Comment 7 hacksaw 2004-09-23 00:16:42 PDT
I note from
<http://www.treedragon.com/ged/map/ti/newJun02.htm#30jun02-mork-grammar> that
the original author not only hasn't worked on it in a long while, but doesn't
care about it. He also admits that it is undocumented, and that he was never
asked for such.

Is anyone actually maintaining it in a real way, or is it in the "don't touch
it, and it won't explode" category?

I note that someone has suggested that sqlite be used. How does this affect the
on disk format of the file? I was under the impression that the history file
contains the URL, and access count, and the time of last access. 

How about this as a file format?:

<properly escaped URL>,<access count>,<time of last access>


While I'll agree that the XML encoded version is clear, it's also stupidly
verbose. A more complicated database might need such markup for efficiency, but
a three field, nothing null, no relations database? It's like using a 24 foot
diesel truck to pick up a six pack of beer.

Now, I'd be happy to see a well thought out and standard database access method
be used for all the various databases inside this beast. Certainly the
addressbooks could benefit from such a move. And I'd even say that its storage
format should be XML or maybe LDIF.

But leave the simple things simple. CSV is your friend.
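
As a rough illustration of the three-field format proposed above, the following Python sketch writes and reads it with the standard csv module. The helper names, the percent-encoding of unusual characters in URLs, and the integer timestamp are all assumptions made for the example.

```python
import csv
import io
from urllib.parse import quote

def write_history(rows, fp):
    """Write (url, access_count, last_access) rows as one CSV line each."""
    writer = csv.writer(fp, lineterminator="\n")
    for url, count, last_access in rows:
        # Percent-encode anything unusual (commas included) so the URL
        # always fits in a single unquoted field.
        writer.writerow([quote(url, safe=":/?&=#%"), count, last_access])

def read_history(fp):
    """Read the rows back, converting the numeric fields to int."""
    return [(url, int(count), int(last)) for url, count, last in csv.reader(fp)]

buf = io.StringIO()
write_history(
    [
        ("http://www.jwz.org/hacks/mork.pl", 1, 1082716235),
        ("http://example.org/a,b", 3, 1082716300),
    ],
    buf,
)
buf.seek(0)
history = read_history(buf)
```

The same file can also be read with cut, awk, or a spreadsheet, which is the point of the proposal.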
Comment 8 Vladimir Vukicevic [:vlad] [:vladv] 2004-09-23 16:39:23 PDT
History will, along with other things, most likely end up in a SQLite database.
 It won't be parsable "by hand", but through the command line sqlite3 client,
DBD::SQLite or a million other wrappers, it should solve a lot of these
problems.  Stay tuned.  Mork my words, mork-no-more!
Comment 9 Paul Berendsen 2004-09-24 03:54:11 PDT
(In reply to vladimir, comment #8)
>  ....  Mork my words, mork-no-more!

I hope that nobody will answer: MORK´S FOR ME (not).

Comment 10 Jan Becker 2004-10-10 17:10:45 PDT
Yes, please get rid of mork ASAP!!!
It looks like old data is just left there unreferenced, yeah even the cell with
the pointers to the data is left in place and there is just a new cell with the
same index appended to the end... I couldn't believe what I was seeing, and I
still can't believe this "format" made it that far into the project.
I would like to synchronize my address book remotely over a narrow bandwidth
connection, and with mozilla's built-in roaming support mork becomes more absurd
than ever.
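
The behaviour described above amounts to an append-only log in which the last entry for an index wins. A small Python sketch of what a reader then has to do, using invented indexes and values rather than real Mork syntax:

```python
def latest_cells(log):
    """Collapse an append-only list of (index, value) pairs: last write wins."""
    current = {}
    for index, value in log:
        current[index] = value   # a later entry shadows any earlier one
    return current

# Hypothetical log: index "1A" was rewritten once.
log = [("1A", "old@example.org"), ("1B", "Bob"), ("1A", "new@example.org")]
live = latest_cells(log)
stale = len(log) - len(live)     # entries still in the file but superseded
```

Every superseded entry stays in the file, so the whole log has to be transferred even though only the live entries matter.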
Comment 11 neomjp 2004-10-17 22:41:53 PDT
Created attachment 162439 [details]
Bookmarklet to export history to RDF/XML


Here is a simple JavaScript bookmarklet that exports the history data in a
format that is easier to read. Note that you need to run it from a trusted
context to enable the UniversalXPConnect privilege.

Good things about storing history data in RDF/XML,
1. Easier to read.
2. We can possibly do without mork code. (we cannot do this yet because
formhistory.dat also uses mdb:mork.)

Bad things...
1. Exported history.rdf.txt is always larger than history.dat.
2. RDF/XML Serializer is very slow and memory-consuming. It takes more than one
minute if the history is large. (Bug 259119 will be fixed soon)
Comment 12 Héctor Monacci 2005-01-21 23:12:10 PST
By the way, another monstrosity written in Mork is the Mozilla / Thunderbird
address book. If it were *anything* else, we could parse it and e.g. have it print
a booklet through TeX. With Mork we are stuck in the dark, exactly as we are
with Outlook Express!
Comment 13 xxxx yyyy 2005-02-03 06:16:27 PST
(In reply to comment #8)
> History will, along with other things, most likely end up in a SQLite database.
Not Another DB! A more general solution would be something like a JDBC or ODBC
interface, so Moz etc can use the local DBMS for which tools are already
familiar & set up, thereby reducing the already-steep learning curve slope.
Comment 14 Aristotle Pagaltzis 2005-02-03 06:41:54 PST
(In reply to comment #13)
> A more general solution would be something like a JDBC or ODBC interface

Now *that* would be bloat. Not only would it be bloat, it would actually require
an entire *database* *server* just to store a freaking history log.

(SQLite would be fine. If you don't know about it, educate yourself before you
make such comments.)
Comment 15 xxxx yyyy 2005-02-03 07:04:01 PST
(In reply to comment #14)
>> A more general solution would be something like a JDBC or ODBC interface
> Now *that* would be bloat. Not only would it be bloat, it would actually require
> an entire *database* *server* just to store a freaking history log.
> 
> (SQLite would be fine. If you don't know about it, educate yourself before you
> make such comments.)
You missed my point. I said: "***Not Another DB!*** A more general solution
would be something like a JDBC or ODBC ***interface***, so Moz etc can use the
***local*** DBMS".
Please refer to *** highlighting above. I HAVE a DBMS. I don't need or want
another one. I want all my relevant data in one consistent system, not scattered
about in a plethora of weird formats. I would like ALL the Moz data in one
accessible system.
Comment 16 Aristotle Pagaltzis 2005-02-03 07:48:57 PST
(In reply to comment #15)
> I HAVE a DBMS.

That's great. Do the majority of end users have a DB server installed? Do you
expect them to install and manage one?

> I don't need or want another one.

Regard SQLite as a kind of souped up BerkeleyDB that happens to understand SQL.
There's no server. Nor should there be one. Actually forget the fact that you
saw the letters "SQL" anywhere.
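
A minimal Python sketch of the "no server" point, using the standard sqlite3 module: the database is just a file that the process opens directly. The file name, table, and columns here are invented for the example and are not Mozilla's schema.

```python
import os
import sqlite3
import tempfile

# Hypothetical file name; any path works because the database is one file.
path = os.path.join(tempfile.mkdtemp(), "history.sqlite")

conn = sqlite3.connect(path)   # opens or creates the file; no daemon, no port
conn.execute(
    "CREATE TABLE visits (url TEXT PRIMARY KEY, visit_count INTEGER, last_visit INTEGER)"
)
conn.execute(
    "INSERT INTO visits VALUES (?, ?, ?)",
    ("http://www.jwz.org/hacks/mork.pl", 1, 1082716235),
)
conn.commit()
conn.close()

# Any other program linked against SQLite can now read the same file.
conn = sqlite3.connect(path)
rows = conn.execute("SELECT url, visit_count FROM visits").fetchall()
conn.close()
```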

The day Mozilla requires a DB server just to run is the day I switch to another
browser. Thankfully, I believe the people who are actually doing the work (as
opposed to just mouthing off, like, say, you or I) have more clue than that.
Comment 17 xxxx yyyy 2005-02-03 21:13:23 PST
(In reply to comment #16)
> (In reply to comment #15)
> > I HAVE a DBMS.
> That's great. Do the majority of end users have a DB server installed?

Not that I have seen. Although Debian Linux seems to install MySQL by default.

> Regard SQLite as a kind of souped up BerkeleyDB that happens to understand SQL.

And can be accessed via, say, JDBC?

> There's no server. Nor should there be one.
> The day Mozilla requires a DB server just to run is the day I switch to
> another browser.

I agree. Any type of server should not be a requirement to run Moz. My response
was an oversimplification that did not include other options.

I would like the option of saving all Moz data to MY data system where I can
deal with it in a consistent manner, not having to translate from yet another
format. See Mozilla2:Unified Storage in MozillaWiki for another alternative.
Comment 18 Andrew Smith 2005-02-04 00:15:27 PST
> And can be accessed via, say, JDBC ?

Yes, and ODBC, libdbi, ADO.NET (including Mono), Perl-DBI, PHP PEAR DB, Python
DBAPI, Tcl/Tk, and many others too:

http://www.ch-werner.de/javasqlite/
http://www.ch-werner.de/sqliteodbc/
http://www.itwriting.com/sqlitenotes.php
http://www.sqlite.org/cvstrac/wiki?p=SqliteWrappers

I hope the above links help to show that this is a non-issue, and I would kindly
ask you to stop spamming this bug's comments.
Comment 19 Mike Hoye 2005-02-21 10:28:05 PST
Created attachment 175024 [details]
Python script to convert mork to valid XML

This python script (based on a "mork-mindy converter" originally written by a
fellow at Stanford, butchered by mhoye@off.net) converts mork files to
something reasonably XMLish. I hope that it's useful for bridging the gap
between Mork and a sane non-Mork future.
Comment 20 Mike Hoye 2005-02-21 17:33:34 PST
Comment on attachment 175024 [details]
Python script to convert mork to valid XML

#!/usr/bin/env python
#==========================================================================
# Original "Mindy.py" copyright: Kumaran Santhanam 
#				 <kumaran@alumni.stanford.org>
#
# Subsequent butchery, demork.py: Mike Hoye 
#				  <mhoye@off.net>
#
# Just to be crystal clear about this, Santhanam did all the heavy lifting
# (i.e. Mork-scraping) here, but apparently the strain of working on mork
# broke him; the "mindy" output was _another_ bizzare home-rolled database
# format, except this one had lots of "@" symbols in it.
#
# This is a straight-up pattern-recognition hack; I've never worked with
# Python or XML before, but not only does this spit out valid XML, if you
# drink a big glass of water while you're using it you might cure your 
# hiccups.
#
#--------------------------------------------------------------------------
# Project : demork - takes in Mork files, spits out XML
# File	  : demork.py
# Version : 0.3
#--------------------------------------------------------------------------
# This program is free software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# Version 2 (1991) as published by the Free Software Foundation.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.	See the
# GNU General Public License for more details.
#
# For the full text of the GNU General Public License, refer to:
#   http://www.fsf.org/licenses/gpl.txt
#
# For alternative licensing terms, please contact the author.
#--------------------------------------------------------------------------
#
#  This widget has hardcoded XML entity tags in places that make it pretty
#  much specifically meant for the Mozilla history.db file. They're tagged
#  with a "#HDBXML" comment nearby, for easy searching and replacing if
#  you intend to do anything else with this and want semantics that make
#  some kind of sense. 
#
#  Version 0.1 was the first run at this, which was (as advertised)
#  only xml-ish.
#
#  Version 0.3 now includes a DTD in the XML output, and escapes the 
#  ampersands in long URLs, making for valid XML. 
#
#  "If it breaks, feel free to keep both parts." - JWZ
#
#==========================================================================

#==========================================================================
# IMPORTS
#==========================================================================
import sys
import re
import getopt

from sys import stdin, stdout, stderr

#==========================================================================
# GLOBALS
#==========================================================================
VERSION = "0.3"

#==========================================================================
# FUNCTIONS
#==========================================================================
def usage ():
    print
    print "demork - A Mork -> XML converter"
    print
    print "Version %s, (c) Mike Hoye, 2005." % VERSION
    print "Based on a Mork/Mindy converter, (c) 2005 Kumaran Santhanam."
    print
    print "usage: demork MORKFILE"
    print
    print "The converted output is dumped to STDOUT."
    print "A filename of '-' will take input from STDIN."
    print

#==========================================================================
# DATABASE
#==========================================================================
class Database:
    def __init__ (self):
	self.cdict  = { }
	self.adict  = { }
	self.tables = { }

class Table:
    def __init__ (self):
	self.id     = None
	self.scope  = None
	self.kind   = None
	self.rows   = { }

class Row:
    def __init__ (self):
	self.id     = None
	self.scope  = None
	self.cells  = [ ]

class Cell:
    def __init__ (self):
	self.column = None
	self.atom   = None


#==========================================================================
# UTILITIES
#==========================================================================
def invertDict (dict):
    idict = { }
    for key in dict.keys():
	idict[dict[key]] = key
    return idict

def hexcmp (x, y):
    try:
	a = int(x, 16)
	b = int(y, 16)
	if a < b:  return -1
	if a > b:  return 1
	return 0

    except:
	return cmp(x, y)


#==========================================================================
# MORK INPUT
#==========================================================================
def escapeData (match):
    return match.group() \
	       .replace('\\\\n', '$0A') \
	       .replace('\\)', '$29') \
	       .replace('>', '$3E') \
	       .replace('}', '$7D') \
	       .replace(']', '$5D')

pCellText   = re.compile(r'\^(.+?)=(.*)')
pCellOid    = re.compile(r'\^(.+?)\^(.+)')
pCellEscape = re.compile(r'((?:\\[\$\0abtnvfr])|(?:\$..))')

backslash = { '\\\\' : '\\',
	      '\\$'  : '$',
	      '\\0'  : chr(0),
	      '\\a'  : chr(7),
	      '\\b'  : chr(8),
	      '\\t'  : chr(9),
	      '\\n'  : chr(10),
	      '\\v'  : chr(11),
	      '\\f'  : chr(12),
	      '\\r'  : chr(13) }

def unescapeMork (match):
    s = match.group()
    if s[0] == '\\':
	return backslash[s]
    else:
	return chr(int(s[1:], 16))

def decodeMorkValue (value):
    global pCellEscape
    return pCellEscape.sub(unescapeMork, value)

def addToDict (dict, cells):
    for cell in cells:
	eq  = cell.find('=')
	key = cell[1:eq]
	val = cell[eq+1:-1]
	dict[key] = decodeMorkValue(val)

def getRowIdScope (rowid, cdict):
    idx = rowid.find(':')
    if idx > 0:
	return (rowid[:idx], cdict[rowid[idx+2:]])
    else:
	return (rowid, None)

def delRow (db, table, rowid):
    (rowid, scope) = getRowIdScope(rowid, db.cdict)
    if scope:
	rowkey = rowid + "/" + scope
    else:
	rowkey = rowid + "/" + table.scope

    if table.rows.has_key(rowkey):
	del table.rows[rowkey]

def addRow (db, table, rowid, cells):
    global pCellText
    global pCellOid

    row = Row()
    (row.id, row.scope) = getRowIdScope(rowid, db.cdict)

    for cell in cells:
	obj = Cell()
	cell = cell[1:-1]

	match = pCellText.match(cell)
	if match:
	    obj.column = db.cdict[match.group(1)]
	    obj.atom   = decodeMorkValue(match.group(2))

	else:
	    match = pCellOid.match(cell)
	    if match:
		obj.column = db.cdict[match.group(1)]
		obj.atom   = db.adict[match.group(2)]

	if obj.column and obj.atom:
	    row.cells.append(obj)

    if row.scope:
	rowkey = row.id + "/" + row.scope
    else:
	rowkey = row.id + "/" + table.scope

    if table.rows.has_key(rowkey):
	print >>stderr, "ERROR: duplicate rowid/scope %s" % rowkey
	print >>stderr, cells

    table.rows[rowkey] = row

def inputMork (data):
    # Remove beginning comment
    pComment = re.compile('//.*')
    data = pComment.sub('', data, 1)

    # Remove line continuation backslashes
    pContinue = re.compile(r'(\\(?:\r|\n))')
    data = pContinue.sub('', data)

    # Remove line termination
    pLine = re.compile(r'(\n\s*)|(\r\s*)|(\r\n\s*)')
    data = pLine.sub('', data)

    # Create a database object
    db		= Database()

    # Compile the appropriate regular expressions
    pCell	= re.compile(r'(\(.+?\))')
    pSpace	= re.compile(r'\s+')
    pColumnDict = re.compile(r'<\s*<\(a=c\)>\s*(?:\/\/)?\s*(\(.+?\))\s*>')
    pAtomDict	= re.compile(r'<\s*(\(.+?\))\s*>')
    pTable	= re.compile(r'\{-?(\d+):\^(..)\s*\{\(k\^(..):c\)\(s=9u?\)\s*(.*?)\}\s*(.+?)\}')
    pRow	= re.compile(r'(-?)\s*\[(.+?)((\(.+?\)\s*)*)\]')

    pTranBegin	= re.compile(r'@\$\$\{.+?\{\@')
    pTranEnd	= re.compile(r'@\$\$\}.+?\}\@')

    # Escape all '%)>}]' characters within () cells
    data = pCell.sub(escapeData, data)

    # Iterate through the data
    index  = 0
    length = len(data)
    match  = None
    tran   = 0
    while 1:
	if match:  index += match.span()[1]
	if index >= length:  break
	sub = data[index:]

	# Skip whitespace
	match = pSpace.match(sub)
	if match:
	    index += match.span()[1]
	    continue

	# Parse a column dictionary
	match = pColumnDict.match(sub)
	if match:
	    m = pCell.findall(match.group())
	    # Remove extraneous '(f=iso-8859-1)'
	    if len(m) >= 2 and m[1].find('(f=') == 0:
		m = m[1:]
	    addToDict(db.cdict, m[1:])
	    continue

	# Parse an atom dictionary
	match = pAtomDict.match(sub)
	if match:
	    cells = pCell.findall(match.group())
	    addToDict(db.adict, cells)
	    continue

	# Parse a table
	match = pTable.match(sub)
	if match:
	    id = match.group(1) + ':' + match.group(2)

	    try:
		table = db.tables[id]

	    except KeyError:
		table = Table()
		table.id    = match.group(1)
		table.scope = db.cdict[match.group(2)]
		table.kind  = db.cdict[match.group(3)]
		db.tables[id] = table

	    rows = pRow.findall(match.group())
	    for row in rows:
		cells = pCell.findall(row[2])
		rowid = row[1]
		if tran and rowid[0] == '-':
		    rowid = rowid[1:]
		    delRow(db, db.tables[id], rowid)

		if tran and row[0] == '-':
		    pass

		else:
		    addRow(db, db.tables[id], rowid, cells)
	    continue

	# Transaction support
	match = pTranBegin.match(sub)
	if match:
	    tran = 1
	    continue

	match = pTranEnd.match(sub)
	if match:
	    tran = 0
	    continue

	match = pRow.match(sub)
	if match and tran:
	    print >>stderr, "WARNING: using table '1:^80' for dangling row: %s" % match.group()
	    rowid = match.group(2)
	    if rowid[0] == '-':
		rowid = rowid[1:]

	    cells = pCell.findall(match.group(3))
	    delRow(db, db.tables['1:80'], rowid)
	    if match.group(1) != '-':
		addRow(db, db.tables['1:80'], rowid, cells)
	    continue

	# Syntax error
	print >>stderr, "ERROR: syntax error while parsing MORK file"
	print >>stderr, "context[%d]: %s" % (index, sub[:40])
	index += 1

    # Return the database
    return db


#==========================================================================
# XML out
#
# All these "mindy" references are holdovers from this program's previous
# life. Like I said, I don't speak python, and it's not broke... -mhoye
#==========================================================================
pMindyEscape = re.compile('([\x00-\x1f\x80-\xff\\\\])')

def escapeMindy (match):
    s = match.group()
    if s == '\\': return '\\\\'
    if s == '\0': return '\\0'
    if s == '\r': return '\\r'
    if s == '\n': return '\\n'
    return "\\x%02x" % ord(s)

def encodeMindyValue (value):
    global pMindyEscape
    return pMindyEscape.sub(escapeMindy, value)

def outputMindy (db):

#HDBXML 

    columns = db.cdict.keys()
    columns.sort(hexcmp)

    tables = db.tables.keys()
    tables.sort(hexcmp)

    print '<?xml version="1.0" standalone="yes"?>'
#I need to put a DTD right.... here!
    print '<!DOCTYPE history ['
    print '<!ELEMENT history ANY>'

    for column in columns:
	name = db.cdict[column]
	if name.find('ns:') != 0:
	    print '  <!ELEMENT %s ANY>' % name

    print '  <!ELEMENT entry ANY>'
    print ' ]>'


    print '<history>'

    for table in [ db.tables[k] for k in tables ]:
	rows = table.rows.keys()
	rows.sort(hexcmp)
	for row in [ table.rows[k] for k in rows ]:
	    print '  <entry>'
#	     print '  <entry table="%s" row="%s">' % (table.id, row.id)
	    for cell in row.cells:
		print '    <%s>%s</%s>' % (cell.column, re.search('s/&/&amp;/', encodeMindyValue(cell.atom)), cell.column)
	    print '  </entry>'

    print '</history>'

#==========================================================================
# MAIN PROGRAM
#==========================================================================
def main (argv=None):
    if argv is None:  argv = sys.argv

    # Parse the command line arguments
    try:
	opts, args = getopt.getopt(argv[1:], "ht")
    except:
	print "Invalid command-line argument"
	usage()
	return 1

    # Process the switches
    optTest = 0
    for o, a in opts:
	if o in ("-h"):
	    usage()
	    return 0
	elif o in ("-t"):
	    optTest = 1

    # Read the filename
    if (len(args) != 1):
	usage()
	return 1

    filename = args[0]

    # Read the file into memory
    if (filename != '-'):
	file = open(filename, "rt")
    else:
	file = stdin

    data = file.read()
    file.close()

    # Determine the file type and process accordingly
    if (data.find('<mdb:mork') >= 0):
	db = inputMork(data)
	outputMindy(db)
    else:
	print "unknown file format: %s (I only deal with Mork, sorry)" % filename
	return 1

    # Return success
    return 0


if (__name__ == "__main__"):
    result = main()
    # Comment the next line to use the debugger
    sys.exit(result)
Comment 21 Frank Wein [:mcsmurf] 2005-02-21 22:21:44 PST
mhoye AT neon.polkaroo.net: Please don't use Edit Attachment As Comment when you
don't want to comment on parts of the attachment :) (you just pasted the whole
attachment as comment). Or did you want to attach a new version of it? Then use
Create a New Attachment again.
Comment 22 Mike Hoye 2005-02-28 05:21:00 PST
(In reply to comment #21)
> Please don't use Edit Attachment As Comment when you
> don't want to comment on parts of the attachment :)

I'm sorry, I had no idea it would do that.

I should add here that I'm rewriting that python thing in javascript, in the
hopes that it can become part of some migration mechanism. When I'm done I'll
make a note here so that it can be tested, and maybe somebody smarter than me
can wedge it into the install process. 
Comment 23 captainmellow 2005-03-03 14:16:49 PST
The mork format is baroque and unreadable. Please remove this blight from the
project and utilize an established, open standard format like valid XML for the
history file and other places which are using mork. It is not acceptable to
require the use of a perl/python script to parse the history.dat file into a
usable format.
Comment 24 alanjstr 2005-07-07 14:11:07 PDT
The solution is being formed.
http://wiki.mozilla.org/Mozilla2:Unified_Storage
Comment 25 Jyri Sillanpaa 2005-08-19 05:04:44 PDT
Also, the address book should not be in this format; it's impossible to write any
automated scripts to migrate from other e-mail clients to Mozilla because of this.
Comment 26 Martijn Coppoolse 2005-08-19 05:34:14 PDT
Here's some more info on the (terrible) Mork format:
http://www.erys.org/resume/netscape/mork/what.html
Comment 27 Mark Smith (:mcs) 2005-11-21 11:20:49 PST
Regarding Comment #19 (version 0.3 of the demork.py script):

To get it to work, I had to replace the re.search() call with re.sub().  I also added entity substitution of <.  I still get a lot of "using table --- for dangling row: ---" warnings when I run it, but the output looks useful.  Here is my demork.py patch:

--- demork-0.3.py       2005-11-21 11:28:33.000000000 -0500
+++ demork-0.3+hacked.py        2005-11-21 14:18:22.817678608 -0500
@@ -403,17 +403,17 @@

     for table in [ db.tables[k] for k in tables ]:
        rows = table.rows.keys()
        rows.sort(hexcmp)
        for row in [ table.rows[k] for k in rows ]:
            print '  <entry>'
 #           print '  <entry table="%s" row="%s">' % (table.id, row.id)
            for cell in row.cells:
-               print '    <%s>%s</%s>' % (cell.column, re.search('s/&/&amp;/', encodeMindyValue(cell.atom)), cell.column)
+               print '    <%s>%s</%s>' % (cell.column, re.sub('<', '&lt;', re.sub('&', '&amp;', encodeMindyValue(cell.atom))), cell.column)
            print '  </entry>'

     print '</history>'

 #==========================================================================
 # MAIN PROGRAM
 #==========================================================================
 def main (argv=None):
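
For anyone wondering why the original line printed no usable cell text, here is a short Python 3 sketch of the difference between the two calls, plus the standard-library helper that handles the escaping in one step. The sample value is made up.

```python
import re
from xml.sax.saxutils import escape

value = "http://example.org/?a=1&b=<2>"

# re.search() only looks for a pattern. 's/&/&amp;/' is sed syntax, which
# Python treats as a literal string that will almost never occur in the data.
found = re.search('s/&/&amp;/', value)

# re.sub() performs the replacement; '&' has to be handled before '<',
# otherwise the '&' in '&lt;' would be escaped a second time.
patched = re.sub('<', '&lt;', re.sub('&', '&amp;', value))

# xml.sax.saxutils.escape() covers '&', '<' and '>' in a single call.
escaped = escape(value)
```

With the original call, found is None, so the script printed the text None in place of every cell value.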
Comment 28 André Pedralho 2006-01-10 07:32:01 PST
(In reply to comment #0)
Could someone explain to me how Firefox parses history.dat? I know there is a parser included in the Mozilla source code (mozilla/db/mork/), but I'm developing my own Mozilla-based browser and want to use it. I've googled and found only some Perl scripts, but nothing about the Mozilla parser.
Comment 29 Gromgull 2006-03-09 08:40:50 PST
To add to the fun, I converted the Python script I nicked from here to Java. Get it here: http://www.semikolon.co.uk/blog/index.php?entry=entry060309-160008
- Gunnar
Comment 30 Keith Anderson 2006-03-17 09:10:47 PST
Created attachment 215414 [details]
Utility (Win32) to export the HISTORY.DAT file to a tab delimited text file.

A quickly-written utility to export the HISTORY.DAT file to a tab-delimited text file.  (Win32)
Comment 31 alanjstr 2006-03-17 11:05:43 PST
What is the impact of Places on this?
Comment 32 Brett Wilson 2006-03-29 19:59:01 PST
I think places resolves this by replacing Mork.
Comment 33 alanjstr 2006-03-29 20:27:25 PST
I guess the future of this bug would be for Seamonkey.  Is there a more appropriate product now?
Comment 34 Chris Thomas (CTho) [formerly cst@andrew.cmu.edu cst@yecc.com] 2006-03-29 20:29:12 PST
This is a waste of our limited developer resources, in my opinion.
Comment 35 Vladimir Vukicevic [:vlad] [:vladv] 2006-03-29 21:16:09 PST
In that case...
Comment 36 Thomas Erskine 2006-03-30 03:41:52 PST
(In reply to comment #35)
> In that case...
> 

So.  What exactly is the "fix"?

Reading the last few comments, I have the impression that since "This is a waste of our limited developer resources" it has been decided to do nothing.  But then it shouldn't be marked "fixed".  Maybe the fix relates to "places", whatever that is.

Could someone please add a comment telling what the fix was and how we will (in some future unspecified release) be able to parse bookmarks?

Thanks.
Comment 37 Adam Hauner 2006-03-30 03:46:56 PST
REOPEN to re-resolve.
Comment 38 Adam Hauner 2006-03-30 03:51:33 PST
-> WFM by Places for Firefox 2.0
-> WONTFIX for SeaMonkey
Comment 39 mozilla 2006-03-30 05:16:17 PST
(In reply to comment #36)
> So.  What exactly is the "fix"?

The new "Places" functionality in Firefox 2.0 has moved the storage of history to SQLite instead of history.dat (Mork format).  So now if you know industry-standard SQL, you should be able to easily query history data from Firefox. At least that is my understanding.
Comment 40 Will Sargent 2006-03-30 10:01:27 PST
(In reply to comment #39)
> (In reply to comment #36)
> > So.  What exactly is the "fix"?
> 
> The new "Places" functionality in Firefox 2.0 has moved the storage of history
> to SQLite instead of history.dat (Mork format).  So now if you know
> industry-standard SQL, you should be able to easily query history data from
> Firefox. At least that is my understanding.

Any chance of an example to verify/test this bug?
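
A sketch of what such a test query could look like from Python's standard sqlite3 module. The table name moz_places and its columns are assumptions about the Places schema, and the in-memory stand-in database with invented rows is there only so the example runs anywhere; recent_history is a made-up helper name.

```python
import sqlite3

def recent_history(conn, limit=10):
    """Return (url, title, visit_count) rows, most recently visited first."""
    return conn.execute(
        "SELECT url, title, visit_count FROM moz_places "
        "WHERE visit_count > 0 ORDER BY last_visit_date DESC LIMIT ?",
        (limit,),
    ).fetchall()

# Stand-in database with an assumed table layout. With a real profile you
# would instead connect to a copy of its history database file.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE moz_places "
    "(url TEXT, title TEXT, visit_count INTEGER, last_visit_date INTEGER)"
)
conn.executemany(
    "INSERT INTO moz_places VALUES (?, ?, ?, ?)",
    [
        ("http://www.jwz.org/hacks/mork.pl", "mork.pl", 1, 100),
        ("http://example.org/", "Example", 4, 200),
        ("http://example.org/never", "Never visited", 0, 0),
    ],
)
rows = recent_history(conn, limit=2)
```

Working on a copy of the database file avoids contending with a running browser for the file lock.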
Comment 41 David Wolf 2007-08-12 22:45:15 PDT
I tried to follow the thread above, but got lost.  Has "history.dat" been cracked?  I would very much be interested in a utility that allows me to read/view the contents of the history.dat file.  I'm not a techie and wasn't able to follow the email thread and code above.  Is there an easy-to-run utility that I could use to solve this problem?
Comment 42 Jacob 2007-08-13 00:50:33 PDT
Using http://www.jwz.org/hacks/mork.pl, which is mentioned in the original submission, usually works.

This is a Perl program; you'll need Perl to run it.  If you don't know what that means, you can start here: http://www.perl.com/download.csp

Comment 43 Will Sargent 2007-08-13 10:10:07 PDT
The ultimate solution to this bug is a complete reimplementation: in 3.0, History and bookmarks are implemented using a database:

http://developer.mozilla.org/en/docs/Places:History_Service

From previous comments, there is no fix scheduled for the current release branch.
Comment 44 Victor Bielawski 2008-05-20 17:10:16 PDT
RESOLVED WORKSFORME? or FIXED?
Comment 45 David Aznar Reguero 2008-08-28 04:43:47 PDT
Now that Firefox3 is saving history using SQLite (Bug 242207) it may be time to
close this bug.

Do you have any plans to purge the piece of **** of Mork from Mozilla codebase?
Comment 46 pass1 2008-08-28 17:05:22 PDT
If mork is still used for Address books in Thunderbird it might still be needed for that.

And if it is still used in TB then it needs to be expunged with prejudice ASAP.
Comment 47 quartz12h 2008-08-28 17:07:51 PDT
Which first public release of seamonkey will use the core with the sqlite?
Is that a seamonkey 1.2, 2.0?
What's going on? Can't kill that beauty, can'ya?
Thanks!
Comment 48 David Aznar Reguero 2008-08-29 02:22:54 PDT
If any of you know the bug # to port Address books in Thunderbird from Mork to SQLite, please attach it here. Or should we open a general "Purge Mork" that depends on these bugs?

I would like to see Thunderbird Mork-free so I can test again whether it meets my needs and even start recommending it.
Comment 49 jwq 2008-09-04 05:05:08 PDT
(In reply to comment #48)
> If any of you know the bug # to port Address books in Thunderbird from Mork to
> SQLite, please attach it here.

 Bug 382876.
