Closed Bug 1192034 Opened 5 years ago Closed 4 years ago

Import history from Microsoft Edge

Categories

(Firefox :: Migration, defect, P2)

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox41 + wontfix
firefox42 + affected

People

(Reporter: Dolske, Unassigned)

References

(Blocks 1 open bug)

Details

We need a migrator that can import a user's existing Edge history.
This seems like it might be difficult.

I've found that the registry contains typedUrls entries like it did for IE, but in a different location (similar to favorites). That's the good news.
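
As a rough illustration of reading such entries, here is a minimal Python sketch. It assumes the values are named url1, url2, ... as they were under IE's TypedURLs key; the Edge key path is not given here, so it is left as a caller-supplied parameter, and both function names are hypothetical.

```python
import re


def order_typed_urls(values):
    """Return URLs from a {value_name: data} mapping, ordered url1, url2, ...

    Assumption: typed URLs are stored as string values named urlN.
    Anything else under the key is ignored.
    """
    numbered = []
    for name, data in values.items():
        match = re.fullmatch(r"url(\d+)", name, flags=re.IGNORECASE)
        if match and isinstance(data, str) and data:
            numbered.append((int(match.group(1)), data))
    return [url for _, url in sorted(numbered)]


def read_typed_urls(key_path):
    """Read typed URLs from HKEY_CURRENT_USER\\<key_path> (Windows only).

    key_path is supplied by the caller; this sketch does not know the
    Edge-specific location.
    """
    import winreg  # imported lazily so the pure helper above works anywhere

    values = {}
    with winreg.OpenKey(winreg.HKEY_CURRENT_USER, key_path) as key:
        value_count = winreg.QueryInfoKey(key)[1]
        for index in range(value_count):
            name, data, _value_type = winreg.EnumValue(key, index)
            values[name] = data
    return order_typed_urls(values)
```

Sorting on the numeric suffix rather than the raw name keeps url10 after url2.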

The bad news is that I'm stumped trying to figure out where the actual history goes. IE had APIs for that, and used index.dat files somewhere. We used the APIs. Importing history from IE into Firefox on Win10 right now uses those APIs, and they don't seem to have Edge's history data. Searching the web hasn't really told me anything about where the rest of the history is hiding... Using Process Monitor is too noisy: Edge seems to access the hard drive and registry repeatedly on the same keys, generating many thousands of entries, and searching for something that mentions "history" doesn't find anything. Going to keep looking for a bit, but I've lost some hope since I started...
How about we just ask our Microsoft contact and see what they recommend?
Priority: -- → P2
[Tracking Requested - why for this release]: It's on our Win 10 list for 41 but we might need help from MS on this one so will probably move out. I will nom for tracking anyhow.
Tracked for 41 and nom'ing for 42 as well.
It is getting too late to fix this in FF41 given that we don't have a patch ready or in the works. This does not seem to be a release blocker for 41.
Right. So it seems like this stuff is still in the WebCache database, which is stored at:

%LOCALAPPDATA%\Microsoft\Windows\WebCache

It seems to be permanently locked, at least on Win10, so trying to use esentutl to dump stuff from it as-is doesn't work. Specifically:

esentutl /mh WebCacheV01.dat

> Error: Access to source database 'WebCacheV01.data' failed with Jet error -1032.
> 
> Operation terminated with error -1032 (JET_errFileAccessDenied, Cannot access file, the file is locked or in use) after 0.16 seconds.

The internet suggests ( http://blogs.msdn.com/b/martinc/archive/2015/02/11/using-esentutl-exe-vss-to-examine-an-in-use-database.aspx ) making a Volume Shadow Copy Service (VSS) copy in order to read the database. Doing this as a normal (non-admin) user produces:

esentutl /mh WebCacheV01.dat /vss

> VSS Subsystem Init failed, 0x80070005
> Operation terminated with error -2403 (JET_errOSSnapshotNotAllowed, OS Shadow copy not allowed (backup or recovery in progress)) after 0.15 seconds.
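
Code that shells out to esentutl would need to tell these failures apart. A minimal Python sketch, assuming the "Operation terminated with error ..." line always has the shape shown in the two outputs above (the function name is hypothetical):

```python
import re

# Matches e.g. "Operation terminated with error -1032 (JET_errFileAccessDenied, ..."
JET_ERROR = re.compile(r"Operation terminated with error (-?\d+) \((\w+),")


def parse_esentutl_error(output):
    """Return (code, name) for the Jet error in esentutl output, or None."""
    match = JET_ERROR.search(output)
    if match is None:
        return None
    return int(match.group(1)), match.group(2)
```

With the outputs quoted above, the first call would yield -1032 / JET_errFileAccessDenied (file locked) and the second -2403 / JET_errOSSnapshotNotAllowed (shadow copy refused).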


Reading:

http://blog.nirsoft.net/2012/12/08/a-few-words-about-the-cache-history-on-internet-explorer-10/

I tried again from a cmd prompt with admin privileges. This worked! However, dumping table information then failed because the database was not closed (or something like that; I didn't save the error message).

I ended up using the following magic incantation:

esentutl /y WebCacheV01.dat /vssrec V01 . /d destination\path\for\copy.dat

to make a copy that replayed the logs into the DB, which led to a successfully closed DB. This still uses the VSS, so it presumably wouldn't work without admin privileges. In any case, it produced a copy that ESEDatabaseView was happy to read (see http://www.nirsoft.net/utils/ese_database_view.html ).
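
For reference, the same invocation driven from a script might look like the Python sketch below. It only reuses the flags from the command above; the function names are hypothetical, and the call would still need an elevated process, run from the WebCache directory, to succeed.

```python
import subprocess


def build_copy_command(source_db, destination, log_base="V01", log_dir="."):
    """Build the esentutl argument list used above (copy via VSS, replaying logs)."""
    return [
        "esentutl",
        "/y", source_db,               # copy this database file
        "/vssrec", log_base, log_dir,  # snapshot via VSS and replay the V01 logs
        "/d", destination,             # where to write the recovered copy
    ]


def copy_webcache(source_db, destination):
    """Run the copy and return the CompletedProcess (Windows, elevated prompt assumed)."""
    return subprocess.run(
        build_copy_command(source_db, destination),
        capture_output=True,
        text=True,
        check=False,  # the caller inspects returncode and stdout itself
    )
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.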

For the structure of the database, see http://articles.forensicfocus.com/2013/12/10/forensic-analysis-of-the-ese-database-in-internet-explorer-10/ . Basically, there is a "Containers" table that maps different Container_NNN tables to names like "History", "Cookies" etc.
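
To make that mapping concrete, here is a small Python sketch that picks out the history tables from rows of the "Containers" table. It assumes each row has already been read into a dict and that the relevant columns are called ContainerId and Name; both the column names and the function name are assumptions for illustration, and the sample rows are made up.

```python
def history_tables(container_rows, wanted_name="History"):
    """Return the Container_NNN table names whose container is named wanted_name.

    container_rows: iterable of dicts with (assumed) keys "ContainerId" and "Name".
    """
    tables = []
    for row in container_rows:
        if str(row["Name"]).lower() == wanted_name.lower():
            tables.append("Container_{}".format(row["ContainerId"]))
    return tables


# Hypothetical rows, not taken from a real database.
rows = [
    {"ContainerId": 1, "Name": "Cookies"},
    {"ContainerId": 4, "Name": "History"},
    {"ContainerId": 12, "Name": "History"},
]
print(history_tables(rows))
```

More than one row can carry the same name, which is why this returns a list rather than a single table.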

The Containers table also lists a file path for each container. According to the Forensics article, these are "source" paths. Unfortunately, although there are "History" folders under AC/(#!001|#!002|)/MicrosoftEdge inside the Microsoft.MicrosoftEdge<blahblahblah> folder in %LOCALAPPDATA%\Packages.... they all seem to be empty. That is, when looking with dir /aHS (showing hidden & system files), all I see is empty container.dat files.

There are several "History" tables in this database. It seems like some are specific to IE/explorer.exe, and some specific to Edge. They do contain expected data, which is confirmation this is the right place to look... however:

1) I see no way for us to do what I did because:
1a) there is no way to guarantee we even have admin rights, and
1b) copying a 60mb table seems a wildly inefficient and error-prone way to access a small fraction of the database inside it.
2) It seems like InPrivate data ends up in this DB too... but it is not clear how to distinguish this data from the non-InPrivate data. So we might end up "upgrading" InPrivate history data to "normal" history data, which would obviously be serious and wrong etc. etc.



A potential workaround is described here: http://blog.nirsoft.net/2013/05/02/improved-solution-for-reading-the-history-of-internet-explorer-10/

This basically boils down to finding the file handle to the DB held by another process (http://forum.sysinternals.com/howto-enumerate-handles_topic18892.html), then duplicating that handle and using it to copy the DB elsewhere (a naive copy fails because the file is in use), recovering the logs there, and then reading the database.

All in all, I think the effort+risk/reward there is too high to try to do this at the moment. I will instead put up a minimal patch that reads the typed URLs from the registry so we at least migrate those.
(In reply to :Gijs Kruitbosch from comment #6)

> All in all, I think the effort+risk/reward there is too high to try to do
> this at the moment. I will instead put up a minimal patch that reads the
> typed URLs from the registry so we at least migrate those.

Thanks for the great analysis. I'd be curious to know what :jimm thinks, but it does sound like this would require a big pile of fragile hacks, and so isn't worth it.

Let's do the typed-URLs import in a separate bug, and just wontfix this one? That way there's a clear history of why we can't fix this particular aspect.
Depends on: 1205053
(In reply to Justin Dolske [:Dolske] from comment #7)
> (In reply to :Gijs Kruitbosch from comment #6)
> 
> > All in all, I think the effort+risk/reward there is too high to try to do
> > this at the moment. I will instead put up a minimal patch that reads the
> > typed URLs from the registry so we at least migrate those.
> 
> Thanks for the great analysis. I'd be curious to know what :jimm thinks,

Jim, do you have thoughts re comment #6 ?

> but
> it does sound like this would require a big pile of fragile hacks, and so
> isn't worth it.
> 
> Let's do the typed-URLs import in a separate bug,

Filed bug 1205053.

> and just wontfix this one?

I dunno, do we actively want to say we wouldn't take a patch? I was mostly saying "I don't think it's a good use of my time to try to construct all these fragile hacks", because I expect I would need a full-time week (maybe several) to write it. I'm somewhat open to a contributed patch if it treads very carefully. Considering what we do to DLL loading on Windows with under-documented hooks and so on, this thing strikes me as "hacky, but not completely insane considering the other stuff we do". I just don't think that right now I can justify the investment of figuring all the hacks out myself and writing the code.
Flags: needinfo?(jmathies)
Flags: needinfo?(dolske)
> 1) I see no way for us to do what I did because:
> 1a) there is no way to guarantee we even have admin rights, and

helper.exe can do this, but it would have to prompt for elevation. On some systems users don't have the right to elevate, so it wouldn't work for everyone.

> 1b) copying a 60mb table seems a wildly inefficient and error-prone way to
> access a small fraction of the database inside it.

Agree, thumbs down on large file copies just for history data. If there were an easy way to get at it, great. I don't see history data as being very critical.

> 2) It seems like InPrivate data ends up in this DB too... but it is not
> clear how to distinguish this data from the non-InPrivate data. So we might
> end up "upgrading" InPrivate history data to "normal" history data, which
> would obviously be serious and wrong etc. etc.

If this is true then we really don't want to mess with it.

> A potential workaround is described here:
> http://blog.nirsoft.net/2013/05/02/improved-solution-for-reading-the-history-
> of-internet-explorer-10/
> 
> This basically boils down to finding the file handle to the DB held by
> another process
> (http://forum.sysinternals.com/howto-enumerate-handles_topic18892.html),
> then duplicating that handle and using it to copy the DB elsewhere (a naive
> copy fails because the file is in use), recovering the logs there, and then
> reading the database.

yuck, thumbs down here too imo.
Flags: needinfo?(jmathies)
mmk, per comment #9, going to mark this wontfix, and we can reopen if/when a better way to get to this stuff emerges.
Status: NEW → RESOLVED
Closed: 4 years ago
Flags: needinfo?(dolske)
Resolution: --- → WONTFIX