Parser and creator for Netscape Bookmarks file format that is used when exporting bookmarks from browsers

FlyingWolFox/Netscape-Bookmarks-File-Parser

Netscape Bookmarks File Parser

This is a parser for the Netscape Bookmarks file format, which browsers generate when exporting bookmarks to HTML. It parses the file and returns an object representing it, with the bookmark structure of folders and shortcuts also represented as objects. The folder tree can be navigated using "." (attribute) notation. It can also create a Netscape Bookmarks file.

Installation

Run this in your command line (might need to run as administrator on Windows):

pip install git+https://github.com/FlyingWolFox/Netscape-Bookmarks-File-Parser.git

To update, add the --upgrade flag to the end of the command.

How to use

Import the classes and the parser:

from NetscapeBookmarksFileParser import *
from NetscapeBookmarksFileParser import parser   # if you want to parse a file

If you want to create a file, import the creator as well:

from NetscapeBookmarksFileParser import creator  # if you want to create a file

Then parse the file:

bookmarks = NetscapeBookmarksFile(file).parse()

Where file is a string with the file contents or a file opened with open(), e.g.:

with open('bookmarks.html') as file:
    bookmarks = NetscapeBookmarksFile(file)

or:

with open('bookmarks.html') as file:
    bookmarks = NetscapeBookmarksFile(file.read())

If you want to create a file, create the bookmark structure and call create_file().

To learn about the classes that the parser and the creator work with, see the Classes section of the wiki.

Quick Example

from NetscapeBookmarksFileParser import *
from NetscapeBookmarksFileParser import parser

with open('bookmarks.html') as file:
    bookmarks = NetscapeBookmarksFile(file).parse()

root_folder = bookmarks.bookmarks
print(bookmarks.title)                           # print the file's title
print(root_folder.items[0].name)                 # print the name of the first item on the root folder
print(root_folder.shortcuts[0].href)             # print the url of the first shortcut on the root folder
print(root_folder.children[0].personal_toolbar)  # print whether the first child folder is the Bookmarks Toolbar
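To go beyond the first items, the folder tree can be walked recursively. The sketch below is not part of the library: `walk` is a hypothetical helper that relies only on the attributes used in the quick example (`name`, `shortcuts`, `children`, `href`), and it assumes shortcuts expose `name` the same way folders do. The stand-in objects built with `SimpleNamespace` exist only so the snippet runs without the package installed.

```python
from types import SimpleNamespace


def walk(folder, depth=0):
    """Yield (depth, kind, name, url) for every folder and shortcut under `folder`.

    Hypothetical helper: it only reads `name`, `shortcuts`, `children`
    and `href`, the attributes shown in the quick example.
    """
    yield depth, "folder", folder.name, None
    for shortcut in folder.shortcuts:
        yield depth + 1, "shortcut", shortcut.name, shortcut.href
    for child in folder.children:
        yield from walk(child, depth + 1)


# Stand-in objects so the sketch runs without the library installed.
# With the real parser you would pass `bookmarks.bookmarks` instead of `root`.
news = SimpleNamespace(
    name="News",
    shortcuts=[SimpleNamespace(name="Example", href="https://example.com")],
    children=[],
)
root = SimpleNamespace(
    name="Bookmarks",
    shortcuts=[SimpleNamespace(name="Python", href="https://www.python.org")],
    children=[news],
)

for depth, kind, name, url in walk(root):
    print("  " * depth + name + (f" -> {url}" if url else ""))
```

Note that this lists a folder's shortcuts before its subfolders, which may differ from the order in the file; iterating over `items` instead would presumably keep the original order.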

Notice about this parser

The parser behaves like a browser: it ignores most errors and warns about some missing tags. If a folder has an opening <DL><p> but no closing </DL><p>, an exception is raised. Since Netscape Bookmarks files are usually generated by browsers when exporting bookmarks to HTML, these warnings and exceptions should be rare. The parser is based mainly on Microsoft's documentation of the Netscape Bookmark File Format, but also on example files (here, here, here and here) and my own browser exports (test/test.html is one of them). Some less common attributes and items might not be supported; see the Attributes Supported and Items Supported sections in the wiki. To learn what a file must contain to be accepted by the parser, read the Netscape Bookmarks File Format page in the wiki.
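In the Netscape format, a folder's contents sit between an opening `<DL><p>` and a closing `</DL><p>`. If you want to check a hand-edited or truncated file before parsing it, a rough pre-check like the one below may help. It is only a sketch and not how the parser itself works: `dl_tags_balanced` is a hypothetical helper that counts `<DL>` tags with a regular expression.

```python
import re

# Matches opening <DL> and closing </DL> tags, ignoring case and attributes.
_DL_TAG = re.compile(r"<(/?)DL\b[^>]*>", re.IGNORECASE)


def dl_tags_balanced(html):
    """Return True if every <DL> has a matching </DL> and none closes early."""
    depth = 0
    for match in _DL_TAG.finditer(html):
        if match.group(1):  # a closing </DL>
            depth -= 1
            if depth < 0:
                return False
        else:  # an opening <DL>
            depth += 1
    return depth == 0


sample = """<!DOCTYPE NETSCAPE-Bookmark-file-1>
<TITLE>Bookmarks</TITLE>
<H1>Bookmarks</H1>
<DL><p>
    <DT><H3>News</H3>
    <DL><p>
        <DT><A HREF="https://example.com">Example</A>
    </DL><p>
</DL><p>
"""

print(dl_tags_balanced(sample))  # True
# Cut the text at the last closing tag to mimic a truncated export.
print(dl_tags_balanced(sample.rsplit("</DL>", 1)[0]))  # False
```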

Notice about the creator

The creator is the parser in reverse. If you parse a file and then create it again, the output will be equal to the input, provided all lines in the input are valid. You can see this with test/test.html and test/created_file.html: the first was parsed, and the creator produced the second from the result. See the Creator page in the wiki to learn more about the creator.
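One way to check this round trip on your own files is to diff the original against the created output. The sketch below uses only the standard library's `difflib`; `round_trip_diff` is a hypothetical helper, and the short strings stand in for the contents of the two files.

```python
import difflib


def round_trip_diff(original_text, created_text):
    """Return unified-diff lines between two file contents (empty if equal)."""
    return list(
        difflib.unified_diff(
            original_text.splitlines(),
            created_text.splitlines(),
            fromfile="original",
            tofile="created",
            lineterm="",
        )
    )


# Stand-in strings; in practice, read test/test.html and test/created_file.html.
original = '<DL><p>\n    <DT><A HREF="https://example.com">Example</A>\n</DL><p>\n'
identical = original
changed = original.replace("Example<", "Sample<")

print(round_trip_diff(original, identical))  # [] means the round trip was lossless
for line in round_trip_diff(original, changed):
    print(line)
```

An empty list means the two texts match line for line; any other result shows exactly which lines differ.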

About legacy support

Because the Netscape Bookmark file format has no official standard, much of this parser's behavior was derived from example files found on the internet (see the Netscape Bookmarks File Format and The parser pages in the wiki). It includes legacy support for some item types that are no longer in use:

  • Feed: Probably RSS feeds. Only some attributes are supported, following Microsoft's documentation.
  • Web Slices: "Live bookmarks" that showed a piece of the page you saved. They are extinct but still appear in Microsoft's documentation. For more details, see the Legacy section in the wiki.

Help

  • If you would like to report a bug or ask a question, please open an issue.
  • If you would like to help this project, you can open a Pull Request.
  • If you want more information about this project, have a look at the wiki.