can_ada

Fast Python bindings for Ada, a fast, WHATWG spec-compliant URL parser. Ada is the URL parser used in projects such as Node.js.

Installation

pip install can_ada

Binary wheels are available for most platforms. If no wheel is available for your platform, a compiler supporting C++17 or later is required to build the underlying Ada library.

WHATWG URL compliance

Unlike the standard library's urllib.parse module, this library is compliant with the WHATWG URL specification.

import can_ada
urlstring = "https://www.GOoglé.com/./path/../path2/"
url = can_ada.parse(urlstring)
# prints www.xn--googl-fsa.com, the correctly parsed domain name according
# to WHATWG
print(url.hostname)
# prints /path2/, which is the correctly parsed pathname according to WHATWG
print(url.pathname)

import urllib.parse
urlstring = "https://www.GOoglé.com/./path/../path2/"
url = urllib.parse.urlparse(urlstring)
# prints www.googlé.com
print(url.hostname)
# prints /./path/../path2/
print(url.path)
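
The two differences above come from IDNA encoding of the hostname and removal of dot segments from the path. As a rough illustration only, the sketch below approximates both steps with the standard library. The helper name `approximate_whatwg` is hypothetical, and this is not how Ada or can_ada implement parsing; it ignores many other rules in the WHATWG specification.

```python
import posixpath
from urllib.parse import urlsplit


def approximate_whatwg(url):
    """Hypothetical helper: mimic two WHATWG normalisations with the stdlib."""
    parts = urlsplit(url)
    # urlsplit().hostname is already lowercased; the "idna" codec then
    # converts non-ASCII labels to their punycode ("xn--") form.
    host = parts.hostname.encode("idna").decode("ascii")
    # normpath collapses "." and ".." segments but drops a trailing slash,
    # so restore it when the original path had one.
    path = posixpath.normpath(parts.path)
    if parts.path.endswith("/") and not path.endswith("/"):
        path += "/"
    return host, path


print(approximate_whatwg("https://www.GOoglé.com/./path/../path2/"))
```

For real use, prefer can_ada.parse, which applies the full specification rather than these two approximations.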

Usage

Parsing is simple:

from can_ada import parse

url = parse("https://tkte.ch/search?q=canada")
print(url.protocol) # https:
print(url.host) # tkte.ch
print(url.pathname) # /search
print(url.search) # ?q=canada

You can also modify URLs:

from can_ada import parse

url = parse("https://tkte.ch/search?q=canada")
url.host = "google.com"
url.search = "?q=canada&safe=off"
print(url) # https://google.com/search?q=canada&safe=off

can_ada also supports the URLSearchParams API:

from can_ada import URLSearchParams

params = URLSearchParams("q=canada&safe=off")
params.append("page", "2")
params.append("page", "3")
params["q"] = "usa"
print(params) # q=usa&safe=off&page=2&page=3
print(params.has("q")) # True
print(params.get("page")) # 2
print(params.get_all("page")) # ["2", "3"]
print(params.keys()) # ["q", "safe", "page"]
print(params.values()) # ["usa", "off", "2", "3"]
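
For readers coming from the standard library, the same query string can be inspected with urllib.parse for comparison. This sketch uses only stdlib functions and is not part of can_ada; it shows that repeated keys such as "page" carry several values and that every value is a string.

```python
from urllib.parse import parse_qs, parse_qsl, urlencode

query = "q=usa&safe=off&page=2&page=3"

# parse_qs groups repeated keys into lists of string values.
grouped = parse_qs(query)
print(grouped["page"])  # ['2', '3']

# parse_qsl keeps the original order as (key, value) pairs,
# which urlencode can turn back into a query string.
pairs = parse_qsl(query)
print(urlencode(pairs))  # q=usa&safe=off&page=2&page=3
```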

Performance

In the benchmark below, can_ada parses roughly 4.7x faster than urllib.parse and about 2x faster than ada_python:

---------------------------------------------------------------------------------
Name (time in ms)              Min                 Max                Mean       
---------------------------------------------------------------------------------
test_can_ada_parse         54.1304 (1.0)       54.6734 (1.0)       54.3699 (1.0) 
test_ada_python_parse     107.5653 (1.99)     108.1666 (1.98)     107.7817 (1.98)
test_urllib_parse         251.5167 (4.65)     255.1327 (4.67)     253.2407 (4.66)
---------------------------------------------------------------------------------

To run the benchmarks locally, use:

pytest --runslow
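
For a quick, informal comparison without the test suite, a minimal timing sketch along these lines may help. The sample URLs and the `time_parser` helper are hypothetical and are not the project's benchmark; the can_ada timing is only collected if the package is installed, and absolute numbers will vary by machine.

```python
import timeit
from urllib.parse import urlparse

# Hypothetical sample data: distinct URLs, so that urllib's internal cache
# of recently parsed URLs is less of a factor in the measurement.
URLS = [f"https://tkte.ch/search?q=canada&page={i}" for i in range(1000)]


def time_parser(parse_fn, urls=URLS, repeat=3):
    """Return the best wall-clock time (seconds) to parse every URL once."""
    runs = timeit.repeat(lambda: [parse_fn(u) for u in urls], number=1, repeat=repeat)
    return min(runs)


results = {"urllib": time_parser(urlparse)}

try:
    import can_ada  # optional: only timed when installed
    results["can_ada"] = time_parser(can_ada.parse)
except ImportError:
    pass

for name, seconds in results.items():
    print(f"{name}: {seconds * 1000:.3f} ms")
```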