tokenize-rt

The stdlib `tokenize` module does not properly roundtrip: it discards whitespace and has no token for backslash-escaped newlines. This wrapper around the stdlib provides two additional tokens, `ESCAPED_NL` and `UNIMPORTANT_WS`, and a `Token` data type. Use `src_to_tokens` and `tokens_to_src` to roundtrip.

This library is useful if you're writing a refactoring tool based on Python tokenization.

Installation

pip install tokenize-rt

Usage

datastructures

`tokenize_rt.Offset(line=None, utf8_byte_offset=None)`

A token offset, useful as a key when cross referencing the ast and the tokenized source.

`tokenize_rt.Token(name, src, line=None, utf8_byte_offset=None)`

Construct a token.

- name: one of the token names listed in `token.tok_name`, or `ESCAPED_NL` or `UNIMPORTANT_WS`
- src: the token's source as text
- line: the line number that this token appears on
- utf8_byte_offset: the UTF-8 byte offset at which this token appears in the line

`tokenize_rt.Token.offset`

Retrieves an Offset for this token.

converting to and from `Token` representations

`tokenize_rt.src_to_tokens(text: str) -> List[Token]`

`tokenize_rt.tokens_to_src(Iterable[Token]) -> str`

additional tokens added by `tokenize-rt`

`tokenize_rt.ESCAPED_NL`

`tokenize_rt.UNIMPORTANT_WS`

helpers

`tokenize_rt.NON_CODING_TOKENS`

A frozenset containing tokens which may appear between others while not affecting control flow or code:

- `COMMENT`
- `ESCAPED_NL`
- `NL`
- `UNIMPORTANT_WS`

`tokenize_rt.parse_string_literal(text: str) -> Tuple[str, str]`

parse a string literal into its prefix and string content

>>> parse_string_literal('f"foo"')
('f', '"foo"')

`tokenize_rt.reversed_enumerate(Sequence[Token]) -> Iterator[Tuple[int, Token]]`

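yields `(index, token)` pairs in reverse order. Useful for rewriting source, since inserting or deleting tokens at an index does not shift the indices that are still to be visited.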

`tokenize_rt.rfind_string_parts(Sequence[Token], i) -> Tuple[int, ...]`

find the indices of the string parts of a (joined) string literal

- `i` should start at the end of the string literal
- returns `()` (an empty tuple) for things which are not string literals

>>> tokens = src_to_tokens('"foo" "bar".capitalize()')
>>> rfind_string_parts(tokens, 2)
(0, 2)
>>> tokens = src_to_tokens('("foo" "bar").capitalize()')
>>> rfind_string_parts(tokens, 4)
(1, 3)

Differences from `tokenize`

- tokenize-rt adds `ESCAPED_NL` for a backslash-escaped newline "token"
- tokenize-rt adds `UNIMPORTANT_WS` for whitespace (discarded in `tokenize`)
- tokenize-rt normalizes string prefixes, even if they are not parsed -- for instance, this means you'll see `Token('STRING', "f'foo'", ...)` even in Python 2.
- tokenize-rt normalizes Python 2 long literals (`4l` / `4L`) and octal literals (`0755`) in Python 3 (for easier rewriting of Python 2 code while running Python 3).
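For comparison, this sketch uses only the stdlib `tokenize` module to show what is lost: the token strings alone carry no whitespace and no backslash continuation, so joining them does not give back the source. The sample source is arbitrary.

```python
import io
import tokenize

src = 'x = 1 + \\\n    2\n'

# Tokenize with the stdlib only.
stdlib_tokens = list(tokenize.generate_tokens(io.StringIO(src).readline))
names = [tokenize.tok_name[tok.type] for tok in stdlib_tokens]
print(names)

# Spaces and the backslash-newline are not present in any token string.
joined = ''.join(tok.string for tok in stdlib_tokens)
print(repr(joined))
```

The stdlib tokens do carry start and end positions, so spacing can be partly inferred from them, but there is no token that holds the whitespace or the escaped newline itself.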
