Type hints: Parts of folders "vegalite", "v5", and "utils" #2976

Merged: 21 commits, Jun 18, 2023
Changes from 18 commits
96 changes: 55 additions & 41 deletions altair/utils/core.py
@@ -1,15 +1,16 @@
 """
 Utility routines
 """
-from collections.abc import Mapping
+from collections.abc import Mapping, MutableMapping
 from copy import deepcopy
 import json
 import itertools
 import re
 import sys
 import traceback
 import warnings
-from typing import Callable, TypeVar, Any
+from typing import Callable, TypeVar, Any, Union, Dict, Optional, Tuple, Sequence, Type
+from types import ModuleType

 import jsonschema
 import pandas as pd
@@ -23,9 +24,9 @@
 from typing_extensions import ParamSpec

 if sys.version_info >= (3, 8):
-    from typing import Protocol
+    from typing import Literal, Protocol
 else:
-    from typing_extensions import Protocol
+    from typing_extensions import Literal, Protocol

 try:
     from pandas.api.types import infer_dtype as _infer_dtype
@@ -200,7 +201,12 @@ def infer_dtype(value):
 ]


-def infer_vegalite_type(data):
+InferredVegaLiteType = Literal["ordinal", "nominal", "quantitative", "temporal"]


+def infer_vegalite_type(
+    data: Union[np.ndarray, pd.Series]
@mattijn (Contributor), Jun 17, 2023:

Another question: here you specify data: Union[np.ndarray, pd.Series]. When I look at where this is used (https://github.com/altair-viz/altair/blob/master/altair/vegalite/v5/api.py#L2411), I don't see why we need to restrict the input to numpy and pandas types here. I really like types such as DataFrameLike and SupportsGeoInterface for input data (so we can be flexible on input and restrictive on output).

Contributor:

I tried to define a type hint named ArrayLike1D that can serve for one-dimensional data of various forms, but I'm not sure this is possible yet (e.g. python/mypy#12280).

Otherwise, maybe something like the type hint they define for the values of a polars Series? There they define an ArrayLike type as such:

ArrayLike = Union[
    Sequence[Any],
    "Series",
    "pa.Array",
    "pa.ChunkedArray",
    "np.ndarray",
    "pd.Series",
    "pd.DatetimeIndex",
]

And in the docstring for values of the series they mention:

values : ArrayLike, default None
    One-dimensional data in various forms. Supported are: Sequence, Series,
    pyarrow Array, and numpy ndarray.

I don't think this type hint is strictly 1D, as the docstring suggests, but it can serve as a reference.
I also wish it were possible to include range (ref: #2877).

Contributor Author:

You're right, that's too restrictive. I used the type hints that were already present in the docstring, but it seems this function supports anything that the pandas function infer_dtype can handle. According to pandas' own type hints, that is object, so basically everything. I'd suggest we type hint it the same way to be consistent with pandas. Implemented in e974bc9.

Btw, the function infer_vegalite_type is used only here. Your link points to the usage of infer_encoding_types.

Contributor:

That is flexible enough, thanks for the change.

+) -> Union[InferredVegaLiteType, Tuple[InferredVegaLiteType, list]]:
     """
     From an array-like input, infer the correct vega typecode
     ('ordinal', 'nominal', 'quantitative', or 'temporal')
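
For readers following the review thread on this signature: a minimal sketch of how a caller might handle the two return shapes in the annotation (a bare type string, or a tuple of type string plus category order) when the input is hinted as object. The names InferredType, infer_type_sketch, and unpack_inferred are hypothetical, and the duck-typed checks only stand in for the pandas-based inference. This is not the altair implementation.

```python
from typing import Literal, Optional, Tuple, Union

InferredType = Literal["ordinal", "nominal", "quantitative", "temporal"]


def infer_type_sketch(data: object) -> Union[InferredType, Tuple[InferredType, list]]:
    """Hypothetical stand-in for infer_vegalite_type, using duck typing only."""
    # Assumption: anything exposing `categories` and a truthy `ordered`
    # behaves like an ordered categorical.
    categories = getattr(data, "categories", None)
    if categories is not None and getattr(data, "ordered", False):
        return ("ordinal", list(categories))
    values = list(data) if isinstance(data, (list, tuple, range)) else [data]
    # bool is a subclass of int, so exclude it explicitly.
    if values and all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    ):
        return "quantitative"
    return "nominal"


def unpack_inferred(
    result: Union[InferredType, Tuple[InferredType, list]]
) -> Tuple[InferredType, Optional[list]]:
    """Normalise both return shapes to (type, sort order or None)."""
    if isinstance(result, tuple):
        kind, sort_order = result
        return kind, sort_order
    return result, None
```

Hinting the input as object keeps the function flexible on input, while the Literal-based return type stays restrictive on output, which is the trade-off discussed in the thread.
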
@@ -220,8 +226,10 @@ def infer_vegalite_type(data):
         "complex",
     ]:
         return "quantitative"
-    elif typ == "categorical" and data.cat.ordered:
-        return ("ordinal", data.cat.categories.tolist())
+    # Can ignore error that np.ndarray has no attribute cat as in this case
+    # it should always be a pd.DataFrame anyway
+    elif typ == "categorical" and data.cat.ordered:  # type: ignore[union-attr]
+        return ("ordinal", data.cat.categories.tolist())  # type: ignore[union-attr]
     elif typ in ["string", "bytes", "categorical", "boolean", "mixed", "unicode"]:
         return "nominal"
     elif typ in [
@@ -243,7 +251,7 @@ def infer_vegalite_type(data):
         return "nominal"


-def merge_props_geom(feat):
+def merge_props_geom(feat: dict) -> dict:
     """
     Merge properties with geometry
     * Overwrites 'type' and 'geometry' entries if existing
@@ -261,7 +269,7 @@ def merge_props_geom(feat):
     return props_geom


-def sanitize_geo_interface(geo):
+def sanitize_geo_interface(geo: MutableMapping) -> dict:
     """Santize a geo_interface to prepare it for serialization.

     * Make a copy
@@ -278,23 +286,23 @@ def sanitize_geo_interface(geo):
             geo[key] = geo[key].tolist()

     # convert (nested) tuples to lists
-    geo = json.loads(json.dumps(geo))
+    geo_dct: dict = json.loads(json.dumps(geo))

     # sanitize features
-    if geo["type"] == "FeatureCollection":
-        geo = geo["features"]
-        if len(geo) > 0:
-            for idx, feat in enumerate(geo):
-                geo[idx] = merge_props_geom(feat)
-    elif geo["type"] == "Feature":
-        geo = merge_props_geom(geo)
+    if geo_dct["type"] == "FeatureCollection":
+        geo_dct = geo_dct["features"]
+        if len(geo_dct) > 0:
+            for idx, feat in enumerate(geo_dct):
+                geo_dct[idx] = merge_props_geom(feat)
+    elif geo_dct["type"] == "Feature":
+        geo_dct = merge_props_geom(geo_dct)
     else:
-        geo = {"type": "Feature", "geometry": geo}
+        geo_dct = {"type": "Feature", "geometry": geo_dct}

-    return geo
+    return geo_dct


-def sanitize_dataframe(df):  # noqa: C901
+def sanitize_dataframe(df: pd.DataFrame) -> pd.DataFrame:  # noqa: C901
     """Sanitize a DataFrame to prepare it for serialization.

     * Make a copy
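
To illustrate the sanitize_geo_interface change in this hunk: a small standard-library sketch of the JSON round trip that turns nested tuples into lists, with the parsed result bound to a separately named variable instead of rebinding the MutableMapping argument. The functions merge_props_geom_sketch and sanitize_geo_sketch are hypothetical; the merge behaviour is assumed from the docstring ("Overwrites 'type' and 'geometry' entries if existing"), and the array-to-list conversion step is left out.

```python
import json
from copy import deepcopy
from typing import Any, MutableMapping, Union


def merge_props_geom_sketch(feat: dict) -> dict:
    """Assumed behaviour: flatten 'properties', then overwrite 'type' and 'geometry'."""
    merged = dict(feat.get("properties") or {})
    merged["type"] = feat["type"]
    merged["geometry"] = feat["geometry"]
    return merged


def sanitize_geo_sketch(geo: MutableMapping) -> Union[dict, list]:
    # Work on a copy so the caller's object is left untouched.
    geo = deepcopy(geo)
    # The JSON round trip converts (nested) tuples to lists. The result gets
    # its own name so the type of `geo` does not change mid-function.
    geo_dct: Any = json.loads(json.dumps(geo))
    if geo_dct["type"] == "FeatureCollection":
        return [merge_props_geom_sketch(feat) for feat in geo_dct["features"]]
    if geo_dct["type"] == "Feature":
        return merge_props_geom_sketch(geo_dct)
    # Bare geometries are wrapped into a Feature.
    return {"type": "Feature", "geometry": geo_dct}
```

One thing the sketch makes visible: in the diff, the FeatureCollection branch rebinds geo_dct to the features list before returning it, so the sketch annotates the return value as Union[dict, list] rather than dict. Whether the real annotation should do the same is a question for the maintainers.
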
@@ -433,13 +441,13 @@ def sanitize_arrow_table(pa_table):


 def parse_shorthand(
-    shorthand,
-    data=None,
-    parse_aggregates=True,
-    parse_window_ops=False,
-    parse_timeunits=True,
-    parse_types=True,
-):
+    shorthand: Union[Dict[str, Any], str],
+    data: Optional[pd.DataFrame] = None,
+    parse_aggregates: bool = True,
+    parse_window_ops: bool = False,
+    parse_timeunits: bool = True,
+    parse_types: bool = True,
+) -> Dict[str, Any]:
     """General tool to parse shorthand values

     These are of the form:
@@ -554,7 +562,9 @@ def parse_shorthand(
         attrs = shorthand
     else:
         attrs = next(
-            exp.match(shorthand).groupdict() for exp in regexps if exp.match(shorthand)
+            exp.match(shorthand).groupdict()  # type: ignore[union-attr]
+            for exp in regexps
+            if exp.match(shorthand) is not None
         )

     # Handle short form of the type expression
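
The generator in this hunk calls exp.match twice per pattern and still needs a type: ignore, because the checker cannot connect the two calls. As a possible alternative, here is a sketch that binds the match once in a plain loop so the Optional result is narrowed by the is not None check. PATTERNS and first_match are hypothetical and only loosely modelled on shorthand strings such as "mean(x):Q"; they are not the regexps list used in parse_shorthand. A walrus expression would also work, but it requires Python 3.8 or later, and the version check earlier in this file suggests older versions were still supported.

```python
import re
from typing import Dict

# Hypothetical patterns, tried in order from most to least specific.
PATTERNS = [
    re.compile(r"^(?P<aggregate>\w+)\((?P<field>\w+)\):(?P<type>[QNOT])$"),
    re.compile(r"^(?P<field>\w+):(?P<type>[QNOT])$"),
    re.compile(r"^(?P<field>.*)$"),
]


def first_match(shorthand: str) -> Dict[str, str]:
    """Return the named groups of the first pattern that matches."""
    for exp in PATTERNS:
        match = exp.match(shorthand)  # evaluated once per pattern
        if match is not None:
            # `match` is narrowed to re.Match here, so no type: ignore is needed.
            return match.groupdict()
    raise ValueError(f"no pattern matches {shorthand!r}")
```
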
@@ -629,21 +639,23 @@ def decorate(f: Callable[..., _V]) -> Callable[_P, _V]:
     return decorate


-def update_nested(original, update, copy=False):
+def update_nested(
+    original: MutableMapping, update: Mapping, copy: bool = False
+) -> MutableMapping:
     """Update nested dictionaries

     Parameters
     ----------
-    original : dict
+    original : MutableMapping
         the original (nested) dictionary, which will be updated in-place
-    update : dict
+    update : Mapping
         the nested dictionary of updates
     copy : bool, default False
         if True, then copy the original dictionary rather than modifying it

     Returns
     -------
-    original : dict
+    original : MutableMapping
         a reference to the (modified) original dict

     Examples
@@ -660,7 +672,7 @@ def update_nested(original, update, copy=False):
     for key, val in update.items():
         if isinstance(val, Mapping):
             orig_val = original.get(key, {})
-            if isinstance(orig_val, Mapping):
+            if isinstance(orig_val, MutableMapping):
                 original[key] = update_nested(orig_val, val)
             else:
                 original[key] = val
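
The isinstance change in this hunk is not only cosmetic for the type checker. A read-only mapping such as types.MappingProxyType passes an isinstance(..., Mapping) check but cannot be updated in place, so recursing into it would fail on assignment. The sketch below reconstructs the function from the visible hunks to show the difference. The copy handling via deepcopy and the outer else branch are not visible in the diff and are assumptions, and the name update_nested_sketch is hypothetical.

```python
from collections.abc import Mapping, MutableMapping
from copy import deepcopy
from types import MappingProxyType


def update_nested_sketch(
    original: MutableMapping, update: Mapping, copy: bool = False
) -> MutableMapping:
    if copy:
        # Assumption: the real function copies here; not visible in the diff.
        original = deepcopy(original)
    for key, val in update.items():
        if isinstance(val, Mapping):
            orig_val = original.get(key, {})
            if isinstance(orig_val, MutableMapping):
                # Safe to recurse: the nested value supports item assignment.
                original[key] = update_nested_sketch(orig_val, val)
            else:
                # Read-only or non-mapping values are replaced wholesale.
                original[key] = val
        else:
            # Assumption: plain values simply overwrite the existing entry.
            original[key] = val
    return original


# A read-only nested mapping is replaced instead of being mutated in place.
frozen = {"opts": MappingProxyType({"a": 1})}
replaced = update_nested_sketch(frozen, {"opts": {"a": 2}})
```

With the previous Mapping check, the recursive call on the MappingProxyType value would reach an item assignment and raise a TypeError.
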
@@ -669,7 +681,7 @@ def update_nested(original, update, copy=False):
     return original


-def display_traceback(in_ipython=True):
+def display_traceback(in_ipython: bool = True):
     exc_info = sys.exc_info()

     if in_ipython:
@@ -685,16 +697,16 @@ def display_traceback(in_ipython=True):
         traceback.print_exception(*exc_info)


-def infer_encoding_types(args, kwargs, channels):
+def infer_encoding_types(args: Sequence, kwargs: MutableMapping, channels: ModuleType):
     """Infer typed keyword arguments for args and kwargs

     Parameters
     ----------
-    args : tuple
-        List of function args
-    kwargs : dict
+    args : Sequence
+        Sequence of function args
+    kwargs : MutableMapping
         Dict of function kwargs
-    channels : module
+    channels : ModuleType
         The module containing all altair encoding channel classes.

     Returns
@@ -709,8 +721,10 @@ def infer_encoding_types(args, kwargs, channels):
     channel_objs = (
         c for c in channel_objs if isinstance(c, type) and issubclass(c, SchemaBase)
     )
-    channel_to_name = {c: c._encoding_name for c in channel_objs}
-    name_to_channel = {}
+    channel_to_name: Dict[Type[SchemaBase], str] = {
+        c: c._encoding_name for c in channel_objs
+    }
+    name_to_channel: Dict[str, Dict[str, Type[SchemaBase]]] = {}
     for chan, name in channel_to_name.items():
         chans = name_to_channel.setdefault(name, {})
         if chan.__name__.endswith("Datum"):
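
To make the two annotated dictionaries in this hunk concrete, here is a rough sketch of how classes could be grouped by encoding name. SchemaBaseSketch, the channel classes, and group_channels are hypothetical stand-ins. Only the "Datum" suffix check appears in the diff; the "Value" branch, the "field" default, and the key names are assumptions about the part of the function that is cut off.

```python
from typing import Dict, Iterable, Type


class SchemaBaseSketch:
    """Hypothetical stand-in for SchemaBase."""

    _encoding_name: str = ""


class X(SchemaBaseSketch):
    _encoding_name = "x"


class XDatum(SchemaBaseSketch):
    _encoding_name = "x"


class XValue(SchemaBaseSketch):
    _encoding_name = "x"


class Color(SchemaBaseSketch):
    _encoding_name = "color"


def group_channels(
    classes: Iterable[Type[SchemaBaseSketch]],
) -> Dict[str, Dict[str, Type[SchemaBaseSketch]]]:
    channel_to_name: Dict[Type[SchemaBaseSketch], str] = {
        c: c._encoding_name for c in classes
    }
    name_to_channel: Dict[str, Dict[str, Type[SchemaBaseSketch]]] = {}
    for chan, name in channel_to_name.items():
        chans = name_to_channel.setdefault(name, {})
        if chan.__name__.endswith("Datum"):
            key = "datum"
        elif chan.__name__.endswith("Value"):  # assumed branch
            key = "value"
        else:  # assumed default
            key = "field"
        chans[key] = chan
    return name_to_channel
```

Annotating the empty dictionary up front is what lets the checker verify the setdefault call and the later assignments, since an unannotated {} gives it nothing to infer from.
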