
Exhausted inference from reused context in 3.0.0a6 #2237

Closed
jacobtylerwalls opened this issue Jul 4, 2023 · 1 comment · Fixed by #2238

@jacobtylerwalls (Member)

Steps to reproduce

pylint tests/.pylint_primer_tests/pandas-dev/pandas/pandas/tests/arrays/test_array.py

Bug description

When parsing the following a.py:

import datetime
import decimal

import numpy as np
import pytest
import pytz

import pandas as pd
import pandas._testing as tm
from pandas.api.extensions import register_extension_dtype
from pandas.arrays import (
    BooleanArray,
    DatetimeArray,
    FloatingArray,
    IntegerArray,
    IntervalArray,
    SparseArray,
    TimedeltaArray,
)
from pandas.core.arrays import (
    PandasArray,
    period_array,
)
from pandas.tests.extension.decimal import (
    DecimalArray,
    DecimalDtype,
    to_decimal,
)


@pytest.mark.parametrize(
    "data, dtype, expected",
    [
        # Basic NumPy defaults.
        ([1, 2], None, IntegerArray._from_sequence([1, 2])),
        ([1, 2], object, PandasArray(np.array([1, 2], dtype=object))),
        (
            [1, 2],
            np.dtype("float32"),
            PandasArray(np.array([1.0, 2.0], dtype=np.dtype("float32"))),
        ),
        (np.array([1, 2], dtype="int64"), None, IntegerArray._from_sequence([1, 2])),
        (
            np.array([1.0, 2.0], dtype="float64"),
            None,
            FloatingArray._from_sequence([1.0, 2.0]),
        ),
        # String alias passes through to NumPy
        ([1, 2], "float32", PandasArray(np.array([1, 2], dtype="float32"))),
        ([1, 2], "int64", PandasArray(np.array([1, 2], dtype=np.int64))),
        # GH#44715 FloatingArray does not support float16, so fall back to PandasArray
        (
            np.array([1, 2], dtype=np.float16),
            None,
            PandasArray(np.array([1, 2], dtype=np.float16)),
        ),
        # idempotency with e.g. pd.array(pd.array([1, 2], dtype="int64"))
        (
            PandasArray(np.array([1, 2], dtype=np.int32)),
            None,
            PandasArray(np.array([1, 2], dtype=np.int32)),
        ),
        # Period alias
        (
            [pd.Period("2000", "D"), pd.Period("2001", "D")],
            "Period[D]",
            period_array(["2000", "2001"], freq="D"),
        ),
        # Period dtype
        (
            [pd.Period("2000", "D")],
            pd.PeriodDtype("D"),
            period_array(["2000"], freq="D"),
        ),
        # Datetime (naive)
        (
            [1, 2],
            np.dtype("datetime64[ns]"),
            DatetimeArray._from_sequence(np.array([1, 2], dtype="datetime64[ns]")),
        ),
        (
            [1, 2],
            np.dtype("datetime64[s]"),
            DatetimeArray._from_sequence(np.array([1, 2], dtype="datetime64[s]")),
        ),
        (
            np.array([1, 2], dtype="datetime64[ns]"),
            None,
            DatetimeArray._from_sequence(np.array([1, 2], dtype="datetime64[ns]")),
        ),
        (
            pd.DatetimeIndex(["2000", "2001"]),
            np.dtype("datetime64[ns]"),
            DatetimeArray._from_sequence(["2000", "2001"]),
        ),
        (
            pd.DatetimeIndex(["2000", "2001"]),
            None,
            DatetimeArray._from_sequence(["2000", "2001"]),
        ),
        (
            ["2000", "2001"],
            np.dtype("datetime64[ns]"),
            DatetimeArray._from_sequence(["2000", "2001"]),
        ),
        # Datetime (tz-aware)
        (
            ["2000", "2001"],
            pd.DatetimeTZDtype(tz="CET"),
            DatetimeArray._from_sequence(
                ["2000", "2001"], dtype=pd.DatetimeTZDtype(tz="CET")
            ),
        ),
        # Timedelta
        (
            ["1H", "2H"],
            np.dtype("timedelta64[ns]"),
            TimedeltaArray._from_sequence(["1H", "2H"]),
        ),
        (
            pd.TimedeltaIndex(["1H", "2H"]),
            np.dtype("timedelta64[ns]"),
            TimedeltaArray._from_sequence(["1H", "2H"]),
        ),
        (
            np.array([1, 2], dtype="m8[s]"),
            np.dtype("timedelta64[s]"),
            TimedeltaArray._from_sequence(np.array([1, 2], dtype="m8[s]")),
        ),
        (
            pd.TimedeltaIndex(["1H", "2H"]),
            None,
            TimedeltaArray._from_sequence(["1H", "2H"]),
        ),
        (
            # preserve non-nano, i.e. don't cast to PandasArray
            TimedeltaArray._simple_new(
                np.arange(5, dtype=np.int64).view("m8[s]"), dtype=np.dtype("m8[s]")
            ),
            None,
            TimedeltaArray._simple_new(
                np.arange(5, dtype=np.int64).view("m8[s]"), dtype=np.dtype("m8[s]")
            ),
        ),
        (
            # preserve non-nano, i.e. don't cast to PandasArray
            TimedeltaArray._simple_new(
                np.arange(5, dtype=np.int64).view("m8[s]"), dtype=np.dtype("m8[s]")
            ),
            np.dtype("m8[s]"),
            TimedeltaArray._simple_new(
                np.arange(5, dtype=np.int64).view("m8[s]"), dtype=np.dtype("m8[s]")
            ),
        ),
        # Category
        (["a", "b"], "category", pd.Categorical(["a", "b"])),
        (
            ["a", "b"],
            pd.CategoricalDtype(None, ordered=True),
            pd.Categorical(["a", "b"], ordered=True),
        ),
        # Interval
        (
            [pd.Interval(1, 2), pd.Interval(3, 4)],
            "interval",
            IntervalArray.from_tuples([(1, 2), (3, 4)]),
        ),
        # Sparse
        ([0, 1], "Sparse[int64]", SparseArray([0, 1], dtype="int64")),
        # IntegerNA
        ([1, None], "Int16", pd.array([1, None], dtype="Int16")),
        (pd.Series([1, 2]), None, PandasArray(np.array([1, 2], dtype=np.int64))),
        # String
        (
            ["a", None],
            "string",
            pd.StringDtype().construct_array_type()._from_sequence(["a", None]),
        ),
        (
            ["a", None],
            pd.StringDtype(),
            pd.StringDtype().construct_array_type()._from_sequence(["a", None]),
        ),
        # Boolean
        ([True, None], "boolean", BooleanArray._from_sequence([True, None])),
        ([True, None], pd.BooleanDtype(), BooleanArray._from_sequence([True, None])),
        # Index
        (pd.Index([1, 2]), None, PandasArray(np.array([1, 2], dtype=np.int64))),
        # Series[EA] returns the EA
        (
            pd.Series(pd.Categorical(["a", "b"], categories=["a", "b", "c"])),
            None,
            pd.Categorical(["a", "b"], categories=["a", "b", "c"]),
        ),
        # "3rd party" EAs work
        ([decimal.Decimal(0), decimal.Decimal(1)], "decimal", to_decimal([0, 1])),
        # pass an ExtensionArray, but a different dtype
        (
            period_array(["2000", "2001"], freq="D"),
            "category",
            pd.Categorical([pd.Period("2000", "D"), pd.Period("2001", "D")]),
        ),
    ],
)
def test_array(data, dtype, expected):
    result = pd.array(data, dtype=dtype)
    tm.assert_equal(result, expected)


def test_array_copy():
    a = np.array([1, 2])
    # default is to copy
    b = pd.array(a, dtype=a.dtype)
    assert not tm.shares_memory(a, b)

    # copy=True
    b = pd.array(a, dtype=a.dtype, copy=True)
    assert not tm.shares_memory(a, b)

    # copy=False
    b = pd.array(a, dtype=a.dtype, copy=False)
    assert tm.shares_memory(a, b)


cet = pytz.timezone("CET")


@pytest.mark.parametrize(
    "data, expected",
    [
        # period
        (
            [pd.Period("2000", "D"), pd.Period("2001", "D")],
            period_array(["2000", "2001"], freq="D"),
        ),
        # interval
        ([pd.Interval(0, 1), pd.Interval(1, 2)], IntervalArray.from_breaks([0, 1, 2])),
        # datetime
        (
            [pd.Timestamp("2000"), pd.Timestamp("2001")],
            DatetimeArray._from_sequence(["2000", "2001"]),
        ),
        (
            [datetime.datetime(2000, 1, 1), datetime.datetime(2001, 1, 1)],
            DatetimeArray._from_sequence(["2000", "2001"]),
        ),
        (
            np.array([1, 2], dtype="M8[ns]"),
            DatetimeArray(np.array([1, 2], dtype="M8[ns]")),
        ),
        (
            np.array([1, 2], dtype="M8[us]"),
            DatetimeArray._simple_new(
                np.array([1, 2], dtype="M8[us]"), dtype=np.dtype("M8[us]")
            ),
        ),
        # datetimetz
        (
            [pd.Timestamp("2000", tz="CET"), pd.Timestamp("2001", tz="CET")],
            DatetimeArray._from_sequence(
                ["2000", "2001"], dtype=pd.DatetimeTZDtype(tz="CET")
            ),
        ),
        (
            [
                datetime.datetime(2000, 1, 1, tzinfo=cet),
                datetime.datetime(2001, 1, 1, tzinfo=cet),
            ],
            DatetimeArray._from_sequence(
                ["2000", "2001"], dtype=pd.DatetimeTZDtype(tz=cet)
            ),
        ),
        # timedelta
        (
            [pd.Timedelta("1H"), pd.Timedelta("2H")],
            TimedeltaArray._from_sequence(["1H", "2H"]),
        ),
        (
            np.array([1, 2], dtype="m8[ns]"),
            TimedeltaArray(np.array([1, 2], dtype="m8[ns]")),
        ),
        (
            np.array([1, 2], dtype="m8[us]"),
            TimedeltaArray(np.array([1, 2], dtype="m8[us]")),
        ),
        # integer
        ([1, 2], IntegerArray._from_sequence([1, 2])),
        ([1, None], IntegerArray._from_sequence([1, None])),
        ([1, pd.NA], IntegerArray._from_sequence([1, pd.NA])),
        ([1, np.nan], IntegerArray._from_sequence([1, np.nan])),
        # float
        ([0.1, 0.2], FloatingArray._from_sequence([0.1, 0.2])),
        ([0.1, None], FloatingArray._from_sequence([0.1, pd.NA])),
        ([0.1, np.nan], FloatingArray._from_sequence([0.1, pd.NA])),
        ([0.1, pd.NA], FloatingArray._from_sequence([0.1, pd.NA])),
        # integer-like float
        ([1.0, 2.0], FloatingArray._from_sequence([1.0, 2.0])),
        ([1.0, None], FloatingArray._from_sequence([1.0, pd.NA])),
        ([1.0, np.nan], FloatingArray._from_sequence([1.0, pd.NA])),
        ([1.0, pd.NA], FloatingArray._from_sequence([1.0, pd.NA])),
        # mixed-integer-float
        ([1, 2.0], FloatingArray._from_sequence([1.0, 2.0])),
        ([1, np.nan, 2.0], FloatingArray._from_sequence([1.0, None, 2.0])),
        # string
        (
            ["a", "b"],
            pd.StringDtype().construct_array_type()._from_sequence(["a", "b"]),
        ),
        (
            ["a", None],
            pd.StringDtype().construct_array_type()._from_sequence(["a", None]),
        ),
        # Boolean
        ([True, False], BooleanArray._from_sequence([True, False])),
        ([True, None], BooleanArray._from_sequence([True, None])),
    ],
)
def test_array_inference(data, expected):
    result = pd.array(data)
    tm.assert_equal(result, expected)


@pytest.mark.parametrize(
    "data",
    [
        # mix of frequencies
        [pd.Period("2000", "D"), pd.Period("2001", "A")],
        # mix of closed
        [pd.Interval(0, 1, closed="left"), pd.Interval(1, 2, closed="right")],
        # Mix of timezones
        [pd.Timestamp("2000", tz="CET"), pd.Timestamp("2000", tz="UTC")],
        # Mix of tz-aware and tz-naive
        [pd.Timestamp("2000", tz="CET"), pd.Timestamp("2000")],
        np.array([pd.Timestamp("2000"), pd.Timestamp("2000", tz="CET")]),
    ],
)
def test_array_inference_fails(data):
    result = pd.array(data)
    expected = PandasArray(np.array(data, dtype=object))
    tm.assert_extension_array_equal(result, expected)


@pytest.mark.parametrize("data", [np.array(0)])
def test_nd_raises(data):
    with pytest.raises(ValueError, match="PandasArray must be 1-dimensional"):
        pd.array(data, dtype="int64")


def test_scalar_raises():
    with pytest.raises(ValueError, match="Cannot pass scalar '1'"):
        pd.array(1)


def test_dataframe_raises():
    # GH#51167 don't accidentally cast to StringArray by doing inference on columns
    df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])
    msg = "Cannot pass DataFrame to 'pandas.array'"
    with pytest.raises(TypeError, match=msg):
        pd.array(df)


def test_bounds_check():
    # GH21796
    with pytest.raises(
        TypeError, match=r"cannot safely cast non-equivalent int(32|64) to uint16"
    ):
        pd.array([-1, 2, 3], dtype="UInt16")


# ---------------------------------------------------------------------------
# A couple dummy classes to ensure that Series and Indexes are unboxed before
# getting to the EA classes.


@register_extension_dtype
class DecimalDtype2(DecimalDtype):
    name = "decimal2"

    @classmethod
    def construct_array_type(cls):
        """
        Return the array type associated with this dtype.

        Returns
        -------
        type
        """
        return DecimalArray2


class DecimalArray2(DecimalArray):
    @classmethod
    def _from_sequence(cls, scalars, dtype=None, copy=False):
        if isinstance(scalars, (pd.Series, pd.Index)):
            raise TypeError("scalars should not be of type pd.Series or pd.Index")

        return super()._from_sequence(scalars, dtype=dtype, copy=copy)


def test_array_unboxes(index_or_series):
    box = index_or_series

    data = box([decimal.Decimal("1"), decimal.Decimal("2")])
    # make sure it works
    with pytest.raises(
        TypeError, match="scalars should not be of type pd.Series or pd.Index"
    ):
        DecimalArray2._from_sequence(data)

    result = pd.array(data, dtype="decimal2")
    expected = DecimalArray2._from_sequence(data.values)
    tm.assert_equal(result, expected)


def test_array_to_numpy_na():
    # GH#40638
    arr = pd.array([pd.NA, 1], dtype="string")
    result = arr.to_numpy(na_value=True, dtype=bool)
    expected = np.array([True, True])
    tm.assert_numpy_array_equal(result, expected)

Command used

pylint a.py

Pylint output

pylint crashed with a ``AstroidError`` and with the following stacktrace:
Traceback (most recent call last):
  File "/Users/.../pylint/pylint/lint/pylinter.py", line 788, in _lint_file
    check_astroid_module(module)
  File "/Users/.../pylint/pylint/lint/pylinter.py", line 1017, in check_astroid_module
    retval = self._check_astroid_module(
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/.../pylint/pylint/lint/pylinter.py", line 1063, in _check_astroid_module
    walker.walk(node)
  File "/Users/.../pylint/pylint/utils/ast_walker.py", line 94, in walk
    self.walk(child)
  File "/Users/.../pylint/pylint/utils/ast_walker.py", line 94, in walk
    self.walk(child)
  File "/Users/.../pylint/pylint/utils/ast_walker.py", line 94, in walk
    self.walk(child)
  File "/Users/.../pylint/pylint/utils/ast_walker.py", line 91, in walk
    callback(astroid)
  File "/Users/.../pylint/pylint/checkers/typecheck.py", line 1948, in visit_unaryop
    for error in node.type_errors():
                 ^^^^^^^^^^^^^^^^^^
  File "/Users/.../astroid/astroid/nodes/node_classes.py", line 4241, in type_errors
    return [
           ^
  File "/Users/.../astroid/astroid/nodes/node_classes.py", line 4241, in <listcomp>
    return [
           ^
  File "/Users/.../astroid/astroid/nodes/node_classes.py", line 4266, in _infer_unaryop
    for operand in self.operand.infer(context):
  File "/Users/.../astroid/astroid/nodes/node_ng.py", line 166, in infer
    yield from self._infer(context=context, **kwargs)
  File "/Users/.../astroid/astroid/decorators.py", line 103, in inner
    yield from generator
  File "/Users/.../astroid/astroid/decorators.py", line 49, in wrapped
    for res in _func(node, context, **kwargs):
  File "/Users/.../astroid/astroid/nodes/node_classes.py", line 1723, in _infer
    yield from callee.infer_call_result(
  File "/Users/.../astroid/astroid/nodes/scoped_nodes/scoped_nodes.py", line 1658, in infer_call_result
    yield from returnnode.value.infer(context)
  File "/Users/.../astroid/astroid/nodes/node_ng.py", line 179, in infer
    for i, result in enumerate(self._infer(context=context, **kwargs)):
  File "/Users/.../astroid/astroid/decorators.py", line 103, in inner
    yield from generator
  File "/Users/.../astroid/astroid/decorators.py", line 49, in wrapped
    for res in _func(node, context, **kwargs):
  File "/Users/.../astroid/astroid/nodes/node_classes.py", line 1723, in _infer
    yield from callee.infer_call_result(
  File "/Users/.../astroid/astroid/nodes/scoped_nodes/scoped_nodes.py", line 1658, in infer_call_result
    yield from returnnode.value.infer(context)
  File "/Users/.../astroid/astroid/nodes/node_ng.py", line 179, in infer
    for i, result in enumerate(self._infer(context=context, **kwargs)):
  File "/Users/.../astroid/astroid/decorators.py", line 103, in inner
    yield from generator
  File "/Users/.../astroid/astroid/decorators.py", line 49, in wrapped
    for res in _func(node, context, **kwargs):
  File "/Users/.../astroid/astroid/nodes/node_classes.py", line 1723, in _infer
    yield from callee.infer_call_result(
  File "/Users/.../astroid/astroid/nodes/scoped_nodes/scoped_nodes.py", line 1658, in infer_call_result
    yield from returnnode.value.infer(context)
  File "/Users/.../astroid/astroid/nodes/node_ng.py", line 179, in infer
    for i, result in enumerate(self._infer(context=context, **kwargs)):
  File "/Users/.../astroid/astroid/decorators.py", line 103, in inner
    yield from generator
  File "/Users/.../astroid/astroid/decorators.py", line 49, in wrapped
    for res in _func(node, context, **kwargs):
  File "/Users/.../astroid/astroid/nodes/node_classes.py", line 1723, in _infer
    yield from callee.infer_call_result(
  File "/Users/.../astroid/astroid/nodes/scoped_nodes/scoped_nodes.py", line 1658, in infer_call_result
    yield from returnnode.value.infer(context)
  File "/Users/.../astroid/astroid/nodes/node_ng.py", line 179, in infer
    for i, result in enumerate(self._infer(context=context, **kwargs)):
  File "/Users/.../astroid/astroid/nodes/node_classes.py", line 1875, in _infer
    lhs = list(left_node.infer(context=context))
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/.../astroid/astroid/nodes/node_ng.py", line 179, in infer
    for i, result in enumerate(self._infer(context=context, **kwargs)):
  File "/Users/.../astroid/astroid/decorators.py", line 49, in wrapped
    for res in _func(node, context, **kwargs):
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Users/.../astroid/astroid/nodes/node_classes.py", line 585, in _infer
    frame, stmts = self.lookup(self.name)
                   ^^^^
  File "/Users/.../astroid/astroid/nodes/node_classes.py", line 585, in _infer
    frame, stmts = self.lookup(self.name)
                   ^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/bdb.py", line 90, in trace_dispatch
    return self.dispatch_line(frame)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/Library/Frameworks/Python.framework/Versions/3.11/lib/python3.11/bdb.py", line 115, in dispatch_line
    if self.quitting: raise BdbQuit
                      ^^^^^^^^^^^^^
bdb.BdbQuit

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/Users/.../pylint/pylint/lint/pylinter.py", line 752, in _lint_files
    self._lint_file(fileitem, module, check_astroid_module)
  File "/Users/.../pylint/pylint/lint/pylinter.py", line 790, in _lint_file
    raise astroid.AstroidError from e
astroid.exceptions.AstroidError

Expected behavior

No crash.

Pylint version

pylint 3.0.0b1
astroid 3.0.0a7-dev0
Python 3.11.2 (v3.11.2:878ead1ac1, Feb  7 2023, 10:02:41) [Clang 13.0.0 (clang-1300.0.29.30)]

OS / Environment

darwin (Darwin)

Additional dependencies

@jacobtylerwalls jacobtylerwalls added this to the 3.0.0a7 milestone Jul 4, 2023
@jacobtylerwalls jacobtylerwalls self-assigned this Jul 4, 2023
@jacobtylerwalls (Member, Author)

This fixes it, but it's probably just a symptom of a deeper cause, so I'll investigate further.

diff --git a/astroid/nodes/node_classes.py b/astroid/nodes/node_classes.py
index 7bebac9d..885bafff 100644
--- a/astroid/nodes/node_classes.py
+++ b/astroid/nodes/node_classes.py
@@ -1870,6 +1870,7 @@ class Compare(NodeNG):
 
         ops = self.ops
         left_node = self.left
+        context = copy_context(context)
         lhs = list(left_node.infer(context=context))
         # should we break early if first element is uninferable?
         for op, right_node in ops:
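
For reference, here is a toy model of the failure mode. This is purely illustrative and is not astroid's actual InferenceContext, infer, or copy_context implementation: a mutable context that records which nodes it has already inferred yields nothing the second time the same node is inferred with it, so a caller that keeps reusing one context object sees "exhausted" inference.

import copy


class ToyContext:
    def __init__(self):
        # nodes already inferred with this context (stand-in for astroid's path bookkeeping)
        self.path = set()


def toy_infer(node, context):
    if node in context.path:
        return  # second visit with the same context: generator exhausts immediately
    context.path.add(node)
    yield f"value-of-{node}"


shared = ToyContext()
print(list(toy_infer("left_operand", shared)))  # ['value-of-left_operand']
print(list(toy_infer("left_operand", shared)))  # [] -- exhausted because the context was reused

# Inferring through a copy keeps the caller's context clean, so a later
# inference of the same node still produces a result. This mirrors the intent
# of the copy_context() call in the diff above.
caller_ctx = ToyContext()
print(list(toy_infer("left_operand", copy.deepcopy(caller_ctx))))  # ['value-of-left_operand']
print(list(toy_infer("left_operand", caller_ctx)))                 # ['value-of-left_operand'] -- unaffected

The sketch only shows why copying before inferring the left operand avoids starving later inferences; as noted above, the reused context is likely a symptom of a deeper cause.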

@jacobtylerwalls changed the title from "Inference failure from reused context in 3.0.0a6" to "Exhausted inference from reused context in 3.0.0a6" on Jul 5, 2023