Moto server #35655

Merged on Aug 21, 2020 (70 commits)
Commits (the file diff below shows changes from 48 of them):
3a54dde (Jul 10, 2020) Pepper storage_options
f0922c4 (Jul 10, 2020) Merge branch 'master' into storage_options
e549f8d (Jul 22, 2020) Add feather test
e8540c4 (Jul 22, 2020) Merge branch 'master' into storage_options
0034bff (Jul 22, 2020) Add CSV and parquet options tests; lint
19f041d (Jul 22, 2020) deeper lint
f9e1e69 (Jul 22, 2020) more tests
7f69afe (Jul 22, 2020) blank line
cc0e4c3 (Jul 22, 2020) attempt relint
e356e93 (Jul 22, 2020) unused import
c7170dd (Jul 22, 2020) more order
b96778d (Jul 23, 2020) plumb stata and test
1dc41b1 (Jul 23, 2020) Add note about storage_options in whatsnew
d882984 (Jul 23, 2020) Plumb and test markdown
f1e455d (Jul 23, 2020) optional markdown
c88b75f (Jul 23, 2020) remove extraneous
58481a4 (Jul 23, 2020) more extraneous
704770b (Jul 24, 2020) Add fsspec options error and docstrings
1b8637e (Jul 24, 2020) fix that
bbcef17 (Jul 24, 2020) black
a18686c (Jul 24, 2020) fix it again
fa656cb (Jul 24, 2020) more lint
e8d5312 (Jul 27, 2020) Merge branch 'master' into storage_options
a79a274 (Jul 27, 2020) Requested changes
28d6d38 (Jul 28, 2020) Make moto server process instead of monkey
97c7263 (Jul 29, 2020) Merge branch 'master' into storage_options
e99f8ed (Jul 29, 2020) Update versions
aa2751e (Jul 29, 2020) Merge branch 'storage_options' into moto_server
0a2fc29 (Jul 29, 2020) black and start excel
6ce6ecc (Jul 29, 2020) Plumb excel
23f4fc4 (Jul 31, 2020) Merge branch 'master' into storage_options
3ec9342 (Jul 31, 2020) Merge branch 'storage_options' into moto_server
4732729 (Aug 10, 2020) Merge branch 'master' into moto_server
a1dba75 (Aug 10, 2020) fix merge
e2717db (Aug 10, 2020) isort
e9ed76f (Aug 10, 2020) option typo
1fb4b40 (Aug 10, 2020) remove moto variable
e646c16 (Aug 10, 2020) Add flask where there is moto
f61bf0b (Aug 10, 2020) specific options for s3
d7e5b4a (Aug 10, 2020) skip some; fix unrelated HDF arg order
df6d48f (Aug 10, 2020) rerun generate_pip_deps
84a8149 (Aug 10, 2020) try simpler
32cf3e1 (Aug 10, 2020) try again
867c985 (Aug 11, 2020) Check moto not py
ee7b156 (Aug 11, 2020) Merge branch 'master' into moto_server
65ec74f (Aug 11, 2020) Suggestions
e5fa341 (Aug 11, 2020) maybe mypy fix
8ab2702 (Aug 12, 2020) Move fsspec fixture imports; add whatsnew note
476e96a (Aug 14, 2020) responses
8ae9189 (Aug 14, 2020) relint
eab08b9 (Aug 14, 2020) update moto deps
b13614b (Aug 14, 2020) latest on windows env
4c2c1a0 (Aug 14, 2020) Add kwargs
9c4124d (Aug 14, 2020) try in win-py38 env
3449870 (Aug 17, 2020) Merge branch 'master' into moto_server
09a8e8e (Aug 18, 2020) Env only
51bb02b (Aug 18, 2020) Revert - reintroduce code
8387ea6 (Aug 18, 2020) Skip test on win
5e2a86f (Aug 18, 2020) not "reason" in skip
8711c96 (Aug 18, 2020) typo
a8a34a0 (Aug 20, 2020) Merge branch 'master' into moto_server
6d21fa0 (Aug 20, 2020) Fewer moto server processes
1e1d3fe (Aug 20, 2020) Dumb
9f8ad5a (Aug 20, 2020) With exceptions
015512d (Aug 20, 2020) Skip old pyarrow; remove moto where no s3fs in env
989a9f1 (Aug 20, 2020) sign error
7617256 (Aug 20, 2020) Add arrow to env that was getting it anyway (but an old one)
4624cef (Aug 21, 2020) Add reasonable timeouts
f5c2a44 (Aug 21, 2020) lint
5d50cda (Aug 21, 2020) small revert
1 change: 1 addition & 0 deletions ci/deps/azure-37-locale.yaml
@@ -21,6 +21,7 @@ dependencies:
   - lxml
   - matplotlib>=3.3.0
   - moto
+  - flask
   - nomkl
   - numexpr
   - numpy=1.16.*
1 change: 1 addition & 0 deletions ci/deps/azure-37-slow.yaml
@@ -33,3 +33,4 @@ dependencies:
   - xlsxwriter
   - xlwt
   - moto
+  - flask
2 changes: 2 additions & 0 deletions ci/deps/azure-38-locale.yaml
@@ -14,6 +14,7 @@ dependencies:

   # pandas dependencies
   - beautifulsoup4
+  - flask
   - html5lib
   - ipython
   - jinja2
@@ -32,6 +33,7 @@ dependencies:
   - xlrd
   - xlsxwriter
   - xlwt
+  - moto
   - pyarrow>=0.15
   - pip
   - pip:
1 change: 1 addition & 0 deletions ci/deps/azure-windows-37.yaml
@@ -22,6 +22,7 @@ dependencies:
   - lxml
   - matplotlib=2.2.*
   - moto
+  - flask
   - numexpr
   - numpy=1.16.*
   - openpyxl
1 change: 1 addition & 0 deletions ci/deps/travis-37-arm64.yaml
@@ -17,5 +17,6 @@ dependencies:
   - python-dateutil
   - pytz
   - pip
+  - flask
   - pip:
     - moto
1 change: 1 addition & 0 deletions ci/deps/travis-37-cov.yaml
@@ -24,6 +24,7 @@ dependencies:
   - html5lib
   - matplotlib
   - moto
+  - flask
   - nomkl
   - numexpr
   - numpy=1.16.*
1 change: 1 addition & 0 deletions ci/deps/travis-37-locale.yaml
@@ -22,6 +22,7 @@ dependencies:
   - lxml=4.3.0
   - matplotlib=3.0.*
   - moto
+  - flask
   - nomkl
   - numexpr
   - numpy
3 changes: 3 additions & 0 deletions doc/source/whatsnew/v1.2.0.rst
@@ -24,6 +24,9 @@ of the individual storage backends (detailed from the fsspec docs for
 `builtin implementations`_ and linked to `external ones`_). See
 Section :ref:`io.remote`.

+:issue:`35655` added fsspec support (including ``storage_options``)
+for reading excel files.
+
 .. _builtin implementations: https://filesystem-spec.readthedocs.io/en/latest/api.html#built-in-implementations
 .. _external ones: https://filesystem-spec.readthedocs.io/en/latest/api.html#other-known-implementations

1 change: 1 addition & 0 deletions environment.yml
@@ -52,6 +52,7 @@ dependencies:
   - botocore>=1.11
   - hypothesis>=3.82
   - moto # mock S3
+  - flask
   - pytest>=5.0.1
   - pytest-cov
   - pytest-xdist>=1.21
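Every CI environment that already lists moto gains flask in these files. As far as I know, that is because moto's standalone server mode runs on Flask. Commit 28d6d38 ("Make moto server process instead of monkey") suggests the tests switch from in-process patching to a separate moto server process. A fixture that starts such a process also has to wait, with a bounded timeout, until the server accepts connections (compare commit 4624cef, "Add reasonable timeouts"). The helper below is a minimal stdlib-only sketch of that readiness check. `wait_for_port` is a hypothetical name and not a pandas or moto API. Polling a raw TCP port is also an assumption; a fixture could just as well poll an HTTP endpoint.

```python
import socket
import time


def wait_for_port(
    host: str, port: int, timeout: float = 5.0, interval: float = 0.1
) -> bool:
    """Return True once (host, port) accepts a TCP connection, else False after `timeout` s.

    Hypothetical helper: a test fixture could call this right after starting a
    mock-server subprocess instead of sleeping for a fixed amount of time.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means the server is listening; close it right away.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: the server is not up yet, so back off briefly.
            time.sleep(interval)
    return False


# Demo: a throwaway listening socket stands in for the mock server.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
listener.listen()
port = listener.getsockname()[1]

ready = wait_for_port("127.0.0.1", port, timeout=2.0)
listener.close()
gone = wait_for_port("127.0.0.1", port, timeout=0.3)
print(ready, gone)
```

A fixture built on this would follow a fixed sequence. It starts the server with `subprocess.Popen` and calls `wait_for_port(...)`, skipping or failing if the call returns False. It then yields, and finally calls `terminate()` and `wait()` on the process. The exact `moto_server` command line depends on the installed moto version, so check its `--help` rather than relying on this sketch.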
30 changes: 23 additions & 7 deletions pandas/io/excel/_base.py
@@ -3,11 +3,12 @@
 from io import BufferedIOBase, BytesIO, RawIOBase
 import os
 from textwrap import fill
-from typing import Union
+from typing import Any, Mapping, Union

 from pandas._config import config

 from pandas._libs.parsers import STR_NA_VALUES
+from pandas._typing import StorageOptions
 from pandas.errors import EmptyDataError
 from pandas.util._decorators import Appender, deprecate_nonkeyword_arguments

@@ -199,6 +200,15 @@
     Duplicate columns will be specified as 'X', 'X.1', ...'X.N', rather than
     'X'...'X'. Passing in False will cause data to be overwritten if there
     are duplicate names in the columns.
+storage_options : StorageOptions
+    Extra options that make sense for a particular storage connection, e.g.
+    host, port, username, password, etc., if using a URL that will
+    be parsed by ``fsspec``, e.g., starting "s3://", "gcs://". An error
+    will be raised if providing this argument with a local path or
+    a file-like buffer. See the fsspec and backend storage implementation
+    docs for the set of allowed keys and values
+
+    .. versionadded:: 1.2.0

 Returns
 -------
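The new docstring promises an error when `storage_options` is combined with a local path or a file-like buffer. The check itself is not part of this diff. It is presumably enforced in `get_filepath_or_buffer`, which now receives the options. The function below is a hypothetical, stdlib-only sketch of the rule as the docstring states it. It assumes that "a URL" simply means a string containing `://`, and pandas' actual condition may differ in its details. The `{"anon": True}` option is an s3fs-style key used purely as an example.

```python
import io
import os
from pathlib import Path
from typing import Any, Dict, Optional


def check_storage_options(
    path_or_buf: Any, storage_options: Optional[Dict[str, Any]]
) -> None:
    """Raise ValueError if storage_options are given with a local path or a buffer.

    Hypothetical sketch of the documented rule: options only make sense for
    URLs handed to fsspec, e.g. "s3://..." or "gcs://...".
    """
    if not storage_options:
        # None or an empty dict: nothing to validate.
        return
    if isinstance(path_or_buf, os.PathLike):
        # pathlib.Path and similar objects always denote local paths.
        path_or_buf = os.fspath(path_or_buf)
    if not isinstance(path_or_buf, str) or "://" not in path_or_buf:
        raise ValueError(
            "storage_options were passed with a local path or a file-like buffer"
        )


# Usage: a URL is accepted; a local path, a Path object and a buffer are rejected.
check_storage_options("s3://bucket/book.xlsx", {"anon": True})
check_storage_options("book.xlsx", None)
rejected = []
for candidate in ("book.xlsx", Path("book.xlsx"), io.BytesIO(b"")):
    try:
        check_storage_options(candidate, {"anon": True})
    except ValueError:
        rejected.append(type(candidate).__name__)
print(rejected)
```

Failing loudly is the friendlier design here. Silently ignoring options that a user clearly intended for a remote filesystem would hide configuration mistakes.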
@@ -298,10 +308,11 @@ def read_excel(
     skipfooter=0,
     convert_float=True,
     mangle_dupe_cols=True,
+    storage_options: StorageOptions = None,
 ):

     if not isinstance(io, ExcelFile):
-        io = ExcelFile(io, engine=engine)
+        io = ExcelFile(io, storage_options=storage_options, engine=engine)
     elif engine and engine != io.engine:
         raise ValueError(
             "Engine should not be specified when passing "
@@ -336,12 +347,14 @@ def read_excel(


 class _BaseExcelReader(metaclass=abc.ABCMeta):
-    def __init__(self, filepath_or_buffer):
+    def __init__(self, filepath_or_buffer, storage_options: StorageOptions = None):
         # If filepath_or_buffer is a url, load the data into a BytesIO
         if is_url(filepath_or_buffer):
             filepath_or_buffer = BytesIO(urlopen(filepath_or_buffer).read())
         elif not isinstance(filepath_or_buffer, (ExcelFile, self._workbook_class)):
-            filepath_or_buffer, _, _, _ = get_filepath_or_buffer(filepath_or_buffer)
+            filepath_or_buffer, _, _, _ = get_filepath_or_buffer(
+                filepath_or_buffer, storage_options=storage_options
+            )

         if isinstance(filepath_or_buffer, self._workbook_class):
             self.book = filepath_or_buffer
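The branch order in `_BaseExcelReader.__init__` matters. A plain URL, as detected by `is_url`, is downloaded in full with `urlopen` and wrapped in a `BytesIO`. `storage_options` is forwarded only on the `get_filepath_or_buffer` branch. Here is a stdlib-only sketch of that first branch. `load_url_into_buffer` is a hypothetical name, and unlike the one-liner in the diff it closes the response through a `with` block. So that the example runs offline, a `file://` URL for a temporary file stands in for a remote workbook.

```python
import tempfile
from io import BytesIO
from pathlib import Path
from urllib.request import urlopen


def load_url_into_buffer(url: str) -> BytesIO:
    """Read the whole resource at `url` into an in-memory, seekable buffer."""
    with urlopen(url) as response:  # the with-block closes the handle afterwards
        return BytesIO(response.read())


# Demo: a temporary file exposed through a file:// URL plays the remote workbook.
payload = b"PK\x03\x04 not really a workbook"
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "book.xlsx"
    source.write_bytes(payload)
    buf = load_url_into_buffer(source.as_uri())

print(buf.getvalue() == payload, buf.tell())
```

Buffering the whole download is simple and hands every engine a seekable object. The cost is holding the entire workbook in memory.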
@@ -837,14 +850,16 @@ class ExcelFile:
     from pandas.io.excel._pyxlsb import _PyxlsbReader
     from pandas.io.excel._xlrd import _XlrdReader

-    _engines = {
+    _engines: Mapping[str, Any] = {
         "xlrd": _XlrdReader,
         "openpyxl": _OpenpyxlReader,
         "odf": _ODFReader,
         "pyxlsb": _PyxlsbReader,
     }

-    def __init__(self, path_or_buffer, engine=None):
+    def __init__(
+        self, path_or_buffer, storage_options: StorageOptions = None, engine=None
+    ):
         if engine is None:
             engine = "xlrd"
         if isinstance(path_or_buffer, (BufferedIOBase, RawIOBase)):

Review comment on the new __init__ signature:
Contributor: Put storage_options after engine, for users passing that as a position?
Contributor Author: done
@@ -858,13 +873,14 @@ def __init__(self, path_or_buffer, engine=None):
             raise ValueError(f"Unknown engine: {engine}")

         self.engine = engine
+        self.storage_options = storage_options

         # Could be a str, ExcelFile, Book, etc.
         self.io = path_or_buffer
         # Always a string
         self._io = stringify_path(path_or_buffer)

-        self._reader = self._engines[engine](self._io)
+        self._reader = self._engines[engine](self._io, storage_options=storage_options)

     def __fspath__(self):
         return self._io
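The review comment on `ExcelFile.__init__` concerns positional callers. Before this PR the signature was `(self, path_or_buffer, engine=None)`, so a call such as `ExcelFile(path, "openpyxl")` was valid. Inserting the new parameter ahead of `engine` silently rebinds that second positional argument. The diff as captured here still shows the earlier ordering. The two functions below are hypothetical and not pandas APIs; they reproduce the effect with a plain dict as the return value.

```python
from typing import Any, Dict, Optional


def open_new_param_first(
    path, storage_options: Optional[Dict[str, Any]] = None, engine=None
):
    """Hypothetical signature with the new parameter inserted before `engine`."""
    return {"path": path, "engine": engine, "storage_options": storage_options}


def open_new_param_last(
    path, engine=None, storage_options: Optional[Dict[str, Any]] = None
):
    """Hypothetical signature with the new parameter appended after `engine`."""
    return {"path": path, "engine": engine, "storage_options": storage_options}


# An existing caller written against the old (path, engine=None) signature.
first = open_new_param_first("book.xlsx", "openpyxl")
last = open_new_param_last("book.xlsx", "openpyxl")

print(first)  # "openpyxl" lands in storage_options and engine silently becomes None
print(last)   # engine is still "openpyxl"
```

There are two ways to keep existing positional calls working. One is to append new optional parameters at the end, as the reviewer asked. The other is to make them keyword-only with a bare `*`.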
14 changes: 10 additions & 4 deletions pandas/io/excel/_odfreader.py
@@ -2,7 +2,7 @@

 import numpy as np

-from pandas._typing import FilePathOrBuffer, Scalar
+from pandas._typing import FilePathOrBuffer, Scalar, StorageOptions
 from pandas.compat._optional import import_optional_dependency

 import pandas as pd
@@ -16,13 +16,19 @@ class _ODFReader(_BaseExcelReader):

     Parameters
     ----------
-    filepath_or_buffer: string, path to be parsed or
+    filepath_or_buffer : string, path to be parsed or
         an open readable stream.
+    storage_options : StorageOptions
+        passed to fsspec for appropriate URLs (see ``get_filepath_or_buffer``)
     """

-    def __init__(self, filepath_or_buffer: FilePathOrBuffer):
+    def __init__(
+        self,
+        filepath_or_buffer: FilePathOrBuffer,
+        storage_options: StorageOptions = None,
+    ):
         import_optional_dependency("odf")
-        super().__init__(filepath_or_buffer)
+        super().__init__(filepath_or_buffer, storage_options=storage_options)

     @property
     def _workbook_class(self):
12 changes: 9 additions & 3 deletions pandas/io/excel/_openpyxl.py
@@ -2,7 +2,7 @@

 import numpy as np

-from pandas._typing import FilePathOrBuffer, Scalar
+from pandas._typing import FilePathOrBuffer, Scalar, StorageOptions
 from pandas.compat._optional import import_optional_dependency

 from pandas.io.excel._base import ExcelWriter, _BaseExcelReader
@@ -467,17 +467,23 @@ def write_cells(


 class _OpenpyxlReader(_BaseExcelReader):
-    def __init__(self, filepath_or_buffer: FilePathOrBuffer) -> None:
+    def __init__(
+        self,
+        filepath_or_buffer: FilePathOrBuffer,
+        storage_options: StorageOptions = None,
+    ) -> None:
         """
         Reader using openpyxl engine.

         Parameters
         ----------
         filepath_or_buffer : string, path object or Workbook
             Object to be parsed.
+        storage_options : StorageOptions
+            passed to fsspec for appropriate URLs (see ``get_filepath_or_buffer``)
         """
         import_optional_dependency("openpyxl")
-        super().__init__(filepath_or_buffer)
+        super().__init__(filepath_or_buffer, storage_options=storage_options)

     @property
     def _workbook_class(self):
14 changes: 10 additions & 4 deletions pandas/io/excel/_pyxlsb.py
@@ -1,25 +1,31 @@
 from typing import List

-from pandas._typing import FilePathOrBuffer, Scalar
+from pandas._typing import FilePathOrBuffer, Scalar, StorageOptions
 from pandas.compat._optional import import_optional_dependency

 from pandas.io.excel._base import _BaseExcelReader


 class _PyxlsbReader(_BaseExcelReader):
-    def __init__(self, filepath_or_buffer: FilePathOrBuffer):
+    def __init__(
+        self,
+        filepath_or_buffer: FilePathOrBuffer,
+        storage_options: StorageOptions = None,
+    ):
         """
         Reader using pyxlsb engine.

         Parameters
         ----------
-        filepath_or_buffer: str, path object, or Workbook
+        filepath_or_buffer : str, path object, or Workbook
             Object to be parsed.
+        storage_options : StorageOptions
+            passed to fsspec for appropriate URLs (see ``get_filepath_or_buffer``)
         """
         import_optional_dependency("pyxlsb")
         # This will call load_workbook on the filepath or buffer
         # And set the result to the book-attribute
-        super().__init__(filepath_or_buffer)
+        super().__init__(filepath_or_buffer, storage_options=storage_options)

     @property
     def _workbook_class(self):
7 changes: 5 additions & 2 deletions pandas/io/excel/_xlrd.py
@@ -2,24 +2,27 @@

 import numpy as np

+from pandas._typing import StorageOptions
 from pandas.compat._optional import import_optional_dependency

 from pandas.io.excel._base import _BaseExcelReader


 class _XlrdReader(_BaseExcelReader):
-    def __init__(self, filepath_or_buffer):
+    def __init__(self, filepath_or_buffer, storage_options: StorageOptions = None):
         """
         Reader using xlrd engine.

         Parameters
         ----------
         filepath_or_buffer : string, path object or Workbook
             Object to be parsed.
+        storage_options : StorageOptions
+            passed to fsspec for appropriate URLs (see ``get_filepath_or_buffer``)
         """
         err_msg = "Install xlrd >= 1.0.0 for Excel support"
         import_optional_dependency("xlrd", extra=err_msg)
-        super().__init__(filepath_or_buffer)
+        super().__init__(filepath_or_buffer, storage_options=storage_options)

     @property
     def _workbook_class(self):
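Each engine reader (`_ODFReader`, `_OpenpyxlReader`, `_PyxlsbReader`, `_XlrdReader`) changes in the same two places. Its `__init__` accepts `storage_options`, and its `super().__init__` call forwards it. Skipping the second step in any one subclass would drop the options without raising anything. The sketch below demonstrates that failure mode with hypothetical stand-ins. `resolve_source` plays the role of `get_filepath_or_buffer` and only records what it receives. The class names and the `ENGINES` registry are invented for illustration. The `StorageOptions` alias shape is assumed to be an optional dict.

```python
from typing import Any, Dict, Optional

# Assumed alias shape for illustration: an optional mapping of backend options.
StorageOptions = Optional[Dict[str, Any]]


def resolve_source(src: str, storage_options: StorageOptions = None) -> Dict[str, Any]:
    """Hypothetical stand-in for get_filepath_or_buffer: just records its inputs."""
    return {"src": src, "storage_options": storage_options}


class BaseReader:
    def __init__(self, src: str, storage_options: StorageOptions = None):
        self.resolved = resolve_source(src, storage_options=storage_options)


class ForwardingReader(BaseReader):
    def __init__(self, src: str, storage_options: StorageOptions = None):
        # The pattern each reader in the diff follows: pass the options up explicitly.
        super().__init__(src, storage_options=storage_options)


class ForgetfulReader(BaseReader):
    def __init__(self, src: str, storage_options: StorageOptions = None):
        # Deliberate bug: the options are accepted but never forwarded.
        super().__init__(src)


# Engine-name registry, similar in spirit to ExcelFile._engines.
ENGINES: Dict[str, Any] = {"forwarding": ForwardingReader, "forgetful": ForgetfulReader}

options = {"anon": True}  # example option only
for name, reader_cls in ENGINES.items():
    reader = reader_cls("s3://bucket/book.xlsx", storage_options=options)
    print(name, reader.resolved["storage_options"])
```

Passing the options by keyword, as the diff does, also protects the forwarding calls against future changes to the order of the base-class parameters.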
7 changes: 5 additions & 2 deletions pandas/io/feather_format.py
@@ -1,13 +1,14 @@
 """ feather-format compat """

+from pandas._typing import StorageOptions
 from pandas.compat._optional import import_optional_dependency

 from pandas import DataFrame, Int64Index, RangeIndex

 from pandas.io.common import get_filepath_or_buffer


-def to_feather(df: DataFrame, path, storage_options=None, **kwargs):
+def to_feather(df: DataFrame, path, storage_options: StorageOptions = None, **kwargs):
     """
     Write a DataFrame to the binary Feather format.

@@ -77,7 +78,9 @@ def to_feather(df: DataFrame, path, storage_options=None, **kwargs):
     feather.write_feather(df, path, **kwargs)


-def read_feather(path, columns=None, use_threads: bool = True, storage_options=None):
+def read_feather(
+    path, columns=None, use_threads: bool = True, storage_options: StorageOptions = None
+):
     """
     Load a feather-format object from the file path.

4 changes: 2 additions & 2 deletions pandas/io/parsers.py
@@ -20,7 +20,7 @@
 import pandas._libs.parsers as parsers
 from pandas._libs.parsers import STR_NA_VALUES
 from pandas._libs.tslibs import parsing
-from pandas._typing import FilePathOrBuffer, Union
+from pandas._typing import FilePathOrBuffer, StorageOptions, Union
 from pandas.errors import (
     AbstractMethodError,
     EmptyDataError,
@@ -596,7 +596,7 @@ def read_csv(
     low_memory=_c_parser_defaults["low_memory"],
     memory_map=False,
     float_precision=None,
-    storage_options=None,
+    storage_options: StorageOptions = None,
 ):
     # gh-23761
     #