[Python] Fix typo in serializer exception #21566

Closed
python/pyspark/serializers.py (17 changes: 9 additions & 8 deletions)
@@ -33,8 +33,9 @@
[0, 2, 4, 6, 8, 10, 12, 14, 16, 18]
>>> sc.stop()

-PySpark serialize objects in batches; By default, the batch size is chosen based
-on the size of objects, also configurable by SparkContext's C{batchSize} parameter:
+PySpark serializes objects in batches; by default, the batch size is chosen based
+on the size of objects and is also configurable by SparkContext's C{batchSize}
+parameter:

>>> sc = SparkContext('local', 'test', batchSize=2)
>>> rdd = sc.parallelize(range(16), 4).map(lambda x: x)
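The doctest above shows the knob in action. As a standalone sketch (master 'local' and the app name 'batch-demo' are placeholders): batchSize=2 frames two objects per serialized batch, 1 disables batching, and 0 lets PySpark choose a size based on the objects.

from pyspark import SparkContext

sc = SparkContext('local', 'batch-demo', batchSize=2)  # two objects per batch
rdd = sc.parallelize(range(16), 4).map(lambda x: x)
print(rdd.count())  # 16
sc.stop()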
@@ -100,7 +101,7 @@ def load_stream(self, stream):
def _load_stream_without_unbatching(self, stream):
"""
Return an iterator of deserialized batches (iterable) of objects from the input stream.
-if the serializer does not operate on batches the default implementation returns an
+If the serializer does not operate on batches the default implementation returns an
iterator of single element lists.
"""
return map(lambda x: [x], self.load_stream(stream))
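The default behavior documented here needs no Spark machinery to demonstrate; in this sketch a plain iterator stands in for self.load_stream(stream):

items = iter([1, 2, 3])              # stand-in for self.load_stream(stream)
batches = map(lambda x: [x], items)  # same wrapping as the line above
print(list(batches))                 # [[1], [2], [3]]

Downstream code can then iterate over batches uniformly, whether or not the concrete serializer batches its output.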
@@ -461,7 +462,7 @@ def dumps(self, obj):
return obj


-# Hook namedtuple, make it picklable
+# Hack namedtuple, make it picklable
Review comment (Member): Oh, haha. Hook and Hack.


__cls = {}

@@ -525,15 +526,15 @@ def namedtuple(*args, **kwargs):
cls = _old_namedtuple(*args, **kwargs)
return _hack_namedtuple(cls)

-# replace namedtuple with new one
+# replace namedtuple with the new one
collections.namedtuple.__globals__["_old_namedtuple_kwdefaults"] = _old_namedtuple_kwdefaults
collections.namedtuple.__globals__["_old_namedtuple"] = _old_namedtuple
collections.namedtuple.__globals__["_hack_namedtuple"] = _hack_namedtuple
collections.namedtuple.__code__ = namedtuple.__code__
collections.namedtuple.__hijack = 1

-# hack the cls already generated by namedtuple
-# those created in other module can be pickled as normal,
+# hack the cls already generated by namedtuple.
+# Those created in other modules can be pickled as normal,
# so only hack those in __main__ module
for n, o in sys.modules["__main__"].__dict__.items():
if (type(o) is type and o.__base__ is tuple
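For context on this hunk: pickle normally serializes a namedtuple instance by a reference to its class's defining module, so classes created on the fly in the driver's __main__ cannot be unpickled on a worker that never ran that code. The patched namedtuple sidesteps this by pickling (class name, fields, values) instead. A self-contained sketch of the idea; _restore and _hack_namedtuple here mirror the technique, not necessarily this file's exact code:

import collections
import pickle

def _restore(name, fields, values):
    # Recreate the namedtuple class on the unpickling side, then the instance.
    cls = collections.namedtuple(name, fields)
    return cls(*values)

def _hack_namedtuple(cls):
    # Replace by-reference pickling with (name, fields, values) so the class
    # can be rebuilt anywhere, even where __main__ is not importable.
    name, fields = cls.__name__, cls._fields
    cls.__reduce__ = lambda self: (_restore, (name, fields, tuple(self)))
    return cls

Point = _hack_namedtuple(collections.namedtuple("Point", ["x", "y"]))
p = pickle.loads(pickle.dumps(Point(1, 2)))
print(p)  # Point(x=1, y=2)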
@@ -627,7 +628,7 @@ def loads(self, obj):
elif _type == b'P':
return pickle.loads(obj[1:])
else:
raise ValueError("invalid sevialization type: %s" % _type)
raise ValueError("invalid serialization type: %s" % _type)


class CompressedSerializer(FramedSerializer):
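For context on the corrected message: this loads dispatches on a one-byte tag that the matching dumps prepends to every frame to record which codec produced it (b'P' for pickle; PySpark's AutoSerializer pairs it with b'M' for marshal). A minimal sketch of the tag-byte pattern, not this file's exact code:

import pickle

def dumps(obj):
    # Tag the frame so loads() knows which codec to apply.
    return b'P' + pickle.dumps(obj)

def loads(data):
    tag, payload = data[:1], data[1:]
    if tag == b'P':
        return pickle.loads(payload)
    raise ValueError("invalid serialization type: %s" % tag)

print(loads(dumps({"a": 1})))  # {'a': 1}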