[SPARK-4348] [PySpark] [MLlib] rename random.py to rand.py
This PR renames random.py to rand.py to avoid the side effects of its name conflicting with Python's random module, while keeping the same interface as before.

```
>>> from pyspark.mllib.random import RandomRDDs
```

```
$ pydoc pyspark.mllib.random
Help on module random in pyspark.mllib:
NAME
    random - Python package for random data generation.

FILE
    /Users/davies/work/spark/python/pyspark/mllib/rand.py

CLASSES
    __builtin__.object
        pyspark.mllib.random.RandomRDDs

    class RandomRDDs(__builtin__.object)
     |  Generator methods for creating RDDs comprised of i.i.d samples from
     |  some distribution.
     |
     |  Static methods defined here:
     |
     |  normalRDD(sc, size, numPartitions=None, seed=None)
```
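The pydoc output above shows a module that reports itself as `random` while its file is rand.py. A minimal sketch of that renaming trick follows, written for Python 3 and using hypothetical names (`demo_pkg`, a placeholder `RandomRDDs` class) rather than the real pyspark modules; it illustrates the idea and is not the code in this commit.

```python
import sys
import types

# Hypothetical stand-in for the module that lives in rand.py.
rand = types.ModuleType("demo_pkg.rand")


class RandomRDDs(object):
    """Placeholder for the real class; it has no behaviour here."""


rand.RandomRDDs = RandomRDDs

# Present the module and its class under the old public name.
rand.__name__ = "demo_pkg.random"
rand.RandomRDDs.__module__ = "demo_pkg.random"

# Register a hypothetical package plus the alias so the old import path resolves.
pkg = types.ModuleType("demo_pkg")
pkg.__path__ = []  # an empty __path__ marks the module as a package
pkg.random = rand
sys.modules["demo_pkg"] = pkg
sys.modules["demo_pkg.random"] = rand

# The old import statement keeps working even though no random.py exists.
from demo_pkg.random import RandomRDDs as Imported
```

Registering the alias in `sys.modules` is enough for a plain import; a meta path hook is only needed when the alias should be resolved lazily or for submodule-style names.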

cc mengxr

reference link: http://xion.org.pl/2012/05/06/hacking-python-imports/

Author: Davies Liu <[email protected]>

Closes apache#3216 from davies/random and squashes the following commits:

7ac4e8b [Davies Liu] rename random.py to rand.py

(cherry picked from commit ce0333f)
Signed-off-by: Josh Rosen <[email protected]>

Conflicts:
	python/pyspark/mllib/feature.py
	python/run-tests
Davies Liu authored and JoshRosen committed Jan 12, 2015
1 parent ee33699 commit 7ae5a1c
Showing 5 changed files with 35 additions and 15 deletions.
10 changes: 0 additions & 10 deletions python/pyspark/__init__.py
@@ -49,16 +49,6 @@
Main entry point for accessing data stored in Apache Hive..
"""

# The following block allows us to import python's random instead of mllib.random for scripts in
# mllib that depend on top level pyspark packages, which transitively depend on python's random.
# Since Python's import logic looks for modules in the current package first, we eliminate
# mllib.random as a candidate for C{import random} by removing the first search path, the script's
# location, in order to force the loader to look in Python's top-level modules for C{random}.
import sys
s = sys.path.pop(0)
import random
sys.path.insert(0, s)

from pyspark.conf import SparkConf
from pyspark.context import SparkContext
from pyspark.sql import SQLContext
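The removed workaround existed because a file named random.py in the script's own directory, which sits first on `sys.path`, can be found before the standard library's `random`. The sketch below reproduces that lookup order in Python 3 with a temporary directory standing in for the script's location; the directory and file are created only for the demonstration, and it does not cover the Python 2 implicit relative import behaviour mentioned in the removed comment.

```python
import importlib.machinery
import os
import sys
import tempfile

with tempfile.TemporaryDirectory() as script_dir:
    # Hypothetical local file that happens to share the stdlib module's name.
    with open(os.path.join(script_dir, "random.py"), "w") as f:
        f.write("# local module shadowing the standard library\n")

    # PathFinder searches the given entries in order and ignores sys.modules,
    # so it shows which file a fresh `import random` would pick up.
    shadowed = importlib.machinery.PathFinder.find_spec(
        "random", [script_dir] + sys.path)
    normal = importlib.machinery.PathFinder.find_spec("random", sys.path)

shadowed_origin = shadowed.origin  # the temporary random.py
normal_origin = normal.origin      # the standard library's random module
```

Renaming the file to rand.py removes the name clash itself, which is why the `sys.path.pop(0)` workaround could be deleted.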
34 changes: 34 additions & 0 deletions python/pyspark/mllib/__init__.py
@@ -24,3 +24,37 @@
import numpy
if numpy.version.version < '1.4':
    raise Exception("MLlib requires NumPy 1.4+")

__all__ = ['classification', 'clustering', 'linalg', 'random',
           'recommendation', 'regression', 'stat', 'tree', 'util']

import sys
import rand as random
random.__name__ = 'random'
random.RandomRDDs.__module__ = __name__ + '.random'


class RandomModuleHook(object):
    """
    Hook to import pyspark.mllib.random
    """
    fullname = __name__ + '.random'

    def find_module(self, name, path=None):
        # skip all other modules
        if not name.startswith(self.fullname):
            return
        return self

    def load_module(self, name):
        if name == self.fullname:
            return random

        cname = name.rsplit('.', 1)[-1]
        try:
            return getattr(random, cname)
        except AttributeError:
            raise ImportError


sys.meta_path.append(RandomModuleHook())
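The hook above uses the `find_module`/`load_module` protocol of the Python 2 era. As a rough Python 3 counterpart, the sketch below serves an existing module object under a second import name through `find_spec`, `create_module`, and `exec_module`. The class `AliasFinder` and the module names `demo_rand` and `demo_random` are hypothetical and chosen for illustration; this is one possible shape under those assumptions, not what pyspark ships.

```python
import importlib
import importlib.abc
import importlib.util
import sys
import types


class AliasFinder(importlib.abc.MetaPathFinder, importlib.abc.Loader):
    """Serve an already-built module object under an alias name."""

    def __init__(self, alias, target):
        self.alias = alias    # import name to intercept
        self.target = target  # module object to hand back

    def find_spec(self, fullname, path=None, target=None):
        # Skip all other modules so the regular finders handle them.
        if fullname != self.alias:
            return None
        return importlib.util.spec_from_loader(fullname, self)

    def create_module(self, spec):
        # Return the existing module instead of creating a fresh one.
        return self.target

    def exec_module(self, module):
        # Nothing to execute: the module is already populated.
        pass


# Hypothetical implementation module, standing in for rand.py.
impl = types.ModuleType("demo_rand")
impl.answer = 42

sys.meta_path.append(AliasFinder("demo_random", impl))
mod = importlib.import_module("demo_random")
```

Appending to `sys.meta_path`, as the commit does, means the hook is consulted only after the built-in, frozen, and path-based finders have failed to locate the name.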
4 changes: 0 additions & 4 deletions python/pyspark/mllib/linalg.py
@@ -267,8 +267,4 @@ def _test():
exit(-1)

if __name__ == "__main__":
    # remove current path from list of search paths to avoid importing mllib.random
    # for C{import random}, which is done in an external dependency of pyspark during doctests.
    import sys
    sys.path.pop(0)
    _test()
File renamed without changes.
2 changes: 1 addition & 1 deletion python/run-tests
@@ -73,7 +73,7 @@ run_test "pyspark/mllib/_common.py"
run_test "pyspark/mllib/classification.py"
run_test "pyspark/mllib/clustering.py"
run_test "pyspark/mllib/linalg.py"
run_test "pyspark/mllib/random.py"
run_test "pyspark/mllib/rand.py"
run_test "pyspark/mllib/recommendation.py"
run_test "pyspark/mllib/regression.py"
run_test "pyspark/mllib/stat.py"