removed LabeledPoint from Python LDA internals #3

Closed
wants to merge 5 commits into from

Conversation

jkbradley

Please review, and feel free to either merge or take what you need.

yu-iskw and others added 5 commits June 13, 2015 05:19
TODO: LDAModel.describeTopics() must also be implemented in Python,
but that would be better handled in a separate issue. Implementing it is
a little hard, since the return value of `describeTopics` in Scala
consists of Tuple classes.
@yu-iskw
Owner

yu-iskw commented Jun 20, 2015

One thing: we can't use the `L` suffix to write a long integer literal in Python 3.4. How should we support long integers in both Python 2.x and Python 3.x?

"All integers are implemented as “long” integer objects of arbitrary size."
https://docs.python.org/3/c-api/long.html

$ python3.4

Python 3.4.2 (default, Jun 13 2015, 00:21:09)
[GCC 4.2.1 Compatible Apple LLVM 6.0 (clang-600.0.57)] on darwin
Type "help", "copyright", "credits" or "license" for more information.
>>> 1L
  File "<stdin>", line 1
    1L
     ^
SyntaxError: invalid syntax
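
One common way around the missing `L` suffix is to avoid the literal and go through a constructor that exists on both interpreters. The snippet below is only a minimal sketch of that idea, not what this pull request does: the `to_long` helper is a hypothetical name introduced here for illustration, and aliasing `long` to `int` on Python 3 is an assumption about how the compatibility layer could look.

```python
import sys

# Python 3 merged `long` into `int`, so the name `long` no longer exists there.
# Assumption: aliasing it keeps one code path for both interpreters.
if sys.version_info[0] >= 3:
    long = int


def to_long(value):
    """Hypothetical helper: build an arbitrary-size integer without the `L` suffix.

    On Python 2 this returns a `long`; on Python 3 it returns an `int`.
    """
    return long(value)


# Works with numbers and with decimal strings on both Python 2 and Python 3.
doc_id = to_long(1)
big_doc_id = to_long("12345678901234567890")
```

Because the value is built at run time, the source file parses on both Python 2.x and Python 3.x, which the literal `1L` does not.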

@yu-iskw
Owner

yu-iskw commented Jun 20, 2015

I surveyed how the int and long int types in Python 2.6 and Python 3.4 are treated in Java.

According to my research, if we want to use long integers for the document index in LDA, we should go with LabeledPoint or a new class like it, because it is quite difficult to treat int and long int consistently between Python 2.6/Python 3.4 and Java. Could you please check my experiment below?

https://gist.github.com/yu-iskw/12e92c2d718ca41dea90

Thanks
Yu
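
Part of the difficulty described above is that Python integers are unbounded while a Java `long` is a signed 64-bit value. As a rough illustration, and not something taken from the linked gist, a document index could be range-checked on the Python side before it is sent to the JVM. The names `check_doc_index`, `JAVA_LONG_MIN`, and `JAVA_LONG_MAX` are hypothetical and exist only for this sketch.

```python
import numbers

# Bounds of a signed 64-bit Java long.
JAVA_LONG_MIN = -(2 ** 63)
JAVA_LONG_MAX = 2 ** 63 - 1


def check_doc_index(value):
    """Hypothetical guard: return `value` if it can be stored in a Java long.

    numbers.Integral covers `int` on Python 3 and both `int` and `long`
    on Python 2, so the same check runs on either interpreter.
    """
    # bool is a subclass of int; reject it so True/False are not taken as ids.
    if isinstance(value, bool) or not isinstance(value, numbers.Integral):
        raise TypeError("document index must be an integer")
    if not JAVA_LONG_MIN <= value <= JAVA_LONG_MAX:
        raise OverflowError("document index does not fit in a Java long")
    return value


largest_ok = check_doc_index(2 ** 63 - 1)
```

A check like this only catches out-of-range values early; it does not settle which Java type the serializer picks for a given Python integer, which is the harder question raised in the survey.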

@jkbradley
Author

@yu-iskw I left a few comments on the survey about type conversions for Python versions. Thank you for researching that so thoroughly! I don't think I have answers for all of your comments.

These complexities make me wonder if we should go through the DataFrame API instead. Perhaps the Python RDD can be converted into a DataFrame, which can be passed to the Java stub for LDA. That way, you can rely on DataFrame SerDe for type conversions, and that should be robust and well-tested.

@yu-iskw
Owner

yu-iskw commented Jun 20, 2015

@jkbradley thank you for your feedback. That's a good idea! Using the DataFrame API would be nice. I'll try it!
If the DataFrame API doesn't handle the type conversions well, I think we should go with LabeledPoint or create a wrapper class.
Thanks!

yu-iskw pushed a commit that referenced this pull request Jul 2, 2015
Fix for incorrect memory in Spark UI as per SPARK-5768

Author: Joshi <[email protected]>
Author: Rekha Joshi <[email protected]>

Closes apache#6972 from rekhajoshm/SPARK-5768 and squashes the following commits:

b678a91 [Joshi] Fix for incorrect memory in Spark UI
2fe53d9 [Joshi] Fix for incorrect memory in Spark UI
eb823b8 [Joshi] SPARK-5768: Fix for incorrect memory in Spark UI
0be142d [Rekha Joshi] Merge pull request #3 from apache/master
106fd8e [Rekha Joshi] Merge pull request #2 from apache/master
e3677c9 [Rekha Joshi] Merge pull request #1 from apache/master
yu-iskw pushed a commit that referenced this pull request Jul 2, 2015
… without side effects.

Fix for SparkContext stop behavior - Allow sc.stop() to be called multiple times without side effects.

Author: Joshi <[email protected]>
Author: Rekha Joshi <[email protected]>

Closes apache#6973 from rekhajoshm/SPARK-2645 and squashes the following commits:

277043e [Joshi] Fix for SparkContext stop behavior
446b0a4 [Joshi] Fix for SparkContext stop behavior
2ce5760 [Joshi] Fix for SparkContext stop behavior
c97839a [Joshi] Fix for SparkContext stop behavior
1aff39c [Joshi] Fix for SparkContext stop behavior
12f66b5 [Joshi] Fix for SparkContext stop behavior
72bb484 [Joshi] Fix for SparkContext stop behavior
a5a7d7f [Joshi] Fix for SparkContext stop behavior
9193a0c [Joshi] Fix for SparkContext stop behavior
58dba70 [Joshi] SPARK-2645: Fix for SparkContext stop behavior
380c5b0 [Joshi] SPARK-2645: Fix for SparkContext stop behavior
b566b66 [Joshi] SPARK-2645: Fix for SparkContext stop behavior
0be142d [Rekha Joshi] Merge pull request #3 from apache/master
106fd8e [Rekha Joshi] Merge pull request #2 from apache/master
e3677c9 [Rekha Joshi] Merge pull request #1 from apache/master
yu-iskw pushed a commit that referenced this pull request Jul 21, 2015
This makes sure attempts are listed in the order they were executed, and that the
app's state matches the state of the most current attempt.

Author: Joshi <[email protected]>
Author: Rekha Joshi <[email protected]>

Closes apache#7253 from rekhajoshm/SPARK-8593 and squashes the following commits:

874dd80 [Joshi] History Server: updated order for multiple attempts(logcleaner)
716e0b1 [Joshi] History Server: updated order for multiple attempts(descending start time works everytime)
548c753 [Joshi] History Server: updated order for multiple attempts(descending start time works everytime)
83306a8 [Joshi] History Server: updated order for multiple attempts(descending start time)
b0fc922 [Joshi] History Server: updated order for multiple attempts(updated comment)
cc0fda7 [Joshi] History Server: updated order for multiple attempts(updated test)
304cb0b [Joshi] History Server: updated order for multiple attempts(reverted HistoryPage)
85024e8 [Joshi] History Server: updated order for multiple attempts
a41ac4b [Joshi] History Server: updated order for multiple attempts
ab65fa1 [Joshi] History Server: some attempt completed to work with showIncomplete
0be142d [Rekha Joshi] Merge pull request #3 from apache/master
106fd8e [Rekha Joshi] Merge pull request #2 from apache/master
e3677c9 [Rekha Joshi] Merge pull request #1 from apache/master
yu-iskw pushed a commit that referenced this pull request Jul 21, 2015
Implement IntArrayParam in mllib

Author: Rekha Joshi <[email protected]>
Author: Joshi <[email protected]>

Closes apache#7481 from rekhajoshm/SPARK-9118 and squashes the following commits:

d3b1766 [Joshi] Implement IntArrayParam
0be142d [Rekha Joshi] Merge pull request #3 from apache/master
106fd8e [Rekha Joshi] Merge pull request #2 from apache/master
e3677c9 [Rekha Joshi] Merge pull request #1 from apache/master
@jkbradley jkbradley closed this Mar 15, 2017