It would be very useful if petastorm could read these two data types from HDFS.
Agreed. We can try to add this, though I'm not sure about the time frame.
Efficient conversion requires Scala UDFs. Maybe we should add utility methods to Spark so that in petastorm we can do the following:

from pyspark.ml.functions import vector_to_dense_array
from pyspark.sql.functions import col

df.select(vector_to_dense_array(col("features")).alias("features"))
This approach doesn't require Scala code in petastorm. Created a Spark JIRA: https://issues.apache.org/jira/browse/SPARK-30154.
cc: @WeichenXu123
FYI. The UDF was merged into Spark master: apache/spark#26910
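For anyone landing here later, a minimal sketch of how the merged function can be used to flatten a Spark ML vector column into a plain array column before writing Parquet for petastorm. This assumes Spark 3.0+, where the function landed as vector_to_array in pyspark.ml.functions; the DataFrame, column name, and output path below are purely illustrative:

from pyspark.sql import SparkSession
from pyspark.sql.functions import col
from pyspark.ml.linalg import Vectors
from pyspark.ml.functions import vector_to_array  # available since Spark 3.0

spark = SparkSession.builder.getOrCreate()

# Illustrative DataFrame with an ML DenseVector column.
df = spark.createDataFrame(
    [(0, Vectors.dense([1.0, 2.0, 3.0])), (1, Vectors.dense([4.0, 5.0, 6.0]))],
    ["id", "features"],
)

# Flatten the vector column into a plain array<double> column so the Parquet
# file contains an ordinary list instead of the Spark ML vector UDT encoding.
flat = df.withColumn("features", vector_to_array(col("features")))

# Output path is illustrative; any HDFS or local Parquet destination works.
flat.write.mode("overwrite").parquet("hdfs:///tmp/features_parquet")

Written this way, the column is a regular Parquet list of doubles rather than the ML vector UDT that petastorm can't decode directly, so no Scala helpers are needed on the petastorm side.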