Hi, I executed the code below in PySpark in a Jupyter notebook.
```python
from pyspark.sql import SparkSession
from pyspark.sql.types import StructType, StructField, IntegerType, StringType
from pyspark.ml import Pipeline, PipelineModel
from pyspark.ml.classification import NaiveBayes
from pyspark.ml.feature import HashingTF, Tokenizer, StringIndexer

# Importing mleap.pyspark patches serializeToBundle/deserializeFromBundle
# onto the Spark ML classes
from mleap import pyspark
from mleap.pyspark.spark_support import SimpleSparkSerializer

spark = SparkSession.builder.master("local").enableHiveSupport().getOrCreate()

schema = StructType([
    StructField("category", IntegerType(), True),
    StructField("text", StringType(), True)])

textFile = spark.read.csv(
    "/home/opentext/bda/home/bin/notebook/Sell1.csv",
    header=True, mode="DROPMALFORMED", schema=schema)
textFile.show()

textFile.write.save("/home/opentext/bda/home/bin/notebook/Sell.parquet", format="parquet")
schemaSell = spark.read.load("/home/opentext/bda/home/bin/notebook/Sell.parquet")
train_data, test_data = schemaSell.randomSplit([0.8, 0.2])

categoryIndexer = StringIndexer(inputCol="category", outputCol="label")
labels = categoryIndexer.fit(train_data).labels
tokenizer = Tokenizer(inputCol="text", outputCol="words")
hashingTF = HashingTF(inputCol="words", outputCol="features", numFeatures=10000)
nb = NaiveBayes(smoothing=1.0, modelType="multinomial")

pipeline = Pipeline(stages=[categoryIndexer, tokenizer, hashingTF, nb])
model = pipeline.fit(train_data)

pr = model.transform(schemaSell)
pr.show()  # no problem with this show()

model.serializeToBundle("jar:file:///home/opentext/bda/home/bin/notebook/modelnb.zip", pr)
transformer = PipelineModel.deserializeFromBundle("jar:file:///home/opentext/bda/home/bin/notebook/modelnb.zip")

ds = transformer.transform(test_data)
ds.show()
```
Calling `ds.show()` throws the exception below:
```
Name: org.apache.toree.interpreter.broker.BrokerException
Message: Py4JJavaError: An error occurred while calling o982.showString.
: org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 56.0 failed 1 times, most recent failure: Lost task 0.0 in stage 56.0 (TID 56, localhost, executor driver): org.apache.spark.SparkException: Failed to execute user defined function($anonfun$1: (vector) => vector)
  at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
  at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
  at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:377)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:231)
  at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:225)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
  at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
  at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
  at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
  at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
  at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
  at org.apache.spark.scheduler.Task.run(Task.scala:99)
  at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:322)
  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
  at java.lang.Thread.run(Thread.java:745)
Caused by: java.util.NoSuchElementException: Failed to find a default value for modelType
  at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:652)
  at org.apache.spark.ml.param.Params$$anonfun$getOrDefault$2.apply(params.scala:652)
  at scala.Option.getOrElse(Option.scala:121)
  at org.apache.spark.ml.param.Params$class.getOrDefault(params.scala:651)
  at org.apache.spark.ml.PipelineStage.getOrDefault(Pipeline.scala:42)
  at org.apache.spark.ml.param.Params$class.$(params.scala:656)
  at org.apache.spark.ml.PipelineStage.$(Pipeline.scala:42)
  at org.apache.spark.ml.classification.NaiveBayesModel.predictRaw(NaiveBayes.scala:317)
  at org.apache.spark.ml.classification.NaiveBayesModel.predictRaw(NaiveBayes.scala:252)
  at org.apache.spark.ml.classification.ProbabilisticClassificationModel$$anonfun$1.apply(ProbabilisticClassifier.scala:117)
  at org.apache.spark.ml.classification.ProbabilisticClassificationModel$$anonfun$1.apply(ProbabilisticClassifier.scala:116)
  ... 16 more
```
It throws `Caused by: java.util.NoSuchElementException: Failed to find a default value for modelType` even though I passed `modelType="multinomial"` when initializing: `NaiveBayes(smoothing=1.0, modelType="multinomial")`.
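To make the failure concrete, the dropped param can be inspected by comparing the trained stage with the deserialized one. This is only a sketch: it assumes NaiveBayes is the last pipeline stage (`stages[-1]`, as in the code above), that the deserialized PipelineModel exposes `.stages` like a regular one, and it reaches through the private `_java_obj` handle of the PySpark wrappers:

```python
# Hypothetical diagnostic: compare params on the trained vs. deserialized
# NaiveBayes stage. stages[-1] assumes NaiveBayes is the last stage, as above.
trained_nb = model.stages[-1]._java_obj       # JVM NaiveBayesModel from fit()
loaded_nb = transformer.stages[-1]._java_obj  # JVM NaiveBayesModel from MLeap

# explainParams() lists each param with its current value; on the loaded
# model, modelType comes back undefined, which triggers the exception above.
print(trained_nb.explainParams())
print(loaded_nb.explainParams())
```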
Hi @GowthamGoud, thanks for reporting this bug! We have a few instances where we've missed some params when re-loading the transformers back into Spark; I've raised #483 to fix the NaiveBayes model and a couple of others.
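Until #483 lands, one possible stopgap (an untested sketch, not an official MLeap API) is to restore the dropped param by hand on the deserialized stage, going through the JVM object the same way PySpark's own param-transfer code does:

```python
# Untested workaround sketch: re-set the param that deserialization dropped.
# Assumes NaiveBayes is the last pipeline stage and relies on the private
# _java_obj handle of the PySpark wrapper. Param.w(value) builds the
# ParamPair that Params.set(...) expects, mirroring what PySpark's
# _transfer_params_to_java does internally.
loaded_nb = transformer.stages[-1]._java_obj
loaded_nb.set(loaded_nb.modelType().w("multinomial"))

ds = transformer.transform(test_data)
ds.show()  # should no longer hit the NoSuchElementException
```

The same approach would apply to any other param that goes missing in the round trip.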