I am trying to load data from MongoDB into a DataFrame using Spark with Java.
Below is the sample code:
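import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.sql.DataFrame;
import com.mongodb.spark.MongoSpark;

// uri (connection string) and args[2] (sample size) are defined elsewhere in the program
SparkConf conf = new SparkConf()
    .setAppName("MongoSparkConnectorTour")
    .setMaster("local")
    .set("spark.app.id", "MongoSparkConnectorTour")
    .set("spark.mongodb.input.uri", uri)
    .set("sampleSize", args[2])
    .set("spark.mongodb.output.uri", uri)
    .set("spark.mongodb.input.partitioner", "MongoPaginateByCountPartitioner")
    .set("spark.mongodb.input.partitionerOptions.numberOfPartitions", "64");
JavaSparkContext jsc = new JavaSparkContext(conf);
// Load the collection and let the connector infer the DataFrame schema
DataFrame df = MongoSpark.load(jsc).toDF();
System.out.println("DF Count - " + df.count());
df.printSchema();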
There are two tables. I can fetch data from one of them without any issues, but for the other I get the following error:
20/07/15 14:17:31 ERROR Executor: Exception in task 1.0 in stage 3.0 (TID 4)
com.mongodb.spark.exceptions.MongoTypeConversionException: Cannot cast STRING into a NullType (value: BsonString{value='4492148'})
at com.mongodb.spark.sql.MapFunctions$.com$mongodb$spark$sql$MapFunctions$convertToDataType(MapFunctions.scala:80)
at com.mongodb.spark.sql.MapFunctions$anonfun$3.apply(MapFunctions.scala:38)
at com.mongodb.spark.sql.MapFunctions$anonfun$3.apply(MapFunctions.scala:36)
at scala.collection.TraversableLike$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at com.mongodb.spark.sql.MapFunctions$.documentToRow(MapFunctions.scala:36)
at com.mongodb.spark.sql.MapFunctions$.castToStructType(MapFunctions.scala:109)
at com.mongodb.spark.sql.MapFunctions$.com$mongodb$spark$sql$MapFunctions$convertToDataType(MapFunctions.scala:75)
at com.mongodb.spark.sql.MapFunctions$anonfun$3.apply(MapFunctions.scala:38)
at com.mongodb.spark.sql.MapFunctions$anonfun$3.apply(MapFunctions.scala:36)
at scala.collection.TraversableLike$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.TraversableLike$anonfun$map$1.apply(TraversableLike.scala:244)
at scala.collection.IndexedSeqOptimized$class.foreach(IndexedSeqOptimized.scala:33)
at scala.collection.mutable.ArrayOps$ofRef.foreach(ArrayOps.scala:108)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
at scala.collection.mutable.ArrayOps$ofRef.map(ArrayOps.scala:108)
at com.mongodb.spark.sql.MapFunctions$.documentToRow(MapFunctions.scala:36)
at com.mongodb.spark.sql.MapFunctions$.castToStructType(MapFunctions.scala:109)
at com.mongodb.spark.sql.MapFunctions$.com$mongodb$spark$sql$MapFunctions$convertToDataType(MapFunctions.scala:75)
at com.mongodb.spark.sql.MapFunctions$anonfun$3.apply(MapFunctions.scala:38)
From searching on Google, the only solution I have found is to increase the sample size, but that still does not work.
Cast failing (Stack Overflow): How to resolve com.mongodb.spark.exceptions.MongoTypeConversionException: Cannot cast... Java Spark
Increase sample size (Stack Overflow): mongodb - How to config Java Spark sparksession samplesize
The second table is somewhat larger, and I have tried a higher sample size, but it still fails.
It looks like this might not be a sample size issue. Any other suggestions or ideas for solving this would be appreciated.
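
To show what I think may be happening, here is a minimal plain-Java sketch of schema inference by sampling. It is a simplified model under my own assumptions, not the connector's actual code: the class `SampledSchemaSketch`, its methods, and the field name `accountId` are hypothetical, and it samples the first N documents instead of a random subset. It shows how a field that is null in every sampled document could end up typed as NullType, so that a later string value cannot be cast.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Simplified model of sampling-based schema inference (hypothetical, not connector code).
class SampledSchemaSketch {

    // Build a single-field document; HashMap is used because it allows null values.
    static Map<String, Object> doc(String field, Object value) {
        Map<String, Object> d = new HashMap<>();
        d.put(field, value);
        return d;
    }

    // Map a Java value to a Spark-like type name.
    static String typeName(Object value) {
        if (value == null) return "NullType";
        if (value instanceof String) return "StringType";
        if (value instanceof Integer || value instanceof Long) return "LongType";
        return "OtherType";
    }

    // Infer field -> type from the first sampleSize documents only.
    static Map<String, String> inferSchema(List<Map<String, Object>> docs, int sampleSize) {
        Map<String, String> schema = new LinkedHashMap<>();
        int limit = Math.min(sampleSize, docs.size());
        for (Map<String, Object> d : docs.subList(0, limit)) {
            for (Map.Entry<String, Object> field : d.entrySet()) {
                // A concrete type replaces a previously seen NullType; otherwise the first type wins.
                schema.merge(field.getKey(), typeName(field.getValue()),
                        (old, cur) -> "NullType".equals(old) ? cur : old);
            }
        }
        return schema;
    }

    // Report every non-null value whose type differs from the inferred schema.
    static List<String> findMismatches(List<Map<String, Object>> docs, Map<String, String> schema) {
        List<String> problems = new ArrayList<>();
        for (Map<String, Object> d : docs) {
            for (Map.Entry<String, Object> field : d.entrySet()) {
                String expected = schema.get(field.getKey());
                Object value = field.getValue();
                if (expected == null || value == null) continue;
                String actual = typeName(value);
                if (!expected.equals(actual)) {
                    problems.add(field.getKey() + ": cannot cast " + actual + " into " + expected);
                }
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        List<Map<String, Object>> docs = new ArrayList<>();
        docs.add(doc("accountId", null));
        docs.add(doc("accountId", null));
        docs.add(doc("accountId", "4492148")); // value taken from the error message
        // The sample covers only the two null documents, so the string is never seen.
        Map<String, String> schema = inferSchema(docs, 2);
        System.out.println(schema);
        System.out.println(findMismatches(docs, schema));
    }
}
```

If this model is roughly right, a larger sample only helps when the setting actually takes effect and the sample happens to include a non-null value. As far as I can tell, the connector documents the key as `spark.mongodb.input.sampleSize`, so the bare `sampleSize` key in my configuration may not be read at all. I have not yet confirmed either point.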