Reading from MongoDB using Spark-MongoDB connector is failing

I am using version org.mongodb.spark:mongo-spark-connector_2.11:2.2.7 of the Spark MongoDB connector to fetch data from MongoDB.
It works for most of the collections, but it fails for some of them.

Below is the code snippet I am using to read the data.

collection_name='user'

df = spark.read.format("mongo") \
        .option("uri", "mongodb://{}:{}@{}/{}.{}".format(
            mongo_user_name, mongo_password, mongo_addr, mongo_db_name, collection_name)) \
        .load()

if df.count() > 0:
    df.write.format(TARGET_NAME) \
        .options(**sfOptions) \
        .option("dbtable", "{}.{}".format(schema_name, collection_name)) \
        .mode('overwrite') \
        .save()

The exception is given below:

com.mongodb.spark.exceptions.MongoTypeConversionException: Cannot cast UNDEFINED into a StructType(StructField(oid,StringType,true)) (value: org.bson.BsonUndefined@0)

When I print the schema, I can see some fields with undefined types in the input data. But even if I exclude that column and select only the remaining columns, the code still fails.

Any help would be much appreciated.
