
Getting Size Exceeded Exception while storing Dataframe into MongoDB

I am trying to store an Apache Spark DataFrame in MongoDB using Scala, but the write fails with the following exception: Caused by: org.bson.BsonMaximumSizeExceededException: Payload document size is larger than maximum of 16777216.

Code Snippet:

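 import com.mongodb.spark.MongoSpark
 import org.apache.spark.sql.SparkSession

 val spark = SparkSession.builder()
   .appName("User Network Graph")
   .config("spark.mongodb.input.uri", "mongodb://mongo/socio.d3raw")
   .config("spark.mongodb.output.uri", "mongodb://mongo/socio.d3raw")
   .master("yarn")
   .getOrCreate()

 // seqGraph is defined elsewhere (not shown in this post)
 val rawD3str = seqGraph.toDF()

 // Append the DataFrame to the d3raw collection in the socio database
 MongoSpark.write(rawD3str)
   .option("spark.mongodb.output.uri", "mongodb://mongo/socio")
   .option("collection", "d3raw")
   .mode("append")
   .save()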

Error stack trace

0 failed 4 times, most recent failure: Lost task 0.3 in stage 332.0 (TID 11617, hadoop-node022, executor 1): org.bson.BsonMaximumSizeExceededException: Payload document size is larger than maximum of 16777216.
    at com.mongodb.internal.connection.BsonWriterHelper.writePayload(BsonWriterHelper.java:68)
    at com.mongodb.internal.connection.CommandMessage.encodeMessageBodyWithMetadata(CommandMessage.java:147)
    at com.mongodb.internal.connection.RequestMessage.encode(RequestMessage.java:138)
    at com.mongodb.internal.connection.CommandMessage.encode(CommandMessage.java:61)
    at com.mongodb.internal.connection.InternalStreamConnection.sendAndReceive(InternalStreamConnection.java:248)
    at com.mongodb.internal.connection.UsageTrackingInternalConnection.sendAndReceive(UsageTrackingInternalConnection.java:99)
    at com.mongodb.internal.connection.DefaultConnectionPool$PooledConnection.sendAndReceive(DefaultConnectionPool.java:450)
    at com.mongodb.internal.connection.CommandProtocolImpl.execute(CommandProtocolImpl.java:72)
    at com.mongodb.internal.connection.DefaultServer$DefaultServerProtocolExecutor.execute(DefaultServer.java:226)
    at com.mongodb.internal.connection.DefaultServerConnection.executeProtocol(DefaultServerConnection.java:269)
    at com.mongodb.internal.connection.DefaultServerConnection.command(DefaultServerConnection.java:131)
    at com.mongodb.operation.MixedBulkWriteOperation.executeCommand(MixedBulkWriteOperation.java:435)
    at com.mongodb.operation.MixedBulkWriteOperation.executeBulkWriteBatch(MixedBulkWriteOperation.java:261)
    at com.mongodb.operation.MixedBulkWriteOperation.access$700(MixedBulkWriteOperation.java:72)
    at com.mongodb.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:205)
    at com.mongodb.operation.MixedBulkWriteOperation$1.call(MixedBulkWriteOperation.java:196)
    at com.mongodb.operation.OperationHelper.wi

Hi @Ameen_Nagiwale, a single MongoDB document cannot exceed 16MB; anything larger has to go through GridFS storage, which splits the data into chunks. From the sounds of things, at least one row of your DataFrame is being written as a document bigger than that.

https://docs.mongodb.com/manual/core/document/#document-size-limit
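To make the 16MB figure concrete, here is a minimal sketch in plain Scala (no Spark or MongoDB dependencies) that estimates how large a one-field document would be once encoded as BSON and compares it with the 16777216-byte limit from the exception. The object name BsonSizeCheck, its helpers, and the field name "value" are hypothetical and used only for illustration. The sketch assumes the row holds a single string column; any other field, such as an _id, adds to the total.

```scala
import java.nio.charset.StandardCharsets.UTF_8

// Hypothetical helper, not part of the MongoDB Spark connector.
object BsonSizeCheck {
  // 16 * 1024 * 1024 = 16777216, the figure shown in the exception message.
  val MaxBsonDocumentBytes: Int = 16 * 1024 * 1024

  // Encoded size of a BSON document that holds exactly one string field:
  //   int32 document length (4) + element type byte (1)
  //   + field name bytes + NUL (1)
  //   + int32 string length (4) + string bytes + NUL (1)
  //   + document terminator (1)
  // which adds up to 12 bytes of overhead plus the two UTF-8 payloads.
  def singleStringFieldSize(fieldName: String, value: String): Long =
    12L + fieldName.getBytes(UTF_8).length + value.getBytes(UTF_8).length

  // True when such a document stays within the document size limit.
  def fits(fieldName: String, value: String): Boolean =
    singleStringFieldSize(fieldName, value) <= MaxBsonDocumentBytes
}
```

Running a check like this over each row before calling save() would show which rows are too large to be stored as a single document; those rows would need to be split up or stored through GridFS.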