Mongodb update_many and limit

Hello.
Python 3.7.3
pymongo==3.10.1
Mongodb version 4.2.3
in the collection of 100 million documents and every 30min + 300K documents

How to make a request similar to MySQL:

UPDATE table SET status_flag=1 WHERE status_flag=0 LIMIT 300000;

???

This is udate all documents
my_collection.update_many({“status_flag”: 0}, {"$set": {“status_flag”: 1}}).limit(300000)

How to make a limit ?

Hello @Kompaniiets_Denis, welcome to the MongoDB forum.

You can use Bulk Write Operations for this batch update. Using PyMongo:

bulk_request = [ ]

for doc in collection.find( { 'status_flag': 0 } ).limit( 3 ):
    bulk_request.append( UpdateOne( { '_id': doc['_id'] }, { '$set': { 'status_flag': 1 } } ) )

result = collection.bulk_write( bulk_request )

print(result.matched_count, result.modified_count) 

PyMongo Definitions:


NOTE: Having an index on the query filter field status_flag can improve the query performance.

index status_flag and ts_utc_start set

bulk_request = [ ]
for doc in col.find( { ‘status_flag’: count_old } ).sort( [ ( ‘ts_utc_start’, 1 ) ] ).limit( 10 ):
bulk_request.append( updateOne( { ‘_id’: doc[ ‘_id’ ] }, { ‘$set’: { ‘status_flag’: count_new } } ) )
result = col.bulk_write( bulk_request )

or
bulk_request.append( UpdateOne( { ‘_id’: doc[ ‘_id’ ] }, { ‘$set’: { ‘status_flag’: count_new } } ) )

gives an error message
Exception name ‘updateOne’ is not defined
or
Exception name ‘UpdateOne’ is not defined

Here is your Pyhton code with proper imports:

import pymongo
from pymongo import UpdateOne
client = pymongo.MongoClient()
collection = client.testDB.testColl

bulk_request = [ ]
for doc in collection.find( { 'status_flag': 0 } ).limit( 2 ):
    bulk_request.append(UpdateOne( { '_id': doc['_id'] }, { '$set': { 'status_flag': 1 } } ) )

result = collection.bulk_write( bulk_request )
print(result.matched_count, result.matched_count)
1 Like

@Prasad_Saya thanks for the help :+1:

300K documents

for doc in col.find({“status_flag”: 0}).sort( [(“ts_utc_start”, 1)] ).limit(300000):
res = col.update_many({"_id": doc["_id"]}, {"$set": {“status_flag”: 1}})

completed in 2min 04 seconds

bulk_request = [ ]
for doc in col.find( { ‘status_flag’: 0 } ).limit( 300000 ):
bulk_request.append(UpdateOne( { ‘_id’: doc[’_id’] }, { ‘$set’: { ‘status_flag’: 1 } } ) )
result = collection.bulk_write( bulk_request )

completed in 24 seconds

This is a cool result.

1 Like

2 posts were split to a new topic: Issue selecting 5k records from 17MM