Hi all. I was looking into some performance-related tasks for an application I am developing with Flask, served via Gunicorn. I am using MongoDB as my primary database and PyMongo as the driver to connect to the MongoDB server from Python. Lately I noticed that something seemed to be slowing down both the calls to the DB and the application server as a whole. To investigate, I compared PyMongo with another driver (the Java driver) using two comparable stress-test scripts. My Python script looks like this:
```python
import multiprocessing.pool
import time

import pymongo

TOTAL_OPS = 500000
C_THREADS = 50


def work(collection):
    # One query per task. Note: unlike the Java projection, this one also returns _id.
    collection.find_one({'phone_number': '03052506670'}, {'id': 1})


if __name__ == '__main__':
    # MongoClient is thread-safe, so one client (and its connection pool) is shared by all threads.
    client = pymongo.MongoClient('mongodb://username:pass@DB_URL2,DB_URL3,DB_URL4,DB_URL5/test-r-demo?replicaSet=rs0&retryWrites=true&readPreference=secondary')
    collection = client['test-r-demo']['fpi_user']

    with multiprocessing.pool.ThreadPool(C_THREADS) as p:
        # One task per operation; every task receives the same collection object.
        tasks = [collection] * TOTAL_OPS
        start_time = time.time()
        p.map(work, tasks)  # blocks until every query has completed
        end_time = time.time()

    print('Total {} operations, with {} threads, took {}s'.format(TOTAL_OPS, C_THREADS, round(end_time - start_time, 3)))
```
While running the Python script, I went on the DB machines and turned on `mongostat` to observe the query ops/sec. It barely crosses the 1,200 mark on each DB machine.
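One way to put a `mongostat` figure in perspective is Little's law. `mongostat` counts queries per server, so the client's total throughput is the sum over the secondaries that receive the reads. If each of the 50 threads always has exactly one query in flight (a closed-loop assumption), the average time per call as seen by a thread is roughly threads divided by total throughput. If that implied time is much larger than the server-side execution time of the query, the extra time is being spent in the client or on the network rather than in MongoDB. The function name below is hypothetical and the throughput values are illustrative, not measured totals.

```python
def implied_latency_ms(concurrency, ops_per_sec):
    """Mean time per request implied by Little's law, L = lambda * W.

    Assumes a closed loop in which each of `concurrency` workers always has
    exactly one request in flight, so L == concurrency and W = L / lambda.
    """
    return concurrency / ops_per_sec * 1000.0


# Illustrative total throughputs for 50 threads (hypothetical, not measured):
for total_ops_per_sec in (1200, 13000):
    latency = implied_latency_ms(50, total_ops_per_sec)
    print(f"{total_ops_per_sec} ops/s -> {latency:.2f} ms per call")
# prints "1200 ops/s -> 41.67 ms per call" and "13000 ops/s -> 3.85 ms per call"
```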
However, the results are totally different when using the Java MongoDB driver:
StressTest.java
```java
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.MongoDatabase;
import org.bson.Document;
import org.example.stresstest.StressTestThread;

import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

public class StressTest {
    public static void main(String[] args) throws InterruptedException {
        int NUM_OPS = 500000;
        int NUM_THREADS = 50;

        // Initialize the Mongo client
        MongoClient client = MongoClients.create("mongodb://username:pass@DB_URL2,DB_URL3,DB_URL4,DB_URL5/test-r-demo?replicaSet=rs0&retryWrites=true&readPreference=secondary");

        // Initialize the database
        MongoDatabase database = client.getDatabase("test-r-demo");

        // Initialize the collection
        MongoCollection<Document> collection = database.getCollection("fpi_user");

        // Initialize a thread pool with a fixed number of threads
        ExecutorService threadPool = Executors.newFixedThreadPool(NUM_THREADS);

        // Submit one task per operation; each task runs a single query on the shared collection
        long startTime = System.currentTimeMillis();
        for (long i = 0; i < NUM_OPS; i++) {
            Runnable runnable = new StressTestThread(collection);
            threadPool.execute(runnable);
        }
        threadPool.shutdown();
        // shutdown() only stops new submissions; wait for the queued queries to finish before stopping the clock
        threadPool.awaitTermination(1, TimeUnit.HOURS);
        double totalSeconds = (System.currentTimeMillis() - startTime) / 1000.0;

        String out = String.format("Completed %d operations in %.3f seconds with an average of %.0f ops/sec.", NUM_OPS, totalSeconds, NUM_OPS / totalSeconds);
        System.out.println(out);
        client.close();
    }
}
```
StressTestThread.java
```java
package org.example.stresstest;

import com.mongodb.client.MongoCollection;
import org.bson.Document;

import static com.mongodb.client.model.Filters.eq;
import static com.mongodb.client.model.Projections.*;

public class StressTestThread implements Runnable {
    private MongoCollection<Document> collection;

    public StressTestThread(MongoCollection<Document> collection) {
        this.collection = collection;
    }

    @Override
    public void run() {
        // Returns only the "id" field; unlike the Python version, _id is excluded
        Document myDoc = collection.find(eq("phone_number", "03052506670")).projection(fields(include("id"), excludeId())).first();
        //System.out.println(myDoc.get("id"));
    }
}
```
Surprisingly, the max query ops/sec with MongoDB's Java driver easily reaches 13k. What could be causing this difference?
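Two things often worth ruling out on the PyMongo side (assumptions here, not something the measurements above confirm): CPython's GIL, which lets only one thread run Python bytecode at a time, so much of the per-query driver work for 50 threads may end up serialized; and a PyMongo install running without its optional C extensions, which `pymongo.has_c()` and `bson.has_c()` report. The sketch below checks the extensions and times the same query on either a thread pool or a process pool, creating one `MongoClient` per worker process because `MongoClient` is not fork-safe. The helper names `c_extensions_status`, `run_benchmark`, `init_mongo_worker`, and `find_user`, and the worker count of 8 in the usage comment, are hypothetical.

```python
import time
from concurrent.futures import ProcessPoolExecutor, ThreadPoolExecutor


def c_extensions_status():
    """Report whether PyMongo's optional C extensions are available.

    Returns None when PyMongo is not installed in this environment.
    """
    try:
        import bson
        import pymongo
    except ImportError:
        return None
    return {
        "pymongo_version": pymongo.version,
        "pymongo_has_c": pymongo.has_c(),
        "bson_has_c": bson.has_c(),
    }


def run_benchmark(work, total_ops, executor, chunksize=1000):
    """Call work(i) for every i in range(total_ops) on `executor`.

    The clock stops only after every result has been consumed, so calls
    still in flight are never left out of the measurement.
    Returns (elapsed_seconds, ops_per_second).
    """
    start = time.perf_counter()
    # chunksize batches tasks for process pools; thread pools ignore it.
    completed = sum(1 for _ in executor.map(work, range(total_ops), chunksize=chunksize))
    elapsed = time.perf_counter() - start
    return elapsed, completed / elapsed


# Per-process state: each worker process builds its own client in the initializer.
_collection = None


def init_mongo_worker(uri):
    """Executor initializer (hypothetical): one MongoClient per process."""
    global _collection
    import pymongo  # imported lazily so this module loads without PyMongo

    _collection = pymongo.MongoClient(uri)["test-r-demo"]["fpi_user"]


def find_user(_):
    """The same query and projection as the Python script above."""
    _collection.find_one({"phone_number": "03052506670"}, {"id": 1})


# Usage sketch (needs a reachable replica set, so it is not run here; wrap it
# in `if __name__ == "__main__":` when the spawn start method is used):
#   URI = "<same connection string as above>"
#   print(c_extensions_status())
#   with ProcessPoolExecutor(max_workers=8, initializer=init_mongo_worker,
#                            initargs=(URI,)) as ex:
#       print(run_benchmark(find_user, 500000, ex))
#   init_mongo_worker(URI)  # client for the thread-pool run in this process
#   with ThreadPoolExecutor(max_workers=50) as ex:
#       print(run_benchmark(find_user, 500000, ex))
```

The process-pool run goes first so that the parent process does not yet hold a `MongoClient` when the workers are started. If the process-based run pushes `mongostat` numbers well above the thread-based run, that would point at client-side contention rather than the database.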