Good morning,
I am new to MongoDB and new here, too. I have a collection of over 25 million documents. Each document is a simple JSON object with a numeric id, a URL and an integer that stores the status of that document. I was able to iterate with a cursor and the forEach method, but I want to test each URL and update the status if the URL exists. It seems that I can’t stop a forEach loop before the whole collection has been processed. A while loop inside the forEach loop also gave me an error at document #1000: cursor not found. Therefore I am using the hasNext() and next() cursor methods instead, in the function below:
// docCount is assumed to be declared outside this function, starting at 0.
function iterate(cursor, db, maxDocCount, callBack) {
    cursor.hasNext((error, result) => {
        if (error) throw error;
        if (result && docCount < maxDocCount) {
            cursor.next((error, url2) => {
                if (error) throw error;
                // Just displaying the URLs for now. Eventually this will test the URL,
                // update the status, and only iterate again after the document has
                // been updated (real URL) or skipped (non-existing URL).
                console.log(docCount, url2);
                docCount++;
                // Pass maxDocCount along so the arguments stay aligned on each call.
                iterate(cursor, db, maxDocCount, callBack);
            });
        } else {
            callBack();
        }
    });
}
The callback function is this:
() => {
    db.close();
    console.log("done in " + ((new Date() - start) / 1000) + " sec(s)");
}
The script works fine at first but then it stops at around document #565, sometimes a few documents earlier or later. The callback function never runs; the script just stops and returns to the command prompt. I have checked the MongoDB log file: it acknowledges the job but says nothing about why it stopped. I am on Windows 10, running Node from the command line.
I intend to use the request package to test the URLs. If the request returns a status code and no error, I want to update the status of that MongoDB document from 0 to 1. If it is not a valid URL, I want to skip it. Then I increment the docCount variable and process the next document.
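To make the intended flow concrete, here is a rough sketch of what I have in mind, written with promises instead of nested callbacks. It relies on hasNext() and next() returning promises when no callback is passed. The names processUrls, checkUrl and markAsValid are hypothetical (they are not part of the MongoDB driver), and the field names url and status are assumptions about my documents. This is only an outline of the idea, not working production code.

```javascript
// Sketch only: checkUrl and markAsValid are hypothetical helpers supplied by
// the caller. cursor is anything exposing promise-returning hasNext()/next().
async function processUrls(cursor, maxDocCount, checkUrl, markAsValid) {
  let docCount = 0; // documents processed so far
  let updated = 0;  // documents whose status was changed

  while (docCount < maxDocCount && (await cursor.hasNext())) {
    const doc = await cursor.next();

    // Wait for the URL test to finish before moving to the next document.
    let exists = false;
    try {
      exists = await checkUrl(doc.url); // assumed field name: url
    } catch (err) {
      // A failed request is treated as a non-existing URL and skipped.
      exists = false;
    }

    if (exists) {
      // Wait for the update to complete before iterating again.
      await markAsValid(doc);
      updated++;
    }
    docCount++;
  }
  return { docCount, updated };
}
```

With a real collection, markAsValid could be something like (doc) => collection.updateOne({ _id: doc._id }, { $set: { status: 1 } }), and checkUrl would wrap whatever HTTP client is used to test the URL. Passing both in as functions keeps the loop itself easy to try out against a fake cursor.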
Am I doing something fundamentally wrong with my code? Is there a better way?
Thanks,
Alban