Searching for blacklisted emails in huge collections

Hey,

I want to import (really) large numbers of users into a collection, after checking whether each user’s email appears in a blacklist collection. The blacklist collection will also be very large (we’re talking hundreds of thousands of documents in each).

I’m trying to figure out the fastest way to approach this problem. So far I have these possibilities in mind:

  1. Iterate over each user before calling InsertMany, doing a find() on the blacklist collection for each one. This seems like it would be very slow for large collections.
  2. Before the import, query the database for the entire blacklist collection and store it server side in an array, then check each user against that array before inserting.
  3. Hack it: in my users collection I could also store the blacklisted users, with a flag such as “blacklisted: true”. Since email is a unique field, attempting to insert any blacklisted user would return a duplicate key error. This feels like it could be very fast, but also messy, and it lacks proper reporting.
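A minimal sketch of option 2 in Python, assuming the whole blacklist fits comfortably in application memory. It loads the emails into a set rather than an array, because a set membership test is constant time on average, while scanning an array is linear for every user checked. The function name `partition_users`, the collection names `users` and `blacklist`, and the `email` field are hypothetical.

```python
from typing import Dict, Iterable, List, Set, Tuple


def partition_users(
    users: Iterable[Dict],
    blacklisted_emails: Iterable[str],
) -> Tuple[List[Dict], List[Dict]]:
    """Split users into (allowed, rejected) by checking each email against the blacklist.

    The blacklist is materialized once into a set, so each lookup is O(1) on average
    instead of a linear scan through an array.
    """
    blocked: Set[str] = set(blacklisted_emails)
    allowed: List[Dict] = []
    rejected: List[Dict] = []
    for user in users:
        if user["email"] in blocked:
            rejected.append(user)  # kept aside so it can be reported later
        else:
            allowed.append(user)
    return allowed, rejected


# Hypothetical usage with pymongo (requires a running MongoDB; names are assumptions):
#   emails = (doc["email"] for doc in db.blacklist.find({}, {"email": 1, "_id": 0}))
#   allowed, rejected = partition_users(incoming_users, emails)
#   if allowed:  # insert_many raises on an empty list
#       db.users.insert_many(allowed)
```

Unlike option 3, the `rejected` list gives you the per-user reporting directly, at the cost of holding the blacklist in memory for the duration of the import.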

So, are there any other potential solutions to my problem, and which solution should produce the fastest query results?

This might not suit every workload, but if the goal is to determine at run time whether a user’s email is blacklisted, I would:

  1. Have one user collection with an index on the email address
  2. Have one blacklist collection with an index on the email address
  3. Use $lookup to join both collections on the email address; if the $lookup result is not empty, the email is blacklisted.
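A sketch of the $lookup step as a pipeline built in Python, meant to be run against the users collection. The collection name `blacklist`, the field `email`, the output field `blacklist_matches` and the helper name are all assumptions for illustration. With the index on the blacklist’s email field from step 2, an equality $lookup like this one can use that index for each join.

```python
from typing import Any, Dict, List


def blacklisted_users_pipeline(
    blacklist_collection: str = "blacklist",
    email_field: str = "email",
) -> List[Dict[str, Any]]:
    """Build an aggregation pipeline (run on the users collection) that keeps only
    users whose email also appears in the blacklist collection."""
    return [
        # Join each user to any blacklist documents with the same email address.
        {
            "$lookup": {
                "from": blacklist_collection,
                "localField": email_field,
                "foreignField": email_field,
                "as": "blacklist_matches",
            }
        },
        # A non-empty match array means the email is blacklisted.
        {"$match": {"blacklist_matches": {"$ne": []}}},
        # Return just the email address of each blacklisted user.
        {"$project": {"_id": 0, email_field: 1}},
    ]


# Hypothetical usage with pymongo (requires a running MongoDB):
#   for doc in db.users.aggregate(blacklisted_users_pipeline()):
#       print(doc["email"])
```

Flipping the `$match` to `{"blacklist_matches": []}` would instead return the users that are not blacklisted.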