What is a pseudo-random cursor?

I was reading the $sample manual page and came across the term “pseudo-random cursor”.

I searched for it, but I couldn’t find an explanation.

If all the following conditions are met, $sample uses a pseudo-random cursor to select documents:

  • $sample is the first stage of the pipeline
  • N is less than 5% of the total documents in the collection
  • The collection contains more than 100 documents
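The three conditions above can be written as a single check. This is only a minimal sketch of the documented rule, not MongoDB's internal code; the function name `uses_pseudo_random_cursor` and its parameters are hypothetical and chosen for illustration.

```python
def uses_pseudo_random_cursor(is_first_stage: bool, n: int, total_docs: int) -> bool:
    """Return True when all three documented conditions hold (hypothetical helper)."""
    return (
        is_first_stage                  # $sample is the first pipeline stage
        and n * 100 < 5 * total_docs    # N is less than 5% of the documents
        and total_docs > 100            # the collection has more than 100 documents
    )


print(uses_pseudo_random_cursor(True, 10, 1000))   # True: 10 is under 5% of 1000
print(uses_pseudo_random_cursor(True, 10, 150))    # False: 10 is more than 5% of 150
print(uses_pseudo_random_cursor(False, 10, 1000))  # False: not the first stage
```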

Hi @Kim_Hakseon,

“Pseudo-random” means that, when a $sample stage meets the listed conditions, documents are selected using a deterministic sequence generated from an internal seed value by a pseudorandom number generator (PRNG), rather than by a truly random source (in the statistical sense).

However, this will be sufficiently random for the purposes of sampling (and successive queries will return different results).
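To illustrate what "deterministic sequence based on a seed" means, here is a small sketch using Python's standard `random` module. It is an assumption that this is a fair analogy: it is not the generator MongoDB uses, and the `draw` helper is hypothetical. The same seed reproduces the same sequence, while a different seed gives a different one, which is why successive queries can still return different results.

```python
import random


def draw(seed: int, count: int = 5) -> list:
    """Return `count` pseudo-random floats from a generator seeded with `seed`."""
    rng = random.Random(seed)  # independent generator with its own seed
    return [rng.random() for _ in range(count)]


same_a = draw(42)
same_b = draw(42)
other = draw(43)

print(same_a == same_b)  # True: same seed, same sequence
print(same_a == other)   # False: different seed, different sequence
```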

The importance of the section you quoted is that this approach is generally more efficient and scalable than the alternative described immediately after:

If any of the above conditions are NOT met, $sample performs a collection scan followed by a random sort to select N documents. In this case, the $sample stage is subject to the sort memory restrictions.
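The idea of "collection scan followed by a random sort" can be sketched roughly as follows. This is an illustration of the general technique only, assuming an in-memory list of documents; it is not how the server implements it, and `sample_by_random_sort` is a hypothetical name. Note that every document is visited and given a sort key, which hints at why this path is more expensive and subject to sort memory limits.

```python
import random


def sample_by_random_sort(docs, n, rng=None):
    """Pick n documents by giving each a random key and sorting on that key."""
    rng = rng or random.Random()
    # "Collection scan": touch every document and attach a random sort key.
    keyed = [(rng.random(), doc) for doc in docs]
    # "Random sort": order by the random key only, then keep the first n.
    keyed.sort(key=lambda pair: pair[0])
    return [doc for _, doc in keyed[:n]]


docs = [{"_id": i} for i in range(50)]
picked = sample_by_random_sort(docs, 40, random.Random(7))
print(len(picked))
```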

Wikipedia has a more detailed description of “true” vs. pseudo-random number generation.

In particular:

While a pseudorandom number generator based solely on deterministic logic can never be regarded as a “true” random number source in the purest sense of the word, in practice they are generally sufficient even for demanding security-critical applications.

Regards,
Stennie

Thank you for your great explanation.

While reading about $sample, I was also curious about the difference between the random sort and the pseudo-random cursor, but both questions have now been answered.

It was really interesting.

Thank you very much.
Happy New Year~ :smiley:
