Low performance when reading data from a collection with an archive

Denis_Stogniy · January 19, 2021, 8:45am

Hi there!

In my app I have a history collection. I moved old records (older than one year) into a Data Lake archive and changed the connection string to read data from both the archive and the regular collection.

Is it expected that performance decreased even when reading unarchived data? The query filters on an indexed field, and that datetime field is also the key used for archiving.

Pavel_Duchovny · January 19, 2021, 9:20am

Hi @Denis_Stogniy,

Welcome to the MongoDB community!

For specific Atlas workload problems, I would suggest opening a support case. Our team has better visibility into your cluster's configuration and logs.

When opening a case, please provide run timings, specific cluster details, and the query explain plans.

Thanks
Pavel

Benjamin_Flast · January 19, 2021, 2:32pm

Hey @Denis_Stogniy ,

Thanks for raising this. I can probably shed some light on it, but it also makes sense to open a case if you’d like a deeper analysis.

Regarding performance on the “federated collection” (i.e. targeting archived and cluster data together), you should expect lower performance than when connecting directly to your cluster, but the size of the impact depends on the type of query and on how you optimized the archive.

One example is a “streaming query”, such as a “find()”. We start returning data as soon as the underlying storage returns it, so data coming back from the cluster is returned to you immediately, and data from the archive will most likely follow. There is a small increase in latency because the data has to travel from the cluster to the federated endpoint, but it should be minimal.
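To make the streaming behaviour concrete, here is a toy simulation in plain Python. It is not how Atlas is implemented: each storage tier is modelled as a stream of (arrival time in ms, document) pairs, and the latencies used are made-up illustrative numbers. `tier_stream`, `streaming_find`, and the `hop_ms` parameter (standing in for the extra hop through the federated endpoint) are all hypothetical names.

```python
import heapq
from typing import Iterable, Iterator, List, Tuple

# (arrival_time_ms, document); all timings here are illustrative, not measured.
Event = Tuple[float, dict]


def tier_stream(docs: List[dict], first_ms: float, per_doc_ms: float,
                hop_ms: float) -> Iterator[Event]:
    """Yield documents from one storage tier with a simulated arrival time.

    hop_ms models the extra network hop through the federated endpoint
    (a hypothetical parameter for this sketch).
    """
    for i, doc in enumerate(docs):
        yield first_ms + hop_ms + i * per_doc_ms, doc


def streaming_find(*streams: Iterable[Event]) -> Iterator[Event]:
    """Forward each document as soon as any tier produces it, with no buffering.

    heapq.merge interleaves the already time-ordered tier streams by arrival time.
    """
    return heapq.merge(*streams, key=lambda ev: ev[0])
```

For example, with a fast cluster tier and a slow archive tier, the first cluster document is available at 25 ms, long before the archive document arrives at 820 ms:

```python
cluster = tier_stream([{"_id": 1}, {"_id": 2}], first_ms=5, per_doc_ms=1, hop_ms=20)
archive = tier_stream([{"_id": 3}], first_ms=800, per_doc_ms=10, hop_ms=20)
results = list(streaming_find(cluster, archive))
```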

On the other hand, a “blocking query”, such as a “sort” that requires all relevant data from the cluster and the archive to be brought together, will be as slow as the slowest storage tier queried. That will most likely be the archive, which can be significantly slower than your cluster.
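The same toy model, again an illustration and not Atlas internals, shows why a blocking operation is bounded by the slowest tier: nothing can be returned until every tier has finished, so the first result appears at the maximum of the tiers' completion times. `blocking_sort` and the `tiers` mapping of tier name to (completion ms, documents) are hypothetical, and the numbers are invented.

```python
from typing import Callable, Dict, List, Tuple


def blocking_sort(tiers: Dict[str, Tuple[float, List[dict]]],
                  key: Callable[[dict], object]) -> Tuple[float, List[dict]]:
    """Simulate a sort over several storage tiers.

    tiers maps a tier name to (completion_ms, docs). Because a sort must see
    every document before emitting the first one, the earliest time anything
    can be returned is when the slowest tier completes.
    """
    ready_ms = max(done for done, _ in tiers.values())
    merged = [doc for _, docs in tiers.values() for doc in docs]
    return ready_ms, sorted(merged, key=key)
```

With a cluster tier finishing at 25 ms and an archive tier at 820 ms, the sorted result is only available at 820 ms, even though most of the data came back quickly:

```python
tiers = {"cluster": (25, [{"ts": 3}, {"ts": 1}]), "archive": (820, [{"ts": 2}])}
ready, docs = blocking_sort(tiers, key=lambda d: d["ts"])
```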

The last thing to remember is that when you set up Online Archive, you select “Query Fields”. Queries that use those fields perform better on the archived data, so a “find” on a field identified as a query field should perform better than a “find” on a field that was not.
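One way to picture why query fields help is partition pruning. The sketch below is a simplified assumption, not a description of Online Archive's actual storage layout: archived documents are grouped into partitions by a single query field, so a filter on that field opens only one partition, while a filter on any other field has to read every partition. `PartitionedArchive` and its `find` method are hypothetical names for this illustration.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


class PartitionedArchive:
    """Toy archive whose documents are grouped into partitions by one query field."""

    def __init__(self, docs: List[dict], partition_field: str):
        self.partition_field = partition_field
        self.partitions: Dict[object, List[dict]] = defaultdict(list)
        for doc in docs:
            self.partitions[doc.get(partition_field)].append(doc)

    def find(self, field: str, value) -> Tuple[List[dict], int]:
        """Return the matching documents and how many partitions had to be read."""
        if field == self.partition_field:
            # Pruning: only the partition for this value is opened (if it exists).
            candidates = self.partitions.get(value, [])
            scanned = 1 if value in self.partitions else 0
        else:
            # No pruning possible: every partition is read and then filtered.
            candidates = [d for part in self.partitions.values() for d in part]
            scanned = len(self.partitions)
        return [d for d in candidates if d.get(field) == value], scanned
```

With three monthly partitions, a filter on the query field `month` reads one partition, while a filter on `user` reads all three:

```python
docs = [{"month": m, "user": u}
        for m in ("2019-01", "2019-02", "2019-03") for u in ("a", "b")]
archive = PartitionedArchive(docs, "month")
by_month, scanned_month = archive.find("month", "2019-02")
by_user, scanned_user = archive.find("user", "a")
```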

I’m the PM for Online Archive and am happy to discuss this further if it’s helpful. You can reach me at benjamin.flast@mongodb.com.

Best,
Ben

Max_Virchenko · August 18, 2022, 12:05pm

Even without a Data Lake archive (I just connect to the cluster through the federation), I get the same performance as with Data Lake.

For example, a request through the federation takes 2 s (without a Data Lake archive), while the same request sent directly to the cluster takes 100 ms.

Benjamin_Flast · August 18, 2022, 12:26pm

Hey @Max_Virchenko, that is expected behavior. When you connect to your cluster through the federation layer, there is some additional latency due to extra network hops and various other processing steps. We typically see between 1 and 2 seconds of additional latency for any basic query through data federation, and it can go higher when combining data from multiple clusters.
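The numbers reported above fit that range: 2 s through the federation minus 100 ms direct is about 1.9 s of added latency. To compare the two connections more reliably than with single runs, a small timing helper like the one below can report a median over several repetitions. `median_latency_ms` is a hypothetical helper, not part of any MongoDB driver; it simply times whatever callable you pass in.

```python
import statistics
import time
from typing import Callable


def median_latency_ms(run_query: Callable[[], object],
                      repeats: int = 5, warmup: int = 1) -> float:
    """Call run_query several times and return the median wall-clock latency in ms."""
    for _ in range(warmup):
        run_query()  # warm up connections/caches so the first call doesn't skew results
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)
```

With PyMongo, `run_query` could be something like `lambda: list(collection.find(query))`, evaluated once with a collection obtained from the direct cluster connection string and once from the federated one (`collection` and `query` are placeholders here). The difference between the two medians is the federation overhead. Wrapping the cursor in `list()` matters, because `find()` returns a lazy cursor and would otherwise not fetch any results.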

system · August 23, 2022, 12:27pm

This topic was automatically closed 5 days after the last reply. New replies are no longer allowed.