Bucket Pattern: Gotchas with buckets

I would like more information on why random insertions and/or deletions in buckets are a problem.
Take, for example, IoT devices that measure temperature, humidity, etc., with one document per device per hour:
{
  "device_id": …,
  "type": …,
  "temp": { "0": 37, "1": 37, "2": 35 },
  "humidity": { "0": 70, "1": 67, "2": 65 }
}

What are the negative effects of adding a new field and value every minute to the embedded documents temp and humidity? I know it's not exactly bucketing, but I want to know whether it's a trade-off worth taking. In my case, per-minute data needs to be available for analytics, and I want to know whether bucketing per hour is still an option if I need the per-minute data available.

Stephen,

Thanks for pointing it out.
This slide needs some clarification, which we will do when we re-record this lesson.

The use case you are bringing up is perfectly valid and it is often how these documents will be constructed.

Adding one piece of data per minute does not qualify as “frequently”, which we mention as a possible gotcha in the lesson. “Frequently” should appear on the slide.

As for bucketing per day, per hour, … it depends on your queries.
If you bucket per hour and store values per minute, the per-minute values are still easy to access. You can use a simple array, or you can use a dictionary with the minute number as the key.
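
To illustrate the dictionary variant, here is a minimal Python sketch of how one per-minute reading could be written into its hourly bucket. It only builds the filter and update documents; with PyMongo they could then be passed to `update_one(bucket_filter, update, upsert=True)`. The `hour` field, the `minute_update` helper, and the sample values are assumptions for illustration and are not part of the original document.

```python
from datetime import datetime, timezone

def minute_update(device_id, ts, temp, humidity):
    """Build the filter and update documents for one per-minute reading."""
    # Hypothetical bucket key: the timestamp truncated to the start of the hour.
    hour = ts.replace(minute=0, second=0, microsecond=0)
    bucket_filter = {"device_id": device_id, "hour": hour}
    # Dot notation targets a single key inside the embedded documents,
    # so each minute adds one field to temp and one to humidity.
    update = {
        "$set": {
            f"temp.{ts.minute}": temp,
            f"humidity.{ts.minute}": humidity,
        }
    }
    return bucket_filter, update

# Example: a reading taken at minute 2 of the hour (sample values only).
ts = datetime(2024, 1, 1, 13, 2, tzinfo=timezone.utc)
bucket_filter, update = minute_update("sensor-1", ts, 35, 65)
```

Reading a given minute back is then a plain key lookup on the hourly document, for example `doc["temp"]["2"]`.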
A counterexample would be to bucket per day, keep per-minute values in the array, and then need hourly values most of the time. In that case, bucketing per hour may make more sense; alternatively, you can use the Computed Pattern to add hourly metrics to the daily document.
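
As a rough sketch of that second option, the snippet below precomputes hourly averages from a daily document that stores per-minute values in an array. The `temp_hourly_avg` field name, the `hourly_averages` helper, and the sample data are assumptions for illustration; in practice the computed field might be refreshed on each write or by a periodic job.

```python
def hourly_averages(minute_values):
    """Average a day's per-minute readings into one value per hour."""
    averages = []
    for start in range(0, len(minute_values), 60):
        # Skip minutes with no reading (stored here as None).
        chunk = [v for v in minute_values[start:start + 60] if v is not None]
        averages.append(sum(chunk) / len(chunk) if chunk else None)
    return averages

# Sample daily document with two hours of per-minute temperatures.
daily_doc = {"device_id": "sensor-1", "temp": [37] * 60 + [35] * 60}

# Computed Pattern: store the hourly metric alongside the raw values,
# so queries that need hourly data do not have to scan the minute array.
daily_doc["temp_hourly_avg"] = hourly_averages(daily_doc["temp"])
```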

Regards,
Daniel.
