Write performance drops on 5-config replica, 5 shard (2 arbiter) cluster

Matthew_Zimmerman · April 12, 2020, 11:07pm

During an extremely write-heavy job that runs for multiple days (with only ~4 small required indexes), mongodb stops for minutes at a time while it writes heavily to disk, and it does this over and over. When it’s done, it starts again, but seemingly with more limited performance each time. What is happening and how can I prevent it?

2020-04-12T15:41:34.914-0400 I STORAGE [WTCheckpointThread] WiredTiger message [1586720494:914549][78231:0x7fd3ef33a700], file:collection-17--5766071557703571556.wt, WT_SESSION.checkpoint: Checkpoint has been running for 2021 seconds and wrote: 5435000 pages (179714 MB)
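To put the numbers in that message in perspective, here is a minimal sketch that pulls the elapsed time, page count, and megabytes out of the checkpoint text and derives an average write rate. The regular expression and the function name `checkpoint_rates` are illustrative assumptions based only on the wording of this one log line, not on a documented log format.

```python
import re

# Only the tail of the log line above; the pattern is an assumption
# derived from this single message, not a documented format.
LOG_TAIL = ("Checkpoint has been running for 2021 seconds "
            "and wrote: 5435000 pages (179714 MB)")

PATTERN = re.compile(
    r"running for (?P<seconds>\d+) seconds and wrote: "
    r"(?P<pages>\d+) pages \((?P<mb>\d+) MB\)"
)


def checkpoint_rates(message: str) -> dict:
    """Return elapsed seconds and average rates for a checkpoint message."""
    match = PATTERN.search(message)
    if match is None:
        raise ValueError("not a checkpoint progress message")
    seconds = int(match.group("seconds"))
    pages = int(match.group("pages"))
    mb = int(match.group("mb"))
    return {
        "seconds": seconds,
        "minutes": seconds / 60,
        "mb_per_second": mb / seconds,
        "kb_per_page": mb * 1024 / pages,
    }


if __name__ == "__main__":
    for key, value in checkpoint_rates(LOG_TAIL).items():
        print(f"{key}: {value:.2f}")
```

For this message the checkpoint had already been running for over half an hour at an average of roughly 89 MB/s, which gives a sense of how much dirty data had accumulated before the checkpoint started.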

Matthew_Zimmerman · April 23, 2020, 2:38am

Let me clarify, since I now understand this a little better. This was a 5-member replica set configured as one shard. I have since moved on from this configuration, although the underlying message still appears. To essentially “get around” this, I have spun up additional instances/shards on the same physical server, so the performance penalty of “stopping accepting writes while I write out pages to disk” is somewhat further distributed.

steevej · April 25, 2020, 11:28am

Without more information I would guess that there is not enough memory for your workload.

Matthew_Zimmerman · April 25, 2020, 11:57am

The workload is extremely write-heavy with no indexes (I will generate those after most content is inserted). Basically I’m trying to figure out why mongodb needs to pause/slow down to write out a checkpoint. Why wouldn’t it be constantly writing these out?

Other than setting the write concern’s journal option to false and raising https://docs.mongodb.com/manual/reference/configuration-options/#storage.journal.commitIntervalMs to its maximum of 500ms, what else can I do to make it “batch writes”? I can’t turn off journaling anymore once you cluster (you can’t run shards without replica sets, even of 1).
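For readers who want to see the two settings mentioned above side by side, here is a minimal sketch that builds them as plain data. The helper name `bulk_load_settings` and the choice of `w: 1` are assumptions for illustration; the 1 to 500 ms bound is the documented range for `storage.journal.commitIntervalMs`.

```python
def bulk_load_settings(commit_interval_ms: int = 500) -> dict:
    """Build the client write concern and a mongod config fragment.

    Hypothetical helper: it only assembles values, it does not talk to
    a server. "w": 1 is an assumption; the post only mentions j=false.
    """
    if not 1 <= commit_interval_ms <= 500:
        raise ValueError("commitIntervalMs must be between 1 and 500")
    conf_fragment = (
        "storage:\n"
        "  journal:\n"
        f"    commitIntervalMs: {commit_interval_ms}\n"
    )
    return {
        # Do not wait for the journal flush before acknowledging writes.
        "write_concern": {"w": 1, "j": False},
        # YAML fragment to merge into the mongod configuration file.
        "mongod_conf": conf_fragment,
    }


if __name__ == "__main__":
    settings = bulk_load_settings()
    print(settings["write_concern"])
    print(settings["mongod_conf"], end="")
```

With PyMongo, the write concern part would typically be applied per collection, for example via `WriteConcern(w=1, j=False)` from `pymongo.write_concern`. Note that neither setting changes how checkpoints themselves are written; they only affect journal acknowledgement and flush frequency.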

Matthew_Zimmerman · May 6, 2020, 8:22pm

It’s actually the exact opposite. Too much memory lets too many dirty pages hang around, and then they all must be written at the same time. Thank you Percona for writing this up: https://www.percona.com/blog/2020/05/05/tuning-mongodb-for-bulk-loads/
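One way to keep fewer dirty pages in the cache is to lower WiredTiger's dirty-page eviction thresholds at runtime. The sketch below only builds the `setParameter` command document; the helper name `dirty_eviction_command` and the example percentages are assumptions for illustration, not values taken from the linked article, and should be tested against your own workload.

```python
def dirty_eviction_command(target_pct: int, trigger_pct: int) -> dict:
    """Build a setParameter command that lowers dirty-page thresholds.

    Hypothetical helper. eviction_dirty_target and eviction_dirty_trigger
    are WiredTiger settings expressed here as percentages of the cache;
    the target must stay below the trigger.
    """
    if not 0 < target_pct < trigger_pct <= 100:
        raise ValueError("need 0 < target_pct < trigger_pct <= 100")
    config = (
        f"eviction_dirty_target={target_pct},"
        f"eviction_dirty_trigger={trigger_pct}"
    )
    return {"setParameter": 1, "wiredTigerEngineRuntimeConfig": config}


if __name__ == "__main__":
    # Example values only: start evicting dirty pages at 1% of the cache
    # and throttle application threads at 5%.
    print(dirty_eviction_command(1, 5))
```

The resulting document would be run against the `admin` database on each mongod, for example with PyMongo's `client.admin.command(...)`. A change made this way is not persisted across restarts.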