Write performance drops on 5-config replica, 5 shard (2 arbiter) cluster

During an extremely write-heavy job that runs for multiple days (with only ~4 small required indexes), MongoDB will stop for minutes at a time and do this over and over, writing heavily to disk. When it’s done, it starts again, but seemingly with more limited performance each time. What is happening and how can I prevent it?

2020-04-12T15:41:34.914-0400 I STORAGE [WTCheckpointThread] WiredTiger message [1586720494:914549][78231:0x7fd3ef33a700], file:collection-17-5766071557703571556.wt, WT_SESSION.checkpoint: Checkpoint has been running for 2021 seconds and wrote: 5435000 pages (179714 MB)

Let me clarify, since I now understand a little further. This was a 5-member replica set configured as one shard. I have since moved on from this configuration, although the underlying message still appears. To essentially “get around” this, I have spun up additional instances/shards on the same physical server, so the performance penalty of “stop accepting writes while pages are written out to disk” is distributed somewhat further.
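For reference, the extra shards were added through mongos along these lines; the replica set names, host, and ports below are placeholders rather than my actual topology:

```javascript
// Run against mongos. Replica set names, host and ports are placeholders
// for the additional shard replica sets started on the same physical server.
sh.addShard("shardB/myhost.local:27019")
sh.addShard("shardC/myhost.local:27020")

// Shard the write-heavy collection so inserts spread across the shards.
sh.enableSharding("mydb")
sh.shardCollection("mydb.events", { _id: "hashed" })

// Verify how chunks are distributed.
sh.status()
```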

Without more information, I would guess that there is not enough memory for your workload.

The workload is extremely write heavy with no indexes (I will generate those after most of the content is inserted). Basically, I’m trying to figure out why MongoDB needs to pause/slow down to write out a checkpoint. Why wouldn’t it be constantly writing these out?

Other than setting the write concern journal option to false and specifying the maximum of 500ms for https://docs.mongodb.com/manual/reference/configuration-options/#storage.journal.commitIntervalMs, what else can I do to make it “batch writes”? I can’t turn off journaling anymore once clustered (you can’t run shards without replica sets, even of one member).
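A minimal sketch of what I mean by disabling journal acknowledgement per write; the database/collection names and documents are placeholders, and commitIntervalMs itself is set in the mongod config file (per the page linked above), not from the shell:

```javascript
// Bulk insert without waiting on the journal for each acknowledgement.
// Names and documents here are illustrative only.
const docs = Array.from({ length: 1000 }, (_, i) => ({ seq: i, payload: "..." }));

db.getSiblingDB("mydb").events.insertMany(docs, {
  ordered: false,                   // let the server batch/parallelize the inserts
  writeConcern: { w: 1, j: false }  // acknowledge without flushing the journal
});
```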

It’s actually the exact opposite. Too much memory lets too many dirty pages hang around, and then they all must be written out at the same time. Thank you, Percona, for writing this up: https://www.percona.com/blog/2020/05/05/tuning-mongodb-for-bulk-loads/
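In case it helps anyone else, below is the kind of runtime tweak that post discusses: making WiredTiger evict dirty pages more aggressively between checkpoints instead of letting them pile up. The percentages are illustrative values to test against your own load, not recommendations, and shrinking storage.wiredTiger.engineConfig.cacheSizeGB in the config file is another lever in the same direction.

```javascript
// Lower the dirty-cache thresholds so dirty pages are evicted sooner.
// The values below are illustrative; measure before and after changing them.
db.adminCommand({
  setParameter: 1,
  wiredTigerEngineRuntimeConfig: "eviction_dirty_target=5,eviction_dirty_trigger=20"
});

// Watch dirty-cache pressure while the load runs.
db.serverStatus().wiredTiger.cache["tracked dirty bytes in the cache"];
```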