
MongoDB 4.0.6 crashed (OOM). After restart it is not available and is re-applying statements since the beginning of time

Hi All,

Two of my three data centers are not available. On a PSA (Primary-Secondary-Arbiter) configuration, I created a temporary arbiter. (The data node is backed up daily.)

This morning (04:00 am) the primary data node crashed with an OOM error (RHEL 7.4).
I restarted the instance at 09:00 am, but it is still not available. It seems to be replaying the entire content of the oplog (??).

REPL     [repl writer worker 1] applied op: CRUD { ts: Timestamp(1585051988, 15)
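The first field of a BSON `Timestamp(seconds, increment)` is a Unix epoch value in seconds. You can turn it into a wall-clock date to see which point in history the replay has reached. Here is a minimal sketch. The helper name `ts_to_utc` is hypothetical, and only the seconds value from the log line above is used:

```python
from datetime import datetime, timezone


def ts_to_utc(seconds: int) -> datetime:
    """Convert the seconds field of a BSON Timestamp(seconds, increment) to UTC.

    Hypothetical helper for illustration; the increment field is only an
    ordinal within that second, so it is ignored here.
    """
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


# Seconds field of the op currently being applied, from the log line above
applied = ts_to_utc(1585051988)
print(applied.isoformat())  # 2020-03-24T12:13:08+00:00
```

So at that moment the node was applying an operation from 24 March, well before the 7 April restart.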

WiredTigerLAS.wt is considerably big (5 GB), and so is the oplog at the moment of the restart (20 GB) (??)

I don’t have a clue about what is happening or how to make the instance available again.

Any ideas?

Thanks,
Jorge

Update:

The data center has been unavailable since 12 March (= Ts 1584021863).

2020-04-07T09:30:43.600+0100 W STORAGE  [initandlisten] Detected unclean shutdown - /srv/mongodb/data/mongod.lock is not empty.
**2020-04-07T09:30:43.600+0100 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.**
2020-04-07T09:30:43.601+0100 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=1375M,cache_overflow=(file_max=0M),session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
2020-04-07T09:30:44.818+0100 I STORAGE  [initandlisten] WiredTiger message [1586248244:818032][13214:0x7f3622eb6b80], txn-recover: Main recovery loop: starting at 1638/77632896 to 1639/256
2020-04-07T09:30:44.818+0100 I STORAGE  [initandlisten] WiredTiger message [1586248244:818578][13214:0x7f3622eb6b80], txn-recover: Recovering log 1638 through 1639
2020-04-07T09:30:44.873+0100 I STORAGE  [initandlisten] WiredTiger message [1586248244:873744][13214:0x7f3622eb6b80], file:sizeStorer.wt, txn-recover: Recovering log 1639 through 1639
2020-04-07T09:30:44.923+0100 I STORAGE  [initandlisten] WiredTiger message [1586248244:923821][13214:0x7f3622eb6b80], file:sizeStorer.wt, txn-recover: Set global recovery timestamp: 5e6a416700000001
**2020-04-07T09:30:44.988+0100 I RECOVERY [initandlisten] WiredTiger recoveryTimestamp. Ts: Timestamp(1584021863, 1)**
2020-04-07T09:30:44.988+0100 I STORAGE  [initandlisten] **Triggering the first stable checkpoint. Initial Data: Timestamp(1584021863, 1) PrevStable: Timestamp(0, 0) CurrStable: Timestamp(1584021863, 1)**
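The "global recovery timestamp" `5e6a416700000001` in the WiredTiger message is the same value as the `Timestamp(1584021863, 1)` in the RECOVERY line, written as a 64-bit hex number. The sketch below assumes the seconds occupy the high 32 bits and the increment the low 32 bits. Decoding the hex gives exactly the timestamp in the next log line, which supports that assumption. The helper name `decode_wt_timestamp` is hypothetical:

```python
from datetime import datetime, timezone


def decode_wt_timestamp(hex_ts: str) -> tuple[int, int]:
    """Split a 64-bit hex timestamp into (seconds, increment).

    Assumption: seconds are stored in the high 32 bits and the
    increment in the low 32 bits, as Timestamp(seconds, increment).
    """
    value = int(hex_ts, 16)
    return value >> 32, value & 0xFFFFFFFF


seconds, increment = decode_wt_timestamp("5e6a416700000001")
print(f"Timestamp({seconds}, {increment})")  # Timestamp(1584021863, 1)
# Wall-clock time of the recovery (stable) timestamp
print(datetime.fromtimestamp(seconds, tz=timezone.utc).isoformat())  # 2020-03-12T14:04:23+00:00
```

The recovery point is therefore 12 March, the day the data center went down. That explains why the restarted node replays the oplog from that date onward.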