Rollback during upgrade from 3.6 to 4.0

Hello,
I am trying to upgrade a replica set with 4 data-bearing members (PSSSA) from version 3.6 to 4.0 (and eventually to 4.2). During the upgrade, two members upgraded successfully, but the other two would not start.

When I try to start mongod with the 4.0 binary, I see the following error in mongod.log:

2020-04-13T17:38:03.113+0000 I ROLLBACK [rsBackgroundSync] Finished waiting for background operations to complete before rollback
2020-04-13T17:38:03.113+0000 I ROLLBACK [rsBackgroundSync] finding common point
2020-04-13T17:38:03.314+0000 I ROLLBACK [rsBackgroundSync] Rollback common point is { ts: Timestamp(1586118344, 1), t: 167 }
2020-04-13T17:38:03.315+0000 I ROLLBACK [rsBackgroundSync] finding record store counts
2020-04-13T17:38:03.317+0000 I REPL     [rsBackgroundSync] Incremented the rollback ID to 107
2020-04-13T17:38:03.318+0000 I STORAGE  [rsBackgroundSync] closeCatalog: closing all databases
2020-04-13T17:38:03.336+0000 I STORAGE  [rsBackgroundSync] closeCatalog: closing storage engine catalog
2020-04-13T17:38:03.336+0000 I STORAGE  [WTOplogJournalThread] oplog journal thread loop shutting down
2020-04-13T17:38:03.337+0000 F ROLLBACK [rsBackgroundSync] RecoverToStableTimestamp failed.  :: caused by :: UnrecoverableRollbackError: No stable timestamp available to recover to. You must downgrade the binary version to v3.6 to allow rollback to finish. You may upgrade to v4.0 again after the rollback completes. Initial data timestamp: Timestamp(1586118674, 1), Stable timestamp: Timestamp(0, 0)
2020-04-13T17:38:03.337+0000 I ROLLBACK [rsBackgroundSync] Rollback summary:
2020-04-13T17:38:03.338+0000 I ROLLBACK [rsBackgroundSync]      start time: 2020-04-13T17:38:03.108+0000
2020-04-13T17:38:03.338+0000 I ROLLBACK [rsBackgroundSync]      end time: 2020-04-13T17:38:03.338+0000

The error message says that I need to downgrade the binary to 3.6 so that the rollback can finish. When I start mongod with the 3.6 binary (3.6.17), I get the error below.

2020-04-13T17:28:39.615+0000 W STORAGE  [initandlisten] Recovering data from the last clean checkpoint.
2020-04-13T17:28:39.615+0000 I STORAGE  [initandlisten] wiredtiger_open config: create,cache_size=16384M,session_max=20000,eviction=(threads_min=4,threads_max=4),config_base=false,statistics=(fast),compatibility=(release="3.0",require_max="3.0"),log=(enabled=true,archive=true,path=journal,compressor=snappy),file_manager=(close_idle_time=100000),statistics_log=(wait=0),verbose=(recovery_progress),
2020-04-13T17:28:40.743+0000 E STORAGE  [initandlisten] WiredTiger error (-31802) [1586798920:743663][15291:0x7efdddc6ab80], connection: __log_open_verify, 1028: Version incompatibility detected: unsupported WiredTiger file version: this build requires a maximum version of 2, and the file is version 3: WT_ERROR: non-specific WiredTiger error
2020-04-13T17:28:40.748+0000 E -        [initandlisten] Assertion: 28595:-31802: WT_ERROR: non-specific WiredTiger error src/mongo/db/storage/wiredtiger/wiredtiger_kv_engine.cpp 486
2020-04-13T17:28:40.748+0000 I STORAGE  [initandlisten] exception in initAndListen: Location28595: -31802: WT_ERROR: non-specific WiredTiger error, terminating
2020-04-13T17:28:40.748+0000 I NETWORK  [initandlisten] shutdown: going to close listening sockets...
2020-04-13T17:28:40.748+0000 I NETWORK  [initandlisten] removing socket file: /tmp/mongodb-27032.sock
2020-04-13T17:28:40.749+0000 I CONTROL  [initandlisten] now exiting
2020-04-13T17:28:40.749+0000 I CONTROL  [initandlisten] shutting down with code:100

Now I cannot start mongod with either 4.0 or 3.6. Has anybody faced the same or a similar issue? If so, how did you resolve it?

Thanks
Errythroidd.

Hello,
I was able to solve this by restarting the mongod process in standalone mode with the latest version of the 3.6 binary. Apparently the member needed to be shut down cleanly and restarted with the 3.6 binary so that all of its files were updated to the 3.6 file version.
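
For anyone who wants to script this, here is a rough sketch of the two command lines involved, built in Python so they can be reviewed before anything is run. The binary path and data directory are hypothetical placeholders, and the port is only taken from the socket file name in the log above. It assumes the options are passed on the command line; if mongod is started from a config file, the replica set name configured there would presumably have to be disabled as well. The `--shutdown` option is available on Linux only.

```python
import shlex

# Hypothetical locations: substitute the real 3.6 binary and data directory.
MONGOD_36 = "/opt/mongodb-3.6/bin/mongod"
DBPATH = "/data/db"
PORT = 27032  # taken from /tmp/mongodb-27032.sock in the log above


def standalone_start_command(binary, dbpath, port):
    """Build a mongod command line with no --replSet option (standalone)."""
    return [binary, "--dbpath", dbpath, "--port", str(port)]


def clean_shutdown_command(binary, dbpath):
    """Build a command line that cleanly stops the mongod using dbpath."""
    return [binary, "--dbpath", dbpath, "--shutdown"]


def render(cmd):
    """Render an argument list as a single shell-safe string."""
    return " ".join(shlex.quote(part) for part in cmd)


start_cmd = standalone_start_command(MONGOD_36, DBPATH, PORT)
stop_cmd = clean_shutdown_command(MONGOD_36, DBPATH)

# Print the commands for review instead of executing them.
print(render(start_cmd))
print(render(stop_cmd))
```

The commands are only printed here; running them against a real data directory is left as a deliberate manual step.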

Lesson learnt: always upgrade to the latest patch release of the current version (here, 3.6.x) before upgrading to the next version.
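
A small pre-upgrade check along these lines could catch this earlier. It is only a sketch: the function names are made up for illustration, and the "latest" release has to be supplied by whoever runs it (3.6.17 below is just the version mentioned in this thread, not a claim about the newest 3.6 release).

```python
def parse_version(text):
    """Turn a version string such as '3.6.17' into the tuple (3, 6, 17)."""
    return tuple(int(part) for part in text.split("."))


def on_latest_patch(running, latest_patch):
    """Return True if `running` is at least `latest_patch` within the same series.

    Both arguments are 'major.minor.patch' strings. Comparing parsed tuples
    avoids the string-comparison trap where '3.6.3' sorts after '3.6.17'.
    """
    run = parse_version(running)
    latest = parse_version(latest_patch)
    if run[:2] != latest[:2]:
        raise ValueError("versions belong to different release series")
    return run >= latest


# Illustrative values only.
print(on_latest_patch("3.6.3", "3.6.17"))
print(on_latest_patch("3.6.17", "3.6.17"))
```

The running version of each member can be read from `db.version()` in the shell and fed into a check like this before swapping binaries.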

Thanks
R
