Mongod crashed on Debian 10 with Fatal Assertion 50853

Hi,

I’m running a single instance of mongod on a single Debian 10 node, with the package “mongodb-org/buster,now 4.2.3 amd64” installed. Recently mongod crashed with the following error.

I found this assertion only once before, in connection with access issues on the WiredTiger.turtle file:

https://groups.google.com/d/msg/mongodb-user/QQC-HKx8LCo/ph1iHrVEAwAJ

Any suggestions are kindly appreciated! Thanks!

Is there further documentation regarding analyzing such crashes?

Thanks,

Marc

    2020-03-01T17:12:29.261+0100 E  STORAGE  [WTJournalFlusher] WiredTiger error (5) [1583407709:261464][6196:0x7f6e5b1ef700], WT_SESSION.log_flush: __posix_sync, 99: /var/lib/mongodb/journal/WiredTigerLog.0000000003: handle-sync: fdatasync: Input/output error Raw: [1583407709:261464][6196:0x7f6e5b1ef700], WT_SESSION.log_flush: __posix_sync, 99: /var/lib/mongodb/journal/WiredTigerLog.0000000003: handle-sync: fdatasync: Input/output error
    2020-03-01T17:12:32.214+0100 E  STORAGE  [WTJournalFlusher] WiredTiger error (-31804) [1583407712:214632][6196:0x7f6e5b1ef700], WT_SESSION.log_flush: __wt_panic, 490: the process must exit and restart: WT_PANIC: WiredTiger library panic Raw: [1583407712:214632][6196:0x7f6e5b1ef700], WT_SESSION.log_flush: __wt_panic, 490: the process must exit and restart: WT_PANIC: WiredTiger library panic
    2020-03-01T17:12:32.214+0100 F  -        [WTJournalFlusher] Fatal Assertion 50853 at src/mongo/db/storage/wiredtiger/wiredtiger_util.cpp 414
    2020-03-01T17:12:32.214+0100 F  -        [WTJournalFlusher] 

    ***aborting after fassert() failure

    2020-03-01T17:12:32.224+0100 F  -        [WTJournalFlusher] Got signal: 6 (Aborted).

Backtrace produced:

----- BEGIN BACKTRACE -----
{"backtrace":[{"b":"55D8C855A000","o":"281F591","s":"_ZN5mongo15printStackTraceERSo"},{"b":"55D8C855A000","o":"281ED8E"},{"b":"55D8C855A000","o":"281EE26"},{"b":"7F6E6233F000","o":"12730"},{"b":"7F6E6217E000","o":"377BB","s":"gsignal"},{"b":"7F6E6217E000","o":"22535","s":"abort"},{"b":"55D8C855A000","o":"CDEF3B","s":"_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj"},{"b":"55D8C855A000","o":"A268A6"},{"b":"55D8C855A000","o":"E617AB"},{"b":"55D8C855A000","o":"A33EC2","s":"__wt_err_func"},{"b":"55D8C855A000","o":"A34326","s":"__wt_panic"},{"b":"55D8C855A000","o":"E32F03"},{"b":"55D8C855A000","o":"E17976","s":"__wt_log_force_sync"},{"b":"55D8C855A000","o":"E1E28B","s":"__wt_log_flush"},{"b":"55D8C855A000","o":"E53A6B"},{"b":"55D8C855A000","o":"DDCB84","s":"_ZN5mongo22WiredTigerSessionCache16waitUntilDurableEbb"},{"b":"55D8C855A000","o":"DBA336","s":"_ZN5mongo18WiredTigerKVEngine24WiredTigerJournalFlusher3runEv"},{"b":"55D8C855A000","o":"26FA63F","s":"_ZN5mongo13BackgroundJob7jobBodyEv"},{"b":"55D8C855A000","o":"294519F"},{"b":"7F6E6233F000","o":"7FA3"},{"b":"7F6E6217E000","o":"F94CF","s":"clone"}],"processInfo":{ "mongodbVersion" : "4.2.3", "gitVersion" : "6874650b362138df74be53d366bbefc321ea32d4", "compiledModules" : [], "uname" : { "sysname" : "Linux", "release" : "4.19.0-8-amd64", "version" : "#1 SMP Debian 4.19.98-1 (2020-01-26)", "machine" : "x86_64" }, "somap" : [ { "b" : "55D8C855A000", "elfType" : 3, "buildId" : "C1E6FA2DCE46DBD4F26AF59B9ECD4DC451A187D5" }, { "b" : "7FFD5C3E5000", "path" : "linux-vdso.so.1", "elfType" : 3, "buildId" : "B89B19527F25345B43708CB3E56B29B343FE85F0" }, { "b" : "7F6E628A3000", "path" : "/lib/x86_64-linux-gnu/libcurl.so.4", "elfType" : 3, "buildId" : "B124C5E8D77B1B3F0CDDBF4E39B1F9132347E16C" }, { "b" : "7F6E62889000", "path" : "/lib/x86_64-linux-gnu/libresolv.so.2", "elfType" : 3, "buildId" : "026C3BA167F64F631EB8781FCA2269FBC2EE7CA5" }, { "b" : "7F6E625A0000", "path" : "/lib/x86_64-linux-gnu/libcrypto.so.1.1", "elfType" : 3, 
"buildId" : "E4D80B6A27F74CF1ABBD353A72622B7C5FDBA771" }, { "b" : "7F6E6250E000", "path" : "/lib/x86_64-linux-gnu/libssl.so.1.1", "elfType" : 3, "buildId" : "329B528F65883B62C397B42F1F0C3FB55E66C2E5" }, { "b" : "7F6E62509000", "path" : "/lib/x86_64-linux-gnu/libdl.so.2", "elfType" : 3, "buildId" : "D3583C742DD47AAA860C5AE0C0C5BDBCD2D54F61" }, { "b" : "7F6E624FF000", "path" : "/lib/x86_64-linux-gnu/librt.so.1", "elfType" : 3, "buildId" : "5DCF98AD684962BE494AF28A1051793FD39E4EBC" }, { "b" : "7F6E6237A000", "path" : "/lib/x86_64-linux-gnu/libm.so.6", "elfType" : 3, "buildId" : "885DDA4B4A5CEA600E7B5B98C1AD86996C8D2299" }, { "b" : "7F6E62360000", "path" : "/lib/x86_64-linux-gnu/libgcc_s.so.1", "elfType" : 3, "buildId" : "DE6B14E57AEA9BBEAF1E81EB6772E2222101AA6E" }, { "b" : "7F6E6233F000", "path" : "/lib/x86_64-linux-gnu/libpthread.so.0", "elfType" : 3, "buildId" : "E91114987A0147BD050ADDBD591EB8994B29F4B3" }, { "b" : "7F6E6217E000", "path" : "/lib/x86_64-linux-gnu/libc.so.6", "elfType" : 3, "buildId" : "18B9A9A8C523E5CFE5B5D946D605D09242F09798" }, { "b" : "7F6E6293B000", "path" : "/lib64/ld-linux-x86-64.so.2", "elfType" : 3, "buildId" : "F25DFD7B95BE4BA386FD71080ACCAE8C0732B711" }, { "b" : "7F6E62156000", "path" : "/lib/x86_64-linux-gnu/libnghttp2.so.14", "elfType" : 3, "buildId" : "11070FEAA71B4F7C2E5714A61B66028FA86EAE5E" }, { "b" : "7F6E62137000", "path" : "/lib/x86_64-linux-gnu/libidn2.so.0", "elfType" : 3, "buildId" : "93835C08B4818817E355044CEF05F7F5BA573386" }, { "b" : "7F6E61F18000", "path" : "/lib/x86_64-linux-gnu/librtmp.so.1", "elfType" : 3, "buildId" : "F8F137851A6C9F76F2AFB296C77499E3DB004E4B" }, { "b" : "7F6E61EEA000", "path" : "/lib/x86_64-linux-gnu/libssh2.so.1", "elfType" : 3, "buildId" : "4AEBD6D1D4181EACBCA6F6E30CB293A73FF25FD4" }, { "b" : "7F6E61ED7000", "path" : "/lib/x86_64-linux-gnu/libpsl.so.5", "elfType" : 3, "buildId" : "E7463248F4FD5ADA5D53F36A7F11BA66C9A7DA3C" }, { "b" : "7F6E61E8A000", "path" : "/lib/x86_64-linux-gnu/libgssapi_krb5.so.2", 
"elfType" : 3, "buildId" : "A8A22DB4384DFA17A6A486FF7960DB822976F74C" }, { "b" : "7F6E61DAA000", "path" : "/lib/x86_64-linux-gnu/libkrb5.so.3", "elfType" : 3, "buildId" : "118BE45FCDE6F2645A56C8027EE2F3A25A7EC083" }, { "b" : "7F6E61D76000", "path" : "/lib/x86_64-linux-gnu/libk5crypto.so.3", "elfType" : 3, "buildId" : "699B18B4849021A396E46FF7B435D6D7497649B3" }, { "b" : "7F6E61D6E000", "path" : "/lib/x86_64-linux-gnu/libcom_err.so.2", "elfType" : 3, "buildId" : "DFFD546CDF7248805473C118886139F88BF01415" }, { "b" : "7F6E61D1A000", "path" : "/lib/x86_64-linux-gnu/libldap_r-2.4.so.2", "elfType" : 3, "buildId" : "7A56C455C57C30F696306CA4FE639BAF28FDBBB0" }, { "b" : "7F6E61D09000", "path" : "/lib/x86_64-linux-gnu/liblber-2.4.so.2", "elfType" : 3, "buildId" : "F239F8CFD0087ACCEEECD2E93C5DF56104CFFA76" }, { "b" : "7F6E61AEB000", "path" : "/lib/x86_64-linux-gnu/libz.so.1", "elfType" : 3, "buildId" : "3AF7C4BCEB19B6C83F76E2822B9A23041D85F6D1" }, { "b" : "7F6E61967000", "path" : "/lib/x86_64-linux-gnu/libunistring.so.2", "elfType" : 3, "buildId" : "2B976CABA5F5BF345388917673C45EE626A576D0" }, { "b" : "7F6E617B9000", "path" : "/lib/x86_64-linux-gnu/libgnutls.so.30", "elfType" : 3, "buildId" : "20C08C96D01B993206BCA6CBFC919A5426726BCA" }, { "b" : "7F6E61780000", "path" : "/lib/x86_64-linux-gnu/libhogweed.so.4", "elfType" : 3, "buildId" : "B548A14003EE05ADA36686A3B48D1913BACD540D" }, { "b" : "7F6E61748000", "path" : "/lib/x86_64-linux-gnu/libnettle.so.6", "elfType" : 3, "buildId" : "696C145020FC52F49A604B409E80C0F604514CBE" }, { "b" : "7F6E616C5000", "path" : "/lib/x86_64-linux-gnu/libgmp.so.10", "elfType" : 3, "buildId" : "CF7737ED0FEB1A97D13F3EF9BBAD9AE2E0EEEF48" }, { "b" : "7F6E615A7000", "path" : "/lib/x86_64-linux-gnu/libgcrypt.so.20", "elfType" : 3, "buildId" : "C698702313BFDED270BF0C7C106B38C66AA46982" }, { "b" : "7F6E61598000", "path" : "/lib/x86_64-linux-gnu/libkrb5support.so.0", "elfType" : 3, "buildId" : "C8A3343E37DE6461A09AB7849F62A8C6CF01E551" }, { "b" : 
"7F6E6158F000", "path" : "/lib/x86_64-linux-gnu/libkeyutils.so.1", "elfType" : 3, "buildId" : "B33B7F30AEA5D2BC14A939FA750862D09A4AC80E" }, { "b" : "7F6E61572000", "path" : "/lib/x86_64-linux-gnu/libsasl2.so.2", "elfType" : 3, "buildId" : "99BF5A225908FD4124228D4F3E19C67D7138144F" }, { "b" : "7F6E61443000", "path" : "/lib/x86_64-linux-gnu/libp11-kit.so.0", "elfType" : 3, "buildId" : "6147AE8F2D6FA2184DA7D46016746D0DF0C77895" }, { "b" : "7F6E61230000", "path" : "/lib/x86_64-linux-gnu/libtasn1.so.6", "elfType" : 3, "buildId" : "9D60C41CEC3F57BC859B75C1E834187E04DF7C99" }, { "b" : "7F6E6120D000", "path" : "/lib/x86_64-linux-gnu/libgpg-error.so.0", "elfType" : 3, "buildId" : "0B8984CF2F0DD4F4901E9100CDB9410D7EBE7930" }, { "b" : "7F6E61201000", "path" : "/lib/x86_64-linux-gnu/libffi.so.6", "elfType" : 3, "buildId" : "9ED5213748F3F5D008D615DFF0368A6E38E1DE55" } ] }}
 mongod(_ZN5mongo15printStackTraceERSo+0x41) [0x55d8cad79591]
 mongod(+0x281ED8E) [0x55d8cad78d8e]
 mongod(+0x281EE26) [0x55d8cad78e26]
 libpthread.so.0(+0x12730) [0x7f6e62351730]
 libc.so.6(gsignal+0x10B) [0x7f6e621b57bb]
 libc.so.6(abort+0x121) [0x7f6e621a0535]
 mongod(_ZN5mongo32fassertFailedNoTraceWithLocationEiPKcj+0x0) [0x55d8c9238f3b]
 mongod(+0xA268A6) [0x55d8c8f808a6]
 mongod(+0xE617AB) [0x55d8c93bb7ab]
 mongod(__wt_err_func+0x90) [0x55d8c8f8dec2]
 mongod(__wt_panic+0x39) [0x55d8c8f8e326]
 mongod(+0xE32F03) [0x55d8c938cf03]
 mongod(__wt_log_force_sync+0x286) [0x55d8c9371976]
 mongod(__wt_log_flush+0xEB) [0x55d8c937828b]
 mongod(+0xE53A6B) [0x55d8c93ada6b]
 mongod(_ZN5mongo22WiredTigerSessionCache16waitUntilDurableEbb+0x2D4) [0x55d8c9336b84]
 mongod(_ZN5mongo18WiredTigerKVEngine24WiredTigerJournalFlusher3runEv+0x106) [0x55d8c9314336]
 mongod(_ZN5mongo13BackgroundJob7jobBodyEv+0x9F) [0x55d8cac5463f]
 mongod(+0x294519F) [0x55d8cae9f19f]
 libpthread.so.0(+0x7FA3) [0x7f6e62346fa3]
 libc.so.6(clone+0x3F) [0x7f6e622774cf]
-----  END BACKTRACE  -----

Check your storage for I/O errors

A backtrace or stack trace is generally only meaningful for a developer looking at the server execution context when an exception is encountered. A stack trace can also be useful to differentiate execution paths for a similar assertion. For example, your error is definitely different from the issue you mentioned in the mongodb-user group.

Developers normally demangle stack traces with the help of debug symbols to map addresses to function calls. See Parsing Stack Traces on the MongoDB source code wiki.
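As a quick illustration (assuming the `c++filt` tool from binutils is installed), you can demangle an individual frame from the backtrace above:

```shell
# Demangle one mangled C++ symbol from the backtrace using c++filt (binutils).
echo '_ZN5mongo22WiredTigerSessionCache16waitUntilDurableEbb' | c++filt
# -> mongo::WiredTigerSessionCache::waitUntilDurable(bool, bool)
```

This makes it much easier to see that the crash happened on the journal-flush path, without needing the full debug symbols.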

Errors immediately preceding the stack trace are often a useful indication of the problem, but since those error codes and messages are returned directly from system libraries they can be somewhat opaque.

In your case, the key log line is:

2020-03-01T17:12:29.261+0100 E STORAGE [WTJournalFlusher] WiredTiger error (5) [1583407709:261464][6196:0x7f6e5b1ef700], WT_SESSION.log_flush: __posix_sync, 99: /var/lib/mongodb/journal/WiredTigerLog.0000000003: handle-sync: fdatasync: Input/output error Raw: [1583407709:261464][6196:0x7f6e5b1ef700], WT_SESSION.log_flush: __posix_sync, 99: /var/lib/mongodb/journal/WiredTigerLog.0000000003: handle-sync: fdatasync: Input/output error

This log line indicates that the WTJournalFlusher thread encountered an I/O error trying to flush changes to the journal file /var/lib/mongodb/journal/WiredTigerLog.0000000003 using the fdatasync() library function. Since the mongod process was unable to durably write essential data, it raised a fatal assertion and aborted.
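As a side note, the “(5)” in the WiredTiger error line is the errno value returned by the failing system call. A quick sketch to confirm its meaning (using Python's errno module, since the errno(1) utility isn't installed everywhere):

```shell
# Look up errno 5: it is EIO, the kernel's generic I/O error.
python3 -c 'import errno, os; print(errno.errorcode[5], "=", os.strerror(5))'
# -> EIO = Input/output error
```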

As @chris suggested, you should verify your storage as there may be filesystem or I/O errors.

If you restart mongod after an unexpected shutdown, it will try to recover and continue if possible. If your mongod process is unable to start and the reasons are unclear, please provide any additional log messages from the unsuccessful startup attempt.

Regards,
Stennie

Thanks for the quick response. Yes, after the restart of mongod the recovery was triggered and completed.
It runs normally now, but I will continue to monitor it. Is the issue connected to the ext4 filesystem?

I’m running another single instance of mongod, also on a virtual machine with Debian 10, using the default ESX file system (on an internal plain ESX server), but without any crashes.

Currently I’m reviewing documents concerning running MongoDB on virtual Linux machines (Debian/Ubuntu).
Are there other recommended resources?

https://docs.mongodb.com/manual/administration/production-notes/#scheduling-for-virtual-or-cloud-hosted-devices
https://docs.mongodb.com/manual/administration/production-checklist-operations/#hardware

Again thanks for the fast feedback!

The only ext4 issue I’m aware of is SERVER-18314: Stall during fdatasync phase of checkpoints under WiredTiger and EXT4. This was the motivation for adding a startup warning in MongoDB 3.4+ if ext4 is detected for the current dbPath. We have not observed or had reports of similar stalls with XFS.
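If you want to confirm which filesystem backs your dbPath (and hence whether that startup warning applies), a minimal sketch is below. It is shown against `/` here so it runs anywhere; point it at `/var/lib/mongodb` or whatever dbPath your mongod.conf specifies:

```shell
# Report the filesystem type backing a directory; mongod logs a startup
# warning when the dbPath is on ext4. Replace / with your dbPath,
# e.g. /var/lib/mongodb on a default Debian install.
df -T /
stat -f -c %T /        # prints just the filesystem type name
```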

The issue you encountered was an unrecoverable I/O error, which is different from the ext4 stalls that have been observed. In your case I suspect the cause was a filesystem or hardware error.

Even if mongod successfully recovered after restarting, I would still advise verifying your filesystem and checking for storage errors.

The Production Notes & Operations Checklist you found are the usual general guidance we provide. These notes are aggregated from user feedback and field experience, so while not every item applies to every workload, they cover common and impactful considerations.

There are also some MongoDB white papers on operational and planning topics that may be of interest.

Regards,
Stennie

It seems that I have the same issue:

    2020-08-02T00:08:05.698+0200 E STORAGE [WTJournalFlusher] WiredTiger error (5) [1596319685:698351][937:0x7f6b9f4c4700], WT_SESSION.log_flush: __posix_sync, 99: /var/lib/mongodb/journal/WiredTigerLog.0000000157: handle-sync: fdatasync: Input/output error

How do I check whether there was an I/O issue on Ubuntu? I ran smartctl to check the disk and it said the disk had no errors. Here is a link to the reported bug: https://jira.mongodb.org/browse/SERVER-50069?filter=-2

Hi @firstName_lastName,

In general, please start a new topic if you have a similar problem in a different environment. This will help keep details & discussion for each environment distinct.

The fdatasync: Input/output error message is returned by system libraries, so it occurs at a lower layer than MongoDB. You may be able to get more relevant advice on an Ubuntu or Linux site (for example, Ask Ubuntu).

The smartctl utility reports information from the SMART controller for your drive. In my experience the SMART warnings generally aren’t insightful unless your drive is in imminent danger of failing, but look for attributes that are increasing significantly over time or approaching the warning threshold. Most SMART attributes & thresholds are specific to your hard drive vendor and/or model, and are meant to be predictive indicators of failure. You’ll have to compare those with other reports for the same drive models.

Other likely tools to use for Linux I/O issues include fsck and badblocks (which can also be invoked via fsck). Check the fsck man page for available options in your version. fsck will report (and possibly resolve) filesystem errors, which may be logical errors rather than physical faults reported by smartctl. For example, files can be corrupted due to unexpected system or process restarts with active writes in progress. If your MongoDB instance is hosted in a VM or container, you will also want to check the host drive.
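To get a feel for fsck without risking a real device, here is a minimal sketch that checks a scratch ext4 image file (assumes e2fsprogs is installed; the image file, size, and path are arbitrary choices for the demo):

```shell
# Create a small ext4 filesystem in a regular file, then run a forced,
# strictly read-only consistency check on it. No real disks are touched.
img=$(mktemp /tmp/fsck-demo.XXXXXX)
dd if=/dev/zero of="$img" bs=1M count=8 status=none
mkfs.ext4 -q -F "$img"          # -F: allow a regular file as the target
fsck.ext4 -f -n "$img"          # -f: force a full check, -n: never modify
rm -f "$img"
```

Against a real filesystem you would run the same read-only check (as root, ideally with the filesystem unmounted) on the block device hosting your dbPath, e.g. `fsck -fn /dev/sdXN` with your actual device substituted.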

Unfortunately there isn’t much that can be done to repair random corruption in data files: “repair” in those cases generally means skipping over file segments that can’t be read (aka “salvage”), which will result in data loss unless those segments happen to be unused. In SERVER-49317 you mention fsck found and fixed some inconsistencies: blocks of some MongoDB data files may have been repaired to an unexpected state if they were part of the detected inconsistencies.

If there are no obvious errors on your drive or filesystem, another possibility to look into would be unexpected process or system restarts. MongoDB uses journalling and checksums to try to avoid corruption issues, but if you are seeing this problem frequently I would look into the stability of your environment. I would also make sure you are using a recommended filesystem (generally XFS for WiredTiger) mounted locally.

To mitigate risk in a production environment, we recommend deploying a replica set so that you have data redundancy and availability across multiple MongoDB instances (ideally on different physical hosts).

Regards,
Stennie