Inconsistency between size and storage size

I refactored my data model to reduce document size, and gained about 20% in “size” per document. However, looking at “storage size”, the collection containing documents in the new structure weighs more than the original one!

Even for a single document: the old-structure collection has size 30814 and storage size 24576, while the new-structure collection has size 24864 and storage size 28672.

I am using WiredTiger + zlib, on MongoDB v4.2.13

Example documents (old structure first, then new). I had to take out most of the data for these to fit into this message; there should be 440 subkeys instead of 10 in each subsection:

{ "_id" : { "pi" : 1, "rn" : "1", "vi" : "17917f26055000000000" }, "_class" : "R", "ai" : { "AC" : 137, "MQRankSum" : 0.0, "filt" : "PASS", "MQ" : 60.0, "AF" : 0.173, "InbreedingCoeff" : 0.8784, "MLEAC" : 144, "BaseQRankSum" : 1.75, "ExcessHet" : -0.0, "MLEAF" : 0.181, "DP" : 6018, "ReadPosRankSum" : 0.508, "AN" : 794, "FS" : 0.0, "QD" : 29.26, "SOR" : 0.723, "qual" : 39861.1, "ClippingRankSum" : 0.0 }, "ka" : [ "G", "A" ], "rp" : { "ch" : "chr1", "ss" : { "$numberLong" : "45229" } }, "sp" : { "1" : { "gt" : "0/0", "ai" : { "AD" : "6,0", "GQ" : 18, "DP" : 6, "PL" : "0,18,218" } }, "2" : { "gt" : "0/0", "ai" : { "AD" : "6,0", "GQ" : 15, "DP" : 6, "PL" : "0,15,225" } }, "3" : { "gt" : "0/0", "ai" : { "AD" : "12,0", "GQ" : 33, "DP" : 12, "PL" : "0,33,495" } }, "4" : { "gt" : "0/0", "ai" : { "AD" : "16,0", "GQ" : 48, "DP" : 16, "PL" : "0,48,569" } }, "5" : { "gt" : "1/1", "ai" : { "AD" : "0,32", "GQ" : 96, "DP" : 32, "PL" : "1085,96,0" } }, "6" : { "gt" : "0/0", "ai" : { "AD" : "6,0", "GQ" : 15, "DP" : 6, "PL" : "0,15,225" } }, "7" : { "gt" : "0/0", "ai" : { "AD" : "7,0", "GQ" : 21, "DP" : 7, "PL" : "0,21,240" } }, "8" : { "gt" : "0/0", "ai" : { "AD" : "14,0", "GQ" : 39, "DP" : 14, "PL" : "0,39,585" } }, "9" : { "gt" : "0/0", "ai" : { "AD" : "8,0", "GQ" : 24, "DP" : 8, "PL" : "0,24,307" } }, "10" : { "gt" : "0/0", "ai" : { "AD" : "12,0", "GQ" : 33, "DP" : 12, "PL" : "0,33,495" } } }, "ty" : "SNP", "v" : { "$numberLong" : "0" } }

{ "_id" : { "pi" : 1, "rn" : "1", "vi" : "179138f0da6000000000" }, "M" : { "AD" : { "1" : "6,0", "2" : "6,0", "3" : "12,0", "4" : "16,0", "5" : "0,32", "6" : "6,0", "7" : "7,0", "8" : "14,0", "9" : "8,0", "10" : "12,0" }, "GQ" : { "1" : 18, "2" : 15, "3" : 33, "4" : 48, "5" : 96, "6" : 15, "7" : 21, "8" : 39, "9" : 24, "10" : 33 }, "DP" : { "1" : 6, "2" : 6, "3" : 12, "4" : 16, "5" : 32, "6" : 6, "7" : 7, "8" : 14, "9" : 8, "10" : 12 }, "PL" : { "1" : "0,18,218", "2" : "0,15,225", "3" : "0,33,495", "4" : "0,48,569", "5" : "1085,96,0", "6" : "0,15,225", "7" : "0,21,240", "8" : "0,39,585", "9" : "0,24,307", "10" : "0,33,495" } }, "_class" : "R", "a" : [ "G", "A" ], "g" : { "1" : "0/0", "2" : "0/0", "3" : "0/0", "4" : "0/0", "5" : "1/1", "6" : "0/0", "7" : "0/0", "8" : "0/0", "9" : "0/0", "10" : "0/0" }, "i" : { "AC" : 137, "MQRankSum" : 0.0, "filt" : "PASS", "MQ" : 60.0, "AF" : 0.173, "InbreedingCoeff" : 0.8784, "MLEAC" : 144, "BaseQRankSum" : 1.75, "ExcessHet" : -0.0, "MLEAF" : 0.181, "DP" : 6018, "ReadPosRankSum" : 0.508, "AN" : 794, "FS" : 0.0, "QD" : 29.26, "SOR" : 0.723, "qual" : 39861.1, "ClippingRankSum" : 0.0 }, "p" : { "0" : { "ch" : "chr1", "ss" : { "$numberLong" : "45229" } } }, "t" : "SNP" }

See https://docs.mongodb.com/manual/faq/storage/#how-do-i-reclaim-disk-space-in-wiredtiger-

I am pretty sure the same applies to documents that shrink in size: it is more efficient to keep the unused space allocated to the file. However, with smaller documents, more of them fit in RAM, so a larger share of your working set stays in memory.
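A quick way to look at the figures quoted in the question is to check them against a fixed block size. The sketch below is only arithmetic on those four numbers; the 4096-byte unit is an assumption (it matches WiredTiger's usual default allocation size, but nothing in this thread confirms the setting in use), and `describe` is a helper name made up for illustration.

```python
# Figures quoted in the question, in bytes.
OLD = {"size": 30814, "storageSize": 24576}
NEW = {"size": 24864, "storageSize": 28672}

# Assumption: space on disk is handed out in 4 KB units.
BLOCK = 4096


def describe(stats):
    """Return (whole blocks, leftover bytes, storageSize/size ratio)."""
    blocks, remainder = divmod(stats["storageSize"], BLOCK)
    ratio = round(stats["storageSize"] / stats["size"], 2)
    return blocks, remainder, ratio


print(describe(OLD))  # (6, 0, 0.8)
print(describe(NEW))  # (7, 0, 1.15)
```

Both storage sizes come out as whole multiples of 4096 (6 and 7 blocks), so for a single document the gap is exactly one block. That is consistent with storage size reflecting allocated blocks rather than the bytes the document needs, which would make a one-document comparison a poor predictor of how the full collection behaves.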

There are ways to reclaim the unused space within a file; the link above should explain how.
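The FAQ entry linked above describes the `compact` command for this. As a rough sketch, assuming you drive it from Python through an object that exposes a `command(name, value)` method the way a pymongo `Database` does, you could compare `storageSize` before and after. The function name `compact_and_report` and the database and collection names in the usage comment are hypothetical.

```python
def compact_and_report(db, coll_name):
    """Run compact on one collection and return (before, after) storageSize.

    `db` is assumed to behave like a pymongo Database: db.command(name, value)
    sends the command and returns the server reply as a dict.
    """
    before = db.command("collStats", coll_name)["storageSize"]
    db.command("compact", coll_name)
    after = db.command("collStats", coll_name)["storageSize"]
    return before, after


# Usage sketch (hypothetical names; needs pymongo and a running server):
#   from pymongo import MongoClient
#   db = MongoClient()["mydb"]
#   print(compact_and_report(db, "variants"))
```

Check the `compact` documentation for your server version before running it on a live system, since it can block other operations while it runs, and it may not release anything if the file has little free space to give back.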