How does data move from the journal to WiredTiger?

Hi Experts,
I have a question about how data flows inside MongoDB.
The scenario:

  1. Use MongoDB version: v4.2.8
  2. Insert 50k documents into MongoDB.
  3. Check the data in the journal and in the WiredTiger files.

What I see in the data folder:

  • There are about 2k documents in collection-8--8808730079379782049.wt
  • There are about 48k documents in WiredTigerLog.0000000007

Could you please explain this case? I expected all the data to reach the WiredTiger files after the 60-second interval mentioned in the documentation.

Hi @Duc_Bui_Minh and welcome to the MongoDB community :muscle:!

I did a little test.

First, I started a for loop and inserted 10K docs into my test.col collection. I did this at 22:40 and some seconds. Then I took a screenshot at 22:41:09.

As you can see in this first screenshot, WiredTigerLog.0000000003 was updated a few seconds earlier, but the collection and index files have not changed even though I'm actively writing to MongoDB at that moment.

A few seconds later my writes had finished and nothing else had changed. The listing still looked like the image above.

Then, after 30 seconds, I'm guessing MongoDB reached a checkpoint, and I took the following screenshot:

My understanding is that MongoDB ran a checkpoint, so the changes that had only been recorded in the WiredTiger journal were written into the collection and index files (my collection has 2 indexes, _id and name).

As you can see, the size of MongoDB's journal file didn't really change, because the file is pre-allocated. Also, my MongoDB is running with snappy compression, so the file sizes are affected by that too.

So yes, from what I see, MongoDB writes a checkpoint to its data files every 60 seconds (older versions also checkpointed after 2 GB of journal data).

It’s also in the documentation.

So to answer your question, I think the pre-allocation is what confused you. If you stopped writing to MongoDB, your journal file is probably empty after 60 seconds, even though its size on disk stays the same.

You can probably double check this by monitoring the file size like I did with watch -n 0.1 ls -l or something similar.
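If watch isn't available, here is a minimal Python sketch of the same idea. It is only an illustration, not an official MongoDB tool: the helper names (snapshot, changed, watch_dbpath) and the dbpath you pass in are my own assumptions. It polls the data directory and reports which .wt files and WiredTigerLog.* journal files changed size or modification time between polls.

```python
import os
import time


def snapshot(dbpath):
    """Return {relative_path: (size_bytes, mtime)} for data and journal files."""
    files = {}
    for root, _dirs, names in os.walk(dbpath):
        for name in names:
            # Collection and index files end in .wt; journal files are
            # named WiredTigerLog.<number> (usually under journal/).
            if name.endswith(".wt") or name.startswith("WiredTigerLog."):
                path = os.path.join(root, name)
                try:
                    st = os.stat(path)
                except FileNotFoundError:
                    continue  # file was removed between listing and stat
                files[os.path.relpath(path, dbpath)] = (st.st_size, st.st_mtime)
    return files


def changed(before, after):
    """Return the sorted names of files that are new or whose size/mtime differ."""
    return sorted(name for name, info in after.items() if before.get(name) != info)


def watch_dbpath(dbpath, interval=1.0, rounds=90):
    """Poll dbpath and print every file that changed since the previous poll."""
    previous = snapshot(dbpath)
    for _ in range(rounds):
        time.sleep(interval)
        current = snapshot(dbpath)
        for name in changed(previous, current):
            size, _mtime = current[name]
            print(time.strftime("%H:%M:%S"), name, size, "bytes")
        previous = current
```

Call watch_dbpath with your own data directory (for example watch_dbpath("/data/db"), which is just a placeholder path) while the inserts are running. You should see the journal file change first and the collection and index files change later, around the checkpoint.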

Cheers,
Maxime.


Hi @MaBeuLux88,
Awesome. Thanks so much for your detailed response.
I have checked again, and all the data moves into the WiredTiger files after the 60-second interval.
