Client experienced a timeout when connecting to 'm103-repl-2' - check that mongod/mongos

Hi,
I have configured and started the second shard, m103-repl-2.

shards:
{ "_id" : "m103-repl", "host" : "m103-repl/192.168.103.100:27001,192.168.103.100:27002,192.168.103.100:27003", "state" : 1 }
{ "_id" : "m103-repl-2", "host" : "m103-repl-2/192.168.103.100:27004,192.168.103.100:27005,192.168.103.100:27006", "state" : 1 }

I can connect to it as the admin user:
mongo --port 27004 -u "m103-admin" -p "m103-pass" --authenticationDatabase "admin"

validate_lab_shard_collection returns the following error:

Client experienced a timeout when connecting to 'm103-repl-2' - check that mongod/mongos
processes are running on the correct ports, and that the 'm103-admin' user
Thanks in advance

Are all the mongods up and running?
You should have 3 config servers, 6 replica set members, and 1 mongos.
You connected to only one server in m103-repl-2.
Can you connect to the whole replica set, as below?
mongo --host "m103-repl-2/192.168.103.100:27004" -u "m103-admin" -p "m103-pass" --authenticationDatabase "admin"

If everything above is correct, it could be a genuine timeout caused by low resources on your system.
Have you set the WiredTiger memory parameters and enabled authentication?
Please check our forum; you will find more details there.
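For anyone following along, the replica set connection string above can also list every member rather than a single seed. A minimal sketch, assuming a POSIX shell; the helper name replset_seed_list is made up for illustration and is not part of the course tooling:

```shell
# Hypothetical helper: build "<replSetName>/<host:port>,<host:port>,..."
# for use with mongo --host. Arguments: set name, host, then one or more ports.
replset_seed_list() {
  rs_name="$1"
  rs_host="$2"
  shift 2
  rs_members=""
  for rs_port in "$@"; do
    # Append a comma only when the list already has an entry.
    rs_members="${rs_members:+$rs_members,}${rs_host}:${rs_port}"
  done
  printf '%s/%s\n' "$rs_name" "$rs_members"
}
```

It could then be used as: mongo --host "$(replset_seed_list m103-repl-2 192.168.103.100 27004 27005 27006)" -u "m103-admin" -p "m103-pass" --authenticationDatabase "admin"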


I am also getting the same error.

It seems my server lacks the resources, so I have to recreate everything from scratch. I can't even connect to the servers:

vagrant@m103:/var/mongodb/config$ mongo --port 27005
-bash: fork: Cannot allocate memory
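A "Cannot allocate memory" error from fork suggests the VM has run out of free memory. One way to check before launching more processes is sketched below. It assumes a Linux guest whose kernel exposes a MemAvailable line in /proc/meminfo (older kernels may not, in which case free -m is the fallback); the function name mem_available_mb is made up for illustration:

```shell
# Hypothetical helper: print available memory in MiB.
# Reads a meminfo-style file; defaults to /proc/meminfo when no path is given.
mem_available_mb() {
  awk '/^MemAvailable:/ { printf "%d\n", $2 / 1024 }' "${1:-/proc/meminfo}"
}
```

Running mem_available_mb inside the VM prints a single number; a value close to zero on the 2GB box would be consistent with the fork failure shown above.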

Hi,

It seems my virtual server has run out of resources. I need to recreate everything from scratch.

Best Regards

Hi @chaitanya_07740 and @Bilen_75502,

Please check and confirm the points mentioned by @Ramachandra_37567.

Kindly get back to us if the issue persists.

Thanks,
Muskan
Curriculum Support Engineer

Hi,

I will delete the VM and start from scratch. It seems the server has no more resources.

Thanks and Best Regards

I deleted everything and provisioned another environment, and it works!


Hey, team,

I’m getting the same error.

Everything (mongod-repl-[1-6], csrs[1-3], and mongos) is running.

I can connect to the whole replica set with mongo --host "m103-repl-2/192.168.103.100:27004" -u "m103-admin" -p "m103-pass" --authenticationDatabase "admin"

I ran a loop (while true; do validate_lab_shard_collection; sleep 5; done) as advised in another thread, but it kept timing out for over half an hour. I shut down and restarted all servers and the message still occurred.
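A bounded version of that loop avoids running indefinitely. This is only a sketch: retry_with_limit is a made-up helper name, and it assumes validate_lab_shard_collection exits with a non-zero status when it times out, which this thread does not confirm.

```shell
# Hypothetical helper: run a command up to N times, sleeping between attempts.
# Usage: retry_with_limit <max_attempts> <delay_seconds> <command> [args...]
retry_with_limit() {
  max_attempts="$1"
  delay="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    # Stop as soon as the command succeeds.
    if "$@"; then
      return 0
    fi
    echo "attempt $attempt of $max_attempts failed" >&2
    # Sleep only if another attempt will follow.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}
```

For example, retry_with_limit 20 5 validate_lab_shard_collection would give up after 20 attempts instead of looping forever.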

Given this has been happening regularly for over a year, and the “fix” is either “loop and pray” or “rebuild and pray”, and the issue at least appears to be resource-based, is there an argument for setting our virtual machines up with more than 2GB memory in the first place?

Cheers,
Sandy

Hi @Sandy_74274,

The issue is most probably due to connection issues or lack of resources.

For now, the solution is to rebuild your replica set/sharded cluster.
Also, can you try changing the index key that you have chosen?

Please let me know if you still face any issues.

Thanks,
Muskan
Curriculum Support Engineer

Would this not solve the problem?
--host "m103-repl-2/192.168.103.100:27004,192.168.103.100:27005,192.168.103.100:27006"

With respect, if the issue is lack of resources, and we’ve known about this issue for over a year, we should set up the virtual machine with more resources. Learners shouldn’t have to perform gymnastics until enough space opens up in memory because they were told to set up an environment that wasn’t fit for purpose.

I did rebuild it; I guess I was unlucky, but I went through another 15-minute loop with no successes.

I didn’t change the index key to an incorrect key, because that would not have resolved the issue except incidentally.

In the end I got the validation key from someone’s blog. I wonder how many others have quietly done the same.

Can we remove the “Confirmed Solution”? There are a number of threads in this forum confirming that this isn’t a solution at all.

Hi @Sandy_74274,

I would suggest repeating what you have already tried and, if that all fails, rebuilding everything from scratch in VirtualBox as an interim, short-term measure for the course.
Kill all running mongod processes, destroy the Vagrant box, then provision it and SSH into it again. Then relaunch all your processes.
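The rebuild steps could be scripted roughly as follows. This is a sketch under stated assumptions: it is run on the host from the directory containing the course Vagrantfile, pkill is available inside the VM, and the names run, rebuild_vm, and DRY_RUN are made up for illustration. Setting DRY_RUN=1 prints the commands instead of executing them.

```shell
# Hypothetical wrapper: echo the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Hypothetical helper: stop mongod processes, destroy the box, and reprovision it.
rebuild_vm() {
  # Ignore failure here: there may be no mongod running, or the VM may be unreachable.
  run vagrant ssh -c 'pkill mongod' || true
  run vagrant destroy -f
  run vagrant up --provision
}
```

After rebuild_vm finishes, vagrant ssh gets you back into the box, and the config servers, replica set members, and mongos then need to be relaunched by hand.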

Also, we are going to raise a ticket to get a better-resourced Vagrant box created and added to the course going forward.

Let me know if you face any issues going forward.

Thanks,
Muskan
Curriculum Support Engineer

Things went better this morning; M103 complete.

Thanks for raising the ticket.

Hi @Sandy_74274,

Glad you were able to work it through.

Please feel free to reach out to us if you face any further issues.

Thanks,
Muskan
Curriculum Support Engineer