Lab - Writes with Failovers answer is wrong

  • The write operation will always return with an error, even if wtimeout is not specified.

If wtimeout is not specified, the write operation will be retried for an indefinite amount of time until the writeConcern is successful. If the writeConcern is impossible, like in this example, it may never return anything to the client.

This is not correct. I removed the wtimeout and got the same error. I feel like I should not have failed this lab. My answer was correct.

Without discussing specifics around this lab, please share the following:

  1. A screenshot of the prompt + command + error message
  2. Output of rs.status()

Ok, the question is a little ambiguous. It returns an error, but not a “writeConcernError”. So a bit of a trick question.

db.new_data.insert({"m103": "very fun"}, { writeConcern: { w: 3}})
2020-03-26T16:55:20.756+0000 W NETWORK [ReplicaSetMonitor-TaskExecutor-0] Failed to connect to 192.168.103.100:27013, in(checking socket for error after poll), reason: Connection refused
2020-03-26T16:55:50.757+0000 W NETWORK [ReplicaSetMonitor-TaskExecutor-0] Failed to connect to 192.168.103.100:27013, in(checking socket for error after poll), reason: Connection refused

^C2020-03-26T16:56:03.770+0000 I CONTROL [main] shutting down with code:0

In actual fact the question is unambiguous.

You needed to:

  1. shut down one node and leave two nodes running
  2. reconnect to the replica set

But what you potentially did was shut down the primary node that you ran the command from. Remember that when you shut down a primary, an election takes place, and your shell is no longer connected to a primary of the replica set.

Confusing, but ok.

Basically, those aren’t error messages; they’re Warning (W) and Informational (I) messages. That’s what the “W” and “I” stand for. The warnings tell you that a connection can no longer be established on port 27013.

Consider the following topology:
27011 - Secondary
27012 - Secondary
27013 - Primary

If you connect to the replica set (--host replicaSetName/192.168.103.100:27011), you will automatically be redirected to the Primary node 27013. Shutting down a Primary node will result in the election of a new Primary. However, shutting down a Secondary node does not result in an election.

So in this lab, you should have shut down a Secondary node instead. Always double-check rs.status() before you shut down a node.
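As a rough sketch of that check: the document returned by rs.status() has a members array whose entries carry a name and a stateStr such as "PRIMARY" or "SECONDARY". The helper names primaryOf and secondariesOf below are made up for illustration, and exampleStatus is a hand-written stand-in for the topology above, not real lab output.

```javascript
// Hypothetical helpers (not part of the mongo shell): they only read the
// `members` array of an rs.status()-style document.

// Name of the current primary, or null if no member reports PRIMARY.
function primaryOf(status) {
  const primary = status.members.find((m) => m.stateStr === "PRIMARY");
  return primary ? primary.name : null;
}

// Names of the members that can be shut down without triggering an election.
function secondariesOf(status) {
  return status.members
    .filter((m) => m.stateStr === "SECONDARY")
    .map((m) => m.name);
}

// Hand-written stand-in for rs.status(), matching the topology above.
const exampleStatus = {
  members: [
    { name: "192.168.103.100:27011", stateStr: "SECONDARY" },
    { name: "192.168.103.100:27012", stateStr: "SECONDARY" },
    { name: "192.168.103.100:27013", stateStr: "PRIMARY" },
  ],
};

console.log("primary:", primaryOf(exampleStatus));
console.log("safe to shut down:", secondariesOf(exampleStatus));
```

In the shell you would pass the real thing, e.g. secondariesOf(rs.status()), and pick one of the returned hosts to shut down.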

Hi @ryandpardey,

I hope you found @007_jb’s response helpful. If you are still having any doubts then please feel free to get back to us.

~ Shubham

Hi Shubham Ranjan,

Regarding the detailed explanation of the answer: “… When a writeConcernError occurs, the document IS STILL written to the healthy nodes.
This is correct. The WriteResult object simply tells us whether the writeConcern was successful or not - it will not undo successful writes from any of the nodes.”

I think the assertion “is still written” is too strong, considering what the MongoDB manual says about this in https://docs.mongodb.com/manual/core/replica-set-write-concern:

"A write operation that times out waiting for the specified write concern only indicates that the required number of replica set members did not acknowledge the write operation within the wtimeout time period. It does not necessarily indicate that the primary failed to apply the write. The data MAY exist on a subset of replica set nodes at the time of the write concern error, and can continue replicating until all nodes in the cluster have that data."

Even understanding that write operations and write concerns are dissociated in the sense explained above, there is a subtle difference between IS STILL and MAY.

Considering that the options are within the context of the lab simulation, perhaps it should say:

When the writeConcernError occurs, the document is still written to the healthy nodes.

In addition, the assertion here is a post error action whereas the excerpt from the doc that you’re referencing, "The data MAY exist on a subset of replica set nodes at the time of the write concern error", refers to the state of the nodes at the time an error occurs. Therefore, the statement from the lab is more aligned to "and can continue replicating until all nodes in the cluster have that data".

If this lab option were a generalised statement, then I would agree that “is still” is an overquantified assertion.
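To make the distinction in this debate concrete, here is a toy model in plain JavaScript. It is only a sketch and not how the server works internally: simulateWrite is a made-up function, and it assumes, as in the lab scenario, that every node that is up ends up with the document. It separates "how many nodes hold the document" from "what the client is told", and it treats a missing wtimeout (or a wtimeout of 0) as "no time limit".

```javascript
// Toy model only. Assumptions are labelled in the comments.
//   healthyNodes: data-bearing members currently up, including the primary
//   w:            number of acknowledgements requested
//   wtimeout:     milliseconds to wait; undefined or 0 means no time limit
function simulateWrite(healthyNodes, w, wtimeout) {
  // Assumption: every healthy node applies the write, whatever w is.
  const nodesWithDocument = healthyNodes;

  if (nodesWithDocument >= w) {
    return { nodesWithDocument, outcome: "acknowledged" };
  }
  if (!wtimeout) {
    // Unreachable write concern and no time limit: the client keeps waiting.
    return { nodesWithDocument, outcome: "blocks indefinitely" };
  }
  // Unreachable write concern with a time limit: the client gets an error,
  // but the document is not rolled back on the nodes that have it.
  return {
    nodesWithDocument,
    outcome: "writeConcernError after " + wtimeout + " ms",
  };
}

console.log(simulateWrite(2, 3));        // two of three nodes up, no wtimeout
console.log(simulateWrite(2, 3, 1000));  // two of three nodes up, wtimeout 1000
console.log(simulateWrite(3, 3, 1000));  // all nodes up
```

In this model the number of nodes holding the document is the same whether or not wtimeout is given; only what the client sees changes.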

I think the wording of the lab answers is ambiguous. The debate above is ample evidence of this. If the purpose of the question is to verify students’ knowledge, I don’t believe it is effective.

I would like to register a vote for more clarity.

Hi @sma907,

Thanks for sharing the feedback.

I will have a quick sync with the team on this and see if we can make it more clear.

~ Shubham

In the lectures on write concerns it is not specified what happens when the wtimeout is not specified, so answering this question is not really possible after only following the lectures.

Hey @Milan_de_Jong, I think you’re referring to this one:

“This write now blocks until the secondary comes back online which may take longer than is acceptable.”

I’m going to jump on the bandwagon. I too got this wrong and still don’t understand why.

If only two of the three nodes are online, why is wtimeout of any concern regarding whether I’ll get an error? Wouldn’t the w of 3 give an error in any situation?

Also, should I have gotten an in-browser IDE with this page?