Contributing spelling fixes to mongo

Would anyone be interested in a subset of these changes?

How did you generate this diff? I took a quick glance and I noticed that it’s not strictly spelling changes.

Sorry, my link was wrong: it had too few dots (hand-crafting URLs is dangerous; it's fixed now). There’s a big difference between .. and ... and it was late when I decided to post to this forum.

The noise you were seeing is because GitHub was showing divergence between master and the branch, instead of only the additive changes.


I have a tool which looks for substrings that are possibly misspelled. I iterate over its output, filtering out files / patterns:

Initially I use the Google Sheets spell checker to pick replacements. Then I tend to run through the candidates looking for others. From there, I try to apply those changes. (Sometimes this goes a bit wrong.)
Then I try to review the changes to avoid being laughed at (because the above is basically a naive search+replace and can mismatch on a substring).
As I’m reviewing the hunks (which is quite painful), I try to make sure things still make sense / fit (e.g. if someone is using whitespace indentation and the replacement is longer/shorter).
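
To illustrate the substring mismatch mentioned above, here is a minimal Python sketch. It is not the actual tool; the function names and the sample text are hypothetical. It contrasts a naive search+replace with one anchored on word boundaries, and reports how much a replacement changes the line width (relevant when code is aligned with whitespace).

```python
import re


def naive_replace(text, wrong, right):
    """Replace every occurrence of `wrong`, even inside longer words."""
    return text.replace(wrong, right)


def word_replace(text, wrong, right):
    """Replace `wrong` only where it stands alone as a whole word."""
    pattern = re.compile(r"\b" + re.escape(wrong) + r"\b")
    return pattern.sub(right, text)


def width_delta(wrong, right):
    """How many characters longer (or shorter, if negative) the fix is."""
    return len(right) - len(wrong)


sample = "check ths paths"
print(naive_replace(sample, "ths", "this"))  # also corrupts "paths"
print(word_replace(sample, "ths", "this"))   # leaves "paths" alone
print(width_delta("ths", "this"))            # non-zero: alignment may shift
```

Word boundaries do not help when the misspelling sits inside an identifier such as a camelCase name, so a manual review of each hunk is still needed.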

@Daniel_Pasette: is the corrected link more reassuring?

Hi Josh.

Thanks for clarifying. I am interested in accepting the commit, mainly because it improves searchability of the code base, but there is some risk involved and validating the non-comment changes of the commit will be labor intensive. Most of the potential reviewers will be on vacation until Jan 4th, so we’ll get back to you then. Happy New Year and thanks for your contribution.

Dan

Thanks Dan,

Josh

Hope your Jan is going well. Let me know if you need anything.

Hi Josh,

I’m so sorry for the long delay in responding. Our community manager recently poked me to reply, which brought this to the top of my inbox.

Some thoughts:

  • Some of the changes reflect choices amongst spellings that we made deliberately (for whatever reason) in the past. In particular I’m thinking of “retriable” vs “retryable”.
  • A non-trivial number of changes alter the program output, either by changing attribute names in log lines or by changing the output of js tests. Some of those might be worth changing, but non-comment changes will need real review.
  • Given how wide the change set is, and how easy it is to introduce new misspellings, I worry that this kind of change is akin to code formatting, i.e. something difficult to maintain and get right without ongoing tooling.
    • Another way this is like code formatting is that if we only do this on master, and if we do it outside comments, we’re likely to create issues with backports unless we also backport the corrections.

My feeling is that this kind of change is only worth accepting if we can work it into our tooling and make it durable. If we can’t, I’m skeptical that it will be worth the effort.

I am going to discuss with the development team how hard it would be to work this kind of spell checking into lint, but we would need to invest in producing such a tool.

Hi Daniel (and community manager),

word choices

As I leave individual words in individual commits, it’s fairly trivial for me to drop ones that are rejected by a project, and my tooling offers a place for such things (allow.txt enables one to supplement the dictionary, so I would just add a line with retriable and stop seeing it going forward).
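
As a rough illustration of how an allow list supplements a dictionary, here is a minimal Python sketch. It only assumes that allow.txt holds one word per line; the function names, the tiny dictionary, and the sample text are hypothetical and do not reflect how the actual tool is implemented.

```python
import re


def parse_word_list(text):
    """Turn a one-word-per-line file body into a set of lowercase words."""
    return {line.strip().lower() for line in text.splitlines() if line.strip()}


def unknown_words(source, dictionary, allow):
    """Return words in `source` found in neither the dictionary nor the allow list."""
    known = dictionary | allow
    unknown = []
    for word in re.findall(r"[A-Za-z]+", source):
        lowered = word.lower()
        if lowered not in known and lowered not in unknown:
            unknown.append(lowered)
    return unknown


dictionary = {"a", "write", "is"}            # stand-in for a real dictionary
source = "a retriable write is retriable"

print(unknown_words(source, dictionary, set()))                           # flagged
print(unknown_words(source, dictionary, parse_word_list("retriable\n")))  # accepted
```

With the word added to the allow list, it is treated as known and no longer reported.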

real review

  • Certainly. I’d be really disappointed if any large project took a large change like this without review.
    Given its size, I would expect that to take a considerable period of time.
  • I’m happy to help to split the changes up. I can split by directory, file type, or change type (change type obviously gets a bit harder for larger change-sets).
    • Obviously, I wouldn’t want to do that until the project decides it’s interested and suggests how it would like the work split. (I’d start by updating the branch to some designated point, I presume master, but you could suggest something else…)
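
For a sense of what splitting by directory or file type could look like, here is a small Python sketch that groups a list of changed paths. The paths are made up for illustration and the helper names are hypothetical; in practice the list would come from the branch's diff.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def group_by_top_dir(paths):
    """Group repository-relative paths by their first directory component."""
    groups = defaultdict(list)
    for path in paths:
        parts = PurePosixPath(path).parts
        key = parts[0] if len(parts) > 1 else "."  # "." for files at the root
        groups[key].append(path)
    return dict(groups)


def group_by_extension(paths):
    """Group paths by file extension, with "(none)" for extensionless files."""
    groups = defaultdict(list)
    for path in paths:
        groups[PurePosixPath(path).suffix or "(none)"].append(path)
    return dict(groups)


# Hypothetical changed files, for illustration only.
changed = ["src/db/a.cpp", "src/util/b.h", "jstests/core/c.js", "README"]
print(group_by_top_dir(changed))
print(group_by_extension(changed))
```

Each group could then become its own smaller change for review.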

tooling

Indeed, it’s really best to deploy some CI for this purpose once a project accepts a PR like this so that the codebase doesn’t revert.

I’m actively developing a tool for this purpose:

I tend to offer the CI after the fact, in the form “if you liked these fixes, you could use this tool to keep your repository clean”. I try not to push the CI too hard, and typically only offer it if there seems to be real interest (as opposed to just a neutral response to a submission), which is why it isn’t a core part of my spelling fix offerings.
For mongodb, I initially omitted any mention of it because I was trying to follow the submission guidelines and, given how large the change was, it seemed better to first get a sense of whether there was general interest in spelling fixes. That’s roughly the stage your current reply is at: deciding whether it’s worth it at all.

I have a few features coming for future versions that will make it much easier to use (automatically recommending files to exclude / automatically skipping files, offering a way to update a user’s branch’s word lists).
You can see some of how the tool works here:

The configuration is pretty flexible, and I’m generally fairly responsive to feedback/input.

Fwiw, it’s becoming easier and easier for me to update change-sets like this one – I just updated apache/hive which is of the same size in terms of corrections.

lint

I did initially write a tool that integrated with Travis, but I’ve found that it’s a lot easier for most users if the output is straight in GitHub. If a project wants to work with me on adapting my tooling for some other system, I’m happy to try.

With respect to CIs other than GitHub Actions: eventually you should be able to use nektos/act to run check-spelling in some other CI system. I’ll need to think through how to handle its output, as right now its primary output is GitHub comments, which won’t work if the tool isn’t being run by GitHub. I can’t predict precisely when that will be ready, although I spent some time this weekend trying to further its support.