Lab 3.1. $group and Accumulators: problem with regex

Hi there:

I’m trying to write a regex in order to create a variable called numOscars which I can later use as a filter ($gte = 1). Something like this:

/[0-9]{1,2}/

Does this make sense (both the attempt and the approach)?

I’m getting 0 documents.

I guess filtering by documents that just include the word Won within awards would be enough, but I would like to know if my approach is feasible.

Remind me, does this particular lab require the use of regex? I’m logged in from my phone at the moment so I can’t check just yet.

Well, not really, but I have found several references to the usage of regex across the forum, including a post endorsed by the staff:

And googling an error message I got regarding $regex I even found a solution for this lab in Stack Overflow…

Anyway, not a close friend of regex, so if I can avoid their usage I’d be grateful…

OK, I remember this lab now. Yes, regex is the way to go here.

That means you’re sorted?

Well, I’d like to check if my way is a good way.

Your Regex doesn’t know if it’s a win or a loss.

Hi @JavierBlanco,

Your approach here is correct. You will have to use regex to figure out “all the films that won at least 1 Oscar”.
Hint: As mentioned by @007_jb, you have to consider Won as well as number of Oscars won in the regex matching criteria.

Please let me know if you have any questions.

Thanks,
Sonali

OK, I see strings might be like:

"awards" : "Won 1 Oscar. Another 2 nominations."

So I have changed my regex to just /^Won/ in order to get all the movies that actually won Oscars, and now I’m getting 1,262 documents, which seems to be the wrong number according to what I’m reading in the forum… But I can’t figure out why: if Won appears within awards, does it not mean that the film won at least 1 Oscar?

Edit: OK, now I see that strings might also look like:

"Won 1 Golden Globe. Another 3 wins."

And now I’m getting 913 documents; according to the forum, it seems I’m missing just one…
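To illustrate why anchoring on Won alone overcounts, here is a minimal JavaScript sketch run against the two sample strings quoted above. The `samples` and `wonAnything` names are made up for this illustration and are not part of the lab.

```javascript
// Sample awards strings quoted earlier in this thread.
const samples = [
  "Won 1 Oscar. Another 2 nominations.",
  "Won 1 Golden Globe. Another 3 wins.",
];

// /^Won/ only checks that the string starts with "Won";
// it says nothing about which award was won.
const wonAnything = samples.filter((s) => /^Won/.test(s));

console.log(wonAnything.length); // 2: the Golden Globe string matches too
```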

You’re getting there. All you need to do is focus on creating a Regex that will meet the hint here… focus on this format alone:


Even more hints: Notice the similarities, number variations, and word variations

It was a minor change in my regex, from [1-9]{1,2} to \d+. I’m not sure why the first one gets 913 docs and the second one gets 914.

That’s still not following the convention that’s required. Your code will match on sentences like:

1 win
13 losses
Won 1 Oscar
… etc

All your regex is doing now is matching one or more digits ranging between 0-9. Unless you’ve accounted for the words to the left and right of it?
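A quick way to see this is to test a bare digit pattern against two of the sentences listed above. This is only a sketch in plain JavaScript; the `sentences`, `bareDigits` and `matched` names are hypothetical.

```javascript
// Two of the example sentences from the post above.
const sentences = ["1 win", "Won 1 Oscar"];

// A bare digit pattern with no words to the left or right of it.
const bareDigits = /\d+/;

// Each sentence contains at least one digit, so each one matches,
// whether or not it mentions an Oscar.
const matched = sentences.filter((s) => bareDigits.test(s));

console.log(matched.length); // 2
```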

Yes, I mean, my actual regex is more complex, but didn’t want to spoil the solution.

Please redact that bit from your last post then :wink:

Why does the standard deviation differ when you use the regex expression (based on the correct answer) (1) with the $regex operator and (2) without the $regex operator, as in the correct answer? Both get 914 items.

I believe you missed “West Side Story” where it won 10 Oscars. :slight_smile:
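The difference between the two counts can be reproduced with a small sketch. It assumes the digits sit between the words Won and Oscar, which is only a guess at the shape of the fuller pattern (the poster did not share it), and it uses a made-up awards string for a film with 10 Oscars.

```javascript
// Hypothetical awards string for a film that won 10 Oscars.
const awards = "Won 10 Oscars.";

// [1-9] excludes the digit 0, so "10" cannot be consumed in full
// and the following " Oscar" is never reached.
const narrow = /Won [1-9]{1,2} Oscar/;

// \d+ accepts any run of digits, including "10".
const wide = /Won \d+ Oscar/;

console.log(narrow.test(awards)); // false
console.log(wide.test(awards));   // true
```

Under these assumptions, a single film with a win count containing a zero would account for the gap between 913 and 914.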