Single Facet Query lecture

I got the following error when I am using the startups database:

In fact, I do not see such a db.

[screenshot of the error]

@Habeeba_Naaz_34696

Please go back and review the Chapter 0 lecture “Atlas Requirement” and the associated lecture notes (at the bottom of the page), which say:

Once you've connected, list the existing collections of the aggregations database. Your output should be similar to this one

Note that we are using the aggregations database, and that you are already connected to that, which you can verify with the db command in the shell.
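That is, right after connecting, the db command should print the current database, e.g.:

db
aggregations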

Yes… first I checked with the db command and it showed aggregations. But in the Single Facet Query video the startups database is used, so I switched to it with the use command, and now db shows startups. If I type show collections I get that error.

@Habeeba_Naaz_34696

Ummm… yeah. That’s kind of super unclear, isn’t it? Particularly since Norberto says that “…this is a database (“startups”) and a collection (“companies”) that we’ve been using throughout the course…” – but we haven’t really. [My memory is that this lecture was originally in a different course…]

However, the required information is in the handouts: ‘companies.json’ and ‘singleQueryFacets.sh’.

If you want to follow along, you’ll need to do a couple of things. First of all, spool up a local mongod instance on your personal system and download the two handouts. Then use the following command to load the matching data into your local instance (be sure to run the command in the same directory where you downloaded the files…)

mongoimport -d startups -c companies companies.json

This will load the startups database with the companies collection, and then you should be able to follow along.
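If you’ve never run a local instance before, spinning one up is roughly this (the data directory below is just an example path; pick any empty folder you can write to, and adjust for Windows):

mkdir -p ~/data/db
mongod --dbpath ~/data/db

With that running in one terminal, run the mongoimport command above from a second terminal in the directory containing the handout.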

Note that none of the graded assignments use this database or this collection; all the graded work uses the aggregations database and the collections in it. But I agree that it’s better to follow along in detail to really understand what’s going on. Good luck.


Thank you for the information.

I don’t have the MongoDB server installed locally, but I do have the Atlas sandbox cluster I created during M001. With time running out to complete the 2nd week of this course, I didn’t fancy trying to install the server locally, but I should be able to create this collection in my Atlas sandbox cluster, right?

In the end I just watched the videos which used this companies collection without running all the commands myself, which isn’t ideal, but I completed this week’s chapters on time 🙂

But I’d really like to know how to create a collection like this in my sandbox cluster. The command line

mongoimport -d startups -c companies companies.json

gives me an error:

2019-03-24T19:37:05.557+0000    [........................] startups.companies   0B/74.6MB (0.0%)
2019-03-24T19:37:08.556+0000    [........................] startups.companies   0B/74.6MB (0.0%)
2019-03-24T19:37:10.156+0000    [........................] startups.companies   0B/74.6MB (0.0%)
2019-03-24T19:37:10.157+0000    Failed: error connecting to db server: no reachable servers
2019-03-24T19:37:10.157+0000    imported 0 documents

Well, yeah, I haven’t told it where my sandbox cluster is hosted, so it won’t know where to import the data to.

So I created a batch file called importCompanies.bat and ran it from the Windows command shell (not the Mongo shell):

mongoimport ^
	--host <copied from the Hostname field of the favourite in Compass>:27017 ^
	/username:<copied from the Username field in Compass> ^
	/password:<copied from the .bat file I use to connect Mongo shell to my sandbox cluster> ^
	/authenticationDatabase:admin ^
	/db:startups ^
	/collection:companies ^
	companies.json

I’ve tried a few variations on this command line, but they all give me the same “no reachable servers” error. Where am I going wrong?

I also tried an alternative approach, copying the companies.json file as loadCompaniesCollection.js and editing my copy as follows:

Add 3 lines to the beginning:

db = db.getSiblingDB("startups");
db.companies.drop();
db.companies.insertMany([

Add a line to the end:

]);

Replace all occurrences of the regular expression }\n with },\n (basically add a comma to the end of each document so that I can treat them as members of an array), save the edited file, run the Mongo shell connected to my sandbox cluster and then

load('loadCompaniesCollection.js')

This gives me a different error:

log line attempted (30kB) over max size (10kB)

and:

    "writeConcernErrors" : [ ],
    "nInserted" : 0,
    "nUpserted" : 0,
    "nMatched" : 0,
    "nModified" : 0,
    "nRemoved" : 0,
    "upserted" : [ ]

So I’ve inserted no records due to this error, and insertMany() doesn’t seem to like large JSON documents. Yes, I could do an unordered insert, so that all the documents which aren’t too large for insertMany() still get inserted, but then I wouldn’t have the same collection as the one used in the lecture.
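(By an unordered insert I mean something like the following sketch, where the second argument tells insertMany() to continue past individual failures rather than stop at the first one:)

db.companies.insertMany([ /* documents from the edited file */ ], { ordered: false })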

There must be a way to import this collection into my sandbox cluster, but it eludes me at the moment. Any tips anyone can give me will be greatly appreciated…

You wrote:

and then you try to load the data into your local mongo with that command.

You have to specify the URI of your Atlas sandbox with --uri; see:

https://docs.mongodb.com/manual/reference/program/mongoimport/
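For example, something along these lines, where every angle-bracket value is a placeholder for your own cluster’s details:

mongoimport --uri mongodb://<username>:<password>@<host>:27017/startups --collection companies companies.json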

I’ve tried specifying the cluster to connect to. Yesterday I was trying various combinations of the --host, --username, --password, etc. parameters. Today I’ve tried the --uri parameter, and I’m making limited progress. Now I don’t get an error and it looks like I’ve connected to the cluster, but the import doesn’t seem to be progressing; the output looks like this:

2019-03-25T21:10:56.449+0000    no collection specified
2019-03-25T21:10:56.451+0000    using filename 'companies' as collection
2019-03-25T21:10:59.513+0000    [........................] startups.companies   0B/74.6MB (0.0%)
2019-03-25T21:11:02.510+0000    [........................] startups.companies   0B/74.6MB (0.0%)
2019-03-25T21:11:05.509+0000    [........................] startups.companies   0B/74.6MB (0.0%)
2019-03-25T21:11:08.510+0000    [........................] startups.companies   0B/74.6MB (0.0%)
2019-03-25T21:11:11.511+0000    [........................] startups.companies   0B/74.6MB (0.0%)
2019-03-25T21:11:14.510+0000    [........................] startups.companies   0B/74.6MB (0.0%)
2019-03-25T21:11:17.509+0000    [........................] startups.companies   0B/74.6MB (0.0%)
2019-03-25T21:11:20.510+0000    [........................] startups.companies   0B/74.6MB (0.0%)

I left it running for an hour earlier and it was still on 0B/74.6MB (0.0%) when I came back (and the same thing happens when I specify the collection name on the command line rather than letting it default to the name of the file I’m importing).

What was the URI you used?

My full command line:

mongoimport --uri mongodb://<username>:<password>@<hostname>:27017/startups companies.json

(Where the angle brackets are I’ve used the actual values of course)

I would try replacing --uri with the --host, --username and --password options.
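Roughly like this (just a sketch with placeholder values; for an Atlas replica set you may also need --ssl and the replica set name in front of the host list):

mongoimport --host "<replica-set-name>/<host1>:27017,<host2>:27017,<host3>:27017" --ssl --username <username> --password <password> --authenticationDatabase admin --db startups --collection companies companies.json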

Alternatively if using --uri, I would try with the mongodb+srv style connection string.
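That is, roughly (placeholders again; the SRV hostname comes from the Atlas “Connect” dialog and is not the same as the individual shard hostnames you see in Compass):

mongoimport --uri "mongodb+srv://<username>:<password>@<cluster-address>/startups" --collection companies companies.json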

I’ve picked this up again today after completing the course, and I’m still not having much joy.

If I use the --host, --username and --password options, I always seem to get the “no reachable servers” error.

If I use the --uri option with the mongodb:// style connection string then I get 0B/74.6MB messages repeatedly until I kill the command.

If I use the --uri option with the mongodb+srv:// style connection string then I get an error:

lookup _mongodb._tcp.<my host name>: dnsquery: DNS name does not exist.

So I went back to my JavaScript solution using the loadCompaniesCollection.js file and looked into the error I was getting, the important part of which appears to be

2019-03-31T16:50:27.624+0100 E QUERY    [js] warning: log line attempted (30kB) over max size (10kB), printing beginning and end ... BulkWriteError({
        "writeErrors" : [
                {
                        "index" : 0,
                        "code" : 52,
                        "errmsg" : "$oid is not valid for storage.",
                        "op" : {
                                "_id" : {
                                        "$oid" : "52cdef7c4bab8bd675297d8a"
                                },

This stackoverflow post tells me that I need to do some casting of that _id field before it can be imported.
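Presumably that casting would amount to turning each { "$oid" : "…" } wrapper into an ObjectId("…") call before loading the file; an untested sketch, assuming a sed with -E support (e.g. under Git Bash on Windows):

sed -E 's/\{ "\$oid" : "([0-9a-f]{24})" \}/ObjectId("\1")/g' loadCompaniesCollection.js > loadCompaniesCollectionFixed.js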

By this point, I don’t really care whether or not the original _id values need to be preserved, so I went back to my loadCompaniesCollection.js file and did a regular expression find & replace (my text editor doesn’t seem to recognise [0-9a-f]{24} as a valid regular expression), replacing

"_id" : { "$oid" : "[0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f][0-9a-f]" }, 

with an empty string. And that seems to have worked, or at least I’ve managed to import a collection of 18801 documents, which is more progress than I’ve made before.

Hope this information is of use to someone. I don’t have a complete answer yet, clearly I’m still on a learning curve…