Handling unstructured querying in documents

Hi Team,

I am working with MongoDB aggregation on a collection that consists of billions of documents, with indexes on its primary keys and/or nested keys. I want to know whether, with this much data, an aggregation can search nested objects when the request does not know the key names in advance. If it can, what would the results look like, and how long would it take to produce the final output? I am thinking of something like NLP-style retrieval of unstructured data from a document collection. Any suggestions would help a lot.

Hi @Jitendra_Patwa,

Welcome to the MongoDB community.

To better answer your question, I would need more details on the data.

Is it unstructured in the sense that you don’t know the structure and field names, or is it dynamic, so that you can restructure the data to work better for querying?

My main point is to understand whether you require one of these index strategies:

  1. Wildcard indexing
  2. Attribute Pattern indexing
  3. Text search indexing or Atlas Search

The way you process the incoming data, its types, and your query patterns should allow us to find the best method.
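As a rough illustration of the Attribute Pattern option, the sketch below reshapes a nested `data` object into a list of key/value pairs, so that a value can be looked up through `attrs.v` without knowing its original key. The helper names (`iter_leaves`, `to_attribute_pattern`) and the `attrs` field name are assumptions made for this example, not part of any MongoDB API.

```python
from typing import Any, Iterator, Tuple


def iter_leaves(node: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_path, value) for every scalar leaf of a nested document."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from iter_leaves(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        # Array elements share the path of the array itself.
        for child in node:
            yield from iter_leaves(child, prefix)
    else:
        yield prefix, node


def to_attribute_pattern(doc: dict, field: str = "data") -> dict:
    """Return a copy of doc with doc[field] replaced by a list of {k, v} pairs."""
    reshaped = {key: value for key, value in doc.items() if key != field}
    reshaped["attrs"] = [
        {"k": path, "v": value} for path, value in iter_leaves(doc.get(field, {}))
    ]
    return reshaped


# Key patterns that could back each strategy (shown as data only, nothing is created):
ATTRIBUTE_INDEX_KEYS = [("attrs.k", 1), ("attrs.v", 1)]  # compound index on reshaped docs
WILDCARD_INDEX_KEYS = [("data.$**", 1)]                  # wildcard index on original docs
```

With documents reshaped this way, a filter such as `{"attrs.v": "Gujarat"}` no longer needs the key name. Note that an index leading with `attrs.k` only helps when the key is also given, so a value-only lookup would need an index that leads with `attrs.v`.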

Thanks
Pavel

Hi Pavel,

Thanks for describing my needs in more detail. Basically, I am looking at Atlas Search as an option for aggregation queries on values whose place in the document structure is not known, similar to what NLP does.

Please find below a fruit collection with a structured source key and a data key that holds the fruit suppliers in the country.

[
    {
        "_id": "5bfd1...",
        "source": "Citrus",
        "data": {
            "name": "Orange",
            "color": "Orange",
            "suppliers": {
                "name": "Punjab"
            },
            "quantity": "6T"
        }
    },
    {
        "_id": "5bfd2...",
        "source": "Citrus",
        "data": {
            "name": "limes",
            "color": "Yellow",
            "suppliers": {
                "name": "Gujarat"
            },
            "quantity": "5T"
        }
    },
    {
        "_id": "5bfd3...",
        "source": "Citrus",
        "data": {
            "name": "limes",
            "color": "Green",
            "suppliers": {
                "name": "Gujarat",
                "zone": "North",
                "cities": [
                    "A",
                    "B",
                    "C"
                ]
            },
            "quantity": "5T"
        }
    },
    {
        "_id": "5bfd4...",
        "source": "Tropical",
        "data": {
            "name": "bananas",
            "color": "Yellow",
            "suppliers": {
                "name": "Maharashtra",
                "zone": "NorthEast",
                "vendor": {
                    "vendorname": "Villare"
                }
            },
            "quantity": "5T"
        }
    },
    {
        "_id": "5bfd5...",
        "source": "Tropical",
        "data": {
            "name": "bananas",
            "color": "Yellow",
            "suppliers": {
                "name": "Maharashtra",
                "zone": "South",
                "vendor": {
                    "vendorname": "Robb"
                }
            },
            "quantity": "5T"
        }
    },
    {
        "_id": "5bfd6...",
        "source": "Berries",
        "data": {
            "name": "kiwifruit",
            "color": "Green",
            "suppliers": {
                "name": "TamilNadu",
                "zone": "South",
                "vendor": {
                    "vendorname": "Tamil",
                    "resides": ["Chennai","Thiruchi"],
                    "transport": {
                        "motor": "Truck",
                        "ferry": "Boat"
                    }
                }
            },
            "quantity": "5T"
        }
    }
]

I may want to query it in the following ways:
1. Find the berries that are supplied by boat in Chennai.
2. Find the suppliers who supply fruit from Gujarat.
3. Find fruit that has a supply quantity of 5T.
…etc

In the 3rd query above, we know that the data field has a “quantity” node, which can then be matched by the value 5T.

In the 2nd query we only know the name of the supplier, but we don’t know which field in the document it belongs to. The same holds for the 1st query, where boat sits in a nested object and we only know the values “boat”, “berries” and “Chennai”. This is the problem I face: if the incoming request contains a field or value whose place in the structure is unknown, how can it be used to query by text only, or by conditional text statements?
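To make the requirement concrete, the intended semantics can be emulated in plain Python over two of the sample documents. This is only a sketch of what "match a value wherever it lives" means, not a way to query billions of documents; the helper names (`iter_leaves`, `paths_holding`, `matches_all`) are hypothetical.

```python
from typing import Any, Iterator, List, Tuple

# Two of the sample documents from the collection above.
SAMPLE = [
    {"_id": "5bfd2...", "source": "Citrus",
     "data": {"name": "limes", "color": "Yellow",
              "suppliers": {"name": "Gujarat"}, "quantity": "5T"}},
    {"_id": "5bfd6...", "source": "Berries",
     "data": {"name": "kiwifruit", "color": "Green",
              "suppliers": {"name": "TamilNadu", "zone": "South",
                            "vendor": {"vendorname": "Tamil",
                                       "resides": ["Chennai", "Thiruchi"],
                                       "transport": {"motor": "Truck", "ferry": "Boat"}}},
              "quantity": "5T"}},
]


def iter_leaves(node: Any, prefix: str = "") -> Iterator[Tuple[str, Any]]:
    """Yield (dotted_path, value) for every scalar leaf of a nested document."""
    if isinstance(node, dict):
        for key, child in node.items():
            yield from iter_leaves(child, f"{prefix}.{key}" if prefix else key)
    elif isinstance(node, list):
        for child in node:
            yield from iter_leaves(child, prefix)
    else:
        yield prefix, node


def paths_holding(doc: dict, value: str) -> List[str]:
    """Return every dotted path whose string value equals `value`, ignoring case."""
    wanted = value.casefold()
    return [path for path, leaf in iter_leaves(doc)
            if isinstance(leaf, str) and leaf.casefold() == wanted]


def matches_all(doc: dict, values: List[str]) -> bool:
    """True when each requested value occurs somewhere in the document."""
    return all(paths_holding(doc, value) for value in values)


for doc in SAMPLE:
    print(doc["data"]["name"], matches_all(doc, ["berries", "boat", "Chennai"]))
```

Here `paths_holding(SAMPLE[1], "boat")` reports `data.suppliers.vendor.transport.ferry`, a path the request itself never mentions.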

Thanks.

Hi @Jitendra_Patwa,

I am not 100% sure I understood your entire requirement or challenge.

But in Atlas Search you can use dynamic mapping of fields for a collection, which allows you to search for words, text or phrases without necessarily knowing the fields you search on.

Moreover, there are several operators, such as wildcard or compound, where you can point to a root path or subpath in which you think the search should be performed.
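A minimal sketch of what such an index definition could look like, written as a Python dictionary and printed as JSON. The variable name is an assumption for this example; how the definition is submitted (Atlas UI, CLI or driver) is left out.

```python
import json

# Atlas Search index definition with dynamic mapping enabled: fields of
# supported types are indexed automatically, whatever their names or nesting.
DYNAMIC_INDEX_DEFINITION = {
    "mappings": {
        "dynamic": True,
    }
}

# Print the JSON form of the definition.
print(json.dumps(DYNAMIC_INDEX_DEFINITION, indent=2))
```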

For example:

[
   {
     "$search": {
       "wildcard": {
         "path": { "wildcard": "data.suppliers*" },
         "query": "Boat",
         "allowAnalyzedField": true
       }
     }
   }
]

https://docs.atlas.mongodb.com/reference/atlas-search/wildcard

With a dynamic index, this query finds documents where the value of one of the fields is boat … Using the compound operator you can combine several wildcard operations as filters:
https://docs.atlas.mongodb.com/reference/atlas-search/compound
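As a hedged sketch of how several terms could be combined, the helper below builds a `$search` stage with one `compound` clause per term. Note that it swaps in the `text` operator with a wildcard path rather than the `wildcard` operator, because `text` runs the query through the index analyzer. The function names and the index name `default` are assumptions for this example.

```python
from typing import Dict, List


def text_anywhere(term: str, root: str = "*") -> Dict:
    """Build one `text` clause matching `term` in any indexed field under `root`."""
    return {"text": {"query": term, "path": {"wildcard": root}}}


def build_search_pipeline(terms: List[str], index: str = "default",
                          root: str = "*") -> List[Dict]:
    """Build an aggregation pipeline whose $search stage requires every term."""
    return [
        {
            "$search": {
                "index": index,
                "compound": {
                    # Each clause must match; the field holding the term is not named.
                    "must": [text_anywhere(term, root) for term in terms],
                },
            }
        }
    ]


# Query 1 from the question: berries supplied by boat in Chennai.
pipeline = build_search_pipeline(["berries", "boat", "Chennai"])
print(pipeline)
```

The resulting list could be passed to an aggregation call against a collection that has a dynamic Atlas Search index. Using `filter` instead of `must` would apply the same conditions without affecting the relevance score.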

Let me know if this helps.

Thanks
Pavel