Use case for storing pages of text like articles as key:value pairs

anjanesh · June 16, 2020, 12:16pm

I’ve been using MySQL since 2003.
I am now learning nodejs which uses mongodb (on cloud.mongodb.com and not a self-hosted one) as the database.
I understand mongodb is for storing key:value pair data.
I have an app which stores a lot of TEXT data (articles) into the MySQL table.

Is mongodb suitable for storing a page full of text (articles) as a value for a key ?
Is search fast ? (for searching strings in text)
Is the data compressed on mongodb or stored as plain-text ?

slava · June 16, 2020, 12:50pm

Yes, but there is a limitation of 16MB per document (article).
Search is good. If you want it to be fast, then use MongoDB with some text-search engine, like ElasticSearch.
Data is stored as plain text (uncompressed).

anjanesh · June 16, 2020, 4:29pm

I think 16MB per value is a lot, so in that case it should do.
Won’t the database be very big because of plain-text ? $25 per month for a 5GB storage is quite expensive for plain-text.

Stennie_X · June 17, 2020, 7:49am

Welcome to the community @anjanesh!

MongoDB stores structured data in documents. Key/value is an extremely simplified view as values can include more complex types like embedded documents and arrays.

You may choose to store an article or large text blob as a single value, but typically this is not the best approach if you also want to provide a search interface. For example, you would normally want to distinguish title, author, and other metadata from the body of an article. For efficient searching, you also want to consider how to index and prioritise different aspects of your content.
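
As a rough sketch of that idea, an article could be modelled with its metadata in separate fields rather than as one large value. The field names (`title`, `author`, `tags`, `published`, `body`) and the weights are assumptions chosen for illustration, not a recommended schema. The index specification is plain data in the shape a MongoDB text index accepts.

```python
from datetime import datetime, timezone

def make_article(title, author, body, published, tags=None):
    """Build an article document with metadata kept apart from the body."""
    return {
        "title": title,
        "author": author,
        "tags": list(tags or []),
        "published": published,
        "body": body,
    }

# Hypothetical text index: title matches count ten times as much as body matches.
# With PyMongo this could be passed as
#   collection.create_index(TEXT_INDEX_KEYS, **TEXT_INDEX_OPTIONS)
TEXT_INDEX_KEYS = [("title", "text"), ("body", "text")]
TEXT_INDEX_OPTIONS = {"weights": {"title": 10, "body": 1}}

article = make_article(
    title="Storing articles in MongoDB",
    author="anjanesh",
    body="A page full of text goes here.",
    published=datetime(2020, 6, 17, tzinfo=timezone.utc),
    tags=["mongodb", "articles"],
)
print(sorted(article))
```

Keeping the title and body as separate fields is what makes it possible to weight or filter them independently later.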

For a great introduction to MongoDB data patterns, I suggest reviewing Building with Patterns: A Summary and taking the free online course M320: Data Modelling at MongoDB University. The latest session of M320 just started this week and you have until August 18 to complete the course.

Search speed depends on several factors including how you’ve modelled your data, what sort of searches you are trying to perform, and the resources of your deployment. For example, if you are trying to perform case-insensitive regular expression matches against large text blobs, performance is unlikely to be acceptable because this will be a resource-intensive scan through all of your documents.
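
To illustrate why the regular expression case is expensive, here is a small pure-Python comparison of a linear scan against a word index. This is only a conceptual sketch and does not reflect MongoDB's internals; the sample articles and function names are made up for the example.

```python
import re

# Hypothetical sample data: article id -> body text.
articles = {
    1: "MongoDB stores structured data in documents.",
    2: "MySQL has a FULLTEXT index for basic text search.",
    3: "Atlas Search is available for mongodb Atlas clusters.",
}

def regex_scan(docs, pattern):
    """Case-insensitive regex match: every document body must be read."""
    rx = re.compile(pattern, re.IGNORECASE)
    return sorted(doc_id for doc_id, text in docs.items() if rx.search(text))

def build_inverted_index(docs):
    """Map each lowercased word to the set of document ids containing it."""
    index = {}
    for doc_id, text in docs.items():
        for token in re.findall(r"[a-z0-9]+", text.lower()):
            index.setdefault(token, set()).add(doc_id)
    return index

def index_lookup(index, word):
    """Look up one word without touching the document bodies."""
    return sorted(index.get(word.lower(), set()))

word_index = build_inverted_index(articles)
print(regex_scan(articles, "mongodb"))
print(index_lookup(word_index, "MongoDB"))
```

Both calls find the same articles here, but the scan's cost grows with the total amount of text, while the lookup is a single dictionary access once the index is built.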

If you have basic text search requirements, MongoDB has a standard Text Search feature which is analogous to a MySQL FULLTEXT index.
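
For reference, a text search filter is an ordinary query document using the `$text` operator. The helper below is a hypothetical convenience function (not part of any driver) that builds the `$search` string: plain words are matched individually, quoted phrases are matched exactly, and a leading hyphen excludes a word.

```python
def build_text_filter(words, phrases=(), exclude=()):
    """Build a MongoDB $text filter from words, exact phrases, and exclusions."""
    parts = list(words)
    parts += ['"{}"'.format(phrase) for phrase in phrases]
    parts += ["-{}".format(word) for word in exclude]
    return {"$text": {"$search": " ".join(parts)}}

query = build_text_filter(["mongodb"], phrases=["text search"], exclude=["mysql"])

# Projection that asks the server to include the relevance score.
projection = {"score": {"$meta": "textScore"}}

print(query)
```

With a driver such as PyMongo, `query` and `projection` would be passed to `find()` on a collection that has a text index; that step is omitted here because it needs a running deployment.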

If you have more complex text search requirements, definitely look into using Atlas Search which is available for MongoDB 4.2+ Atlas clusters.

If you need suggestions for improving search performance or your data model, I suggest starting a new topic with an example of your documents, indexes, and typical search queries. Please provide specific details and examples in order to get relevant advice.

All modern versions of MongoDB compress data and indexes by default. Storage compression was optional in MongoDB 3.0, but available if you changed the storage engine to WiredTiger (which has been the default storage engine since 3.2).
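
To get a feel for how well text compresses, the sketch below uses Python's built-in `zlib`. This is an approximation only: WiredTiger compresses collection data in blocks and uses snappy by default, with zlib as an optional alternative, so real on-disk sizes will differ. The sample text is deliberately repetitive, which makes it compress much better than real articles would.

```python
import zlib

def compressed_sizes(text, level=6):
    """Return (raw_bytes, compressed_bytes) for a piece of text."""
    raw = text.encode("utf-8")
    packed = zlib.compress(raw, level)
    return len(raw), len(packed)

# Hypothetical sample: one sentence repeated, so the ratio is optimistic.
sample = "MongoDB stores structured data in documents. " * 200

raw_size, packed_size = compressed_sizes(sample)
print("raw: {} bytes, compressed: {} bytes".format(raw_size, packed_size))
```

Running the same function over a few of your own articles would give a more realistic estimate than the repeated sentence used here.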

The limit of 16MB per document represents a significant amount of text. For example, this is about three times as much as The Complete Works of William Shakespeare in text format (ref: Project Gutenberg). If your document sizes are approaching 16MB I would give careful consideration to whether there is a more efficient schema design for your use case.
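
A quick way to see how far an article is from the limit is to measure its UTF-8 size. The check below is a rough sketch: `overhead_bytes` is an assumed allowance for field names and other fields, and an exact figure would require encoding the whole document as BSON.

```python
MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # 16MB document limit, in bytes

def utf8_size(text):
    """Size of the text in bytes once encoded as UTF-8."""
    return len(text.encode("utf-8"))

def fits_in_one_document(body, overhead_bytes=1024):
    """Rough check: body size plus an assumed overhead stays within the limit."""
    return utf8_size(body) + overhead_bytes <= MAX_DOCUMENT_BYTES

typical_article = "word " * 2000  # roughly a 2,000-word article
print(utf8_size(typical_article), fits_in_one_document(typical_article))
```

Note that non-ASCII characters take more than one byte each in UTF-8, so character counts can understate the stored size.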

Atlas Search integrates Apache Lucene, which is the same search library that Elastic builds on. Atlas Search has been in beta for the last year, but is now officially Generally Available (GA) as of early June.

Regards,
Stennie

anjanesh · June 17, 2020, 10:28am

Thank you for your comprehensive reply Stennie. This is really helpful.

One of my to-do apps is a word database which is available from WordNet (version 2) from :

in MySQL which is 380MB in size (says phpMyAdmin).
I intend to export this to json and push to mongodb.

Is this a good use-case for mongodb ?

Stennie_X · June 17, 2020, 10:56am

Hi @anjanesh,

This is a great use case for MongoDB, but I would encourage you to think about how your data model might be adjusted to take advantage of MongoDB’s indexing and flexible schema rather than doing a direct 1:1 translation of an existing SQL schema. You could start with a direct translation, but this typically misses out on some benefits like easier querying and better performance.

A general difference in approach with MongoDB is that you should think about how your data will commonly be used rather than how the data will be stored. This is the opposite of traditional RDBMS data model design, where you first design a highly normalised schema and then work out how to write and optimise your queries.

For example, if your word application is built around finding synonyms and antonyms, it might make sense to combine related data in a single MongoDB collection instead of requiring multiple queries or $lookup aggregation to join data. You originally mentioned searching strings in text, so I’m guessing there is a specific subset of data (and type of searching) that you’d like to optimise.
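
As a sketch of what combining related data could look like, the function below folds normalised rows into one document per word. The row layout (`words` and `relations` tuples) is a made-up stand-in for a SQL export and is not the actual WordNet schema; it also assumes that synonym and antonym relations are symmetric.

```python
# Hypothetical rows from a normalised SQL export.
words = [(1, "happy"), (2, "glad"), (3, "sad")]
relations = [(1, 2, "synonym"), (1, 3, "antonym")]

def build_word_documents(words, relations):
    """Fold (word, relation) rows into one document per word."""
    text_by_id = dict(words)
    docs = {
        word_id: {"_id": word_id, "word": text, "synonyms": [], "antonyms": []}
        for word_id, text in words
    }
    for source_id, target_id, kind in relations:
        field = kind + "s"  # "synonym" -> "synonyms", "antonym" -> "antonyms"
        docs[source_id][field].append(text_by_id[target_id])
        # Assumption: relations hold in both directions.
        docs[target_id][field].append(text_by_id[source_id])
    return list(docs.values())

word_docs = build_word_documents(words, relations)
for doc in word_docs:
    print(doc)
```

With documents shaped like this, looking up a word returns its synonyms and antonyms in a single read, at the cost of storing each relation twice.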

The resources I suggested earlier will be helpful, and you should also check out some of the talks from our recent MongoDB.live conference. I’ve highlighted some of the interesting talks I’ve seen so far in another forum topic (see: MongoDB.live session highlights) and the first two talks happen to be about data modelling:

Regards,
Stennie

anjanesh · June 17, 2020, 11:07am

Thank you Stennie for your reply - I think I’ll go through the tutorials you’ve mentioned and then revisit the application side of it using NodeJS + Express + React.
