How to save Cucumber feature reports (large JSON files)?

Hello everyone,

I’m new to MongoDB, and even after reading the documentation I’m not sure I’m on the right track.

Every day I have to save more than 100 large JSON files (2 to 150 MB each) containing the results of Cucumber executions.

If I save the JSON with GridFS, I can’t query the keys inside the JSON, right? So how can I store large files and still keep some of their contents queryable?

Is MongoDB the right choice for storing this kind of file?

Thank you in advance; this will help me a lot :blush:

It is a perfect choice.

JSON is essentially the native format for MongoDB (documents are stored as BSON, a binary encoding of JSON). Do not save the files with GridFS. Just use https://docs.mongodb.com/database-tools/mongoimport/ and you will be able to query all the fields.

Thank you @steevej

I tried mongoimport, treating each JSON file as one document in a collection named ‘executions’. It worked perfectly for files smaller than 16 MB, but I get an error when I try to import files larger than 16 MB:

Failed: an inserted document is too large

I do not know how the files are organized, so it is hard to tell. However, I suspect that one file is one document, and the maximum size of a single document is 16 MB. There may be a way to split that one document into its sub-documents. For example, if the document looks like:

{
  "date" : .... ,
  "log_entries" :
  [
     { "time" : t1 , ... },
     { "time" : t2 , ... },
     ... 16 MB worth of log entries ...
     { "time" : tN , ... }
  ]
}

then it is possible to remove the outer braces and brackets and insert each element of log_entries as its own document.
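A minimal Python sketch of that split, assuming the whole file fits in memory and has exactly the shape above (the function name log_entries_to_ndjson and the sample values are hypothetical). It turns each element of log_entries into one line of JSON, so no single inserted document has to hold the whole file:

```python
import json
from typing import Any


def log_entries_to_ndjson(doc: dict[str, Any]) -> str:
    """Return each element of doc["log_entries"] as one line of JSON.

    The outer fields (here only "date") are dropped, so every line is a
    small, independent document instead of one huge one.
    """
    return "".join(json.dumps(entry) + "\n" for entry in doc["log_entries"])


if __name__ == "__main__":
    # Hypothetical input with the same shape as the example above.
    sample = {
        "date": "2021-06-01",
        "log_entries": [{"time": "t1"}, {"time": "t2"}],
    }
    print(log_entries_to_ndjson(sample), end="")
```

Saved to a file, the output has one document per line, a layout mongoimport accepts without the --jsonArray option. Note that the outer date field is lost; if it matters, it could be copied into each entry before serializing.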

Could you provide a link to the problematic file? Since it may contain sensitive information, I can give you an upload link to my Dropbox. Maybe you can also redact the sensitive information.

Thank you @steevej

I’m importing all the documents into the same collection.

The structure is something like this:

This is how I’m importing the documents:

Here is an example of a document that I’m trying to import:

This is what I suspected: the whole file is a single document. The first fields, from datetime to duration, are a kind of envelope shared by every features_report.

What I would do is use something like https://stedolan.github.io/jq/ to put the envelope fields in one document and then extract each features_report into a separate document. Then insert the envelope document into one collection (say envelopes) and each features_report into another collection (say reports), making sure each features_report contains a reference to its envelope document.
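The same idea in Python rather than jq, as a sketch only: it assumes features_report is an array of objects at the top level of the file, and the envelope_id field, the run-2021-06-01 identifier, and the sample values are all hypothetical. The envelope keeps every top-level field except features_report, and each report carries a reference back to it:

```python
import json
from typing import Any


def split_envelope(
    doc: dict[str, Any], envelope_id: str
) -> tuple[dict[str, Any], list[dict[str, Any]]]:
    """Split one Cucumber results document into an envelope and its reports.

    All top-level fields except "features_report" go into the envelope,
    whose _id is envelope_id. Each element of "features_report" becomes
    its own document that points back to the envelope through the
    hypothetical field "envelope_id".
    """
    envelope = {k: v for k, v in doc.items() if k != "features_report"}
    envelope["_id"] = envelope_id
    reports = [
        {**feature, "envelope_id": envelope_id}
        for feature in doc.get("features_report", [])
    ]
    return envelope, reports


if __name__ == "__main__":
    # Hypothetical, heavily trimmed input; real files have many more fields.
    sample = {
        "datetime": "2021-06-01T10:00:00",
        "duration": 1234,
        "features_report": [{"name": "login"}, {"name": "checkout"}],
    }
    envelope, reports = split_envelope(sample, envelope_id="run-2021-06-01")
    # One JSON document per line: the envelope first, then each report.
    print(json.dumps(envelope))
    for report in reports:
        print(json.dumps(report))
```

The envelope line would go into one collection and the report lines into another (for instance via two mongoimport runs with different --collection values). A report can then be joined back to its envelope at query time with the $lookup aggregation stage.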

Alternatively, use jq again to duplicate the envelope fields in each features_report.
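A Python sketch of this denormalized variant, again standing in for jq and assuming the same document shape (the function name denormalize and the sample values are hypothetical):

```python
from typing import Any


def denormalize(doc: dict[str, Any]) -> list[dict[str, Any]]:
    """Return one self-contained document per element of "features_report".

    Every envelope field (all top-level fields except "features_report") is
    copied into each feature report. If a feature report already has a key
    with the same name, the feature report's own value is kept.
    """
    envelope = {k: v for k, v in doc.items() if k != "features_report"}
    return [{**envelope, **feature} for feature in doc.get("features_report", [])]


if __name__ == "__main__":
    # Hypothetical, heavily trimmed input.
    sample = {
        "datetime": "2021-06-01T10:00:00",
        "duration": 1234,
        "features_report": [{"name": "login"}, {"name": "checkout"}],
    }
    for report in denormalize(sample):
        print(report)
```

Compared with two collections, this stores the envelope once per report, which costs space, but every report document can be filtered on datetime or duration directly, without a join.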

But honestly, when I look at the different features_report entries, they all look alike, somewhat akin to an infinite loop or a recurring failure.


Yes, I think I’ll save it separately like you said.

Actually, the documents inside the features_report key are identical only in this example; in the real executions they are all completely different.

I just wanted to make sure that saving the entire file as one document wasn’t an option.

These videos helped me a lot in understanding MongoDB concepts.



