I’m starting with Mongo and, even after reading the documentation, I’m not sure whether I’m on the right track.
Every day I have more than 100 large JSON files (2–150 MB) with the results of Cucumber executions to save.
If I save the JSON with GridFS I can’t query on keys inside the JSON, right? So how can I save big files and still keep some of their information available for queries?
Is MongoDB the right choice for storing this kind of file?
I tried mongoimport, treating each JSON as a document inside an ‘executions’ collection. It worked perfectly for files smaller than 16 MB, but I get an error when I try to import JSONs larger than 16 MB.
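The command was along these lines (the database name and file name here are only illustrative):

    mongoimport --db cucumber --collection executions --file execution_2024-01-15.json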
I do not know how the files are organized, so it is hard to tell. However, I suspect that one file is one document, and the size limit for a single document is 16 MB. There may be a way to split that one document into its sub-documents. For example, if the document looks something like this (the field names below are only illustrative):
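    {
      "log_entries": [
        { "step": "first entry", "status": "passed" },
        { "step": "second entry", "status": "failed" }
      ]
    }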
it is possible to remove the outer braces and brackets and insert only the log_entries.
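For instance, a pipeline along these lines would stream each entry as its own document (the file, database, and collection names are just placeholders):

    jq -c '.log_entries[]' execution.json \
      | mongoimport --db cucumber --collection executions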
Could you provide a link to the problematic file? Since it may contain sensitive information, I can give you an upload link to my Dropbox. Maybe you can redact the sensitive information first.
This is what I suspected: the whole file is a single document. The first fields, from datetime to duration, are some kind of envelope shared by the features_report entries.
What I would do is use something like jq to put the envelope fields in one document and extract each features_report into a separate document. Then insert the envelope document into one collection (say envelops) and each features_report into another collection (say reports), while making sure each features_report contains a reference to its envelope document.
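A rough sketch of that, assuming features_report is a top-level array (the run id, database name, and file name are made up):

    # Envelope: everything except the reports, with an explicit _id to reference later.
    jq -c 'del(.features_report) | ._id = "run-2024-01-15"' execution.json \
      | mongoimport --db cucumber --collection envelops

    # Reports: one document per features_report element, each pointing back to its envelope.
    jq -c '.features_report[] | .envelope_id = "run-2024-01-15"' execution.json \
      | mongoimport --db cucumber --collection reports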
Alternatively, again with jq, duplicate the envelope fields in each features_report.
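That could look something like this, under the same assumptions:

    # Denormalise: merge the envelope fields into every report document,
    # so no separate envelope collection is needed.
    jq -c 'del(.features_report) as $env | .features_report[] | $env + .' execution.json \
      | mongoimport --db cucumber --collection reports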
But honestly, when I look at the different features_report entries, they all look alike, somewhat akin to an infinite loop or a recurring failure.