Aggregate protocol entries or not

Hello everyone,

we are designing a protocol system for an invoice-handling software system.
Within the life cycle of an invoice, many so-called protocol entries are created (about 50),

for example:

  • invoice uploaded
  • invoice edited
  • invoice checked for viruses
  • invoice paid
  • invoice changed ({newAmount: 100$} …)

I want to monitor all these protocol entries for an invoice.
Our system will serve many millions of invoices, and therefore even more protocol entries.

My big question is: how should we store these protocol entries in MongoDB?

There are two main ideas:

(1)
A MongoDB document represents a single protocol entry:

    {
        "action_datetime": "2007-12-03T10:15:30+01:00",
        "issuer": "myusername-for-compliance",
        "action_trigger": "approval",
        "action_name": "changed",
        "details": {
            "new_amount": 123.45
        },
        "object_id": "12345678-1234-5678-12345-12345678912345",
        "object_type": "document"
    }
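
To monitor all entries of one invoice with this layout, we would query by the reference field, roughly like this (the collection name protocol_entries is just my placeholder):

    db.protocol_entries.find(
        { "object_id": "12345678-1234-5678-12345-12345678912345", "object_type": "document" }
    ).sort({ "action_datetime": 1 })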

(2)
A MongoDB document represents an invoice and holds an array of protocol entries:

    {
        "invoice_id": "ABC45678-1234-5678-12345-12345678912345",
        "protocol_entries": [
            {
                "action_datetime": "2007-12-03T10:15:30+01:00",
                "issuer": "myusername-for-compliance",
                "action_trigger": "approval",
                "action_name": "changed",
                "details": {
                    "new_amount": 123.45
                },
                "object_id": "12345678-1234-5678-12345-12345678912345",
                "object_type": "document"
            },
            {
                "action_datetime": "2007-12-03T10:15:30+01:00",
                "issuer": "myusername-for-compliance",
                "action_trigger": "payment_system",
                "action_name": "paid",
                "details": {
                    "iban": "..."
                },
                "object_id": "12345678-1234-5678-12345-12345678912345",
                "object_type": "document"
            }
        ]
    }
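
Reading the full protocol would then be a single document fetch, something like:

    db.invoices.findOne({ "invoice_id": "ABC45678-1234-5678-12345-12345678912345" })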

Using (1), a new protocol entry causes only an insert, with a reference field back to the invoice.
Using (2), it causes a read and an update (see the sketch below).
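
As a sketch (collection names are again my placeholders), the write per new protocol entry would look like this:

    // Option (1): one insert, holding a reference back to the invoice
    db.protocol_entries.insertOne({
        "action_datetime": "2007-12-03T10:15:30+01:00",
        "action_name": "uploaded",
        "object_id": "12345678-1234-5678-12345-12345678912345",
        "object_type": "document"
    })

    // Option (2): one update, appending to the embedded array
    db.invoices.updateOne(
        { "invoice_id": "ABC45678-1234-5678-12345-12345678912345" },
        { $push: { "protocol_entries": { "action_name": "uploaded", "action_datetime": "2007-12-03T10:15:30+01:00" } } }
    )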

Can you give me a hint as to which option MongoDB performs better with?

Hi @Steffel86 and welcome to the MongoDB Community :muscle: !

Why do you say that (2) requires a read plus an update? I think you can use an upsert here:

    test:PRIMARY> db.coll.update({invoice_id: 123},{$push: {protocol_entries: {a:1, b:2, c:3}}})
    WriteResult({ "nMatched" : 0, "nUpserted" : 0, "nModified" : 0 })
    test:PRIMARY> db.coll.update({invoice_id: 123},{$push: {protocol_entries: {a:1, b:2, c:3}}}, {upsert:true})
    WriteResult({
        "nMatched" : 0,
        "nUpserted" : 1,
        "nModified" : 0,
        "_id" : ObjectId("5ffe2708142dfb3a77a54e7d")
    })
    test:PRIMARY> db.coll.update({invoice_id: 123},{$push: {protocol_entries: {a:3, b:2, c:5}}}, {upsert:true})
    WriteResult({ "nMatched" : 1, "nUpserted" : 0, "nModified" : 1 })
    test:PRIMARY> db.coll.find().pretty()
    {
        "_id" : ObjectId("5ffe2708142dfb3a77a54e7d"),
        "invoice_id" : 123,
        "protocol_entries" : [
            {
                "a" : 1,
                "b" : 2,
                "c" : 3
            },
            {
                "a" : 3,
                "b" : 2,
                "c" : 5
            }
        ]
    }

If it's the very first update on this invoice, the upsert will create the document (i.e. an insert operation). Each following update on this invoice will be a plain update operation, as a matching invoice_id is found.
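
For reference, in recent shell and driver versions, where update() is deprecated in favor of updateOne(), the same upsert would be written as:

    db.coll.updateOne(
        { invoice_id: 123 },
        { $push: { protocol_entries: { a: 1, b: 2, c: 3 } } },
        { upsert: true }
    )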

Also, based on the information you provided, I think I would recommend the second option, as it's actually the Bucket Pattern. It will be fine as long as you know FOR SURE that you will never have a crazy invoice with 1,000,000 updates, as this could go over the 16 MB limit for a single document and generate an array way too large to be handled.
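
If that is a concern, one common way to cap a bucket is to maintain a counter field and only match documents that still have room. The field name entry_count and the limit of 1000 below are just assumptions for the sketch:

    // Sketch: push only into a bucket holding fewer than 1000 entries.
    // When the current bucket is full, the filter no longer matches and
    // the upsert creates a fresh document (a new bucket) for the same
    // invoice_id.
    db.invoices.updateOne(
        { invoice_id: 123, entry_count: { $lt: 1000 } },
        {
            $push: { protocol_entries: { action_name: "changed", action_datetime: new Date() } },
            $inc: { entry_count: 1 }
        },
        { upsert: true }
    )

Note that with this variant one invoice can span several documents, so the index on invoice_id must not be unique.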

If you have 10M invoices, this avoids having a 50 × 10M = 500M-document collection to store these updates. It will also help reduce the size of your indexes, etc.
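
For example, the index backing the per-invoice lookup would cover very different document counts in the two designs (collection names assumed as above):

    // Option (1): index over ~500M protocol-entry documents
    db.protocol_entries.createIndex({ object_id: 1, action_datetime: 1 })

    // Option (2): index over ~10M invoice documents
    db.invoices.createIndex({ invoice_id: 1 })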

I hope this helps.
Cheers,
Maxime.
