MongoDB dump and restore with Docker

Here are two commands that take a partial dump of a collection from the production database and load it into a dev MongoDB instance running through docker-compose.

docker run -v `pwd`/:/dump mongo mongodump --gzip --archive=/dump/my_collection.agz --host <connection url> --ssl --username <username> --password <password> --authenticationDatabase admin --db <prod_db> --collection my_collection --query '{date: {$gte:  ISODate("2019-02-01T00:00:00.000+0000")}}'

docker-compose run -v `pwd`/my_collection.agz:/my_collection.agz mongo mongorestore --gzip --archive=/my_collection.agz --host mongo --nsFrom <prod_db>.my_collection --nsTo <dev_db>.my_collection

How to find the document with maximum size in MongoDB collection

The function that returns the size of a document is Object.bsonsize. The following query finds the largest document. Use it only on a small collection, because it iterates over every document in the shell and can be slow:

db.getCollection('my_collection').find({}).map(doc => {
    return {_id: doc._id, size: Object.bsonsize(doc)};
}).reduce((a, b) => a.size > b.size ? a : b)

On a larger collection the same can be done faster on the server side with mapReduce:

db.getCollection('my_collection').mapReduce(
    function() {
        emit('size', {_id: this._id, size: Object.bsonsize(this)});
    },
    function(key, values) {
        return values.reduce((a, b) => a.size > b.size ? a : b);
    },
    {out: {inline: 1}}
)

How to find number of MongoDB connections

On the MongoDB side, the current connections can be listed with the db.currentOp(true) command, where the true argument also includes idle connections. They can then be grouped by client IP, counted and sorted.

var ips = db.currentOp(true).inprog.filter(
        d => d.client
    ).map(
        d => d.client.split(':')[0]
    ).reduce(
        (ips, ip) => {
            if(!ips[ip]) {
                ips[ip] = 0;
            }
            ips[ip]++;
            return ips;
        }, {}
    );
Object.keys(ips).map(
        key => {
            return {"ip": key, "num": ips[key]};
        }
    ).sort(
        (a, b) => b.num - a.num
    );

The result looks like this:

[
    {
        "ip" : "11.22.33.444",
        "num" : 77.0
    },
    {
        "ip" : "11.22.33.445",
        "num" : 63.0
    },
    {
        "ip" : "11.22.33.344",
        "num" : 57.0
    }
]

If several Docker containers run on the client host, the connections of each container can be found by running netstat inside it. Suppose there are several MongoDB replicas whose IPs start with 44.55... and 77.88...; the command to count all connections to the replicas is:

netstat -tn | grep -e 44.55 -e 77.88 | wc -l

MongoDB select fields after $lookup

When a $lookup stage joins a list of large documents, the error Total size of documents in ... matching ... exceeds maximum document size can occur.

It’s possible to avoid this by adding an $unwind stage right after $lookup; the documentation explains why in more detail. The documents can then be regrouped with only the required fields.

Order.objects.aggregate(
    {
        '$lookup': {
            'from': 'item',
            'localField': '_id',
            'foreignField': 'order_id',
            'as': 'items'
        }
    },
    {
        "$unwind": "$items"
    },
    {
        "$group": {
            "_id": "$_id",
            "date": {"$last": "$date"},
            "items": {
                "$push": {
                    "name": "$items.name",
                    "price": "$items.price"
                }
            }
        }
    }
)
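
To make the effect of the last two stages concrete, here is a small in-memory sketch in plain JavaScript. It is not MongoDB code and only mimics what $unwind and $group do to the joined documents. The sample orders, the description field and the helper names unwind and regroup are made up for illustration.

```javascript
// Hypothetical documents, shaped like the output of the $lookup stage.
const joined = [
    {_id: 1, date: '2019-01-10', items: [
        {name: 'pen', price: 2, description: 'a long text'},
        {name: 'ink', price: 5, description: 'another long text'}
    ]},
    {_id: 2, date: '2019-01-11', items: [
        {name: 'paper', price: 3, description: 'one more long text'}
    ]}
];

// Mimics $unwind: one output document per element of the array field.
function unwind(docs, field) {
    return docs.flatMap(doc => doc[field].map(el => ({...doc, [field]: el})));
}

// Mimics the $group stage above: group by _id, keep the last date,
// push only name and price of every item.
function regroup(docs) {
    const groups = new Map();
    for (const doc of docs) {
        if (!groups.has(doc._id)) {
            groups.set(doc._id, {_id: doc._id, date: null, items: []});
        }
        const group = groups.get(doc._id);
        group.date = doc.date; // same idea as $last
        group.items.push({name: doc.items.name, price: doc.items.price});
    }
    return [...groups.values()];
}

console.log(JSON.stringify(regroup(unwind(joined, 'items')), null, 2));
```

The regrouped orders keep their items, but each item carries only name and price, so the bulky fields never end up in a single large document.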

How to find a change for a field with MongoDB aggregation

For example, there is a collection device_status which stores the successive states of the devices. The task is to find the devices which passed from off to on at least once.

{ "device" : "device1", "state" : "on", "ts": ISODate("2018-06-07T17:05:29.340+0000") }
{ "device" : "device2", "state" : "off", "ts": ISODate("2018-06-08T17:05:29.340+0000") }
{ "device" : "device3", "state" : "on", "ts": ISODate("2018-06-09T17:05:29.340+0000")}
{ "device" : "device3", "state" : "shutdown", "ts": ISODate("2018-06-09T18:05:29.340+0000")}
{ "device" : "device2", "state" : "load", "ts": ISODate("2018-06-09T19:05:29.340+0000") }
{ "device" : "device2", "state" : "on", "ts": ISODate("2018-06-10T17:05:29.340+0000") }
{ "device" : "device3", "state" : "off", "ts": ISODate("2018-06-11T17:05:29.340+0000") }
{ "device" : "device1", "state" : "idle", "ts": ISODate("2018-06-11T18:05:29.340+0000") }
{ "device" : "device3", "state" : "on", "ts": ISODate("2018-06-12T17:05:29.340+0000") }
...

The first stage is to sort the data by device and date.
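
The excerpt stops here, so the pipeline itself is not shown. As a rough illustration of the idea only, the plain JavaScript sketch below sorts the sample documents by device and timestamp and then reports the devices where an off record is immediately followed by an on record. The function name devicesTurnedOn and the "immediately followed" reading of the task are assumptions, not the original aggregation.

```javascript
// Sample documents from above, with ts as plain Date objects.
const statuses = [
    {device: 'device1', state: 'on', ts: new Date('2018-06-07T17:05:29.340Z')},
    {device: 'device2', state: 'off', ts: new Date('2018-06-08T17:05:29.340Z')},
    {device: 'device3', state: 'on', ts: new Date('2018-06-09T17:05:29.340Z')},
    {device: 'device3', state: 'shutdown', ts: new Date('2018-06-09T18:05:29.340Z')},
    {device: 'device2', state: 'load', ts: new Date('2018-06-09T19:05:29.340Z')},
    {device: 'device2', state: 'on', ts: new Date('2018-06-10T17:05:29.340Z')},
    {device: 'device3', state: 'off', ts: new Date('2018-06-11T17:05:29.340Z')},
    {device: 'device1', state: 'idle', ts: new Date('2018-06-11T18:05:29.340Z')},
    {device: 'device3', state: 'on', ts: new Date('2018-06-12T17:05:29.340Z')}
];

// Hypothetical helper: returns the devices that have an "off" record
// directly followed in time by an "on" record.
function devicesTurnedOn(docs) {
    // Step 1: sort by device, then by timestamp (the original array is not modified).
    const sorted = [...docs].sort(
        (a, b) => a.device.localeCompare(b.device) || a.ts - b.ts
    );
    // Step 2: compare each record with the previous record of the same device.
    const found = new Set();
    for (let i = 1; i < sorted.length; i++) {
        const prev = sorted[i - 1];
        const cur = sorted[i];
        if (prev.device === cur.device && prev.state === 'off' && cur.state === 'on') {
            found.add(cur.device);
        }
    }
    return [...found];
}

console.log(devicesTurnedOn(statuses)); // with this sample: [ 'device3' ]
```

With this reading device2 is not reported, because it goes from off to load before reaching on. If any later on should count, the comparison has to remember the last off seen per device instead of only the previous record.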

A bootstrap for a microservice based on Flask with MongoDB

Starting a new project is a common task in a microservices architecture, so it helps to have a template. I put my version in the flask-mongoengine-bootstrap repository. The key points are:

  • very basic, only flask, flask-mongoengine and structlog in requirements
  • configuration through environment variables
  • configured logging in JSON format
  • marking log records with request_id
  • the development version (make dev) and the tests (make test) run through Docker
  • a template for Makefile
  • examples of model, api route and tests

If you use it, do not forget to change SECRET_KEY.
