Mongoengine as_pymongo performance

When you need only a few fields from a list of complex documents, querying with as_pymongo() is much faster, because MongoEngine returns raw dicts instead of building Document objects.

In my case the speedup was about 16x:

%timeit list(SomeObject.objects.scalar('id')[0:500])
# 129 ms ± 11 ms per loop

%timeit list(o['_id'] for o in SomeObject.objects.scalar('id')[0:500].as_pymongo())
# 7.98 ms ± 849 µs per loop

How to find the document with maximum size in MongoDB collection

Object.bsonsize returns the size of a document in bytes. The following query finds the largest document, but it should be used only on a small collection, because it walks every document and can be slow:

db.getCollection('my_collection').find({}).map(doc => {
    return {_id: doc._id, size: Object.bsonsize(doc)};
}).reduce((a, b) => a.size > b.size ? a : b)

To do this faster, use MongoDB mapReduce:

db.getCollection('my_collection').mapReduce(
    function() {
        emit('size', {_id: this._id, size: Object.bsonsize(this)});
    },
    function(key, values) {
        return values.reduce((a, b) => a.size > b.size ? a : b);
    },
    {out: {inline: 1}}
)
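As an alternative to the shell queries above, here is a minimal Python sketch that builds an aggregation pipeline for the same task. It swaps in a different technique: the $bsonSize aggregation operator, which assumes MongoDB 4.4 or newer. The function name and the top_n parameter are hypothetical, and running the pipeline assumes a PyMongo collection object.

```python
def largest_documents_pipeline(top_n=1):
    """Build an aggregation pipeline returning the `top_n` largest documents.

    Assumes MongoDB 4.4+, where the $bsonSize operator is available.
    """
    return [
        # Keep only _id (included by default) and the computed BSON size in bytes.
        {'$project': {'size': {'$bsonSize': '$$ROOT'}}},
        # Largest documents first.
        {'$sort': {'size': -1}},
        {'$limit': top_n},
    ]


# Assumed usage with a PyMongo collection (not executed here):
#     largest = next(collection.aggregate(largest_documents_pipeline()), None)
```

The size is computed on the server, so only the _id and size of the winning document travel to the client.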

How to find number of MongoDB connections

On the MongoDB side, the current connections can be found with the db.currentOp() command. They can then be grouped by client IP, counted, and sorted.

var ips = db.currentOp(true).inprog.filter(
        d => d.client
    ).map(
        d => d.client.split(':')[0]
    ).reduce(
        (ips, ip) => {
            if(!ips[ip]) {
                ips[ip] = 0;
            }
            ips[ip]++;
            return ips;
        }, {}
    );
Object.keys(ips).map(
        key => {
            return {"ip": key, "num": ips[key]};
        }
    ).sort(
        (a, b) => b.num - a.num
    );

The result looks like this:

[
    {
        "ip" : "11.22.33.444",
        "num" : 77.0
    },
    {
        "ip" : "11.22.33.445",
        "num" : 63.0
    },
    {
        "ip" : "11.22.33.344",
        "num" : 57.0
    }
]
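The same grouping can be sketched in Python with collections.Counter. This is only an illustration: the function name is hypothetical, and it assumes the operation documents (the inprog array) have already been fetched from the server as a list of dicts.

```python
from collections import Counter


def count_connections_by_ip(inprog):
    """Group operation documents by client IP, sorted by count, descending.

    `inprog` is assumed to be a list of dicts shaped like the entries
    of db.currentOp(true).inprog; entries without a client are skipped.
    """
    counts = Counter(
        # Drop the port: "11.22.33.44:5000" -> "11.22.33.44"
        op['client'].rsplit(':', 1)[0]
        for op in inprog
        if op.get('client')
    )
    return [{'ip': ip, 'num': num} for ip, num in counts.most_common()]
```

Unlike the shell version, the counts come out as integers rather than doubles.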

If there are several Docker containers on the client host, the connections can then be found by running netstat in each of them. Suppose there are several MongoDB replicas with IPs starting with 44.55... and 77.88...; the command to count all connections to the replicas is:

netstat -tn | grep -e 44.55 -e 77.88 | wc -l
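The grep above matches the pattern anywhere in the line, including the local address. A stricter count can be sketched in Python by looking only at the foreign-address column. The function name is hypothetical, and the column layout assumes Linux netstat -tn output.

```python
def count_connections_to(netstat_output, prefixes):
    """Count TCP rows whose foreign address starts with one of `prefixes`.

    Assumes Linux `netstat -tn` rows of the form:
    Proto Recv-Q Send-Q Local-Address Foreign-Address State
    """
    prefixes = tuple(prefixes)
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip the banner and header lines, and anything malformed.
        if len(fields) < 6 or not fields[0].startswith('tcp'):
            continue
        if fields[4].startswith(prefixes):
            count += 1
    return count
```

The text to parse could be captured with subprocess.run(['netstat', '-tn'], capture_output=True, text=True).stdout and passed in together with the prefixes ('44.55.', '77.88.').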

Maximum number of client connections in Flask-SocketIO with Eventlet

The Flask-SocketIO docs do not mention that Eventlet has a max_size option, which by default limits the number of client connections open at any time to 1024. It cannot be passed through the flask run command, so the application should be started with socketio.run, for example:

...
if __name__ == '__main__':
    # Requires `import os` at the top of the module.
    socketio.run(app, host='0.0.0.0', port=8080,
                 max_size=int(os.environ.get('EVENTLET_MAX_SIZE', 1024)))
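The int() call above raises ValueError if the variable holds something that is not a number. A more defensive way to read it might look like the sketch below; the helper name is hypothetical, and the fallback of 1024 mirrors the default mentioned above.

```python
import os

DEFAULT_MAX_SIZE = 1024  # the Eventlet default mentioned above


def eventlet_max_size(environ=None, default=DEFAULT_MAX_SIZE):
    """Read EVENTLET_MAX_SIZE, falling back to `default` if unset or invalid."""
    environ = os.environ if environ is None else environ
    raw = environ.get('EVENTLET_MAX_SIZE')
    if raw is None:
        return default
    try:
        value = int(raw)
    except ValueError:
        # Not a number at all, e.g. an empty string or a typo.
        return default
    # A zero or negative limit makes no sense for a connection pool.
    return value if value > 0 else default
```

It would then be passed as max_size=eventlet_max_size() in the socketio.run call.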

How to split Celery tasks file

Suppose there is a large tasks.py file, like this:

@celery.task()
def task1():
    ...

@celery.task()
def task2():
    ...
    task1.delay()
    ...

...

It is a good idea to split it into smaller files, but Celery's autodiscover_tasks looks for tasks in the package.tasks module by default, so one way to do this is to create a tasks package and import the tasks from the other files in __init__.py.

__init__.py

from .task1 import task1
from .task2 import task2

__all__ = ['task1', 'task2']

task1.py

@celery.task()
def task1():
    ...

task2.py

from .task1 import task1

@celery.task()
def task2():
    ...
    task1.delay()
    ...
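If the list of imports in __init__.py grows long, the submodules could also be imported automatically. The following is a minimal sketch using only the standard library; the helper name is hypothetical, and it assumes that importing a task module is enough to run its @celery.task() decorators.

```python
import importlib
import pkgutil


def import_submodules(package_name):
    """Import every direct submodule of `package_name` and return their names."""
    package = importlib.import_module(package_name)
    imported = []
    for module_info in pkgutil.iter_modules(package.__path__):
        full_name = f'{package_name}.{module_info.name}'
        # Importing the module executes its top-level code,
        # including any task decorators it contains.
        importlib.import_module(full_name)
        imported.append(full_name)
    return sorted(imported)


# Assumed usage inside tasks/__init__.py:
#     import_submodules(__name__)
```

The explicit imports shown above are easier to trace and keep __all__ accurate, so this variant is mostly worth it when task files are added often.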

CSRF exempt for Flask-RESTPlus API

The @csrf.exempt decorator does not work on Resource methods or in Resource decorators; it has to be applied at the Api level. Here is an example of how to exclude resources from CSRF protection based on their class:

def csrf_exempt_my_resource(view):
    if issubclass(view.view_class, MyResource):
        return csrf.exempt(view)
    return view

api_blueprint = Blueprint('api', __name__)
api = Api(api_blueprint, title='My API', decorators=[csrf_exempt_my_resource])

Or for all resources:

api_blueprint = Blueprint('api', __name__)
api = Api(api_blueprint, title='My Private API', decorators=[csrf.exempt])
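The class-based variant can be generalized into a small decorator factory, sketched below. The factory name is hypothetical; it assumes, as in the example above, that each view function passed to the decorator exposes its resource class as view_class.

```python
def exempt_subclasses_of(exempt, *resource_classes):
    """Build a view decorator applying `exempt` only to the given resource classes.

    `exempt` is any decorator taking and returning a view, such as csrf.exempt.
    """
    def decorator(view):
        view_class = getattr(view, 'view_class', None)
        # issubclass accepts a tuple, so several base classes can be listed.
        if view_class is not None and issubclass(view_class, resource_classes):
            return exempt(view)
        return view
    return decorator


# Assumed usage, mirroring the Api call above:
#     api = Api(api_blueprint, title='My API',
#               decorators=[exempt_subclasses_of(csrf.exempt, MyResource)])
```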

Function as model column in Flask Admin

I like the idea of separating business logic from models and views into services. There are more details in these slides from the EuroPython talk.

In Flask-Admin, a new column is usually added as a new method with the @property decorator, so the model becomes fat. It is also not a good idea to run queries inside a property. An alternative is to put the function in services and use a column formatter in Flask-Admin.

def get_some_data(model_id):
    """Function in services."""
    return RelatedModel.objects(model=model_id).count()

class MyModelView(ModelView):
    column_formatters = {
        'related_model_count': lambda view, context, model, name: get_some_data(model.id)
    }

    column_list = [..., 'related_model_count']

Docker healthcheck for Flask app with Celery

With docker-compose it can be done as follows:

flask_app:
    ...
    healthcheck:
      test: wget --spider --quiet http://localhost:8080/-/health
celery_worker:
    ...
    command: celery worker --app app.celeryapp -n worker@%h --loglevel INFO
    healthcheck:
      test: celery inspect ping --app app.celeryapp -d worker@$$HOSTNAME

Here /-/health is just a simple route:

@app.route("/-/health")
def health():
    return 'ok'
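If the image does not ship wget, the same check could be done with a short Python script. This is a sketch using only the standard library; the function names and the three-second timeout are assumptions, and the URL is the one from the compose file above.

```python
import http.client
import urllib.request

HEALTH_URL = 'http://localhost:8080/-/health'


def is_healthy(url=HEALTH_URL, timeout=3.0):
    """Return True if `url` answers with a 2xx status within `timeout` seconds."""
    # Bypass any proxy configured in the environment: the check targets localhost.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        with opener.open(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (OSError, ValueError, http.client.HTTPException):
        # OSError covers URLError, HTTPError, refused connections and timeouts;
        # ValueError covers malformed URLs.
        return False


def main():
    """Exit code for a healthcheck: 0 when healthy, 1 otherwise."""
    return 0 if is_healthy() else 1


# As a script entry point one would add:
#     sys.exit(main())   # after `import sys`
```

The compose healthcheck would then call the script with the Python interpreter instead of wget.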