use codegen to generate doument validation function. #139

colprog · 2014-10-30T03:02:33Z

this is significantly faster then old code for non trivial schema.

…antly faster then old code.

neumino · 2014-10-31T04:41:29Z

I like the idea of spiting validate in smaller chunks, I was thinking of spliting it in a different way though (group the check for undefined/null together).

That being said, I have a few questions:

It is really faster? Is validate a bottleneck for your use case now?
What's the reason behind using generate-function? To have less code?

colprog · 2014-10-31T07:38:28Z

In my current project. thinky is a major bottleneck, it is easily taking 70% of my cpu time. and validation is one of the hot spots both when parsing and saving.

I'm using generate-function to dynamically generate validation function for each model specifically, so for a simple model like this:

var User = thinky.createModel("user", {
        auth: [{type: String, id: String, password: String}],
        joinDate: {_type: Date, default: function () {return new Date();}},
        activated: {_type: Boolean, default: true},

        manRole: {admin: Boolean, editUser: Boolean, data: Boolean, announce: Boolean, debug: Boolean}
});

it generates a tailored validation function :

function _validate_user(doc, prefix, options) {
    var optionStack = [options];
    validateDocIsArray(doc.auth, options, prefix+'.auth');
    if (Array.isArray(doc.auth)) {
        for (var i=0;i<doc.auth.length;i++) {
            if (doc.auth[i] === undefined) throw new Error("The element in the array .auth (position "+i+") cannot be `undefined`.");
            validateDocNotNull(doc.auth[i], options, prefix+'.auth[i]');
            if (util.isPlainObject(doc.auth[i])) {
                validatePrimitive(doc.auth[i].type, options, 'string', prefix+'.auth[i].type', undefined, undefined);
                validatePrimitive(doc.auth[i].id, options, 'string', prefix+'.auth[i].id', undefined, undefined);
                validatePrimitive(doc.auth[i].password, options, 'string', prefix+'.auth[i].password', undefined, undefined);
            }
            validateExtraFields(doc.auth[i], schemas[".auth[i]"], options, prefix+".auth[i]");
        }
    }
    validateDate(doc.joinDate, options, prefix+'.joinDate');
    validatePrimitive(doc.activated, options, 'boolean', prefix+'.activated', undefined, undefined);
    validateDocNotNull(doc.manRole, options, prefix+'.manRole');
    if (util.isPlainObject(doc.manRole)) {
        validatePrimitive(doc.manRole.admin, options, 'boolean', prefix+'.manRole.admin', undefined, undefined);
        validatePrimitive(doc.manRole.editUser, options, 'boolean', prefix+'.manRole.editUser', undefined, undefined);
        validatePrimitive(doc.manRole.data, options, 'boolean', prefix+'.manRole.data', undefined, undefined);
        validatePrimitive(doc.manRole.announce, options, 'boolean', prefix+'.manRole.announce', undefined, undefined);
        validatePrimitive(doc.manRole.debug, options, 'boolean', prefix+'.manRole.debug', undefined, undefined);
    }
    validateExtraFields(doc.manRole, schemas[".manRole"], options, prefix+".manRole");
    validateExtraFields(doc, schemas.root, options, prefix);
}

thus making it fast. In my micro benchmark of parsing a User document. validation function runs 10x faster. making parsing 30% faster. this performance gain should be larger with a more complex schema.

neumino · 2014-11-01T02:59:07Z

Hum, I don't get why generating a function would make thing faster. Do you have any hint on that?

From what I can see, the reason it's faster is because the function __validate is currently not optimized by v8. When breaking it in smaller chunks, some of these happen to be optimized by v8.

I did spend quite some time optimizing rethinkdbdash for v8, but didn't do it for thinky because it was some work and no one had this problem before.
I just opened #141 to track these optimizations.

colprog · 2014-11-01T09:01:52Z

That's great. I enjoys using thinky but now it is way too slow for production use.

the generated version not only is more jit friendly, it also avoids the overhead of traveling the schema every time. it only contains what really needs to be done. I would think the latter is more important here. since thinky's schema language is flexible, __validate contains a lot of type checking code which are all avoided in the generated version.

This is very measurable, this single change increase my application throughput by 10%.

FYI, i know rethinkdb has got rid of protobuf. but you should check this out https://github.com/mafintosh/protocol-buffers, purely js but it outperforms node-protobuf by at least 5x.

neumino · 2014-11-10T07:52:38Z

Just a heads up:

I did optimize most of the things
I rewrote __validate in a human readable way -- oh boy, it was so messy...
I need to work a bit on save and delete -- I changed a bit the hooks, nothing hard to do.

I'm almost done, but for a little script that creates 40k documents, and validate them, I get the same performance but without using generated functions.
Overall, it's a bit more than 2 times faster than the current HEAD.

There is also one more thing that we could maybe do. Currently creating a new document copy the value given. So you can do something like

var default = { name: "John" }
var user1 = new User(default);
var user2 = new User(default);
var user3 = new User(default);

But it's not the most common case (I think?), and for big documents, doing deep copies can become expensive.

colprog · 2014-11-10T08:02:59Z

That is great news.
There is still a lot of room for improvements, as a user I would not want an ODM to be the bottleneck of my application. If you think thinky is now mature enough feature wise, maybe we could discuss some ideas for improvement performance wise.

colprog · 2014-11-26T07:45:26Z

got to test the new validator in 1.15.1. it is about 10-15% slower then the codegen version. but i think this is still a nice speed up! good work neumino!

neumino · 2014-11-26T17:51:17Z

Thanks @colprog!
Could you share the script (or schema) that you are using to test that? I should have some time this week end to work on it :)

neumino · 2014-12-01T05:05:21Z

If you have some time @colprog, could you test 1.15.2 -- or master?
Things should be faster in master now I think, at least with this model:

var model = thinky.createModel("model", {
    str: String,
    num: Number,
    bool: Boolean,
    nested: {
        str: String,
        num: Number,
        bool: Boolean,
        nested: {
            str: String,
            num: Number,
            bool: Boolean,
            nested: {
                str: String,
                num: Number,
                bool: Boolean,
            },
        },
    },
    ar: [Number]

}, {init: false});

colprog · 2014-12-02T12:12:20Z

I just refactored my test a few days ago, Now your version is about 10% faster then mine. but I still think code gen is the way to go for doing validation. I will try to integrate your recent changes to code gen and see how it goes.

colprog added 2 commits October 30, 2014 10:56

use codegen to generate doument validation function. this is signific…

8ce8ac0

…antly faster then old code.

cleanup

92f966c

colprog force-pushed the master branch from b9ccd8e to 92f966c Compare October 30, 2014 09:56

apply the same technique to generate default and virtuals

30843f6

colprog force-pushed the master branch from 793143b to 30843f6 Compare October 31, 2014 14:56

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

use codegen to generate doument validation function. #139

use codegen to generate doument validation function. #139

colprog commented Oct 30, 2014

neumino commented Oct 31, 2014

colprog commented Oct 31, 2014

neumino commented Nov 1, 2014

colprog commented Nov 1, 2014

neumino commented Nov 10, 2014

colprog commented Nov 10, 2014

colprog commented Nov 26, 2014

neumino commented Nov 26, 2014

neumino commented Dec 1, 2014

colprog commented Dec 2, 2014

use codegen to generate doument validation function. #139

Are you sure you want to change the base?

use codegen to generate doument validation function. #139

Conversation

colprog commented Oct 30, 2014

neumino commented Oct 31, 2014

colprog commented Oct 31, 2014

neumino commented Nov 1, 2014

colprog commented Nov 1, 2014

neumino commented Nov 10, 2014

colprog commented Nov 10, 2014

colprog commented Nov 26, 2014

neumino commented Nov 26, 2014

neumino commented Dec 1, 2014

colprog commented Dec 2, 2014