Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

use codegen to generate doument validation function. #139

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

colprog
Copy link
Contributor

@colprog colprog commented Oct 30, 2014

this is significantly faster then old code for non trivial schema.

@neumino
Copy link
Owner

neumino commented Oct 31, 2014

Thanks @colprog!

I like the idea of spiting validate in smaller chunks, I was thinking of spliting it in a different way though (group the check for undefined/null together).

That being said, I have a few questions:

  • It is really faster? Is validate a bottleneck for your use case now?
  • What's the reason behind using generate-function? To have less code?

@colprog
Copy link
Contributor Author

colprog commented Oct 31, 2014

In my current project. thinky is a major bottleneck, it is easily taking 70% of my cpu time. and validation is one of the hot spots both when parsing and saving.

I'm using generate-function to dynamically generate validation function for each model specifically, so for a simple model like this:

var User = thinky.createModel("user", {
        auth: [{type: String, id: String, password: String}],
        joinDate: {_type: Date, default: function () {return new Date();}},
        activated: {_type: Boolean, default: true},

        manRole: {admin: Boolean, editUser: Boolean, data: Boolean, announce: Boolean, debug: Boolean}
});

it generates a tailored validation function :

function _validate_user(doc, prefix, options) {
    var optionStack = [options];
    validateDocIsArray(doc.auth, options, prefix+'.auth');
    if (Array.isArray(doc.auth)) {
        for (var i=0;i<doc.auth.length;i++) {
            if (doc.auth[i] === undefined) throw new Error("The element in the array .auth (position "+i+") cannot be `undefined`.");
            validateDocNotNull(doc.auth[i], options, prefix+'.auth[i]');
            if (util.isPlainObject(doc.auth[i])) {
                validatePrimitive(doc.auth[i].type, options, 'string', prefix+'.auth[i].type', undefined, undefined);
                validatePrimitive(doc.auth[i].id, options, 'string', prefix+'.auth[i].id', undefined, undefined);
                validatePrimitive(doc.auth[i].password, options, 'string', prefix+'.auth[i].password', undefined, undefined);
            }
            validateExtraFields(doc.auth[i], schemas[".auth[i]"], options, prefix+".auth[i]");
        }
    }
    validateDate(doc.joinDate, options, prefix+'.joinDate');
    validatePrimitive(doc.activated, options, 'boolean', prefix+'.activated', undefined, undefined);
    validateDocNotNull(doc.manRole, options, prefix+'.manRole');
    if (util.isPlainObject(doc.manRole)) {
        validatePrimitive(doc.manRole.admin, options, 'boolean', prefix+'.manRole.admin', undefined, undefined);
        validatePrimitive(doc.manRole.editUser, options, 'boolean', prefix+'.manRole.editUser', undefined, undefined);
        validatePrimitive(doc.manRole.data, options, 'boolean', prefix+'.manRole.data', undefined, undefined);
        validatePrimitive(doc.manRole.announce, options, 'boolean', prefix+'.manRole.announce', undefined, undefined);
        validatePrimitive(doc.manRole.debug, options, 'boolean', prefix+'.manRole.debug', undefined, undefined);
    }
    validateExtraFields(doc.manRole, schemas[".manRole"], options, prefix+".manRole");
    validateExtraFields(doc, schemas.root, options, prefix);
}

thus making it fast. In my micro benchmark of parsing a User document. validation function runs 10x faster. making parsing 30% faster. this performance gain should be larger with a more complex schema.

@neumino
Copy link
Owner

neumino commented Nov 1, 2014

Hum, I don't get why generating a function would make thing faster. Do you have any hint on that?

From what I can see, the reason it's faster is because the function __validate is currently not optimized by v8. When breaking it in smaller chunks, some of these happen to be optimized by v8.

I did spend quite some time optimizing rethinkdbdash for v8, but didn't do it for thinky because it was some work and no one had this problem before.
I just opened #141 to track these optimizations.

@colprog
Copy link
Contributor Author

colprog commented Nov 1, 2014

That's great. I enjoys using thinky but now it is way too slow for production use.

the generated version not only is more jit friendly, it also avoids the overhead of traveling the schema every time. it only contains what really needs to be done. I would think the latter is more important here. since thinky's schema language is flexible, __validate contains a lot of type checking code which are all avoided in the generated version.

This is very measurable, this single change increase my application throughput by 10%.

FYI, i know rethinkdb has got rid of protobuf. but you should check this out https://github.com/mafintosh/protocol-buffers, purely js but it outperforms node-protobuf by at least 5x.

@neumino
Copy link
Owner

neumino commented Nov 10, 2014

Just a heads up:

  • I did optimize most of the things
  • I rewrote __validate in a human readable way -- oh boy, it was so messy...
  • I need to work a bit on save and delete -- I changed a bit the hooks, nothing hard to do.

I'm almost done, but for a little script that creates 40k documents, and validate them, I get the same performance but without using generated functions.
Overall, it's a bit more than 2 times faster than the current HEAD.

There is also one more thing that we could maybe do. Currently creating a new document copy the value given. So you can do something like

var default = { name: "John" }
var user1 = new User(default);
var user2 = new User(default);
var user3 = new User(default);

But it's not the most common case (I think?), and for big documents, doing deep copies can become expensive.

@colprog
Copy link
Contributor Author

colprog commented Nov 10, 2014

That is great news.
There is still a lot of room for improvements, as a user I would not want an ODM to be the bottleneck of my application. If you think thinky is now mature enough feature wise, maybe we could discuss some ideas for improvement performance wise.

@colprog
Copy link
Contributor Author

colprog commented Nov 26, 2014

got to test the new validator in 1.15.1. it is about 10-15% slower then the codegen version. but i think this is still a nice speed up! good work neumino!

@neumino
Copy link
Owner

neumino commented Nov 26, 2014

Thanks @colprog!
Could you share the script (or schema) that you are using to test that? I should have some time this week end to work on it :)

@neumino
Copy link
Owner

neumino commented Dec 1, 2014

If you have some time @colprog, could you test 1.15.2 -- or master?
Things should be faster in master now I think, at least with this model:

var model = thinky.createModel("model", {
    str: String,
    num: Number,
    bool: Boolean,
    nested: {
        str: String,
        num: Number,
        bool: Boolean,
        nested: {
            str: String,
            num: Number,
            bool: Boolean,
            nested: {
                str: String,
                num: Number,
                bool: Boolean,
            },
        },
    },
    ar: [Number]

}, {init: false});

@colprog
Copy link
Contributor Author

colprog commented Dec 2, 2014

I just refactored my test a few days ago, Now your version is about 10% faster then mine. but I still think code gen is the way to go for doing validation. I will try to integrate your recent changes to code gen and see how it goes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants