Skip to content
This repository has been archived by the owner on Jan 8, 2020. It is now read-only.

[WIP] [ZF3] Refactor input filter component #4772

Closed

Conversation

bakura10
Copy link
Contributor

@bakura10 bakura10 commented Jul 2, 2013

Hi everyone,

This is a branch I'm working on on my free time, to refactor the input filter component for ZF 3. I mostly push this for people who want to see the code and help.

It is not yet usable because it relies on some refactoring I'm doing in both Validator and Filter components. It completely embraces the service manager, so now we should not have the VERY annoying problem of adding custom validators/filters, and not being able to use them using the short name because it uses plugin managers that were not fetched from SM.

Under the hood, the component is completely rewritten and now uses SPL RecursiveFilter. From my first experiments, this allow an around x5-8 performance improvements and a memory consumption divided by 13 for complex input filters (magic happen here: https://github.com/bakura10/zf2/blob/943abe025ed578c4068cf19df079b34726d449a3/library/Zend/InputFilter/BaseInputFilter.php#L274-L300)

What I've seen is that the input filter component got more and more bloated, and it is now slow as hell. Checking the isValid method as it is now makes it quite clear where the problem is.

Therefore, for ZF3, I suggest that the base input filter to stay as simple as possible, and people wanting to have custom behaviors to extend the class and override the isValid for their use-cases (as a more general rule, I'd like this for most components too).

To help this, it also uses internally SPL RecursiveFilter. Basically, it allows to have very powerful validation group rules. For instance, this branch features a ValidationGroupRegex filter, and usage would be as follow:

$inputFilter = new InputFilter();
//populate the input filter with lot of fields

// You can use the as-usual validation group:
$inputFilter->setValidationGroup(array('username', 'email'));

// But also some more complex rules. Here it only validate fields that contain 'registration-'
// in their names
$validationGroupRegex = new ValidationGroupRegexFilter($inputFilter, '/registration-/i')
$inputFilter->setValidationGroupFilter($validationGroupRegex);

Those filters are really easy to write. They must extend AbstractValidationGroupFilter and implements the "accept" method (like this: https://github.com/bakura10/zf2/blob/943abe025ed578c4068cf19df079b34726d449a3/library/Zend/InputFilter/Filter/ValidationGroupRegexFilter.php)

What do you think ? :)

NOTE : not all features have been back ported yet.

@desem
Copy link

desem commented Jul 2, 2013

@bakura10
Copy link
Contributor Author

bakura10 commented Jul 2, 2013

Thanks :). I change a little bit the logic so now any recursive iterator filter can be specific, so you can use the RecursiveCallbackFilterIterator:

$filter = new RecursiveCallbackFilterIterator($inputFilter, function($value, $key)) {
    if ($key === 'toto') {
        return false;
    }

    return true;
}

This will only validate fields except the one different than "toto".

Pretty powerful :D.

@desem
Copy link

desem commented Jul 2, 2013

It's interesting. But I would start with a common forum for zend :)

@texdc
Copy link
Contributor

texdc commented Jul 4, 2013

Please see my notes.

@bakura10
Copy link
Contributor Author

bakura10 commented Jul 4, 2013

I answered :).

@bakura10
Copy link
Contributor Author

bakura10 commented Aug 9, 2013

I tried to add more exciting stuff to this refactor. Now you can add a callback recursive filter to act as a validation group:

$inputFilter = new InputFilter();
$recursiveFilter = new ValidationGroupCallbackFilter($inputFilter, function($value, $key, $iterator) {
   // This get called for each element, so you can reject easily
   if ($key === 'lol') {
       return false;
   }

   return true;
});

$inputFilter->isValid();

@bakura10
Copy link
Contributor Author

bakura10 commented Aug 9, 2013

Based on ocramius feedback, I've renamed the InputFilter class to InputCollection, which may be clearer for people. What do you think?

The only problem is that people may be confused between the InputCollection and CollectionInputCollection (previously CollectionInputFilter). I'll try to have a better idea for this.

@texdc
Copy link
Contributor

texdc commented Aug 9, 2013

I like the 'InputCollection' name! I've just started to use 'CollectionInputFilter', so I understand the issue. Why not just 'CollectionInput'?

On Aug 9, 2013, at 5:17 AM, Michaël Gallego [email protected] wrote:

Based on ocramius feedback, I've renamed the InputFilter class to InputCollection, which may be clearer for people. What do you think?

The only problem is that people may be confused between the InputCollection and CollectionInputCollection (previously CollectionInputFilter). I'll try to have a better idea for this.


Reply to this email directly or view it on GitHub.

@bakura10
Copy link
Contributor Author

@texdc , I'm not even sure the CollectionInputFilter is really needed. After all, the collection input filter does nothing more than applying the same input filter to a collection. Can't we simply run the same input filter against different set of data ? (if I managed to make the input filter stateless it's going to be even more true… but turning it stateless is really hard…)

@bakura10
Copy link
Contributor Author

By the way, here are some performance figures. I simply created a single input filter. Please note that my current implementation does a little more than the current one as each element are created using a factory (both input and input collection), so that now the global validator and filter plugin managers are always correctly injected (this was always hard in current implementation). Please note also that a major time during construction is instead spent on the service locator, therefore I've decided to create the element using new instead of using the plugin manager (after doing this test I realized how service manager is slow).

In order not to make the test wrong, no validator nor filters are attached to elements. This test just take into account the traversal time.

With 50 elements

ZF2 :
Input filter construction time: 0.0015020370483398 msec
Traversal time: 0.0062439441680908 msec
Memory consumption: 3 Mb

ZF3 :
Input filter construction time: 0.00066399574279785 msec
Traversal time: 0.0010840892791748 msec
Memory consumption: 2,75 Mb

With 500 elements

ZF2 :
Input filter construction time: 0.010568141937256 msec
Traversal time: 0.058469772338867 msec
Memory consumption: 6 Mb

ZF3 :
Input filter construction time: 0.0047299861907959 msec
Traversal time: 0.0099430084228516 msec
Memory consumption: 3,75 Mb

With 5000 elements

ZF2 :
Input filter construction time: 0.1059877872467 msec
Traversal time: 0.60562801361084 msec
Memory consumption: 37 Mb

ZF3 :
Input filter construction time: 0.048290014266968 msec
Traversal time: 0.10112690925598 msec
Memory consumption: 11,75 Mb

We can expect the difference to be even more when using nested input filters (which is quite common), but already, the implementation that uses RecursiveIteratorIterator, once most important features were back ported, traversal is around 6 times faster and construction 2 times faster, and consumes a lot less memory with a lot of elements.

Furthermore, all the validation group features, even with complex ones like Callback, come nearly for free with RecursiveIteratorIterator.

@bakura10
Copy link
Contributor Author

Yesterday I thought about named validation groups. This would allow to reuse the same input filter in a lot of different situations by allowing multiple validation groups. The syntax would be something like:

$validationGroup = new ValidationGroup();
$inputCollection->registerValidationGroup($validationGroup, 'myName');

The second parameter could be optional with a default value of "default".

When you runAgainst, you can optionally choose the name validation group as a second parameter:

$inputCollection->runAgainst($data, 'myName');

Do you like the idea?

@texdc
Copy link
Contributor

texdc commented Dec 23, 2013

👍

On Dec 23, 2013, at 8:38 AM, Michaël Gallego [email protected] wrote:

Yesterday I thought about named validation groups. This would allow to reuse the same input filter in a lot of different situations by allowing multiple validation groups. The syntax would be something like:

$validationGroup = new ValidationGroup();
$inputCollection->registerValidationGroup($validationGroup, 'myName');
The second parameter could be optional with a default value of "default".

When you runAgainst, you can optionally choose the name validation group as a second parameter:

$inputCollection->runAgainst($data, 'myName');
Do you like the idea?


Reply to this email directly or view it on GitHub.

@bakura10
Copy link
Contributor Author

I've done it. Currently the only downside as that contrary to ZF2, when you register a validation group, you must explicitly use it when running the input filter:

// Create a validation group
$arrayValidation = new ArrayValidationGroup(['username', 'password']);

// Register it into the input collection
$inputCollection->registerValidationGroup($arrayValidation, 'registration');

// Use it! (last parameter is the validation group name)
$inputCollection->runAgainst($data, $context, 'registration');

Thoughts?

@bakura10
Copy link
Contributor Author

As for before, it comes with three built-in validation groups: ArrayValidationGroup (aka ZF2 validation group), RegexValidationGroup (based on the key) or CallbackValidationGroup to specify your own callback.

@macnibblet
Copy link
Contributor

Makes sense, but what is the $context ?

@bakura10
Copy link
Contributor Author

Context is most often the object. For instance some validators like Identical needs the context. Most of the time it's not needed but iirc the form always give the current object as context.

The problem is that runAgaisnt is defined in the "InputInterface", and InputCollectionInterface extends InputInterface, and I couldn't modify runAgainst, so it's here as a third, optional parameter.

@macnibblet
Copy link
Contributor

Hmm i see, but since context is not actually required that would be a minor inconvenience against all the benefits it would provide. But i would say we are missing a interface for providing different validation groups ?

@texdc
Copy link
Contributor

texdc commented Dec 23, 2013

It seems fine. If we specify a distinct group, we'd need to specify when to use it.

Also, I don't think the verb runAgainst works. Perhaps process would be more appropriate.

public function process($data, $withContext = null);

@bakura10
Copy link
Contributor Author

@macnibblet , I cannot swap the order of those parameters unfortunately. The context is indeed needed, we should not remove it. What do you mean by "we are missing an interface for providing different validation groups"? This is already the case, you can provide multiple named validation group! :)

Anyway, in the context of form, we'll likely have a shortcut, so that the form will make the calls for us. When dealing manually with the input filter, yes, you'll have to pass null or an object as second parameter. Minor annoyance, tbh.

@texdc , I'm still open for the naming. runAgainst was found after a lot of time... I actually like the runAgainst now.

@texdc
Copy link
Contributor

texdc commented Dec 23, 2013

What is runAgainst, the input or the data?

input is runAgainst data
... or
data is runAgainst input

input processes data makes more sense

@macnibblet
Copy link
Contributor

We had a rather long discussion about the naming of that method can i forgotten the reasoning behind it, since we had the discussion on irc many months ago.

@bakura10, do you think you could post a example of a input filter class for a simple user with a few simple fields ?

@bakura10
Copy link
Contributor Author

If you read it as a sentence it makes sense: "runAgainst ($data)". run against data :).

@macnibblet
Copy link
Contributor

You are running the data against the input collection ?

@texdc
Copy link
Contributor

texdc commented Dec 23, 2013

I (input) run against data? That kind of makes sense, but I process data is more common.

@bakura10
Copy link
Contributor Author

@macnibblet , it works a bit like before, except it supports a bit more options and is a lot faster:

class UserInputCollection extends InputCollection
    {
        public function __construct(
            FilterChain $filterChain = null,
            ValidatorChain $validatorChain = null,
            Factory $factory = null
        ) {
            parent::__construct($filterChain, $validatorChain, $factory);

            $this->add([
                'name'    => 'username',
                'filters' => [
                    // your filters
                ]
            ]);
        }
    }

Please note that now, as all the input filters are actually pulled from SM, all the dependencies are specified into the constructor (this will finally remove all the problem of asking why an input filter does not have the custom filters you added...).

It may be a bit hard to use, so maybe we could provide a init method hook (but contrary to ZF2, the init would be called by the constructor, and note by the service manager, so in the init you already have all the dependencies set up).

@bakura10
Copy link
Contributor Author

Yes, the input run against data.

@macnibblet
Copy link
Contributor

IRC please ?

@bakura10
Copy link
Contributor Author

I'd like to talk about a thing I've discovered today about ZF2 input filters, which I think is wrong, but I'd like to discuss about it with you so I can think about making the changes to ZF3.

Imagine the following use case: you have a user input filter that contains first name, last name and email. When registering, all those fields are required.

However, when updating, all the fields are optional (by using setRequired to false, for instance), so that people can update only the fields they need (I know that PUT in REST is supposed to resend the whole resource in theory, but in practice this is often not convenient and much simpler to allow partial updates - sorry REST ayatollah -).

When using ZF2 input filters, when you call "isValid", the input filter will first get the raw values: https://github.com/zendframework/zf2/blob/master/library/Zend/InputFilter/BaseInputFilter.php#L197

This will, in turn, iterate through all the input and get their RAW value (https://github.com/zendframework/zf2/blob/master/library/Zend/InputFilter/BaseInputFilter.php#L529). In our case, this will create an array with first_name, last_name and email whose values are null.

Now, the input filter validation logic will include various checks, but at the end of the process, if the value has not been validated, it fallbacks to null: https://github.com/zendframework/zf2/blob/master/library/Zend/InputFilter/BaseInputFilter.php#L325

This means that, at the end of the process, if you do a PUT with an empty body (no values), if you call "getValues" of the input filter, it will simply return an array with the following values:

['first_name' => null, 'last_name' => null, 'email' => null];

This is actually very problematic because if you use that to hydrate an existing object, then those three fields will become null and when persisted to database, you will loose all this data.

I think this is a very unintuitive behaviour for input filter. I think that the correct behaviour would be to actually NOT include missing values into the validation result.

You now may say: "what if I want to unset values"? Well... if you want to unset values, you need to explicitly pass the parameter with a null value!

For me, the correct behaviour should be:

  1. If you submit an empty body to an input filter with those three fields, it should return [] as result.
  2. If you submit a body with {"first_name": null}, it should return {"first_name": null} (of course, if the configured filters allow a null value!).
  3. If you submit a body with {"first_name": "foo", "last_name": "bar"} with an input filter that has a validation group with ONLY first_name configured, it should return {"first_name": "foo"}

What do you think?

@macnibblet
Copy link
Contributor

I'm not in the mindset to discuss if this is an issue with the input filter but what i do to support partial updates of entities is the following

public function update(array $data, CampaignEntity $campaign)
{
    $data = array_merge($this->hydrator->extract($campaign), $data);

    $this->inputFilter->setData($data);
    if (! $this->inputFilter->isValid()) {
        throw Exception\FailedDataValidationException::create($this->inputFilter->getMessages());
    }

    $values = $this->inputFilter->getValues();
    $this->hydrator->hydrate($values, $campaign);

    $this->objectManager->persist($campaign);
    $this->objectManager->flush();
}

@bakura10
Copy link
Contributor Author

I see this as a hack tbh =). I had to do something simlar (I've taken another approach in that I filter null values before hydration).

This is something that should be solved directly in the input filter I think. Modules like ZfrRest (and Apigility I suppose) abstract all those steps.

@devosc
Copy link
Contributor

devosc commented May 14, 2014

Not sure it exists somewhere, below seems like it would be useful and support you're suggestion when no strategy is provided? E.g a db table's meta data strategy to filter into rdbms values?

public function extract($object, $strategy = null)

@texdc
Copy link
Contributor

texdc commented May 14, 2014

@bakura10 that's a model issue, not a filter or hydrator issue. You're concerned about update business logic for a particular model. Framework code should not attempt to solve that problem. ZfcUser would be a better place for that.

@bakura10
Copy link
Contributor Author

No that's definitely an issue to the input filter, not a model issue. A problem is really dumb, it just eats which data you give him. The way input filter works is currently not intuitive. It does not make sense for an input filter to return data that you do not give.

@bakura10
Copy link
Contributor Author

Hi everyone,

Just had an idea regarding an annoying user experience issue: in this refactor, each input collection have various required parameters in the constructor: https://github.com/bakura10/zf2/blob/refactor-input-filter-component/library/Zend/InputFilter/InputCollection.php#L44

This has the nice side-effects that because the Factory, ValidatorChain and FilterChain are pulled from the SM, they are properly initialized with any custom filters, validators... In ZF2, we need to implement an "init" method, which is called after object has been created, but it sounds more like a hack.

However, this nice thing comes with an annoying issue from a user point of view: having invokables input collections is not really possible anymore, which means that for each of your input filters, even the most stupidest and simplest one, you need to create a specific factory, that pulls three dependencies. That's just stupid and complicated.

I'm thinking about something: http://fr2.php.net/manual/fr/reflectionclass.newinstance.php

ReflectionClass allows us to create instance, so why not overriding the "get" method of the InputFilterPluginManager, to automatically pulls the required parameters if they are not here, and injecting them into the constructor?

I know a lot of people will scream about the "magic", but I don't see this really as magic, but just a way to solve a very annoying problem from a user point of view.

@Ocramius
Copy link
Member

This is waaaay too naive (and un-performant).

We did it with Zend\Di and it was an unpleasant journey
On 18 May 2014 20:55, "Michaël Gallego" [email protected] wrote:

Hi everyone,

Just had an idea regarding an annoying user experience issue: in this
refactor, each input collection have various required parameters in the
constructor:
https://github.com/bakura10/zf2/blob/refactor-input-filter-component/library/Zend/InputFilter/InputCollection.php#L44

This has the nice side-effects that because the Factory, ValidatorChain
and FilterChain are pulled from the SM, they are properly initialized with
any custom filters, validators... In ZF2, we need to implement an "init"
method, which is called after object has been created, but it sounds more
like a hack.

However, this nice thing comes with an annoying issue from a user point of
view: having invokables input collections is not really possible anymore,
which means that for each of your input filters, even the most stupidest
and simplest one, you need to create a specific factory, that pulls three
dependencies. That's just stupid and complicated.

I'm thinking about something:
http://fr2.php.net/manual/fr/reflectionclass.newinstance.php

ReflectionClass allows us to create instance, so why not overriding the
"get" method of the InputFilterPluginManager, to automatically pulls the
required parameters if they are not here, and injecting them into the
constructor?

I know a lot of people will scream about the "magic", but I don't see this
really as magic, but just a way to solve a very annoying problem from a
user point of view.


Reply to this email directly or view it on GitHubhttps://github.com//pull/4772#issuecomment-43448064
.

@localheinz
Copy link
Member

@bakura10

TL;DR, so: it'd be awesome if for ZF3, hierarchical input were as easy to filter and validate as with the Symfony validator component: https://github.com/symfony/Validator/blob/master/README.md.

@GeeH
Copy link

GeeH commented Jun 28, 2016

This issue has been moved from the zendframework repository as part of the bug migration program as outlined here - http://framework.zend.com/blog/2016-04-11-issue-closures.html
New issue can be found at: zendframework/zend-inputfilter#111

@GeeH GeeH closed this Jun 28, 2016
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.