Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix bug with prefetching on different nesting levels #519

Open
wants to merge 3 commits into
base: master
Choose a base branch
from

Conversation

hjerichen
Copy link

Fix bug of calling prefetch method only once with not all type objects when prefetching is on different nesting levels.

Running the new test without the fix shows that prefetchPosts is only called once with Blog 1 and 2. But it should be called with Blog 1, 2, 10, and 20. Therefor Blog 10 and 20 have no Posts, but they should (see expected results).
With this fix, prefetchPosts gets called two time. Once with Blog 1,2 and once with Blog 1,2,10,20. This solves the Problem, but is not the best solution. It wohl be better it gets called only once with all Blogs. But that i could not get working.

…s when prefetching is on different nesting levels.
@codecov-commenter
Copy link

codecov-commenter commented Oct 28, 2022

Codecov Report

Base: 95.72% // Head: 95.71% // Decreases project coverage by -0.00% ⚠️

Coverage data is based on head (416ecc2) compared to base (fed9ffe).
Patch coverage: 100.00% of modified lines in pull request are covered.

Additional details and impacted files
@@             Coverage Diff              @@
##             master     #519      +/-   ##
============================================
- Coverage     95.72%   95.71%   -0.01%     
  Complexity     1769     1769              
============================================
  Files           154      154              
  Lines          4276     4274       -2     
============================================
- Hits           4093     4091       -2     
  Misses          183      183              
Impacted Files Coverage Δ
src/PrefetchBuffer.php 100.00% <100.00%> (ø)
src/AnnotationReader.php 94.47% <0.00%> (-0.04%) ⬇️
src/Mappers/Parameters/TypeHandler.php 98.07% <0.00%> (-0.03%) ⬇️
src/TypeRegistry.php 100.00% <0.00%> (ø)
src/Mappers/Root/MyCLabsEnumTypeMapper.php 100.00% <0.00%> (ø)

Help us with your feedback. Take ten seconds to tell us how you rate us. Have a feature suggestion? Share it here.

☔ View full report at Codecov.
📢 Do you have feedback about the report comment? Let us know in this issue.

@oojacoboo
Copy link
Collaborator

@hjerichen thanks for the PR! Please allow some time for review.

@hjerichen
Copy link
Author

hjerichen commented Oct 28, 2022

I thought about my solution again and I think it is a mistake. In the $sources I use for the PrefetchBuffer there my be unserializable objects, and that would result in an exception. No one should do that in my opinion, but it is possible.
The only correct way is indeed to get all Blog Objects on the first call.

@oojacoboo
Copy link
Collaborator

I see. How's that any different from how it is currently with a serialization issue for objects? I actually haven't made use of prefetch as we tend to rely on ORM and Repository level hydration in many cases.

What is the shape of the $args variable look like?

@hjerichen
Copy link
Author

hjerichen commented Oct 29, 2022

That is too bad. I was hoping one of you active maintainers have more insights on how prefetching is working under the hood.
In the case of prefetchPosts $sources are the Blog objects for which we need the posts.
$args is a assoc array of the arguments from the graphql query:

        query {
            contacts {
                name
                company
                uppercaseName
                repeatName(prefix:"foo", suffix:"bar")
                repeatInnerName
                ... on User {
                    email
                }
            }
        }

If we go the way with $sources, we need the key in PrefetchBuffer to be a combination with $args. That I did wrong here. But the issue remains with the unserializable objects in $sources. Serializing $args was and is no problem.

I will try to dig deeper when i have time.

@oojacoboo
Copy link
Collaborator

oojacoboo commented Oct 30, 2022

@hjerichen where does the unserialization come into play? Looking at the PrefetchBuffer the serialized object is only being used to generate a hash for the object array key. Am I overlooking something?

@hjerichen
Copy link
Author

hjerichen commented Oct 30, 2022

@oojacoboo maybe I have expressed myself incorrectly.
If you have a look to the method computeHash

private function computeHash(array $arguments): string
{
    return md5(serialize($arguments));
}

After my change $arguments would be, for instance, an array of Blog objects. If someone would store a service with a PDO object to a property of that Type, then serialize would throw an exception.

I found a more logical approach to that problem. It is still not perfect, but maybe acceptable.
I will change the pull request accordingly and you can have a look.

@hjerichen
Copy link
Author

My reasoning here is that registering a new object to the prefetch buffer (for instance a Blog type object) should invalidate the corresponding result cache.

@hjerichen
Copy link
Author

hjerichen commented Oct 30, 2022

If you agree with me here, I will add more unit tests to the PrefetchBufferTest class.

@oojacoboo
Copy link
Collaborator

@hjerichen can you explain why invalidating the result cache is necessary or beneficial?

@hjerichen
Copy link
Author

Assume that you want to query a list of posts and each post as several comments.
The PrefetchBuffer::register method is called for each post. Lets assume the hash key is 'abc'.
This results to $this->objects['abc'] containing those posts.
After that, the QueryField will fetch those posts from the PrefetchBuffer and calls the prefetch method to fetch all comments of those posts. Then it calls PrefetchBuffer::storeResult to store the comments for the registered posts.
This results to $this->results['abc'] containing the comments of all the registered posts.

In some special cases, like in my test, PrefetchBuffer::register is called again for some additional posts.
When this happens and the result cache is not invalidated, it keeps only the comments of the first posts.
This results in the wrong number of comments when calling PrefetchBuffer::getResults.

On the other side, if the result cache gets invalidated when a new post gets registered, QueryField will fetch the comments for all the posts (including the new ones) again with the prefetch method and caches them in the PrefetchBuffer. Therefor the registered posts and the cached comments are in sync again.

@oojacoboo
Copy link
Collaborator

@hjerichen thanks for that. Unfortunately, I'm not really following you.

$this->objects stores the Post and $this->results stores the Comments?

@hjerichen
Copy link
Author

Yes. In $this->results are the comments for the posts in $this->objects.

@hjerichen
Copy link
Author

@oojacoboo Did you have time to take a look at it? Would be great to have this fixed.

@oojacoboo
Copy link
Collaborator

oojacoboo commented Nov 21, 2022

@hjerichen so, this invalidation in the register method just throws red flags for me. I'm trying to get a good wrap on what's happening here, but the location of an invalidation in a register method almost surely seems like a bad design decision.

If the hash is being accurately generated and unique, I don't understand how the results aren't accurate per the hash? What's causing them to get out of "sync"? Is this an issue in your case because "subBlog" and "blog" are actually the same object (Blog), just different GraphQL types? So, therefore, the arguments are the same? Can we compute the hash as a composite of the $object and $arguments so that it's guaranteed to be unique? Would that help?

@hjerichen
Copy link
Author

hjerichen commented Nov 24, 2022

Maybe an other example:

query {
    managers {
        id
        addresses {
            id
            street
        }
        purchasers {
            id
            addresses {
                id
                street
            }
        }
    }
}

Lets say purchasers are from type User and managers are from Type Manager extends User. The prefetch method for addresses are in the type User. In this situation the example query will never return any address for the purchasers.
Of course it is debatable whether to use inheritance or not. But this is not the issue here.

The Problem is, that the prefetching of addresses is only called for the managers. The PrefetchBuffer holds at first only the managers in $this->objects und only the manager addresses in $this->results. After a while, the purchasers will be registers in the PrefetchBuffer. But the addresses of them will not be fetched, because $this->results already has matching results: the manager addresses. This is definitely a bug that should be fixed.

If you do not think this is a bug, then at least throw an error or warning when register is called, after the result is already filled.
This is better as letting the developer debug, why the purchasers have no addresses.

Removing the results when new objects are registered to force a new prefetching is one way. Not the best one.
How about that the results stay but we somehow get the prefetching called again for only the new objects (in this case the purchasers) and append the purchaser addresses to the result.

@oojacoboo
Copy link
Collaborator

@hjerichen I'm suggesting that we find a way to uniquely hash the objects, such that removing the results isn't a concern. If it's indeed different and needs different results, the hash created for the cache should be able to be different. We shouldn't have to be adding and removing based on the order of operations.

@hjerichen
Copy link
Author

How about using the internal IDs of the objects to create a Hash?

@oojacoboo
Copy link
Collaborator

@hjerichen not sure what you mean by internal IDs. If you're talking about an object's id property, that's not going to work, since not everyone has objects with an id. Why not try a composite hash of the object and results?

@hjerichen
Copy link
Author

Every object in PHP has an unique ID.
https://www.php.net/manual/en/function.spl-object-id.php

I would think that graphqlite does not destroy his type objects before everything ist fetched. The reuse of the ID after destroy should not be problem.

@oojacoboo
Copy link
Collaborator

@hjerichen I thought you might be talking about using the spl_object_id. It's possible that could work. I do wonder though if there is any concern about an object being destroyed and another object taking that memory space and receiving the same id. That could render incorrect results.

What's wrong with the composite? Would that not work?

@hjerichen
Copy link
Author

@oojacoboo Sorry for not responding so long.

What do you mean exactly with composite.
Hashing the entire array of objects?

This will work as long as there is no object with a property that holds a PDO Instance.
Hashing means using serialize and such objects can not be serialized.
I do not think it is a good Idea to build output type objects that hold such services in properties. But I am sure some people are doing it. Because I have seen it.

Besides that, I think this could be a performance lost for many big objects.

@oojacoboo
Copy link
Collaborator

@hjerichen A composite key or compound key is a key that consists of more than one identifier. So something like: hash($a->getId() . '-' . $b->getId()) for instance.

@oojacoboo
Copy link
Collaborator

Can someone please confirm that this issue is resolved now by #702

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants