Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Using a dataloader inside resolve_reference does not work #205

Open
col opened this issue Aug 16, 2022 · 10 comments
Open

Using a dataloader inside resolve_reference does not work #205

col opened this issue Aug 16, 2022 · 10 comments

Comments

@col
Copy link
Contributor

col commented Aug 16, 2022

It's important that when a subgraph service receives an _entities query from the gateway that it can resolve all of the references efficiently. You would expect that using a data loader inside the resolve_reference method for each entity would achieve this but unfortunately it does not work as expected. Currently the data loader is being invoked for each individual reference rather than as a single batch.

I've created a small example application here: https://github.com/col/apollo-federation-example

Expected Output:

Product.resolve_reference reference: {:__typename=>"Product", :id=>"1"}
Product.resolve_reference reference: {:__typename=>"Product", :id=>"2"}
ProductsByUpc.fetch ["1", "2"]

Actual Output:

Product.resolve_reference reference: {:__typename=>"Product", :id=>"1"}
ProductsByUpc.fetch ["1"]
Product.resolve_reference reference: {:__typename=>"Product", :id=>"2"}
ProductsByUpc.fetch ["2"]

At scale this becomes a significant problem as we can receive hundreds of references in a single _entities query.

@geshwho
Copy link
Collaborator

geshwho commented Aug 16, 2022

I'm not quite as well versed on this issue as @daemonsy, but I'll dig into it this week (@daemonsy is currently OOO)

@col Am I correct that this is the same issue (or very similar to) as #149?

@col
Copy link
Contributor Author

col commented Aug 17, 2022

@geshwho Yes you are correct, this is the same issue as #149. Sorry I really should have added this as a comment on that issue.

@daemonsy
Copy link
Contributor

daemonsy commented Aug 17, 2022

Hey @col, should have documented the other issue better, a little hazy now. I believe I didn't actually find a direct solution to the problem, it just wasn't a problem anymore on the project that's using it.

Had a window to take a look but couldn't get the example to run. I 'd take a look at def _entities and the after_lazy block. Are we doing something that causes execution right away?

@rmosolgo
Copy link

I think this basically boils down the the difference between GraphQL-Batch and Dataloader:

  • With GraphQL-Batch, user code would return a Promise to GraphQL-Ruby (or an Array of Promises), then GraphQL-Ruby would resolve them at some point
  • With Dataloader, .load(...) calls actually pause execution until the data is available, then it resumes.

So inside a .map { ... } (like this one), Dataloader doesn't Just Work ™️ -- each call to .load causes the fiber to pause until data is fetched, so each iteration of the loop requires its own data fetching.

To address this, I can think of a couple possibilities:

  • Support resolve_references (plural) which gives all references to the user-defined method. Then, the user-defined method can use Dataloader's .load_all method to do its own batching.

  • Support returning Dataloader::Requests (from .request) from that method, for example:

    dataloader_requests = {} 
    loaded_refs = representations.each_with_index.map do |(reference, ref_idx)|
       # ... 
       if type_class.respond_to?(:resolve_reference)
          result = type_class.resolve_reference(reference, context)
          if result.is_a?(GraphQL::Dataloader::Request) 
            dataloader_requests[ref_idx] = result 
          end 
          result 
        else
          result = reference
        end
        # ... 
      end
      
      
      # Now that all loads have been requested, tell Dataloader to start fetching things:
      dataloader_requests.each do |(ref_idx, request)| 
        loaded_refs[ref_idx] = request.load 
      end 

There might also be a way to call dataloader methods like .append_job, then gather all the results, which is how GraphQL-Ruby handles this situation when executing queries, but honestly, I'd recommend one of those above.

What do you think of those approaches?

@col
Copy link
Contributor Author

col commented Aug 20, 2022

Thanks for your input @rmosolgo!

I've been trying to get this to work using append_job but have been unsuccessful.

Both the approaches you've suggested sound like they'd work.

Option 1 - resolve_references
I think this would be my preferred approach. From the users perspective it's easy for them to reason about. They don't have to understand the implementation for them to work with it. I think we could also provide backwards compatibility and deprecate the singular resolve_reference method.

Option 2 - return data loader requests
It's not going to be intuitive to the user that they need to call request instead of load. Of course it can be documented but I fear most people wouldn't read the docs until after they discover there is an issue.

@daemonsy Let me know what you think. I'm happy to put a PR together for Option 1 sometime next week.

@daemonsy
Copy link
Contributor

Apologies from the late reply, just returned from a holiday in Singapore :)

Hey @rmosolgo it's been a while, hope you're doing well. Thanks for the clear explanation, that rang a big bell. I've personally bumped into surprises with pausing execution.

@col I'm going to do a spike on the suggestions on our codebase. One consideration is that we've kept to an implicit convention keeping it close to Apollo's reference implementation to avoid creating new concepts/APIs.

@col
Copy link
Contributor Author

col commented Aug 31, 2022

Hey @daemonsy, live in Singapore, I could have taken you out for a drink! Maybe next time!

Anyway, I'd still be very happy to see a better solution for this that keeps the original API intact. I had tried a few things but none of the approaches I tried worked.

I'm wondering if it would be better to lazily resolve the references while resolving the the Entity type rather than trying to do it as a loop in _entities. This would fit better into the existing DataLoader pattern and allow all the Entity resolvers to be paused until the data it fetched. Just a thought.

@daemonsy
Copy link
Contributor

daemonsy commented Sep 2, 2022

Hey @col 😱 definitely my loss. I'm from Singapore but reside in NY with my family. It was good to be back home for a while, despite the weather :)

it would be better to lazily resolve the references while resolving the the Entity type rather than trying to do it as a loop in _entities.

Do you have an example or pseudo code for how this might look?

@col
Copy link
Contributor Author

col commented Feb 10, 2023

I'd suggest we close this issue at this point. I wasn't able to find a fix that didn't require adding the resolve_references method. With the resolve_references workaround in place I would consider this closed.

@kamal
Copy link

kamal commented Sep 18, 2023

What makes it slightly tricky is if you support multiple keys, for example, id and email. Some references will have { "_typename": "User", "id": 123 } while others might have { "__typename": "User", "email": "[email protected]" }. Now you're going to have to make two dataloader calls, effectively things batching yourself, and return the results in the same order it came in.

With the 2nd option proposed by @rmosolgo, you can let dataloader handle the batching by using batch_key_for and fetch(keys).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants