feature request: backfill throttling #275

Open
lexomis opened this issue Nov 3, 2024 · 3 comments

Comments

@lexomis

lexomis commented Nov 3, 2024

N.b., I'm happy to work up a PR for this, but before doing so I want to check your openness to the feature.

Problem: shovel is lightning fast. Often it's too fast during backfills and can overload my database's post-processing capabilities (well, less "overload" and more "takes over my database"). I also blow through my credits very quickly if I'm not extremely careful during backfill operations.

Current solution: I split my shovel into two instances, one that runs my "normal" operations and another that runs my backfills. I then manually turn my backfill on for a tranche during the day and watch my credits / database performance. When the credits start to get red for the day, I have to manually turn backfilling off and wait. Repeat the next day. FWIW, I've tried bringing down the batch_size, and while that helps slightly in terms of not taking over my DB, it mostly just ends up being less efficient from a rate limit perspective with my node providers (e.g., it's more cost effective to get 100 blocks at a time than 20).

Proposed solution: currently there is the poll_duration setting, which applies once the blocks are at head. It keeps request rates within most providers' requirements and lets operators tune to block time (no sense in pinging every second if the chain has a 12 second block time). It would be very useful to my operations if I could set a backfill_batch_duration (or some similarly named setting) that would let me tune the delay between batches during a backfill.

I'm happy to send over a PR but I want to check this would be a feature you'd be interested in before I work on it. If this isn't within the product philosophy you have for shovel that's perfectly fine. I can continue to backfill in the way I've become accustomed to.

@ryandotsmith
Member

I'm not opposed to introducing a delay. However, before we go down that road, I'm curious to know if you've tried setting concurrency to 1.

@lexomis
Author

lexomis commented Nov 4, 2024

Yes, of course. That was the first thing I tried.

@jnmclarty

+1, I'm experiencing the same problem. A competing solution would be to express a ceiling in terms of requests per second, so the source is throttled to match whatever hosting tier you happen to be using.

For instance, max_requests_per_second: 125

A delay might not fix the issue comprehensively if shovel still fires off hundreds of requests concurrently, even if they are all delayed.

I easily exceeded QuickNode's $40/month plan (125 reqs/s cap) with just 1 integration (with concurrency at 1 and batch_size under 100).
