rate-limit makes the plugin unusable #398
@cbruno10 maybe you could add more information |
Hello, @electriquo. Sorry for the delayed response. Before we proceed with the fixes, I have a couple of questions; could you please take a look?
I attempted to replicate the error by querying the
Thank you! |
@ParthaI Install the github modules (steampipe-mod-github-compliance, steampipe-mod-github-insights, steampipe-mod-github-sherlock) and navigate to the GitHub Default Branch Protection Report. I have more than 2000 active (non-archived) repositories. |
Thanks, @electriquo, for the detailed information, I will give it a try to reproduce the issues, and make the necessary changes. |
Hello, @electriquo, following our previous discussion, I set up my local environment to try and replicate the error. Despite my attempts, I couldn't reproduce the rate limit error. The compliance/insight mods are functioning correctly across approximately For your information, the plugin initially used the REST API to generate results, which was prone to errors, particularly those related to rate limits. As a remedy, we've implemented GraphQL query support. Additionally, I experimented with the plugin code in my local environment, making around Query Result:
Thanks! |
@ParthaI You were experimenting, which is a best-effort reproduction. Still, that does not mean there is no issue :) As you can see in the description, the rate-limit error is clear. |
Hello @electriquo, thank you for your patience and for highlighting the issue once more. Indeed, the rate-limit error you mentioned is a significant concern when navigating to the GitHub Default Branch Protection Report, given the extensive number of active (non-archived) repositories you manage, exceeding 2000. In an effort to meticulously replicate this scenario, I conducted tests involving over 2000 API calls within a single query. Regrettably, these tests did not trigger the same rate-limit error, which suggests the situation might be influenced by specific conditions or a huge number of repositories in the organization. Understanding the importance of accurately diagnosing and resolving this issue, I'd like to delve deeper. The dashboard you referred to leverages the Your cooperation and insights are greatly appreciated! |
No issue here but when I open |
Thank you, @electriquo, for conducting those tests and sharing your findings. Replicating the rate-limit error appears to be challenging with the smaller data set in our environment. According to the documentation, using a Personal Access Token (PAT) allows for In comparison, our environment hosts approximately Here's the GraphQL query to check the rate limit:

query {
  viewer {
    login
  }
  rateLimit {
    limit
    remaining
    used
    resetAt
  }
}

You can find more details on how rate limits are calculated for GraphQL queries in the GitHub documentation. Upon reviewing the GitHub insights mod, specifically the I'm further investigating the plugin and the insight mod to understand this behavior better and will update you with any progress. Thank you once more for your assistance! |
To clear up any doubts, I am using a dedicated PAT, meaning that only Steampipe uses this PAT. Thus, the rate limit must be triggered by Steampipe alone. Could it be that not only the number of repositories is a factor, but also the amount of metadata (such as a large Git history/commits)? |
Absolutely, @electriquo, you're correct. However, it's worth noting that within the Steampipe Dashboard, API calls are not initiated until you actively click on any of the hyperlinks. |
My repositories are rich; they are big, with a lot of metadata. |
Hello @electriquo, I hope this message finds you well. I am currently working on this issue, aiming to reduce the number of API calls while querying the table
Please note, these modifications are specifically for the If you're willing to test the scenario, here are the steps:
Your feedback on these changes would be invaluable to me. Thank you very much for your cooperation and assistance in this matter. |
Hi @electriquo, have you had the chance to try it out yet? |
@ParthaI Sorry, will try it soon. |
$ steampipe query
Welcome to Steampipe v0.22.1
For more information, type .help
> .timing on
> select * from github_my_repository
Error: github: non-200 OK status code: 403 Forbidden body: "{\n \"documentation_url\": \"https://docs.github.com/free-pro-team@latest/rest/overview/rate-limits-for-the-rest-api#about-secondary-rate-limits\",\n \"message\": \"You have exceeded a secondary rate limit. Please wait a few minutes before you try again. If you reach out to GitHub Support for help, please include the request ID 279A:5B4F:18264BC:2CAC1C2:65F97C91.\"\n}" (SQLSTATE HV000)
Time: 47.1s. Rows fetched: 250. Hydrate calls: 17,250.
Where is the install command for |
I believe as per our previous conversation it was working fine.
Please follow these steps: Steampipe plugin changes:
Steampipe Github Insights Mod:
|
@ParthaI Haven't forgotten, just didn't find time for this yet :( |
Hey @electriquo , I was able to reproduce the behaviours you were seeing (albeit with a much lower repository count) with I looked into the plugin code, and found our error handling code does retry secondary rate limit errors ( steampipe-plugin-github/github/errors.go Lines 39 to 43 in ec93282
steampipe-plugin-github/github/errors.go Lines 54 to 62 in ec93282
If you're directly running a query, one suggestion is to only select the columns you want to retrieve. If you're running a mod dashboard/benchmark/control, you could also try using rate limiters to slow down Steampipe. For instance, you can try adding this into your

# stay well under the 100 hydrate/list/get functions concurrently based on limits in https://docs.github.com/en/graphql/overview/rate-limits-and-node-limits-for-the-graphql-api#secondary-rate-limits
plugin "github" {
  limiter "github_global_concurrency" {
    max_concurrency = 30
  }
}

And then see if that helps with running the dashboards or benchmarks you were trying before. I think For instance, with the limiter settings above, I avoided getting the secondary abuse rate limits, but eventually got this unhelpful 502 Bad Gateway error from them with no additional information:
I don't believe a lot of our dashboards run this type of query though. |
Does it mean that you are working on a fix and I should wait for a new release?
Where in the docs did you find about |
We are looking into if we can improve the error handling for rate limit errors, but we're not actively working on an identified fix and don't have a schedule on when it will be released. I'd suggest still trying to use a In the example I sent, the |
Thanks
In the future, how can one know which variables can be set in a plugin to handle concurrency and rate limiting?
Maybe it should be documented in steampipe.io/docs/guides/limiter.
If you apply the block in |
@electriquo With that block in my Were you able to give the limiter a try, either with max concurrency of 30 (or a different number)? If so, did you see any more consistency in getting results back? Also, we should probably mention rate limiters and at least link to the doc on steampipe.io, which has some examples, so we'll add that to our backlog to see where that section belongs. |
It is on my list for tomorrow :)
Promise to keep you posted
Awesome, thank you :) |
$ steampipe --version
Steampipe v0.22.2
$ steampipe plugin list --output json | jq -r '.installed[].name'
hub.steampipe.io/plugins/turbot/[email protected]
$ powerpipe mod list --output json | jq -r '.[].dependency_path'
github.com/turbot/[email protected]
github.com/turbot/[email protected]
github.com/turbot/[email protected]
$ cat ~/.steampipe/config/github.spc
connection "github" {
plugin = "hub.steampipe.io/plugins/turbot/[email protected]"
}
plugin "github" {
limiter "github_global_concurrency" {
max_concurrency = 30
}
}
When I navigate to the GitHub Default Branch Protection Report dashboard, the rate-limit error still pops up. But if I repeat #398 (comment), it does not return a rate-limit error message:
$ steampipe query
Welcome to Steampipe v0.22.2
For more information, type .help
> .timing on
> select count(*) from github_my_repository
...
Time: 7.6s. Rows fetched: 1,088. Hydrate calls: 0.
And then, when navigating to the GitHub Default Branch Protection Report dashboard, the rate-limit error does not pop up, but apart from the repository list, all other columns are empty :( I looked at the plugin log and found that it does not seem to use (but does honor) the max concurrency connections. Here are some log lines
Also tried with |
@cbruno10 Hi, any insights? |
@cbruno10 Hello, do you have any new insights? |
Apologies for the radio silence on this issue @electriquo !! I think the reason why you didn't see the limiter tags getting honored could be indicative of a caching issue. Could you please retry the queries by launching a fresh instance of Steampipe and killing any old instances? Command that you could try before running
|
@misraved That's what I always did :)
This seems like firing in all directions rather than basing things on data. |
Thanks for the clarification @electriquo !! We are actively exploring options to enhance the rate-limiting capabilities within our plugin. Currently, there is a limited selection of solutions that offer a straightforward method for managing errors originating from the API. |
@misraved Could you please follow #398 (comment)? |
Hi @electriquo , sorry for losing track of this thread.
|
The rate-limit error should be handled generally, regardless of the component that causes it, especially when there is a clear protocol for rate limits. Specifically, GitHub documents this in Rate limits for the REST API.
No.
$ steampipe query "select distinct name from steampipe_plugin_limiter"
+---------------------------------------------------+
| name |
+---------------------------------------------------+
| aws_servicequotas_list_service_quotas |
| aws_servicequotas_list_aws_default_service_quotas |
| aws_servicequotas_list_tags_for_resource |
+---------------------------------------------------+
Why would you expect to see the limiter appear here when the plugin log clearly states so?
From your words, I understand that dashboards are useless :(
Received the same error across a few of the GitHub dashboards. |
Looking at your limiter configurations, can you please update it to:
Since you're using a specific plugin version, the Afterward, can you please restart Steampipe, If it does, can you please try running the single query you shared above first and see if that still executes correctly? If so, can you please then run the dashboard again? You may still get throttling errors, and if so, can you please try lowering Thanks! |
@cbruno10 Although I am cooperating, I am not the Steampipe QA team :)
This doesn't sound right to me; it's just another shot in the dark. After you took the trouble to create an environment to:
I'd be happy to continue and assist. |
@electriquo Thanks for all of your testing efforts and sharing so far, we appreciate all of the info you've shared! We try to reproduce users' environments as much as we can, but sometimes that's difficult as we don't have direct access to see how many repositories, branches, users, branch protection settings, etc., that are in those environments, so we're not able to reproduce them exactly. We do have a GitHub organization with a fairly large number of repositories, but definitely not as large as the one you're querying, so any information you're able to share is helpful for us. In regards to the rate limiter, I was able to reproduce the rate limiter not loading when a specific plugin version was installed, but once I installed GitHub plugin v0.39.1 and created a limiter I shared above:
Then I could see my limiter being loaded and used by the plugin. |
This holds. Had to set the |
I ran into this issue today. I have access to 3800 repos and 10 orgs but I’m only interested in a handful of those orgs for querying. I haven't tried the limiter yet but my first inclination was to see if there was any way to scope down the repos/orgs that get queried in the config file, similar to how the AWS plugin handles it. I think that coupled with the limiter would cover my use case. |
@jmreicha No matter what I did, I couldn't make this work as I was always facing the rate limit. |
Hi @jmreicha , are you running standalone queries or benchmarks/dashboards from mods? If running queries, there are ways to structure the queries sometimes to reduce the number of requests and work better within the rate limit. For the mods, I think we can add better filtering capabilities to optimize the queries, so if there are specific ones you're interested in (along with any you're interested in @electriquo ), please let us know and we can have a look at those first. |
Hi @cbruno10, thank you. I cannot tell what I would like to see, since I cannot explore the plugin/mods due to the rate limit. Thus, I cannot share much about the usage
|
Hey @cbruno10
I noticed the issue when I was playing around with dashboards, specifically the organization best practices. That would be my main use case probably. I was also triggering the rate limit pretty easily doing queries but I will play around with that more and see if I can make better queries. |
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days. |
Is it really stale, or is there no fix? |
Hi @electriquo Thank you for your patience and understanding. We have explored various options to bypass the strict rate limits imposed by the GitHub API, but unfortunately, we have not been able to find a viable solution that would allow us to overcome these restrictions effectively. However, we would recommend looking into Turbot Pipes, which might offer a more scalable approach to managing these limitations. Turbot Pipes is designed to handle API interactions efficiently and could provide the flexibility needed for your use case. Please let us know if you have any other questions or if there is anything else we can assist you with. |
@misraved Thank you for the effort :) |
@jmreicha That's a lot of repos! I like the idea of scoping down the repos and orgs for queries. I don't know if we'd add that capability to the GitHub config file, but instead maybe in the GitHub Compliance mod itself. For instance, we could add variables that allow users to specify which orgs and repos they want to include in the control/dashboard queries, and then those are added to the I don't have any specific timeline on when we'd add this capability, but it is on our radar to see how to make the mod more usable at larger scale. |
That sounds pretty reasonable, thanks for the update. |
This issue is stale because it has been open 60 days with no activity. Remove stale label or comment or this will be closed in 30 days. |
Describe the bug
GitHub's rate limit makes the plugin unusable
and prone to getting banned by GitHub
Steampipe version (steampipe -v)
Example: v0.21.2
Plugin version (steampipe plugin list)
Example: v0.39.0
Expected behavior
GitHub specifies exactly how to handle these status codes; the plugin should honor that by implementing it.
Additional context