Limit data collection to public organizations/groups #481

Open
bzg opened this issue Mar 7, 2024 · 9 comments
Labels
enhancement New feature or request help wanted Extra attention is needed


@bzg

bzg commented Mar 7, 2024

https://code.gouv.fr/public/#/repos collects data from repositories of public GitHub organizations and public GitLab groups.

My understanding is that https://repos.ecosyste.ms collects data from all groups, public and private.

Can we configure repos so that it only considers public groups?

If so, could we avoid the need for (GitLab) tokens?

@bzg bzg changed the title Limit data collection to public organizations/groups? Limit data collection to public organizations/groups Mar 7, 2024
@bzg
Author

bzg commented Mar 7, 2024

cc @simkim

@andrew
Member

andrew commented Mar 8, 2024

You'll still need a public token for GitLab, as the API doesn't allow read access, even for public data, without a token, unlike Gitea and GitHub.

@andrew andrew added enhancement New feature or request help wanted Extra attention is needed labels Mar 8, 2024
@andrew
Member

andrew commented Mar 8, 2024

Right now repos will try to crawl a whole forge. If it's given a token that can see private repos, I suspect it may find some, although I haven't tested that. We can definitely add a check to reject repositories that have a private flag.

Groups/orgs are called "owners" in the repos service.
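As an illustration of the kind of check discussed above, here is a minimal Python sketch of a "reject anything flagged private" filter. It is hypothetical and is not the code from the repos service (which is a separate codebase). It assumes GitHub/Gitea-style API payloads carry a boolean `private` field and GitLab-style payloads carry a `visibility` string (`public`, `internal` or `private`). The helper names `is_public` and `public_only` are made up for this example.

```python
from typing import Any, Iterable


def is_public(repo: dict[str, Any]) -> bool:
    """Return True only when the forge payload says the repo is public.

    Hypothetical helper. Checks a GitLab-style "visibility" string first,
    then a GitHub/Gitea-style boolean "private" flag. GitLab "internal"
    projects are treated as not public, and a payload with neither field
    is skipped rather than assumed public.
    """
    if "visibility" in repo:
        return repo["visibility"] == "public"
    if "private" in repo:
        # Require an explicit False so that null/missing values are rejected.
        return repo["private"] is False
    return False


def public_only(repos: Iterable[dict[str, Any]]) -> list[dict[str, Any]]:
    """Keep only the repositories that is_public() accepts."""
    return [repo for repo in repos if is_public(repo)]


# Example: the private repository is dropped before anything is stored.
crawled = [{"id": 1, "private": False}, {"id": 2, "private": True}]
print(public_only(crawled))
```

Rejecting by default when no visibility information is present is a design choice. It trades possibly missing a few public repos for never storing one that turns out to be private.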

@bzg
Author

bzg commented Mar 8, 2024

> Groups/orgs are called "owners" in the repos service.

Good to know, thanks.

> We can definitely add a check to reject repositories that have a private flag.

Yes, that will be useful.

> You'll still need a public token for GitLab, as the API doesn't allow read access, even for public data, without a token

Are you sure? This script collects metadata from GitLab instances without the need for a token. Or maybe I'm misunderstanding what exactly the token is needed for?

@andrew
Member

andrew commented Mar 8, 2024

> Are you sure? This script collects metadata from GitLab instances without the need for a token. Or maybe I'm misunderstanding what exactly the token is needed for?

I'm not 100% sure about that and will need to double-check, but I remember some GitLab API endpoints needing a token. I can't recall which ones, though; it was a while ago that I set that up.

@bzg
Author

bzg commented Mar 8, 2024

Okay, thanks.

Because we want to crawl a lot of GitLab forges, and because obtaining and renewing tokens can be a chore, we would love to have an option to crawl GitLab forges without tokens, even if it means we don't get all the data we get when crawling with a token.
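One way such an option could look is a request builder that attaches a token only when one is configured. The minimal Python sketch below builds, but does not send, a request for one page of public projects using GitLab's `GET /api/v4/projects` endpoint with `visibility=public`. The `PRIVATE-TOKEN` header is added only when a token is supplied. Whether every endpoint the crawler relies on answers without a token is exactly the open question in this thread. Individual self-managed instances may also restrict unauthenticated access. The function name and the `gitlab.example.org` host are hypothetical.

```python
import urllib.parse
import urllib.request
from typing import Optional


def gitlab_projects_request(base_url: str, page: int = 1,
                            token: Optional[str] = None) -> urllib.request.Request:
    """Build (without sending) a request for one page of public GitLab projects.

    Hypothetical helper: the token is optional, so the same code path works
    for anonymous crawling and for token-based crawling.
    """
    query = urllib.parse.urlencode(
        {"visibility": "public", "per_page": 100, "page": page}
    )
    url = f"{base_url.rstrip('/')}/api/v4/projects?{query}"
    headers = {"Accept": "application/json"}
    if token:
        # Only authenticate when a token has actually been configured.
        headers["PRIVATE-TOKEN"] = token
    return urllib.request.Request(url, headers=headers)


# Anonymous request against a hypothetical instance; pass token="..." to authenticate.
req = gitlab_projects_request("https://gitlab.example.org/")
print(req.full_url)
```

A crawler could try the anonymous request first and only fall back to a token for the endpoints that turn out to require one.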

@simkim
Contributor

simkim commented Mar 29, 2024

Fully agree that it would be useful to ignore repos flagged as private.

@andrew
Member

andrew commented Apr 2, 2024

PR to ignore private repos over here: #492

@andrew
Member

andrew commented Apr 9, 2024

I've merged #492 now
