Add faceted search / custom filters / heterogenous monorepo indexation #62
Comments
What do you want to search for exactly? Per-repository data or per-file data? Let's call them tags. Do you want full-text string search of the tags, regex search, or only exact matches? Languages are already supported, see https://cs.bazel.build/search?q=lang%3Apython
Hi, thanks for the reply! Both, in fact: I would like to index my $GOPATH/src directory, since I clone every kind of repository into it (Node.js, Java, Python, ...). That keeps my repositories organized by repo URI, so I just wanted some filters on the left side that let me filter by VCS provider, owner, and project name. Put simply, I wanted to bring zoekt's web UI closer to searchcode-server (https://github.com/boyter/searchcode-server). Example of left-side filter blocks for the matched files:
Filter by VCS provider:
Filter by namespace (org/user):
Filter by languages:
Filter by filetypes:
Filter by topics:
I hope that makes my idea clearer. Thanks in advance for your time and reply. Cheers,
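To make the filter blocks above concrete, here is a minimal sketch of how the provider/namespace/project facets could be derived from checkout paths under $GOPATH/src and counted for a sidebar. It is not zoekt code: `RepoFacets`, `ParseRepoPath`, `FacetCount` and `CountBy` are hypothetical names, and the sample paths in `main` are only illustrative.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// RepoFacets holds facet values derived from a path such as
// "github.com/google/zoekt". The type and its fields are hypothetical.
type RepoFacets struct {
	Provider  string // e.g. "github.com"
	Namespace string // org or user
	Project   string // everything after the namespace
}

// ParseRepoPath splits a path relative to $GOPATH/src into facets.
// It reports false when the path lacks a provider, namespace or project.
func ParseRepoPath(rel string) (RepoFacets, bool) {
	parts := strings.Split(strings.Trim(rel, "/"), "/")
	if len(parts) < 3 || parts[0] == "" || parts[1] == "" || parts[2] == "" {
		return RepoFacets{}, false
	}
	return RepoFacets{
		Provider:  parts[0],
		Namespace: parts[1],
		Project:   strings.Join(parts[2:], "/"),
	}, true
}

// FacetCount is one sidebar entry: a value and how many repos carry it.
type FacetCount struct {
	Value string
	Count int
}

// CountBy tallies the values returned by key, sorted by descending count
// and then by value so the output is deterministic.
func CountBy(repos []RepoFacets, key func(RepoFacets) string) []FacetCount {
	counts := map[string]int{}
	for _, r := range repos {
		counts[key(r)]++
	}
	out := make([]FacetCount, 0, len(counts))
	for v, n := range counts {
		out = append(out, FacetCount{Value: v, Count: n})
	}
	sort.Slice(out, func(i, j int) bool {
		if out[i].Count != out[j].Count {
			return out[i].Count > out[j].Count
		}
		return out[i].Value < out[j].Value
	})
	return out
}

func main() {
	// Illustrative paths only.
	paths := []string{
		"github.com/google/zoekt",
		"github.com/boyter/searchcode-server",
		"gitlab.com/example/demo",
	}
	var repos []RepoFacets
	for _, p := range paths {
		if f, ok := ParseRepoPath(p); ok {
			repos = append(repos, f)
		}
	}
	for _, fc := range CountBy(repos, func(r RepoFacets) string { return r.Provider }) {
		fmt.Printf("%s (%d)\n", fc.Value, fc.Count)
	}
}
```

The same `CountBy` call with a different key function (namespace, project) would fill the other filter blocks; languages, filetypes and topics would need data from the index itself rather than from the path.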
"just wanted to upgrade zoekt to a webui closer to searchcode-server" I don't know enough about building web UIs to pull that off, but I'm happy to review changes. I could add something to the individual results to add restrictions (this repo, this directory, this language, this branch). Would that help?
Yes, that would help. I can do the web UI; that's not a problem. Summary: for example, I could use the go-github package to fetch the topics defined for a repository and enrich the search-result restrictions with those owner-defined topics (ref. https://github.com/google/go-github/blob/master/github/repos.go#L58). Then I would disambiguate the topics with the jargon package. This pipeline could be queued and triggered separately, but the most important thing is to have methods in zoekt for managing this extra data, for a repo or a file, in an already existing zoekt index file. I guess it would be costly to rebuild the index each time if you index more than 1000 repos. If you don't mind, let me draft an example/PoC in my fork of zoekt; I will send you a link in 1 to 2 hours. :-) Thanks for your patience
Please send me a change through Gerrit, as described here:
For per-repository data, things are simple. There is already a pipeline for inserting metadata (Line 196 in 8e284ca); it only needs a query operator to implement it. You also have to find a way to ingest this data from a given (git) repository. Currently, only git-config settings are imported as repo metadata.
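As a rough illustration of what such a query operator could do with per-repository metadata, here is a minimal exact-match sketch. It does not use zoekt's types or query parser: `Repo`, `ParseMetaAtom`, `FilterRepos` and the `meta:key=value` syntax are all made up for this example, and the metadata map merely stands in for whatever the indexer stores.

```go
package main

import (
	"fmt"
	"strings"
)

// Repo is a hypothetical stand-in for an indexed repository together
// with its key/value metadata (for instance values read from git config).
type Repo struct {
	Name string
	Meta map[string]string
}

// ParseMetaAtom parses a query atom of the invented form "meta:key=value".
// It reports false for any other input.
func ParseMetaAtom(atom string) (key, value string, ok bool) {
	const prefix = "meta:"
	if !strings.HasPrefix(atom, prefix) {
		return "", "", false
	}
	kv := strings.SplitN(atom[len(prefix):], "=", 2)
	if len(kv) != 2 || kv[0] == "" {
		return "", "", false
	}
	return kv[0], kv[1], true
}

// FilterRepos keeps the repositories whose metadata has key set to exactly
// value. Repositories that lack the key never match.
func FilterRepos(repos []Repo, key, value string) []Repo {
	var out []Repo
	for _, r := range repos {
		if v, present := r.Meta[key]; present && v == value {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	// Illustrative data only.
	repos := []Repo{
		{Name: "zoekt", Meta: map[string]string{"topic": "search"}},
		{Name: "demo", Meta: map[string]string{"topic": "web"}},
		{Name: "bare"},
	}
	if key, value, ok := ParseMetaAtom("meta:topic=search"); ok {
		for _, r := range FilterRepos(repos, key, value) {
			fmt.Println(r.Name)
		}
	}
}
```

Exact matching is the simplest of the three options raised earlier in the thread; substring or regex matching would only change the comparison inside `FilterRepos`.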
Hi guys,
Hope you are all well!
I was wondering what the best way would be to add faceting to the search results displayed by the zoekt web server, such as filtering by language or by custom user-defined filters like the examples below.
I just want to extend zoekt to filter a large heterogeneous code monorepo (mainly all my local repositories, more than 500 repos). I am struggling to assess whether I should create a blevesearch index after zoekt indexing, or whether it would be possible to add pre/post-processing plugins while indexing the code with zoekt.
I found this cool tokenizer package from a Stack Overflow employee,
https://github.com/clipperhouse/jargon, which recognizes canonical and synonymous dev/tech terms. I wanted to chain it in as a plugin to post-process the topics extracted during code indexing.
Question:
What would be the best approach to build a quick PoC with these external filtering bots/plugins?
Cheers,
Rosco
Examples:
or
refs.