PATH WALK I: The path-walk API #1818

gitgitgadget · 2024-11-01T13:13:49Z

On the Git mailing list, karthik nayak wrote (reply to this):

"Derrick Stolee via GitGitGadget" <[email protected]> writes: > From: Derrick Stolee <[email protected]> > > In anticipation of a few planned applications, introduce the most basic form > of a path-walk API. It currently assumes that there are no UNINTERESTING > objects, and does not include any complicated filters. It calls a function > pointer on groups of tree and blob objects as grouped by path. This only > includes objects the first time they are discovered, so an object that > appears at multiple paths will not be included in two batches. > > These batches are collected in 'struct type_and_oid_list' objects, which > store an object type and an oid_array of objects. > > The data structures are documented in 'struct path_walk_context', but in > summary the most important are: > > * 'paths_to_lists' is a strmap that connects a path to a > type_and_oid_list for that path. To avoid conflicts in path names, > we make sure that tree paths end in "/" (except the root path with > is an empty string) and blob paths do not end in "/". > > * 'path_stack' is a string list that is added to in an append-only > way. This stores the stack of our depth-first search on the heap > instead of using recursion. > > * 'path_stack_pushed' is a strmap that stores path names that were > already added to 'path_stack', to avoid repeating paths in the > stack. Mostly, this saves us from quadratic lookups from doing > unsorted checks into the string_list. > > The coupling of 'path_stack' and 'path_stack_pushed' is protected by the > push_to_stack() method. Call this instead of inserting into these > structures directly. > > The walk_objects_by_path() method initializes these structures and > starts walking commits from the given rev_info struct. The commits are > used to find the list of root trees which populate the start of our > depth-first search. Isn't this more of breadth-first search? Reading through the code, the algorithm seems something like: - For each commit in list of commits (from rev_info) - Tackle each root tree, add root path to the stack. - For each path in stack left - Call the callback provided by client. - Find all its first level children, add each to the stack. So wouldn't this go through the tree in level by level basis? Making it a BFS? Apart from this, the patch itself looks solid. I ended up writing a small client to play with this API, and was very pleased how quickly I could get it running. [snip]

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 11/1/24 9:12 AM, karthik nayak wrote: > "Derrick Stolee via GitGitGadget" <[email protected]> writes: > >> From: Derrick Stolee <[email protected]> >> >> The walk_objects_by_path() method initializes these structures and >> starts walking commits from the given rev_info struct. The commits are >> used to find the list of root trees which populate the start of our >> depth-first search. > > Isn't this more of breadth-first search? Reading through the code, the > algorithm seems something like: > > - For each commit in list of commits (from rev_info) > - Tackle each root tree, add root path to the stack. > - For each path in stack left > - Call the callback provided by client. > - Find all its first level children, add each to the stack. > > So wouldn't this go through the tree in level by level basis? Making it > a BFS? While we are adding all children to the stack, we only pop off the top of the stack, making it a DFS. (We do visit the paths in reverse- lexicographic order, though.) To make it a BFS, we would need to visit the paths in the order they are added to the list. Instead, we visit them in Last-In First-Out order. I initially had built it as a BFS, but ran into memory issues when running it on very large repos. Thanks, -Stolee

gitgitgadget · 2024-11-01T13:43:28Z

On the Git mailing list, karthik nayak wrote (reply to this):

karthik nayak <[email protected]> writes: > "Derrick Stolee via GitGitGadget" <[email protected]> writes: > >> From: Derrick Stolee <[email protected]> >> >> In anticipation of a few planned applications, introduce the most basic form >> of a path-walk API. It currently assumes that there are no UNINTERESTING >> objects, and does not include any complicated filters. It calls a function >> pointer on groups of tree and blob objects as grouped by path. This only >> includes objects the first time they are discovered, so an object that >> appears at multiple paths will not be included in two batches. >> >> These batches are collected in 'struct type_and_oid_list' objects, which >> store an object type and an oid_array of objects. >> >> The data structures are documented in 'struct path_walk_context', but in >> summary the most important are: >> >> * 'paths_to_lists' is a strmap that connects a path to a >> type_and_oid_list for that path. To avoid conflicts in path names, >> we make sure that tree paths end in "/" (except the root path with >> is an empty string) and blob paths do not end in "/". >> >> * 'path_stack' is a string list that is added to in an append-only >> way. This stores the stack of our depth-first search on the heap >> instead of using recursion. >> >> * 'path_stack_pushed' is a strmap that stores path names that were >> already added to 'path_stack', to avoid repeating paths in the >> stack. Mostly, this saves us from quadratic lookups from doing >> unsorted checks into the string_list. >> >> The coupling of 'path_stack' and 'path_stack_pushed' is protected by the >> push_to_stack() method. Call this instead of inserting into these >> structures directly. >> >> The walk_objects_by_path() method initializes these structures and >> starts walking commits from the given rev_info struct. The commits are >> used to find the list of root trees which populate the start of our >> depth-first search. > > Isn't this more of breadth-first search? Reading through the code, the > algorithm seems something like: > > - For each commit in list of commits (from rev_info) > - Tackle each root tree, add root path to the stack. > - For each path in stack left > - Call the callback provided by client. > - Find all its first level children, add each to the stack. > > So wouldn't this go through the tree in level by level basis? Making it > a BFS? My bad here, thinking more about it, it is DFS indeed. Although we add all the children of a level to the stack, we pop each of them from the stack and end up traversing down that level. > > Apart from this, the patch itself looks solid. I ended up writing a > small client to play with this API, and was very pleased how quickly I > could get it running. > > [snip]

gitgitgadget · 2024-11-01T14:28:40Z

On the Git mailing list, karthik nayak wrote (reply to this):

"Derrick Stolee via GitGitGadget" <[email protected]> writes: > From: Derrick Stolee <[email protected]> > > The rev_info that is specified for a path-walk traversal may specify > visiting tag refs (both lightweight and annotated) and also may specify > indexed objects (blobs and trees). Update the path-walk API to walk > these objects as well. > > When walking tags, we need to peel the annotated objects until reaching > a non-tag object. If we reach a commit, then we can add it to the > pending objects to make sure we visit in the commit walk portion. If we Nit: s/in/it in/ [snip] > + case OBJ_BLOB: > + if (!info->blobs) > + continue; > + if (pending->path) { > + struct type_and_oid_list *list; > + char *path = pending->path; > + if (!(list = strmap_get(&ctx->paths_to_lists, path))) { > + CALLOC_ARRAY(list, 1); > + list->type = OBJ_BLOB; > + strmap_put(&ctx->paths_to_lists, path, list); > + } > + oid_array_append(&list->oids, &obj->oid); > + } else { > + /* assume a root tree, such as a lightweight tag. */ Shouldn't this comment be for tagged blobs? > + oid_array_append(&tagged_blobs->oids, &obj->oid); > + } > + break; [snip]

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 11/1/24 10:25 AM, karthik nayak wrote: > "Derrick Stolee via GitGitGadget" <[email protected]> writes: > >> From: Derrick Stolee <[email protected]> >> >> The rev_info that is specified for a path-walk traversal may specify >> visiting tag refs (both lightweight and annotated) and also may specify >> indexed objects (blobs and trees). Update the path-walk API to walk >> these objects as well. >> >> When walking tags, we need to peel the annotated objects until reaching >> a non-tag object. If we reach a commit, then we can add it to the >> pending objects to make sure we visit in the commit walk portion. If we > > Nit: s/in/it in/ thanks! >> + case OBJ_BLOB: >> + if (!info->blobs) >> + continue; >> + if (pending->path) { >> + struct type_and_oid_list *list; >> + char *path = pending->path; >> + if (!(list = strmap_get(&ctx->paths_to_lists, path))) { >> + CALLOC_ARRAY(list, 1); >> + list->type = OBJ_BLOB; >> + strmap_put(&ctx->paths_to_lists, path, list); >> + } >> + oid_array_append(&list->oids, &obj->oid); >> + } else { >> + /* assume a root tree, such as a lightweight tag. */ > > Shouldn't this comment be for tagged blobs? Yes. This is a copy-paste error. Thanks for the careful reading. -Stolee

gitgitgadget · 2024-11-21T22:47:49Z

On the Git mailing list, Taylor Blau wrote (reply to this):

On Sat, Nov 09, 2024 at 07:41:10PM +0000, Derrick Stolee via GitGitGadget wrote: > diff --git a/path-walk.c b/path-walk.c > index 24cf04c1e7d..2ca08402367 100644 > --- a/path-walk.c > +++ b/path-walk.c > @@ -98,6 +98,10 @@ static int add_children(struct path_walk_context *ctx, > if (S_ISGITLINK(entry.mode)) > continue; > > + /* If the caller doesn't want blobs, then don't bother. */ > + if (!ctx->info->blobs && type == OBJ_BLOB) > + continue; > + I was going to ask why we're not reusing the existing property of the rev_info struct to specify what object type(s) we do/don't want, but the answer is obvious: we don't want that to change the behavior of the commit-level walk which is used to seed the actual path walk based on its results. However, it would be kind of nice to have a single place to specify how you want to traverse objects, and using the rev_info struct seems like a good choice there since you can naively ask Git to parse command-line arguments like --blobs, --trees, --objects, etc. and set the appropriate bits. I wonder if it might make sense to do something like: unsigned int tmp_blobs = revs->blob_objects; unsigned int tmp_trees = revs->tree_objects; unsigned int tmp_tags = revs->tag_objects; if (prepare_revision_walk(revs)) die(_("failed to setup revision walk")); /* commit-level walk */ revs->blob_objects = tmp_blobs; revs->tree_objects = tmp_trees; revs->tag_objects = tmp_tags; /* path-level walk */ I don't have strong feelings about it, but it feels like this approach could be cleaner from a caller's perspective. But I can see an argument to the contrary that it does introduce some awkwardness with the dual-use of those fields. Thanks, Taylor

gitgitgadget · 2024-11-01T13:49:32Z

On the Git mailing list, karthik nayak wrote (reply to this):

"Derrick Stolee via GitGitGadget" <[email protected]> writes: [snip] > diff --git a/t/t6601-path-walk.sh b/t/t6601-path-walk.sh > new file mode 100755 > index 00000000000..1f277b88291 > --- /dev/null > +++ b/t/t6601-path-walk.sh > @@ -0,0 +1,118 @@ > +#!/bin/sh > + > +test_description='direct path-walk API tests' > + > +. ./test-lib.sh > + > +test_expect_success 'setup test repository' ' > + git checkout -b base && > + > + mkdir left && > + mkdir right && > + echo a >a && > + echo b >left/b && > + echo c >right/c && > + git add . && > + git commit -m "first" && > + > + echo d >right/d && > + git add right && > + git commit -m "second" && > + > + echo bb >left/b && > + git commit -a -m "third" && > + > + git checkout -b topic HEAD~1 && > + echo cc >right/c && > + git commit -a -m "topic" > +' > + Nit: Since the root level tree is already special cased out, we only check one level of path here, would be nice to add another level of tree to this. > +test_expect_success 'all' ' > + test-tool path-walk -- --all >out && > + > + cat >expect <<-EOF && > + TREE::$(git rev-parse topic^{tree}) > + TREE::$(git rev-parse base^{tree}) > + TREE::$(git rev-parse base~1^{tree}) > + TREE::$(git rev-parse base~2^{tree}) > + TREE:left/:$(git rev-parse base:left) > + TREE:left/:$(git rev-parse base~2:left) > + TREE:right/:$(git rev-parse topic:right) > + TREE:right/:$(git rev-parse base~1:right) > + TREE:right/:$(git rev-parse base~2:right) > + trees:9 > + BLOB:a:$(git rev-parse base~2:a) > + BLOB:left/b:$(git rev-parse base~2:left/b) > + BLOB:left/b:$(git rev-parse base:left/b) > + BLOB:right/c:$(git rev-parse base~2:right/c) > + BLOB:right/c:$(git rev-parse topic:right/c) > + BLOB:right/d:$(git rev-parse base~1:right/d) > + blobs:6 > + EOF > + > + test_cmp_sorted expect out > +' Isn't the order deterministic? Why do we need to sort it?

gitgitgadget · 2024-11-01T22:26:38Z

On the Git mailing list, Jonathan Tan wrote (reply to this):

I haven't looked thoroughly at the rest of the patches yet, but had a comment about this test. Rearranging: "Derrick Stolee via GitGitGadget" <[email protected]> writes: > +test_expect_success 'all' ' > + test-tool path-walk -- --all >out && > + > + cat >expect <<-EOF && > + TREE::$(git rev-parse topic^{tree}) > + TREE::$(git rev-parse base^{tree}) > + TREE::$(git rev-parse base~1^{tree}) > + TREE::$(git rev-parse base~2^{tree}) > + TREE:left/:$(git rev-parse base:left) > + TREE:left/:$(git rev-parse base~2:left) > + TREE:right/:$(git rev-parse topic:right) > + TREE:right/:$(git rev-parse base~1:right) > + TREE:right/:$(git rev-parse base~2:right) > + trees:9 [snip rest of "expect"] The way you're testing this, wouldn't the tests pass even if the OIDs aren't emitted in path order? (E.g. if topic:right and base~1:right were somehow grouped into two different groups, even though they have the same path.) I would have expected the test output to be something like: TREE:right/ $(rp :right topic base~1 base~2) where rp is a function that takes in a suffix and one or more prefixes - I haven't figured out its contents yet, but echo $(git rev-parse HEAD^^ HEAD^ HEAD | sort) gives us a space-separated list, so it doesn't seem too difficult to define such a function. > +static int emit_block(const char *path, struct oid_array *oids, > + enum object_type type, void *data) > +{ > + struct path_walk_test_data *tdata = data; > + const char *typestr; > + > + switch (type) { > + case OBJ_TREE: > + typestr = "TREE"; > + tdata->tree_nr += oids->nr; > + break; > + > + case OBJ_BLOB: > + typestr = "BLOB"; > + tdata->blob_nr += oids->nr; > + break; > + > + default: > + BUG("we do not understand this type"); > + } > + > + for (size_t i = 0; i < oids->nr; i++) > + printf("%s:%s:%s\n", typestr, path, oid_to_hex(&oids->oid[i])); Then here, you would print typestr and path before the "for" loop. In the "for" loop you would add oid_to_hex() results to a sorted string list, have another "for" loop that prints each element preceded by a space, then print a "\n" after both "for" loops.

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 11/1/24 6:23 PM, Jonathan Tan wrote: > I haven't looked thoroughly at the rest of the patches yet, but had a > comment about this test. Rearranging: > > "Derrick Stolee via GitGitGadget" <[email protected]> writes: >> +test_expect_success 'all' ' >> + test-tool path-walk -- --all >out && >> + >> + cat >expect <<-EOF && >> + TREE::$(git rev-parse topic^{tree}) >> + TREE::$(git rev-parse base^{tree}) >> + TREE::$(git rev-parse base~1^{tree}) >> + TREE::$(git rev-parse base~2^{tree}) >> + TREE:left/:$(git rev-parse base:left) >> + TREE:left/:$(git rev-parse base~2:left) >> + TREE:right/:$(git rev-parse topic:right) >> + TREE:right/:$(git rev-parse base~1:right) >> + TREE:right/:$(git rev-parse base~2:right) >> + trees:9 > > [snip rest of "expect"] > > The way you're testing this, wouldn't the tests pass even if the OIDs > aren't emitted in path order? (E.g. if topic:right and base~1:right > were somehow grouped into two different groups, even though they have > the same path.) You are correct that if the path-walk API emitted multiple batches with the same path name, then we would not detect that via the current testing strategy. The main reason to use the sort is to avoid adding a restriction on the order in which objects appear within the batch. Your recommendation to group a batch into a single line does not strike me as a suitable approach, because long lines become hard to read and difficult to parse diffs. (Also, the order within the batch becomes baked in as a requirement.) The biggest question I'd like to ask is this: do you see a risk of a path being repeated? There are cases where it will happen, such as indexed objects that are not reachable anywhere else. The way I would consider modifying these tests to reflect the batching would be to associate each batch with a number, causing the order of the paths to become hard-coded in the test. Something like 0:COMMIT::$(git rev-parse ...) 0:COMMIT::$(git rev-parse ...) 1:TREE::$(git rev-parse ...) 1:TREE::$(git rev-parse ...) 2:TREE:right/:$(git rev-parse ...) 3:BLOB:right/a:$(...) 4:TREE:left/:$(git rev-parse ...) 5:BLOB:left/b:$(...) This would imply some amount of order that maybe should become a requirement of the API. Thanks, -Stolee

On the Git mailing list, Jonathan Tan wrote (reply to this):

Derrick Stolee <[email protected]> writes: > You are correct that if the path-walk API emitted multiple batches > with the same path name, then we would not detect that via the current > testing strategy. > > The main reason to use the sort is to avoid adding a restriction on > the order in which objects appear within the batch. > > Your recommendation to group a batch into a single line does not > strike me as a suitable approach, because long lines become hard to > read and difficult to parse diffs. (Also, the order within the batch > becomes baked in as a requirement.) The hashes in a line can be abbreviated if line length is a concern. Also, note that I am suggesting sorting the OIDs within a line (that is, a batch), and also sorting the lines (batches) as a whole. > The biggest question I'd like to ask is this: do you see a risk of > a path being repeated? There are cases where it will happen, such as > indexed objects that are not reachable anywhere else. I was thinking that the whole point of this feature is that we group objects by path, so it seems desirable to test that paths are not repeated. (Or repeated as little as possible, if it is not possible to avoid repetition e.g. in the case you describe.) > The way I would consider modifying these tests to reflect the batching > would be to associate each batch with a number, causing the order of > the paths to become hard-coded in the test. Something like > > 0:COMMIT::$(git rev-parse ...) > 0:COMMIT::$(git rev-parse ...) > 1:TREE::$(git rev-parse ...) > 1:TREE::$(git rev-parse ...) > 2:TREE:right/:$(git rev-parse ...) > 3:BLOB:right/a:$(...) > 4:TREE:left/:$(git rev-parse ...) > 5:BLOB:left/b:$(...) > > This would imply some amount of order that maybe should become a > requirement of the API. > > Thanks, > -Stolee If we're willing to declare an order in which we will return paths to the user, that would work too. (I'm not sure that we need to declare an order, though.)

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 11/4/24 6:39 PM, Jonathan Tan wrote: > Derrick Stolee <[email protected]> writes: > >> The biggest question I'd like to ask is this: do you see a risk of >> a path being repeated? There are cases where it will happen, such as >> indexed objects that are not reachable anywhere else. > > I was thinking that the whole point of this feature is that we group > objects by path, so it seems desirable to test that paths are not > repeated. (Or repeated as little as possible, if it is not possible > to avoid repetition e.g. in the case you describe.) In addition to determining the order of the batches, it can be helpful to demonstrate that we don't call the path_fn with an empty batch! I discovered this while making the appropriate changes today and putting the fixes in the right places. Thanks, -Stolee

gitgitgadget · 2024-11-06T14:09:20Z

On the Git mailing list, Patrick Steinhardt wrote (reply to this):

On Thu, Oct 31, 2024 at 06:27:00AM +0000, Derrick Stolee via GitGitGadget wrote: [snip] > +int cmd__path_walk(int argc, const char **argv) > +{ > + int res; > + struct rev_info revs = REV_INFO_INIT; > + struct path_walk_info info = PATH_WALK_INFO_INIT; > + struct path_walk_test_data data = { 0 }; > + struct option options[] = { > + OPT_END(), > + }; > + > + initialize_repository(the_repository); > + setup_git_directory(); > + revs.repo = the_repository; > + > + argc = parse_options(argc, argv, NULL, > + options, path_walk_usage, > + PARSE_OPT_KEEP_UNKNOWN_OPT | PARSE_OPT_KEEP_ARGV0); > + > + if (argc > 1) > + setup_revisions(argc, argv, &revs, NULL); > + else > + usage(path_walk_usage[0]); > + > + info.revs = &revs; > + info.path_fn = emit_block; > + info.path_fn_data = &data; > + > + res = walk_objects_by_path(&info); > + > + printf("trees:%" PRIuMAX "\n" > + "blobs:%" PRIuMAX "\n", > + data.tree_nr, data.blob_nr); > + > + return res; > +} This function is leaking memory. I'd propose to add below patch on top to plug them, which makes t6601 pass with the leak sanitizer enabled. Patrick diff --git a/t/helper/test-path-walk.c b/t/helper/test-path-walk.c index 06b103d876..fa3bfe46b5 100644 --- a/t/helper/test-path-walk.c +++ b/t/helper/test-path-walk.c @@ -85,7 +85,6 @@ int cmd__path_walk(int argc, const char **argv) OPT_END(), }; - initialize_repository(the_repository); setup_git_directory(); revs.repo = the_repository; @@ -110,5 +109,6 @@ int cmd__path_walk(int argc, const char **argv) "tags:%" PRIuMAX "\n", data.commit_nr, data.tree_nr, data.blob_nr, data.tag_nr); + release_revisions(&revs); return res; }

On the Git mailing list, Derrick Stolee wrote (reply to this):

On 11/6/24 9:04 AM, Patrick Steinhardt wrote: > On Thu, Oct 31, 2024 at 06:27:00AM +0000, Derrick Stolee via GitGitGadget wrote: > [snip] > This function is leaking memory. I'd propose to add below patch on top > to plug them, which makes t6601 pass with the leak sanitizer enabled. Thanks! Applied for the next version. -Stolee

gitgitgadget · 2024-11-21T22:41:45Z

On the Git mailing list, Taylor Blau wrote (reply to this):

On Sat, Nov 09, 2024 at 07:41:09PM +0000, Derrick Stolee via GitGitGadget wrote: > From: Derrick Stolee <[email protected]> > > Add some tests based on the current behavior, doing interesting checks > for different sets of branches, ranges, and the --boundary option. This > sets a baseline for the behavior and we can extend it as new options are > introduced. > > Store and output a 'batch_nr' value so we can demonstrate that the paths are > grouped together in a batch and not following some other ordering. This > allows us to test the depth-first behavior of the path-walk API. However, we > purposefully do not test the order of the objects in the batch, so the > output is compared to the expected output through a sort. > > It is important to mention that the behavior of the API will change soon as > we start to handle UNINTERESTING objects differently, but these tests will > demonstrate the change in behavior. > > Signed-off-by: Derrick Stolee <[email protected]> > --- Nice. I like the approach of implementing the API in a single commit, and then demonstrating a trivial "caller" by way of a custom test helper. I think that the artifact of having a test helper here is useful on its own, but it also serves as a good example of how to use the API, and provides something to actually test the implementation with. I'm going to steal this pattern the next time I need to work on something that necessitates a complex new API ;-). > +static int emit_block(const char *path, struct oid_array *oids, > + enum object_type type, void *data) > +{ > + struct path_walk_test_data *tdata = data; > + const char *typestr; > + > + switch (type) { > + case OBJ_TREE: > + typestr = "TREE"; > + tdata->tree_nr += oids->nr; > + break; > + > + case OBJ_BLOB: > + typestr = "BLOB"; > + tdata->blob_nr += oids->nr; > + break; > + default: > + BUG("we do not understand this type"); > + } I think you could write this as: if (type == OBJ_TREE) tdata->tree_nr += oids->nr; else if (type == OBJ_BLOB) tdata->blob_nr += oids->nr; else BUG("we do not understand this type"); typestr = type_name(type); Which DRYs things up a bit and uses the type_name() helper. That will give you strings like "tree" and "blob" instead of "TREE" and "BLOB", but I'm not sure if the casing is important. Thanks, Taylor

-Original file line number
+Diff line change
@@ -0,0 +1,63 @@
+    Path-Walk API
+    =============
+    The path-walk API is used to walk reachable objects, but to visit objects
+    in batches based on a common path they appear in, or by type.
+    For example, all reachable commits are visited in a group. All tags are
+    visited in a group. Then, all root trees are visited. At some point, all
+    blobs reachable via a path `my/dir/to/A` are visited. When there are
+    multiple paths possible to reach the same object, then only one of those
+    paths is used to visit the object.
+    Basics
+    ------
+    To use the path-walk API, include `path-walk.h` and call
+    `walk_objects_by_path()` with a customized `path_walk_info` struct. The
+    struct is used to set all of the options for how the walk should proceed.
+    Let's dig into the different options and their use.
+    `path_fn` and `path_fn_data`::
+    	The most important option is the `path_fn` option, which is a
+    	function pointer to the callback that can execute logic on the
+    	object IDs for objects grouped by type and path. This function
+    	also receives a `data` value that corresponds to the
+    	`path_fn_data` member, for providing custom data structures to
+    	this callback function.
+    `revs`::
+    	To configure the exact details of the reachable set of objects,
+    	use the `revs` member and initialize it using the revision
+    	machinery in `revision.h`. Initialize `revs` using calls such as
+    	`setup_revisions()` or `parse_revision_opt()`. Do not call
+    	`prepare_revision_walk()`, as that will be called within
+    	`walk_objects_by_path()`.
+    +
+    It is also important that you do not specify the `--objects` flag for the
+    `revs` struct. The revision walk should only be used to walk commits, and
+    the objects will be walked in a separate way based on those starting
+    commits.
+    `commits`, `blobs`, `trees`, `tags`::
+    	By default, these members are enabled and signal that the path-walk
+    	API should call the `path_fn` on objects of these types. Specialized
+    	applications could disable some options to make it simpler to walk
+    	the objects or to have fewer calls to `path_fn`.
+    +
+    While it is possible to walk only commits in this way, consumers would be
+    better off using the revision walk API instead.
+    `prune_all_uninteresting`::
+    	By default, all reachable paths are emitted by the path-walk API.
+    	This option allows consumers to declare that they are not
+    	interested in paths where all included objects are marked with the
+    	`UNINTERESTING` flag. This requires using the `boundary` option in
+    	the revision walk so that the walk emits commits marked with the
+    	`UNINTERESTING` flag.
+    Examples
+    --------
+    See example usages in:
+    	`t/helper/test-path-walk.c`

-Original file line number
+Diff line change
@@ Expand Up / @@ -818,6 +818,7 @@ TEST_BUILTINS_OBJS += test-parse-options.o @@
     TEST_BUILTINS_OBJS += test-parse-pathspec-file.o
     TEST_BUILTINS_OBJS += test-partial-clone.o
     TEST_BUILTINS_OBJS += test-path-utils.o
+    TEST_BUILTINS_OBJS += test-path-walk.o
     TEST_BUILTINS_OBJS += test-pcre2-config.o
     TEST_BUILTINS_OBJS += test-pkt-line.o
     TEST_BUILTINS_OBJS += test-proc-receive.o
@@ Expand Down Expand Up / @@ -1094,6 +1095,7 @@ LIB_OBJS += parse-options.o @@
     LIB_OBJS += patch-delta.o
     LIB_OBJS += patch-ids.o
     LIB_OBJS += path.o
+    LIB_OBJS += path-walk.o
     LIB_OBJS += pathspec.o
     LIB_OBJS += pkt-line.o
     LIB_OBJS += preload-index.o
@@ Expand Down @@

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

PATH WALK I: The path-walk API #1818

Diff view

Diff view

There are no files selected for viewing

gitgitgadget bot Nov 1, 2024

gitgitgadget bot Nov 1, 2024

gitgitgadget bot Nov 1, 2024

gitgitgadget bot Nov 1, 2024

gitgitgadget bot Nov 4, 2024

gitgitgadget bot Nov 21, 2024

gitgitgadget bot Nov 1, 2024

gitgitgadget bot Nov 1, 2024

gitgitgadget bot Nov 4, 2024

gitgitgadget bot Nov 4, 2024

gitgitgadget bot Nov 8, 2024

gitgitgadget bot Nov 6, 2024

gitgitgadget bot Nov 8, 2024

gitgitgadget bot Nov 21, 2024

PATH WALK I: The path-walk API #1818

Are you sure you want to change the base?

PATH WALK I: The path-walk API #1818

Diff view

Diff view

There are no files selected for viewing

gitgitgadget bot Nov 1, 2024

Choose a reason for hiding this comment

gitgitgadget bot Nov 1, 2024

Choose a reason for hiding this comment

gitgitgadget bot Nov 1, 2024

Choose a reason for hiding this comment

gitgitgadget bot Nov 1, 2024

Choose a reason for hiding this comment

gitgitgadget bot Nov 4, 2024

Choose a reason for hiding this comment

gitgitgadget bot Nov 21, 2024

Choose a reason for hiding this comment

gitgitgadget bot Nov 1, 2024

Choose a reason for hiding this comment

gitgitgadget bot Nov 1, 2024

Choose a reason for hiding this comment

gitgitgadget bot Nov 4, 2024

Choose a reason for hiding this comment

gitgitgadget bot Nov 4, 2024

Choose a reason for hiding this comment

gitgitgadget bot Nov 8, 2024

Choose a reason for hiding this comment

gitgitgadget bot Nov 6, 2024

Choose a reason for hiding this comment

gitgitgadget bot Nov 8, 2024

Choose a reason for hiding this comment

gitgitgadget bot Nov 21, 2024

Choose a reason for hiding this comment