Robj/task system config #497

rtjohnso · 2022-12-04T22:50:04Z

This cleans up several things about the task system:

Allow using background and foreground threads at the same time
Create a separate config for whether to do foreground work
Introduce a task_system_config and all associated initialization, defaults, etc in tests
Generalize the --num-bg-threads command-line option in driver_test splinter_test to be available for all of driver_test.

netlify · 2022-12-04T22:50:08Z

✅ Deploy Preview for splinterdb canceled.

Name	Link
🔨 Latest commit	`782c1bf`
🔍 Latest deploy log	https://app.netlify.com/sites/splinterdb/deploys/63b76924e6378f0009542567

src/task.h

src/task.c

ajhconway

Very happy with these changes.

The main thing I want to mention is that you can't actually increase the fingerprint size in the filters. It'll actually get autoconfig'd down to 4 anyway.

src/task.c

src/task.h

tests/config.c

src/task.c

rtjohnso · 2022-12-06T21:58:06Z

src/task.c

+                        bool                use_stats,
+                        const uint64        num_bg_threads[NUM_TASK_TYPES])
+{
+   if ((num_bg_threads[TASK_TYPE_NORMAL] == 0)


Maybe relax this constraint if they have foreground threads?

tests/unit/task_system_test.c

gapisback · 2023-01-04T05:06:39Z

@rtjohnso - I am done with this review. I found it a bit difficult to review as the diff / diff-blocks were quite a few.

Before applying any corrections or changes to address review feedback, it will be helpful if you could, perhaps, squash all 15 commits into one commit. Then, apply your changes, so I can see the delta-diffs in just one new commit.

Hopefully, I'll get one more shot for a quick pass on the delta-diffs after you've made all changes. Let me know when your next round of updates are done.

Thanks, for all this work!

rtjohnso · 2023-01-04T21:35:31Z

The change today responds to all of Aditya's comments and also tweaks the way foreground threads do background work so that there is no longer a constraint on how background threads for different task groups are configured in the task system.

src/task.c

tests/config.c

src/task.c

src/task.h

tests/functional/splinter_test.c

gapisback

HI, @rtjohnso - I've taken a quick look at the latest round of diffs you pushed based on my earlier feedback from last night.

I scribbled some minor cleanup comments / suggestion in this last pass, and will leave it up to you to address / incorporate as appropriate.

Thank you very much for patiently plodding through all comments and incorporating changes.

Overall, this whole fg-/bg-threads stuff is a lot better now, so am looking forward to running more tests once this integrates.

include/splinterdb/splinterdb.h

ajhconway

LGTM

Minor formatting stuff only

ajhconway · 2023-01-05T22:30:28Z

include/splinterdb/splinterdb.h

+   // task system
+   // Background threads configuration: Both have to be non-zero in order for
+   // background threads to be started. (It is an error for one to be zero
+   // while the other is non-zero.)
+   //
+   // - Memtable bg-threads work on Memtables tasks which are short but latency
+   //   sensitive. A rule of thumb is to allocate around 1 memtable bg-thread
+   //   for every 10 threads performing insertions. Too few memtable threads
+   //   can cause some insertions to have high latency.
+   // - Normal bg-threads work on task such as compacting branches in the trunk
+   //   and building filters. These tasks take longer and are less latency
+   //   sensitive. A rule of thumb is to allocate 1-2 normal background
+   //   threads for every thread performing insertions. Too few "normal"
+   //   background threads can cause disk I/O bandwidth to go underutilized.


Use /* */ comments for multi-line

ajhconway · 2023-01-05T22:42:11Z

src/task.c

   platform_heap_id heap_id;
 } thread_invoke;

 /*
- * -----------------------------------------------------------------------------


We tend to use these around big comments. We should standardize though.

ajhconway · 2023-01-05T22:45:25Z

src/task.c

-   platform_condvar_unlock(&group->cv);
-   platform_thread_cleanup_pop(0);
+   /*
+    * It is important to update the current_executing_tasks while


Insert leading space

ajhconway · 2023-01-05T22:49:22Z

src/task.h

+ * Possibly performs one background task if there is one waiting,
+ * based on the specified queue_scale_percent.  Otherwise returns
+ * immediately.  Returns STATUS_TIMEDOUT to indicate that it did not
+ * run any task.  STATUS_OK indicates that it did run a task.
+ * queue_scale_percent specifies how big the queue must be, relative
+ * to the number of background threads for that task group, for us to
+ * perform a task in that group.  So, for example,


Use some paragraphs to make this less of a wall of text

ajhconway · 2023-01-05T22:52:15Z

tests/functional/filter_test.c

+   task_system *ts = NULL;
+   rc              = task_system_create(hid, io, &ts, &task_cfg);
+   platform_assert_status_ok(rc);
+


Was this not happening before?

This commit applies some minor clean-up to task system as a follow-on to the larger rework done under PR #497. Consistently use STATUS_BUSY, as a way to report when all concurrent threads are in-use. Minor changes to task system unit-test code & cleanup of comments.

This commit applies some minor clean-up to task system as a follow-on to the larger rework done under PR #497. Consistently use STATUS_BUSY, as a way to report when all concurrent threads are in-use. Minor changes are done to task system unit-test code & cleanup of comments.

commit 369bf55 Author: Rob Johnson <[email protected]> Date: Wed May 22 16:53:01 2024 -0700 Robj/onetrust scriptid (#627) update to Broadcom's OneTrust script ID commit 655988f Author: Rob Johnson <[email protected]> Date: Wed May 22 13:16:55 2024 -0700 Robj/deukyeon deadlock (#626) fix several compact_bundle bugs when index node has split commit 05654ab Author: Rob Johnson <[email protected]> Date: Sat May 18 17:11:11 2024 -0700 Robj/print node (#625) cleanup trunk printing and add node ids commit 6a4afab Author: Rob Johnson <[email protected]> Date: Thu May 16 13:30:23 2024 -0700 Robj/fallocate blockdev fix (#623) don't fallocate on block devices commit ac426e4 Author: Rob Johnson <[email protected]> Date: Thu May 16 10:43:48 2024 -0700 Robj/leaf split fix (#624) Fix bug where a leaf split that resulted in a single leaf would cause an assertion failure in a subsequent compaction. The cause of the bug was that the compaction code detected node splits by checking whether the upper-bound key of the node had changed since the compaction request was created. However, in a leaf split into 1 leaf, the upper bound key doesn't change, but the leaf does get rebundled, meaning that the bundle in the compaction request is no longer live. This meant that the compaction thought it was seeing a dead bundle in a node that had not split. This should only occur if the bundle gets flushed from a parent to a child between the enqueuing of the compaction and its execution. But that can only occur if the node is not a leaf. Hence the compaction code asserted that the node was not a leaf in this case. To fix the bug, we improve the way node splits are detected. Every node has a unique id in its header. IDs change whenever a node is split, which makes detecting splits trivial. commit a46705b Author: Rob Johnson <[email protected]> Date: Wed Apr 17 14:10:54 2024 -0700 Fix O_DIRECT and multi-threaded io_context issues (#621) * enable --set-O_DIRECT flag in large_inserts_stress_test * workaround linux aio oddity: with O_DIRECT, io_getevents with a NULL timeout may still return 0 even though there are in-flight IOs. * switch to per-process io contexts commit 9359c9a Author: Rob Johnson <[email protected]> Date: Thu Feb 29 08:38:51 2024 -0800 Robj/badge (#619) add badge to README.md commit 77f8fc9 Author: Rob Johnson <[email protected]> Date: Wed Feb 28 01:19:01 2024 -0800 Set up github actions CI (#617) * fix some memory leaks * fix some build-system bugs and add a memory-safety assert * remove databases during tests to reduce disk usage * Create run-tests.yml commit 2eacefa Author: Aditya Gurajada <[email protected]> Date: Fri Jan 26 15:36:52 2024 -0800 (#599) Replace calls to getpid() with platform'ized platform_getpid() This commit cleans-up the `#include <unistd.h>` from several .c files that needed access to get pid(). Instead, change the code to invoke platform_getpid(), now defined in platform_inline.h commit 23f5e4f Author: Aditya Gurajada <[email protected]> Date: Wed Jan 24 05:55:16 2024 -0800 Refactor to save / print shmem usage stats using common struct. This commit refactors shared memory usage stats fields to drive-off shminfo_usage_stats{} struct entirely. Add platform_save_usage_stats(), used by platform_shm_print_usage(). This refactoring paves the way for upcoming PR #569 which is adding more memory-usage stats fields. commit 4c0a225 Author: Aditya Gurajada <[email protected]> Date: Sun Dec 10 11:49:54 2023 -0800 shmem.c: Rename variables relating to large-fragment handling. Upcoming PR #569 is bringing-in support for handling small fragments. This commit renames existing variables, field names and a few function names that deal with large-fragment support to consistently use 'large' in the name. This clears the way in the namespace for code changes coming from small-fragment changes. Some examples: - struct shm_frag_info -> struct shm_large_frag_info - E.g., shm_frag_addr -> frag_addr, shm_frag_size -> frag_size ... - shm_frag_info shm_mem_frags[] -> shm_large_frag_info shm_large_frags[] - shm_num_frags_tracked -> shm_nlarge_frags_tracked - platform_shm_find_free() -> platform_shm_find_large() NOTE: No other code-/logic-changes are done with this commit. commit 634df21 Author: Rob Johnson <[email protected]> Date: Wed Jan 24 04:24:36 2024 -0800 Robj/fallocate (#601) catch fallocate failure commit ef74125 Author: Aditya Gurajada <[email protected]> Date: Sat Dec 9 22:30:15 2023 -0800 (#604) Cache page-size in local variables to reduce multiple lookups. In debug-build test runs, profiling shows interfaces like clockcache_page_size(), clockcache_config_page_size() bubbling up to the top of 'perf top' output. This commit replaces multiple calls to lookup functions that retrieve the page-size by caching the page-size once per function where it's used multiple times. The affected interfaces are: btree_page_size(), clockcache_page_size(), cache_page_size(), cache_config_page_size(), trunk_page_size() and a few similar ones. These changes add up to saving few seconds of test-execution (out of few mins of run-time) in debug-build mode, esp for BTree-related tests. commit 3e259ca Author: Aditya Gurajada <[email protected]> Date: Sun Dec 10 07:58:21 2023 -0800 test.sh: Enable running by named-function w/o shared memory. This commit adds minor improvements / bug-fixes to test.sh: - The capability to run individual test-function(s) by name was not working without the "--use-shmem" flag. Rework the parameter parsing in all test-functions so that we can now invoke this driver script as: $ INCLUDE_SLOW_TESTS=true ./test.sh run_slower_unit_tests This will run individually named test-functions with default memory configuration. - run_other_driver_tests() had a bug where tests run by this function were not honoring '--use-shmem' arg. Fix this so that cache_test, log_test, filter_test can now also be run with "--use-shmem" enabled. - Introduce Use_shmem global to parse-out --use-shmem arg. Rework minion test-functions to drive off of global variable. - Update elapsed-time tracking to separately track the test execution run-times w/o and w/ shared memory configured. commit c340a0e Author: Aditya Gurajada <[email protected]> Date: Sun Dec 10 06:43:01 2023 -0800 Rename large_inserts_bugs_stress_test.c -> large_inserts_stress_test.c Upcoming PR #569 is overhauling large-inserts stress test. To simplify examining the diffs of this test case as part of that review, this commit is renaming the test file to large_inserts_stress_test.c . Make appropriate changes to the build Makefile and test files, to pickup new file. commit ae4636f Author: Aditya Gurajada <[email protected]> Date: Thu Dec 7 15:08:05 2023 -0800 CI: Bump timeout from 3h to 4h. shmem-tests cause debug test runs to take longer. After addition of new large_inserts_stress_test, being done as part of PR #569 (free/memory mgmt support for shared memory), CI-debug jobs are timing out at current timeout=3h. Bump this timeout limit to 4h, to see if test-jobs complete. commit c169d5e Author: Gabe Rosenhouse <[email protected]> Date: Mon Nov 20 15:51:24 2023 -0800 Another fix for an old CI problem (#602) This field is documented as optional and defaulting to "latest" [0] but for some reason, setting it explicitly seems to matter [1]. [0]: https://github.com/concourse/registry-image-resource#source-configuration [1]: https://vmware.slack.com/archives/CEUC18KQA/p1689754476227109?thread_ts=1689753436.440569&cid=CEUC18KQA commit c64f68d Author: Aditya Gurajada <[email protected]> Date: Mon Oct 10 15:22:07 2022 -0700 Support for multi-process execution, with processes using shared memory This commit extends core shared memory support to now allow for a multi-process execution model, where multiple processes can now attach to Splinter shared memory. Core thread-specific concurrency primitives are modified, slightly, to now also support a multi-process execution model. - This commit sets up the stage to support fork()'ed or other OS-processes running with --use-shmem option, where each process will [in future] masquerade as a Splinter thread. A core change needed to move to that execution model is to support thread-specific IO-context structures. Otherwise, if an/other OS-process tries to do IO using AIO-context established by the main thread (i.e. by the process that started up SplinterDB), we will immediately run into hard IO-system call errors. This commit: - Performs an io_setup() for each thread / process - Manages this AIO-context tightly bound to Splinter's thread context - Does required book-keeping to keep this IO-context state kosher in conjunction with thread registration / de-registration. - Updates existing io_apis_test to deal with thread-specific IO context handles. An alternative could be localize this change-in-behaviour (of setting up thread-specific IO-context structs) only when the process-model of execution comes around. That execution model requires configuring SplinterDB with shared-memory support. But, just by looking at --use-shmem (or corresponding config setting), we cannot be sure that the process-model will be used or if we are just re-running rest of the test suites with shared-segment enabled. So, without trying to further complicate this choice-making, with this commit we will always set up thread-specific AIO-context structures. Collection of lower-level changes to move to this execution model: - platform_buffer_init() that mmap()s' memory for the buffer cache will now use MAP_SHARED (v/s MAP_PRIVATE). The issue is that some parts of structures, e.g. buffer cache, are allocated using mmap(). The flags for this were MAP_PRIVATE, which means this memory is only accessible to the main process that set up Splinter. All child threads work on a COW-version of this mapped memory. So the changes done by the child process to the BTree in the buffer cache are not visible to the parent process. - Convert synchronization primitives to be shared across processes. This commit reworks core synchronization APIs to use interfaces that allow the sync-hook across child processes. This affects: - platform_mutex_init() - platform_spinlock_init() - platform_condvar_init() - platform_semaphore_init() - Add corresponding API-exerciser unit-tests for sanity coverage - Now that we have thread-specific IO-context setup, as part of thread register / deregister, we now also do io_register_thread(), io_deregister_thread(). This is basically book-keeping state of the thread w.r.t IO setup & context. Testing changes added: - Support --fork-child to test execution options. Some new tests will honor this argument, and will exercise activity using a forked-process execution model. - New test splinterdb_forked_child_test added: This covers the cases to show that IO errors could be repro'ed when running Splinter activity from a forked child process. Many other cases are added to this framework to exercise different cases of forked process doing SplinterDB activity. Much code/dev stabilization was achieved through this single new test. - Add case test_seq_key_seq_values_inserts_forked to large_inserts_stress test. - Existing functional io_apis_test to run with --fork-child option, thereby creating the scenario(s) of forked processes exercising the basic IO APIs. - Add new & extended tests to test.sh, for extended coverage using shared-memory and multi-process execution. - Add support for --wait-for-gdb and wait_for_gdb_hook() function. To debug forked child processes, add support for new command-line flag: --wait-for-gdb . And add a looping function where we can set a breakpoint, wait_for_gdb_hook(). Use this facility in splinterdb_forked_child_test.c, which has helped debug errors seen while running test_multiple_forked_process_doing_IOs(). Changes arising from review comments: Mostly cleanup: - splinterdb.c: Redefine testing-accessor methods to return correct <data type> *, rather than void * - platform.c: Clean-up error handling in platform_condvar_init(), using goto labels. Add missing pthread_mutex_destroy() in one case. - platform_shmcreate() will now return heap-ID as start address of allocated shared segment. Adjust platform_heap_id_to_shmaddr() appropriately. - rename tests/splinterdb_test_apis.h -> src/splinterdb_tests_private.h - Rename test config 'num-forked-processes' -> 'num-processes'. Adjust tests accordingly. commit 42799b1 Author: Aditya Gurajada <[email protected]> Date: Mon Sep 26 09:45:25 2022 -0700 Core changes to support running Splinter with allocated shared memory. Support to run SplinterDB with shared memory configured for most memory allocation is an -EXPERIMENTAL- feature added with this commit. This commit brings in basic support to create a shared memory segment and to redirect all memory allocation primitives to shared memory. Currently, we only support a simplistic memory mgmt; i.e. only-allocs, and a very simplistic handling of free() of the very last memory piece allocated. With shared segments of 1-2 GiB we can run all functional and unit tests. The high-points of the changes are: - External configuration: splinterdb_config{} gains a few new visible fields to configure and troubleshoot shared memory configuration. - Boolean: use_shmem: Default is OFF - size_t : shmem_size: - The main driving change is the re-deployment of platform_heap_id 'hid' arg that appears in all memory-related interfaces. If Splinter is configured for shared memory use, 'hid' will be an opaque handle to the shared segment. Most memory allocation will be redirected to new shmem-based alloc() / free() interfaces. - Formalize usages of PROCESS_PRIVATE_HEAP_ID: A small number of clients that wish to repeatedly allocate large chunks of memory tend to cause OOMs. The memory allocated by these clients is not shared across threads / processes. For such usages, introduce PROCESS_PRIVATE_HEAP_ID as an alias to NULL, defaulting to allocating memory from the heap. - Manage handling of heap-ID to platform_get_heap_id() to correctly return the handle to shared memory. (Otherwise, it would return NULL by default.) - BTree pack allocates large fingerprint-array. This also causes large tests to run into OOMs. For threaded execution, it's ok if the memory for this array is allocated from the heap. But for multi-process execution, when one process (thread) allocates this finger print array, another thread may pick up the task to compact a bundle and will try to free this memory. So, this memory has to come from shared memory. To cope with such repeated allocations of large chunks of memory to build fingerprint, a small scheme for recycling such "free"-large-memory chunks scheme is supported by shmem module. Applied this technique to recycle memory allocated for iterators also. They tend to be big'gish, so can also cause shmem-OOMs. - All existing functional and unit-tests have been enhanced to now support "--use-shmem" argument. This will create Splinter with shared memory configured, and tests are run in this mode. This change brings-in quite a good coverage of existing testing for this new feature. - New test: large_inserts_bugs_stress_test -- added to cover the primary use-case of concurrent insert performance benchmarking (that this feature is driving in prior integration effort). - test.sh enhanced to run different classes of test with the "--use-shmem" option. - Diagnostis & Troubleshooting: - Shmem-based alloc/free interfaces extended to print name of object and other call-site info, to better pinpoint source code-flow leading to memory issues. - Add shared memory usage metrics, including for large-fragment handling. Report summary-line of metrics when Splinter is shutdown. Print stats on close. - Add various utility diagnostic helper methods to validate that addresses within shared memory are valid. Unit-tests and some asserts use these. - minor #include cleanups Changes arising through review cycle and stabilization v/s /main: - In test.sh/run_slower_unit_tests(), re-enable execution of large_inserts_bugs_stress_test, but bracketted under "set +e" / "set -e" settings. If this test fails in CI (as it does randomly), hopefully, this SET toggling will allow the rest of the script to still run. CI job should not fail immediately. (Some deeper stabilization is needed for these test cases.) - Purged the heap_handle * in shmem.h/.c module and through the rest of the Splinter code. Only heap-ID is a valid handle anymore. - Fix race condition bug in platform_shm_alloc() - Added Micro-optimization to recycle last-allocated frag being freed. - Add config_parse_use_shmem() as parsing interface to see if "--use-shmem" was supplied. Apply to many unit-/functional-tests. Rework unit-tests to use config_parse_use_shmem() to support --use-shmem parsing. Re-enable large_inserts_bugs_stress_test execution. commit 2fb4d7c Author: Deukyeon Hwang <[email protected]> Date: Tue Aug 15 17:19:03 2023 -0700 Fix the compile error on platform_open_log_file() (#596) commit 4679bb7 Author: Rob Johnson <[email protected]> Date: Tue Aug 15 16:15:54 2023 -0700 remove btree rough count stuff, since it is unused (#594) commit 3bf7023 Author: Rob Johnson <[email protected]> Date: Wed Jul 26 18:53:26 2023 -0700 Bidirectional Iterators (#588) * tweak iterator api to make it easier to add bidirectionality * debugging btree reverse iteration * reduce time we hold locks during btree_split_child_leaf * further refine the locking in btree node splits and fix reverse iterator bug * btree iterator init at key other than min and add btree_iterator_seek * splinterdb_iterator_prev implmentated and working * clang formatting * improve the trunk iterator logic * corrections for pull request * more pull request fixes * assert fix * more pull request feedback * iterator stress test, bug fixes, formatting * final bit of pr feedback * formatting --------- Co-authored-by: Evan West <[email protected]> commit 950df20 Author: Rob Johnson <[email protected]> Date: Tue Jul 25 16:27:33 2023 -0700 allow merge callbacks to be NULL (#577) In that case, splinterdb_update() is not supported. commit a7547cd Author: deukyeon <[email protected]> Date: Tue Jul 25 15:15:05 2023 -0700 (#580) Cleanup some bool stuff (#584) * Add the header file for _Bool * converting bool to bool32 * move stdbool include --------- Co-authored-by: Rob Johnson <[email protected]> commit a3e9469 Author: deukyeon <[email protected]> Date: Mon Jul 24 18:02:50 2023 -0700 Set the addresses of log for super block if it is. (#582) Co-authored-by: Alex Conway <[email protected]> commit 3cec342 Author: Gabe Rosenhouse <[email protected]> Date: Mon Jul 24 17:12:20 2023 -0700 CI fix: use new version of registry-image resource (#593) commit d2e8369 Author: Evan West <[email protected]> Date: Thu Jul 13 20:22:03 2023 +0000 fix formatting in trunk.c commit fad27b5 Author: Evan West <[email protected]> Date: Fri Jul 7 18:38:55 2023 +0000 remove local_max_key and fix filter assertion commit b6dafdf Author: Gabe Rosenhouse <[email protected]> Date: Mon Jun 26 11:13:59 2023 -0700 CI: switch PR resource to maintained one (#587) commit 6a2348c Author: Rob Johnson <[email protected]> Date: Sun Apr 30 13:04:41 2023 -0700 Robj/memtable race fix (#574) * Memtable Generation Bugfix Fixes a bug where memtable_maybe_rotate_and_get_insert_lock would speculatively increment the memtable generation even when the next memtable was not yet ready. This would cause concurrent lookup threads to attempt to access that memtable, resulting in errors. This fix requires the insert threads to wait until the next memtable is ready before finalizing the current one. * abstract memtable and trunk root-addr locking apis --------- Co-authored-by: Alex Conway <[email protected]> commit 8c639a0 Author: Rob Johnson <[email protected]> Date: Mon Apr 24 21:47:01 2023 -0700 fix next_req node-split bug in trunk (#575) commit 1e8f790 Author: deukyeon <[email protected]> Date: Mon Apr 24 19:36:07 2023 -0700 (#546) Fix the segmentation fault after splinterdb_stats_reset() (#547) Previously, when trunk_stats_reset() was called by splinterdb_stats_reset(), the entire statistics of a trunk, including the histogram handles, were reset to zero. commit fa990cf Author: Alex Conway <[email protected]> Date: Tue Dec 20 21:19:00 2022 +0000 Copy-on-Write Trunk This changeset implements copy-on-write for trunk nodes, which includes several high-level changes. This PR still needs to be rebased onto main, but the the purpose is to discuss high- and low-level design decisions. Changes in this PR: Trunk root lock. A distributed RW lock is used to access/change the current root. Flush from root. Flushes proceed from the root and cascade immediately rather than being triggered at the beginning of trunk_compact_bundle. Copy-on-write. Trunk nodes cannot be modified directly, and instead are change via a copy-on-write of the root-to-node path together with a change of the root node. Garbage Collection for unlinked branches and filters. After a copy-on-write, the nodes on the old path will be unreferenced. This PR does not GC the trunk nodes themselves, but it includes a GC path to dereference the replaced branches and filters. platform_batch_rwlock. Replaces distributed locks using dummy cache pages with a batched distributed RW lock implementation in platform.[ch]. commit b5a283b Author: Gabe Rosenhouse <[email protected]> Date: Sat Apr 22 20:10:34 2023 -0700 Update CONTRIBUTING.md to describe ok-to-test PR label (#573) commit d991bac Author: Gabe Rosenhouse <[email protected]> Date: Thu Apr 20 14:25:56 2023 -0700 CI requires an "ok-to-test" label before running PRs (#572) commit 77ab353 Author: gapisback <[email protected]> Date: Thu Apr 20 14:12:23 2023 -0700 CI: Bump timeout from 2h to 3h. shmem-tests cause debug test runs to take longer. (#566) In-flight stabilization of shared memory support in Splinter is bringin along with tons more additional tests. We are effetively running most of the existing twice; once w/o and once w/ shared memory configured. Debug-build test runs are timing out at 2 hours. Bump timeout to 3hs, and once stabilized, we can look into dropping this to 2h. commit b2245ac Author: Aditya Gurajada <[email protected]> Date: Wed Apr 5 16:29:13 2023 -0700 Fix bug in output formatted by size_to_str() helper. Fractional portion of value formatted by size_to_str() was incorrect. We were losing the scale for things which were supposed to be "xx.07", we were reporting "xx.7", which is incorrect. commit 8a04854 Author: Aditya Gurajada <[email protected]> Date: Mon Mar 20 15:26:29 2023 -0700 (#548) Use _Bool for boolean fields in external config struct. In SplinterDB's public splinterdb_config{} config, we have few fields defined as 'bool' which is typedef'ed to int32 on our side. This creates compatibility problems when linking this library with other s/w which may have defined 'bool' as 1-byte field. (Offsets of fields in the splinterdb_config{} struct following 1st field defined as 'bool' changes across dot-oh's.) This commit slightly adjusts the typedefs of boolean fields in external structs to now use _Bool. This should reduce the risk of such incompatibilities. Change return type of methods in public_util.h to _Bool . Relocate typedef int32 bool to private platform_linux/platform.h so it's used only on Splinter-side. Cleaned-up few instances around use of bool type for code hygiene: - Minor adjustment to routing_filter_is_value_found() returning bool. - Stray references to use of 0/1 for boolean values with FALSE/TRUE. commit f3c92ef Author: Aditya Gurajada <[email protected]> Date: Tue Apr 4 15:51:05 2023 -0700 (#561) Fix bug in routing_filter_prefetch(), causing assertion to trip. This commit fixes a simple arithmetic error in routing_filter_prefetch() while computing next page's address. The bug results in a debug-assert in clockcache_get_internal(), or an unending hang in clockcache_get() code-flow using release binary. A new test case test_issue_458_mini_destroy_unused_debug_assert has been added which reproduces the problem. However, this case still runs into another failure (being tracked separately), so this case is currently being skipped. commit 89f09b3 Author: Gabriel Rosenhouse <[email protected]> Date: Wed Mar 22 20:22:05 2023 -0700 CI: use gcc for ASAN jobs commit 9037ebe Author: Aditya Gurajada <[email protected]> Date: Wed Mar 22 10:54:35 2023 -0700 (#554) Fixes to get couple of tests running cleanly in ASAN-builds This commit fixes minor errors in 2 tests (io_apis_test, filter_test) to get them running cleanly in ASAN-builds. commit ea7203a Author: Aditya Gurajada <[email protected]> Date: Wed Mar 22 14:25:29 2023 -0700 (#554) Enhance test.sh to run a sub-set of tests named by their driving function. This commit now allows running as "test.sh <fn-name>" interface, where the name of the driving function executing a batch of tests can be run independently, without having to go through full execution of all tests. This helps developers shorten their fix-dev-test cycle, especially when validating quick-fixes for long-running tests, like ASAN / MSAN builds. commit a5c821c Author: Gabriel Rosenhouse <[email protected]> Date: Tue Mar 21 11:00:01 2023 -0700 CI: temporarily cover the shmem branch revert this once it merges commit 5dd7535 Author: Gabriel Rosenhouse <[email protected]> Date: Tue Mar 21 10:54:58 2023 -0700 CI: run msan and asan tests on all PRs commit 98f5ca1 Author: Gabriel Rosenhouse <[email protected]> Date: Tue Mar 21 10:59:29 2023 -0700 CI: fixup for multi-branch work commit 6be0461 Author: Gabriel Rosenhouse <[email protected]> Date: Mon Mar 20 15:16:00 2023 -0700 CI: refactor config to enable coverage of multiple branches commit d9fcc40 Author: Aditya Gurajada <[email protected]> Date: Mon Jan 9 17:55:52 2023 -0800 Identify Memtable v/s Branch page types via BTree-print routines. This commit extends BTree-print routines to also report the page type, whether it's a branch or a memtable BTree. As the structures and print methods are shared between two objects, this extra information will help in diagnostics. Trunk nodes are likewise identified. Extend btree_print_tree() to receive page_type arg. Minor fix in trunk_print_pivots() to align outputs for pivot key's string. commit 2ea30f4 Author: Aditya Gurajada <[email protected]> Date: Fri Jan 6 12:54:11 2023 -0800 (#500) Move hook-related global vars to task_system{} struct. This commit removes the dependency of task system structures on global variables declared in task.c . The hook-related variables are now moved as members of the task_system{} struct. This removes accessing potentially stale values when task-system is destroyed and re-created. Also, TASK_MAX_HOOKS is now decreased from 8 to 4. This change largely has no functional impact, and is mostly a test-stabilization fix. commit 4b6e0b1 Author: Aditya Gurajada <[email protected]> Date: Mon Feb 6 17:18:13 2023 -0800 Add initial support for message logging levels, used in unit-tests This commit does some clean-up and normalizes the behaviour of interfaces to control outputs from C-unit tests. The main goal is to reduce voluminous output generated by few unit-tests that exercise print diagnostic code (which otherwise crashes browsers when viewing test-run outputs in CI). An additional benefit of this rework is that we now have a way to run unit-tests to see output generated at different verbosity levels. - By default, unit test execution remains silent and only error messags will be printed. ctests' main() takes care of setting this up. - set_log_streams_for_tests() becomes the single-interface that unit test code has to invoke, when needed to change the test output's verbosity level. - Small collection of MSG_LEVEL_ levels added to ctest.h Test execution examples: Run with env-var to see diff outputs: VERBOSE=0 (or unset env-var): Default; silent output VERBOSE=3 : See error messages VERBOSE=6 : See info and error messages VERBOSE=7 : See all messages; mainly intended to collect debug output commit 1f09113 Author: gapisback <[email protected]> Date: Thu Jan 19 13:42:32 2023 -0800 Fix-up indentation of multi-line comments to conform to coding standards. (#535) This commit fixes up several comments in btree.c to conform to the style we have followed elsewhere for multi-line comments. No code logic changes are done with this commit. commit caeaeb1 Author: Aditya Gurajada <[email protected]> Date: Fri Dec 23 10:26:27 2022 -0800 (#513) Add set_log_streams_for_*() fns to manage unit-test outputs. This commit refactors existing chunks of code that exists in different unit- test sources to manage output file handles to a common function. This is defined in new file unit/unit_tests_common.c ; set_log_streams_for_tests() Tests that check error-raising behaviour will now need to call set_log_streams_for_error_tests(), to manage output streams. Minor correction to TEST_DB_NAME; change it to conform to the r.e. defined in .gitignore to suppress listing this in 'git status' output. commit 7e85a29 Author: Aditya Gurajada <[email protected]> Date: Thu Dec 22 10:44:04 2022 -0800 (#511) Add fns to print 'size' as human-readable string w/ unit-specifiers This commit adds couple of utility functions to snprintf(), in an output buffer, the 'size' unit to a human-readable string with unit-specifiers. - size_to_str() - Convert 'size' to a string in an output buffer - size_to_fmtstr() - Same as above, using user-specified format-string. Useful to generate output enclosed in, e.g., '(%s)'. - Add size_str(), size_fmstr() caller-macros to simplify calling these formatting functions. These macros declare on-stack buffers used to format the output string. size_str() provided by Rob Johnson, to greatly simplify the usage. Add utility bytes-to-Units conversion macros. Add unit tests to exercise these interfaces. Apply these utility fns in couple of stats-printing and BTree print-methods, to display size values as human-friendly unit specifiers. commit 84484df Author: Aditya Gurajada <[email protected]> Date: Fri Jan 6 10:47:16 2023 -0800 (#499) Minor cleanup of INVALID_TID, and MAX_THREADS in task-system This commit applies some minor clean-up to task system as a follow-on to the larger rework done under PR #497. Consistently use STATUS_BUSY, as a way to report when all concurrent threads are in-use. Minor changes are done to task system unit-test code & cleanup of comments. commit 60c7910 Author: Aditya Gurajada <[email protected]> Date: Wed Dec 21 16:32:08 2022 -0800 (#507) Rework of platform_buffer_create()/destroy() to init/deinit() interfaces. This commit reworks the buffer_handle{} interfaces to now become platform_buffer_init() and platform_buffer_deinit(). Structures that need a buffer_handle{}, declare a nested sub-struct, which will go through this init / deinit interface to allocate / free memory using existing mmap() interfaces. This removes the need for an input 'heap_id / heap_handle' arg to allocate and free memory. This change does not functionally change anything in these methods. Added small unit-test, platform_apis_test, to exercise these changes. Cleanup structures and fns that used to take 'heap_handle *' which now become unused with this rework. Tighten up backout / error handling in clockcache_init() and deinit() code-flow. Co-authored by Rob Johnson, who reworked the entire interfaces as implemented above, to remove the dependency on 'hid' argument.

rtjohnso added 5 commits November 29, 2022 14:26

tweak example running policy

916e6e6

Merge branch 'main' of github.com:vmware/splinterdb

f63b6a9

add a task_system_config and plumb it through the tests

8757baa

cleanup redundant code in task.c

6726618

allow using bg and fg threads at the same time

d3537e3

rtjohnso added the dont merge label Dec 4, 2022

vmwclabot added the cla-not-required label Dec 4, 2022

rtjohnso added 2 commits December 4, 2022 14:50

Merge remote-tracking branch 'origin/main' into robj/task_system_config

6703e56

format, etc

67908b1

rtjohnso commented Dec 6, 2022

View reviewed changes

src/task.h Outdated Show resolved Hide resolved

rtjohnso commented Dec 6, 2022

View reviewed changes

src/task.h Outdated Show resolved Hide resolved

rtjohnso commented Dec 6, 2022

View reviewed changes

src/task.c Outdated Show resolved Hide resolved

rtjohnso commented Dec 6, 2022

View reviewed changes

src/task.c Show resolved Hide resolved

rtjohnso commented Dec 6, 2022

View reviewed changes

src/task.c Outdated Show resolved Hide resolved

rtjohnso commented Dec 6, 2022

View reviewed changes

src/task.c Outdated Show resolved Hide resolved

rtjohnso commented Dec 6, 2022

View reviewed changes

src/task.c Outdated Show resolved Hide resolved

ajhconway approved these changes Dec 6, 2022

View reviewed changes

src/task.c Outdated Show resolved Hide resolved

src/task.c Outdated Show resolved Hide resolved

src/task.h Outdated Show resolved Hide resolved

tests/config.c Outdated Show resolved Hide resolved

rtjohnso commented Dec 6, 2022

View reviewed changes

src/task.c Outdated Show resolved Hide resolved

rtjohnso commented Dec 6, 2022

View reviewed changes

src/task.c Show resolved Hide resolved

rtjohnso commented Dec 6, 2022

View reviewed changes

Merge remote-tracking branch 'origin/main' into robj/task_system_config

35cf24a

gapisback mentioned this pull request Dec 7, 2022

Enable use of background threads in Splinter. Engage tests to verify basic stability of this existing feature. #469

Merged

rtjohnso added 5 commits December 6, 2022 17:46

respond to reviews

0626082

fix a few bugs

23f0db0

got merge to compile

5edb977

fix task shutdown race

0aae3bc

formatting

adf534b

This was referenced Dec 30, 2022

Potential use of uninitialized variable, thread_tid, in task_register_thread() in assert code. #499

Closed

Improve the handling of 'hook'-related globals in task.c: hook_init_done, num_hooks, hooks[]. #500

Closed