Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Comparison to compression VFS #9

Open
phiresky opened this issue Aug 1, 2022 · 4 comments
Open

Comparison to compression VFS #9

phiresky opened this issue Aug 1, 2022 · 4 comments

Comments

@phiresky
Copy link
Owner

phiresky commented Aug 1, 2022

Something like https://github.com/mlin/sqlite_zstd_vfs. Pretty different in the tradeoffs and design decisions, but still should be compared to.

@evbo
Copy link

evbo commented Sep 15, 2022

Any thoughts on SQLite's official CEROD extension? I know it's commercial but the source is public, so as long as you did an academic study I think that'd be legal?

If not, do you know anecdotally why that extension might not be a great choice for comparison? Does it have too much performance penalty? Is read-only requirement too restrictive?

@phiresky
Copy link
Owner Author

I didn't know CEROD was source-available. I don't see any obvious documentation about how the compression aspect of CEROD works, but I'll assume it's similar to sqlite_zstd_vfs. First of all, read-only sounds extremely restrictive to me.

The main potential disadvantage of the VFS approach is that all data is compressed in the same way. That means that even the often-accessed parts (e.g. B-Tree inner nodes) are compressed which might be bad for performance. It also means there's no way to compress different data differently. For example, with my extension you can compress one table with high compression level, another with low compression level (resulting in higher write performance), and another one not at all. You can also flexibly choose the method for deciding which data is compressed together or at all with an arbitrary SQL expression, which is advantageous for e.g. a hot-set vs archived-data situation.

Another advantage of sqlite-zstd is that it can be incrementally enabled without rewriting the rest of the database and without affecting the storage layer (how a database is opened / written to) at all.

The main disadvantage is that this extension is more complex than e.g. sqlite_zstd_vfs and not 100% transparent to the application, and that only data can be compressed, not the B-trees / indexes, which can reduce the effectiveness.

I'm pretty sure it's possible to construct both examples where my extension is faster and more storage efficient than VFS-based alternatives, as well as where it is worse. But it's hard to say in general which will be the case for a real-world application.

@evbo
Copy link

evbo commented Sep 22, 2022

In my case, it boiled down to VFS applied to a database of mostly ints (no blobs or big varchars).

If I had blobs or big varchars, I would use yours :)

@solerman
Copy link

solerman commented Nov 7, 2022

Give your original idea in the blog,there is some overlap with time-series db (or timeseries db extensions, there are a few for postgresql eg https://github.com/timescale/timescaledb)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants