- This is a Java 21+ library and small application framework for processing directories of Markdown documents.
- It is especially well suited for Obsidian vaults and iA Writer libraries.
- It detects queries in the documents, runs them, and writes back the results.
- As an application, it monitors and processes directories in the background.
Okay, that probably doesn't tell you much.
This is a Java library and application framework that spins up a daemon. This daemon monitors one or more directories of Markdown documents, like Obsidian vaults or iA Writer libraries. Based on changes happening in the directories and their Markdown documents, it detects and runs queries embedded in the documents, generates Markdown output for these queries and embeds this output in the documents themselves.
Here's an example of what you can write in a Markdown document:
```
<!--query:list
folder: Articles
-->
THE OUTPUT WILL GO HERE
<!--/query-->
```
Put this snippet (without the code block) in a document in a directory tracked by this tool, save it, and watch `THE OUTPUT WILL GO HERE` be magically replaced with a sorted list of links to documents in the `Articles` subdirectory. Add a new article there, delete one, or update an existing one, and watch the list get updated instantly.
Or, I should say: after 3 seconds. This is the built-in delay for processing files after changes are detected. The delay is needed for Obsidian, which automatically saves files every few seconds while you're typing in them. If another change is detected within the 3-second delay, the pending processing run is cancelled and a new one is scheduled, again 3 seconds out. This little trick ensures the processing doesn't happen unnecessarily often.
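The scheduling trick described above is essentially a debounce. Here's a minimal sketch of the idea in plain Java; the `Debouncer` class and its method names are made up for illustration and are not this library's actual implementation:

```java
import java.util.concurrent.*;

// Hypothetical debounce sketch: every detected change (re)schedules a
// processing run after a delay, cancelling any run still pending.
public class Debouncer
{
    private final ScheduledExecutorService scheduler =
            Executors.newSingleThreadScheduledExecutor();
    private final Runnable task;
    private final long delayMillis;
    private ScheduledFuture<?> pending;

    public Debouncer(Runnable task, long delayMillis)
    {
        this.task = task;
        this.delayMillis = delayMillis;
    }

    // Called on every detected change; only the last call within the
    // delay window actually triggers the task.
    public synchronized void onChange()
    {
        if (pending != null)
        {
            pending.cancel(false);
        }
        pending = scheduler.schedule(task, delayMillis, TimeUnit.MILLISECONDS);
    }

    public void shutdown()
    {
        scheduler.shutdown();
    }
}
```

With a 3-second delay, a burst of saves from Obsidian collapses into a single processing run.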
The query syntax may seem a bit weird at first, but notice that it is built up of HTML comment tags. That means that the query definitions disappear from view when you preview the Markdown, or export it to some other format using the tool of your choice, leaving you with just the query output. In effect the query syntax is invisible to any tool that processes Markdown correctly.
What can a query do? Whatever you can code in Java! The internal API of this tool allows you to extract any kind of information from your documents or elsewhere and use it in queries.
Don't know which queries are available? Simply put a blank query in a document in a vault and save it:
```
<!--query-->
<!--/query-->
```
By default, this tool provides just a couple of built-in generic queries. See the section on those below for more details. These are useful queries, but to make this tool really shine, you will want to create your own queries.
To use this library, you have to code your own Java application, define this tool as a dependency, and implement your own curator, custom data models and custom queries. See further on for an example.
The markdown-demo-curator provides a simple example of a repository of notes and an application on top of it to monitor it for changes and process queries.
The `template-application` folder holds a minimal template for creating your own curator.
The music test suite provides more examples of what this tool can do and how it works. The test code contains a MusicCurator that can serve as an example for building your own curator, on top of your own vault.
Vincent's Markdown Curator (vmc) is my own, personal implementation that I use every day. It runs on top of 3 independent vaults - work, volunteering, personal - each with their own unique queries, and some shared across. You might find some inspiration in it.
If you're an Obsidian user, then note that most of the things this library does can also be achieved using plugins, like Dataview. I do not like those kinds of plugins. I believe they defeat Obsidian's purpose. For me Obsidian is all about storing knowledge in portable Markdown files. Sprinkling those same files with code (queries) that only Obsidian, with specific plugins installed, can understand is not the right idea, I think.
With this library I get the best of both worlds: portable Markdown and "dynamic" content. Query output is embedded in the documents as Markdown content. As far as Obsidian is concerned, this tool is not even there.
On the other hand, Obsidian plugins are much easier to install and use. This library requires you to get your hands dirty with Java. You must build your own application. That's not for everyone.
Shouldn't I have built this library as an Obsidian plugin itself? Maybe. Probably. But, I didn't. Why not? Because I'm sure my use of Markdown will outlive my use of Obsidian. Also, being able to change files in a vault with any editor and have this library still work in the background leads to fewer surprises.
Case in point: with the June 2022 release of iA Writer 6 and its support for wikilinks, mostly compatible with Obsidian's, there's now a truly native and focused macOS app for personal knowledge management. iA Writer has far fewer bells and whistles than Obsidian, but that's exactly why I happen to like it so much. And because I do not depend on Obsidian-specific plugins for my content, I can easily switch between them at will and even use them simultaneously.
This tool is specifically written for a variant of Markdown that I call Vincent Flavored Markdown. Basically VFM is the same as GitHub Flavored Markdown (GFM) with the following constraints and additions:
- A document can have YAML front matter, between `---` lines.
- For headers, only ATX (`#`) headers are supported, without the optional closing sequence of `#`s. Setext-style headers are not supported.
- Headers are always aligned to the left margin.
- Code blocks are always surrounded with backticks, not indented.
- Internal links - links to other documents in the same repository - use double square brackets (`[[Like this]]`). The link always points to a file name within the repository. (This is what Obsidian and iA Writer do.)
- File names are considered to be globally unique within the repository. Surprises might happen otherwise.
- The document's title is, in this order of preference:
    - The title of the first level 1 header, if present and at the top of the document.
    - The value of the YAML front matter field `title`, if present.
    - The file name, without extension.
- The file extension is `.md`.
- Queries can be defined in HTML comments, for this tool to process. See below.
In practice, I only use level 1 headers or the `title` property if the filename is not a good title. In 99% of the cases it is. I do not duplicate the filename inside the document, because, well, that's duplication.
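The title preference order above can be sketched in code. This is a hypothetical helper for illustration, not this library's API:

```java
import java.util.List;
import java.util.Map;

// Illustrative sketch of the VFM title resolution order:
// first level 1 header at the top, then front matter "title", then file name.
public class TitleResolver
{
    public static String resolveTitle(
            List<String> bodyLines, Map<String, String> frontMatter, String fileName)
    {
        // 1. The first level 1 header, if it's at the top of the document.
        if (!bodyLines.isEmpty() && bodyLines.get(0).startsWith("# "))
        {
            return bodyLines.get(0).substring(2).trim();
        }
        // 2. The value of the front matter field "title", if present.
        if (frontMatter.containsKey("title"))
        {
            return frontMatter.get("title");
        }
        // 3. The file name, without its extension.
        return fileName.replaceFirst("\\.md$", "");
    }
}
```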
If these limitations are not to your liking, then feel free to send me a pull request to support your own personal preferences.
This library/application does not fully parse Markdown. It only does so on a line-by-line level. Documents are broken up into blocks of:
- Front matter
- Sections (these can be nested)
- Code
- Queries
- Text
A text block is "anything not of the above". The content of a text block itself is not parsed. Whether text is in bold or italic, holds a list or a table, uses CriticMarkup or some other format: the internal parser is oblivious to it; it's all just text. When you build your own queries, it's up to you to extract content out of the various blocks, as you see fit.
I have some ideas to extend this further in order to make query construction easier, but I'm not planning on introducing a full Markdown parser.
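To make the line-by-line idea concrete, here is a deliberately simplified classifier. This is hypothetical code, not this library's internals; it ignores front matter and section nesting, and only shows how far you can get without a full Markdown parser:

```java
import java.util.ArrayList;
import java.util.List;

// Simplified line-level block detection, assuming the VFM conventions above:
// backtick-fenced code blocks and HTML-comment query blocks.
public class BlockSplitter
{
    public static List<String> classifyLines(List<String> lines)
    {
        var result = new ArrayList<String>();
        var state = "text";
        for (var line : lines)
        {
            switch (state)
            {
                case "code" -> {
                    result.add("code");
                    if (line.startsWith("```")) state = "text";
                }
                case "query" -> {
                    result.add("query");
                    if (line.startsWith("<!--/query-->")) state = "text";
                }
                default -> {
                    if (line.startsWith("```")) { result.add("code"); state = "code"; }
                    else if (line.startsWith("<!--query")) { result.add("query"); state = "query"; }
                    else if (line.startsWith("#")) result.add("section");
                    else result.add("text");
                }
            }
        }
        return result;
    }
}
```

Everything that isn't a fence, a query marker, or a header falls through to "text", mirroring the "anything not of the above" rule.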
Creating your own application means that you'll need to:
- Create a new Java artifact
- Create and publish a custom curator
- Create and register one or more queries
- Create your own custom data models
- Copy the `template-application` in this repository to a new directory.
- Update the `pom.xml` in your copy:
    - Set your own groupId and artifactId.
    - Make sure to use the latest version of dependencies and plugins.

A `mvn clean package` and `java -jar target/my-markdown-curator.jar` should result in the application starting up and exiting immediately, telling you that it can't find any curators.

Because I have not published this library to Maven Central yet, or anywhere else, you have to install this library in your local repository first. To do so: clone it and do an `mvn install`.
- Define a Dagger `@Module` to set up the context of your curator. In the module include at least the `nl.ulso.markdown_curator.CuratorModule`.
- Define a Dagger `@Component` that depends on your module and that exposes your `Curator` instance. Typically this is an interface with just one method.
- Compile your code with Maven or in your IDE. If all went well you'll end up with extra code generated by Dagger. If you named your component `MyComponent`, there will be a `DaggerMyComponent` now too.
- Implement the `CuratorFactory` interface. In its implementation use the Dagger-generated code to create and return your `Curator` instance.
- Add the `CuratorFactory` implementation to `src/main/resources/META-INF/services/nl.ulso.markdown_curator.CuratorFactory`.
See the markdown-curator-demo for an example.
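Sketched out, the steps above might look roughly like this. Everything here except `CuratorModule`, `Curator`, and `CuratorFactory` is a placeholder name, and the exact `CuratorFactory` interface may differ; treat this as an outline, not working code:

```java
@Module(includes = CuratorModule.class)
abstract class MyCuratorModule
{
    // Bind your own queries and data models here.
}

@Singleton
@Component(modules = MyCuratorModule.class)
interface MyComponent
{
    Curator curator();
}

// Registered in META-INF/services so the application can discover it.
public class MyCuratorFactory implements CuratorFactory
{
    @Override
    public Curator createCurator()
    {
        // DaggerMyComponent is generated by Dagger at compile time.
        return DaggerMyComponent.create().curator();
    }
}
```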
An `mvn package` and `java -jar target/my-markdown-curator.jar` should result in the application starting up and staying up, monitoring the directory you provided in your own custom curator.
Try changing any Markdown document in your document repository now. For example, add the `toc` query. Magic should happen!
```
<!--query:toc-->
<!--/query-->
```
- Implement the `Query` interface.
- Register the query in your own module, by binding the concrete instance into a `Query` set:
```java
@Binds @IntoSet abstract Query bindMyOwnCustomQuery(MyOwnCustomQuery query);
```
Rebooting your application should result in the availability of the new query.
Once you've implemented a couple of queries you might run into one or two issues:
- Duplication. Extracting specific values from documents might be complex, and the same values might be needed across queries.
- Heavy processing. Running many queries across large data sets on every change, no matter how small, can be CPU intensive.
To solve these issues you can create your own data models, which you can then build your queries upon.
To do so, implement the `DataModel` interface and register it in your curator module:
```java
@Binds @IntoSet abstract DataModel bindMyOwnCustomDataModel(MyOwnCustomDataModel dataModel);
```
Now you can use (inject) it in your own queries. Whenever a change is detected, the curator requests your data models to update themselves accordingly, through the `vaultChanged` method. It runs the queries afterwards.
The curator always runs all queries. It's not smart enough to detect that a query depends on a specific `DataModel` and that the `DataModel` might not have changed internally. This would require information that is not available at run-time, except through reflection, which this library is not using. A possible solution is to build a Dagger plugin that extracts the necessary information. That would be cool, but it would also be solving a problem that I currently do not have; see the FAQ.
IMPORTANT: make sure your data models are registered as `@Singleton`s!
By extending the `DataModelTemplate` class you get full refreshes basically for free, and an easy way to process events in a more granular fashion, if so desired: simply override the `process` methods of your choice and provide your own implementation.
As an example of a custom data model, see the built-in `Journal` model, which is used by a number of queries.
This section lists all readily available queries and explains what they do. Each query supports arguments to be passed to it. To learn what they are, use the `help` query somewhere in a monitored repository. For example:
```
<!--query:help
name: timeline
-->
<!--/query-->
```
This will write information on the selected query (in this case: `timeline`), including the parameters it supports, as the query output.
Built-in queries are always available and cannot be disabled.
This query generates a sorted list of documents in a folder. Each item is a link to a document. Through the configuration the list can be reverse sorted, and documents in subfolders can be recursively added as well.
This query generates a sorted table of pages, with optional front matter fields in additional columns. This is a more powerful version of the `list` query.
Through the configuration you can extract any front-matter field from the individual documents and add them to the table. Next to that you can (reverse) sort the table on any front-matter field.
This query generates a table of contents for the current document. You can tweak the table by configuring the minimum and maximum header levels to include.
By including the `LinksModule` in your Curator module, the queries in this section become available.
```java
@Module(includes = {CuratorModule.class, LinksModule.class})
abstract class MyCuratorModule
{
    // Your code here
}
```
This query lists all dead links in a document. In other words: all links that refer to documents that do not exist within the vault.
The Journal module supports Logseq-like daily outlines, and has a number of queries to slice and dice information from these outlines.
I like the way Logseq puts emphasis on the daily log as the place to write notes. What I do not like about Logseq, however:
- Everything is an outline. I prefer the freedom full Markdown gives me. When I write an article for example, I use sections and paragraphs, without bullets.
- It's all dynamic, and therefore the functionality only works in Logseq itself. I like to be able to use any text editor.
- All documents are stored in the same folder. I prefer using a couple of folders to categorize documents: "Projects", "Contacts", "Articles", and so on.
This module addresses that, by generating static timelines from the daily logs. The journal looks only at a specific section in the daily log, where it expects an outline. That means the daily log may contain other content as well.
For example, here is the template I currently use for my daily notes:
```
<!--query:dayNav-->
<!--/query-->

## π Activities

- *Put your outline here!*

## ποΈ On the agenda

<!--query:timeline-->
<!--/query-->

## π§ Retrospective

...
```
To enable the module in your Curator, you have to include it:
```java
@Module(includes = {CuratorModule.class, JournalModule.class})
abstract class MyCuratorModule
{
    @Provides
    static JournalSettings journalSettings()
    {
        return new JournalSettings(
            "Journal",    // Where daily journal pages are kept
            "Markers",    // Where marker descriptions are kept
            "Activities", // Name of the section with the outline
            "Projects"    // Where project notes are kept
        );
    }

    // Your code here
}
```
The `timeline` query generates a timeline on a certain topic; by default this is the page the query is added to. The timeline is sorted by date, newest first. Each entry for the selected topic contains the context from the daily journal, similar to how Logseq does it.
The `marked` query generates a selection of lines annotated with a specific marker, one section per marker, on a certain topic. By default, the topic is the page the query is added to. Lines in each section are ordered according to the timeline, oldest first. The markers themselves are removed from each line.
A marker is nothing more than a reference to a document. That can be any document. The document might not even exist; the functionality still works. Markers are useful to collect specific segments from the timeline and show them prominently in their own section, for example at the top of a document.
Markers only apply to a topic if they are exactly one level lower than the topic itself. This is so you can reuse markers for different topics, even when the topics are nested.
For example, let's say you have a timeline somewhere that looks like this:
```
- Important meeting on [[Topic 1]].
    - Meeting note 1
    - [[βοΈ]] Important meeting note 2
    - We also discussed [[Topic 2]].
        - [[βοΈ]] We shouldn't forget this!
```
If you now put the following query on the page of "Topic 1":
```
<!--query:marked markers: [βοΈ]-->
<!--/query-->
```
...this query will produce the output:
```
## βοΈ

- Important meeting note 2
```
It lists all lines marked with a reference to βοΈ and collects them in a single section. It doesn't show "We shouldn't forget this!", because the marker on that line applies to Topic 2.
You can change the header titles of the section for each marker in the query output by adding a `title` property to the marker document. This ensures consistency across the vault and simplifies the query definition.
In this example, if you were to define the page `βοΈ` as follows:
```
---
title: βοΈ Important!
---

(Here it's useful to explain what the marker is used for.)
```
...then the section title in all query outputs would include the text "βοΈ Important!".
If you want to group the entries for a marker by date in the output, then add the front matter variable `group-by-date` with a value of `true`.
Creating marker documents has more advantages than just being able to influence the query output:
- They're documents like any other, so things like backlinks "just work".
- They can define aliases in their front matter, natively supported by Obsidian.
- They prevent dead links in your vault.
- They allow you to explain what a marker is supposed to be used for, for future reference.
- ...they could even have their own timeline!
The `period` query generates a list of notes from a specific folder that were referenced in a certain period. For example:
```
<!--query:period
start: 2023-09-01
end: 2023-09-30
folder: 'Contacts'
-->
- [[Contact A]]
- [[Contact D]]
- [[Contact M]]
<!--/query-->
```
This example shows all contacts that were referenced from the daily log in September 2023, in a sorted list.
The `weekly` query is a specialization of the `period` query. It picks a specific week of a specific year. If your weekly notes are named `YYYY Week ww` (e.g. `2023 Week 42`) and you place this query in that note, then no configuration is necessary. It will list all projects referenced in that week.
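For illustration, computing a `YYYY Week ww` note name for a given date could look like this, assuming ISO week numbering (a hypothetical helper; the library may number weeks differently):

```java
import java.time.LocalDate;
import java.time.temporal.WeekFields;
import java.util.Locale;

// Illustrative sketch of the "YYYY Week ww" weekly note naming convention.
public class WeeklyNoteName
{
    public static String weeklyNoteName(LocalDate date)
    {
        var weekFields = WeekFields.ISO;
        int week = date.get(weekFields.weekOfWeekBasedYear());
        int year = date.get(weekFields.weekBasedYear());
        // The week-based year can differ from the calendar year around
        // New Year, which is why it's used here instead of getYear().
        return String.format(Locale.ROOT, "%d Week %02d", year, week);
    }
}
```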
Put this query at the top of the daily log pages.
The `dayNav` query generates a set of links to the previous and next daily entries in the journal, as well as to the weekly entry that the day's entry belongs to. It also prints the date as readable text, like "Sunday, October 1, 2023".
Put this query at the top of the weekly log pages.
The `weekNav` query generates a set of links to the previous and next weekly entries in the journal, as well as to each of the individual daily entries in the week. It also prints the week as readable text, like "2023, Week 39".
Solution: set the environment to use the right language, e.g. `LC_CTYPE=UTF-8`.
My personal experience:
- When run from the command line, changes to all files were detected.
- When run from within IntelliJ IDEA, changes to files with emojis in their name were not detected.
Adding the `LC_CTYPE` variable to the Run configuration environment fixed it. The command line already had it.
This shouldn't be needed, but when in doubt, it's easy: remove or change the hash at the bottom of the query definition.
The curator uses the hash to detect changes in query output; not the query output itself. It does this because query output is not actually stored in memory. That's why you see these hashes show up in the query output (as part of the closing HTML comment).
When you change the hash in any way, the curator will assume the content has changed, and will replace it with a fresh query result.
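The idea can be sketched as follows. This is hypothetical code; the curator's actual hash algorithm and encoding may differ:

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.HexFormat;

// Illustrative sketch of hash-based change detection: instead of keeping
// query output in memory, store a hash of it and compare against the hash
// of freshly generated output.
public class OutputHash
{
    public static String hashOf(String queryOutput)
    {
        try
        {
            var digest = MessageDigest.getInstance("SHA-256");
            var bytes = digest.digest(queryOutput.getBytes(StandardCharsets.UTF_8));
            return HexFormat.of().formatHex(bytes);
        }
        catch (NoSuchAlgorithmException e)
        {
            throw new IllegalStateException(e);
        }
    }

    // The output needs rewriting only if the stored hash no longer matches.
    public static boolean needsRewrite(String storedHash, String freshOutput)
    {
        return !hashOf(freshOutput).equals(storedHash);
    }
}
```

This also explains why tampering with the stored hash forces a refresh: any stored value that doesn't match the fresh output's hash triggers a rewrite.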
Yes, you can! By providing argument `-1` or `--once` at start-up, the curator will do its work once and then quit.
I've gone out of my way to limit both the amount of memory and CPU it uses. This is a Java application, so there's that: there's a minimum memory footprint to begin with. But on top of that I aim to use as little as possible.
However, this application reads all Markdown files into memory, and keeps them there. (Without the content of the query blocks; those are only kept in memory during a processing run.) I made this choice explicitly; see ADR00003. This means that the amount of memory used grows with the number of documents in a repository. Since those documents are normally written by hand, I figured it would take about forever to use more memory than is available in modern hardware. And by that time the amount of memory will likely have doubled at least.
When changes are detected the application kicks in by first refreshing all data models, then running all queries embedded in the content, and finally writing to disk only those files that have changed. The queries run in parallel, using Java virtual threads.
This is all nice and good, but what does it actually mean?
Well, I use vincents-markdown-curator (vmc) all day, every day. It sits on top of 4 repositories, with around 5000 Markdown documents in them. My work repository is the largest by far, with over 3500 documents. In those documents there are over 3600 embedded queries. And it's growing, because I create at least one document for every working day to hold the daily log, and every daily log embeds at least 2 queries.
I've limited vmc to use at most 128 MB of memory. When it's not doing anything it currently uses a little over 65 MB, according to jconsole. When it's running queries in my work repository this spikes to about twice that, but then quickly comes down again.
CPU-wise I don't get to see it use more than 5% of my M1 Pro and then only during processing. Otherwise it's using around 0.3%.
Running all 3600 queries takes less than 75 milliseconds.
All in all I basically don't notice that vmc is running all the time. It will be many years from now that I'll have so many documents with so many embedded queries in them that performance becomes an issue. So, that's a challenge for another day.