Merge pull request #10 from databio/refactor

Release 0.0.2
databio · Nov 18, 2022 · f23ba8a
2 parents 8fb5119 + 731643b
commit f23ba8a
Showing 88 changed files with 4,933 additions and 2,184 deletions.
diff --git a/.github/workflows/black.yml b/.github/workflows/black.yml
@@ -0,0 +1,11 @@
+name: Lint
+
+on: [pull_request]
+
+jobs:
+  lint:
+    runs-on: ubuntu-latest
+    steps:
+      - uses: actions/checkout@v2
+      - uses: actions/setup-python@v2
+      - uses: psf/black@stable
diff --git a/.github/workflows/python-publish.yml b/.github/workflows/python-publish.yml
@@ -0,0 +1,31 @@
+# This workflows will upload a Python Package using Twine when a release is created
+# For more information see: https://help.github.com/en/actions/language-and-framework-guides/using-python-with-github-actions#publishing-to-package-registries
+
+name: Upload Python Package
+
+on:
+  release:
+    types: [created]
+
+jobs:
+  deploy:
+
+    runs-on: ubuntu-latest
+
+    steps:
+    - uses: actions/checkout@v2
+    - name: Set up Python
+      uses: actions/setup-python@v2
+      with:
+        python-version: '3.x'
+    - name: Install dependencies
+      run: |
+        python -m pip install --upgrade pip
+        pip install setuptools wheel twine
+    - name: Build and publish
+      env:
+        TWINE_USERNAME: ${{ secrets.PYPI_USERNAME }}
+        TWINE_PASSWORD: ${{ secrets.PYPI_PASSWORD }}
+      run: |
+        python setup.py sdist bdist_wheel
+        twine upload dist/*
diff --git a/.github/workflows/run-pytest.yml b/.github/workflows/run-pytest.yml
@@ -0,0 +1,35 @@
+name: Run pytests
+
+on:
+  push:
+    branches: [dev]
+  pull_request:
+    branches: [master, dev]
+
+jobs:
+  pytest:
+    runs-on: ${{ matrix.os }}
+    strategy:
+      matrix:
+        python-version: ["3.7", "3.10", "3.11"]
+        os: [ubuntu-latest]
+
+    steps:
+    - uses: actions/checkout@v2
+
+    - name: Set up Python ${{ matrix.python-version }}
+      uses: actions/setup-python@v2
+      with:
+        python-version: ${{ matrix.python-version }}
+
+    - name: Install dev dependencies
+      run: if [ -f requirements/requirements-dev.txt ]; then pip install -r requirements/requirements-dev.txt; fi
+
+    - name: Install test dependencies
+      run: if [ -f requirements/requirements-test.txt ]; then pip install -r requirements/requirements-test.txt; fi
+
+    - name: Install package
+      run: python -m pip install .
+
+    - name: Run pytest tests
+      run: pytest -x -vv
diff --git a/.gitignore b/.gitignore
@@ -0,0 +1,78 @@
+# ignore test results
+oldtests/test/*
+
+# toy/experimental files
+*.pkl
+
+# ignore eggs
+.eggs/
+
+# generic ignore list:
+*.lst
+
+# Compiled source 
+*.com
+*.class
+*.dll
+*.exe
+*.o
+*.so
+*.pyc
+
+# Packages 
+# it's better to unpack these files and commit the raw source
+# git has its own built in compression methods
+*.7z
+*.dmg
+*.gz
+*.iso
+*.jar
+*.rar
+*.tar
+*.zip
+
+# Logs and databases 
+*.log
+*.sql
+*.sqlite
+
+# OS generated files 
+.DS_Store
+.DS_Store?
+._*
+.Spotlight-V100
+.Trashes
+ehthumbs.db
+Thumbs.db
+
+# Gedit temporary files 
+*~
+
+# libreoffice lock files:
+.~lock*
+
+# Default-named test output
+microtest/
+open_pipelines/
+
+# IDE-specific items
+.idea/
+
+# pytest-related
+.cache/
+.coverage*
+.pytest_cache
+
+# Reserved files for comparison
+*RESERVE*
+
+doc/
+site/
+build/
+dist/
+markmeld.egg-info/
+__pycache__/
+
+
+*ipynb_checkpoints*
+hello_looper-master*
diff --git a/LICENSE.txt b/LICENSE.txt
@@ -1,4 +1,4 @@
-Copyright 2019 Nathan Sheffield
+Copyright 2022 Nathan Sheffield
 
 Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
 

diff --git a/README.md b/README.md
@@ -1,193 +1,18 @@
-# markmeld
+# <img src="docs/img/markmeld_logo_long.svg" alt="markmeld logo" height="70">
 
-`markmeld` is a command-line tool for integrating structured data from `yaml` or `markdown` files into `markdown` output using `jinja2` templates. The name `markmeld` refers to it as a *markup* *melder*. It makes it easy to restructure your structured data into different output formats. It's a companion to pandoc that allows you to merge and shape various data, from yaml or markdown documents, and output them into markdown format that can then (optionally) be piped to pandoc.
+Read the complete documentation at [markmeld.databio.org](https://markmeld.databio.org).
 
-![demo](markmeld_abstract.svg)
-
-
-## Install
+## Testing
 
 ```
-pip install https://github.com/databio/markmeld/archive/refs/heads/master.zip
+pytest
 ```
 
-Markmeld provides the `mm` executable:
+You can also just build the demos.
 
 ```
 cd demo
 mm default
 ```
 
-This will produce the output, automatically piping to pandoc. You can also get the raw output with `-p`, like this:
-
-```
-mm default -p > rendered.md
-```
-
-## Markmeld config file
-
-You produce a file called `_markmeld.yaml` to configure your project. In the file you specify any variables you want,  The `demo/_markmeld.yaml` looks like this:
-
-```
-targets:
-  default:
-    md_template: md_template.jinja
-latex_template: pandoc_default.tex
-output_file: "{today}_demo_output.pdf"
-data_yaml:
-  - some_data.yaml
-data_md:
-  some_text_data: some_text.md
-```
-
-The configurable attributes are:
-
-- `targets`: a list of targets (outputs) to build. Each target can contain the other configurable attributes.
-- `data_yaml` - a list of yaml files to make available to the templates
-- `data_md` - a named list of markdown files, which will be made available to the templates
-- `data_variables` - direct yaml data made available to the templates.
-- `data_md_globs` - Globs, where each file will be read, and available at the key of the filename.
-Any other attributes will be made available to the build system, but not to the jinja templates.
-
-In the demo, the only target you can build is `default`. You can see the list of targets with `mm -l`. 
-
-## md jinja template
-
-Your markdown items will be available under the key you specify in the config. If you are using the `_globs` key, then they will be available under the filename. You can then access them in the jinja template as variables, like this:
-
-```
-{{ variable.content }}
-```
-
-The `.content` attribute will have the actual markdown -- this is probably what you want. But if you want metadata, you can also access that under `{{ variable.metadata }}`.
-
-
-## The jinja md array
-
-See detailed instructions for how to access md content with variable names using the md array.
-
-## Hooks
-
-You can add a 'prebuild' hook, which runs a separate target them by adding:
-
-```
-prebuild: 
-  - manuscript_supplement
-  - manuscript
-postbuild:
-  - split
-```
-
-in `_markmeld.yaml`. This allows you to build another recipe before the current one. These recipes can be built-in recipes (which are in `mm_targets`), or can be recipes from your cfg file. I'm using built-in recipes to provide alternative commands, like building figures or splitting stuff. I guess I could make these command templates instead.
-
-## Imports
-
-It's super useful to define global config options, and then re-use them across projects. You can do this with `imports`.So I have a global config file, say `/_markmeld_config.yaml`:
-
-```yaml
-sciquill: /home/nsheff/code/sciquill/
-figczar: /home/nsheff/code/sciquill/pandoc_filters/figczar/figczar.lua
-highlighter: /home/nsheff/code/sciquill/pandoc_filters/change_marker/change_marker.lua
-multirefs: /home/nsheff/code/sciquill/pandoc_filters/multi-refs/multi-refs.lua
-csl: /home/nsheff/code/sciquill/csl/biomed-central.csl
-bibdb: /home/nsheff/code/papers/sheffield.bib
-```
-
-Now you use:
-```yaml
-imports:
-- /_markmeld_config.yaml
-```
-
-And now I can use `{figczar}` and `{bibdb}` in `command` section of a `_markmeld.yaml` file. If you want to be really cool, maybe point to this config file with `$MARKMELD` and then use:
-
-```yaml
-imports:
-- $MARKMELD
-```
-
-It works! Imports are in priority order, and lower priority than whatever you have in the local file, like `css`.  You can also define targets and import them.
-
-## Raw commands
-
-If in a command you use `type: raw`, then the command will run directly, and not pass the template render as stdin.
-
-## Commands without pandoc
-
-Usually, I want to run whatever my template is through pandoc, to produce the output. Markmeld first creates markdown using the jinja template, and then passes this to pandoc to convert to the final output.
-
-But sometimes, the output I make from the jinja template is *not* markdown, and that's my end product. For example, I may want to produce a `csv` file representation of some data I had in yaml format. Markmeld can also do this. In this case, you would just change the `command`, and don't use pandoc.
-
-```
-command: |
-  cat > {output_file}
-```
-
-Then, your jinja template would spit out a csv file. This command basically just writes that to an output file. You can use it to get the output from jinja directly.
-
-## Rationale
-
-Why is this better than just stringing stuff together using pandoc? Well, for one, the power of a jinja template is pretty nice... so I can just tell markmeld about all the data, which can be either markdown or yaml, and then using jinja I can restructure the output in whatever format I want. Furthermore, it allows me to intersperse yaml data in there. Without markmeld, I couldn't really find an easy way to integrate prose content (in markdown format) with structured content (in yaml format) into one output. This is useful for something like a CV/Biosketch, where I have some prose components, and then some lists, which I'd rather draw from a structured YAML file.
-
-For simple documents like a manuscript that don't really use much structured content and are purely gluing together prose, you can get by with just straight-up pandoc. You'd just pass multiple markdown files directly to pandoc on the command line. But even in these situations, you gain something from going the route of the jinja template with markmeld: it formalizes the linking of documents into a separate file, instead of relying the on order and content of CLI arguments to pandoc. So you can more easily write a little recipe saying, "provide these pieces of content under these names, and then use this jinja template to produce the output". So, it makes that recipe reproducible.
-
-## How to write mail-merge letters with markmeld
-
-1. Data
-
-You need a `data.yaml` file like this. This is a list of people you want to send the letter to:
-
-```
-people:
-  - first_name: Bob
-    last_name: Jones
-    email: [email protected]
-```
-
-2. Letter
-
-Write your letter in a jinja template like this `letter.jinja`:
-
-```
-{% for person in people %}
-
-<a href="mailto:{{ person.email }}?subject=SUBJECT&body=Hi {{person.first_name}},%0D%0A%0D%Letter contentt %0D%0A%0D%0AThanks, and we should catch up some time!%0D%0A%0D%0A-Nathan">{{ person.first_name }}</a>
-
-{% endfor %}
-```
-
-3. Markmeld config in `_markmeld.yaml`:
-
-Which is something like:
-
-```
-imports:
-  - $MMDIR/$HOSTNAME.yaml
-targets:
-  links:
-    md_template: letter_template.jinja
-    output_file: "{today}.html"
-    data_yaml:
-      - data.yaml
-    command: |
-      pandoc \
-        -o {output_file}
-```
-
-Now just `mm links`, open the file, and you have personalized click links for all your letters. Easy peasy!
-
-
-## Limitations and TODO
 
-- [x] tab completion
-- [x] Config file should be a positional argument
-- [x] some kind of list functionality to show available recipes to build? `mm -l`
-- [x] the latex template is configurable, but nothing else with pandoc. Really, should pandoc just be something you pipe `mm` output to?
-- [x] CLI: `mm meldsource.yaml target`
-- [x] use `_markmeld.yaml` by default, so you configure by putting a `_markmeld.yaml` file in root.
-- [ ] `mm` without a target lists the targets.
-- [ ] Currently, paths are relative to the working directory. Instead, paths should be relative to the directory of the yaml file. (this will only matter when I start trying to build stuff using external `_markmeld.yaml` files in other folders.)
-- [ ] Might need better error handling in case some sections aren't present in the config file. All sections are optional. This has not been thoroughly tested.
-- [ ] Currently, config files can import one another with `imports`. This way I can keep common targets in common config files. Would this be a useful application for PEP?
-- [ ] Right now, if you want to provide markmeld with `md` data, you can either specify them explicitly, in which case you can define an identifier by which you can refer to that file, like `my_identifier: path/some_file.md`, which can then be referenced in a template with `{{ my_identifer.content }}`. But if you use `data_md_globs`, then you just give it file globs, and the identifier is the filename. I could build an alternative metadata key, like `mm_id: my_identifier`, and if you use the glob approach, it could become available under that label. Why might this be useful? 1) For a mix/match where I want swap out one possible version of `my_identifier` with another, this way I can do that with different file names; 2) if using hedgedoc, I may not control the filename. So if it's a remote file... I guess I'd just have to make it explicit...
-- `markmeld_templates: [ ]` - a priority list of folders to search for a named template file (in `md_template`, which must exist as a file). (maybe?)
diff --git a/demo/2021-10-25_demo_output.pdf → demo/2022-09-23_demo_output.pdf b/demo/2021-10-25_demo_output.pdf → demo/2022-09-23_demo_output.pdf
diff --git a/demo/2022-10-02_demo_output.pdf b/demo/2022-10-02_demo_output.pdf
diff --git a/demo/_markmeld.yaml b/demo/_markmeld.yaml
@@ -1,6 +1,6 @@
 targets:
   default:
-    md_template: md_template.jinja
+    jinja_template: md_template.jinja
     recursive_render: false
 output_file: "{today}_demo_output.pdf"
 data_yaml:

diff --git a/demo/null.yaml b/demo/null.yaml
@@ -0,0 +1,11 @@
+version: 1
+targets:
+  default:
+    jinja_template: null
+    recursive_render: false
+output_file: "{today}_demo_output.pdf"
+data:
+  yaml_globs_unkeyed:
+    - some_data.yaml
+  md_files:
+    some_text_data: some_text.md
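
Note on the config key changes visible in this diff: `demo/_markmeld.yaml` renames `md_template` to `jinja_template`, and the new `demo/null.yaml` adds `version: 1` and groups inputs under a `data:` block (`yaml_globs_unkeyed`, `md_files`), where the removed README text documented top-level `data_yaml` and `data_md`. The following is a minimal sketch of that renaming, assuming `data_yaml` corresponds to `data.yaml_globs_unkeyed` and `data_md` to `data.md_files`. That correspondence is inferred from this diff alone, and `upgrade_config` is a hypothetical helper, not part of markmeld.

```python
import copy


def upgrade_config(old):
    """Return a copy of an old-style config dict using the key names seen in demo/null.yaml.

    Hypothetical helper: the old-to-new key mapping is an assumption
    inferred from the diff, not markmeld's own migration logic.
    """
    new = copy.deepcopy(old)  # leave the caller's dict untouched
    new.setdefault("version", 1)

    # md_template -> jinja_template inside each target (as in demo/_markmeld.yaml)
    for target in new.get("targets", {}).values():
        if isinstance(target, dict) and "md_template" in target:
            target["jinja_template"] = target.pop("md_template")

    # Top-level data_yaml / data_md -> nested data block (assumed mapping)
    data = new.setdefault("data", {})
    if "data_yaml" in new:
        data["yaml_globs_unkeyed"] = new.pop("data_yaml")
    if "data_md" in new:
        data["md_files"] = new.pop("data_md")
    if not data:
        del new["data"]  # nothing to migrate, so do not add an empty block
    return new


# Old-style config, shaped like the example in the removed README section
old_config = {
    "targets": {"default": {"md_template": "md_template.jinja"}},
    "output_file": "{today}_demo_output.pdf",
    "data_yaml": ["some_data.yaml"],
    "data_md": {"some_text_data": "some_text.md"},
}
new_config = upgrade_config(old_config)
```

The function works on plain dictionaries, so it can be applied to whatever a YAML loader returns before the result is written back to `_markmeld.yaml`.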