Customizing Markdown Rendering with Lua Filters
📋 Table of Contents
- 🌐 Introduction
- ⚙️ Understanding Quarto’s Rendering Process
- 🔧 Lua Filters: Intercepting the Conversion
- 🚀 Creating Your First Lua Filter
- 🎯 Advanced Filter Techniques
- 📦 Quarto Filter Extensions
- 🎓 Conclusion
- 📚 Resources
- 📎 Appendix A: The Bullet List Problem
- 🔀 Appendix B: Alternatives to Lua Filters
🌐 Introduction
Quarto is a powerful publishing system built on top of Pandoc, enabling the creation of technical documentation, websites, and presentations from Markdown files. While Quarto’s default rendering is robust, there are cases where you need fine-grained control over how Markdown is converted to HTML. This is where Lua filters become invaluable.
This article explains:
- How Quarto’s rendering pipeline processes Markdown
- How Lua filters intercept and modify the Abstract Syntax Tree (AST)
- Practical techniques for fixing HTML generation issues
- A real-world case study: solving bullet list rendering problems
⚙️ Understanding Quarto’s Rendering Process
The Rendering Pipeline
Quarto’s rendering process follows this flow:
graph LR
A[Markdown Source] --> B[Quarto Preprocessor]
B --> C[Pandoc Parser]
C --> D[Abstract Syntax Tree]
D --> E[Lua Filters]
E --> F[Pandoc Writer]
F --> G[HTML Output]
Key stages:
- Quarto Preprocessor: Processes YAML metadata, code chunks, and Quarto-specific syntax
- Pandoc Parser: Converts Markdown to an Abstract Syntax Tree (AST)
- AST: Internal representation of document structure as a tree of elements
- Lua Filters: Custom code that modifies the AST before output generation
- Pandoc Writer: Converts AST to target format (HTML, PDF, etc.)
The Abstract Syntax Tree (AST)
The AST represents your document as nested elements. For example:
**Bold text:**
- Item 1
- Item 2
Without a blank line, Pandoc parses this as a single Para (paragraph) block containing:
Para {
Strong { Str "Bold" Space Str "text:" }
SoftBreak
Str "-" Space Str "Item" Space Str "1"
SoftBreak
Str "-" Space Str "Item" Space Str "2"
}
With a blank line, it becomes:
Para { Strong { Str "Bold" Space Str "text:" } }
BulletList {
{ Plain { Str "Item" Space Str "1" } }
{ Plain { Str "Item" Space Str "2" } }
}
Key insight: Pandoc uses blank lines to separate block-level elements. Without them, list markers are just text within a paragraph.
🔧 Lua Filters: Intercepting the Conversion
What Are Lua Filters?
Lua filters are scripts that traverse and modify Pandoc’s AST during rendering. They allow you to:
- Transform specific document elements
- Add custom HTML attributes
- Generate content dynamically
- Fix rendering issues that can’t be addressed with Markdown syntax alone
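To make "traverse and modify" concrete, here is a minimal sketch in plain Lua that does not use Pandoc at all. The node tables (with `t`, `text`, and `content` fields) and the `walk` function are assumptions made for illustration. They only imitate the shape of Pandoc's elements, and real filters get the traversal from Pandoc itself.

```lua
-- Illustrative only: plain Lua tables stand in for Pandoc elements.
-- Each node has a type tag `t`; containers keep children in `content`.
local function walk(node, handlers)
  -- Visit children first so a handler sees already-processed content.
  if node.content then
    for i, child in ipairs(node.content) do
      node.content[i] = walk(child, handlers)
    end
  end
  local handler = handlers[node.t]
  if handler then
    -- A handler may return a replacement node, or nil to keep the node.
    return handler(node) or node
  end
  return node
end

local doc = {
  t = "Para",
  content = {
    { t = "Str", text = "TODO" },
    { t = "Space" },
    { t = "Str", text = "later" },
  },
}

local result = walk(doc, {
  Str = function(node)
    if node.text == "TODO" then
      return { t = "Span", class = "todo-marker", content = { node } }
    end
  end,
})

print(result.content[1].t)
```

The handler table plays the role that the global `Para`, `Str`, or `Div` functions play in a real filter file: one function per element type, each free to return a replacement.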
Filter Registration
Filters are registered in _quarto.yml:
project:
  type: website
filters:
  - _extensions/my-filter/my-filter.lua
Or in document frontmatter:
---
title: "My Document"
filters:
- my-filter.lua
---
Filter Execution Order
Filters are applied in the order they appear in configuration. Each filter receives the AST from the previous filter’s output.
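The ordering rule can be sketched in plain Lua, independent of Pandoc. In this sketch each "filter" is simply a function from a value to a value; `apply_filters`, `normalize_links`, `mark_external`, and the `example.com` host are hypothetical names chosen for illustration.

```lua
-- Illustrative only: each "filter" is a function from document to document.
local function apply_filters(doc, filters)
  for _, filter in ipairs(filters) do
    doc = filter(doc)
  end
  return doc
end

-- Hypothetical filter 1: turn site-relative paths into absolute URLs.
local function normalize_links(links)
  local out = {}
  for i, url in ipairs(links) do
    -- Parentheses keep only the first return value of gsub.
    out[i] = (url:gsub("^/", "https://example.com/"))
  end
  return out
end

-- Hypothetical filter 2: flag links that point outside the site.
-- It relies on normalize_links having run first.
local function mark_external(links)
  local out = {}
  for i, url in ipairs(links) do
    local internal = url:find("^https://example%.com/") ~= nil
    out[i] = { url = url, external = not internal }
  end
  return out
end

local links = { "/docs/intro", "https://pandoc.org/" }
local result = apply_filters(links, { normalize_links, mark_external })

for _, link in ipairs(result) do
  print(string.format("%s external=%s", link.url, tostring(link.external)))
end
```

Because `mark_external` only recognizes absolute URLs, it gives the intended answer for `/docs/intro` only when `normalize_links` is listed before it.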
Pandoc Markdown Extensions
Pandoc supports numerous Markdown extensions that affect AST generation before filters run. Understanding these is crucial for filter development:
Common extensions that impact filtering:
- fenced_code_attributes - Adds attributes to code blocks (.class #id key=value)
- backtick_code_blocks - Enables ``` syntax for code blocks
- fenced_divs - Creates Div elements with ::: syntax
- bracketed_spans - Creates inline Span elements with [text]{.class}
- raw_html - Preserves HTML elements as RawBlock/RawInline
- implicit_figures - Converts standalone images to Figure elements
- pipe_tables - Enables | delimited tables
Checking enabled extensions:
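function Pandoc(doc)
-- Enabled extensions are exposed through the global PANDOC_READER_OPTIONS
local extensions = PANDOC_READER_OPTIONS.extensions
-- Join the names when the value is a list; otherwise fall back to tostring
local listing = type(extensions) == "table" and table.concat(extensions, ", ") or tostring(extensions)
io.stderr:write("Enabled extensions: " .. listing .. "\n")
return doc
end
Writing extension-aware filters: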
-- Handle both fenced divs and HTML divs
function Div(elem)
-- This catches both :::{.class} and <div class="class">
if elem.classes:includes("callout") then
-- Transform callout divs
elem.classes:insert("custom-callout")
end
return elem
end
function RawBlock(elem)
-- Handle raw HTML divs if fenced_divs disabled
if elem.format == "html" and elem.text:match('<div class="callout">') then
-- Parse and convert to Div element
return pandoc.Div({}, {class = "custom-callout"})
end
return elem
end
Best practices:
- Test filters with different extension sets
- Document which extensions your filter requires
- Provide fallback behavior for disabled extensions
- Use PANDOC_READER_OPTIONS to detect extension availability
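As a sketch of the last point, the helper below checks a list of extension names for one entry. It assumes that `PANDOC_READER_OPTIONS.extensions` is a sequence of extension-name strings, which may not hold for every Pandoc version. `has_extension` and the sample fallback list are hypothetical and exist only so the snippet also runs outside Pandoc.

```lua
-- Returns true if `name` appears in the list of enabled extensions.
local function has_extension(extensions, name)
  for _, ext in ipairs(extensions or {}) do
    if ext == name then
      return true
    end
  end
  return false
end

-- Inside Pandoc, PANDOC_READER_OPTIONS is a global. Outside Pandoc (for
-- example when running this file with plain `lua`), fall back to a sample
-- list so the helper can still be exercised.
local reader_options = PANDOC_READER_OPTIONS
  or { extensions = { "fenced_divs", "pipe_tables" } }

local fenced_divs_enabled = has_extension(reader_options.extensions, "fenced_divs")
if not fenced_divs_enabled then
  io.stderr:write("fenced_divs is disabled; falling back to raw HTML handling\n")
end
```

A filter could consult `fenced_divs_enabled` to decide whether its `RawBlock` fallback is needed at all.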
🚀 Creating Your First Lua Filter
Basic Filter Structure
A Lua filter defines functions that match Pandoc AST element types:
-- Simple filter to modify all paragraphs
function Para(elem)
-- Modify the paragraph element
return elem
end
-- Filter for headers
function Header(elem)
-- elem.level contains header level (1-6)
-- elem.content contains inline elements
return elem
end
-- Filter for links
function Link(elem)
-- elem.target contains URL
-- elem.content contains link text
return elem
end
Element Types
Common Pandoc AST element types:
Block elements:
- Para - Paragraph
- Header - Heading
- BulletList - Unordered list
- OrderedList - Numbered list
- CodeBlock - Code block
- Div - Generic container
Inline elements:
- Str - Text string
- Space - Space character
- SoftBreak - Line break (newline in source)
- LineBreak - Hard line break
- Strong - Bold text
- Emph - Italic text
- Link - Hyperlink
- Code - Inline code
Example: Adding CSS Classes
-- Add a CSS class to all code blocks
function CodeBlock(elem)
elem.classes:insert("highlight")
return elem
end
-- Add custom attributes to headers
function Header(elem)
if elem.level == 2 then
elem.classes:insert("section-header")
end
return elem
end
Example: Transforming Content
-- Convert all "TODO" text to highlighted notes
function Str(elem)
if elem.text == "TODO" then
return pandoc.Span(
{pandoc.Str("TODO")},
{class = "todo-marker"}
)
end
return elem
end
🎯 Advanced Filter Techniques
Common Filter Patterns
Reusable patterns for common filtering tasks:
Pattern 1: Find and replace text
-- Replace all occurrences of a term with styled version
function Str(elem)
if elem.text:match("API") then
return pandoc.Span(
{pandoc.Str(elem.text)},
{class = "api-term"}
)
end
return elem
end
Pattern 2: Wrap elements with containers
-- Wrap all tables in a responsive container
function Table(elem)
return pandoc.Div(
{elem},
{class = "table-responsive"}
)
end
Pattern 3: Add automatic anchors
-- Generate URL-friendly anchors for headers
function Header(elem)
if not elem.identifier or elem.identifier == "" then
elem.identifier = pandoc.utils.stringify(elem.content)
:gsub("%s+", "-")
:gsub("[^%w-]", "")
:lower()
end
return elem
end
Pattern 4: Collect document metadata
local word_count = 0
local code_blocks = 0
function Str(elem)
word_count = word_count + 1
return elem
end
function CodeBlock(elem)
code_blocks = code_blocks + 1
return elem
end
function Pandoc(doc)
doc.meta["word_count"] = word_count
doc.meta["code_blocks"] = code_blocks
return doc
end
Pattern 5: Transform based on attributes
-- Convert divs with special attributes to custom elements
function Div(elem)
if elem.attributes["type"] == "warning" then
local icon = pandoc.RawInline("html", "⚠️ ")
local title = pandoc.Strong({icon, pandoc.Str("Warning")})
table.insert(elem.content, 1, pandoc.Para({title}))
elem.classes:insert("warning-box")
end
return elem
end
Pattern 6: Link validation and transformation
-- Add external link indicator and attributes
function Link(elem)
local url = elem.target
-- "%." matches a literal dot; a bare "." would match any character
if url:match("^https?://") and not url:match("^https://mysite%.com") then
elem.classes:insert("external-link")
elem.attributes["target"] = "_blank"
elem.attributes["rel"] = "noopener noreferrer"
-- Add visual indicator
table.insert(elem.content, pandoc.Str(" ↗"))
end
return elem
end
Inspecting Element Structure
Use io.stderr:write() to debug AST structure:
function Para(elem)
io.stderr:write("=== Paragraph ===\n")
for i, inline in ipairs(elem.content) do
io.stderr:write(string.format("[%d] %s\n", i, inline.t))
end
return elem
end
Returning Multiple Elements
Filters can return arrays to replace one element with many:
function Para(elem)
-- Split paragraph into multiple blocks
return {
pandoc.Para({pandoc.Str("First paragraph")}),
pandoc.Para({pandoc.Str("Second paragraph")})
}
end
Error Handling Strategies
Robust filters handle edge cases and failures gracefully:
1. Validate inputs before processing:
function Link(elem)
-- Validate URL exists
if not elem.target or elem.target == "" then
io.stderr:write("Warning: Empty link target\n")
return elem
end
-- Validate URL format
if not elem.target:match("^https?://") and not elem.target:match("^/") then
io.stderr:write(string.format("Warning: Unusual URL format: %s\n", elem.target))
end
return elem
end
2. Use pcall for risky operations:
function CodeBlock(elem)
local success, result = pcall(function()
return syntax_highlight(elem) -- External function that might fail
end)
if not success then
io.stderr:write("Syntax highlighting failed: " .. tostring(result) .. "\n")
elem.classes:insert("highlight-error")
return elem
end
return result
end
3. Provide fallback behavior:
function Image(elem)
-- Try to read image dimensions
local success, dimensions = pcall(get_image_size, elem.src)
if success then
-- Attribute values must be strings
elem.attributes["width"] = tostring(dimensions.width)
elem.attributes["height"] = tostring(dimensions.height)
else
-- Fallback: Use default dimensions
io.stderr:write(string.format("Could not read %s, using defaults\n", elem.src))
elem.attributes["width"] = "800"
elem.attributes["height"] = "600"
end
return elem
end
4. Accumulate errors for summary:
local errors = {}
function Para(elem)
if elem.content[1] and elem.content[1].t == "Str" and elem.content[1].text:match("^ERROR:") then
table.insert(errors, pandoc.utils.stringify(elem.content))
end
return elem
end
function Pandoc(doc)
if #errors > 0 then
io.stderr:write(string.format("\n=== Found %d errors ===\n", #errors))
for i, err in ipairs(errors) do
io.stderr:write(string.format("%d. %s\n", i, err))
end
end
return doc
end
5. Fail gracefully with informative messages:
function Div(elem)
if elem.attributes["include-file"] then
local filename = elem.attributes["include-file"]
local file = io.open(filename, "r")
if not file then
-- Return error message as content
return pandoc.Div({
pandoc.Para({
pandoc.Strong({pandoc.Str("Error:")}),
pandoc.Space(),
pandoc.Str(string.format("Could not include file '%s'", filename))
})
}, {class = "error"})
end
local content = file:read("*all")
file:close()
-- ... process content
end
return elem
end
Filter Coordination and Ordering
When using multiple filters, order matters:
Filter execution flow:
# _quarto.yml
filters:
- normalize-links.lua # 1. Fix relative URLs
- validate-links.lua # 2. Check link validity
- add-link-icons.lua # 3. Add visual indicators
Each filter receives the AST modified by previous filters.
Passing data between filters:
Filters can’t directly share variables, but can use document metadata:
-- filter-1.lua: Collect link count
local link_count = 0
function Link(elem)
link_count = link_count + 1
return elem
end
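function Pandoc(doc)
-- Metadata holds strings rather than numbers, so convert explicitly
doc.meta["link_count"] = tostring(link_count)
return doc
end
-- filter-2.lua: Use link count from previous filter
function Pandoc(doc)
local count = doc.meta["link_count"]
-- Convert back to a number before comparing
count = count and tonumber(pandoc.utils.stringify(count))
if count and count > 100 then
io.stderr:write("Warning: Document has many links\n")
end
return doc
end
Coordination strategies: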
Separation of concerns: Each filter handles one transformation
- sanitize.lua - Clean up malformed elements
- enhance.lua - Add extra functionality
- style.lua - Apply visual styling
Conditional execution: Use metadata to enable/disable filters
function Pandoc(doc)
  -- Only run if enabled in metadata
  if not doc.meta["enable-custom-filter"] then
    return doc
  end
  -- ... filter logic
end
Filter dependencies: Document prerequisites in comments
-- REQUIRES: normalize-links.lua must run first
-- EXPECTS: All links converted to absolute URLs
function Link(elem)
  assert(elem.target:match("^https?://"), "Link not normalized")
  -- ... validation logic
end
Use extension bundles: Package related filters together
# _extension.yml
contributes:
  filters:
    - step1-normalize.lua
    - step2-validate.lua
    - step3-enhance.lua
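When one filter reads what another stored in metadata, the value may not have the Lua type the writer used. The sketch below shows defensive accessors in plain Lua. `meta_number`, `meta_flag`, and the sample `meta` table are hypothetical, and inside Pandoc a value would probably need to go through `pandoc.utils.stringify` before being handed to them.

```lua
-- Hypothetical helpers for reading values that another filter stored in
-- document metadata. `meta` is any table indexed by key.
local function meta_number(meta, key, default)
  local value = meta[key]
  if value == nil then
    return default
  end
  -- Metadata may arrive as a string even if a number was stored.
  return tonumber(tostring(value)) or default
end

local function meta_flag(meta, key)
  local value = meta[key]
  return value == true or tostring(value) == "true"
end

-- Sample metadata standing in for doc.meta.
local meta = { ["link_count"] = "120", ["enable-custom-filter"] = true }

if meta_flag(meta, "enable-custom-filter")
    and meta_number(meta, "link_count", 0) > 100 then
  io.stderr:write("Warning: Document has many links\n")
end
```

Supplying a default keeps a downstream filter working even when the upstream filter was removed from the configuration.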
Debugging and Testing Filters
Debugging techniques:
Use stderr for logging: Output doesn’t interfere with document rendering
function Header(elem)
  io.stderr:write(string.format("Processing H%d: %s\n",
    elem.level, pandoc.utils.stringify(elem.content)))
  return elem
end
Inspect element structure: Use pandoc.utils.stringify and type inspection
function debug_element(elem, name)
  io.stderr:write(string.format("\n=== %s ===\n", name))
  io.stderr:write("Type: " .. elem.t .. "\n")
  if elem.content then
    io.stderr:write("Content: " .. pandoc.utils.stringify(elem.content) .. "\n")
  end
  if elem.attributes then
    -- Attributes are key-value pairs, so print them one by one
    for key, value in pairs(elem.attributes) do
      io.stderr:write(string.format("Attribute: %s=%s\n", key, value))
    end
  end
end
function Div(elem)
  debug_element(elem, "Div")
  return elem
end
Save AST to file: Examine structure offline
function Pandoc(doc)
  -- assert reports a clear error if the file cannot be opened
  local file = assert(io.open("ast-debug.json", "w"))
  file:write(pandoc.json.encode(doc))
  file:close()
  return doc
end
Use Pandoc’s native debugging: Run with -t native to see AST
pandoc document.md -t native > ast.txt
Testing strategies:
Unit test with minimal documents:
<!-- test-header.md -->
# Test Header
This should trigger the filter.
Create test suite:
#!/bin/bash
# test-filter.sh
echo "Test 1: Header transformation"
pandoc test-header.md --lua-filter=my-filter.lua -t html
echo "Test 2: Link modification"
pandoc test-links.md --lua-filter=my-filter.lua -t html
echo "Test 3: Code block styling"
pandoc test-code.md --lua-filter=my-filter.lua -t html
Compare before/after AST:
# Show AST before filter
pandoc input.md -t native > before.txt
# Show AST after filter
pandoc input.md --lua-filter=my-filter.lua -t native > after.txt
# Compare
diff before.txt after.txt
Quarto-specific testing:
# test-document.qmd
---
title: "Filter Test"
format: html
filters:
  - my-filter.lua
keep-md: true # Keep intermediate markdown to inspect
---
# Test Content
This tests the filter.
Error handling in filters:
function CodeBlock(elem)
local success, result = pcall(function()
-- Potentially error-prone code
return process_code_block(elem)
end)
if not success then
io.stderr:write("Error processing code block: " .. tostring(result) .. "\n")
return elem -- Return original element
end
return result
end
Common debugging scenarios:
- Filter not triggering: Check element type names (use -t native to verify)
- Partial matches: Verify AST structure matches your assumptions
- Quarto-specific issues: Check if Quarto preprocessing affects AST
- Order dependencies: Ensure filter runs at correct stage in pipeline
Conditional Processing
Add logic to target specific patterns:
function Para(elem)
-- Only process paragraphs starting with "Note:"
if elem.content[1] and
elem.content[1].t == "Str" and
elem.content[1].text:match("^Note:") then
-- Wrap in a div with special class
return pandoc.Div({elem}, {class = "note"})
end
return elem
end
Creating Extension Packages
Organize filters as Quarto extensions:
_extensions/
  my-filter/
    _extension.yml
    my-filter.lua
**_extension.yml:**
title: My Filter
author: Your Name
version: 1.0.0
contributes:
  filters:
    - my-filter.lua
Register in _quarto.yml:
filters:
- _extensions/my-filter/my-filter.lua
Appendix A: The Bullet List Problem
Problem Definition
Issue: Markdown lists that immediately follow text without a blank line separator don’t render as HTML lists, even with the +lists_without_preceding_blankline Pandoc extension enabled.
Example source:
**New Features:**
- Feature 1
- Feature 2
- Feature 3
Actual HTML output:
<p><strong>New Features:</strong> - Feature 1 - Feature 2 - Feature 3</p>
Expected HTML output:
<p><strong>New Features:</strong></p>
<ul>
<li>Feature 1</li>
<li>Feature 2</li>
<li>Feature 3</li>
</ul>
Root Cause Analysis
- Pandoc’s parsing behavior: Without a blank line, Pandoc treats the entire sequence as a single paragraph
- Line breaks become SoftBreak elements in the AST, not block separators
- List markers are parsed as literal text (Str "-") within the paragraph
- The +lists_without_preceding_blankline extension only works when list items start at the beginning of a line in a new paragraph context
AST Structure Analysis
The problematic paragraph is parsed as:
Para {
[1] Strong { Str "New" Space Str "Features:" }
[2] SoftBreak
[3] Str "-"
[4] Space
[5] Str "Feature"
[6] Space
[7] Str "1"
[8] SoftBreak
[9] Str "-"
[10] Space
[11] Str "Feature"
[12] Space
[13] Str "2"
-- etc.
}
Key observations:
- All content is within a single Para element
- SoftBreak indicates line breaks in source
- List markers ("-") are just Str elements
- No BulletList structure exists
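These observations suggest what a filter has to look for: a SoftBreak directly followed by `Str "-"` and a Space. The sketch below runs that check on plain Lua tables that imitate the inline sequence shown above. The constructors `str`, `Space`, and `SoftBreak` and the function `find_list_markers` are hypothetical stand-ins, not Pandoc's API, and the Strong wrapper is flattened for brevity.

```lua
-- Illustrative only: plain tables imitate Pandoc inline elements.
local function str(text) return { t = "Str", text = text } end
local Space = { t = "Space" }
local SoftBreak = { t = "SoftBreak" }

-- Returns the index of every SoftBreak that is followed by Str "-" and Space.
local function find_list_markers(inlines)
  local positions = {}
  for i = 1, #inlines - 2 do
    local a, b, c = inlines[i], inlines[i + 1], inlines[i + 2]
    if a.t == "SoftBreak" and b.t == "Str" and b.text == "-" and c.t == "Space" then
      positions[#positions + 1] = i
    end
  end
  return positions
end

local para = {
  str("New"), Space, str("Features:"),  -- simplified: Strong wrapper omitted
  SoftBreak, str("-"), Space, str("Feature"), Space, str("1"),
  SoftBreak, str("-"), Space, str("Feature"), Space, str("2"),
}

local markers = find_list_markers(para)
print(table.concat(markers, ", "))  -- 4, 10
```

Requiring all three elements in sequence keeps a hyphen in the middle of a line, such as `a - b`, from being mistaken for a list marker.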
Solution Approaches
Approach 1: Standard Markdown (Recommended)
Solution: Add a blank line before the list in the source:
**New Features:**

- Feature 1
- Feature 2
- Feature 3
Pros:
- No custom code required
- Standard Markdown syntax
- Works reliably across all Markdown renderers
Cons:
- Requires modifying source files
- May not match desired visual spacing
Approach 2: Lua Filter Detection and Transformation
Solution: Create a Lua filter that detects the inline list pattern and transforms it.
Filter implementation:
-- detect-inline-lists.lua
-- Detects paragraphs containing "SoftBreak + list marker" patterns
-- and transforms them into proper header + list structure
function Para(block)
-- Find all positions where SoftBreak is followed by "- "
local listMarkers = {}
for i, inline in ipairs(block.content) do
if inline.t == "SoftBreak" and i < #block.content then
local next = block.content[i + 1]
-- Check for pattern: SoftBreak → Str("-") → Space
if next and next.t == "Str" and next.text == "-" then
if i + 2 <= #block.content and block.content[i + 2].t == "Space" then
table.insert(listMarkers, i)
end
end
end
end
-- Only transform if we have at least 2 list items
if #listMarkers < 2 then
return block
end
local firstMarker = listMarkers[1]
-- Split into header content and list items
local headerContent = {}
local listItems = {}
local currentItem = {}
-- Extract header (everything before first list marker)
for i = 1, firstMarker - 1 do
table.insert(headerContent, block.content[i])
end
-- Process list items
local i = firstMarker + 1
while i <= #block.content do
local inline = block.content[i]
-- Detect list item start: preceded by break, is "- "
local isListMarker = false
if inline.t == "Str" and inline.text == "-" and
i + 1 <= #block.content and block.content[i + 1].t == "Space" then
if i > 1 then
local prev = block.content[i - 1]
if prev.t == "SoftBreak" then
isListMarker = true
end
elseif i == firstMarker + 1 then
isListMarker = true
end
end
if isListMarker then
-- Save previous item
if #currentItem > 0 then
table.insert(listItems, pandoc.Plain(currentItem))
currentItem = {}
end
-- Skip "- " and continue
i = i + 2
elseif inline.t == "SoftBreak" then
-- Check if next is list marker
if i + 1 <= #block.content and
block.content[i + 1].t == "Str" and
block.content[i + 1].text == "-" then
-- Skip this break (it separates list items)
i = i + 1
else
-- Keep break within item
table.insert(currentItem, inline)
i = i + 1
end
else
-- Regular content
table.insert(currentItem, inline)
i = i + 1
end
end
-- Add final item
if #currentItem > 0 then
table.insert(listItems, pandoc.Plain(currentItem))
end
-- Build result
local result = {}
if #headerContent > 0 then
table.insert(result, pandoc.Para(headerContent))
end
if #listItems > 0 then
table.insert(result, pandoc.BulletList(listItems))
end
-- Safety check: if result is empty, return original
if #result == 0 then
return block
end
return result
end
Registration:
Create extension structure:
_extensions/
  detect-inline-lists/
    _extension.yml
    detect-inline-lists.lua
**_extension.yml:**
title: Detect Inline Lists
author: Diginsight Team
version: 1.0.0
contributes:
  filters:
    - detect-inline-lists.lua
**_quarto.yml:**
filters:
  - _extensions/detect-inline-lists/detect-inline-lists.lua
Pros:
- Automatically fixes the issue without changing source
- Works for all documents in the project
- Transparent to content authors
Cons:
- Adds complexity to build process
- Requires careful testing to avoid false positives
- May have edge cases that break normal paragraphs
Implementation Considerations
Testing strategy:
- Create test documents with various patterns:
- Normal paragraphs with “-” in text
- Lists after headings
- Lists after code blocks
- Lists within blockquotes
- Mixed bold/italic text before lists
- Verify filter doesn’t break:
- Tables containing “-”
- Inline math expressions
- Code examples showing list syntax
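A table-driven check is one way to keep such cases together. The sketch below is plain Lua and does not call the filter itself: `looks_like_inline_list` is a hypothetical stand-in that applies a similar rule (at least two marker lines after the first line) to the raw source of a paragraph, and the cases are illustrative.

```lua
-- Hypothetical stand-in for the filter's detection rule, applied to the raw
-- source of one paragraph: at least two lines after the first start with "- ".
local function looks_like_inline_list(paragraph)
  local count, first = 0, true
  for line in paragraph:gmatch("[^\n]+") do
    if not first and line:match("^%- ") then
      count = count + 1
    end
    first = false
  end
  return count >= 2
end

local cases = {
  { name = "bold label followed by items",
    input = "**New Features:**\n- Feature 1\n- Feature 2", expected = true },
  { name = "hyphen inside a sentence",
    input = "Use a - b to subtract.", expected = false },
  { name = "single marker line",
    input = "Note:\n- only one item", expected = false },
  { name = "inline math with minus",
    input = "The value $x - y$\nis negative.", expected = false },
}

local failures = 0
for _, case in ipairs(cases) do
  if looks_like_inline_list(case.input) ~= case.expected then
    failures = failures + 1
    io.stderr:write(string.format("FAIL: %s\n", case.name))
  end
end
print(string.format("%d of %d cases passed", #cases - failures, #cases))
```

New edge cases found in real documents can be added as one more row, which keeps false positives visible as the filter evolves.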
Debugging:
Add logging to understand AST structure:
function Para(elem)
io.stderr:write("\n=== Paragraph ===\n")
for i, inline in ipairs(elem.content) do
local text = inline.text or ""
io.stderr:write(string.format("[%d] %s: %s\n", i, inline.t, text))
end
-- ... rest of filter logic
end
Run Quarto with stderr visible:
quarto render document.md 2>&1 | grep "Paragraph"
Recommendation
For most use cases: Use standard Markdown with blank lines (Approach 1). This is:
- Simple and reliable
- Portable across Markdown processors
- Easier to maintain
When to use Lua filters:
- Large existing documentation that can’t be easily modified
- Automated content generation where adding blank lines is difficult
- Custom rendering requirements beyond standard Markdown
📦 Quarto Filter Extensions
Quarto provides a modern extension system for packaging and distributing Lua filters. Extensions make filters reusable across projects and shareable with the community.
Extension Structure
A Quarto extension bundles filters with metadata:
my-extension/
  _extension.yml          # Extension metadata
  my-filter.lua           # Filter implementation
  additional-filter.lua   # Optional additional filters
  README.md               # Documentation
**_extension.yml structure:**
title: My Custom Filter
author: Your Name
version: 1.0.0
quarto-required: ">=1.2.0"
contributes:
  filters:
    - my-filter.lua
    - additional-filter.lua
Creating an Extension
1. Initialize extension directory:
quarto create extension filter my-extension
cd my-extension
This generates the basic structure with _extension.yml and a starter filter.
2. Implement filter logic:
-- my-extension/custom-callouts.lua
function Div(elem)
if elem.classes:includes("note") then
-- Transform .note divs to custom callouts
elem.classes:insert("custom-callout")
elem.classes:insert("callout-note")
-- Add icon
local icon = pandoc.RawInline("html", '<i class="bi bi-info-circle"></i>')
if #elem.content > 0 and elem.content[1].t == "Para" then
table.insert(elem.content[1].content, 1, icon)
table.insert(elem.content[1].content, 2, pandoc.Space())
end
end
return elem
end
3. Configure extension metadata:
# _extension.yml
title: Custom Callouts
author: Your Name
version: 1.0.0
quarto-required: ">=1.3.0"
contributes:
filters:
- custom-callouts.lua
format-resources:
- custom-callouts.css # Optional CSS
Using Extensions
Install from directory:
quarto add path/to/my-extension
Install from GitHub:
quarto add username/my-extension
Enable in document:
---
title: "My Document"
filters:
- my-extension
---
Or in _quarto.yml for project-wide use:
project:
  type: website
filters:
  - my-extension
Extension Best Practices
1. Version your extensions: Use semantic versioning
version: 1.2.3 # major.minor.patch
quarto-required: ">=1.3.0"
2. Document requirements: Specify Quarto version and dependencies
quarto-required: ">=1.4.0"
3. Provide examples: Include sample documents
my-extension/
  _extension.yml
  filter.lua
  example.qmd    # Example document
  README.md      # Usage instructions
4. Test across formats: Ensure filter works with HTML, PDF, docx
# test-document.qmd
---
title: "Test"
format:
  html: default
  pdf: default
  docx: default
filters:
  - my-extension
---
5. Handle format-specific logic:
function Div(elem)
if FORMAT:match("html") then
-- HTML-specific rendering
return html_callout(elem)
elseif FORMAT:match("latex") then
-- LaTeX-specific rendering
return latex_callout(elem)
end
return elem -- Default for other formats
end
Publishing Extensions
1. Create GitHub repository:
git init
git add .
git commit -m "Initial extension"
git remote add origin https://github.com/username/my-extension.git
git push -u origin main
2. Tag releases:
git tag -a v1.0.0 -m "First release"
git push origin v1.0.0
3. Add to Quarto extensions listing:
Submit to quarto.org/docs/extensions/listing.html via pull request.
Example: Complete Extension Package
Scenario: Create extension for automatic acronym expansion
File: acronym-expander.lua
-- Load acronyms from metadata
local acronyms = {}
function Meta(meta)
if meta.acronyms then
for key, value in pairs(meta.acronyms) do
acronyms[key] = pandoc.utils.stringify(value)
end
end
return meta
end
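-- Expand acronyms on first occurrence
local seen = {}
function Str(elem)
-- Split off trailing punctuation so that "HTTP." still matches "HTTP"
local word, punct = elem.text:match("^(%w+)(%p*)$")
if word and acronyms[word] and not seen[word] then
seen[word] = true
return pandoc.Span({
pandoc.Str(acronyms[word]),
pandoc.Space(),
pandoc.Str("("),
pandoc.Strong({pandoc.Str(word)}),
pandoc.Str(")" .. punct)
})
end
return elem
end
-- Within one filter, Pandoc calls inline functions such as Str before Meta.
-- Returning two filters forces Meta to run first so the acronyms are loaded.
return {
{Meta = Meta},
{Str = Str}
}
**File: _extension.yml**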
title: Acronym Expander
author: Documentation Team
version: 1.0.0
quarto-required: ">=1.3.0"
contributes:
  filters:
    - acronym-expander.lua
File: example.qmd
---
title: "Acronym Example"
filters:
  - acronym-expander
acronyms:
  API: "Application Programming Interface"
  REST: "Representational State Transfer"
  HTTP: "Hypertext Transfer Protocol"
---
## Using APIs
The API provides REST endpoints over HTTP.
Output: First occurrences expanded, subsequent uses show acronym only:
> The Application Programming Interface (API) provides Representational State Transfer (REST) endpoints over Hypertext Transfer Protocol (HTTP). The API endpoints…
🎓 Conclusion
Lua filters provide powerful capabilities for customizing Quarto’s rendering process. By understanding how Pandoc’s AST works and where filters fit in the pipeline, you can solve complex rendering issues that standard Markdown can’t address.
The bullet list problem demonstrates both the power and complexity of filter-based solutions. While filters can automatically fix rendering issues, they require careful implementation and testing. In many cases, adjusting source Markdown is simpler and more maintainable.
Key takeaways:
- Understand the AST: Use debugging to see how Pandoc parses your content
- Start simple: Try standard Markdown solutions before writing filters
- Test thoroughly: Filters can have unexpected side effects on other content
- Document your filters: Future maintainers need to understand custom rendering logic
📚 Resources
Pandoc Lua Filters Documentation [📘 Official]
Comprehensive reference for Pandoc’s Lua filter API, including all available functions, element types, and AST manipulation methods. Essential resource for filter development.
Quarto Extensions Guide [📘 Official]
Official guide to creating and using Quarto extensions, including filter extensions, shortcodes, and custom formats. Shows how to package filters for reuse and distribution.
Pandoc AST Reference [📘 Official]
Detailed type reference for all Pandoc AST elements. Use this to understand element structure and available properties when writing filters.
Lua 5.3 Reference Manual [📘 Official]
Complete language reference for Lua 5.3 (used by Pandoc). Covers language syntax, standard libraries, and programming patterns.
Pandoc Lua Filters Repository [📗 Verified Community]
Collection of community-contributed Lua filters with real-world examples. Excellent source for learning filter patterns and finding reusable solutions.
Quarto Filter Development Best Practices [📘 Official]
Quarto-specific guidance for writing filters that work well with Quarto’s processing pipeline, including performance tips and testing strategies.
Pandoc User’s Guide - Filters Section [📘 Official]
Comprehensive overview of how Pandoc filters work, including JSON filters and Lua filters, with execution order and pipeline details.
Lua Filter Tutorial by Albert Krewinkel [📗 Verified Community]
Practical tutorial by Pandoc’s lead Lua filter developer. Covers common patterns, debugging techniques, and advanced use cases.
Quarto CLI Reference [📘 Official]
Command-line reference for Quarto, including options for filter debugging, format-specific rendering, and project configuration.
Pandoc Discussions - Lua Filters [📗 Verified Community]
Active community forum for Lua filter questions, troubleshooting, and sharing techniques. Great resource for finding solutions to specific problems.
Lua 5.4 Updates [📘 Official]
Language updates in Lua 5.4 (used by newer Pandoc versions). Important for understanding new features and compatibility considerations.
Quarto Journal Articles Extension [📗 Verified Community]
Real-world examples of complex filter extensions for academic journal formatting. Shows advanced techniques for multi-format output customization.
🔀 Appendix B: Alternatives to Lua Filters
While Lua filters are powerful, other approaches may be simpler for certain use cases:
1. Quarto Shortcodes
When to use: Reusable content snippets without complex AST manipulation
Example:
# _quarto.yml
shortcodes:
- my-shortcodes.lua
-- my-shortcodes.lua
return {
  ["warning"] = function(args, kwargs, meta)
    return pandoc.Div({
      pandoc.Para({
        pandoc.Strong({pandoc.Str("⚠️ Warning: ")}),
        -- Arguments may arrive as inline lists, so stringify before wrapping in Str
        pandoc.Str(pandoc.utils.stringify(args[1]))
      })
    }, {class = "warning"})
  end
}
Usage in markdown:
{{< warning "Check your configuration before deploying" >}}
Pros: Simple syntax, easy to learn, good for non-developers
Cons: Limited to content insertion, can’t modify existing elements
2. Pandoc Templates
When to use: Customizing document structure and layout
Example: Modify HTML template to add custom header
# Extract default template
pandoc -D html > custom-template.html
Edit custom-template.html:
$if(custom-banner)$
<div class="custom-banner">
$custom-banner$
</div>
$endif$
Usage:
---
format:
  html:
    template: custom-template.html
custom-banner: "This is a draft document"
---
Pros: Full control over output structure, format-specific
Cons: Requires template syntax knowledge, separate template per format
3. Preprocessing Scripts
When to use: Complex transformations before Pandoc processes content
Example: Python script to inject table of contents
# preprocess.py
import re
import sys

def add_toc(content):
    headers = re.findall(r'^##\s+(.+)$', content, re.MULTILINE)
    toc = '\n## Table of Contents\n\n'
    for header in headers:
        anchor = header.lower().replace(' ', '-')
        toc += f'- [{header}](#{anchor})\n'
    # Insert after first header; a function replacement avoids
    # backslash escapes in header text being interpreted by re.sub
    return re.sub(r'(^#\s+.+$)', lambda m: m.group(1) + '\n' + toc,
                  content, count=1, flags=re.MULTILINE)

if __name__ == '__main__':
    content = sys.stdin.read()
    print(add_toc(content))
Quarto integration:
The script works on Markdown text rather than on Pandoc's JSON AST, so it cannot be registered under filters. Run it before rendering and write the result to a file that Quarto then renders:
python preprocess.py < input.md > input-with-toc.md
quarto render input-with-toc.md
Pros: Use any programming language, full text manipulation
Cons: Operates on markdown text (not AST), harder to get right, fragile
4. Quarto Extensions (Non-Filter)
When to use: Adding functionality without modifying AST
Types:
- Format extensions: Custom output formats
- Shortcode extensions: Reusable content macros
- Project extensions: Project-wide configurations
Example: Format extension for custom HTML output
# _extensions/my-format/_extension.yml
title: My Format
contributes:
formats:
html:
my-format:
theme: custom
css: styles.css
include-in-header: header.html
Pros: Packages related customizations together
Cons: Doesn’t modify content, only presentation
5. Post-processing HTML/LaTeX
When to use: Output-specific modifications after rendering
Example: Modify HTML after Quarto rendering
# postprocess.py
from bs4 import BeautifulSoup
import sys

html = sys.stdin.read()
soup = BeautifulSoup(html, 'html.parser')
# Add target="_blank" to external links
for link in soup.find_all('a', href=True):
    if link['href'].startswith('http') and 'mysite.com' not in link['href']:
        link['target'] = '_blank'
        link['rel'] = 'noopener noreferrer'
print(soup.prettify())
Usage:
quarto render document.qmd
python postprocess.py < document.html > document-final.html
Pros: Operates on final output, can use powerful libraries (BeautifulSoup, etc.)
Cons: Format-specific, separate script per format, runs outside Quarto
6. CSS and JavaScript (HTML only)
When to use: Visual modifications without changing content structure
Example: Automatic anchor links
// add-anchors.js
document.addEventListener('DOMContentLoaded', function() {
document.querySelectorAll('h2, h3, h4').forEach(function(heading) {
const anchor = document.createElement('a');
anchor.className = 'anchor-link';
anchor.href = '#' + heading.id;
anchor.innerHTML = '#';
heading.appendChild(anchor);
});
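});
include-after-body inserts the file's contents into the page as-is, so save the script wrapped in <script> and </script> tags as an HTML file (add-anchors.html is an assumed name) and reference that file:
---
format:
  html:
    include-after-body: add-anchors.html
---
Pros: No build-time processing, works in browser, easier to debug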
Cons: HTML-only, client-side only, doesn’t affect PDF/docx
Decision Matrix
| Approach | Complexity | Format Support | Use Case |
|---|---|---|---|
| Lua Filters | High | All formats | AST manipulation, cross-format transformations |
| Shortcodes | Low | All formats | Simple content injection |
| Templates | Medium | Format-specific | Document structure customization |
| Preprocessing | Medium | All formats | Text-based transformations |
| Extensions | Low-Medium | All formats | Packaging related customizations |
| Post-processing | Medium | Format-specific | Output-specific modifications |
| CSS/JS | Low | HTML only | Visual enhancements |
Recommendation
Start with the simplest approach that solves your problem:
- For simple content: Use shortcodes
- For layout: Use templates or CSS
- For cross-format transformations: Use Lua filters
- For format-specific output: Post-process with scripts
- For packaging: Create Quarto extensions
Only use Lua filters when:
- You need to transform document structure (AST)
- Changes must work across multiple output formats
- Other approaches are too limited or fragile
📜 Version History
- 1.0.0 (January 8, 2026): Initial article creation
- 1.1.0 (January 8, 2026): Major content expansion - Added Pandoc extensions, performance considerations, common patterns, debugging techniques, error handling, filter coordination, Quarto extensions section, and alternatives appendix. Expanded references with 8 new authoritative sources.