This is a blog post about technical considerations of markup languages — and I’m writing it on a site I built myself. If there’s an impulse more codified and vacuous than artists waxing poetic about the power of creativity, it’s programmers blogging about how their blogs work. In the years I’ve been erecting this monolith I’ve resisted that siren, but for once, I’ll indulge.
But the fact that I’ve resisted — the fact that such a post is out of character for this blog — points to a pertinent fact.
My background is in fiction writing, posting on sites like Wordpress, , and — more or less in that order.
These sites all have rich text editors, and working with them asks far less of you than fiddling with a programmatic conversion.
But of course, if you’ve used these sites for long enough, copy-pasting from platforms like Google Docs, you’ll know there are warts — AO3 especially is notorious for the spacing errors that crop up around italicized text when you paste.
However — and forgive me for speaking a bit superstitiously — there’s a kind of invisible ooze to using a
Do you know what you’re pasting when you’re pasting rich text? I have an old page on this site that exists for the express purpose of revealing what’s inside of rich text pastes.
To save you a click, here’s what a humble header, subtitle, and paragraph from a personal document results in:
Hiding this behind a click to build anticipation. But mostly because it’s a big fat wall of text.
<h1 dir="ltr" style="line-height:1.7999999999999998;text-align: center;margin-top:20pt;margin-bottom:6pt;" id="docs-internal-guid-fffb5069-7fff-2c13-5320-771304d40c62"><span style="font-size:26pt;font-family:'Times New Roman',serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">A Chimerical Hope</span></h1><p dir="ltr" style="line-height:1.7999999999999998;text-align: center;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:12pt;font-family:'Times New Roman',serif;color:#000000;background-color:transparent;font-weight:400;font-style:italic;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">No predators nor parasites.</span></p><br><p dir="ltr" style="line-height:1.7999999999999998;margin-top:0pt;margin-bottom:0pt;"><span style="font-size:12pt;font-family:'Times New Roman',serif;color:#000000;background-color:transparent;font-weight:400;font-style:normal;font-variant:normal;text-decoration:none;vertical-align:baseline;white-space:pre;white-space:pre-wrap;">The dream we share is simple: no traitors, no masters, only bugs united.&nbsp; Lucidity is a dreamer’s sacrifice to reality, and our sacrifice is the vesperbane.&nbsp; A vile transformation through blood and black nerve, the vespertine arts grant a mantis power and control beyond this world.&nbsp; Only vesperbanes can give our grand dreams breath — but all vesperbanes are shadowed by an incorrigible potential to instead become oppressors and defectors.</span></p>
That’s HTML. A lot of it, in its most gruesome form.
That goop is absolutely lousy with hardcoded inline styles and cryptic identifiers. And sure, that’s ugly, but admittedly aesthetics is a weak critique when the point is not looking at it — rather, what really bothers me here is that it’s messy. I don’t mean that as a synonym: Those styles specify fonts and arcane rules like white-space:pre-wrap. The line-height is defined to sixteen (16!)
More pointedly, part of why this HTML is so verbose is that it’s pretty much specifying the exact appearance of every span. It’s practically tailor-made for setting up a situation where one letter is jarringly, inexplicably, accidentally in the wrong font.
The workflow of editing text sees you inevitably moving bits around, italicizing this then unitalicizing that. You’ll soon wind up in situations where, say, a period or whitespace character retains a palimpsest of the styles that once graced its neighbors. Doesn’t the idea of that drive you mad?
— suckerpinch
But the fact that “rich text” is just HTML wearing a fancy hat means that all of the sites I listed off at the start (and plenty more) accept raw HTML input in addition to rich text pastes.
But even when it isn’t riddled with the unsightly warts of rich text,
And with that this prelude ends: enter markdown. Markdown is an elegant, compelling dream: readable plaintext with all the expressive power of HTML. Two decades removed from its inception, it has calcified into a formally specified format, but the pitch, the design sensibility, beckons forth.
The idea is that a Markdown-formatted document should be publishable as-is, as plain text, without looking like it’s been marked up with tags or formatting instructions.
But I’ve misled you, really. I frankly don’t care much about markdown’s raw publishability, except insofar as that design coincides with ease of editing — and I mean edit in the oldest sense, proofreading prose rather than fiddling with code — because again, I write stories.
Between 2019–2023,
The thing about posting on a site like Wordpress or Royalroad is that it’s putting yourself in a box — the platform limits what you can do. The rich text format itself is a prison, locking you in a broadly interoperable subset of HTML. (Perhaps a better, if no less cliché, metaphor is that it’s a road. And those roads see a lot of traffic — it more than suffices for the efforts of most people most of the time — but it won’t take you everywhere.)
But I’m ambitious and clever. I want to push the boundaries of my medium — to portray notions novel and alien. Even back in my wordpress days, you can find places where I reached for text shadows to portray strange cadences, vibes that neither italics nor bold quite suffice to evoke.
When I finally made the switch to neocities, after having written my own stylesheets,
It was posted on AO3, where the first chapter merely having code blocks was enough to dazzle fanfiction readers, but by chapter eight, I’m formatting text in two columns to represent a forked train of thought, and by chapter fourteen?
::::: reflection
Open your eyes. Find my gaze. *I want you to see this.*
:::::
::::: reflection
Let me [ [\_\_\_\_\_\_\_\_]{.invisible}[look.]{} [open.]{} [access.]{}
[control.]{}]{.overlap-container} *Let me exist.* [Or stare into death
and be still.]{.quiet}
:::::
Woah! That’s not in the markdown spec. What’s going on here?
pandoc(1) is what happened. It extends markdown with a number of useful features — the most flexible by far are fenced divs and bracketed spans. Markdown has always allowed you to embed inline HTML, but it’s ugly — and remember, the point of using markdown was that I wished for something more concise than writing quirked up XML.
pandoc is, make no mistake, an amazing achievement. while I believe Markdown is what it’s for at heart
But the real superpower of pandoc is that, much in the way switching to neocities escapes the prison-roads of locked down platforms, switching to pandoc escapes at once the restrictions of both rich text and standard markdown.
Even writing raw HTML isn’t as powerful as what pandoc lets you do, because pandoc has a convenient interface for reading and writing the
One of this year’s creations I’m proudest of (at least on technical merit) is the glossary for my new setting. And structurally, it’s just a big definition list. (<dl> is one of those HTML elements that doesn’t get enough love.)
The code to make this happen was only about a hundred lines of Lua,
If that isn’t a testament to the power of this software, I don’t know what would be more convincing. Unfortunately, that may be the last kind word I speak about pandoc in this post. As I continue I shall descend into the tenor of a rant.
This will become even clearer in later sections, but suffice it to say, I’ve become something of an unwilling wizard of pandoc’s guts. I wouldn’t call myself an expert — I’ve yet to even broadly familiarize myself with how readers and writers really work! — but I have written thousands of lines of Lua code to interact with its filtering interface. I know how the AST ticks, and I know far more than I’d like about some very specific edge cases and limitations.
And the longer I remain victim to using it, the more I begin to have ideas.
serpent.lua — a monster hatches
But first, an anecdote. I do not use any off-the-shelf static site generator for this blog. When, midway through 2024,
Incidentally, if you’re an aspiring indie web creator, I would strongly recommend starting with one if you have any intention of crafting a site with extensive indexing and interlinking.
I first tried out Hugo, easily one of the most popular SSGs and quick to come up in cursory searches, yet I swiftly encountered a blogpost persuasively arguing that it (and most major static site generators) kinda suck, actually!
This blogger name-checked the UNIX philosophy and overall appealed to my sensibilities, so I decided to abort my efforts to switch to Hugo, and tried out their solution instead: soupault. The design was appealingly elegant, and it was practically a bespoke fit for the exact circumstances I had found myself in: migrating over a large, existing site.
Soupault brands itself as not a static site generator, and one of its distinctions is that it operates seamlessly on precompiled pages — it’d work just fine with the HTML files I already had, or it could convert new markdown files. It’s all so admirably flexible.
I quickly ran into some warts and deficiencies in the documentation, but this is an amateur software project; you have to be forgiving of these things. Overall, it’s mostly usable if you look past occasional typos, misleading pseudocode, and a few cases where documented features are straight up wrong. (Case in point: there’s site_index, the table of index data for every page in the site, but the docs claim index_entry will contain the index data for the current page. And it didn’t. It was always nil.)
But I could cope and work around it; none of this was a dealbreaker for me.
Except this software intentionally refuses to acknowledge a pretty basic concept.
“Hey soupault, do you think maybe we shouldn’t waste time creating pages that already exist and whose source files haven’t changed since last we created them? Y’know, the thing we figured out with Makefiles in the 80s?”
And its answer? A take unironically, unabashedly in the vein of “640K is enough for anyone.”
But this was… a bit of an issue for me, given that my site is 700MB.
See, here’s how soupault works: it dutifully copies and processes the whole thing from source to prod every single time you run it.
Still, most of my site’s weight is music, and most of the remainder is images, so alright, what if I just excluded the multimedia from the source directory?
This already defeats a major purpose of doing all of this, because programmatically generating thumbnails and song pages was something I wanted a static site generator for,
Except nope! Even when it’s just the text, it still took 10 seconds to build them. I simply cannot live like this.
I was so exhausted and irritated after spending all day on this dead end that I said, you know what? Fuck it. I’ll just write some Makefiles and shell scripts to put together my site.
And that’s how Equestria was made! As of writing this blogpost in early 2026, that’s still fundamentally my infrastructure, which I informally call the serpent’s den.
The reason I recount this experience is it’s an illustrative characterization. There are hundreds of static site generators out there, because rolling your own is a fundamentally easy programming problem.
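To make the "thing we figured out with Makefiles" concrete, here is a minimal sketch of the staleness rule in Python. It is an illustration only, not the Makefiles and shell scripts this site actually runs on, and `needs_rebuild` is a hypothetical name.

```python
import os


def needs_rebuild(source: str, target: str) -> bool:
    """Return True if target is missing or older than source.

    This is the same rule make applies to a "target: source" line.
    """
    if not os.path.exists(target):
        return True
    return os.path.getmtime(source) > os.path.getmtime(target)
```

A build script would call this once per page and skip the conversion step whenever it returns False, which is what keeps a large site from being rebuilt from scratch on every run.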
I recount this anecdote to suggest an analogy. When I lay it out like this,
But it has been observed to the point of dead metaphor — a camel can only bear so many straws.
What burdens have I borne?
This is Why We Fight
Every advanced user probably has little gripes with their markdown engine. Heck, I have gripes with markdown itself.
I don’t like the four-space rule that turns indentation into a codeblock-landmine. I don’t like setext headers. (That’s the style where you put === or --- underneath lines to make them headers, instead of a number of #.)
I don’t like that pandoc collapses two space characters into one pandoc.Space node — because I learned how to touch type from gtypist(1) and spent most of my teenage and young adult years editing text in emacs(1), so I reflexively put two spaces after each sentence — I go so far as to contend that this is superior, aiding in visual parsing.
In fact, if you check the original home of my first web serial, you’ll find the fruits of that determination — there are unicode no-break space characters after most sentences, because I had put a sed(1) preprocessing filter before I piped my markup
I also used that sed filter to replace indented paragraphs of my source files (/\n {4,}/
All told, I’d gone to some quite extreme lengths with my typography.
The intricacies of what I do with em dashes do ultimately exemplify the strengths of pandoc’s filters. See, the way Lua filters work is simple — you hand it a file or Lua table (but I repeat myself) with fields for the AST nodes you care about.
For instance, if you have Emph = fun, then fun gets called on each instance of emphasized text and that element is replaced by whatever it returns.
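A toy model of that dispatch, written in Python rather than Lua and using a made-up dict-based tree. None of this is pandoc's real API: the `{"t": ..., "c": ...}` node shape, the `walk` function, and the handler table are all assumptions for illustration, and pandoc's actual traversal order is more involved than this simple depth-first pass.

```python
def walk(node, handlers):
    """Visit a toy AST depth-first, replacing nodes via a table of handlers.

    handlers maps a node type name (e.g. "Emph") to a function. The function
    receives the node and returns its replacement, or None to keep it as-is.
    """
    if isinstance(node, list):
        return [walk(child, handlers) for child in node]
    if not isinstance(node, dict):
        return node  # plain strings and other leaves pass through
    new = dict(node)
    if "c" in new:
        new["c"] = walk(new["c"], handlers)  # children first
    handler = handlers.get(new["t"])
    if handler is None:
        return new
    result = handler(new)
    return new if result is None else result


# Usage: turn every emphasized run into strong text.
doc = [{"t": "Str", "c": "a"}, {"t": "Emph", "c": [{"t": "Str", "c": "b"}]}]
bolded = walk(doc, {"Emph": lambda el: {"t": "Strong", "c": el["c"]}})
```

The point of the shape is that the table only names the node types you care about; everything else falls through untouched.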
As established above, I am an experimental writer, striving to do a great many creative things with formatting. In Black Nerve, you have anime-esque ⸢Spell Names⸥ and spooky vesper communion; while in Hostile Takeover, I used guillemets for «shortwave radio broadcasts», and occasionally spiced it up with
And those are just the ones that are easy to list off and demonstrate inline in a single sweeping sentence-gesture. I’ve written back-and-forth text messages, and hell, my footnotes are so fluently integrated into my writing process that even while sitting here trying to think of examples, it took until now to remember how much monkeying about it had taken to get those working. Another trick you’ve already seen is my use of details disclosure elements that you must click to expand — I love that element.
Markdown is older than the <details> element, so there’s no syntax for it. But by design, you’re allowed to insert inline HTML, so it’s no real loss. If you search up how to do details in markdown, this is what stackoverflow tells you to do. Except pandoc doesn’t generally parse markdown inside HTML elements,
Another common trick, in both fiction and blogging, is wanting to center a bit of text. Easy enough to add a .center { text-align: center} rule to your CSS and class="center" to your HTML — but in the case of pandoc, that means enclosing a passage in ::: center and ::: on their own lines.
Except if what you want to center is very brief, a single word, then these formatting directives quite possibly take up more space than the text they’re formatting.
But again, filters can automate this! Omit the verbose formatting, and just include enough for your code to know where to put the div with the .center class.
But… reflect on what we’re doing here. Is there anything niggling at you?
Let’s look at it more carefully. Imagine you’re me, writing spooky murder drones pings. You might start with *Prey! Hunt! Devour!* — oblique text is pretty standard for psychic transmissions, though that’s not quite what this is.
Now, I’m adept with CSS; if I wanted to throw on some yellow text shadow, then it would make sense to do [Prey! Hunt! Devour!]{.murder-ping}, then add a CSS rule for making class="murder-ping" look pretty.
We can also omit the asterisks and make one of its rules font-style: italic — but if we care about people using screenreaders or reader mode or really stripped down browsers, maybe we want there to be an <em> element in our semantic HTML, so we can have our filter output an Emph node.
And again, if we’re thinking about how the unstyled text looks, it makes sense to have the filter also slap lil’ «guillemets» around them — I’ve always felt italics alone are badly overloaded in prose fiction.
I’m working through this example step by step because I think it’s worth respecting how natural each step of this progression feels.
But there’s a distinction to discover here. When a chapter has a line that says [05:50]{.timebreak .div}, one of those classes is an actual class to evoke real CSS rules, and one of them is a fake class that exists to get caught in the filter.
The logical endpoint of this is the .rot13 class, which has no bearing on how the text is displayed — it’s purely an instruction to the filter.
And that’s the core insight I’ve been circling around. In web design we applaud the principle of the separation of style and content. HTML ought to specify the content, and not care about how it looks; that’s CSS’s job.
I think markup could benefit from a principle distinguishing style and procedure. Marking “This is a foo block.” vs “Postprocess this with the foo() function.” — because that’s what pandoc’s Lua filters devolve into, ultimately. So many function calls with extra steps.
In short, a design that centralizes text-filtering functions as part of document structure could have valuable ergonomics.
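The style/procedure split can be sketched in a few lines. This is a Python illustration under stated assumptions, not the filter code behind this site: the `PROCEDURES` registry, the `guillemets` class, and `render_span` are hypothetical names, and a span is modeled as just a text string plus a list of class names.

```python
import codecs
import html

# Hypothetical registry: "procedure" class name -> text transformation.
PROCEDURES = {
    "rot13": lambda text: codecs.encode(text, "rot_13"),
    "guillemets": lambda text: "«" + text + "»",
}


def render_span(text, classes):
    """Run procedure classes as functions; pass style classes through to HTML."""
    style_classes = []
    for name in classes:
        if name in PROCEDURES:
            text = PROCEDURES[name](text)  # consumed by the "filter"
        else:
            style_classes.append(name)     # left for the stylesheet
    escaped = html.escape(text)
    if not style_classes:
        return escaped
    return '<span class="{}">{}</span>'.format(" ".join(style_classes), escaped)
```

Here `.murder-ping` would survive into the output for CSS to pick up, while `.rot13` disappears entirely once it has done its work, which is exactly the asymmetry the class syntax hides.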
On Weaving Cocoons
If you aren’t familiar with my work, then when I said I loved the details disclosure element, or that I’m an experimental writer doing creative things, you could have brushed it off as a cute yet idle exclamation or an otherwise meaningless remark.
If you aren’t familiar, then gaze upon Weave Me Another Cocoon and let its depths ensnare you.
Let’s cover some background. Telescopic text is not new — it’s been here since 2010. I first encountered it when an acquaintance linked Nutshell right after seeing a demo of how my own footnotes work, though by that point I had already built up a lot of mental underbrush around the idea.
When I left for my walk that evening, that underbrush caught fire and remained ablaze for one feverish week. The result was WMAC. The nature of the story’s structure is that I had already “completed” it on that very first day, just a few hundred words in, albeit being more poem than narrative. But poems are not exempt from the same privations as prose, and the lack of grounding left that first draft feeling vague, unmoored.
I spent the rest of the week proving its narrative theorems, so to speak — but that was only half of what took up so much of my time. While I certainly have more to say about the narrative craft that makes WMAC tick, it remains a unique project in how it was as much a product of code as ink.
The basic idea of WMAC is rather simple — you click some text, and its content is substituted for something else. Which itself may contain more clickable text, and so on ad nauseam.
How on earth do you implement that in markdown?
I’m not going to cover every quirk of WMAC’s implementation here.
And there’s cute details to cover like the way scene breaks are styled, or how a browser inconsistency
But some broad strokes are obvious from the word go. I mentioned there were details elements, and that tells the whole story as far as the HTML itself is concerned — I used CSS to make all the summary elements look like links, and once you figure that out, it’s not hard to open up CodePen and cook up your own equivalent demo.
But what did the markdown source file look like? Consider those first three clicks.
Enantiodromia.
That paradox.
That ensnaring paradox — metamorphosis.
Pasting the <details><summary> et al. boilerplate in front of them all is a complete nonstarter. Something like:
[[Enantiodromia]{.s}[[That]{.s}That ensnaring]{.d} [[paradox.]{.s}paradox --- metamorphosis.]{.d}]{.d}
is only better, only slightly. Remember, we have to proofread this.
You can shave some characters if you assume the first child of a .d must be the .s. There’s a couple of places in the text where multiple words are transformed at once, but most links are one-word. But that does little to alleviate the plight of recursion. If we really want to make this convenient to write, we need to invent some powerful syntax.
If you’re like me, you’ll immediately start skimming the relevant section of the pandoc manual looking for any syntax you can easily repurpose in a filter. [Nesting spans suck,]{class="gratuitous-demonstration" problem="the two different types of brackets create a lot of visual noise," solution="so a promising alternative is the humble footnote."}
Not the widespread markdown footnotes, but pandoc’s inline ones: something^[like this].
It’s a hack, but with:
^[Enantiodromia / ^[That / That ensnaring] ^[paradox / paradox---metamorphosis.]]
It now feels like we might be getting somewhere. Those slashes are what I picked as a cleaner way to separate the before and after of the substitution. Start writing like this, though, and you’ll soon discover other patterns cropping up in “idiomatic” telescopic text. It’s easy enough to extend the syntax to accommodate them.
Consider:
^[Sister... / One sister | says, / says to another:]
This is nice syntactic sugar, which I used all over the place in the source document.
Likewise, something like:
shouldn't. / + She ^[hated / always +] it.
This plus-sign syntax saves me the trouble of repeating words when I’m only appending/prepending more.
You're peerless. / I could never | compare. / lie. / lie to
save my life. & Is my flattery/distraction/dissimulation so
transparent? & I'll recuse myself, then.
The ampersand syntax takes this to a further extreme, abstracting all three symbols otherwise called for when enclosing a | word / + like this.
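Setting the sugar aside, the core of the slash syntax is a small recursive structure. The sketch below is a standalone Python illustration, not telescope.lua and not a pandoc filter: it only handles nested ^[before / after] notes, ignores the |, + and & shorthands, and assumes the details/summary output described earlier. The function names are made up for the example.

```python
import html


def split_note(parts):
    """Split a note's parts at the first top-level ' / ' into (before, after)."""
    for n, part in enumerate(parts):
        if isinstance(part, str) and " / " in part:
            left, right = part.split(" / ", 1)
            before = parts[:n] + ([left] if left else [])
            after = ([right] if right else []) + parts[n + 1:]
            return (before, after)
    return (parts, [])


def parse(src, i=0, inside_note=False):
    """Parse src from index i into a list of strings and (before, after) tuples.

    Returns (parts, next_index). A ']' only closes something inside a note.
    """
    parts, buf = [], []
    while i < len(src):
        if src.startswith("^[", i):
            if buf:
                parts.append("".join(buf))
                buf = []
            inner, i = parse(src, i + 2, True)
            parts.append(split_note(inner))
        elif src[i] == "]" and inside_note:
            i += 1
            break
        else:
            buf.append(src[i])
            i += 1
    if buf:
        parts.append("".join(buf))
    return parts, i


def render(parts):
    """Render parsed parts as nested details/summary elements."""
    out = []
    for part in parts:
        if isinstance(part, str):
            out.append(html.escape(part))
        else:
            before, after = part
            out.append("<details><summary>{}</summary>{}</details>".format(
                render(before), render(after)))
    return "".join(out)


def telescope_to_html(src):
    return render(parse(src)[0])
```

Hiding the summary once its details element is open, so that the "after" text appears to replace the "before" text, is left to the stylesheet in this sketch.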
If you’re really knowledgeable about pandoc’s syntax, you’ll notice that if our filter is designed to look for / symbols specifically (that is, pandoc.Str("/") nodes)
But I digress. Part of why I digressed was to distract you. There’s so many details
See, pandoc doesn’t like what we’re doing at all. Try it yourself! Echo a (well-formed!) string like ^[^[^[^[^[ ^[^[^[^[^[g ]]]]] ]]]]] into pandoc and watch how it haaaaaangs. If you test it iteratively, the first few footnotes are instant. By the time you nest six footnotes there’s a noticeable pause that only gets nonlinearly worse. A few more and I don’t have the patience to see its output.
Especially when that output is useless — I needed to write my own footnotes system for this site primarily because I’m doing something funky with checkboxes that’s pretty nonstandard,
What’s worse is the AST actually does support it, and the parsers even understand it. Note nodes are treated as inlines with inline children, so nested footnotes fit cleanly into the structure, but as soon as you try to render it as HTML or plain text, the output only shows the first level of footnotes. But it gets worse.
Our discussion has centered on how WMAC worked, and on paper, it’s a single tree. But I refuse to read or write scriptio continua, let alone when it’s this deeply nested in structure — so we want to separate it out into subtrees, right?
On paper, the standard markdown footnotes are perfect for this. [^paradox] should let us put the paradox subtree anywhere we want, right?
You can probably guess what goes wrong — I’ve already spoiled it — but here, it’s not a mere failure of the output: pandoc’s markdown will not even parse the nested footnote references as you’d expect, if it’s an inline footnote inside a reference footnote definition block. It just quietly eats them.
You need a workaround. I eventually settled on a macro system where, say, the verbatim code flame gets replaced with the contents of a #flame element after processing. Again, I won’t describe all the internals, but I do want to underscore just how far afield I went — I made pandoc segfault!
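The general shape of such a macro pass can be sketched on plain text. This is a Python illustration only and makes no claim about the real internals: it assumes macros are written as backticked names and that the replacement bodies are available in a dictionary keyed by id, and `expand_macros` is a hypothetical name.

```python
import re

MACRO = re.compile(r"`([A-Za-z0-9_-]+)`")


def expand_macros(text, definitions, max_depth=20):
    """Replace `name` code spans with definitions[name], repeating so that
    macros nested inside other macros are expanded too.

    Backticked names with no definition are left untouched.
    """
    def substitute(match):
        return definitions.get(match.group(1), match.group(0))

    for _ in range(max_depth):
        expanded = MACRO.sub(substitute, text)
        if expanded == text:
            return expanded
        text = expanded
    raise ValueError("macro expansion did not settle; is there a cycle?")
```

Because each subtree lives under its own name, the source file can be laid out in whatever order is comfortable to proofread, and the tree is only stitched together at build time.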
Are you beginning to sympathize with why I’m tempted to create my own markdown engine? I am pushing the limits of pandoc; my code needs to babysit its oversights, its algorithmic shortcomings, and its outright bugs.
In the end, telescope.lua is only around 159 lines of Lua.
izanagi.lua is 1659 lines.
I have tried to keep just what izanagi is somewhat under wraps.
Izanagi never produced an actual story that made use of its tech
And now…
Nothing beside remains ’round the decay of that colossal wreck.
The Straw that Strains
There’s one last strand worth tracing. I am a writer — an amateur webfiction writer. This comes with an unavoidable fact: I make mistakes. Typos, errors, infelicities. I’m pretty decent at catching them — years of writing and painstaking editing passes drill that into you — but I’ll never be perfect. I have a few friends that (usually, hopefully) look at the things I write and tell me when I’ve made a mistake.
Now, the way this normally goes in the worlds of webfiction and fanfiction is that people use Google Docs, and they hand their beta readers and editors a link with feedback enabled, and comments can be left inline, with suggested edits committed with a click.
I write text files on my own computer. I don’t like Google Docs nor WYSIWYG word processing as a user experience.
As a result, the way typos are reported to me is rather primitive. People will quote enough text to probably disambiguate where a typo is located and DM it to me. I then have to search my source copy for the string and type the change myself. This sucks, and this really sucks if there’s formatting that gets in the way of a naïve string search finding it.
And for something like WMAC? Forget about it!
What alternative is there, though? Even if I succumbed to using GDocs, it would hardly relieve me of needing to manually port changes to the source doc. Perhaps if I sent the raw .md files to people and got back an honest-to-god diff patch, it could work — but so many of the people I work with are simply not that technically proficient.
My site doesn’t use javascript, but I do have some skill with it. And marvelously, HTML has an element attribute called contenteditable which is a shortcut for the work it’d take to make a somewhat functional rich text editor. Could I write a script that, if you load one of my posts with ?editing=true in the url, goes through and makes the article’s text editable, then programmatically diffs the HTML into a format that I can work with?
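The diffing half of that idea is the easy part. As a rough sketch (in Python with the standard difflib module, rather than the in-browser javascript the idea would actually need, and with `typo_report` as a hypothetical name), a word-level comparison of one paragraph before and after editing could look like this:

```python
import difflib


def typo_report(original, edited):
    """List (old, new) word-level changes between two versions of a paragraph."""
    a, b = original.split(), edited.split()
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag != "equal":
            # Insertions have an empty "old" side, deletions an empty "new" side.
            changes.append((" ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes
```

A list of old/new pairs like this is readable in a DM and small enough to apply by hand, but it still says nothing about where in the source file each paragraph came from.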
Now, in all fairness, the amount of work this would save me is somewhat marginal,
But around about here we find the rub. If we want to produce useful typo reports, how do you associate HTML paragraphs with the markdown that produced them?
Experienced programmers will quickly recognize that this is an old problem in matters of compilation — what we need is to add debugging symbols to the HTML and create a sourcemap.
Can we do this? The answer, as you can probably guess, is no. pandoc’s AST does not expose line number information. One more reason to wish for a better engine.
Except, wait a minute. This problem, unlike all the others, presents a unique opportunity. Because we can work around it, in a novel way. For other problems, if I wished to fix it, I’d have to write my own parser from the ground up, or do something really arcane with filters that ultimately makes me more dependent on pandoc, not less.
But for this? There’s no amount of filter magic that can recover source information that is already lost during the parsing of the document. No, but I could preprocess the file. Go through and insert invisible spans that note which line numbers correspond to which elements. Parsing paragraphs is less work than parsing the whole elephant, but because it is parsing, it gets a foot in the door. Once you’re parsing paragraphs, it’s not that hard to spot codeblocks and fenced divs. Lists and block quotes are a bit harder, but maybe…
And that, finally, is what this year started me down the road to writing my own markdown parser.
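For a sense of scale, the paragraph-only version of that preprocessing step fits in a couple dozen lines. This is a deliberately naive Python sketch under several assumptions: the marker format `[]{.src line=N}` is hypothetical, fences are toggled without checking that opener and closer match, and headings, fenced divs, block quotes, lists, and indented code are simply skipped rather than parsed.

```python
import re

FENCE = re.compile(r"^\s*(```|~~~)")
# Lines that start something other than a plain paragraph (very rough).
NOT_PARAGRAPH = re.compile(r"^(#|:::|>|\s{4}|[-*+]\s|\d+[.)]\s)")


def annotate_paragraphs(markdown):
    """Prefix the first line of each plain paragraph with an empty span
    recording its 1-based source line number."""
    out, in_fence, previous_blank = [], False, True
    for number, line in enumerate(markdown.split("\n"), start=1):
        if FENCE.match(line):
            in_fence = not in_fence
            out.append(line)
            previous_blank = False
            continue
        blank = not line.strip()
        starts_paragraph = (
            not in_fence and not blank and previous_blank
            and not NOT_PARAGRAPH.match(line)
        )
        if starts_paragraph:
            line = "[]{{.src line={}}}".format(number) + line
        out.append(line)
        previous_blank = blank
    return "\n".join(out)
```

Every case the skip list waves away is a place where this has to start behaving like a real parser, which is how a preprocessing hack turns into a markdown engine.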
Let’s pick up that thread we started on — how hard could coding a markdown engine be?
Give the CommonMark Spec a look — it’s a clear and engaging read. It’s also maddening. I dare you to read to the section on block quotes without viewing this whole endeavor as a testament to the folly of man.
Block quotes, incidentally, are where my two-day adventure into writing a markdown engine ran aground — I was trying to parse the whole thing in one pass with the help of the LPEG library, but that seems doomed or intractable. But I might be able to hack it if I go back with a more sedate approach of parsing the syntax in steps.
But I digress and digress.
The Squigglemark Wishlist
All of that was preamble and contextualization. I felt it illustrates the nature of my frustration more vividly than if I had just sketched out what I wanted from the outset.
What is SquiggleMark? Nothing but vaporware, really. A wishlist, things I’d want markdown to have and might be able to implement this year, if I really felt the push.
And that’s part of why I even wrote this big rant. Is it worth it? Some of the arguments and anecdotes I recounted here I had almost forgotten about until this journalling exercise brought them back to mind.
So that’s why I’m writing this blog post. I want to lay out all of my reasonings and desiderata for SquiggleMark to more cogently evaluate if it’s something that’s worth my time.
SquiggleMark Wishlist
Let’s take it from the top, one more time:
(~~~ seems pretty squiggly!)
[]{}~foo for calling a function called foo on an empty span. (Maybe even []~f for raw text?) This could enable piping — []{}~foo~bar~baz — but that makes me wonder if and how you could branch. Could you compose and split squiggle-filter output syntactically? Should you?
<details> support (-# summary would begin a details block, and -# on its own would end it.)
__underlining__
||spoilers|| (^superscript^, ~~strikeout~~, and ~subscripts~ work in pandoc out of the box, so I’d want to port over those features, but that’s less exciting. The last two are potential vexations given the proposed ‘squiggle function’ syntax, though.)
>>> to begin a quote block, <<< to end it. Much more pleasant than littering >s everywhere in a long quotation.
[]{"Quotes in curly brackets syntax to set title attributes"}
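To gauge how much machinery the squiggle-function item would need, here is a speculative Python sketch of the piping case. Everything in it is an assumption, since SquiggleMark does not exist: the regex, the left-to-right application order, the function registry, and the choice to ignore the attributes inside the braces are all placeholders.

```python
import re

# [text]{attrs}~name~name...  (attributes are matched but ignored here)
SQUIGGLE = re.compile(r"\[([^\]]*)\]\{[^}]*\}((?:~[A-Za-z_]\w*)+)")


def apply_squiggles(text, functions):
    """Replace each [text]{}~f~g with g(f(text)), looking names up in functions."""
    def run(match):
        value = match.group(1)
        for name in match.group(2).split("~")[1:]:
            value = functions[name](value)  # unknown names raise KeyError
        return value

    return SQUIGGLE.sub(run, text)


# Usage with two hypothetical functions:
shouted = apply_squiggles(
    "say [hi]{}~upper~exclaim now",
    {"upper": str.upper, "exclaim": lambda s: s + "!"},
)
```

Even this toy shows the collision with ~subscripts~ noted above: a lone tilde after a closing brace is ambiguous unless one of the two syntaxes gives way.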
I think there’s some ideas I’m missing, but now if I remember I’ll have a convenient place to note them down. This blogpost was mostly for me, but if you found it interesting enough to read to the end, I’m flattered.