Template Strings
2024-02-17
Kudos!
This page was reviewed by several people! Please check them out as you are able.
- Joe LaFreniere
- Sam Grayson
Template strings make The Wrong Thing™ easier.
Disclosure: this post was written with considerable help from a neural network.
More specifically, I rambled into a mic while using Speech Note in combination with Faster Whisper to turn my ramblings into text, which I then edited with a text editor.
Background
So let’s back up a bit – what do I mean by a template string? Template strings are a feature found in JavaScript and maybe soon Java that look like this:
Hang on… that’s just a multiline string literal. Python[1], Ruby, and Java (among others) have those, and they’re not what I’m here to complain about.
No wait, this is string interpolation, and it’s basically a nicer version of sprintf.
I don’t want to complain about that!
```javascript
const {track_name} = sql`
  select track_name from tracklist
  where album = ${album_id}
  and disk_number = ${disk_number}
  and track_number = ${track_number}
`.single();
```
Ahhhhh, here we go.
Notice the leading `sql` in front of the backticks that make up the “body” of the template[2].
This prefix takes the components of the template (the separate string and expression components) and here presumably turns them into some moral equivalent of a prepared statement, which is tacitly executed against the database.
This right here is the issue: template strings don’t just take data and turn it into strings (which, in the case of SQL, is an actively bad idea); they do some kind of processing based on a separation of the string literal segments and the dynamic data segments.
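To make that separation concrete, here’s a minimal sketch of what a tag function actually receives. The `sql` function below is hypothetical (it isn’t any particular library’s API) and never touches a database: JavaScript hands it the literal segments as one array and the interpolated values as the remaining arguments, and it stitches them back together with positional placeholders, roughly the way a driver-backed tag might build a prepared statement.

```javascript
// Hypothetical `sql` tag: NOT a real library's API, and it executes nothing.
// JavaScript calls a tag as tag(strings, ...values): `strings` holds the
// literal segments, `values` holds the result of each ${...} expression.
function sql(strings, ...values) {
  // For any tagged template, strings.length === values.length + 1.
  let text = strings[0];
  values.forEach((_, i) => {
    // Emit a positional placeholder ($1, $2, ...) instead of the value,
    // so the data never becomes part of the SQL text itself.
    text += `$${i + 1}` + strings[i + 1];
  });
  return { text, values };
}

const album_id = 42;
const disk_number = 1;
const track_number = 3;

const query = sql`select track_name from tracklist
where album = ${album_id}
and disk_number = ${disk_number}
and track_number = ${track_number}`;

console.log(query.text);
// select track_name from tracklist
// where album = $1
// and disk_number = $2
// and track_number = $3
console.log(query.values); // [ 42, 1, 3 ]
```

A real library would also have to deal with identifiers, lists, nested fragments, and so on, which is roughly where a tag starts growing a parser of its own.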
Lossy Conversions
I understand why it seems immediately appealing, but let’s take a second to break down what exactly is happening here when you type a string into this template literal. You’re taking a structured representation of data in your head, in this case a SQL query, then turning it into a string (or a bunch of strings), and then at runtime, the template will parse the strings into a more structured form, bundle it with the supplied data, and return the result.
Ideally your template processing step does some kind of parsing and validation on the string itself, which means it needs to parse and/or validate the whole string before it starts executing whatever it is you need to execute. When one parses and validates a string, the stringified form is almost always turned into an abstract form of some kind, and when the template then executes or otherwise transforms this abstracted data, it has done 80% of the work of hosting a programming language inside the string literals of another programming language.
For most structured data that gets put into template strings (including but not limited to HTML and SQL), I think that’s really bad. It’s bad to take data that is fundamentally structured and could be entered in a way that’s idiomatic to the host language, and replace it with a programming language embedded in a string literal. We don’t by default embed data like arrays or simple structs/records/maps in strings if we can help it[3], but template string literals encourage it by their nature.
Congratulations! You’ve Just Lost Your Host Language
When you’re embedding a language inside your string literals, you’ve now given up all the advantages that you would ordinarily have with the tooling of the language that you’re usually working with. If you’re writing Java or TypeScript, now you don’t have a type system to lean back on, but even if you don’t usually program with static analysis assistance, you lose autocomplete, go-to-definition, and syntax highlighting when operating within your template strings[4].
If you’re extremely fortunate, your library might have an ESLint plugin that will parse the insides of your template literals, or IntelliJ might already be parsing and linting regular string literals annotated by the library in question (such as with Hibernate).
This is very useful, and it also means that ESLint and IntelliJ need multiple parsers operating inside a single Java[Script] source file: a parser for the usual Java[Script], and a conditional parser that has to figure out whether this particular [template] string literal is actually GraphQL, SQL, HTML, or just plain text.
If the tools for creating the AST that the template function eventually operates on were easier to access, the values of that AST could simply leverage the host language’s tooling for validation and be manipulated as ordinary expressions.
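As a rough illustration of what “the AST as ordinary values” could look like, here’s a sketch in the spirit of HoneySQL’s data-first style (but not its actual syntax or API): a WHERE clause written as nested JavaScript arrays and compiled by a hypothetical `compileWhere` function. The array shape is an assumption made up for this example.

```javascript
// Hypothetical: a WHERE clause as plain JavaScript data instead of SQL text.
// ["=", column, value] is a comparison; ["and" | "or", ...clauses] combines.
function compileWhere(clause, values = []) {
  const [op, ...args] = clause;
  if (op === "and" || op === "or") {
    // Compile each sub-clause, sharing one values array so placeholders line up.
    const parts = args.map((sub) => compileWhere(sub, values).text);
    return { text: "(" + parts.join(` ${op} `) + ")", values };
  }
  if (op === "=") {
    const [column, value] = args;
    // Only plain identifiers are accepted as columns; values become placeholders.
    if (!/^[A-Za-z_][A-Za-z0-9_]*$/.test(column)) {
      throw new Error(`not a plain identifier: ${column}`);
    }
    values.push(value);
    return { text: `${column} = $${values.length}`, values };
  }
  throw new Error(`unknown operator: ${op}`);
}

// The clause is an ordinary array, so ordinary code can build and edit it.
const where = ["and", ["=", "album", 42], ["=", "disk_number", 1]];
where.push(["=", "track_number", 3]);

const { text, values } = compileWhere(where);
console.log(text);   // (album = $1 and disk_number = $2 and track_number = $3)
console.log(values); // [ 42, 1, 3 ]
```

Nothing here needs a special parser or editor plugin: a typo in an operator name is a runtime error thrown by ordinary JavaScript, and the clause can be inspected or rewritten with array methods.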
All these arguments apply almost equally well to general-purpose embedded template languages, like Jinja, ERB, or Handlebars. At time of writing, this blog is generated with Jekyll, which uses a templating language called Liquid for anything that needs to be templated (which, because of some of my own silly design choices, includes HTML files and at least one JSON file). Some of the pain points of template strings apply just as much to templates – especially the tooling ones. It is virtually impossible to use schema validation tools against e.g. a Liquid template of an XML or ASCII Protobuf file, and your tooling almost certainly doesn’t enjoy working with it.
You’ve Lost Your Embedded Language Too
Maybe, though, that’s not a persuasive argument to you. The most common case I’ve seen for string templates is for already existing languages, especially HTML and SQL, rather than some invented language created to operate as a special DSL. In a case like that, the reason someone would adopt a template string is to lower the barrier between the thing you’re typing in and the thing that’s going to end up getting generated. While I certainly think it’s useful to be able to transparently write and see the language you’re going to generate, I still think this is a suboptimal solution to the problem, in part because the preexisting language probably already has good tools for writing it in its own source files. If there’s any amount of editor support for the language, you lose all of it by embedding it in a string rather than in its own source file, and the more widely adopted the language, the more likely it is to have good tools. SQL is an obvious example of a language that’s a prime candidate for template-stringification, but that also benefits from being iterated on in a tool that understands SQL and is connected to your database. Allowing it to be written in its own file means not giving up on that by default.
You Wouldn’t Do This to JSON
But maybe this line of argument doesn’t appeal to you – after all, with a template string you get the benefits of writing your HTML/SQL/LaTeX[5] inline without the security risks of direct string interpolation, or the inconvenience of learning a generator library or DSL. If you’re writing a little HTML, it seems really straightforward why you’d want to just write the HTML instead of fumbling with some sort of possibly awful wrapper API, hoping that it turns into the HTML you want. This rationale strongly resonates with me, but I still don’t find myself fully persuaded.
Would you do this with JSON[6]? When writing your JavaScript or Python or whatever, would you ever prefer this:
```javascript
const json_str = jsonStringify`[{"key1": ${num1}, "key2": 3}]`;
```
to this?
```javascript
const json_str = JSON.stringify([{key1: num1, key2: 3}]);
```
I certainly wouldn’t! Even if the variable data is sparse, when there’s such an obvious correspondence between builtin data structures[7], the first way adds so little value that no one would ever think to do it. If the correspondence between the AST of these would-be embedded languages and “normal” data structures and values were made more obvious or available, I think people would find very little incentive to reach for template strings.
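To see how little the tagged version buys, here’s one way the hypothetical `jsonStringify` tag from the first snippet could be written (the name comes from that snippet; the implementation is my own assumption). To validate its input, it ends up handing the whole string to a JSON parser, i.e. it hosts JSON inside JavaScript only to reproduce what `JSON.stringify` already does directly from values.

```javascript
// Hypothetical tag matching the jsonStringify example above (not a real API).
// It JSON-encodes each interpolated value, then has to parse the whole result
// just to confirm that the literal parts formed valid JSON.
function jsonStringify(strings, ...values) {
  let text = strings[0];
  values.forEach((value, i) => {
    text += JSON.stringify(value) + strings[i + 1];
  });
  // JSON.parse throws a SyntaxError if the template text was malformed;
  // re-serializing normalizes whitespace to match JSON.stringify's output.
  return JSON.stringify(JSON.parse(text));
}

const num1 = 5;
const viaTemplate = jsonStringify`[{"key1": ${num1}, "key2": 3}]`;
const viaData = JSON.stringify([{ key1: num1, key2: 3 }]);

console.log(viaTemplate);             // [{"key1":5,"key2":3}]
console.log(viaTemplate === viaData); // true
```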
Host Language Alternatives And Examples
DSLs
I’m aware of a handful of examples of libraries that allow embedding a separate programming language’s semantics layered on top of a host language in the form of a DSL[8].
The Phlex library in Ruby is used to generate HTML, and uses Ruby’s block and method syntax to mimic some of the shape of HTML itself.
Because Phlex uses stateful methods to represent its HTML generation, you unfortunately can’t manipulate parts of the HTML construction as variables; instead, your primary means of abstraction is function nesting and higher-order methods. An example of an HTML DSL that is value-oriented rather than method-oriented (which I personally prefer) is JSX.
JSX kinda feels like cheating here, because it’s a language extension to JavaScript that has merely become so pervasive that existing tooling has to accommodate it. That said, I appreciate JSX as a good solution to the language-embedding problem, because your JSX expressions are JavaScript expressions and can be generated and manipulated like any other value inside the JSX-JavaScript language.
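With the classic JSX transform, an expression like `<li className="track">{name}</li>` compiles to a call along the lines of `React.createElement("li", { className: "track" }, name)`, which is why JSX elements really are just values. Below is a dependency-free sketch of that idea, with a hypothetical `h` function standing in for `createElement`; the `{ type, props, children }` shape is an assumption for illustration, not React’s internal representation.

```javascript
// Hypothetical stand-in for what JSX compiles to (roughly createElement).
// Each element is a plain object, so it can be stored, passed, and mapped.
function h(type, props, ...children) {
  return { type, props: props ?? {}, children };
}

// Roughly what <li className="track">{name}</li> would produce, per track.
const trackNames = ["Intro", "Verse", "Outro"];
const items = trackNames.map((name) => h("li", { className: "track" }, name));

// Elements are ordinary values: filter them, nest them, keep them in variables.
const list = h("ol", null, ...items.filter((li) => li.children[0] !== "Verse"));

console.log(list.type);                                 // ol
console.log(list.children.map((li) => li.children[0])); // [ 'Intro', 'Outro' ]
```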
One other large example of DSLs in programming languages is in-language query builders.
Libraries like SQLAlchemy Core for Python, LINQ in C#, Ecto in Elixir, or jOOQ in Java are often a reasonable choice for solving the language-embedding problem.
In a type safe language the construction of the query is checked by your type checker, which might be able to significantly cut down on edit-query/rerun-application cycles. Even if you’re operating in a dynamic language, you still have the benefit of syntax highlighting, basic parsing validation, brace-balancing, and whatever other tooling support the host language usually has.
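For anyone who hasn’t used one, here’s a toy version of the query-builder shape in JavaScript. `Query` and `trackQuery` are hypothetical names, and this is far simpler than SQLAlchemy Core, LINQ, Ecto, or jOOQ; the only point is that each piece of the query is a host-language value, so your editor and ordinary control flow can work with it.

```javascript
// Hypothetical immutable query builder (not any real library's API).
// Each method returns a new Query, so partial queries are reusable values.
// Table and column names are assumed to come from code, not user input.
class Query {
  constructor(table, columns = ["*"], conditions = []) {
    this.table = table;
    this.columns = columns;
    this.conditions = conditions; // list of [column, value] equality checks
  }
  select(...columns) {
    return new Query(this.table, columns, this.conditions);
  }
  where(column, value) {
    return new Query(this.table, this.columns, [...this.conditions, [column, value]]);
  }
  toSQL() {
    // Values are kept separate from the text and referenced as $1, $2, ...
    const values = this.conditions.map(([, value]) => value);
    let text = `select ${this.columns.join(", ")} from ${this.table}`;
    if (this.conditions.length > 0) {
      text += " where " + this.conditions
        .map(([column], i) => `${column} = $${i + 1}`)
        .join(" and ");
    }
    return { text, values };
  }
}

// Ordinary control flow decides which clauses to include.
function trackQuery(album_id, disk_number) {
  let q = new Query("tracklist").select("track_name").where("album", album_id);
  if (disk_number !== undefined) q = q.where("disk_number", disk_number);
  return q.toSQL();
}

console.log(trackQuery(42).text);
// select track_name from tracklist where album = $1
console.log(trackQuery(42, 2).text);
// select track_name from tracklist where album = $1 and disk_number = $2
```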
Hiccup-Likes
I’ve pulled a lot of these ideas from my outside view of the Clojure community, and especially from doing a little bit with Hiccup in Clojure and CLJS. Hiccup lets you describe your HTML like this:
In Clojure, hiccup and hiccup-like libraries are generally implemented as compile-time macros, but there’s no reason that other languages couldn’t implement similar libraries as functions, with (IMO) minimal loss of benefit. Hiccup has been especially inspirational within the Clojure world, with similar ideas for SQL present in HoneySQL, and hiccup-like DSLs for HTML have spread to Clojure-family Lisps like Janet and Fennel. In fairness, hiccup isn’t itself a wholly new idea, and similar ideas have existed in older Lisps, like SXML for Scheme, but I first came into contact with this style of library via hiccup, as I think many people have, so I’m calling them hiccup-likes.
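Hiccup itself is Clojure, but to make the “as functions” point concrete, here’s a hypothetical hiccup-like renderer in JavaScript, with arrays standing in for Clojure vectors and a plain object standing in for the attribute map. `render` and `escapeHtml` are made-up names, and the escaping is deliberately minimal; treat it as a sketch, not a safe production HTML generator.

```javascript
// Hypothetical hiccup-like renderer: [tag, attrs?, ...children] arrays -> HTML.
// Only a sketch: no void elements, no boolean attributes, minimal escaping.
const escapeHtml = (s) =>
  String(s)
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;");

function render(node) {
  // Anything that isn't an array is a text node and always gets escaped.
  if (!Array.isArray(node)) return escapeHtml(node);
  const [tag, maybeAttrs] = node;
  // An optional plain object in second position holds the attributes.
  const hasAttrs =
    maybeAttrs !== null && typeof maybeAttrs === "object" && !Array.isArray(maybeAttrs);
  const attrs = hasAttrs ? maybeAttrs : {};
  const children = node.slice(hasAttrs ? 2 : 1);
  const attrText = Object.entries(attrs)
    .map(([k, v]) => ` ${k}="${escapeHtml(v)}"`)
    .join("");
  return `<${tag}${attrText}>${children.map(render).join("")}</${tag}>`;
}

// The tree is built with ordinary array operations like map and spread.
const tracks = ["Intro", "Fish & Chips", "Outro"];
const page = ["ul", { class: "tracks" }, ...tracks.map((t) => ["li", t])];
console.log(render(page));
// <ul class="tracks"><li>Intro</li><li>Fish &amp; Chips</li><li>Outro</li></ul>
```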
One of the things that’s very nice about hiccup-likes is that they not only embody a reasonably compact representation of the underlying language, they also have the benefit that every Clojure vector/symbol/map you put into the tree is the same Clojure vector/symbol/map you always work with and already know how to manipulate.
This comes with a significant downside: by encoding a different language inside the host language, you inherit significant chunks of the host language’s problems.
Incorporating Embedded Languages Better
One last cool thing from the Clojure ecosystem is a library called HugSQL. An often-lauded feature of Clojure (and many other Lisps) is macros: user-definable code snippets that change the syntax of the language at compile time. HugSQL’s macros let you annotate a SQL source file and generate callable Clojure functions from the SQL.
Wowee, look at that! SQL in the SQL files, Clojure in the Clojure files, and all of my impedance sufficiently matched: no DSL required.
This isn’t a capability a lot of languages have – at least not without non-standard tooling.
While you could write a tool that generates JavaScript, or a compiler plugin to generate Java, doing so often puts you out of reach of the rest of the language’s standard tools (type checkers, language servers), and it’s just a relatively unusual thing to do[10].
There are languages without arbitrary macros where code generation programs are more normalized – go generate has been a part of Go since at least 2014, and most C build systems and related tooling have to be able to accommodate code generators because of the pervasiveness of tools like yacc/bison, which were themselves enabled by the still-often-used make build system.
I would probably shy away from something like this if it’s not hygienic in your particular host language, but if it is, go nuts[11].
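JavaScript has no macros to do this at compile time, but as a rough runtime approximation of the HugSQL idea (not HugSQL’s actual file format or behavior), a small loader can turn a SQL file with name annotations into callable functions. `loadQueries`, the `-- :name` header convention, and the `:param` syntax here are assumptions loosely modeled on HugSQL. You give up compile-time checking, but the SQL itself can live in a `.sql` file where SQL tooling works on it.

```javascript
// Hypothetical, heavily simplified HugSQL-style loader, done at runtime
// rather than with macros. In a real project this text would live in its
// own .sql file (read with fs.readFileSync); it's inlined to stay runnable.
const tracksSql = `
-- :name trackByPosition
select track_name from tracklist
where album = :album_id
and disk_number = :disk_number
and track_number = :track_number

-- :name tracksOnAlbum
select track_name from tracklist where album = :album_id
`;

function loadQueries(source) {
  const queries = {};
  // Split on "-- :name <identifier>" header lines. Because the regex has a
  // capture group, split() yields [preamble, name1, body1, name2, body2, ...].
  const parts = source.split(/^-- :name (\w+)[ \t]*$/m);
  for (let i = 1; i < parts.length; i += 2) {
    const name = parts[i];
    const body = parts[i + 1].trim();
    queries[name] = (params) => {
      const values = [];
      // Naive: replaces every :word, so it would mangle string literals or
      // Postgres :: casts. Each :param becomes $1, $2, ... in order of use.
      const text = body.replace(/:(\w+)/g, (_, key) => {
        if (!(key in params)) throw new Error(`missing parameter: ${key}`);
        values.push(params[key]);
        return `$${values.length}`;
      });
      return { text, values };
    };
  }
  return queries;
}

const db = loadQueries(tracksSql);
const q = db.trackByPosition({ album_id: 42, disk_number: 1, track_number: 3 });
console.log(q.text);
// select track_name from tracklist
// where album = $1
// and disk_number = $2
// and track_number = $3
console.log(q.values); // [ 42, 1, 3 ]
```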
Polyglot Tooling
You might be thinking, “If all your issues are about tooling inadequacy why don’t we just improve the tooling support for template strings?” My response is mostly that we’re not there yet, and it seems like a sufficiently difficult problem space that we might never be there.
There are more than a few sub-languages that would benefit from this – GraphQL and SQL come readily to mind – and some more general or configuration languages might benefit from being able to support sublanguage tooling when editing them, like shell (awk, bc) or YAML.
That said, I think you run into issues pretty quickly. They might not be insurmountable – heaven knows any web language tool ends up being a tool for all the other web languages – but in my nonexpert opinion, the problems are twofold:
- Firstly, scaling polyglot tooling is at the moment an M×N problem, where M is the number of possible host languages and N is the number of possible embedded languages. There are generally fewer embedded languages, so it’s still mostly manageable for e.g. the JetBrains IDEs to do pretty well, but they have some visible limitations because…
- Determining which language you’re embedding from string literals seems likely to be a bothersome proposition. In a hypothetical future where both JSON and EDN were commonly embedded languages, deciding whether `{"key" "value"}` is a nonidiomatic EDN map or invalid JSON isn’t possible without either a lot of context-sensitive parsing or the developer just telling the tools on a case-by-case basis. This is a contrived example and I don’t have the expertise to make a stronger case than this, but I suspect that not only is this hard, it’s hard in ways that make errors seem nondeterministic, and we’re better off pursuing strategies with more obvious failure modes.
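The JSON half of that ambiguity is easy to check mechanically; the hard part is knowing which checker to run. This small sketch (`isValidJson` is a hypothetical helper) only confirms that a JSON parser rejects the snippet; a tool would still have to guess, or be told, that EDN was intended instead, where the same text is a one-entry map because EDN map entries need no colons or commas.

```javascript
// Hypothetical helper: reports whether a string parses as JSON.
function isValidJson(text) {
  try {
    JSON.parse(text);
    return true;
  } catch (err) {
    // JSON.parse signals malformed input with a SyntaxError.
    if (err instanceof SyntaxError) return false;
    throw err;
  }
}

const snippet = '{"key" "value"}';
console.log(isValidJson(snippet));            // false
console.log(isValidJson('{"key": "value"}')); // true
```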
Conclusion
I hate writing conclusions.
I don’t like dedicated template languages much, and template strings are just those but slightly better integrated. Either write the host language or write the embedded language. Don’t put important logic in string literals.
Footnotes
1. Python definitely has these; they’re just not set off as a distinct language feature in the language reference anywhere. They’re spelled as `'''text'''` or `"""text"""`. If you can read EBNF, you can see it in the string literal parsing section as “longstring”. This surprised me! For a long time I thought they were called “docstrings”, because they’re the most common string literal used in docstrings, but it seems like they don’t have a dedicated name. ↩
2. MDN (and I assume the ECMAScript standard) refers to this feature as a Tagged Template, but the feature as a whole is known as a Template Literal. The (preview) Java feature has a similar preprocessing mechanism without referring to a preprocessed expression as a different thing. Since the other parts of the feature aren’t very interesting to me, I’m going to refer to the processing feature as a “Template” or as “Template Strings” when criticizing it. ↩
3. Oh, hi there shell. Please go away. ↩
4. My favorite way to syntax-highlight SQL is entirely in string-literal color (white). ↩
5. I haven’t seen anyone do this, but presumably you could do something scary by combining arbitrary LaTeX in your templates with some kind of CAS or other complicated math tool. Please nobody implement this. ↩
6. While JSON is a data language, not a markup or query language, ultimately it operates on similar principles as other commonly embedded languages like HTML and SQL (and, more proximally, XML). ↩
7. Admittedly, this argument doesn’t work as well in Java, which doesn’t support heterogeneous arbitrary data encoded as lists and maps nearly as easily. That said, if your only goal is to turn this data into a JSON string, you might be able to get away with using lots of `List.of` and `Map.of` (tacitly decaying all the generics to `Object`), and then punting the final expression to Jackson or something like it. I have to admit (with shame) that I have myself encoded JSON inside strings that I then called `.formatted` on in Java, but only for unit tests with trusted/known outputs. ↩
8. I thought about trying to either define or justify using the term “DSL”, but I’m not sure I could define it, or that it refers to a single coherent idea within the collective programmer unconscious. ↩
9. Fun Fact! When trying to build this page after first writing this example, I got a build error, because apparently the rendered markdown is fed directly into Liquid, which assumed this was part of a template and needed the rest of the conditional. This feels relevant somehow, but I don’t care to do the work to figure out how. ↩
10. But not totally unheard of. Things like Protobuf end up generating code based on schema files in multiple languages (including Java), and the Apollo GraphQL client for TypeScript does vastly improved typechecking if you can stomach its code generator. Neither of these is terribly pleasant to work with, but it’s often better than nothing. ↩
11. I was going to make a joke somewhere in here about people being able to use Rust macros or generalized C++ metaprogramming (!) to do HugSQL-like things, with the punchline being “but please don’t”, but uh… they already are, and honestly, it’s an easy punchline that I don’t think either language really deserves. I’d consider using libraries like that if I were working on something with a SQL database in either language and had a pretty strong sense of ownership over the project. ↩