Problem
Macros are compiler extensions (really, mini compilers themselves) that translate from a Clojure-like syntax (s-expressions) into Clojure syntax (s-expressions). Most macros contain some aspects of these stages:
- Parsing and validating the input syntax structure
- Transforming the structure
- Outputting new structure (often by quoting or syntax quoting new forms)
The first step is effectively a hand-written parser. This leads to the following problems with macro definitions:
- Validation is usually incomplete, checking only for the most common syntactic errors
- Errors are reported from inside the macro implementation or its expansion rather than at the offending part of the user's original syntax
- Syntax documentation is hand-written and can thus be at odds with the parser
Due to the above problems, macros are often hard to write and hard to maintain, especially when more attention is paid to validation and error messages.
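To make the three stages concrete, here is a minimal sketch of a hand-written macro. The macro `my-let` and its helper `parse-bindings` are hypothetical names invented for this illustration; they are not part of Clojure or of any library discussed here. The validation is deliberately incomplete in the way described above: it checks the shape of the binding vector but not that each binding target is a symbol.

```clojure
;; Hypothetical example: a simplified let that binds plain symbols only.

(defn parse-bindings
  "Stage 1: hand-written parsing and validation of the binding vector."
  [bindings]
  (when-not (vector? bindings)
    (throw (IllegalArgumentException.
            "my-let requires a vector for its bindings")))
  (when-not (even? (count bindings))
    (throw (IllegalArgumentException.
            "my-let requires an even number of forms in binding vector")))
  ;; Not checked: that each binding target is actually a symbol.
  (partition 2 bindings))

(defmacro my-let
  [bindings & body]
  (let [pairs (parse-bindings bindings)]          ; 1. parse and validate
    (reduce (fn [inner [sym init]]                ; 2. transform into nested lets
              `(let [~sym ~init] ~inner))         ; 3. output via syntax quote
            `(do ~@body)
            (reverse pairs))))

(my-let [a 1 b (inc a)] (+ a b)) ;; => 3
```

With input such as `(my-let [1 a] a)` the hand-written checks pass, so the failure only surfaces later, from the `let` in the expansion rather than from `my-let` itself.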
Proposal
Instead, define a standard grammar notation to describe the input syntax for a macro, and a generic parser facility that can validate the input according to the grammar. In the case of an error, the parser should automatically produce an error message describing what was provided as input, what the grammar expected, and what was found instead.
The facility provided for macro writers should be similar in spirit to the existing destructure function: given an input form and a grammar, produce a binding set. In the case of a failure, this function should provide an error message, data describing where the parse failed, or both.
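As a rough sketch of the shape such a facility could take, the function below takes a grammar and an input form and returns either a binding set or data describing the failure. The name `destructure-parse` is taken from the Integration section of this page, but the signature, the grammar representation (a vector of `[binding-name description predicate]` triples matched positionally), and the result keys are all assumptions made for illustration.

```clojure
;; Hypothetical sketch of the interface only; a real grammar needs far more
;; than positional matching.

(defn destructure-parse
  "Matches form against grammar, a vector of [binding-name description pred]
  triples. Returns {:bindings {...}} on success, or {:error {...}} describing
  the input, the position of the first mismatch, what was expected, and what
  was found."
  [grammar form]
  (loop [rules grammar, items (seq form), pos 0, bindings {}]
    (cond
      (and (empty? rules) (empty? items))
      {:bindings bindings}

      (empty? rules)
      {:error {:input form :position pos
               :expected "end of input" :found (first items)}}

      :else
      (let [[bname description pred] (first rules)]
        (cond
          (empty? items)
          {:error {:input form :position pos
                   :expected description :found "end of input"}}

          (pred (first items))
          (recur (rest rules) (next items) (inc pos)
                 (assoc bindings bname (first items)))

          :else
          {:error {:input form :position pos
                   :expected description :found (first items)}})))))

;; A toy grammar for the arguments of a two-argument def.
(def def-grammar
  [[:name "a symbol" symbol?]
   [:init "an init expression" (constantly true)]])

(destructure-parse def-grammar '(x 42))
;; => {:bindings {:name x, :init 42}}

(destructure-parse def-grammar '("x" 42))
;; => {:error {:input ("x" 42), :position 0, :expected "a symbol", :found "x"}}
```

Returning error data rather than throwing leaves the caller free to decide how to format the message and where to report it.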
What do grammars need to describe?
Grammars are conceptually similar to regexes. Like a regex, a grammar should both describe the macro structure to parse ("is this valid?") and capture values as the parsing occurs for later use.
Additionally, we wish to produce (as automatically as possible) informative and useful error messages. Some extra information may need to be provided to assist.
- Terminals - keywords, symbols, strings (literal or regex?), characters, numbers, booleans, vectors, sets, maps, lists, sequentials
- Composition
  - concatenation
  - alternation
  - optional (0 or 1)
  - one or more
  - zero or more
  - repeat - fixed number in any order
  - negative lookahead (NOT this branch)
- Capture
  - mark beginning
  - mark ending
  - create new binding
  - update existing binding
- Error message production
  - user-friendly rule names (for use in errors)
  - custom validation and errors (for when that's needed)
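The requirements above can be sketched with a handful of parser combinators over sequences of Clojure forms. Everything below is a hypothetical illustration: the function names (`term`, `capture`, `capture-conj`, `then`, `either`, `optional`, `zero-or-more`, `parse`) are invented here and are not the API of any of the candidate libraries. The sketch covers terminals, concatenation, alternation, optional, zero or more, creating and updating bindings, and rule names in errors; it omits one or more, repeat, lookahead, and begin/end marks.

```clojure
;; A parser is a function from a state {:input seq, :bindings map} to either
;; a new state or {:error {...}}.

(defn fail [expected input]
  {:error {:expected expected
           :found (if (seq input) (first input) :end-of-input)}})

(defn failed? [result] (contains? result :error))

(defn term
  "Terminal: matches one form satisfying pred. label names the rule in errors."
  [label pred]
  (fn [{:keys [input] :as state}]
    (if (and (seq input) (pred (first input)))
      (assoc state :input (rest input) :matched (first input))
      (fail label input))))

(defn capture
  "Capture: creates binding k for the form matched by parser."
  [k parser]
  (fn [state]
    (let [result (parser state)]
      (if (failed? result)
        result
        (assoc-in result [:bindings k] (:matched result))))))

(defn capture-conj
  "Capture: updates binding k by appending the form matched by parser."
  [k parser]
  (fn [state]
    (let [result (parser state)]
      (if (failed? result)
        result
        (update-in result [:bindings k] (fnil conj []) (:matched result))))))

(defn then
  "Concatenation: runs parsers in order, stopping at the first failure."
  [& parsers]
  (fn [state]
    (reduce (fn [st parser]
              (let [result (parser st)]
                (if (failed? result) (reduced result) result)))
            state
            parsers)))

(defn either
  "Alternation: first parser to succeed wins; label names the rule otherwise."
  [label & parsers]
  (fn [state]
    (or (first (remove failed? (map #(% state) parsers)))
        (fail label (:input state)))))

(defn optional
  "Optional (0 or 1): on failure, continue from the unchanged state."
  [parser]
  (fn [state]
    (let [result (parser state)]
      (if (failed? result) state result))))

(defn zero-or-more
  "Zero or more: repeats until parser fails or stops consuming input."
  [parser]
  (fn [state]
    (loop [st state]
      (let [result (parser st)]
        (if (or (failed? result)
                (= (count (:input result)) (count (:input st))))
          st
          (recur result))))))

(defn parse
  "Runs parser over form and requires that all input is consumed."
  [parser form]
  (let [result (parser {:input (seq form) :bindings {}})]
    (cond
      (failed? result)      result
      (seq (:input result)) (fail "end of input" (:input result))
      :else                 (select-keys result [:bindings]))))

;; A deliberately simplified, single-arity defn grammar.
(def defn-grammar
  (then (capture :name (term "a symbol naming the function" symbol?))
        (optional (capture :doc (term "a docstring" string?)))
        (capture :params (term "a parameter vector" vector?))
        (zero-or-more (capture-conj :body (term "a body form" (constantly true))))))

(parse defn-grammar '(foo "adds one" [x] (inc x)))
;; => {:bindings {:name foo, :doc "adds one", :params [x], :body [(inc x)]}}
(parse defn-grammar '(foo "doc" x))
;; => {:error {:expected "a parameter vector", :found x}}
```

Because states are immutable maps, a failed branch inside `optional`, `either`, or `zero-or-more` leaves no partial bindings behind, which is one reason capture and backtracking compose cleanly in this style.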
Possible parsers
Questions to consider:
- What are the dependency constraints?
- What are the parsing capabilities?
- What are the capture capabilities?
- What are the error messages like when errors occur? Are there ways to customize?
- Are grammars composable? Nameable?
- Performance? (not a key driver, but worst case may be important)
Parsers:
| | error-test | seqex | Instaparse |
|---|---|---|---|
| Deps | No external deps; uses c.string/replace, c.set/union | No external deps; uses c.string/join, c.set/union, difference, intersection, subset? | No external deps; uses c.string/replace, join |
| Parsing | all above | all above | most of above, but in terms of string matching at the bottom, not Clojure forms |
| Capture | Create or update binding set during parse | cap/recap + arbitrary functions to build custom capture | Produces AST, with some control for post-processing |
| Error messages | Generates error based on expected vs found, allows for custom names. | Generates error based on expected vs found, allows for custom names. | ? |
| Composable | Yes, can build up rules. | Yes, can build up models. | ? |
| Perf | ? | ? | ? |
| Docs | ? | ? | ? |
| Size | small | medium | medium (not all needed though) |
| Where is it used? | Cursive | | |
Some others:
Error message considerations
- How do we tie macro errors back to the point where user provided incorrect input?
- Does the reader need to provide more information to produce good error location information?
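As background for the second question, the small experiment below inspects what location metadata the stock reader attaches today. It assumes the current behaviour of Clojure's built-in reader when given a `LineNumberingPushbackReader`; the source string and the `my-let` symbol inside it are made up for the example.

```clojure
(import '(clojure.lang LineNumberingPushbackReader)
        '(java.io StringReader))

;; A form whose binding vector is missing an init expression for b.
(def source "(my-let [a 1\n         b]\n  (inc a))")

(def form
  (read (LineNumberingPushbackReader. (StringReader. source))))

(:line (meta form))          ; the outer list carries a line number
(:line (meta (nth form 2)))  ; so does the nested list (inc a)
(meta (second form))         ; the binding vector carries no metadata
```

A grammar-based parser can therefore point at the enclosing list, but locating an error inside a binding vector or at an individual symbol would need more information than the reader records in this experiment.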
Integration
New function: destructure-parse
- How early during core.clj bootstrap can we integrate destructure-parse?
Produce grammars for existing macros/macro-parts in Clojure core:
- destructuring
- parameters (with and without rest)
- declare
- def* body
- arities
- defn
- fn
- letfn
- local-binding
- for
- defmethod
- gen-class
- ns
- in-ns
- defprotocol
- deftype
- defrecord
- reify
- extend-type, extend-protocol
- proxy
- definterface
- condp
- case
Grammar examples
seqex let example:
error-test example:
Example bugs that could be resolved by macro grammars
These are some open bugs that call for additional validation in a core macro. The invalid input in each could be caught automatically if the macros were defined with a grammar:
- http://dev.clojure.org/jira/browse/CLJ-1630 - destructuring
- http://dev.clojure.org/jira/browse/CLJ-5 - destructuring
- http://dev.clojure.org/jira/browse/CLJ-1473 - defn
- http://dev.clojure.org/jira/browse/CLJ-1629 - defn
- http://dev.clojure.org/jira/browse/CLJ-888 - defprotocol
- http://dev.clojure.org/jira/browse/CLJ-1029 - ns
- http://dev.clojure.org/jira/browse/CLJ-1149 - ns
Example defn problems:
Example ns problems:
Example let/destructuring problems:
References
- "Macros-by-Example" - Kohlbecker & Wand - ftp://www.cs.indiana.edu/pub/techreports/TR206.pdf
- "Fortifying Macros" - Culpepper & Felleisen - http://www.ccs.neu.edu/racket/pubs/icfp10-cf.pdf
- "Illuminated Macros" - Houser & Claggett - https://www.youtube.com/watch?v=o75g9ZRoLaw
- "Improving Clojure's Error Messages with Grammars" - Fleming - https://www.youtube.com/watch?v=kt4haSH2xcs
- "Compiler Errors for Humans" - Czaplicki - http://elm-lang.org/blog/compiler-errors-for-humans
Other possible uses
These are not primary goals of this effort, but perhaps lie nearby:
- Tool for printing the syntax of a macro
- Using the same syntax for documenting the grammar of *functions*
- Tool for "grammar expanding" a macro, seeing what gets matched, backtracking, etc