Skip to end of metadata
Go to start of metadata

Problem

The AST nodes produced by the ClojureScript analyzer have a :children key to allow generic traversals. Unfortunately, this key is currently a denormalized view of child ast nodes.

For example, an :if node has :test, :then, and :else keys whose values are also contained in :children. This presents some difficulty when writing AST transformations because every transform function must be aware of this duplication. To correctly replace the :test of an :if node, you must also replace the first element of the :children vector.

Discussion

Design

  • Solution must retain "AST is just data" property. ie No API should be required to interpret it
  • Idea: store :children as keys into the AST map
    • would be nice: (defn children [ast] ((apply juxt (:children ast)) ast))
      • but need to handle implicit do blocks: (defn children [ast](mapcat #(if (coll? %) % [%]) ((apply juxt (:children ast)) ast)))
    • ticket: http://dev.clojure.org/jira/browse/CLJS-289
    • outstanding issue: interleaved bindings
  • Alternate idea: all child nodes are moved from the top level into a map in :children
    • For "if" this would change {:op :if :env {env} :test test-expr :then then-expr :else else-expr :children [test-expr then-expr else-expr]} into {:op :if :env {env} :children {:test test-expr :then then-expr :else else-expr}} 
      • This idea doesn't address blocks or bindings.
      • dnolen: What about forms like let where :children is a bit complex?
    •  
      :children (into (vec (map :init bes))
                           (conj (vec statements) ret)) 
       
       
      acohen: I think that should become: {:op :let :children {:init [inits] :statements [statements] :ret ret}
       We might want to discuss whether the inits are truly a child here, at the moment I think they are though.
       
       
      dnolen: So if a :children value entry is a vector it's something you need to iterate over? What about :ret? So you'll always need to look at :children if you want to know :ret?
        
      acohen: Yes, if :children entry is a seq, it must be a seq of nodes. For :let, I forgot that this is something else I changed in CinC. I made bindings an :op.
      So it should actually look like:
      {:op :let 
       :children {:inits [{:op :binding :name x :children {:init init-expr}} 
                          ...] 
                  :statements [{...} ...] 
                  :ret ret}}
       
      dnolen: What about :ret? So you'll always need to look at :children if you want to know :ret?
       
      acohen: Yep, that's right. Tree traversal would _always_ go through :children, rather than having this separation of "syntax-aware walkers go through the nodes that they know the names of" and "generic walkers go through :children"

      bbloom: I think this approach suffers another issue: Generic walkers might need to know evaluation order. I should be able to post-walk the :childrens nodes to get a strict top-to-bottom, inside-out, left-to-right sequence of nodes, ignoring conditional and repeated evaluation. Otherwise, I need knowledge of nodes to tell if side effects from :init may happen before those from :statements

      acohen: Can we mandate that the children map be a map-like structure that preserves order?

      bbloom: I don't think that's reasonable unless core gets such a map-like structure.
       
Labels:
  1. Aug 29, 2012

    I'm interested in re-invigorating this discussion. I'm currently working on robust ANF/CPS transforms and running into various complexity caused by this. It's impossible for any AST transformer to know if a down-stream compiler phase will use :children or not. The first issue I hit was modifying :items in the :vector op. Turns out that the emit phase looks at :children instead of :items. Whoops! I've been resorting to building a form and then calling analyze on that.

    1. Mar 09, 2013

      Is there really enough utility in :children to build it into the AST spec? What's wrong with providing a `children` multimethod for people to use when needed?

      We obviously need to completely eliminate redundancies. Redundant :children simply makes all but the most trivial code transformations nearly impossible to perform correctly. I have no idea how I'd start handling the subtleties of :children in a hygienic AST transformation, which is a simple transformation compared to CPS.

      I think it would be a difficult task to include the children keys in each :op. There are at least three different types of children expressions: single expressions, sequence of expressions (implicit do) and local-binding inits. Distinguishing them would most likely result in a convoluted interface for children. Even if we found a solution, do we expect each consumer of an AST to define their own `children` function? Is this any better than just providing a multimethod to :refer as needed?

      In my (biased) opinion: children are strictly convenience and they hinder much more than they help.

      1. Mar 12, 2013

        I'm trying to be guided by principles that Rich outlined in the email thread here

         

        :children is admittedly half-baked, but remember all things that are baked were once half-baked :)

        Here are the objectives:

        1) The AST as data. Full stop. That is part of the CLJS experiment. We've seen enough code- and API-based expression tree libraries.

        2) Supporting generic code walkers. Code walking has traditionally been a difficult and brittle thing, as everyone encodes the primitives. Thus, adding a new primitive becomes impossible. Plus, that encoded knowledge itself is quite complex and duplicated between code walkers. Given that the AST data is a mix of things that would need to be walked and other details, there needs to be a mechanism to distinguish them. While a multimethod is one way to do make it an open system, it means that to manipulate your new AST node I don't just need the AST, I need your code. What about code walkers and analyzers etc as a service?

        So, the problems I see with what we currently have vs these objectives are:

        a) Inconsistency in the contents and use of :children.
        b) Duplication

        I wonder somewhat about (b). Given they are just values, in memory at least, they are shared. But yes, when printed or otherwise reified it's a bear.

        :children as a vector of keys seems promising. Next steps in that direction would be to propose a standard way to do that, determine what that would imply for each construct we currently have, and assess the sufficiency for generic code walking. Along the way, consider what other semantic clues the AST might need to provide as data. Only then, implement for what we have.

        1. Mar 12, 2013

          At first, I was in favor of :children as a vector of keys, but now I'm not so sure.

          Transformations inherently involve creating new nodes. "assoc" is useless in the presence of :env, :children, or other synthesized, non-essential attributes, since you need to get out the essential fields and re-analyze that new AST node or re-analyze a raw form. Any cruft data from a previous analyze call is inherently a bug in an AST transformation, if you change an essential attribute. So this means that when constructing new nodes, you have three choices:

          1. (analyze-ast {:op foo :x y :z w})
          2. (analyze-form '(foo y w))
          3. (make-foo y w)

          But #3 is just gonna delegate to analyze anyway, since the result is a map, not some object with accessor methods. So that's right out. Given this reality, I think it's perfectly fine to include full copies of :children instead of going with a vector of keys. It's a little ugly for printing, but a helper function that strips out non-essential attributes.

          Day dreaming for a moment: I wish that I could have a map that had representational and computed attributes, such that if you update a representational key via assoc or whatever, all the computed attributes get thrown out.

          Ideally: The only API methods would be analyze-ast and analyze-form. The analyze-ast function should, optionally, validate that the ast passed in only contains essential attributes. Both analyze methods should be configurable for what passes and data to include. Children should be one such flag, just like type hints could be another such flag. The default would be "include everything!" such that the average client only needs to know a single API call: (analyze-form top-level)

          1. Mar 12, 2013

            I should note that I've written a working ANF transform and an incomplete CPS transform. My experience has shown:

            1. :children is in fact useful:
              https://github.com/brandonbloom/cljs-cps/blob/2c8b9e9f5980aff5e62ea0922c9f30101584fcf9/src/cps.clj#L54-L59
            2. For non-identity transformations, (analyze-form env ...) in in nearly every single tail position
            3. analyze-form can be (comp my-pass clojure.core/analyze) if my-pass is idempotent
            4. I badly want the ability to interleave my pass with the standard analyze logic
          2. Mar 12, 2013

            Hmm, assoc being useless doesn't match my experience.

            My big use-case in CinC is making several passes over the AST doing nothing more than adding attributes to the tree. (:box tells me if I need to emit a cast, :type tells me the java type of this expression)

            The need for those attributes to be propagated up and down the tree correctly is what causes the current :children redundancy to make me pull my hair out so much. There's no way for me to put the attributes in the correct place if I don't know which child is which.

            1. Mar 12, 2013

              assoc is useful for adding analysis results, but is useless for "changing" nodes.

              The only valid way to "change" a node is to create a completely new node and re-analyze it.