AST children

Skip to end of metadata

Page restrictions apply
Added by Brandon Bloom, last edited by Aaron Cohen on Mar 12, 2013 (view change)

Problem

The AST nodes produced by the ClojureScript analyzer have a :children key to allow generic traversals. Unfortunately, this key is currently a denormalized view of child ast nodes.

For example, an :if node has :test, :then, and :else keys whose values are also contained in :children. This presents some difficulty when writing AST transformations because every transform function must be aware of this duplication. To correctly replace the :test of an :if node, you must also replace the first element of the :children vector.

Discussion

Design

Solution must retain "AST is just data" property. ie No API should be required to interpret it
Idea: store :children as keys into the AST map
- would be nice: (defn children [ast] ((apply juxt (:children ast)) ast))
  - but need to handle implicit do blocks: (defn children [ast](mapcat #(if (coll? %) % [%]) ((apply juxt (:children ast)) ast)))
- ticket: http://dev.clojure.org/jira/browse/CLJS-289
- outstanding issue: interleaved bindings
Alternate idea: all child nodes are moved from the top level into a map in :children
- For "if" this would change {:op :if :env {env} :test test-expr :then then-expr :else else-expr :children [test-expr then-expr else-expr]} into {:op :if :env {env} :children {:test test-expr :then then-expr :else else-expr}}
- :children (into (vec (map :init bes))
  (conj (vec statements) ret))
  
  acohen: I think that should become: {:op :let :children {:init [inits] :statements [statements] :ret ret}
  We might want to discuss whether the inits are truly a child here, at the moment I think they are though.
  
  dnolen: So if a :children value entry is a vector it's something you need to iterate over? What about :ret? So you'll always need to look at :children if you want to know :ret?
  
  acohen: Yes, if :children entry is a seq, it must be a seq of nodes. For :let, I forgot that this is something else I changed in CinC. I made bindings an :op.
  So it should actually look like:
  {:op :let
  :children {:inits [{:op :binding :name x :children {:init init-expr}}
  ...]
  :statements [{...} ...]
  :ret ret}}
  
  dnolen: What about :ret? So you'll always need to look at :children if you want to know :ret?
  
  acohen: Yep, that's right. Tree traversal would _always_ go through :children, rather than having this separation of "syntax-aware walkers go through the nodes that they know the names of" and "generic walkers go through :children"
  
  bbloom: I think this approach suffers another issue: Generic walkers might need to know evaluation order. I should be able to post-walk the :childrens nodes to get a strict top-to-bottom, inside-out, left-to-right sequence of nodes, ignoring conditional and repeated evaluation. Otherwise, I need knowledge of nodes to tell if side effects from :init may happen before those from :statements
  
  acohen: Can we mandate that the children map be a map-like structure that preserves order?
  
  bbloom: I don't think that's reasonable unless core gets such a map-like structure.

Labels:

None

Edit Labels

7 Comments

Hide/Show Comments

Aug 29, 2012

Brandon Bloom
I'm interested in re-invigorating this discussion. I'm currently working on robust ANF/CPS transforms and running into various complexity caused by this. It's impossible for any AST transformer to know if a down-stream compiler phase will use :children or not. The first issue I hit was modifying :items in the :vector op. Turns out that the emit phase looks at :children instead of :items. Whoops! I've been resorting to building a form and then calling analyze on that.
- Permalink
1. Mar 09, 2013
  
  Ambrose Bonnaire-Sergeant
  Is there really enough utility in :children to build it into the AST spec? What's wrong with providing a `children` multimethod for people to use when needed?
  We obviously need to completely eliminate redundancies. Redundant :children simply makes all but the most trivial code transformations nearly impossible to perform correctly. I have no idea how I'd start handling the subtleties of :children in a hygienic AST transformation, which is a simple transformation compared to CPS.
  I think it would be a difficult task to include the children keys in each :op. There are at least three different types of children expressions: single expressions, sequence of expressions (implicit do) and local-binding inits. Distinguishing them would most likely result in a convoluted interface for children. Even if we found a solution, do we expect each consumer of an AST to define their own `children` function? Is this any better than just providing a multimethod to :refer as needed?
  In my (biased) opinion: children are strictly convenience and they hinder much more than they help.
  Permalink
  1. Mar 12, 2013
    
    Aaron Cohen
    I'm trying to be guided by principles that Rich outlined in the email thread here:
    
    :children is admittedly half-baked, but remember all things that are baked were once half-baked :)
    
    Here are the objectives:
    
    1) The AST as data. Full stop. That is part of the CLJS experiment. We've seen enough code- and API-based expression tree libraries.
    
    2) Supporting generic code walkers. Code walking has traditionally been a difficult and brittle thing, as everyone encodes the primitives. Thus, adding a new primitive becomes impossible. Plus, that encoded knowledge itself is quite complex and duplicated between code walkers. Given that the AST data is a mix of things that would need to be walked and other details, there needs to be a mechanism to distinguish them. While a multimethod is one way to do make it an open system, it means that to manipulate your new AST node I don't just need the AST, I need your code. What about code walkers and analyzers etc as a service?
    
    So, the problems I see with what we currently have vs these objectives are:
    
    a) Inconsistency in the contents and use of :children.
    b) Duplication
    
    I wonder somewhat about (b). Given they are just values, in memory at least, they are shared. But yes, when printed or otherwise reified it's a bear.
    
    :children as a vector of keys seems promising. Next steps in that direction would be to propose a standard way to do that, determine what that would imply for each construct we currently have, and assess the sufficiency for generic code walking. Along the way, consider what other semantic clues the AST might need to provide as data. Only then, implement for what we have.
    
    Permalink
    1. Mar 12, 2013
      
      Brandon Bloom
      
      At first, I was in favor of :children as a vector of keys, but now I'm not so sure.
      Transformations inherently involve creating new nodes. "assoc" is useless in the presence of :env, :children, or other synthesized, non-essential attributes, since you need to get out the essential fields and re-analyze that new AST node or re-analyze a raw form. Any cruft data from a previous analyze call is inherently a bug in an AST transformation, if you change an essential attribute. So this means that when constructing new nodes, you have three choices:
      (analyze-ast {:op foo :x y :z w})
      (analyze-form '(foo y w))
      (make-foo y w)
      But #3 is just gonna delegate to analyze anyway, since the result is a map, not some object with accessor methods. So that's right out. Given this reality, I think it's perfectly fine to include full copies of :children instead of going with a vector of keys. It's a little ugly for printing, but a helper function that strips out non-essential attributes.
      Day dreaming for a moment: I wish that I could have a map that had representational and computed attributes, such that if you update a representational key via assoc or whatever, all the computed attributes get thrown out.
      Ideally: The only API methods would be analyze-ast and analyze-form. The analyze-ast function should, optionally, validate that the ast passed in only contains essential attributes. Both analyze methods should be configurable for what passes and data to include. Children should be one such flag, just like type hints could be another such flag. The default would be "include everything!" such that the average client only needs to know a single API call: (analyze-form top-level)
      
      Permalink
      1. Mar 12, 2013
        
        Brandon Bloom
        
        I should note that I've written a working ANF transform and an incomplete CPS transform. My experience has shown:
        :children is in fact useful:
        https://github.com/brandonbloom/cljs-cps/blob/2c8b9e9f5980aff5e62ea0922c9f30101584fcf9/src/cps.clj#L54-L59
        For non-identity transformations, (analyze-form env ...) in in nearly every single tail position
        analyze-form can be (comp my-pass clojure.core/analyze) if my-pass is idempotent
        I badly want the ability to interleave my pass with the standard analyze logic
        
        Permalink
      2. Mar 12, 2013
        
        Aaron Cohen
        
        Hmm, assoc being useless doesn't match my experience.
        My big use-case in CinC is making several passes over the AST doing nothing more than adding attributes to the tree. (:box tells me if I need to emit a cast, :type tells me the java type of this expression)
        The need for those attributes to be propagated up and down the tree correctly is what causes the current :children redundancy to make me pull my hair out so much. There's no way for me to put the attributes in the correct place if I don't know which child is which.
        
        Permalink
        
        Mar 12, 2013
        
        Brandon Bloom
        
        assoc is useful for adding analysis results, but is useless for "changing" nodes.
        The only valid way to "change" a node is to create a completely new node and re-analyze it.
        
        Permalink