Rationale

Clojure's dynamic vars and binding mechanism mimic traditional dynamic variables. However, the traditional notion doesn't interact well with laziness or thread pools. The goal is to improve binding so it plays better with other Clojure constructs.

The problems:

  • When work is sent to thread-pool threads, e.g. via agent sends or future calls, the work isn't done with the same binding set as the invocation (without manual effort)
    • Yet that is often the expectation/desire
  • If a lazy sequence is created within a certain scope of bindings and returned outside it, it is realized in the dynamic scope of the consumer, not of the creator
    • This too is not what is expected
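Both problems can be reproduced in a few lines. This is only a sketch: `*factor*`, `scaled` and the result names are made up for illustration, `^:dynamic` assumes Clojure 1.3 or later, and a raw `Thread` stands in for a pool thread that receives no bindings.

```clojure
;; Hypothetical names throughout; assumes Clojure 1.3+ for ^:dynamic.
(def ^:dynamic *factor* 1)

(defn scaled
  "Lazily multiplies each element of xs by the current value of *factor*."
  [xs]
  (map #(* *factor* %) xs))

;; Problem 1: work handed to another thread runs without the caller's bindings.
;; A raw Thread is used here as a stand-in for an unconveyed pool thread.
(def other-thread-value
  (binding [*factor* 10]
    (let [p (promise)]
      (.start (Thread. ^Runnable (fn [] (deliver p *factor*))))
      @p)))

;; Problem 2: the lazy seq leaves the binding scope unrealized, so it is
;; computed later, in the consumer's dynamic scope (the root value here).
(def lazy-result
  (binding [*factor* 10]
    (scaled [1 2 3])))

;; Forcing realization inside the scope uses the bound value.
(def eager-result
  (binding [*factor* 10]
    (doall (scaled [1 2 3]))))

(println other-thread-value) ; prints 1
(println lazy-result)        ; prints (1 2 3)
(println eager-result)       ; prints (10 20 30)
```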

The objective:

  • Convey bindings to agent and future thread-pool threads
    • Must be cheap!
      • Means you can't pay the cost of establishing truly new bindings (copies)
    • Instead, adopt binding map of point of call wholesale, including mutable cells
      • Make adoption as simple as assignment of map to thread local
      • This means these threads truly share the bindings with the caller
      • just like nested code in the caller thread would
        • i.e. can see changes
    • But - must avoid concurrency mess
      • Give bindings the semantics of volatiles
      • and check thread identity on set! calls
        • thus only the thread creating the binding can do set!s
      • thread pool threads will see effects of set!s in launching thread
        • this isn't really a feature to advertise, but is race-free and safe, at least
          • merely a side effect of doing it fast
    • Need a way to adopt a binding set wholesale
    • Need volatile semantics on binding boxes
    • Need thread ids in binding boxes
      • new nested TBox type in Var
    • Need to check matching thread on set!
    • Build conveyance into send/send-off and future?
  • Lazy seqs + bindings
    • this is trickier, quite unlikely we can afford even binding adoption per seq step
    • perhaps another flavor of lazy-seq that does binding adoption
      • use only when you need this
      • since must establish bindings every step
        • only affordable for i/o bound logic
    • Deliver this separately from thread support
    • Ditto delay?
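The thread-side semantics listed above can be observed from user code, assuming a Clojure version in which `future` conveys bindings (1.3 or later). The sketch below is illustrative rather than a description of the implementation; `*x*` and the result names are hypothetical.

```clojure
;; Hypothetical var; assumes Clojure 1.3+ where future conveys bindings.
(def ^:dynamic *x* 0)

;; 1. The pool thread adopts the caller's binding set.
(def conveyed
  (binding [*x* 1]
    @(future *x*)))

;; 2. Only the thread that created the binding may set! it. The attempt in
;;    the pool thread throws, and deref rethrows it to the caller.
(def set-from-pool
  (binding [*x* 1]
    (try
      @(future (set! *x* 2))
      (catch Exception _ :rejected))))

;; 3. The binding cells are shared, so the pool thread sees a set! made by
;;    the launching thread. The promise only sequences the two steps.
(def seen-after-set
  (binding [*x* 1]
    (let [go (promise)
          f  (future @go *x*)]
      (set! *x* 2)
      (deliver go true)
      @f)))

(println conveyed)       ; prints 1
(println set-from-pool)  ; prints :rejected
(println seen-after-set) ; prints 2
```

When run as a script rather than at a REPL, call `(shutdown-agents)` at the end so the future thread pool does not keep the JVM alive.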

Issues

  • Var counts
    • maintaining these adds per-bound-var costs
    • and requires bindings cleanup on termination
      • if we didn't need to clean up then conveyance could just be assignment of same Frame to thread's dvals
    • if few vars are ever bound dynamically, and those that are almost always are, perhaps we can do away with the counter?
      • simple flag - has ever been dynamically bound
  • In all cases, people who don't care about or use bindings in the async/delayed work won't want to pay for the overhead of this
    • how to avoid parallel set of constructs or flags everywhere?
  • Think ahead to fork/join
    • if all of our forks are via our APIs then we can do same propagation there
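One way to keep the cost opt-in for lazy seqs is a separate construct built on the existing `bound-fn`. The macro below, `bound-lazy-seq`, is a hypothetical name and not part of clojure.core; it is a sketch of the "another flavor of lazy-seq" idea, and it re-establishes the captured bindings on every step, which is the per-step cost the notes warn about.

```clojure
;; Hypothetical var and macro names; only bound-fn and lazy-seq are real
;; clojure.core constructs.
(def ^:dynamic *factor* 1)

(defmacro bound-lazy-seq
  "Like lazy-seq, but captures the bindings in effect at construction and
  reinstalls them around the body when the seq is realized."
  [& body]
  `(let [step# (bound-fn [] ~@body)]
     (lazy-seq (step#))))

(defn scaled-seq
  "Multiplies each element of xs by *factor*, one step at a time."
  [xs]
  (bound-lazy-seq
    (when-let [s (seq xs)]
      (cons (* *factor* (first s))
            (scaled-seq (rest s))))))

;; Created inside the binding scope, realized outside it.
(def result
  (binding [*factor* 10]
    (scaled-seq [1 2 3])))

(println result) ; prints (10 20 30)
```

Code that does not care about bindings keeps using plain `lazy-seq` and pays nothing extra.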
  1. Jan 23, 2011

    Now that we have binding Frames, and binding counters have gone, what is the issue with having LazySeq capturing the Frame at construction, and setting and restoring the Frame before and after calling the generating fn?

    Would this behaviour be flawed in some way, or is the concern just that it would be too slow?

    Number crunching mapping over vectors is based on chunked-cons, so perhaps the overhead of swapping the frame wouldn't be too bad; i/o applications are probably item at a time, but they probably wouldn't notice the overhead.

  2. Mar 15, 2012

    I think automatic, implicit conveyance of bindings like this is a big mistake.
    The user should always do it manually, or explicitly add the binding to a global list
    of bindings that are to be conveyed automatically by agents, futures, etc.
    This is totally reasonable:
    user> (binding [*db* true]
            (let [a (agent nil)]
              (send a (fn [_]
                        (Thread/sleep 250)
                        (println "*db* =>" *db*)))))
    *db* => #<Unbound Unbound: #'user/*db*>
    #<Agent@54789237: nil>
    user>
     
    I.e., the *db* connection has dynamic and predictable extent that shouldn't
    implicitly "leak into" spawned threads (agents/futures) where all track of
    extent is lost!
    The Clj-1.3+ (or 1.4.x; I forget) behavior is not good:
    user> (binding [*db* true]
            (let [a (agent nil)]
              (send a (fn [_]
                        (Thread/sleep 250)
                        (println "*db* =>" *db*)))))
    *db* => true
    #<Agent@29212307: nil>
    user>
     
    How does one deal with e.g. resources here? There is no way to do this in a reasonable way; it 
    should fail by default ("No database connection") and only convey if the user explicitly
    requests it.