The problem

Clojure's support for dynamic development incurs runtime costs that may be undesirable in certain production environments.  These include:

  • Startup time
  • Var.getRawRoot overhead
  • Heap use
  • Deployment size

Additionally, the indirection involved in using a Var, especially when the reference to the Var is obtained via Var.intern(String, String), makes Clojure-generated bytecode hard to analyze with tools like ProGuard.
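The analysis problem can be illustrated with a small Java sketch. SketchRegistry and IndirectionDemo are hypothetical names invented for this example; the registry only loosely models looking a definition up by namespace and name, and is not Clojure's actual implementation.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical string-keyed registry, loosely modelled on resolving a
// definition from a namespace string and a name string.
final class SketchRegistry {
    private static final Map<String, Supplier<String>> TABLE = new ConcurrentHashMap<>();

    static void intern(String ns, String name, Supplier<String> value) {
        TABLE.put(ns + "/" + name, value);
    }

    static Supplier<String> lookup(String ns, String name) {
        return TABLE.get(ns + "/" + name);
    }
}

final class IndirectionDemo {
    static String used() { return "used"; }
    static String unused() { return "unused"; }

    static {
        // Everything is registered up front, whether or not it is ever called.
        SketchRegistry.intern("app.core", "used", IndirectionDemo::used);
        SketchRegistry.intern("app.core", "unused", IndirectionDemo::unused);
    }

    // The callee is selected by two strings at run time, so this call site
    // does not reveal to a bytecode analyser which method it reaches.
    static String callByName(String ns, String name) {
        return SketchRegistry.lookup(ns, name).get();
    }

    public static void main(String[] args) {
        System.out.println(callByName("app.core", "used")); // prints used
    }
}
```

In this sketch a shrinking tool sees both methods registered and no direct call edge to either, so it has to keep both or risk removing something that is later looked up by name.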

Startup time

Starting a Clojure program takes a significant amount of time, most of which is taken up by the Clojure runtime bootstrapping itself, i.e. loading and initialising namespaces and vars (see Why is Clojure bootstrapping so slow?).  For long-running applications, this is probably not an issue.  However, this is a problem in other scenarios:

Android

Android applications should ideally load about as fast as Java or Scala applications.  Currently, on a high-end telephone, a minimal Clojure program will take about one second longer, which is perceptible.  In some cases, having a Java splash screen thrown up while Clojure bootstraps may be an acceptable solution, but that doesn't work for all programs.

Command line programs/utilities

Depending on the type of utility, Clojure's startup time can dominate the time used to do the actual work.  Current workarounds include using a persistent JVM, e.g. Nailgun, or using ClojureScript with Node.js.

Google App Engine (GAE)

GAE imposes a strict sixty-second time limit for responding to a request.  If the application hasn't warmed up, Clojure's startup can take up a significant portion of that time, resulting in the initial request timing out.

Var.getRawRoot overhead

Generally speaking, each time a Var is used within a function invocation, its root binding must be retrieved.  While this is a very simple operation, it does require reading a volatile variable, which may impede optimisation.  In some programs, this overhead is measurable.  Currently, the best workaround for this is using a macro or definline, but this results in code that is harder to read and write.
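A minimal Java sketch of the two call paths may make the cost easier to picture. SketchVar and CallPaths are hypothetical stand-ins written for this example, not Clojure's clojure.lang.Var; the sketch only models a root binding held in a volatile field, as described above.

```java
import java.util.function.IntUnaryOperator;

// Hypothetical stand-in for a Var: the root binding lives in a volatile
// field, so every use has to re-read it before invoking.
final class SketchVar {
    private volatile IntUnaryOperator root;

    SketchVar(IntUnaryOperator root) { this.root = root; }

    // Models the read that happens each time the Var is used.
    IntUnaryOperator getRawRoot() { return root; }

    // Models redefinition during development: later calls see the new root.
    void bindRoot(IntUnaryOperator newRoot) { this.root = newRoot; }
}

final class CallPaths {
    static final SketchVar INC_VAR = new SketchVar(x -> x + 1);

    // Dynamic path: a volatile read followed by an interface call.
    static int viaVar(int x) {
        return INC_VAR.getRawRoot().applyAsInt(x);
    }

    static int inc(int x) { return x + 1; }

    // Static path: a direct call with no volatile read, but no redefinition.
    static int viaStatic(int x) { return inc(x); }

    public static void main(String[] args) {
        System.out.println(viaVar(1));    // prints 2
        System.out.println(viaStatic(1)); // prints 2
        INC_VAR.bindRoot(x -> x + 100);
        System.out.println(viaVar(1));    // prints 101
        System.out.println(viaStatic(1)); // prints 2
    }
}
```

The dynamic path picks up the rebinding and the static path does not, which is the trade-off between development-time flexibility and production-time overhead that this page is about.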

Heap use

Clojure 1.5.1 uses over seventeen megabytes of heap just starting a basic REPL.  Much of this heap use results from Clojure simply loading a lot of Vars which may or may not be used in a given program.  For servers with gigabytes of RAM, this may just be noise.  However, in more constrained environments, e.g. Android, this can be more of an issue.

Manual tree-shaking via commenting out portions of clojure.core has shown significant reductions in heap use.  Unfortunately, namespaces are currently atomic and cannot be easily broken up.

Deployment size

Clojure 1.5.1's JAR takes up about 3.5 megabytes.  This isn't an issue for a lot of environments, but it can be an issue for mobile applications or when Clojure is used as a library.

Mitigation strategies

A number of mitigation strategies have been proposed to resolve one or more of the above issues:

  1. Static compilation
  2. Lazy loading of Vars
  3. Don't load the user namespace by default
  4. Eliminate the compiler

Static compilation

  • Overall strategy:
    • The general idea is to compile Vars down to static final fields and methods on namespace classes.
    • In the ideal case, there are no Vars involved at the point of use/invocation, just an invokestatic or getstatic op.
    • However, static methods aren't first class functions, so there is still a need to be able to invoke them through an object, possibly a Var or another IFn instance.
    • Dynamic Vars would remain more or less the same.
  • Things to think about:
    • What's the best way to interoperate with code that explicitly manipulates namespaces or Vars, e.g. ns-resolve or with-redefs?
    • Where does metadata go?
    • Should a 'static' Var invoke/get the static method/field, or should it have its w
    • In a more 'static' environment, is it possible to eliminate the volatile on a Var's root binding?
    • Given the changes in how things are compiled and possibly even changes to core Clojure classes, how do we solve the artifact distribution problem?  See Build Profiles.
    • Is there a way to resolve Vars that enable dynamic development starting from a largely static runtime?
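The overall strategy above can be sketched in Java roughly as follows. MyNamespace, LIMIT, clamp and CLAMP_FN are hypothetical names chosen for illustration; this is only an assumption about what compiled output could look like, not what the Clojure compiler emits. A java.util.function.Function stands in for an IFn instance.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

// Hypothetical namespace class: each definition becomes a static member.
final class MyNamespace {
    // An object-valued definition becomes a static final field (getstatic).
    static final Long LIMIT = Long.valueOf(10L);

    // A function definition becomes a static method (invokestatic).
    static long clamp(long x) {
        return Math.min(x, LIMIT);
    }

    // Static methods are not first-class values, so higher-order use still
    // needs an object; here a method reference plays that role.
    static final Function<Long, Long> CLAMP_FN = MyNamespace::clamp;
}

final class StaticCompilationDemo {
    // Direct use: no Var lookup on this path.
    static long direct(long x) {
        return MyNamespace.clamp(x);
    }

    // Higher-order use: the function travels as an object.
    static List<Long> mapped(List<Long> xs) {
        return xs.stream().map(MyNamespace.CLAMP_FN).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(direct(42L));                    // prints 10
        System.out.println(mapped(Arrays.asList(3L, 11L))); // prints [3, 10]
    }
}
```

The sketch leaves the open questions in the list untouched: it has nowhere to put metadata and offers no path back to redefinition.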

Lazy loading of Vars

  • Currently, all Vars are created when their namespace is loaded.  What if that work could be deferred?
  • This strategy has been used in clojure-objc.
  • While this ameliorates the startup time problem, all it does is put off the work to some other point in the program.  However, in conjunction with static compilation, it might mean that the Vars are only created for use in higher-order functions.
  • Is there an inexpensive way to keep track of whether a Var has been loaded?  Could we use the linker for this?
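One possible shape for deferred creation, with explicit tracking of whether the value has been loaded, is sketched below in Java. LazyVar and LazyLoadingDemo are hypothetical names for this example and are not taken from clojure-objc or from Clojure itself. Note that this version still pays for a volatile read on every access, so it illustrates the bookkeeping rather than answering the question of how to make it inexpensive.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.Supplier;

// Hypothetical lazily initialised Var: the root value is computed on first
// use instead of when the namespace is loaded.
final class LazyVar<T> {
    private final Supplier<T> init;
    private volatile T root; // null means "not loaded yet"

    LazyVar(Supplier<T> init) { this.init = init; }

    // Cheap check of whether the deferred work has already been done.
    boolean isLoaded() { return root != null; }

    T get() {
        T value = root;
        if (value == null) {
            synchronized (this) {
                value = root;
                if (value == null) {
                    value = init.get(); // the work that was put off at load time
                    root = value;
                }
            }
        }
        return value;
    }
}

final class LazyLoadingDemo {
    static final AtomicInteger INITS = new AtomicInteger();

    static final LazyVar<String> GREETING =
        new LazyVar<>(() -> { INITS.incrementAndGet(); return "hello"; });

    public static void main(String[] args) {
        System.out.println(GREETING.isLoaded()); // prints false
        System.out.println(GREETING.get());      // prints hello
        System.out.println(GREETING.isLoaded()); // prints true
        System.out.println(INITS.get());         // prints 1
    }
}
```

On the JVM, class initialisation is itself lazy, so a per-definition holder class might be another way to let the runtime do this tracking; that is a separate design and is not shown here.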

Don't load the user namespace by default

  • In some environments, creating the user namespace is pure overhead.  Removing this can make a difference at startup time and on memory use.
  • How do we know when to do this or not?

Eliminate the compiler

  • If it's not needed, why include it?  This can make a big difference in deployment size.
  • See Build Profiles

Other things to think about

  • The big one is Build Profiles.
    • At least in the Clojure open source world, most libraries are distributed as JARs of Clojure source files.  As such, library authors are largely off the hook when it comes to making compilation decisions.
    • However, this isn't the case for everything, namely Clojure itself.
  • What should be the default compilation mode?
  • What sorts of tests should be run to ensure that the technical solution is optimal?
  • How independent should each of these changes be?  Is it just a master production-/development-mode switch or are individual compilation features independently switchable?

Development plan

  • There is a good chance we can have a Google Summer of Code student work on this.
  • It would be good to see some form of this in the next Clojure release.

Related Hammock topics

  1. Apr 03, 2014

    There are some strong connections between this work and the goals of my "Kiss" language design experiment that are worth being aware of (https://github.com/mikera/kiss)

    Key points:

    1. Kiss eliminates mutable vars by replacing namespaces with immutable environments (redefining vars is possible just like in Clojure but produces a new immutable environment). This has the potential to eliminate 100% of the var lookup overhead.
    2. Partly because of 1, it becomes much easier to do static type inference. Basically, we can have something analogous to core.typed running as part of the compiler, which nicely helps with the static compilation issue.
    3. Kiss maintains a dependency graph of symbol references. I believe this creates some interesting opportunities for lazy compilation (you can efficiently compile some code and all of its dependencies on demand).

    Kiss as currently envisioned will probably be about 95% Clojure compatible (it directly borrows most stuff: syntax, reader, immutable data structures etc), but there are some edge cases around vars / environment handling that will inevitably break some existing code.

    Kiss isn't yet complete and it's definitely experimental - but I do think that the ideas are powerful and worth thinking about for the Clojure 2.0 timeframe. If the 'lean runtime' work can bear this in mind then it would be great!