defrecord and deftype improvements for Clojure 1.3
Motivation
The Java unification of records prevents them from being first class, in either the data or fn sense:
- record data is not first class
- can't read/write them
- crummy choice: maps are good as data, need records for protocol polymorphism
- user code cannot fix this
- anything that requires EvalRead is not a fix
- record creation is not first class
- no per-record factory fn (or access to any associated fn plumbing, e.g. apply)
- Clojure level use/require doesn't get you access to records
- user code can mostly fix this (defrecord+factory macro)
- Symmetrically, PO Java classes are also not first class
- A unified reader form would be ideal
- Reduced import complexities
Solutions
For the sake of discussion, focus will revolve around example defrecords and deftypes defined as
(ns myns)
(defrecord MyRecord [a b])
(deftype MyType [a b])
Semantics of records as first class data
The semantics of record reader forms and record factory functions are defined as follows:
(-> (MyRecord. <initialization value> <initialization value>) (into {:a 1, :b 2}) <validation>)
Note: the semantics illustrated above should not be taken as implementation detail. At the moment <validation> is undefined and should be considered a no-op.
The <initialization value> refers to the default value of each field: the usual Java default for primitive types (as defined by type hinting on the record fields) or nil for object instances. For record reader forms, the keys and values must be constants, since their semantics require that the readable form coincide with the evaluable form.
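The semantics above can be sketched against the example MyRecord. This is only an illustration, not the implementation: it assumes the map->MyRecord factory function described later on this page, and it uses nil as the initialization value because the example fields carry no primitive type hints. The names by-hand, by-factory and partial-rec are hypothetical.

```clojure
(ns myns)

(defrecord MyRecord [a b])

;; The documented semantics written out by hand: start from a record whose
;; fields hold the initialization value (nil here), then pour the map in.
(def by-hand (into (MyRecord. nil nil) {:a 1, :b 2}))

;; The generated map factory is expected to agree with the hand-written form.
(def by-factory (map->MyRecord {:a 1, :b 2}))

;; A key missing from the map leaves that field at its initialization value.
(def partial-rec (map->MyRecord {:a 1}))
```

Since <validation> is currently a no-op, nothing in the sketch checks the field values.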
Record and Type reader forms
There would be two additional reader forms added to Clojure.
Labelled record reader form
#myns.MyRecord{:a 1, :b 2}
Positional record and type reader forms
#myns.MyRecord[1 2]
and
#myns.MyType[1 2]
This syntax satisfies the need for a general-purpose Java class construction reader form. However, not all Java classes are considered fully constructed after the use of their constructors. Therefore, serialization support is not provided for any Java classes by default. For instances such as these, Clojure will continue to provide facilities via print-dup in the known ways.
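As a rough illustration of the two reader forms, the following sketch reads them back with read-string. It assumes the record and type classes are already loaded in the reading process and that *read-eval* is left at its default; the var names labelled, positional and a-type are hypothetical.

```clojure
(ns myns)

(defrecord MyRecord [a b])
(deftype MyType [a b])

;; The labelled and positional forms should read to equal records.
(def labelled   (read-string "#myns.MyRecord{:a 1, :b 2}"))
(def positional (read-string "#myns.MyRecord[1 2]"))

;; Types only have the positional form; their fields are read via interop.
(def a-type (read-string "#myns.MyType[1 2]"))
```

Note that the arguments inside the forms are constants, in line with the requirement that the readable form coincide with the evaluable form.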
Generated factory functions
When defining a new defrecord, two functions will also be defined in the same namespace as the record itself. For new deftypes, only the positional constructor outlined below is generated.
Factory function taking a map (defrecord only)
A factory function named map->MyRecord, taking a map, is defined by defrecord.
(myns/map->MyRecord {:a 1, :b 2}) ;=> #myns.MyRecord{:a 1, :b 2}
Factory function taking positional values (defrecord and deftype)
A factory function named ->MyRecord, taking positional values (as defined by the record ctor), is also defined by defrecord. Likewise, deftype defines a positional factory function named ->MyType.
(myns/->MyRecord 1 2) ;=> #myns.MyRecord{:a 1, :b 2}
and
(myns/->MyType 1 2) ;=> #<MyType myns.MyType@2ed277f2>
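Because the factories are ordinary functions, they can be passed to apply, map and friends, which is the "first class" record creation the Motivation section asks for. A minimal sketch, with hypothetical var names from-rows and from-maps:

```clojure
(ns myns)

(defrecord MyRecord [a b])

;; Positional factory used with apply and map, which a bare constructor
;; call such as (MyRecord. 1 2) cannot take part in.
(def from-rows (map #(apply ->MyRecord %) [[1 2] [3 4]]))

;; Map factory used as an ordinary function over a collection of maps.
(def from-maps (map map->MyRecord [{:a 1, :b 2} {:a 3, :b 4}]))
```

A consumer only needs use/require on myns to reach these functions; no import of the record class is involved.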
Writing records
When writing record data for the purposes of serialization, the positional reader form is used by default:
(binding [*print-dup* true] (pr-str (MyRecord. 1 2))) ;=> "#myns.MyRecord[1, 2]"
However, if you wish to use the map reader form instead, then the following would work:
(binding [*print-dup* true *verbose-defrecords* true] (pr-str (MyRecord. 1 2))) ;=> "#myns.MyRecord{:a 1, :b 2}"
note: printing forms for types are not provided by default
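A print/read round trip is a quick way to check that what gets written can be read back. The sketch below assumes the record class is loaded on the read side; round-trip is a hypothetical helper, not part of the proposal.

```clojure
(ns myns)

(defrecord MyRecord [a b])

(defn round-trip
  "Hypothetical helper: prints x with *print-dup* bound to true and reads
  the resulting text back."
  [x]
  (read-string (binding [*print-dup* true] (pr-str x))))

(def original (MyRecord. 1 2))
(def restored (round-trip original))

;; The same check using the map form selected by *verbose-defrecords*.
(def restored-verbose
  (read-string (binding [*print-dup* true
                         *verbose-defrecords* true]
                 (pr-str original))))
```

Both restored values should be equal to the original record, whichever form was written.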
Tool support
Defining Clojure defrecords will also expose static class methods usable at the Java API level. These methods are considered implementation details and are not documented for public consumption.
Static factory for defrecords
The static factory exposed will mirror the map->MyRecord function:
(MyRecord/create aMap)
Basis access
A static method allowing access to the basis keys will also be provided:
(MyRecord/getBasis) ;=> [a b]
and
(MyType/getBasis) ;=> [a b]
The getBasis method will return a PersistentVector of Symbols with (potentially) attached metadata for each field.
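One way a tool might use the basis is to turn a map into positional arguments in declaration order. The sketch below assumes getBasis behaves as described above; map->positional is a hypothetical helper, and since these static methods are implementation details this is not a recommended pattern for application code.

```clojure
(ns myns)

(defrecord MyRecord [a b])

;; Hypothetical helper: builds a record positionally from a map by walking
;; the basis symbols in declaration order and looking up each as a keyword.
(defn map->positional [m]
  (apply ->MyRecord (map (comp m keyword) (MyRecord/getBasis))))

(def built (map->positional {:b 2, :a 1}))
```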
Old Ideas
Lesser Problems:
- generic factory fn
- like factory fn, but generic with name
- introduces weak-referencing, modularity issues, etc.
- don't have a good problem statement, so ignoring this for now
- which comes first: generic or specific?
- support for common creation patterns
- named arguments
- with more than a few slots, record construction is difficult to read
- default values
- maybe needs to be a property of factory fn, not record
- different factory fns can have different defaults
- validations
- are the patterns truly common?
- very solvable in user space, esp. if per-record factory fn available
- application code needing to know record fields
- synthesizing data
- creating factory fns if we don't provide them
Challenges:
- how evaluative should record read/write be?
- option 1: records are data++: no EvalReader needed, no non-data semantics
- option 2: records are more:
- maybe EvalReader required?
- maybe special eval loopholes for constructor fns?
- option 1 wins
- what happens when readers and writers disagree about a record's fields?
- positional approach would either fail or silently do the wrong thing
- k/v approach lets you get back to the data
- still on you to fix it
- does this have to be a breaking change?
- data print/read: no
- constructor fn: yes
- any good generated name likely to collide with what people are using
- what if defrecord is not present on the read side?
- fail?
- create a plain map instead
- plus tag in data?
- plus tag in metadata?
- reify in a tagging interface
- attempt to load
- no – could lead to arbitrary code injection during read
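The reader/writer disagreement question above can be sketched with the reader forms proposed earlier on this page. The example assumes a writer that knew fields a and c while the reader knows a and b; the var names drifted and positional-result are hypothetical.

```clojure
(ns myns)

(defrecord MyRecord [a b])

;; The k/v form still yields a record: b falls back to nil and the unknown
;; key c is kept as an extra key, so the data can be recovered.
(def drifted (read-string "#myns.MyRecord{:a 1, :c 3}"))

;; The positional form has no names to match against, so a changed field
;; count can only fail (and an unchanged count would misassign silently).
(def positional-result
  (try
    (read-string "#myns.MyRecord[1]")
    (catch Exception e :read-failed)))
```

As the list says, it is still on the application to decide what to do with the recovered data.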
Some Options:
- create reader/writer positional syntax, no constructor fn
- pros
- easy to deliver efficiently
- non-breaking
- introduces no logic (user or clojure) into print/read
- cons
- what happens if defrecord field count changes?
- what happens if field names change?
- no way to know
- feels like a non-starter
- create reader/writer kv syntax, no constructor fn
- pros
- non-breaking
- introduces no logic (user or Clojure) into print/read
- can still recover data if defrecord structure has changed
- cons
- how to deliver read efficiently?
- create empty object + merge
- cache the empty object we merge against?
- reflect against object and manufacture reader fn
- who keeps track of this?
- how would this interact with constructor, if we add that separately?
- add a map-based constructor to defrecord classes
- what would its signature be?
- add a static map based factory fn to defrecord classes
- reader/writer syntax that depends on a new factory fn
- pros
- can be efficient
- can implement any policy in handling defrecord changes
- cons
- likely breaking (what will the fn names be?)
- read/write now depends on fns
- positional constructor fn
- no
- replicates the weakness of existing constructors
- kv constructor fn
- open questions
- autogenerated for all defrecords?
- optional?
- conveniences (defaults, etc.)
- no
Tentative Proposal 1:
Define a k/v syntax for read and write that does not require a factory fn.
- adopt the existing print syntax as legal read syntax?
"#:user.P{:x 1, :y 2}"
- get Rich's input on efficient reader approach (4 possibilities listed above)
- if reader defrecord fields are different, merge and move on
- Undecided: if record class not loaded:
- TBD: error or make a plain ol map?
- hm, could fix on writer side: option to dumb records down to maps?
Tentative Proposal 2:
Autogenerate a k/v factory fn for all defrecords.
(new-foo :x 1 :y 2)
- class constructor is an interop detail
- factory fn is the Clojure way
- people can build their own defaults, validation, etc. easily with macros, given this
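A k/v factory in the style of (new-foo :x 1 :y 2) with defaults and validation can be sketched in user space as follows. It is built here on the map->Foo factory from the current proposal rather than on an autogenerated new-foo; the record Foo, the default values and the validation rule are all illustrative assumptions.

```clojure
(ns myns)

(defrecord Foo [x y])

;; Hypothetical k/v factory layered on map->Foo. Missing keys take the
;; defaults below; the number check stands in for user-defined validation.
(defn new-foo [& {:as kvs}]
  (let [foo (map->Foo (merge {:x 0, :y 0} kvs))]
    (when-not (every? number? [(:x foo) (:y foo)])
      (throw (IllegalArgumentException. "x and y must be numbers")))
    foo))

(def explicit  (new-foo :x 1 :y 2))
(def defaulted (new-foo :y 5))
```

Different factory functions over the same record can carry different defaults, which matches the earlier observation that defaults may be a property of the factory fn rather than of the record.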
Some history:
The record multimethod was almost ready to go when Rich raised the GC issue. What happens when somebody creates a ton of record classes over time? GC can collect records that are no longer in use, but doesn't clean up the old multimethod functions.
Additional Reading
Some (non-contributed) code that demonstrates people's need for this:
- cemerick's defrecord slot defaults
- David McNeil's enhanced clojure records
14 Comments
Jan 04, 2011
Rich Hickey
I have no idea what is being proposed from this, nor what else was considered, nor what the tradeoffs are.
Jan 05, 2011
Stuart Halloway
Worse than that, the proposal wouldn't work even if we had made it readable. Sleeping mind thinks this updated proposal would work. There are two questions waiting your input:
#= for print/readable records? I would rather have something that kept serialization more separate from arbitrary execution of code.
Jan 19, 2011
Alex Miller
It's hard for me to tell what the state of this doc is but we have used records extensively and added support for a number of features. Consider this an experience report from the field and take from it what you find interesting / useful.
Apr 06, 2011
David McNeil
I wrote up my thoughts and questions on the proposal here: http://david-mcneil.com/post/4403345585/defrecord-improvements-feedback It echoes much of what Alex said above.
Apr 06, 2011
Stuart Halloway
Thanks for writing this up. Responses to a few of your items:
4. We are worrying about a print form that will be readable, so omitting the namespace is not an option. There can be other print formats, of course.
5. The universal constructor may happen later, but not in the scope of the smallest shippable improvement.
6. Agreed, we should have a record? predicate.
8, 9. Automatic multimethod participation is tricky to do generally in a way that is performant but also class-loader and modularity friendly. Do you have working code that covers this?
May 07, 2011
David McNeil
> We are worrying about a print form that will be readable, so omitting the namespace is not an option. There can be other print formats, of course.
Hmm... "not an option", but "there can be other print formats"... I wasn't asking for this to be the default, but rather I was asking for an option to exclude the namespace. Seems like maybe this could be another print format? From my experience using records intensively on real code this is quite valuable when debugging and writing test code that uses trees of records.
> Automatic multimethod participation is tricky to do generally in a way that is performant but also class-loader and modularity friendly. Do you have working code that covers this?
Yes, https://github.com/david-mcneil/defrecord2
Thanks for the response (sorry for my delayed response).
-David
May 12, 2011
Fogus
You might find the (->R ...) to be more succinct and likewise more flexible in those cases.
Sweet! I can't wait to look at your code more deeply.
Thanks
:F
May 16, 2011
David McNeil
> You might find the (->R ...) to be more succinct and likewise more flexible
I don't know what "->R" is and google was not able to help. Or was that a typo?
-David
May 07, 2011
Alexander Taggart
A few questions relating to the current patch on CLJ-374:
Why does CtorReader eval its arguments? The implementation of CtorReader.resolve parallels the implementation of EvalReader.invoke, except it (understandably) doesn't perform a *read-eval* check. Is there a reason non-literal arguments to a record constructor should not need to use the eval reader macro?
Records can be safely instantiated in the reader since we control their constructor implementation, but that's not necessarily true of other classes. Currently the CtorReader will, using the #myns.MyRecord[arg] positional format, instantiate any class, e.g., #java.util.Date[0]. Is that openness of the reader intentional, given that it is not guarded by *read-eval*?
Assuming the above is acceptable, once the data structure from the reader is passed to the compiler, any class it doesn't recognize is emitted as a ConstantExpr. Is that always appropriate?
My sense is that CtorReader should be restricted to instantiating instances of the IRecord marker interface, and that it should, like all other non-EvalReader readers, treat its arguments as literal clojure data structures, leaving the work of evaluating arguments to the eval reader macro as needed.
May 12, 2011
Fogus
Hi Alex,
Thanks for the questions. They were mostly targeted at a previous version of the patch, but I will try to address them the best that I can.
Currently it does not:
Yes it will, but that's not the end of the story. First, the #foo.bar.Klass... reader form will attempt to call a ctor for the class Klass but it will only get you so far.
For objects that have print-dup definitions we can embed them in other Clojure forms – the compiler is happy. For those that do not we either need to define print-dup for them, or use some other method. For arbitrary Java classes we can not assume that a call to its constructor results in a fully constructed object. For records and types we, as you say, have control over their construction and can make different assumptions.
I'm not sure what you mean. Do you mind rephrasing?
Thanks again.
:F
May 14, 2011
Alexander Taggart
Note that some of this might be better directed at Stu, assuming he was the one driving these changes.
Most of the eval'ing was removed, but it does try to eval symbol args as classes, and only as classes:
This is interesting behaviour considering classes are print-dup'd with the eval reader macro:
Which raises the question of what would be sending something like #user.R[java.lang.String] to the reader. Is this (now committed to master) functionality intended to enable reading of print-dup'd records or allow humans an alternate way to type record instances?
If the literal notation is intended to be emitted by print-dup for non-records, then that's not true, as the (bean #java.util.Date[10101001]) example shows. The preceding assumes someone wrote a print-dup method for Date, emitting that notation. If that is not correct, then what is the purpose of the constructor literal for non-records?
Finally, it's not clear to me whether the "EvalRead is not a fix" requirement is meant to apply just to the record or to its arguments as well. The latter seems unlikely as the set of readable-without-eval types is outside the scope of this change (records excepted). If the former, then this all seems to be sugar for calling the record constructor while avoiding #=. If that's the case, then why not allow the reader to read the literal notation and emit a form that the compiler can then process? E.g., reading a string "#myns.ARec[#myns.BRec[5]]" and return a clojure data structure of (new myns.ARec (new myns.BRec 5)) which will then be passed to the compiler.
Though perhaps there is a desire for record values to be emitted as constants rather than as a call to a constructor. If so, then the process would need to parallel that for maps, namely that the reader creates some data structure (not an instance of the specific record class) which is passed to the compiler, which in turn checks if the arguments are all LiteralExpr before emitting as a ConstantExpr, otherwise a runtime call is made.
May 16, 2011
Fogus
Hi Alex,
This was a result of my attempting to be too clever with the Reader and will be fixed in the next release.
I'll read your other questions more closely and respond post-haste.
:F
May 16, 2011
Rich Hickey
The Writing records section seems wrong. We should never be printing something that can't be read. Printing factory fn calls would require evaluation to restore. I.e. these should always print #something...
Also, deftype behavior needs to be spelled out in all cases.
May 16, 2011
Fogus
Indeed you're right and #something is what the implementation does currently. I will bring the text up to date regarding the Writing and the deftype behavior.