Here are my notes from Michał Marczyk's talk from EuroClojure 2013.
- "Choose immutability and see where it takes you" - Rich Hickey.
- The quote was about Datomic but the principle permeates the entire design of Clojure too. It makes possible certain architectural elements, such as the value/identity/state model. You can't do all that without values.
- Immutability simplifies reasoning about code. Observers are free to hang on to data.
Two libraries by Michał are especially relevant to the talk:
- core.rrb-vector extends built-in vectors with new operations.
- avl.clj is a new, drop-in replacement for sorted collections with new operations. Is also faster in certain scenarios.
Persistent Data Structures
- The basic approach: Persistent data structures offer the guarantee that immutable updates are fast. Internal structure is mostly reused. Achieved using trees and path copying. Performs well when updates are infrequent.
- The most complex data structure in Clojure core is the hash map. It was originally designed as a mutable data structure. The paper which descibes it talks about "ideal hash trees", so it has some "ideal" properties. We're using it in an immutable setting. With more frequent updates, this builds up GC pressure.
Speed-up Through Exclusivity
- A thing to realize about immutable trees is that they are not owned by anyone particular - anyone can hold on to them.
- However, you frequently need to produce a new version somewhere where the result is many operations removed from the original.
- It's important that both versions are persistent, but you can take ownership of the one updating as long as you're the only one interested.
- You can do this lazily.
- Install edit markers in the parts of the tree that "belong to us".
- When we perform a modification, and come across a node that has our edit marker, we can freely modify it in place.
- When done, we relinquish ownership of the marker (which is an
AtomicReference. The identity of the
AtomicReferenceis important. Inside it is a reference to the current thread).
- Tradeoff: AtomicReferences hanging around. But speedup sometimes extreme.
- For vectors (widely branching, shallow), there are not too many edit markers.
- Key point: This is a generally applicable methodology for speeding up some persistent operations, though only makes sense with predominantly persistent data structures. Once you commit to that, implementation can be somewhat abstract/open.
- If you build your own data structure, implement the protocols and you're off.
- avl.clj maps are faster than the built-in maps for lookups. They're written in Clojure, built-in ones in Java. Construction time with regular assoc probably suffers, but because they're transient-aware you can bash it in place. That's fast.
- Clojure's data-centric view leads to programming with pipelines and transformations.
- Transforming seqs allocates intermediate objects. Objects often superimposed on other data structures - vectors or random access java things (Strings, arrays).
- With tree things like vectors, where the leaf level stores arrays of entries, why not build a seq using those arrays rather than the entries? That's chunked seqs.
- Often we want to traverse, but in many cases don't. Instead we might want to map, reduce, etc. Those functions are aware of the chunking and use a ton of imperative code. Not pleasant to read, though we don't really have to care.
doseqare also chunk-aware. But
dorunis not. Don't use it for causing side-effects to happen.
- Can hook into all of this using the seq API. core.rrb-vector does so.
- More and more important in Clojure. Reducers, etc.
- You can reduce with first/next, but Clojure structures know better ways to reduce themselves because of the tree structure. Vectors, chunked seqs, seqs over random-access.
- A very pure, functional pattern. Exposed through protocols.
- Useful to lift reducers fork/join utilities for parallelization (Steele's foldl and foldr considered harmful).
- The abstractions are sufficiently widely applicable for using them in libraries. But that's assuming you know what they are and how to hook into them, as they are undocumented. But the code is well-written, of course.
- Saving grace on the JVM:
clojure.core/ancestorsof the class of a collection tells you the stuff you need to implement. Exception: reducers with protocols, doesn't show up here.
- The Clojure Programming book contains a useful function for this -
- In ClojureScript, you do need to read the code, but just the protocol definitions at the top.
(into reduce transients)- the basic landmarks of the Clojure conceptual landscape all make sense in separation, but come together in pleasing ways.
intofunction is blazing fast - the "to" collection is transient-enabled and we can take ownership. It's using
reducewith the "from" collection, which is smart about how it reduces itself.
intois what you use when you want a collection out of reducers, because you want something reduce-based.
- Possible to do it in Java, of course, but how do you stay maintainable if you do that?
Know Your AngularJS Inside Out
Build Your Own AngularJS helps you understand everything there is to understand about AngularJS (1.x). By creating your very own implementation of AngularJS piece by piece, you gain deep insight into what makes this framework tick. Say goodbye to fixing problems by trial and error and hello to reasoning your way through them