Here are my notes from Pat Shaughnessy's talk in RuPy 2013 about garbage collection in Ruby and Python.
- Ruby and Python are very similar languages, so GC probably similar too?
- Why talk about GC? Run code faster. Chance to look at some of the more interesting algorithms in CS. Some very old, some new - Rubinius' GC algorithm was invented in 2008.
- Garbage collectors do much more than just collect garbage. Three things: Allocate memory, identify which objects are garbage, reclaim the memory from the identified garbage.
- Body metaphor: We usually write business logic - the brain. GC is the beating heart: Provides memory, new objects, blood and nutrients. If your heart stops beating, you die. If your GC slows down (high blood pressure, clogged arteries), you suffer.
- The free list, one of the oldest CS algorithms out there.
- Invented by John McCarthy - a giant of computer science. Invented LISP. Worked at MIT. Also invented garbage collection. Explained in paper "Recursive Functions of Symbolic Expressions And Their Computation By Machine, Part I"
- Surprising thing about Ruby: Create an object, nothing really happens. Before your code even executes, it creates about 10000 objects, "r-value structures", in a linked list. When you create an object, Ruby doesn't have to do anything, it just takes on thing off the list.
- Python also uses a free list internally, but not like Ruby, just for recycling. It actually allocates memory when you create an object.
Identify And Reclaim
- In Ruby, when objects are left behind, they just sit there. It's like living in a dirty house, no one cleans up. Everything just lies around.
- Python does reference counting, like living in a very clean house. People are organized. Explained in paper "A Method for Overlapping And Erasure of Lists" by George Collins. Every time you create an object, Python saves a number inside the object. The reference count. A count of how many pointers point to that object. Initially 1. Create another reference, gets incremented. Remove a reference, decrements. GC comes alive immediately when it reaches zero - cleaned up right away. GC happens continuously over time, no pauses/gaps.
- Ruby's problem with free list that it'll run out. When that happens, Rub has to run a garbage collections. Uses mark and sweep: First, stops your code, the whole application, for a fairly long time. Marks objects as active (from Ruby 2.0 on using "bitmap marking"). Then walks through pointers (through object graph), unmarked objects go back on the free list, recycled back to the application. Objects don't die, they get "reincarnated to a new life". There's also "lazy sweeping", which only does a partial sweep.
- Create a list in python, then set it to None. It has to recursively figure out what's in the list and decrement its references too. Could cause a pause if it's a very big list.
- Reference counting doesn't always work: Cyclical references. A doubly linked list, each node has pointers both ways. Reference counts go up to 2. Set the nodes to None - you'd like the list to go away. But their ref counts remain at 1. This kind of garbage collects in memory - house still a little bit messy. For this, Python has a "live list", "generation 0". If too many are created vs. freed, it periodically goes to that lists and looks for cycles within the lists and decrements those. Other objects might still have refs from outside the application. Others moved to Generation 1 - Generational Garbage Collection.
- Weak generational hypothesis: Create an object. Most of the time they have a short lifetime. Used in one method, etc. In a request in a Rails app, lot of objects created, but after processing it's all garbage. Only a few objects remain active for a long time. Once an object lives a certain time, it's more likely to continue.
- Generational GC is now in Ruby 2.1 (has been in Rubinius for years). When free list is empty, divide objects into generations: Young (never marked before, likely to die quickly), mature (marked before). Move young, live objects to mature. Not going to mark them again and assume it's active and alive - probably long lived. In next GC, only run mark and sweep on the young objects. Most garbage is going to be in there. Not as likely for mature objects. Will eventually do a full collection when running out of memory.
- Problem with generational GC: What happens with new objects that are referred to from mature (e.g. an array)? Mark algorithm won't find it and free it. That would suck. Ruby needs to monitor the mature objects using a "write barrier", and make sure the new object is marked.
Problem with C extensions: No write barriers in C code. Called "shady objects".
Ruby and Python work very differently, although superficially very similar.
Book: "Ruby Under A Microscope" just gone into print. A lot about Ruby internals there.
Know Your AngularJS Inside Out
Build Your Own AngularJS helps you understand everything there is to understand about AngularJS (1.x). By creating your very own implementation of AngularJS piece by piece, you gain deep insight into what makes this framework tick. Say goodbye to fixing problems by trial and error and hello to reasoning your way through them