Santa repeatedly sleeps until wakened by either all of his nine reindeer, back from their holidays, or by a group of three of his ten elves. If awakened by the reindeer, he harnesses each of them to his sleigh, delivers toys with them and finally unharnesses them (allowing them to go off on holiday). If awakened by a group of elves, he shows each of the group into his study, consults with them on toy R&D and finally shows them each out (allowing them to go back to work). Santa should give priority to the reindeer in the case that there is both a group of elves and a group of reindeer waiting
Introduction
I discovered the The Santa Claus Problem and join patterns innocently enough: a post in the GoLang Nuts mailing list. I regularly read Golang Nuts since Stackless Python and the Go programming language’s concurrency model share a common ancestor: The Bell Lab’s family of languages (i.e., Newsqueak, Limbo). A comment (Stackless Python has no select!) in Richard Tew’s blog Stuff What I Posted comparing Stackless Python to Go was the inspiration for my EuroPython 2010 and Pycon 2011 talks “Prototying Go’s Select with stackless.py for Stackless Python.”
Googling join patterns led to Join Patterns for Visual Basic (Visual Basic!) which in turn led to Jingle Bells: Solving the Santa Claus Problem in Polyphonic C#” and Modern Concurrency Abstractions for C#. Along the way, I downloaded JoCAML. Eventually I encountered the Simon Peyton Jones’ STM based Haskell solution featured in the book Beautiful Code. I read the original John Trono solution featured in the paper A New Exercise in Concurrency: whoa! By the way, there is an interesting Youtube video on The Santa Claus Problem.
By the way, join patterns is a synchronisation and communications concurrency construct. Based on the Join Calculus of Cedris Fournet and George Gonthier, the essential features of join patterns are asynchrony, message passing and synchronisation based on pattern matching involving channels. Join patterns promise a more declarative and safer way of implementing a variety of concurrency solutions.
An Initial Stab at Santa
For Pycon 2011 version of “Prototyping Go’s Select,” I illustrated the utility of the select statement through an implementation of the Santa Claus Problem (the code can be found here). Here is the nucleus of the code:
def santa(reindeer, elves):
reindeerCases = [ch.receiveCase() for name, ch, waitTime in reindeer]
elvesCases = [ch.receiveCase() for name, ch, waitTime in elves]
reindeerQueue = []
elvesQueue = []
cases = reindeerCases + elvesCases
while True:
ch = stackless.select(cases)
print ch.value
if ch in reindeerCases:
reindeerQueue.append(ch.value)
if len(reindeerQueue) == 9:
harness(reindeerQueue)
deliveryToys(reindeerQueue)
unharness(reindeerQueue)
reindeerQueue = []
elif ch in elvesCases:
elvesQueue.append(ch.value)
if len(elvesQueue) == 3:
consultWithSanta(elvesQueue)
elvesQueue = []
else:
print "WHAT?"
(my solution is fashioned after the Benton Polyphonic C# solution in “Jingle Bells.”)
Though the solution is terse, it is imcomplete because the code to give reindeer priority was omitted. The Stackless version (and probably a Go version) is relatively straightforward because the non-preemptive scheduler and high level constructs like channels eliminate most of the problems that solutions using lower level concurrency mechanisms need to explicitly address (i.e., locking).
The main issue with my select based solution is that the Santa Claus tasklet does a lot of housekeeping in the form of maintaining lists and counters for reindeer and elves. This is clutter. Also what would happen if another process, say Mrs Claus, was competing for reindeer and elves?
An understanding of the underlying implementation would reveal that these data structures already exist in the form of the respective elf and reindeer channel queues. The trick to a cleaner solution would be to keep the elves and reindeer enqueued until needed…
Isn’t this what join patterns do?
Could a Stackless Python based solution win, place, or show against the Haskell “Beautiful Code” solution?
The Adventure Gets Under Way
Baby Steps/Game Plan
At Pycon 2011, I finally had the opportunity to meet Christian Tismer. Everything that could go wrong, went wrong at Pycon 2011. Meeting Christian and the Stackless sprint made the trip worth it. Amongst the many subjects discussed with Christian was extending the select algorithm to encompass some form of join patterns. This discussion was in the context of figuring out how to move more functionality out of Stackless Python’s C code base and into Python. A piece of advice Christian gave was to take shortcuts. For instance, just allow a channel to appear in one join pattern. I felt there was an 80/20 rule at work with join patterns. That is I could reap 80% of join pattern’s power easily by not implementing the difficult 20%.
Brainstorming
On return, I with the help of my friends Kevin Bulušek and Ted Ewanchyna, started work on join patterns. Join patterns has been the toughest project to date. Unlike select, join patterns do not cleanly map onto Stackless Python’s model. I did not have an implementation to examine (I suppose I could have looked at JoCAML’s code). I poured over papers that implemented join patterns in various languages (I can’t be the first to attempt join patterns in Python?).
As in the case with select, the project began with a flurry of email activity as Kevin and I exchanged ideas about how to go about implementing join patterns. Often, I would see that Kevin attached code attached to an email. After scratching my head looking at Kevin’s Haskell masquerading as Python code (I am so not a functional programmer), I found myself saying “thank god, he’s on our side!” The functional programming and the algebraic approach echoed what I would eventually see in the literature.
Since we worked on separate code bases, the parallel prototyping allowed for greater freedom to explore alternatives. Some of Kevin’s ideas don’t mesh well with my present design. Other solutions would have never crossed my mind (i.e., wow, you connect the queue to the chanop! Interesting!) and I would like to incorporate in a future branch.
The important point is when one can prototype quickly in a language like Python, exploring alternatives to see what works and what doesn’t becomes feasible.
“A new point of view is worth 80 IQ points.”
–Alan Kay
Beginning of a New Model
By mid May, I realised I was engaged in a game of wack-a-mole. I abandoned my prototypes and went back to the drawing board. Up to then, school occupied much of my time and defocused me. I started to carefully categorise the problem space:
- Interactions between “regular” channels and channels participating in a join pattern (inter-tasklet).
- Interactions between patterns in a join/select (intra-tasklet).
- Interactions between patterns and channels (channels are treated as a special case of a join pattern).
- Interactions between join patterns (inter-tasklet).
- Interactions that would cause deadlock.
Eventually I was able to enumerate the cases. Fortunately there was a manageable number of cases. Some edge cases I originally did not think were possible. However now I could decide the right behaviour and write tests. A post in Go Lang Nuts and a response from Ian Lance Taylor, persuaded me to add a new feature to the design: transferring data only when the pattern matched (“atomic transactions”). The original plan was to have join patterns work more like UNIX’s select. The original design would have made join patterns vulnerable to race conditions. That is just wrong.
Also at this time, I had read the paper “Scalable Join Patterns.” Much of the paper I didn’t think was relevant to my case. However I was exposed to concepts such as message stealing and operations having transaction states. I was also consulting Gregory Andrew’s “Concurrent Programming: Principles and Practices” for insights on message passing constructs.
Note: in a separate blog post, I will example the matching rules in more detail. I am very slow when it comes to making diagrammes. Perhaps I will simply scan my log notes.
With every mistake, we must surely be learning
–George Harrison
Two Man Programming
I can’t emphasize the power of two man programming. Eventually Ted entered the fray. Just having Ted looking over my shoulder and pointing out silly mistakes, asking pointed questions when a particular piece of code didn’t make sense and telling just telling me to hang in there during particularly rough stretches. Ted also pushed me to write additional debugging code. Once function, dumpPatternState() was indispensable. Also the WingIDE (I received a free copy as a result of participating in the Pycon sprints) on numerous occasions, saved our gravy. More work got done over the course of three weeks than the prior two months.
to run fast, run alone. To run far, run together.
– African Proverb
To add to the surrealism of the project, much of the programming was done at the Notman House mansion either at night on weekends when the place was deserted. We worked under the cover of quiet. Visions of Tyler Durden’s dilapidated house in “Fight Club” sprung to mind whenever Ted and I recessed in the lounge only to be annoyed by the hum of the loud vending machine. Thinking about the loud vending machine reminded me of a line from Tom Waits’ “Frank’s Wild Years”:
Never could stand that dog.
The Approach Part I: Interpreting Polyphonic C# Join Patterns to Stackless Python
To give the reader a flavour for join patterns, here is an example from “Modern Concurrency Abstracts in C#”:
public class Buffer
{
public string Get() & public async Put(string s) {
return s;
}
}
A join pattern or chord consists of a header (in this case Get() and Put()) and a body (return s). The join patterns’ header methods either send messages or receive them. Furthermore, these methods are invoked either synchronously or asynchronously. The body (or lambda) is invoked only when all the patterns messages are called. What is nice is that the and operator (‘&’) indicts this is a conjunction.
Since the thread invoked Get() synchronously, it would block until all of the reminding pattern’s methods are called. If the asynchronous method Put() was invoked first, the thread would go about its merry way and the message would be enqueued. When there is a match, the body (return s) is executed in the receiver’s thread. The receiver is unblocked and the value returned. If the receiver happened to be asynchronous, then a new thread would be created.
Something that worried me was that Stackless Python does not have direct support for asynchrony. As a matter of fact, in the literature, synchronous channels are implemented as a join pattern. This is an indication of the expressive power of join patterns. In Stackless Python, it is possible to achieve asynchrony in two ways:
- Subclass the stackless.channel class and create a buffered channel.
- Create a new tasklet to handle a potentially blocking operation and let the parent tasklet proceed on its merry way.
I made two additional assumptions:
- Functions that return values are equivalent to a channel’s receive() method. That is buffer.put() and buffer.get() is equivalent to buffer.send() and buffer.receive(). If this is so, why specify buffer.send() in the method signature since a receive() can only be triggered by a matching send()?
- Pattern bodies exist to do computations on internal messages and then return a value to a sender.
What if, upon pattern completion, rather than execute a function inside the sender’s tasklet, all the internally queued messages were transferred to the sender? What sort of expressiveness would be compromised with this simplification? Could this mechanism still be called a join pattern?
For now, my only real test is to see how many join pattern examples I can replicate with my implementation shortcuts……
The Approach Part II: Expanding stackless.select() to Support Join Patterns
stackless.select()
To understand how join patterns are implemented, it is necessary to understand select…..
select allows a tasklet to wait on multiple channels without resorting to busy polling. select is required due to the blocking nature of synchronous channels. The main difference between Go’s implementation and the Stackless Python prototype is that the former is a language feature and the later is provided via classes. This turns out to have its strengths and weaknesses.
Internally, the Stackless Python prototype’s implementation of stackless.select() comes directly from Plan9’s libthread/channel.c …
This is the select algorithm. Python is a great language for pseudo-code
def select(operations):
"""
The select function searches a list of channel operations
if one or more channels are ready, one is randomly selected
if no channel is ready, the tasklet suspends until a peer
tasklet is ready
returns a channel operation (_chanop)
"""
choice = None
source = getcurrent()
numberReady = 0
"""
check for operations for one or more ready channels
use Pike's one pass algorithm
"""
for operation in operations:
if operation.ready():
numberReady += 1
if nrand(numberReady) == 0:
choice = operation
if choice:
debug("[START OPERATION]")
choice.action()
debug("END OPERATION]")
else:
debug("START SUSPEND PROCESS")
for operation in operations:
debug("entering operation add")
operation.add()
debug("exiting operation add")
schedule_remove()
schedule()
choice = source._operation
source._operation = None
debug("RESUME PROCESS")
debug("exiting select")
return choice
Changes to stackless.py to support stackless.select()
To support select, it was necessary to represent channel operations as classes. Hence the chanop (channel operation) class and the channel.receiveCase() and channel.sendCase() that generate the proper chanop (for send or receive operations). Stackless’ channel object’s queue was modified to be populated by chanops rather than tasklets.
The biggest change was the implementation of the select algorithm. Note that in the select algorithm, it is the chanop via its ready() method that determines whether a tasklet ought to block. A chanop is akin to Dijkstra’s guard (guarded command). In turn, select is a disjunction of guards.
If so, why can’t a guard be composed of guards? For instance if we want to wait for three elf channels to be ready for receiving, why can’t we express this as some variation of :
elf[0].receiveCase().ready() and elf[1].receiveCase().ready() and elf[2].receiveCase().ready()
It turns out we can; huzzah!
aJoinPattern = stackless.joinPattern([list of channels])
Consequently the joinPattern and chanop classes share a common interface: abstractGuard. The joinPattern class has methods such as:
- ready() : is the channel operation ready. That is, is there a matching channel operation on the queue?
- add() : add the channel operation to the appropriate channel queue
- action() : transfer data
- remove() : remove unused channel operations from the reminder of select’s channel queues and remove other data structures.
hint, follow along with the select algorithm
In this fashion, anywhere a chanop is used in the select algorithm, a joinPattern can be substituted. Hence the select algorithm functions unaltered. An added bonus is that one can also support a disjunction of conjunctions. In other words, an orElse allowing for multiple join patterns.
It is interesting to note that in the Plan9 implementation, a channel method is implemented as a select with one operation.
def send(self, msg):
"""
channel.send(value) -- send a value over the channel.
If no other tasklet is already receiving on the channel,
the sender will be blocked. Otherwise, the receiver will
be activated immediately, and the sender is put at the end of
the runnables list.
"""
return select([SendChanop(self, msg)]).value
Consequently, a join pattern can be implemented as a select with one operation albeit a compound operation.
def join(self):
""" wait until all cases are ready """
return select([self])
Join Patterns
My main design goal was to maintain a synchronous style of programming in keeping with Stackless Python. I also wanted to maintain backward compatibility with the Stackless API. In the process I also wanted a “you get only what you pay for policy.” So if one does not want Join patterns, regular code should not suffer a noticeable penalty. Finally, I wanted to localise changes as much as possible. This meant not altering the select algorithm. And the scheduler was off limits as well as removing anything from the stackless.channel() API.
The Santa Claus Problem with Join Patterns
The following is the nucleus of the Santa Claus Problem implemented with an initial version of join patterns. Although properties like patterns exist because of the lack of language support, this version contains much less clutter than the select based version.
def santa(reindeer, elves):
print "in santa"
joinObject = stackless.join().addPattern([ch for _, ch, _ in reindeer]).\
addPattern([ch for _, ch, _ in elves],3)
reindeerPattern, elfPattern = joinObject.patterns
while True:
pattern = joinObject.join()
if reindeerPattern.ready():
reindeerPattern.join()
pattern = reindeerPattern
if pattern is reindeerPattern:
harness(reindeerPattern)
deliveryToys(reindeerPattern)
unharness(reindeerPattern)
elif pattern is elfPattern:
consultWithSanta(elfPattern)
An interesting note. There are two cases that have to be checked. The first case is when both reindeer and elves are ready from the get-go and the select algorithm randomly picks elves and removes the reindeer pattern as a part of the select algorithm’s clean up. The select algorithm has no notion of priority. Consequently the Santa tasklet must check the reindeer pattern to see if indeed there are nine reindeer ready.
The second case is more interesting. Let’s pretend, eight of the reindeer have arrived. However three elves have arrived, completing the Elf pattern. As in the first case, the reindeer pattern is removed and stackless.select() schedules Santa. Again, the algorithm does not give Santa scheduling priority (perhaps it ought to. Easy enough to do with Stackless API). More importantly exiting select() does not mean the eight reindeer are thrown away. Rather the eight reindeer are still sleeping on their channels. Now assume the runnable scheduler queue looks like this:
(ELF[1], ELF[2], RUDOLPH, SANTA)
By the time Santa is the currently executing tasklet, all nine reindeer are ready. However the Santa tasklet will not know this unless it explicitly checks the reindeer pattern.
Got to love concurrent programming….
Two for the Price of One: The Dining Philosopher’s Problem
It is safe to say that implementing the Santa Claus Problem with join patterns was a success. The only change to the existing Stackless Python API was rendering a property, channel.balance irrelevant (internally I moved to double queues). After a “sneak peak” got a bad reception, I concentrated more on the API. I was torn between the more verbose but descriptive style used by Concurrent Visual Basic and the chaining style used by Twisted Deferred. The Twisted style has won for now.
In retrospect, the Santa Claus Problem was not the stress tester. The must-have features were barrier synchronisation and the ability to atomically move the data while still operating in the safety of the “scheduler” (for conceptual simplicity, I include the channel as a part of the scheduler).
The distinction of ball breaker went to the Dining Philosopher’s problem. I tried to write an implementation similar to the one posted in Scalable Join Patterns My initial solution failed. Why? Because I needed asynchrony. It didn’t help that I initially forgot to set the chopsticks! In lieu of an asynchronous send, I launched a tasklet that would block on the synchronous send. Meanwhile the philosopher tasklet in question could go its merry way.
The Dining Philosophers
This is the nucleus of a solution for the Dining Philosopher’s problem with join patterns:
def release(aChannel, name):
def __release():
aChannel.send(name)
stackless.tasklet(__release)()
def thinker(name, thinking, left, right):
print "Philosopher %s left=%s right=%s" % (name, left.label, right.label)
utensils = stackless.join().addPattern([left, right])
while True:
print name, "thinking"
print name, "ready to eat"
result = utensils.join()
for p in result:
print p.value, " relinquished chopstick ", p.channel.label
print name, " finished eating"
release(left, name)
release(right, name)
Originally I had the philosophers wait an arbitrary amount of time before attempting to eat. However I removed the timing to better ensure that the robin robin scheduling produced the pathological case of all philosophers grabbing their left chopstick then right chopstick. Still the solution worked. I exampled the logging to see whether message stealing occurred. Nope. This initially threw me for a loop. I have to admit, I was skeptical about why join patterns work for the Dining Philosophers.
To the best of my knowledge, the following is happening. As per join patterns’ behaviour, the philosophers do not hold the chopsticks unless they can be acquired simultaneously. I suspect the join pattern machinery inside the scheduler takes on the role of butler. However the non-preemptive nature of the scheduler plays a role too. It allows the pattern matching in the active tasklet to check for availability of both chopsticks without producing a context switch. This greatly simplifies matters. Event if there was a context switch, the solution ought to still work since message stealing rules would come into play. Regardless the scheduler is acting like the butler of the Dijkstra solution. The important point is that join pattern’s declarative style allows one to get the butler for free.
Join Patterns, Concurrent ML, Transactional Events and Software Transactional Memory
I corresponded with the authors of “Scalable Join Patterns”: Claudio Russo of Microsoft and Aaron Turon from Northwest University. Claudio and Aaron were kind enough to recommend references and put up with my pestering about why join patterns work in terms of classic deadlock conditions. Two of the major take-aways from the “Scalable Join Patterns” paper is the use of atomic transactions and lock-free data structures. Fine grain locking is the great killer of performance and composability.
Amongst the recommendations were the John Reppy’s papers on Concurrent and Parallel ML and Kevin Donnelly and Matthew Fluet’s Transactional Events. I will admit that my reading of the papers leave much to be desired since I tend to skim over the formal definitions and fixate on parts that have a family resemblance to what I am doing.
Like Stackless Python, Concurrent ML uses synchronous channels. Concurrent ML’s goal is to create a model that separations the description of synchronization mechanisms from their operation. Concurrent ML sports a small number of operations (i.e., spawn) and primitives (i.e, channels, events). Events (receiveEvt, sendEvt) in Concurent ML are very much like sendCase/sendChanop, receiveCase/receiveChanop in my prototypes. Combinators such as guard, choose, wrap and sync (synchronises processes) are used to create more complex concurrency constructs.
The problem Concurrent ML is trying to address is that of composability. Composability involves the scalability of concurrent constructs. Issues surrounding composability appear in the Stackless section of PyPy literature.
Transactional events builds up from the Concurrent ML model. I won’t get into Transactional Events because I currently do not understand the nuances between it and Concurrent ML well enough to explain them coherently. Like Concurrent ML, transactional events aim to allow more flexible code modularisation (with an emphasis on protocol design). Like software transactional memory, atomic transactions are used in lieu of fine grain locking. And if I understand the literature and code examples, it easier to express thread interactions with Concurrent ML and transactional events than software transactional memory.
I will only truly know once I start writing examples with STM…..
Takeaways
I feel some important takeaways are:
- There are languages with concurrency models similar enough to Stackless Python. Moreover these more mature systems have tackled problems that Stackless Pythonistas have already encountered and have come up with questionable solutions (i.e., lock convoys). There is a lot to learn. Why re-invent the wheel for want of reading a research paper?
- There are numerous language implementations out there. It is important to play with them. This is a cheap way of learning what can and can’t work with Stackless Python (or Python)
- stackless.py and PyPy ought to play a bigger role in prototyping new solutions for Stackless Python (if not, Python). As was shown in “Prototyping Go’s Select,” the difference between prototying a feature in Python and C was ~150 lines to ~1000 lines, not to mention the difference in time and skill.
Status of stackless.py with Join Patterns
There is still much work to be done on the stackless_V3.py module. For now, I will move my work into GitHub and give access to whomever wishes to look at it. When things are sufficiently stabilised, I will move a version into the Stackless Repository. In the immediate future, this is my TODO:
- Write tests for join patterns that share common channels.
- Remove of internal counting for M out of N joins (implementation detail)
- Automating tests.
- Implementation of channel preferences for join patterns.
- Addition of asynchronous buffers
- Refinements of internal channel operations (implementation detail)
- Support for greenlets and continulets
Conclusions
It may be questionable what I implemented is indeed join patterns. Perhaps message passing along with barrier synchronisation with atomicity is good enough to cost effectively solve a wide variety of concurrency problems? Nevertheless, I have been able to implement two notorious concurrency problems, the Santa Claus Problem being one of them. I was able to do this by modifying the existing stackless.py code base. stackless.py accompanied by Python provides a surprisingly rich toolbox for experimentation.
It is possible to write a faithful implementation of join patterns (I have ideas). However I don’t know if the gains would be worth the effort (why create machinery to create a synchronous channel when Stackless already has them?). If anything, I believe more bang for the buck can be gained from exploring concepts featured in Concurrent ML.
stackless.py
As Stackless features are better integrated with the JIT, this will open up new opportunities. I believe if performance is not an issue, then a designer stackless.py rather than C-extension to Stackless Python may be the answer. If one has a need for join patterns or some other esoteric concurrency feature, then a custom tailored stackless.py becomes entirely feasible. If one does not want these features, then use stock stackless.py (or Stackless Python). At the end of the day, Stackless Python is not so much an application but a family of related products and technologies that can be tailored for different purposes. Otherwise how does one explain the success of gEvent and other greenlet based solutions?
For myself, I can see join patterns and some of the transactional event features simplifying the construction of SOA (Service Oriented Architecture) based applications. It was the desire to prototyping WS-BPEL processors that got me involved with Stackless Python.
Final Thoughts
Once again, I found it a fulfilling experience to implement join patterns for Stackless Python. Whether join patterns get into Stackless Python is another story. I can’t describe the thrill of reading a paper and saying “yeah I think I can implement that!” and having a language that allows one to fearless diving in. Okay, I exaggerate: one takes a deep breath first….
Seasons Greetings,
Andrew