This chapter explains how compose supports program decomposition using generators. It also clarifies how compose extends yield-from (PEP 380).
Consider this simplified generator that reads from and writes to a socket using yield:
def f(): request = yield (1) yield 'HTTP/1.0 200 Ok' (2) yield ... (3)
It reads an HTTP request (1), generates an HTTP response (2) and sends a body (3). When these steps get more complicated, one would like to extract parts of it into sub-generators and write something like:
def f(): request = yield readRequest() (1) yield sendResponse(200, 'Ok') (2) yield sendFile(...) (3)
The little program is decomposed into readRequest, sendResponse and sendFile and combined again by 'calling' them using 'yield'. PEP 380 suggests to use 'yield from' here:
def f(): request = yield from readRequest() (1) yield from sendResponse(200, 'Ok') (2) yield from sendFile(...) (3)
PEP380 will probably not be implemented before Python 3.3, so in the mean time, we use compose. Also, PEP380 is not sufficient for decomposing programs into generators. Two more things are needed, see Additional Functionality below. These are in compose as well.
Compose is a simple decorator for a generator that does what PEP 380 suggests. And a little bit more.
Consider the following code:
def one(): yield 'Hello!' def two(): yield one()
The intention of two is to delegate part of the work to one, or, to 'call' it. PEP 380 suggest to write this like:
def two(): yield from one()
Weightless does this as follows:
@compose def two(): yield one()
Alternatively, one can also omit the decorator and wrap the generator. Both situation are supported by compose:
g = compose(two())
Suppose we have code which calls readRequest and catch the return value in request:
request = yield readRequest()
Normal generators do not support return values, so readRequest uses StopIteration to return data to the caller as follows:
raise StopIteration('return value')
PEP 380 discusses this, and also handles the debate whether this is a good solution or not.
Although it looks natural to write, normally the code below does not catch exceptions thrown by readRequest:
try: request = yield readRequest() except: handle error
With compose however, exceptions thrown by generators can be catched like this.
Suppose the following decomposition of a program into three generators (see weightless/examples/fixtraceback.py):
26 def a(): 27 yield b() 28 def b(): 29 yield c() 30 def c(): 31 yield 'a' 32 raise Exception('b') 33 34 list(compose(a()))
When line 34 executes, you would like to see a traceback like:
Traceback (most recent call last): File "fixtraceback.py", line 34, inlist(compose(a())) File "fixtraceback.py", line 27, in a yield b() File "fixtraceback.py", line 29, in b yield c() File "fixtraceback.py", line 32, in c raise Exception('b') Exception: b
But without special measures, you will only see:
Traceback (most recent call last): File "fixtraceback.py", line 34, inlist(compose(a())) File "fixtraceback.py", line 32, in c raise Exception('b') Exception: b
Since this makes programming with generators next to impossible, compose maintains a stack of generators and, during an exception, it adds this stack to the traceback.
When using decomposed generators as a pipeline (see background on JSP), boundary clashes appear because, for example, TCP network messages do not correspond to HTTP chunks and those do not correspond to, say, your XML records.
JSP describes how to deal with boundary clashes in a structured way using lookahead. A lookahead in Weightless naturally corresponds to performing an additional yield to get the next input token. However, there must be a way to push back (part of) this token when it belongs to the next record.
The co-routine below reads a stream with records. A single record is read by readRecord:
def readAllRecords(): while ...: record = yield readRecord() (1)
Each record begins with the token STARTRECORD and runs until the next STARTRECORD. Here is what readRecord looks like (note that readRecord will be invoked over and over again):
def readRecord(): record = yield while True: token = yield (2) if token == STARTRECORD: raise StopIteration(record, token) (3) record += token
The look ahead takes place at (2). When the next value is the beginning of the next record, it returns the completed record (3) to (1). At the same it time pushes back token (2) so it will be read by the next yield (1) again.
PEP380 does not provide look-ahead functionality.
Technically (and most interestingly), there is no difference between the return value and push back. The return value is also just pushed back into the input stream. It will then be read by the next yield, which happens immediately after readRecord returns at (1). In fact, there can be an arbitrary number of tokens to be pushed back:
raise StopIteration(retval, token0, ..., tokenn)
This will simply push back all values in reverse order: <tokenn, ..., token0, retval>.
Suppose we have a simple consumer that reads requests (1) and writes out a response in n parts (2, 3):
def consumer(): while True: request = yield (1) for i in range(n): (2) yield(3)
A producer is supposed to first send a request and then read the response until... what? It would have to know
def responder(): requests = [] while True: if requests: request = requests.pop() else: request = yield (1) for i in range(n): (2) msg = yieldif msg: requests.append(msg)
This adds so much checking code that it makes decomposing programs into co-routines unfeasible. Therefor, compose supports flow control by means of The None Protocol.
Recall that whenever we want to accept data and we write:
message = yield
the Python VM interprets this as:
message = yield None
Similarly, when we want data and call next:
response = generator.next()
the Python VM interprets this as:
response = generator.send(None)There seems to be an implicit meaning of None in this case. Compose makes this explicit and wields the rule: None means 'I want data'.
This means that every co-routine must must obey it, and compose checks it.
The None Protocol turned out to be essential to make compose practical, natural, intuitive, easy to understand and consistent. Compose was just a nice intellectual exercise and until — after a year of remorse — I added the None Protocol.
Generetor Local is the equivalent of Thread Local variables in generator land. Weightless provides a function local() that gets locals from the stack (or backtrace, if you don't have a stack).
This subject goes back to an old discussion about how to use yield. Most people I met (and also most of the competitors mention in 'Related Work') propose this for communicating with a socket:
response = yield <command> # general form message = yield socket.read() yield socket.write("response")
While I propose this:
message = yield yield "response"
In the first case, the co-routine is communicating with some sort of scheduler that executes commands, in the second case, the co-routine is communication with nobody in particular. I prefer the latter because:
yield scheduler.sleep(1) yield scheduler.wait(lock)
But how to do this in the second proposal? The answer is the Side Kick.
Recall that the co-routine is already communicating with something that produces and consumes data. The side kick enables the co-routine to communicate with a second entity. An example:
@compose(scheduler) def coroutine(): messages = yield yield scheduler.wait(1) yield "hello " + message
to be continued...
Compose is finalized and stable. It has a Python implementation and a C extension for better speed. It is feasible to develop develop large programs with it. See the examples.
The idea of composing generators is formalized in Python Enhancement Proposal: PEP-380. Compose is intended compatible with this PEP, although it extends it as explained above.