This chapter explains how 'compose' supports program decomposition using generators. It also clarifies how 'compose' extends 'yield from' (PEP 380).
Consider this generator that reads from and writes to a socket using yield:
def f():
    request = yield None          # (1)
    yield 'HTTP/1.0 200 Ok'       # (2)
    yield ...                     # (3)
It reads an HTTP request (1), writes the HTTP response status line (2) and sends a body (3). When these steps get more complicated, one would like to extract them into sub-generators and write something like:
def f():
    request = yield readRequest()     # (1)
    yield sendResponse(200, 'Ok')     # (2)
    yield sendFile(...)               # (3)
The little program is decomposed into 'readRequest', 'sendResponse' and 'sendFile', which are combined again by 'calling' them using 'yield'. PEP 380 proposes using 'yield from' here:
def f():
    request = yield from readRequest()     # (1)
    yield from sendResponse(200, 'Ok')     # (2)
    yield from sendFile(...)               # (3)
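Python 3.3 and later implement PEP 380, so the decomposed version can be run directly with native 'yield from'. The sketch below fills in hypothetical bodies for 'readRequest' and 'sendResponse' (they are illustrations, not Weightless code) and replaces 'sendFile(...)' with a single yield. It assumes the caller sends request lines one at a time and ends the request with an empty line.

```python
def readRequest():
    """Hypothetical sub-generator: collect request lines up to an empty line."""
    lines = []
    while True:
        line = yield None                 # ask the caller for the next line
        if line == '':
            return lines                  # becomes the value of 'yield from'
        lines.append(line)

def sendResponse(code, message):
    """Hypothetical sub-generator: emit the status line."""
    yield 'HTTP/1.0 %d %s' % (code, message)

def f():
    request = yield from readRequest()    # (1)
    yield from sendResponse(200, 'Ok')    # (2)
    yield 'body for ' + request[0]        # (3) stands in for sendFile(...)

g = f()
next(g)                                   # run to the first 'yield None'
g.send('GET / HTTP/1.0')                  # first request line
status = g.send('')                       # empty line ends the request
body = next(g)
```

Each sub-generator runs until it finishes, and its 'return' value is handed back to 'f' as the value of the 'yield from' expression.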
This PEP will probably not be implemented before Python 3.3, so in the meantime we use 'compose'.
Compose is a simple generator decorator that does what PEP 380 proposes, and a little bit more.
Consider the following code:
def one():
    yield 'Hello!'

def two():
    yield one()
The intention of 'two' is to delegate part of the work to 'one', or to "call" it. PEP 380 suggests writing 'two' as:
def two():
    yield from one()
Weightless does this as follows:
@compose
def two():
    yield one()
Alternatively, one can omit the decorator and wrap the generator instead. 'compose' supports both situations:
g = compose(two())
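To make the mechanism concrete, here is a minimal sketch of a compose-like driver in Python 3. It is an illustration under stated assumptions, not Weightless's actual implementation: the helper '_drive' is hypothetical, sub-generators are assumed to be plain (undecorated) generators, return values use Python 3's 'return value' instead of 'raise StopIteration(value)', and push-back, flow control and the traceback fix described below are left out. The driver keeps a stack of generators: yielding a generator "calls" it, finishing "returns" to the caller, and an exception is re-raised at the caller's yield.

```python
import functools
from types import GeneratorType

def compose(arg):
    """Sketch: accept a generator function (decorator) or a generator (wrapper)."""
    if isinstance(arg, GeneratorType):
        return _drive(arg)                       # wrapping form: compose(two())
    @functools.wraps(arg)
    def wrapper(*args, **kwargs):                # decorator form: @compose
        return _drive(arg(*args, **kwargs))
    return wrapper

def _drive(gen):
    """Hypothetical trampoline: run a stack of generators as one generator."""
    stack = [gen]                                # innermost "called" generator on top
    message, pending_error = None, None
    while stack:
        try:
            if pending_error is not None:
                error, pending_error = pending_error, None
                response = stack[-1].throw(error)   # re-raise at the caller's yield
            else:
                response = stack[-1].send(message)
        except StopIteration as stop:
            stack.pop()
            message = stop.value                 # return value resumes the caller
            continue
        except Exception as error:
            stack.pop()
            if not stack:
                raise                            # nobody left to handle it
            pending_error = error
            continue
        if isinstance(response, GeneratorType):
            stack.append(response)               # "call" the sub-generator
            message = None                       # start it
        else:
            message = yield response             # pass data out, receive data in

def one():
    yield 'Hello!'
    return 'done'                                # Python 3 return value

@compose
def two():
    result = yield one()                         # "call" one() and get its return value
    yield result

def failing():
    yield 'working'
    raise ValueError('boom')

def careful():
    try:
        yield failing()
    except ValueError as error:                  # raised inside the sub-generator
        yield 'caught ' + str(error)

greetings = list(two())                          # ['Hello!', 'done']
handled = list(compose(careful()))               # ['working', 'caught boom']
```

A stack keeps the sketch iterative, so deep "call chains" of generators do not nest Python calls; the price is that 'a' and 'b' style callers are no longer on the interpreter stack, which is exactly the traceback issue discussed further on.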
Suppose we have this code, which calls 'readRequest' and receives its result in 'request':
request = yield readRequest()
Normal generators (in Python 2) do not support return values, so readRequest uses StopIteration to return data to the caller:
raise StopIteration('return value')
PEP 380 discusses this approach, including the debate over whether it is a good solution.
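In Python 3.3 and later, 'return value' inside a generator does this for you: the value travels in the StopIteration that ends the generator. Since Python 3.7 (PEP 479), explicitly raising StopIteration inside a generator is converted into a RuntimeError, so the Python 2 idiom above no longer works there. A short check, using a hypothetical 'readRequest' and 'oldStyle':

```python
def readRequest():
    """Hypothetical sub-generator that returns a value when it finishes."""
    line = yield None
    return line.strip()            # Python 3.3+: ends the generator with StopIteration(value)

g = readRequest()
next(g)                            # run to the first yield
try:
    g.send('  GET /  ')
except StopIteration as stop:
    request = stop.value           # the "return value" carried by StopIteration

def oldStyle():
    yield None
    raise StopIteration('return value')   # the Python 2 idiom from the text

h = oldStyle()
next(h)
try:
    next(h)
except RuntimeError as error:      # PEP 479 behaviour, default since Python 3.7
    converted = error
```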
Although it looks natural, catching exceptions as below is not possible without additional magic:
try:
    request = yield readRequest()
except Exception as error:
    ...   # handle the error
Compose, like 'yield from', fixes this, so exceptions raised in sub-generators can be caught by the caller.
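With native 'yield from' (Python 3.3+) the try/except above works as written: an exception raised in the sub-generator propagates into the delegating generator at the 'yield from' expression. The 'readRequest' and 'handler' below are hypothetical examples.

```python
def readRequest():
    """Hypothetical sub-generator that rejects anything but GET."""
    line = yield None
    if not line.startswith('GET'):
        raise ValueError('bad request: %r' % line)
    return line

def handler():
    try:
        request = yield from readRequest()
    except ValueError:                     # raised inside the sub-generator
        yield 'HTTP/1.0 400 Bad Request'
    else:
        yield 'HTTP/1.0 200 Ok'

g = handler()
next(g)                                    # run to readRequest's 'yield None'
reply = g.send('POST /')                   # 'HTTP/1.0 400 Bad Request'
```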
Suppose the following decomposition of an example program into three generators (line numbers included for clarity):
26 def a():
27     yield b()
28 def b():
29     yield c()
30 def c():
31     yield 'a'
32     raise Exception('b')
33
34 list(compose(a()))
When line 34 executes, you would like to see a traceback like:
Traceback (most recent call last):
  File "fixtraceback.py", line 34, in <module>
    list(compose(a()))
  File "fixtraceback.py", line 27, in a
    yield b()
  File "fixtraceback.py", line 29, in b
    yield c()
  File "fixtraceback.py", line 32, in c
    raise Exception('b')
Exception: b
But without special magic, you will not see 'a' and 'b' in the traceback, because they are not on the VM's call stack. Instead, you will see:
Traceback (most recent call last):
  File "fixtraceback.py", line 34, in <module>
    list(compose(a()))
  File "fixtraceback.py", line 32, in c
    raise Exception('b')
Exception: b
Compose includes a fix for this if and when the tbtools package is available. This C extension allows modification of Python traceback objects. Sources are available from Armin Ronacher; binary packages for Debian and Red Hat are available from CQ2.
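For comparison, native 'yield from' in Python 3 keeps the delegating generators on the interpreter stack while the sub-generator runs, so the traceback includes 'a' and 'b' without any C extension. The sketch below rewrites the example with 'yield from' (no compose) and collects the function names from the traceback using the standard 'traceback' module; 'run' is a hypothetical helper.

```python
import traceback

def c():
    yield 'a'
    raise Exception('b')

def b():
    yield from c()          # delegate to c

def a():
    yield from b()          # delegate to b

def run():
    """Hypothetical helper: run a() and return the function names in the traceback."""
    try:
        list(a())
    except Exception as error:
        return [frame.name for frame in traceback.extract_tb(error.__traceback__)]

frames = run()              # 'a', 'b' and 'c' all appear, outermost first
```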
When using decomposed generators as a pipeline (see background on JSP), boundary clashes appear because, for example, TCP network messages do not correspond to HTTP chunks and those do not correspond to, say, your XML records.
JSP describes how to deal with boundary clashes in a structured way using lookahead. A lookahead in Weightless naturally corresponds to performing an additional 'yield' to get the next input token. However, there must be a way to push back this token when it belongs to the next record.
The coroutine below reads a stream of records. A single record is read by readRecord():
def readAllRecords():
    while ...:
        record = yield readRecord()     # (1)
Each record begins with the token STARTRECORD and runs until the next STARTRECORD. Here is what readRecord() looks like (note that readRecord will be invoked over and over again):
def readRecord():
    record = yield                  # read first token: STARTRECORD
    while True:                     # then read until the next STARTRECORD
        token = yield
        if token == STARTRECORD:
            raise StopIteration(record, token)   # return record and push back token
        record += token
The lookahead looks at the next token. If it belongs to the next record, it is pushed back and the generator returns the record.
Technically, there is no difference between the return value and pushed-back tokens: the return value is also pushed back into the input stream. It is then read by the next yield, which happens immediately after readRecord returns at (1). In fact, any number of tokens can be pushed back:
raise StopIteration(retval, pushback, ..., pushback)
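The push-back mechanism can be sketched in Python 3 as follows. Since Python 3.7 a generator may not raise StopIteration itself, so this sketch assumes a hypothetical convention instead: a sub-generator that returns a tuple means (return value, pushed-back token, ...). The driver 'run_with_pushback', the 'START' token and the list-of-tokens records are illustrative assumptions, not Weightless API. The driver simply stops when the input runs out, so a final unterminated record is not emitted.

```python
from collections import deque
from types import GeneratorType

STARTRECORD = 'START'      # hypothetical record-separator token
_EOF = object()            # sentinel: input exhausted

def readRecord():
    record = [(yield)]                     # read first token: STARTRECORD
    while True:                            # then read until the next STARTRECORD
        token = yield
        if token == STARTRECORD:
            return record, token           # return record and push back token
        record.append(token)

def readAllRecords():
    while True:
        record = yield readRecord()        # (1) receives the returned record
        yield record                       # hand the record downstream

def run_with_pushback(gen, tokens):
    """Hypothetical driver: a generator stack plus a push-back queue."""
    stack, pending, source, outputs = [gen], deque(), iter(tokens), []
    message = None
    while stack:
        try:
            response = stack[-1].send(message)
        except StopIteration as stop:
            stack.pop()
            value = stop.value
            # Assumed convention: a returned tuple is (retval, pushback, ...).
            pending.extend(value if isinstance(value, tuple) else (value,))
            message = pending.popleft()    # retval resumes the caller's yield
            continue
        if isinstance(response, GeneratorType):
            stack.append(response)         # "call" the sub-generator
            message = None                 # start it
        elif response is None:             # "I want data"
            if pending:
                message = pending.popleft()    # pushed-back tokens come first
            else:
                message = next(source, _EOF)
                if message is _EOF:
                    break                  # no more input: stop driving
        else:
            outputs.append(response)       # data leaving the pipeline
            message = None
    return outputs

records = run_with_pushback(readAllRecords(),
                            ['START', 'a', 'b', 'START', 'c', 'START'])
```

The 'START' token that ends one record is pushed back and becomes the first token read by the next readRecord(), which is exactly what the lookahead needs.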
My take on this debate: since 'return' in a generator is an implicit 'raise StopIteration', I think it is natural to map 'return value' to 'raise StopIteration(value)' in a generator. Since this also yields a natural, clean implementation, I think it is a good solution.
Suppose we have a simple looping generator that reads requests and writes out a response in two parts:
def responder():
    while True:
        request = yield             # (1)
        yield response_part_1       # (2)
        yield response_part_2       # (3)
Now suppose that at (2) or (3) the yield expression returns a new request. The code above does not accept a new request until it gets back to (1). It must therefore check for new requests and queue them:
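def responder():
    requests = []
    while True:
        # take the oldest queued request first, otherwise ask for a new one
        request = requests.pop(0) if requests else (yield)
        msg = yield response_part_1
        if msg:
            requests.append(msg)    # a request arrived early: queue it
        msg = yield response_part_2
        if msg:
            requests.append(msg)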
This adds so much checking code that it makes generator pipelines infeasible. Therefore, compose supports flow control by means of None messages:
message = yield None # yield None means: I want data
#or
message = yield # yields None implicitly
The agreement is to send data if and when the generator has yielded None. This avoids all the checking introduced above.
This special meaning of None has its counterpart on the other side:
generator.send(None)    # send(None) means: I want data
#or
next(generator)         # sends None implicitly
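A minimal sketch of the convention from the driver's side, assuming a hypothetical 'drive' loop and a queue of incoming requests: data is sent only after the generator has yielded None, and whenever it yields output, it is resumed with None. Because queueing now happens outside the generator, 'responder' keeps its simple three-yield shape; the response strings are placeholders.

```python
from collections import deque

def responder():
    while True:
        request = yield None                 # (1) I want data
        yield 'part 1 for %s' % request      # (2)
        yield 'part 2 for %s' % request      # (3)

def drive(gen, requests):
    """Hypothetical driver honouring the None convention."""
    inbox = deque(requests)       # requests may arrive at any time; they wait here
    output = []
    response = next(gen)          # prime: run to the first 'yield None'
    while True:
        if response is None:      # the generator asked for data
            if not inbox:
                break
            response = gen.send(inbox.popleft())
        else:                     # the generator produced data: do not push input
            output.append(response)
            response = next(gen)  # same as gen.send(None)
    return output

replies = drive(responder(), ['a', 'b'])
```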
I hesitated for more than a year to introduce this special treatment of None for flow control, but it has been a turning point in the development of Weightless. After this, it was finally possible to decompose a program into generators in a natural, easy to understand and consistent way.
Compose is finalized and stable. It is possible to develop large programs with it. It supports development by giving clear error messages that pinpoint the actual failure, despite the control flow of the program being somewhat inverted by the abundant use of generators.
The idea of composing generators is formalized in Python Enhancement Proposal PEP 380. Compose is compatible with this PEP.