Program Decomposition using Co-routines

This chapter explains how compose supports program decomposition using generators. It also clarifies how compose extends yield-from (PEP 380).

Decomposition

Consider this simplified generator that reads from and writes to a socket using yield:

            def f():
                request = yield                 (1)
                yield 'HTTP/1.0 200 Ok'         (2)
                yield ...                       (3)

It reads an HTTP request (1), generates an HTTP response (2) and sends a body (3). When these steps get more complicated, one would like to extract parts of it into sub-generators and write something like:

            def f():
               request = yield readRequest()    (1)
               yield sendResponse(200, 'Ok')    (2)
               yield sendFile(...)              (3)

The little program is decomposed into readRequest, sendResponse and sendFile and combined again by 'calling' them using 'yield'. PEP 380 suggests to use 'yield from' here:

            def f():
                request = yield from readRequest()   (1)
                yield from sendResponse(200, 'Ok')   (2)
                yield from sendFile(...)             (3)

PEP380 will probably not be implemented before Python 3.3, so in the mean time, we use compose. Also, PEP380 is not sufficient for decomposing programs into generators. Two more things are needed, see Additional Functionality below. These are in compose as well.

Compose

Compose is a simple decorator for a generator that does what PEP 380 suggests. And a little bit more.

Basic Functionality (yield from)

'Calling' a subgenerator

Consider the following code:

            def one():
                yield 'Hello!'
            def two():
                yield one()

The intention of two is to delegate part of the work to one, or, to 'call' it. PEP 380 suggest to write this like:

            def two():
                yield from one()

Weightless does this as follows:

            @compose
            def two():
                yield one()

Alternatively, one can also omit the decorator and wrap the generator. Both situation are supported by compose:

            g = compose(two())

Returning values

Suppose we have code which calls readRequest and catch the return value in request:

            request = yield readRequest()

Normal generators do not support return values, so readRequest uses StopIteration to return data to the caller as follows:

            raise StopIteration('return value')

PEP 380 discusses this, and also handles the debate whether this is a good solution or not.

Catching exceptions

Although it looks natural to write, normally the code below does not catch exceptions thrown by readRequest:

            try:
                request = yield readRequest()
            except:
                handle error

With compose however, exceptions thrown by generators can be catched like this.

Additional Functionality (beyond yield from)

Fixing tracebacks

Suppose the following decomposition of a program into three generators (see weightless/examples/fixtraceback.py):

26          def a():
27              yield b()
28          def b():
29              yield c()
30          def c():
31              yield 'a'
32              raise Exception('b')
33
34          list(compose(a()))

When line 34 executes, you would like to see a traceback like:

            Traceback (most recent call last):
              File "fixtraceback.py", line 34, in 
                list(compose(a()))
              File "fixtraceback.py", line 27, in a
                yield b()
              File "fixtraceback.py", line 29, in b
                yield c()
              File "fixtraceback.py", line 32, in c
                raise Exception('b')
            Exception: b

But without special measures, you will only see:

            Traceback (most recent call last):
              File "fixtraceback.py", line 34, in 
                list(compose(a()))
              File "fixtraceback.py", line 32, in c
                raise Exception('b')
            Exception: b

Since this makes programming with generators next to impossible, compose maintains a stack of generators and, during an exception, it adds this stack to the traceback.

Push-back data

When using decomposed generators as a pipeline (see background on JSP), boundary clashes appear because, for example, TCP network messages do not correspond to HTTP chunks and those do not correspond to, say, your XML records.

JSP describes how to deal with boundary clashes in a structured way using lookahead. A lookahead in Weightless naturally corresponds to performing an additional yield to get the next input token. However, there must be a way to push back (part of) this token when it belongs to the next record.

The co-routine below reads a stream with records. A single record is read by readRecord:

            def readAllRecords():
                while ...:
                    record = yield readRecord()               (1)

Each record begins with the token STARTRECORD and runs until the next STARTRECORD. Here is what readRecord looks like (note that readRecord will be invoked over and over again):

            def readRecord():
                record = yield
                while True:
                    token = yield                             (2)
                    if token == STARTRECORD:
                        raise StopIteration(record, token)    (3)
                    record += token

The look ahead takes place at (2). When the next value is the beginning of the next record, it returns the completed record (3) to (1). At the same it time pushes back token (2) so it will be read by the next yield (1) again.

PEP380 does not provide look-ahead functionality.

Technically (and most interestingly), there is no difference between the return value and push back. The return value is also just pushed back into the input stream. It will then be read by the next yield, which happens immediately after readRecord returns at (1). In fact, there can be an arbitrary number of tokens to be pushed back:

            raise StopIteration(retval, token₀, ..., token_n)

This will simply push back all values in reverse order: <token_n, ..., token₀, retval>.

Flow Control

Suppose we have a simple consumer that reads requests (1) and writes out a response in n parts (2, 3):

            def consumer():
                while True:
                    request = yield               (1)
                    for i in range(n):            (2)
                        yield    (3)

A producer is supposed to first send a request and then read the response until... what? It would have to know n! Now suppose we allow the producer to send a new request while the consumer is not at (1) yet. We would have to write ugly code like this:

            def responder():
                requests = []
                while True:
                    if requests:
                        request = requests.pop()
                    else:
                        request = yield                 (1)
                    for i in range(n):                  (2)
                        msg = yield 
                        if msg:
                            requests.append(msg)

This adds so much checking code that it makes decomposing programs into co-routines unfeasible. Therefor, compose supports flow control by means of The None Protocol.

The None Protocol

Recall that whenever we want to accept data and we write:

            message = yield

the Python VM interprets this as:

            message = yield None

Similarly, when we want data and call next:

            response = generator.next()

the Python VM interprets this as:

            response = generator.send(None)

There seems to be an implicit meaning of None in this case. Compose makes this explicit and wields the rule: None means 'I want data'.

This means that every co-routine must must obey it, and compose checks it.

The None Protocol turned out to be essential to make compose practical, natural, intuitive, easy to understand and consistent. Compose was just a nice intellectual exercise and until — after a year of remorse — I added the None Protocol.

Generator Local

Generetor Local is the equivalent of Thread Local variables in generator land. Weightless provides a function local() that gets locals from the stack (or backtrace, if you don't have a stack).

Side Kick

This subject goes back to an old discussion about how to use yield. Most people I met (and also most of the competitors mention in 'Related Work') propose this for communicating with a socket:

             response = yield <command>          # general form
             message = yield socket.read()
             yield socket.write("response")

While I propose this:

             message = yield
             yield "response"

In the first case, the co-routine is communicating with some sort of scheduler that executes commands, in the second case, the co-routine is communication with nobody in particular. I prefer the latter because:

It separates protocol from transport.
It is better testable.

In both proposals, is is possible to synchronize the execution of the co-routine to read and write events on sockets. In the first proposal, it is straightforward to synchronize on other events like timers, or locks:

             yield scheduler.sleep(1)
             yield scheduler.wait(lock)

But how to do this in the second proposal? The answer is the Side Kick.

Recall that the co-routine is already communicating with something that produces and consumes data. The side kick enables the co-routine to communicate with a second entity. An example:

             @compose(scheduler)
             def coroutine():
                 messages = yield
                 yield scheduler.wait(1)
                 yield "hello " + message

to be continued...

Status

Compose is finalized and stable. It has a Python implementation and a C extension for better speed. It is feasible to develop develop large programs with it. See the examples.

The idea of composing generators is formalized in Python Enhancement Proposal: PEP-380. Compose is intended compatible with this PEP, although it extends it as explained above.

By Seecr