Archive for the ‘Software Design’ Category

Writing Framework-less Python WSGI Applications for Fun and Profit

July 22nd, 2010 jeremy

After writing WSGI applications over the past year or so, I’ve decided that the best general strategy for application design is to think of each request as passing through the following steps:

Logical organization of WSGI middleware

Every block of the diagram is a WSGI application (middleware) save the adapter and request handler. Here is an overview of each block:

Common Middleware

Common Middleware applies to all requests across the entire site. This may be none or many. For intranet systems, common middleware includes session management (beaker), authentication/login screen, and user tracking. For a public site, common middleware could be gzip compression or localization negotiation. Nearly all sites will have some exception handling middleware that emails you when something AWFUL happens (Ooops!).

This is also a common place to put middleware that handles specific exceptions that trigger actions. This style of programming may be controversial in some circles, but no one can deny the practical benefit of being able to raise HTTPRedirect('/login') from anywhere and have it send the user to the given screen. I could actually write a whole post on designing site-wide middleware. Some day I just might!
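A minimal sketch of such exception-handling middleware, assuming Python 3. The HTTPRedirect class and the redirect_middleware name are hypothetical stand-ins written for this example, and the sketch assumes the exception is raised before the wrapped application has called start_response:

```python
class HTTPRedirect(Exception):
    """Hypothetical exception carrying the URL to redirect to."""

    def __init__(self, location):
        super().__init__(location)
        self.location = location


def redirect_middleware(app):
    """Wrap a WSGI app and turn HTTPRedirect into a 302 response."""
    def wrapped(environ, start_response):
        try:
            # Consume the body here so exceptions raised while the
            # response is being iterated are caught as well.
            return list(app(environ, start_response))
        except HTTPRedirect as exc:
            start_response('302 Found', [('Location', exc.location),
                                         ('Content-Length', '0')])
            return [b'']
    return wrapped


def needs_login(environ, start_response):
    # Any code running below the middleware can bail out like this.
    raise HTTPRedirect('/login')


application = redirect_middleware(needs_login)
```

Because the middleware sits at the top of the stack, handlers further down never need to know how a redirect is actually sent.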

Dispatcher

The purpose of the dispatcher is to examine the incoming request and invoke another WSGI application to handle the request. There are quite a few existing dispatching modules floating around the net, and I’ve written a few myself. I’ll explore them later, but I want to make one point: routes is not a dispatcher. Routes would be common middleware that sets an environ variable (routing_args) that would be used by a dispatcher to find a handler. The publicly-available dispatcher I’ve had the most experience with is Ian Bicking’s paste.urlparser. I’ve used it to good effect.

Object-request brokers (ORBs) also fall into this category. I’ll just say that the ORB should invoke a method on an object with a WSGI interface. Everything should be WSGI all the way until the very final step.
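To make the dispatcher's role concrete, here is a rough sketch of a prefix-based dispatcher, assuming Python 3. It is not paste.urlparser or any other published module; make_prefix_dispatcher, text_app and not_found are names invented for the example:

```python
def make_prefix_dispatcher(routes, not_found):
    """routes maps a path prefix to the WSGI application that handles it."""
    def dispatcher(environ, start_response):
        path = environ.get('PATH_INFO', '')
        # Try longer prefixes first so '/admin/users' beats '/admin'.
        for prefix in sorted(routes, key=len, reverse=True):
            if path == prefix or path.startswith(prefix + '/'):
                # Shift the matched prefix from PATH_INFO to SCRIPT_NAME.
                environ['SCRIPT_NAME'] = environ.get('SCRIPT_NAME', '') + prefix
                environ['PATH_INFO'] = path[len(prefix):]
                return routes[prefix](environ, start_response)
        return not_found(environ, start_response)
    return dispatcher


def text_app(message):
    """Build a trivial WSGI app that always answers with message."""
    def app(environ, start_response):
        start_response('200 OK', [('Content-Type', 'text/plain')])
        return [message.encode('utf-8')]
    return app


def not_found(environ, start_response):
    start_response('404 Not Found', [('Content-Type', 'text/plain')])
    return [b'not found']


application = make_prefix_dispatcher(
    {'/blog': text_app('blog'), '/shop': text_app('shop')}, not_found)
```

The dispatcher only chooses; everything it hands off to is still a plain WSGI application.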

Application Middleware

Now, for some requests, the WSGI application that is invoked by the dispatcher is the end of the road. It handles the request and that is that. However, one is missing out on many time-saving (and code-saving) layers that can be built here.

The most common application middleware I’ve used is secondary dispatching middleware. Let’s say that you’re doing a good old HTML form with a submit handler. The best thing to do here is to use an application-level dispatcher that works on the HTTP method. I have a module with 3 WSGI applications. The site dispatcher sends the request to the “main” WSGI application. This WSGI application then sends the request to one of the other 2 applications depending on the REQUEST_METHOD. Something like this:

application = MethodDispatch()

# WSGI callables take (environ, start_response), in that order.
@application
def get(environ, start_response):
    pass

@application
def post(environ, start_response):
    pass

There are countless other ways of deciding which application to run, but this is just to whet the appetite. I know there are “everything and the kitchen sink” dispatching systems that could take care of this a level up, but I like the locality of this approach. Like all the other categories, there is a lot more I can say about this block. Another time!
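The post does not show how MethodDispatch is built. One possible shape, sketched here for Python 3 as an assumption rather than the original implementation, is an object whose call does double duty: called with a single function it registers that function, and called with (environ, start_response) it dispatches on REQUEST_METHOD:

```python
class MethodDispatch:
    """Hypothetical WSGI app that picks a handler by REQUEST_METHOD."""

    def __init__(self):
        self.handlers = {}

    def __call__(self, *args):
        if len(args) == 1 and callable(args[0]):
            # Used as a decorator: the function name doubles as the
            # HTTP method, so get -> GET and post -> POST.
            func = args[0]
            self.handlers[func.__name__.upper()] = func
            return func
        environ, start_response = args
        handler = self.handlers.get(environ['REQUEST_METHOD'])
        if handler is None:
            allowed = ', '.join(sorted(self.handlers))
            start_response('405 Method Not Allowed',
                           [('Allow', allowed), ('Content-Type', 'text/plain')])
            return [b'method not allowed']
        return handler(environ, start_response)


application = MethodDispatch()


@application
def get(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'form']


@application
def post(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'saved']
```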

Request-specific Middleware

Request middleware applies to a single request made by the browser. I have found that decorators are a great way to introduce middleware at the request level. It’s hard to talk about request-specific middleware without also talking about adapters, so I’ll move on.

Adapters

An adapter breaks the chain from WSGI to another interface. WSGI is great as a standard, but trying to write your application logic using it as an interface is not very fun. Adapters are right at the top of the list of ways I increase my productivity writing these web applications. The idea is simple: What kind of interface works best for this particular kind of request?

If the request is for JSON data, it might be best to have the procedure return a data structure that is turned into JSON and sent back to the browser. If a mako template gets used to render an HTML page, maybe a good interface would be to have a mako object passed into the request handler so that it can populate the template with data. When the request ends, the mako template is automatically rendered.
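As an illustration of the JSON case, here is a small adapter sketch using only the Python 3 standard library. The json_adapter name and the (params, result) handler signature are assumptions made for this example; they are not the W.json adapter that appears later in the post:

```python
import json
from urllib.parse import parse_qs


def json_adapter(handler):
    """Adapt handler(params, result) into a WSGI application."""
    def app(environ, start_response):
        # Flatten the query string into a simple dict (last value wins).
        query = parse_qs(environ.get('QUERY_STRING', ''))
        params = {key: values[-1] for key, values in query.items()}
        result = {}
        handler(params, result)
        # Whatever the handler put in result becomes the JSON body.
        body = json.dumps(result).encode('utf-8')
        start_response('200 OK', [('Content-Type', 'application/json'),
                                  ('Content-Length', str(len(body)))])
        return [body]
    return app


@json_adapter
def greet(params, result):
    result['greeting'] = 'Hello, %s!' % params.get('name', 'world')
```

The handler never touches environ or start_response; it only fills in a dict, which is the whole point of the adapter.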

These adapters should not be generalized solutions to fit everyone’s potential use cases. They are very easy to write yourself. If you have a few ways mako templates can be handled, have a few different mako adapters. In the past, I’ve tried to write complex, catch-all type adapters. I’ve learned that simple, application-specific adapters are definitely the way to go.

Application-specific middleware can be used in tandem with adapters to augment the adapter’s functionality. The middleware can do something before or after the adapter (and handler) runs, so you can customize to your heart’s content. As always, though: KISS!

To give an example of adapters and application middleware in action, here is a piece of real code:

import app.wsgilib as W
reg = W.PathDispatch()
application = reg.get_wsgi_app()
 
@reg.default
@W.mako('ppayment.tmpl')
def main(req, res):
    pass
 
@reg
@W.json
def trans_search(req, res):
    def eq(key, field, values):
        return ["%s = %%s" % field], [values[key]]
 
    def ilike(key, field, values):
        return ["%s ILIKE %%s" % field], [values[key]]
    # Tons of SQL/database code snipped
    res['sql'] = sql
    res['transactions'] = [dict(c) for c in cursor]

Here, W.json and W.mako adapt the WSGI interface into a more application-programmer-friendly interface. I’m creating webob request objects and custom derived webob response objects as arguments for the request handler. PathDispatch is just a simple dispatcher that selects a registered method based on what’s in PATH_INFO.

When I say simple, I mean “SIMPLE”

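class PathDispatch(dict):
    def __call__(self, func):
        # Register the handler under a path derived from its name.
        self['/%s' % func.__name__] = func
        return func

    def default(self, func):
        # Fallback handler for any path that is not registered.
        self[None] = func
        # Return func so the decorated name is not rebound to None.
        return func

    def lookup(self, path_info):
        try:
            return self[path_info]
        except KeyError:
            return self[None]

    def get_wsgi_app(self):
        # wsgi() is defined elsewhere in this module; it wraps a
        # (req, res) handler as a WSGI application.
        return wsgi(self.app)

    def app(self, req, res):
        proc = self.lookup(req.path_info)
        return proc(req, res)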

The simplicity of this utility code is key. It should be easy and fun to hack on, transparent, and should not require a huge cognitive load to understand.
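The wsgi helper that PathDispatch relies on is not shown in the post. To see the dispatcher run end to end, the sketch below pairs a trimmed copy of the class with a hypothetical stand-in for wsgi that uses plain namespace objects instead of webob requests and responses (Python 3, standard library only):

```python
from types import SimpleNamespace


def wsgi(handler):
    """Hypothetical stand-in: adapt handler(req, res) to a WSGI app."""
    def app(environ, start_response):
        req = SimpleNamespace(path_info=environ.get('PATH_INFO', ''),
                              environ=environ)
        res = SimpleNamespace(status='200 OK', body='')
        handler(req, res)
        data = res.body.encode('utf-8')
        start_response(res.status,
                       [('Content-Type', 'text/plain; charset=utf-8'),
                        ('Content-Length', str(len(data)))])
        return [data]
    return app


class PathDispatch(dict):
    """Trimmed copy of the dispatcher above."""

    def __call__(self, func):
        self['/%s' % func.__name__] = func
        return func

    def default(self, func):
        self[None] = func
        return func

    def app(self, req, res):
        try:
            proc = self[req.path_info]
        except KeyError:
            proc = self[None]
        return proc(req, res)


reg = PathDispatch()
application = wsgi(reg.app)


@reg.default
def main(req, res):
    res.body = 'home'


@reg
def about(req, res):
    res.body = 'about'
```

A request for /about reaches the about handler, and any unregistered path falls back to main.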

I’ve only scratched the surface. I hope to explore each level in greater detail in future posts!

Categories: Python, Software Design, WSGI

Abandoned Python Projects

May 17th, 2010 jeremy

I don’t know about you, but I’ve abandoned many, if not dozens, of projects over the years. I get a cute little idea, implement it as a library (maybe?). I might even use it on a real project. At some point, however, I decide that the idea wasn’t that great after all.

I’ve also spent a great deal of time reimplementing the same type of code. My prime example is HTML form libraries. They’re so very easy to start writing, but there comes a point when the code collapses on top of itself due to its complexity or inflexibility. I must’ve written 5 server-side HTML form generation libraries at this point.

I would like to believe that I’ve learned some things with all of these attempts. So about a year ago, I started to catalog all of the dead projects and libraries that I have worked on or conceptualized over the years. I just wanted to capture the fact that I’d already been down that road.

Most of these projects were attempts at solving general classes of problems. If I’ve learned one thing, it’s that general solutions are very hard to design.

Eventually, the abandoned projects catalog project found its way onto the abandoned projects pile (I do love recursion, don’t ya know).

I ran across the file in my Someday directory and thought that some of these might be interesting to share. Or maybe someone knows of completed libraries in production that do these things!

The Block Parser

The idea behind the block parser was to create a way to specify a lot of different
kinds of artifacts in a file, have different parsers for each type of artifact
and have these things turned into Python objects.

Each artifact in the file would be represented as a text block, the type of the
object on the first line (with possibly some preamble or header information)
and then the body of the artifact would be indented under the header.
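The original file format is not shown, so the following is only a guess at its general shape: a header line in column 0 giving the artifact type and a name, with the body indented beneath it. Under that assumption, a Python 3 sketch of the splitting and loading steps might look like this (parse_blocks, load and the PARSERS registry are all hypothetical names):

```python
import textwrap


def parse_blocks(text):
    """Split text into (header, body) pairs.

    A header is any non-blank line starting in column 0; the indented
    lines that follow it form the body.
    """
    blocks = []
    header, body = None, []
    for line in text.splitlines():
        if line.strip() and not line[0].isspace():
            if header is not None:
                blocks.append((header, textwrap.dedent('\n'.join(body)).strip()))
            header, body = line.strip(), []
        elif header is not None:
            body.append(line)
    if header is not None:
        blocks.append((header, textwrap.dedent('\n'.join(body)).strip()))
    return blocks


# One parser per artifact type; each turns a block into a Python object.
PARSERS = {'sql': lambda name, body: {'name': name, 'query': body}}


def load(text):
    objects = []
    for header, body in parse_blocks(text):
        kind, _, name = header.partition(' ')
        objects.append(PARSERS[kind](name, body))
    return objects


SOURCE = """\
sql active_users
    SELECT id, name
      FROM users
     WHERE active

sql user_count
    SELECT count(*) FROM users
"""

queries = load(SOURCE)
```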

The main motivation was to create some kind of templating schema for SQL
because embedding SQL in code is so ugly. I wanted to be able to create my own
simple syntaxes and have them load into Python.

I have come to realize that it’s just easier to use a Python class with a
specific interface and pass off instances of that class to a loader. If syntax
really is prohibitive, additional files can be provided to the loader to create
the runtime object, along with Python objects.

Overlay Config Parser

This was an attempt at reimplementing Apache’s hierarchical configuration
system. In Apache, configuration inherits from parent directories and so
the total configuration of a particular file system location is an
aggregate of all configurations up the hierarchy. I really like how this
configuration system works. Configuration variables are inherited from parent
objects in the hierarchy, and can be overridden or augmented.

It turned out that it would be a lot of work to write a general configuration
system based on these rules. It’s easiest to assume that the hierarchy is a
file system, or at least formatted like a file system path
(/parent/child/grandchild).
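Under the path-shaped assumption, the core lookup is small. The sketch below (Python 3; resolve and the CONFIGS table are made up for the example) merges settings from the root down to the requested location, with deeper levels overriding their parents:

```python
def resolve(configs, path):
    """Merge settings from the root down to path; deeper levels win."""
    merged = dict(configs.get('/', {}))
    current = ''
    for part in path.strip('/').split('/'):
        if not part:
            continue
        current += '/' + part
        merged.update(configs.get(current, {}))
    return merged


CONFIGS = {
    '/': {'charset': 'utf-8', 'auth': 'none'},
    '/parent': {'auth': 'basic'},
    '/parent/child/grandchild': {'charset': 'latin-1'},
}

child = resolve(CONFIGS, '/parent/child')
grandchild = resolve(CONFIGS, '/parent/child/grandchild')
```

This covers only inheritance and overriding. The hard part of a general system is everything else, such as augmenting inherited values rather than replacing them.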

This project might have some merit, but as of right now I can’t justify the
time it would take to write it. I don’t really have a use for it as of now
anyway.

SQL Pipe

SQL pipe was the result of playing with an idea I had on how to organize code.
This organization was called seed/step. Seed/Step worked like this:

An object would be constructed which would serve as the seed. This object would
have a lot of members that were blank and needed to be populated.

Step objects would be applied to the seed, which would populate the seed with
objects. I can’t exactly remember how it worked, but there were some
interesting ideas there.

The result was a linearized sequence of commands that built a complex
structure, like a SQL query.

Ultimately, it turned out to be a very complicated builder pattern. The calling
sequences concerning how the objects were put together were very complex. The
library failed to separate the interface from the implementation, so a lot of
strange syntactic sugar had to be introduced for the code to look passable.

For example, here is code to create a select query:

select_pipe([
    select.l((self.table, "po_id", "po_id")),
    select.l((self.table, "po_number", "PO_No")),
    select.l(("orders", "dt_number", "DT_No")),
    select.l((self.table, "phone", "Phone")),
    select.l((self.table, "status", "Status")),
    expr.l(" DATE_FORMAT(purchase_order.date, 'YYYYMMDD') as Date"),
    expr.l(" CONCAT(employee.last_name,', ',employee.first_name) as Employee"),
    select.l(("vendor", "company", "Vendor"))])(self.baseSeed())

The baseSeed was a seed that had a lot of other information added to it. This
expression only specified what columns to return. The baseSeed() provided the
actual tables, join structures and conditionals.

As I’ve said, a simple builder would’ve worked better.
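For comparison, a simple builder along those lines could be as small as the sketch below. This is an illustration in Python 3, not code from the original project; the Select class, its method names and the vendor_id join column are all assumptions:

```python
class Select:
    """Tiny SELECT builder; each method returns self so calls can chain."""

    def __init__(self, table):
        self.table = table
        self.columns = []
        self.joins = []
        self.conditions = []
        self.params = []

    def column(self, expression, alias=None):
        self.columns.append(
            '%s AS %s' % (expression, alias) if alias else expression)
        return self

    def join(self, table, on):
        self.joins.append('JOIN %s ON %s' % (table, on))
        return self

    def where(self, condition, *params):
        # Values stay out of the SQL text and travel as parameters.
        self.conditions.append(condition)
        self.params.extend(params)
        return self

    def build(self):
        sql = 'SELECT %s FROM %s' % (', '.join(self.columns) or '*', self.table)
        if self.joins:
            sql += ' ' + ' '.join(self.joins)
        if self.conditions:
            sql += ' WHERE ' + ' AND '.join(self.conditions)
        return sql, self.params


query, params = (Select('purchase_order')
                 .column('purchase_order.po_id', 'po_id')
                 .column('vendor.company', 'Vendor')
                 .join('vendor', 'vendor.vendor_id = purchase_order.vendor_id')
                 .where('purchase_order.status = %s', 'open')
                 .build())
```

The interface (column, join, where) is separate from how the SQL string is assembled, which is the separation the seed/step design lacked.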

The Dicer

The dicer was a fun little project that built a minilanguage to query complex
Python data structures. For example: lists of dicts, whose values had lists. An
example unit test:

data = {"foo" : [{"bar" : 1},
                 {"bar" : 2},
                 {"bar" : 3}],
        "spam" : [
             {"ham" : {
                 "eggs" : "monty!"
             }}],
        "neo" : "5"
       }
 
assert dicer("foo[:].bar")(data) == [1, 2, 3]
assert dicer("foo.bar")(data) == [1, 2, 3]
assert dicer("foo[2].bar")(data) == 3
assert dicer("spam.ham.eggs")(data) == ["monty!"]
assert dicer("neo")(data) == "5"

Looking back, the dicer really does seem like a neat utility with a very
focused purpose and clean interface. I believe it was originally written to
parse out complex configuration data. I should really remember to try and use
the dicer some more.

One thing I don’t like about the dicer is its use of lex and yacc packages,
which create parsetab.py files everywhere. This would have to be retooled if I
wanted to release the dicer to the public.
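For a language this small, a regular-expression tokenizer may be enough to avoid generated parser tables altogether. The sketch below is a fresh reimplementation in Python 3 that covers only the path forms in the unit test above (names, integer indexes and [:]); it is not the original dicer code:

```python
import re

# A name (optionally preceded by a dot), an integer index, or '[:]'.
_TOKEN = re.compile(r'\.?([A-Za-z_]\w*)|\[(-?\d+)\]|\[(:)\]')


def dicer(expression):
    """Compile a path like 'foo[2].bar' into a function of the data."""
    steps, pos = [], 0
    while pos < len(expression):
        m = _TOKEN.match(expression, pos)
        if m is None:
            raise ValueError('bad expression at %d: %r' % (pos, expression))
        name, index, _whole = m.groups()
        if name is not None:
            steps.append(('key', name))
        elif index is not None:
            steps.append(('index', int(index)))
        # '[:]' selects the whole list, so it adds no step.
        pos = m.end()

    def apply(value, remaining):
        if not remaining:
            return value
        (kind, arg), rest = remaining[0], remaining[1:]
        if kind == 'index':
            return apply(value[arg], rest)
        if isinstance(value, list):
            # A key applied to a list maps over its elements.
            return [apply(item, remaining) for item in value]
        return apply(value[arg], rest)

    return lambda data: apply(data, steps)


data = {"foo": [{"bar": 1}, {"bar": 2}, {"bar": 3}],
        "spam": [{"ham": {"eggs": "monty!"}}],
        "neo": "5"}
```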

The Stamper

The stamper was an exercise in writing a validation library that had basic
logic support. I believe it was inspired by some examples in SICP.

For example, the stamper has a predicate expression when:

r = when(have('first_name'), have('last_name'))
assert r({'first_name' : 'a', 'last_name' : 'b'}) == True

Here, ‘r’ becomes the stamper and the dict literal is the record which is
tested against. There were also stamps that allowed for custom procedures to be
plugged in and used, like match, which would take a procedure and return the
result of applying that procedure to the value for the field.

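import re

def is_phone_number(value):
    # Dashed form, e.g. 444-444-4444. Reconstructed for this example:
    # the original helper of this name was not shown.
    return re.match(r'^\d{3}-\d{3}-\d{4}$', value) is not None

def is_phone_number_alt1(value):
    # Parenthesized form, e.g. (444) 444-4444.
    return re.match(r'^\(\d{3}\) \d{3}-\d{4}$', value) is not None

rec = {
    'first_name' : 'John',
    'last_name' : 'Smith',
    'phone_number' : '444-444-4444',
    'cell_phone' : '333-333-3333'
}

assert match(is_phone_number, 'phone_number')(rec) == True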

The checking could get decently complex:

rule = when(all(have('length'), have('width')),
                check(lambda x, y: x == y, ['length', 'width']))
assert rule({'length' : '2', 'width' : '2'}) == True
assert rule({'length' : '2', 'width' : '1'}) == False
assert rule({'length' : '1', 'width' : '2'}) == False

This rule ensures that the record describes a square when dimensions are provided.
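The stamper's source is not shown, but combinators like these can be sketched as plain closures. The version below is a guess in Python 3: all_of stands in for the stamper's all to avoid shadowing the builtin, and treating when as an implication that passes when its condition does not hold is an assumption:

```python
def have(field):
    """True when the record has a non-empty value for field."""
    return lambda rec: rec.get(field) not in (None, '')


def all_of(*rules):
    """True when every rule accepts the record."""
    return lambda rec: all(rule(rec) for rule in rules)


def check(proc, fields):
    """Apply proc to the values of the named fields."""
    return lambda rec: bool(proc(*[rec[f] for f in fields]))


def match(proc, field):
    """Return the result of applying proc to the value of field."""
    return lambda rec: proc(rec[field])


def when(condition, consequence):
    # Implication: the consequence only matters if the condition holds.
    return lambda rec: consequence(rec) if condition(rec) else True


rule = when(all_of(have('length'), have('width')),
            check(lambda x, y: x == y, ['length', 'width']))
```

Each combinator returns a function from record to result, so rules nest without any shared state.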

You know, the stamper is pretty powerful, but one thing it’s lacking is any
notion of types. I believe the stamper is type agnostic. It would need to have
much better error reporting capabilities than returning True or False for it to
be useful in a real system.

I also wonder what the payoff is of going through the trouble of building this
functional representation of these rules. It would seem that the stamper should
really be a middle man between some declarative data of rules and the
processing of those rules. In this event, why have all the syntactic sugar?

The stamper might be an example of me trying to turn Python into Scheme.

And I’ve saved the best for last:

Casrel

Casrel was an exploration in designing data files. Casrel is short for
CAScading RELation. Each data file consists of a set of data definitions,
grouped as properties on objects. The layout was inspired by CSS. For example,
an account table definition pulled from a database would be:

account
    schema-name: public
    relation-type: table
    read-only: no

account.account_name
    label: account_name
    sql-domain: varchar
    type: varchar
    read-only: no
    sql-required: no
    required: no
    maximum-length: 255

account.label
    label: label
    sql-domain: varchar
    type: varchar
    read-only: no
    sql-required: yes
    required: yes
    maximum-length: 255

account.owner_name
    label: owner_name
    sql-domain: varchar
    type: varchar
    read-only: no
    sql-required: no
    required: no
    maximum-length: 255

account.account_type
    label: account_type
    sql-domain: account_type
    type: account_type
    read-only: no
    sql-required: no
    required: no

The object specifier consists of a series of dotted properties, forming a
hierarchy to a parent object. account.account_name describes properties on the
account_name object, which is a member of the account object.

One interesting property of this file format is that it effectively builds a
tree of data without the multi-level indentation or curly-brace structure that
is common in other file formats. The hierarchy is implicit in how the object
labels are written.
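A reader for this layout can stay very small. The sketch below (Python 3) is written from the example above rather than from the casrel source, so parse_casrel and children are hypothetical names and every value is kept as a plain string:

```python
def parse_casrel(text):
    """Parse 'object.path' headers with indented 'key: value' lines."""
    objects = {}
    current = None
    for line in text.splitlines():
        if not line.strip() or line.lstrip().startswith('#'):
            continue  # skip blank lines and comments
        if not line[0].isspace():
            # A line in column 0 names the object being described.
            current = objects.setdefault(line.strip(), {})
        elif current is not None:
            key, _, value = line.strip().partition(':')
            current[key.strip()] = value.strip()
    return objects


def children(objects, parent):
    """Names of objects nested under parent, read off the dotted labels."""
    prefix = parent + '.'
    return sorted(name for name in objects if name.startswith(prefix))


SOURCE = """\
account
    schema-name: public
    relation-type: table

account.label
    label: label
    required: yes
"""

objects = parse_casrel(SOURCE)
```

The tree never has to be stored explicitly: children() recovers it from the dotted names on demand.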

Casrel was designed to define metadata for applications to use to create forms,
SQL data definitions and other information. Its purpose was to serve as a
central point of truth for schema data.

The cascading nature of casrel was that it supported merging the attribute
values on objects from multiple definitions (the same way CSS works).

This could allow specific properties to be overridden and object hierarchies to
be built piecemeal, depending on the situation.

Take, for example, a customer table definition that is created by inspecting a
SQL table.

customer.first_name
    label: first_name
    type: varchar
    maximum-length: 255

If, on a particular form, you want to show a field for first_name, you can
override the label attribute to be more user friendly, and also add in another
attribute:

customer.first_name
    label: First Name
    help: The common name of the person

This merges to form:

customer.first_name
    label: First Name
    type: varchar
    maximum-length: 255
    help: The common name of the person

In the source code, this operation is called a “join”, though I think that merge
would be a better term to use.
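Expressed over plain dictionaries, the merge is a few lines. This is a sketch in Python 3 with a hypothetical merge function, not the original “join” code; later definitions override earlier ones attribute by attribute:

```python
def merge(*definitions):
    """Combine definitions in order; later attribute values win."""
    merged = {}
    for definition in definitions:
        for name, attrs in definition.items():
            # Build fresh dicts so the inputs are left untouched.
            merged.setdefault(name, {}).update(attrs)
    return merged


inspected = {'customer.first_name': {'label': 'first_name',
                                     'type': 'varchar',
                                     'maximum-length': '255'}}
form = {'customer.first_name': {'label': 'First Name',
                                'help': 'The common name of the person'}}

result = merge(inspected, form)
```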

There are many more ideas that could be incorporated into the data format,
such as wildcards or regular expression matching:

# Set the default type on customer fields to be varchar
customer.*
    type: varchar
    maximum-length: 255

or more xpath-like expressions:

# Set the maximum-length on customer fields with the type attribute of varchar
customer.[type=varchar]
    maximum-length: 25

Using wildcards, regular expressions or xpath-esque expressions, you could
build collections of attributes that could be bundled and then applied by
using a special matching attribute that has no bearing on the real data
definition. I would have to come up with a jargon term for this attribute. Maybe
a class, since its function is the same as using a CSS class to define a
common set of attributes for use on specific objects.

# a custom type to be used for fields
*.[x-type=zipcode]
    type: varchar
    maximum-length: 10
    match-expression: \d{5}(?:-\d{4})?

# This would expand
customer.zip_code
    x-type: zipcode

All of these are exciting ideas, but I’m constantly reminded that there are
many similar technologies out there that, with some post processing, could
provide all of the functionality mentioned above. XML, JSON, XPATH, XSLT, YAML,
INI, CSS. The list is long and varied.

This project was actually abandoned because too much time was being spent on
it; it was another victim of feature creep. In the end I just settled on YAML
to define data values.

Using YAML turned out to be the worst of both worlds. It provided no real
benefit over just using Python code. The lesson here is that, unless you are
creating new means of combination and more functionality than simple data
definitions, you’re wasting your time using a data language.

Categories: Python, Software Design