Installing Python Packages on Newer Ubuntu Releases with easy_install

February 9th, 2011 jeremy

For the longest time, I’ve been installing packages with easy_install using --prefix=/usr/local. That seems like the correct place to put the packages, since they are not managed by Ubuntu.

Newer versions of Ubuntu have split /usr/local/lib/pythonX.X/site-packages into two directories.

/usr/local/lib/pythonX.X/site-packages is for versions of Python installed outside the base system (such as /usr/local/bin/python), while /usr/local/lib/pythonX.X/dist-packages is for locally installed packages that use the version of Python that ships with Ubuntu.

I personally use the Python that comes with the operating system, so if I want to install packages using easy_install, the right command is:

 easy_install --prefix=/usr/local  \
      --install-dir=/usr/local/lib/python2.6/dist-packages PACKAGE-NAME

The --prefix option takes care of scripts and data, while --install-dir points easy_install at the non-standard dist-packages directory.
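
If you script your installs, a tiny helper can build that command for whichever interpreter runs it. This is a minimal sketch assuming the Ubuntu dist-packages layout described above; `easy_install_command` is a hypothetical name, not part of setuptools:

```python
import sys


def easy_install_command(package, prefix="/usr/local", version=None):
    """Return the easy_install argument list for the distribution's own Python.

    Assumes the Ubuntu layout where locally installed packages for the
    system interpreter live in PREFIX/lib/pythonX.Y/dist-packages.
    """
    if version is None:
        # Default to the version of the interpreter running this script.
        version = "%d.%d" % sys.version_info[:2]
    install_dir = "%s/lib/python%s/dist-packages" % (prefix, version)
    return ["easy_install", "--prefix=%s" % prefix,
            "--install-dir=%s" % install_dir, package]


print(" ".join(easy_install_command("PACKAGE-NAME", version="2.6")))
```

Passing the list to `subprocess.call()` runs the install without any shell quoting.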

Categories: Uncategorized

Writing Framework-less Python WSGI Applications for Fun and Profit

July 22nd, 2010 jeremy

After writing WSGI applications over the past year or so, I’ve decided that the best general strategy for application design is to think of each request as passing through the following layers:

(Diagram: Logical organization of WSGI middleware)

Every block in the diagram except the adapter and the request handler is a WSGI application (middleware). Here is an overview of each block:

Common Middleware

Common middleware applies to all requests across the entire site. A site may have none of these layers or many. For intranet systems, common middleware includes session management (Beaker), authentication/login screens, and user tracking. For a public site, common middleware could be gzip compression or localization negotiation. Nearly all sites will have some exception-handling middleware that emails you when something AWFUL happens (Ooops!).
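
As a sketch of that exception-handling layer, here is a middleware that reports any unhandled exception and returns a 500 page. The `notify` hook is hypothetical, standing in for whatever sends the email; paste's ErrorMiddleware is a ready-made version of the same idea:

```python
import sys
import traceback


class NotifyOnError(object):
    """Common middleware sketch: report unhandled exceptions, then return a 500.

    `notify(environ, traceback_text)` is a hypothetical hook (e.g. send email).
    Only exceptions raised while calling the wrapped app are caught, not ones
    raised later while the response body is being iterated.
    """

    def __init__(self, app, notify):
        self.app = app
        self.notify = notify

    def __call__(self, environ, start_response):
        try:
            return self.app(environ, start_response)
        except Exception:
            self.notify(environ, traceback.format_exc())
            # Passing exc_info lets us replace any headers the app already set.
            start_response('500 Internal Server Error',
                           [('Content-Type', 'text/plain')], sys.exc_info())
            return [b'Ooops! Something AWFUL happened.']
```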

This is also a common place to put middleware that handles specific exceptions that trigger actions. This style of programming may be controversial in some circles, but no one can deny the practical benefit of being able to raise HTTPRedirect('/login') from anywhere and have it send the user to the given screen. I could actually write a whole post on designing site-wide middleware. Some day I just might!
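
Here is a minimal sketch of that redirect trick, assuming `HTTPRedirect` is your own exception class (the name comes from the sentence above; its attributes here are made up). Common middleware catches it and turns it into a 302 response:

```python
class HTTPRedirect(Exception):
    """Hypothetical exception: raise it anywhere to send the user elsewhere."""

    def __init__(self, location, status='302 Found'):
        Exception.__init__(self, location)
        self.location = location
        self.status = status


class RedirectMiddleware(object):
    """Common middleware sketch that turns HTTPRedirect into a real redirect."""

    def __init__(self, app):
        self.app = app

    def __call__(self, environ, start_response):
        try:
            return self.app(environ, start_response)
        except HTTPRedirect as redirect:
            start_response(redirect.status,
                           [('Location', redirect.location),
                            ('Content-Type', 'text/plain')])
            return [b'Redirecting to ' + redirect.location.encode('utf-8')]
```

paste.httpexceptions offers the same pattern with a full set of HTTP status exceptions and its HTTPExceptionHandler middleware.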

Dispatcher

The purpose of the dispatcher is to examine the incoming request and invoke another WSGI application to handle it. There are quite a few existing dispatching modules floating around the net, and I’ve written a few myself. I’ll explore them later, but I want to make one point: Routes is not a dispatcher. Routes would be common middleware that sets an environ variable (wsgiorg.routing_args) that a dispatcher then uses to find a handler. The publicly available dispatcher I’ve had the most experience with is Ian Bicking’s paste.urlparser, and I’ve used it to good effect.
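
To make the routing/dispatching split concrete, here is a toy pair: a routing middleware that only records what it matched under `wsgiorg.routing_args` (as a `(positional, keyword)` tuple), and a separate dispatcher that reads it to pick a WSGI application. Both are hypothetical sketches, not how Routes or paste.urlparser are implemented:

```python
def first_segment_router(app):
    """Routing middleware sketch: records a match but never dispatches."""
    def middleware(environ, start_response):
        segments = [s for s in environ.get('PATH_INFO', '').split('/') if s]
        name = segments[0] if segments else 'index'
        environ['wsgiorg.routing_args'] = ((), {'handler': name})
        return app(environ, start_response)
    return middleware


def dispatcher(handlers):
    """Dispatcher sketch: picks a WSGI app using the routing args."""
    def dispatch(environ, start_response):
        args, kwargs = environ.get('wsgiorg.routing_args', ((), {}))
        handler = handlers.get(kwargs.get('handler'))
        if handler is None:
            start_response('404 Not Found', [('Content-Type', 'text/plain')])
            return [b'Not Found']
        return handler(environ, start_response)
    return dispatch


def hello(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'hello']


application = first_segment_router(dispatcher({'hello': hello}))
```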

Object request brokers (ORBs) also fall into this category. I’ll just say that the ORB should invoke a method on an object with a WSGI interface. Everything should be WSGI all the way until the very final step.

Application Middleware

Now, for some requests, the WSGI application invoked by the dispatcher is the end of the road. It handles the request and that is that. However, you miss out on many time-saving (and code-saving) layers that can be built here.

The most common application middleware I’ve used is secondary dispatching middleware. Let’s say that you’re doing a good old HTML form with a submit handler. The best thing to do here is to use an application-level dispatcher that works on the HTTP method. I have a module with three WSGI applications. The site dispatcher sends the request to the “main” WSGI application, which then sends the request to one of the other two applications depending on the REQUEST_METHOD. Something like this:

# MethodDispatch (not shown) sends the request to get() or post()
# depending on REQUEST_METHOD.
application = MethodDispatch()
 
@application
def get(environ, start_response):
    pass
 
@application
def post(environ, start_response):
    pass
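
MethodDispatch itself isn't shown in the post. One way such a class could work, as a hypothetical sketch: calling the instance with a single function registers it under its upper-cased name, and calling it with `(environ, start_response)` dispatches on REQUEST_METHOD, answering 405 for anything unregistered:

```python
class MethodDispatch(object):
    """Hypothetical sketch: a decorator registry that is also a WSGI app."""

    def __init__(self):
        self.handlers = {}

    def __call__(self, *args):
        if len(args) == 1:
            # Used as @application: register the function by its name.
            func = args[0]
            self.handlers[func.__name__.upper()] = func
            return func
        # Used as a WSGI application: pick the handler for this HTTP method.
        environ, start_response = args
        method = environ.get('REQUEST_METHOD', 'GET').upper()
        handler = self.handlers.get(method)
        if handler is None:
            allowed = ', '.join(sorted(self.handlers))
            start_response('405 Method Not Allowed',
                           [('Content-Type', 'text/plain'), ('Allow', allowed)])
            return [b'Method Not Allowed']
        return handler(environ, start_response)


application = MethodDispatch()


@application
def get(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'the form']


@application
def post(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'saved']
```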

There are countless other ways of deciding which application to run, but this is just to whet the appetite. I know there are “everything and the kitchen sink” dispatching systems that could take care of this a level up, but I like the locality of this approach. Like all the other categories, there is a lot more I can say about this block. Another time!

Request-specific Middleware

Request middleware applies to a single request made by the browser. I have found that decorators are a great way to introduce middleware at the request level. It’s hard to talk about request-specific middleware without also talking about adapters, so I’ll move on.
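
As a small example of the decorator approach, here is a hypothetical `no_cache` decorator that adds a Cache-Control header to one request handler by wrapping `start_response`:

```python
import functools


def no_cache(app):
    """Hypothetical request-level middleware: mark one handler's responses uncacheable."""
    @functools.wraps(app)
    def wrapper(environ, start_response):
        def add_header(status, headers, exc_info=None):
            headers = list(headers) + [('Cache-Control', 'no-cache')]
            if exc_info is None:
                return start_response(status, headers)
            return start_response(status, headers, exc_info)
        return app(environ, add_header)
    return wrapper


@no_cache
def account_page(environ, start_response):
    start_response('200 OK', [('Content-Type', 'text/plain')])
    return [b'account page']
```

`functools.wraps` keeps the handler's `__name__`, which matters if handlers are registered by name, as PathDispatch below does.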

Adapters

An adapter breaks the chain from WSGI to another interface. WSGI is great as a standard, but writing your application logic directly against it is not very fun. Adapters are near the top of the list of ways I increase my productivity writing these web applications. The idea is simple: what kind of interface works best for this particular kind of request?

If the request is for JSON data, it might be best to have the procedure return a data structure that is turned into JSON and sent back to the browser. If a Mako template is used to render an HTML page, a good interface might be to pass a Mako object into the request handler so that it can populate the template with data. When the request ends, the Mako template is automatically rendered.
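
A JSON adapter along those lines might look like this sketch. `json_adapter` and `search` are hypothetical names; the handler fills a plain dict and the adapter serializes it, whereas the W.json shown later in this post passes WebOb-based request and response objects:

```python
import functools
import json


def json_adapter(handler):
    """Adapter sketch: turn handler(environ, res) into a WSGI application.

    The handler fills in the dict `res`; the adapter serializes it as JSON.
    """
    @functools.wraps(handler)
    def application(environ, start_response):
        res = {}
        handler(environ, res)
        body = json.dumps(res).encode('utf-8')
        start_response('200 OK', [('Content-Type', 'application/json'),
                                  ('Content-Length', str(len(body)))])
        return [body]
    return application


@json_adapter
def search(environ, res):
    res['query'] = environ.get('QUERY_STRING', '')
    res['transactions'] = []
```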

These adapters should not be generalized solutions that fit everyone’s potential use cases. They are very easy to write yourself. If you handle Mako templates in a few different ways, have a few different Mako adapters. In the past, I’ve tried to write complex, catch-all adapters. I’ve learned that simple, application-specific adapters are definitely the way to go.

Application-specific middleware can be used in tandem with adapters to augment the adapter’s functionality. The middleware can do something before or after the adapter (and handler) runs, so you can customize to your heart’s content. As always, though: KISS!

To give an example of adapters and application middleware in action, here is a piece of real code:

import app.wsgilib as W
reg = W.PathDispatch()
application = reg.get_wsgi_app()
 
@reg.default
@W.mako('ppayment.tmpl')
def main(req, res):
    pass
 
@reg
@W.json
def trans_search(req, res):
    def eq(key, field, values):
        return ["%s = %%s" % field], [values[key]]
 
    def ilike(key, field, values):
        return ["%s ILIKE %%s" % field], [values[key]]
    # Tons of SQL/database code snipped
    res['sql'] = sql
    res['transactions'] = [dict(c) for c in cursor]

Here, W.json and W.mako adapt the WSGI interface into one that is friendlier to the application programmer. I’m creating WebOb request objects and custom WebOb-derived response objects as arguments for the request handler. PathDispatch is just a simple dispatcher that selects a registered function based on what’s in PATH_INFO.

When I say simple, I mean “SIMPLE”:

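class PathDispatch(dict):
    """Maps '/function_name' paths to handlers that take (req, res)."""

    def __call__(self, func):
        # Used as a decorator: register func under '/func_name'.
        self['/%s' % func.__name__] = func
        return func
 
    def default(self, func):
        # Handler for any PATH_INFO without a registered function.
        self[None] = func
        return func
 
    def lookup(self, path_info):
        try:
            return self[path_info]
        except KeyError:
            return self[None]
 
    def get_wsgi_app(self):
        # wsgi() (not shown) adapts a (req, res) callable into a WSGI app.
        return wsgi(self.app)
 
    def app(self, req, res):
        proc = self.lookup(req.path_info)
        return proc(req, res)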
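
`wsgi()` itself isn't shown. Here is a minimal guess at it, assuming it only has to wrap a `(req, res)` callable; `Request` and `Response` are hypothetical stand-ins for the WebOb objects, and PathDispatch is repeated (with `default` returning the function so it can be stacked as a decorator) to keep the block self-contained:

```python
class Request(object):
    """Hypothetical stand-in for a WebOb request: just what PathDispatch needs."""

    def __init__(self, environ):
        self.environ = environ
        self.path_info = environ.get('PATH_INFO', '')


class Response(object):
    """Hypothetical stand-in for a WebOb response."""

    def __init__(self):
        self.status = '200 OK'
        self.headers = [('Content-Type', 'text/plain')]
        self.body = b''


def wsgi(func):
    """Adapt func(req, res) to the WSGI (environ, start_response) convention."""
    def application(environ, start_response):
        req, res = Request(environ), Response()
        result = func(req, res)
        if result is not None:
            # Assumption: a handler may return the body instead of setting it.
            res.body = result
        start_response(res.status, res.headers)
        return [res.body]
    return application


class PathDispatch(dict):
    def __call__(self, func):
        self['/%s' % func.__name__] = func
        return func

    def default(self, func):
        self[None] = func
        return func

    def lookup(self, path_info):
        try:
            return self[path_info]
        except KeyError:
            return self[None]

    def get_wsgi_app(self):
        return wsgi(self.app)

    def app(self, req, res):
        proc = self.lookup(req.path_info)
        return proc(req, res)


reg = PathDispatch()


@reg.default
def main(req, res):
    return b'main page'


@reg
def report(req, res):
    return ('report for %s' % req.path_info).encode('utf-8')


application = reg.get_wsgi_app()
```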

The simplicity of this utility code is key. It should be easy and fun to hack on, transparent and not require a huge cognitive load to understand.

I’ve only scratched the surface. I hope to explore each level in greater detail in future posts!

Categories: Python, Software Design, WSGI

Pausing the terminal in an OS independent way in Python

June 15th, 2010 jeremy
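
import os
import sys
 
def pause():
    """ Prompt the user to press a key to continue. Should probably use a pager
    most of the time though.
    """
    if sys.platform.startswith('win'):
        os.system('PAUSE')
    elif sys.stdin.isatty():
        # termios and tty exist only on Unix, so import them here rather
        # than at the top, where they would break the script on Windows.
        import termios
        import tty
        fd = sys.stdin.fileno()
        saved = termios.tcgetattr(fd)
        sys.stdout.write('Press any key to continue')
        sys.stdout.flush()
        tty.setraw(fd)
        try:
            sys.stdin.read(1)
        finally:
            # Put the terminal back exactly as we found it.
            termios.tcsetattr(fd, termios.TCSADRAIN, saved)
            sys.stdout.write('\n')
 
print('good times')
pause()
print('CLOBBER\n' * 100)

Categories: Python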

Abandoned Python Projects

May 17th, 2010 jeremy

I don’t know about you, but I’ve abandoned many, if not dozens, of projects over the years. I get a cute little idea, implement it as a library (maybe?). I might even use it on a real project. At some point, however, I decide that the idea wasn’t that great after all.

I’ve also spent a great deal of time reimplementing the same type of code. My prime example is HTML form libraries. They’re so very easy to start writing, but there comes a point when the code collapses on top of itself due to its complexity or inflexibility. I must’ve written five server-side HTML form generation libraries by now.

I would like to believe that I’ve learned some things with all of these attempts. So about a year ago, I started to catalog all of the dead projects and libraries that I have worked on or conceptualized over the years. I just wanted to capture the fact that I’d already been down that road.

Most of these projects were attempts at solving general classes of problems. If I’ve learned one thing, it’s that general solutions are very hard to design.

Eventually, the abandoned projects catalog project found its way onto the abandoned projects pile (I do love recursion, don’t ya know).

I ran across the file in my Someday directory and thought that some of these might be interesting to share. Or maybe someone knows of completed libraries in production that do these things!

The Block Parser

The idea behind the block parser was to create a way to specify a lot of different
kinds of artifacts in a file, have a different parser for each type of artifact,
and have these things turned into Python objects.

Each artifact in the file would be represented as a text block: the type of the
object on the first line (possibly with some preamble or header information),
and the body of the artifact indented under the header.
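
A minimal sketch of that format, assuming each header line starts with the artifact type and the rest of the header is passed along to a per-type parser; `parse_blocks` and the `sql` type are hypothetical:

```python
import textwrap


def parse_blocks(text, parsers):
    """Split text into header/indented-body blocks and hand each to a parser.

    `parsers` maps the first word of a header line (the artifact type) to a
    function(header_rest, body) that returns a Python object.
    """
    objects = []
    header, body = None, []

    def flush():
        if header is not None:
            kind, _, rest = header.partition(' ')
            dedented = textwrap.dedent('\n'.join(body)).strip('\n')
            objects.append(parsers[kind](rest.strip(), dedented))

    for line in text.splitlines():
        if line.strip() and not line[0].isspace():
            flush()                      # an unindented line starts a new block
            header, body = line.strip(), []
        elif header is not None:
            body.append(line)            # indented (or blank) lines belong to it
    flush()
    return objects


source = """
sql find_customer
    SELECT * FROM customer
    WHERE id = %s

sql list_accounts
    SELECT * FROM account
"""

blocks = parse_blocks(source, {'sql': lambda name, body: (name, body)})
```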

The main motivation was to create some kind of templating scheme for SQL,
because embedding SQL in code is so ugly. I wanted to be able to create my own
simple syntaxes and have them load into Python.

I have come to realize that it’s just easier to use a Python class with a
specific interface and pass off instances of that class to a loader. If syntax
really is prohibitive, additional files can be provided to the loader to create
the runtime object, along with Python objects.

Overlay Config Parser

This was an attempt at reimplementing Apache’s hierarchical configuration
system. In Apache, configuration inherits from parent directories, so the
total configuration of a particular file system location is an aggregate of
all configurations up the hierarchy. I really like how this configuration
system works. Configuration variables are inherited from parent objects in
the hierarchy, and can be overridden or augmented.

It turned out that it would be a lot of work to write a general configuration
system based on these rules. It’s easiest to assume that the hierarchy is a
file system, or at least formatted like a file system path
(/parent/child/grandchild).
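
Under that file-system assumption, the core lookup is small. A hypothetical sketch, where `configs` maps paths to setting dicts and deeper paths override or add to what their parents define:

```python
def effective_config(configs, path):
    """Merge configuration dicts from the root down to `path`.

    Deeper entries override shallower ones, and new keys are added,
    the way per-directory configuration inherits down a hierarchy.
    """
    parts = [p for p in path.split('/') if p]
    merged = dict(configs.get('/', {}))
    for depth in range(1, len(parts) + 1):
        prefix = '/' + '/'.join(parts[:depth])
        merged.update(configs.get(prefix, {}))
    return merged


configs = {
    '/': {'index': 'index.html', 'auth': 'none'},
    '/parent': {'auth': 'basic'},
    '/parent/child/grandchild': {'index': 'home.html'},
}

# Returns {'index': 'home.html', 'auth': 'basic'}
leaf = effective_config(configs, '/parent/child/grandchild')
```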

This project might have some merit, but as of right now I can’t justify the
time it would take to write it. I don’t really have a use for it as of now
anyway.

SQL Pipe

SQL pipe was the result of playing with an idea I had on how to organize code.
This organization was called seed/step. Seed/Step worked like this:

An object would be constructed which would serve as the seed. This object would
have a lot of members that were blank and needed to be populated.

Step objects would be applied to the seed, which would populate the seed with
objects. I can’t exactly remember how it worked, but there were some
interesting ideas there.

The result was a linearized sequence of commands that built a complex
structure, like a SQL query.

Ultimately, it turned out to be a very complicated builder pattern. The calling
sequences concerning how the objects were put together were very complex. The
library failed to separate the interface from the implementation, so a lot of
strange syntactic sugar had to be introduced for the code to look passable.

For example, here is code to create a select query:

select_pipe([
    select.l((self.table, "po_id", "po_id")),
    select.l((self.table, "po_number", "PO_No")),
    select.l(("orders", "dt_number", "DT_No")),
    select.l((self.table, "phone", "Phone")),
    select.l((self.table, "status", "Status")),
    expr.l(" DATE_FORMAT(purchase_order.date, 'YYYYMMDD') as Date"),
    expr.l(" CONCAT(employee.last_name,', ',employee.first_name) as Employee"),
    select.l(("vendor", "company", "Vendor"))])(self.baseSeed())

The baseSeed was a seed that had a lot of other information added to it. This
expression only specified what columns to return. The baseSeed() provided the
actual tables, join structures and conditionals.

As I’ve said, a simple builder would’ve worked better.
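
For comparison, a plain builder for the same kind of query might look like this. `Select` and its methods are hypothetical, and the join condition uses a made-up `vendor_id` column; each call records one piece of the query and returns `self` for chaining:

```python
class Select(object):
    """A plain builder for SELECT statements (hypothetical sketch)."""

    def __init__(self, table):
        self.table = table
        self.columns = []
        self.joins = []
        self.conditions = []

    def column(self, table, name, alias=None):
        expr = '%s.%s' % (table, name)
        self.columns.append('%s AS %s' % (expr, alias) if alias else expr)
        return self

    def expr(self, sql):
        # Raw SQL expression for a column, e.g. a DATE_FORMAT(...) call.
        self.columns.append(sql)
        return self

    def join(self, clause):
        self.joins.append(clause)
        return self

    def where(self, condition):
        self.conditions.append(condition)
        return self

    def build(self):
        sql = 'SELECT %s FROM %s' % (', '.join(self.columns), self.table)
        for clause in self.joins:
            sql += ' ' + clause
        if self.conditions:
            sql += ' WHERE ' + ' AND '.join(self.conditions)
        return sql


query = (Select('purchase_order')
         .column('purchase_order', 'po_number', 'PO_No')
         .column('vendor', 'company', 'Vendor')
         .join('JOIN vendor ON vendor.vendor_id = purchase_order.vendor_id')
         .where('purchase_order.status = %s')
         .build())
```

Nothing happens until `build()`, so the calling sequence stays obvious and the SQL can be inspected before it reaches a cursor.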

The Dicer

The dicer was a fun little project that built a mini-language to query complex
Python data structures, for example lists of dicts whose values are themselves
lists. An example unit test:

data = {"foo" : [{"bar" : 1},
                 {"bar" : 2},
                 {"bar" : 3}],
        "spam" : [
             {"ham" : {
                 "eggs" : "monty!"
             }}],
        "neo" : "5"
       }
 
assert dicer("foo[:].bar")(data) == [1, 2, 3]
assert dicer("foo.bar")(data) == [1, 2, 3]
assert dicer("foo[2].bar")(data) == 3
assert dicer("spam.ham.eggs")(data) == ["monty!"]
assert dicer("neo")(data) == "5"
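
The test above is enough to reconstruct the core behavior without a parser generator: split on dots, allow `[N]` or `[:]` after a key, and fan out whenever a key is applied to a list. This is a hypothetical reimplementation, not the original dicer code:

```python
import re

# One path segment: a key, optionally followed by [N] or [:].
_SEGMENT = re.compile(r'^(\w+)(?:\[(\d+|:)\])?$')


def dicer(expression):
    """Compile a dotted query such as "foo[2].bar" into a function of the data."""
    steps = []
    for part in expression.split('.'):
        m = _SEGMENT.match(part)
        if m is None:
            raise ValueError('cannot parse segment %r' % part)
        steps.append((m.group(1), m.group(2)))

    def lookup(value, key):
        # Looking a key up in a list fans out over every element.
        if isinstance(value, list):
            return [lookup(item, key) for item in value]
        return value[key]

    def query(data):
        value = data
        for key, index in steps:
            value = lookup(value, key)
            if index == ':':
                value = list(value)          # [:] keeps every element
            elif index is not None:
                value = value[int(index)]    # [N] picks one element
        return value

    return query
```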

Looking back, the dicer really does seem like a neat utility with a very
focused purpose and clean interface. I believe it was originally written to
parse out complex configuration data. I should really remember to try and use
the dicer some more.

One thing I don’t like about the dicer is its use of lex and yacc packages,
which create parsetab.py files everywhere. This would have to be retooled if I
wanted to release the dicer to the public.

The Stamper

The stamper was an exercise in writing a validation library that had basic
logic support. I believe it was inspired by some examples in SICP.

For example, the stamper has a predicate expression when:

r = when(have('first_name'), have('last_name'))
assert r({'first_name' : 'a', 'last_name' : 'b'}) == True

Here, ‘r’ becomes the stamper and the dict literal is the record it is tested
against. There were also stamps that allowed custom procedures to be plugged
in, like match, which would take a procedure and return the result of applying
that procedure to the value of the field.

import re

def is_phone_number(value):
    # Matches numbers written like 444-444-4444.
    return re.match(r'^\d{3}-\d{3}-\d{4}$', value) is not None
 
rec = {
    'first_name' : 'John',
    'last_name' : 'Smith',
    'phone_number' : '444-444-4444',
    'cell_phone' : '333-333-3333'
}
 
assert match(is_phone_number, 'phone_number')(rec) == True

The checking could get decently complex:

rule = when(all(have('length'), have('width')),
                check(lambda x, y: x == y, ['length', 'width']))
assert rule({'length' : '2', 'width' : '2'}) == True
assert rule({'length' : '2', 'width' : '1'}) == False
assert rule({'length' : '1', 'width' : '2'}) == False

This rule ensures that the record describes a square whenever both dimensions are provided.
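
The stamper's internals aren't shown, but the examples suggest plain closures over a record. A hypothetical sketch of `have`, `when`, `check` and `match`, with `all` renamed `all_of` so it doesn't shadow the builtin:

```python
def have(field):
    """Stamp: the record has a non-empty value for `field`."""
    return lambda record: record.get(field) not in (None, '')


def all_of(*stamps):
    """Stamp: every stamp passes."""
    return lambda record: all(stamp(record) for stamp in stamps)


def when(condition, consequence):
    """Stamp: if `condition` holds, `consequence` must hold too."""
    return lambda record: (not condition(record)) or consequence(record)


def check(predicate, fields):
    """Stamp: apply `predicate` to the values of several fields."""
    return lambda record: bool(predicate(*[record[f] for f in fields]))


def match(predicate, field):
    """Stamp: apply `predicate` to a single field's value."""
    return lambda record: bool(predicate(record[field]))


square = when(all_of(have('length'), have('width')),
              check(lambda x, y: x == y, ['length', 'width']))
```

Because each stamp returns only True or False, this sketch shares the weakness described below: it says nothing about which rule failed.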

You know, the stamper is pretty powerful, but one thing it’s lacking is any
notion of types. I believe the stamper is type agnostic. It would need to have
much better error reporting capabilities than returning True or False for it to
be useful in a real system.

I also wonder what the payoff is for going through the trouble of building this
functional representation of these rules. It would seem that the stamper should
really be a middleman between some declarative data of rules and the
processing of those rules. In that case, why have all the syntactic sugar?

The stamper might be an example of me trying to turn Python into Scheme.

And I’ve saved the best for last:

Casrel

Casrel was an exploration in designing data files. The name is short for
CAScading RELation. Each data file consists of a set of data definitions,
grouped as properties on objects. The layout was inspired by CSS. For example,
an account table definition pulled from a database would be:

account
    schema-name: public
    relation-type: table
    read-only: no

account.account_name
    label: account_name
    sql-domain: varchar
    type: varchar
    read-only: no
    sql-required: no
    required: no
    maximum-length: 255

account.label
    label: label
    sql-domain: varchar
    type: varchar
    read-only: no
    sql-required: yes
    required: yes
    maximum-length: 255

account.owner_name
    label: owner_name
    sql-domain: varchar
    type: varchar
    read-only: no
    sql-required: no
    required: no
    maximum-length: 255

account.account_type
    label: account_type
    sql-domain: account_type
    type: account_type
    read-only: no
    sql-required: no
    required: no

The object specifier consists of a series of dotted names, forming a path down
from a parent object. account.account_name describes properties on the
account_name object, which is a member of the account object.

One interesting property of this file format is that it effectively builds a
tree of data without the multi-level indentation or curly-brace structure that
is common in other file formats. The hierarchy is implicit in how the object
labels are written.

Casrel was designed to define metadata that applications could use to create
forms, SQL data definitions and other information. Its purpose was to serve as
a central point of truth for schema data.

The cascading part of casrel was its support for merging the attribute values
on objects from multiple definitions (the same way CSS works).

This could allow specific properties to be overridden and object hierarchies to
be built piecemeal, depending on the situation.

Take, for example, a customer table definition that is created by inspecting a
SQL table.

customer.first_name
    label: first_name
    type: varchar
    maximum-length: 255

If you want to show a field for first_name on a particular form, you can
override the label attribute to be more user friendly, and also add another
attribute:

customer.first_name
    label: First Name
    help: The common name of the person

This merges to form:

customer.first_name
    label: First Name
    type: varchar
    maximum-length: 255
    help: The common name of the person

In the source code, this operation is called a “join” though I think that merge
would be a better term to use.
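
Here is a minimal sketch of parsing that format and merging definitions, assuming attribute lines are `key: value` pairs indented under an object specifier and that later definitions win. `parse_casrel` and `merge` are hypothetical names:

```python
def parse_casrel(text):
    """Parse casrel-style text into {object_name: {attribute: value}}."""
    objects = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if not stripped or stripped.startswith('#'):
            continue                      # skip blank lines and comments
        if not line[0].isspace():
            # An object specifier such as "customer.first_name".
            current = objects.setdefault(stripped, {})
        elif current is not None:
            # An attribute line such as "label: First Name".
            key, _, value = stripped.partition(':')
            current[key.strip()] = value.strip()
    return objects


def merge(*definitions):
    """Combine definitions in order; later attribute values win, as in CSS."""
    merged = {}
    for definition in definitions:
        for name, attributes in definition.items():
            merged.setdefault(name, {}).update(attributes)
    return merged


table = parse_casrel("""
customer.first_name
    label: first_name
    type: varchar
    maximum-length: 255
""")

form = parse_casrel("""
customer.first_name
    label: First Name
    help: The common name of the person
""")

combined = merge(table, form)
```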

There are many more ideas that could be incorporated into the data format,
wildcards or regular expression matching for example:

# Set the default type on customer fields to be varchar
customer.*
    type: varchar
    maximum-length: 255

or more xpath-like expressions:

# Set the maximum-length on customer fields with the type attribute of varchar
customer.[type=varchar]
    maximum-length: 25

Using wildcards, regular expressions or xpath-esque expressions, you could
build collections of attributes that could be bundled (and then applied by
using a special matching attribute that had no bearing on the real data
definition). I would have to come up with a jargon term for this attribute.
Maybe a class, since its function is the same as using a CSS class to define a
common set of attributes for use on specific objects.

# a custom type to be used for fields
*.[x-type=zipcode]
    type: varchar
    maximum-length: 10
    match-expression: \d{5}(?:-\d{4})?

# This would expand
customer.zip_code
    x-type: zipcode
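
Selectors like these could be layered on top of parsed objects. A hypothetical sketch using shell-style wildcards from `fnmatch` for names, with `parent.[attr=value]` read as "children of parent whose attr equals value". Attributes defined directly on an object win over rule attributes, roughly like a more specific CSS selector:

```python
import fnmatch
import re

# "parent.[attr=value]": children of parent whose attr equals value.
_ATTRIBUTE_SELECTOR = re.compile(
    r'^(?P<parent>.*)\.\[(?P<attr>[\w-]+)=(?P<value>[^\]]*)\]$')


def selector_matches(selector, name, attributes):
    """True if the object `name` with `attributes` is selected by `selector`."""
    m = _ATTRIBUTE_SELECTOR.match(selector)
    if m is None:
        # Plain wildcard over the dotted name, e.g. "customer.*".
        # Note that fnmatch's * also matches dots.
        return fnmatch.fnmatchcase(name, selector)
    return (fnmatch.fnmatchcase(name, m.group('parent') + '.*')
            and attributes.get(m.group('attr')) == m.group('value'))


def apply_rules(objects, rules):
    """Expand {name: attributes} using a list of (selector, attributes) rules.

    Rules apply in order; attributes defined on the object itself win.
    Attribute conditions only look at the object's own attributes.
    """
    result = {}
    for name, attributes in objects.items():
        merged = {}
        for selector, rule_attributes in rules:
            if selector_matches(selector, name, attributes):
                merged.update(rule_attributes)
        merged.update(attributes)
        result[name] = merged
    return result


objects = {
    'customer.zip_code': {'x-type': 'zipcode'},
    'customer.first_name': {'label': 'First Name'},
}
rules = [
    ('customer.*', {'type': 'varchar', 'maximum-length': '255'}),
    ('*.[x-type=zipcode]', {'type': 'varchar', 'maximum-length': '10'}),
]
expanded = apply_rules(objects, rules)
```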

All of these are exciting ideas, but I’m constantly reminded that there are
many similar technologies out there that, with some post processing, could
provide all of the functionality mentioned above. XML, JSON, XPATH, XSLT, YAML,
INI, CSS. The list is long and varied.

This project was abandoned because it became another victim of feature creep:
too much time was going into the library, so I just settled on YAML to define
data values.

Using YAML turned out to be the worst of both worlds. It provided no real
benefit over just using Python code. The lesson here is that, unless you are
creating new means of combination and more functionality than simple data
definitions, you’re wasting your time using a data language.

Categories: Python, Software Design

Turtle Sets the Bar for Python Programming

May 12th, 2010 jeremy

Last week I was showing a non-programmer friend the basics of coding. We did some interactive programming using the Logo-inspired turtle module that comes with Python (it draws with Tkinter). We did some fun things like drawing random pinwheels.

The final program we wrote drew the Triforce from “The Legend of Zelda.” Here’s the code:

from turtle import *
 
def triangle(size):
    begin_fill()
    for i in range(3):
        forward(size)
        left(120)
    end_fill()
 
def triforce(size):
    color('yellow')
    triangle(size)
    right(120)
    forward(size)
    left(120)
    triangle(size)
    forward(size)
    triangle(size)
 
triforce(100)
done()

Don’t you wish all programming was like that? What if your ajax-driven Python web application’s code was that simple and elegant?

Categories: Uncategorized

Python AMQPlib Common Idioms

November 12th, 2009 jeremy

I’ve been looking at implementing AMQP for quite some time now to replace our globs of ad hoc system communication, and so far I’ve enjoyed learning the technology. I’ve been going over the AMQP spec and RabbitMQ’s documentation, and have read Rabbits and Warrens a handful of times.

One thing I have yet to run across is examples of the common patterns you can build with AMQP and how to write them with amqplib. I’d like this page to help answer the question “so what do I do with it?” after someone has learned about channels, exchanges, queues and routing keys. In the following examples you will notice a lot of similarities in the code, but it’s the small differences, and the nuances they cause, that have given me the most insight.

Set up

Below is the channel setup (and Python script initialization) that all of these examples use. This configuration worked fine for me on Ubuntu after installing the .deb file from the RabbitMQ website.

#!/usr/bin/env python
from amqplib import client_0_8 as amqp
# guest/guest is RabbitMQ's default account on a fresh install.
conn = amqp.Connection(host="localhost:5672", userid="guest",
    password="guest", virtual_host="/", insist=False)
chan = conn.channel()

Broadcast

When broadcasting, a producer sends messages to an exchange and does not care who receives them, or whether anyone is at the other end to receive them at all. Consumers join and leave the exchange at their leisure. Providing sports score updates is an example where broadcasting would be appropriate. Every time a team scores, a message is sent out to all interested parties with what happened and the new score. Many clients only want to know about scores as they happen in real time; scores from the past are of no consequence.

When broadcasting, the exchange name alone identifies the destination of the message.

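chan.exchange_declare(exchange="score_updates", type="fanout", durable=False,
                      auto_delete=True)
 
# Produce Script
msg = amqp.Message("AL 24: FL 7\nTouch down by Tim Tebow")
chan.basic_publish(msg, exchange="score_updates")
 
# Consume Script
# Anonymous, exclusive queue: the server picks the name and deletes it
# when this consumer goes away.
queue = chan.queue_declare("", durable=False, exclusive=True, auto_delete=True)
# The queue only receives broadcasts once it is bound to the fanout exchange.
chan.queue_bind(queue=queue[0], exchange="score_updates")
def score_update_callback(msg):
    # tell person using this device
    pass
chan.basic_consume(queue=queue[0], callback=score_update_callback, no_ack=True)
while True:
    chan.wait()  # block until the next message arrives and run the callback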

Notify

This idiom allows consumers to subscribe to events and guarantees that all messages get delivered, even when the consumer is not running. The setup is similar to broadcast, with the following exceptions. The queues are given explicit names that relate to the consumer, and the names must be coordinated across all consumers to make sure they don’t overlap. Queues are declared durable and messages are published as persistent (delivery_mode=2), giving them permanency. In this scheme, a consumer must be run at least once to register its queue. After that initial run, messages sent to the exchange pile up in the queue until the consumer processes them.

Let’s say that data records entered in one system should be sent to numerous other systems across an enterprise. The Notifier is the way to go.

When notifying, the exchange name alone identifies the destination of the message.

chan.exchange_declare(exchange="notify.patient.add", type="fanout", durable=True,
                      auto_delete=False)
 
# Producer
msg = amqp.Message("given:John\nsurname:Smith", delivery_mode=2)
chan.basic_publish(msg, exchange="notify.patient.add")
 
# Consumer for Scheduling System
chan.queue_declare("notify.patient.add.scheduling", durable=True, auto_delete=False,
                   exclusive=False)
chan.queue_bind(queue="notify.patient.add.scheduling", exchange="notify.patient.add")
def scheduling_patient_add(msg):
    # add to local database maybe?
    chan.basic_ack(msg.delivery_tag)
 
chan.basic_consume(queue="notify.patient.add.scheduling",
                   callback=scheduling_patient_add)
 
# Consumer for Reporting System
chan.queue_declare("notify.patient.add.reporting", durable=True, auto_delete=False,
                   exclusive=False)
chan.queue_bind(queue="notify.patient.add.reporting", exchange="notify.patient.add")
def reporting_patient_add(msg):
    # add to local database maybe?
    chan.basic_ack(msg.delivery_tag)
 
chan.basic_consume(queue="notify.patient.add.reporting",
                   callback=reporting_patient_add)

It’s important that each consumer has its own uniquely named queue. We don’t want
different consumers gobbling up each other’s messages.

Request/Response

This one took me the longest to figure out. It’s an implementation of the standard request/response idiom found everywhere in client/server computing. This is also the underlying mechanism for RPC, so if you’re looking to build your own RPC protocol on top of AMQP, this is the way to go.

I learned the proper way to implement this by looking at the source code of RabbitMQ’s Java client library along with reading the specification, so if anyone has pointers to make this code better, I’d love to hear them!

The idea is to use two queues: one for sending the request and one for the response.

chan.exchange_declare(exchange="api", type="direct", durable=False,
                      auto_delete=True)
 
# Responder/Callee/Server
chan.queue_declare(queue="add_patient", durable=False,
    exclusive=False, auto_delete=True)
chan.queue_bind(exchange="api", queue="add_patient", routing_key="add_patient")
def add_patient(msg):
    reply = amqp.Message("Return Value")
    chan.basic_publish(reply, routing_key=msg.reply_to)
chan.basic_consume(queue="add_patient", callback=add_patient, no_ack=True)
 
# Requester/Caller/Client
reply_queue = chan.queue_declare("", durable=False, exclusive=True,
                                 auto_delete=True)
reply_queue = reply_queue[0]
msg = amqp.Message("Test message!")
msg.properties["reply_to"] = reply_queue
chan.basic_publish(msg, exchange="api", routing_key="add_patient")
def handle_response(msg):
    print("Response: %s" % msg.body)
chan.basic_consume(queue=reply_queue, callback=handle_response, no_ack=True)
chan.wait()

There is a lot going on in this example, so I’ll go through it step by step. In this scenario, the exchange acts as a container, while the queue is the application endpoint. The caller creates a temporary, exclusive queue to serve as a response channel for the responder. AMQP messages have a standard reply_to field for just this purpose.

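In this case, the response queue is not bound to the api exchange. Every queue is reachable through the default (nameless) exchange using its own name as the routing key, so the responder simply publishes the reply with routing_key set to the reply_to value. Since the queue is only temporary, passing its name is all the responder needs.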

Something I didn’t get much time to go into is the choice of which queues and exchanges to make durable and set to auto-delete. In request/response, you generally do not want messages to persist and pile up in the queue if no responder is running, so a timeout mechanism will have to be implemented in the application layer. I’m sure there are many other considerations that I haven’t thought about, but hopefully I’ll cross those bridges when I come to them.
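
One way to get that timeout is to poll the reply queue until a deadline passes. A minimal, hypothetical sketch that takes any `poll` callable returning a message or `None`:

```python
import time


def wait_for_reply(poll, timeout=5.0, interval=0.05):
    """Call `poll` until it returns a message or `timeout` seconds pass.

    Returns the message, or None if the deadline passed first.
    """
    deadline = time.time() + timeout
    while True:
        reply = poll()
        if reply is not None:
            return reply
        if time.time() >= deadline:
            return None
        time.sleep(interval)   # avoid hammering the broker between polls
```

With amqplib, `poll` could be `lambda: chan.basic_get(queue=reply_queue, no_ack=True)`, assuming `basic_get` returns `None` when the queue is empty. Polling trades a little latency for simplicity compared with `basic_consume`.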

Categories: Python

WSGI DIY Application

October 4th, 2009 jeremy

# feed me to mod_wsgi
import os
 
from beaker.middleware import SessionMiddleware
from paste import httpexceptions
from paste.exceptions.errormiddleware import ErrorMiddleware
from paste.urlparser import make_url_parser
 
site_dir = os.path.abspath(os.path.dirname(__file__))
app_dir = os.path.join(site_dir, 'apps')
 
# Settings the applications read from the environment.
os.environ.update({
    'DBURI': 'postgres://XX:XXXX@localhost/XX',
    'MAKO_CACHE_DIR': '/tmp/XX_mako',
    'MAKO_TEMPLATE_DIR': '%s/mako' % app_dir
})
 
# Dispatcher: map URL paths onto the modules in app_dir.
application = make_url_parser({}, app_dir, 'XX.apps')
# Common middleware, innermost first: file-based Beaker sessions...
application = SessionMiddleware(application, {
    'session.cookie_expires': 30000,
    'session.type' : 'file',
    'session.data_dir' : '/tmp/XX_session',
    'session.key' : '_sid'
})
# ...turn raised paste.httpexceptions (redirects, 404s) into responses...
application = httpexceptions.HTTPExceptionHandler(application)
# ...and catch anything else that escapes.
application = ErrorMiddleware(application, debug=True)

Categories: Python, WSGI

First Thoughts on WSGI and Paste

July 27th, 2009 jeremy

I’ve finally had some time to look into WSGI, and I’m starting my first project using it. I made the jump because of how easy mod_wsgi was to get working. I particularly like daemon mode, where multiple processes can be configured to run the same application. If, on a particular instance, a WSGI application goes horribly wrong (say, it calls into a shared object that causes a segfault), mod_wsgi will happily replace the process with a new copy. I find process isolation in a production environment to be mandatory. So go mod_wsgi!

I’ve found Ian Bicking’s work on Python Paste to be most illuminating and good brain food to chew on. I particularly enjoyed his article on a do-it-yourself framework.

The smorgasbord approach to pure WSGI development is very appealing: simple interfaces, low coupling, transparent code, small contexts. It really embodies the values I’ve developed over the years writing Python. Middleware is absolutely scrumptious, and I hope to have some packages suitable for release to the rest of the community after I finish this first project.

So far, I’m using:

I have to say that so far, I’ve been very impressed with paste.urlparser. I’ve written and used so many web dispatch systems over the years, and this one really seems to get it right.

The fact that the dispatch goes to the module level is perfect. Other systems, like CherryPy, require you to explicitly import every module with controller code, usually resulting in many import statements in __init__ modules. paste.urlparser instead maps the URL path onto files in a directory and loads the WSGI application from the matching module automatically.

Each module, and in turn each file, is its own application. Simple, clean and to the point. Convention over Configuration.
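
For illustration, this is roughly what one such module could look like, assuming paste.urlparser picks up a module-level `application` (the convention the other examples in this archive follow). The file name `apps/hello.py` is hypothetical:

```python
# apps/hello.py: one module, one WSGI application.


def application(environ, start_response):
    """The whole controller for this URL: a plain WSGI application."""
    body = b'Hello from hello.py'
    start_response('200 OK', [('Content-Type', 'text/plain'),
                              ('Content-Length', str(len(body)))])
    return [body]
```

Because it is plain WSGI, the module can also be tried on its own with `wsgiref.simple_server.make_server('', 8000, application).serve_forever()`.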

Categories: Python, WSGI

Automated Ignore Files with Subversion

May 10th, 2007 jeremy

We always have tons of garbage files lying around that we don’t want going into version control. Subversion’s svn:ignore property doesn’t have the best documentation around. So, in order to help out, here are two little scripts that work in tandem to ignore whatever you want.

The easiest way is to place a .svnignore file in every directory where you want to ignore files. This file lists patterns or file names, one per line, that should be ignored in that directory.

For example, to ignore foo.py and bar.py the file would have something like this:

   foo.py
   bar.py

Now, what if we have a set of files that we want to ignore across a project? The easiest way is to write scripts that automatically manage these .svnignore files.

In one project, we have template directories named ‘t’ containing compiled Cheetah files that we do not want in Subversion. However, we need the __init__.py files in those directories to be in Subversion.

Here is a python script to manage those ignore files:

#!/usr/bin/env python
""" Adds all python files in directories named t except
for __init__.py to the .svnignore files in those
directories recursively from the current directory.
"""
import glob
import os
 
for root, dirs, files in os.walk(os.curdir):
    if os.path.basename(root) != 't':
        continue
    # Compiled templates: every .py file in the directory except __init__.py.
    compiled = [os.path.basename(f) for f in glob.glob(os.path.join(root, '*.py'))]
    compiled = [f for f in compiled if f != '__init__.py']

    ignore_path = os.path.join(root, '.svnignore')
    ignores = set()
    if os.path.exists(ignore_path):
        # Keep existing entries so manual additions are never lost.
        with open(ignore_path) as igh:
            ignores.update(line.strip() for line in igh if line.strip())
    ignores.update(compiled)
 
    with open(ignore_path, 'w') as igh:
        for ignore in sorted(ignores):
            igh.write("%s\n" % ignore)

If an ignore file exists, we pull it in and update it with what we want to ignore. It’s always important that scripts do not break any previous work: if I manually put an entry in the .svnignore file, the script had better not erase it.

Alright, so there we go. We now have ignore files everywhere. The other piece of the puzzle is a script that uses these ignore files and sets the svn properties on the directories.

So here is that second script:

#!/usr/bin/env python
""" Sets the contents of all .svnignore files to svn:ignore properties
recursively in the current directory. Also attempts to add the
.svnignore file to subversion.
"""
import os
import subprocess
 
for root, dirs, files in os.walk(os.curdir):
    if '.svnignore' not in files:
        continue
    path = os.path.join(root, '.svnignore')
    # Argument lists avoid shell-quoting problems with unusual paths.
    subprocess.call(['svn', 'propset', 'svn:ignore', '-F', path, root])
    # svn just warns if the file is already under version control.
    subprocess.call(['svn', 'add', path])

Categories: Python, Sysadmin