Abandoned Python Projects
I don’t know about you, but I’ve abandoned dozens of projects over the years. I get a cute little idea, implement it as a library (maybe?). I might even use it on a real project. At some point, however, I decide that the idea wasn’t that great after all.
I’ve also spent a great deal of time reimplementing the same type of code. My prime example is HTML form libraries. They’re so very easy to start writing, but there comes a point when the code collapses on top of itself due to its complexity or inflexibility. I must’ve written five server-side HTML form generation libraries at this point.
I would like to believe that I’ve learned some things with all of these attempts. So about a year ago, I started to catalog all of the dead projects and libraries that I have worked on or conceptualized over the years. I just wanted to capture the fact that I’d already been down that road.
Most of these projects were attempts at solving general classes of problems. If I’ve learned one thing, it’s that general solutions are very hard to design.
Eventually, the abandoned projects catalog project found its way onto the abandoned projects pile (I do love recursion, don’t ya know).
I ran across the file in my Someday directory and thought that some of these might be interesting to share. Or maybe someone knows of completed libraries in production that do these things!
The Block Parser
The idea behind the block parser was to create a way to specify a lot of different
kinds of artifacts in a file, have different parsers for each type of artifact
and have these things turned into Python objects.
Each artifact in the file would be represented as a text block, the type of the
object on the first line (with possibly some preamble or header information)
and then the body of the artifact would be indented under the header.
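A minimal sketch of that layout, assuming a header is any non-indented line whose first word names the block type and whose remainder is the preamble. The names `split_blocks`, `load_blocks` and `PARSERS`, and the `sql` block type, are hypothetical and only illustrate the idea:

```python
import textwrap

def split_blocks(text):
    """Yield (header, body) pairs: a header is any non-indented line,
    and its body is the run of indented (or blank) lines that follows."""
    header, body = None, []
    for line in text.splitlines():
        if line and not line[0].isspace():
            if header is not None:
                yield header, textwrap.dedent("\n".join(body)).strip()
            header, body = line.strip(), []
        elif header is not None:
            body.append(line)
    if header is not None:
        yield header, textwrap.dedent("\n".join(body)).strip()

# Hypothetical registry: block type -> callable(preamble, body) -> object.
PARSERS = {
    "sql": lambda preamble, body: {"name": preamble, "sql": body},
}

def load_blocks(text, parsers=PARSERS):
    """Turn every block in `text` into a Python object via its parser."""
    objects = []
    for header, body in split_blocks(text):
        kind, _, preamble = header.partition(" ")
        objects.append(parsers[kind](preamble.strip(), body))
    return objects

SOURCE = """\
sql open_orders
    SELECT po_id, po_number
    FROM purchase_order
    WHERE status = 'open'
"""
```

Each block type gets its own parser, so adding a new small syntax would mean registering one more callable.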
The main motivation was to create some kind of templating schema for SQL
because embedding SQL in code is so ugly. I wanted to be able to create my own
simple syntaxes and have them load into Python.
I have come to realize that it’s just easier to use a Python class with a
specific interface and pass off instances of that class to a loader. If syntax
really is prohibitive, additional files can be provided to the loader to create
the runtime object, along with Python objects.
Overlay Config Parser
This was an attempt at reimplementing Apache’s hierarchical configuration
system. In Apache, configuration inherits from parent directories and so
the total configuration of a particular file system location is an
aggregate of all configurations up the hierarchy. I really like how this
configuration system works. Configuration variables are inherited from parent
objects in the hierarchy, and can be overridden or augmented.
It turned out that it would be a lot of work to write a general configuration
system based on these rules. It’s easiest to assume that the hierarchy is a
file system, or at least formatted like a file system path
(/parent/child/grandchild).
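Under that path-shaped assumption, the core lookup is small. This is only a sketch: `resolve` and the `CONFIGS` mapping are hypothetical names, and it handles overriding only (augmenting a value, such as appending to a list, would need extra rules):

```python
def resolve(configs, path):
    """Merge the settings of every ancestor of `path`, root first,
    so that deeper levels override shallower ones."""
    merged = dict(configs.get("/", {}))
    current = ""
    for part in path.strip("/").split("/"):
        if not part:
            continue  # the root path has no components
        current += "/" + part
        merged.update(configs.get(current, {}))
    return merged

# Made-up settings keyed by location in the hierarchy.
CONFIGS = {
    "/": {"charset": "utf-8", "auth": "none"},
    "/parent": {"auth": "basic"},
    "/parent/child/grandchild": {"charset": "latin-1"},
}
```

Here `/parent/child` has no settings of its own, so it ends up with the root charset and the `auth` value overridden at `/parent`.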
This project might have some merit, but right now I can’t justify the time it
would take to write it. I don’t really have a use for it anyway.
SQL Pipe
SQL pipe was the result of playing with an idea I had on how to organize code.
This organization was called seed/step. Seed/Step worked like this:
An object would be constructed which would serve as the seed. This object would
have a lot of members that were blank and needed to be populated.
Step objects would be applied to the seed, which would populate the seed with
objects. I can’t exactly remember how it worked, but there were some
interesting ideas there.
The result was a linearized sequence of commands that built a complex
structure, like a SQL query.
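Since the original mechanics are only half remembered, the following is a loose reconstruction of the seed/step idea rather than the library's actual code. Every name in it (`Seed`, `pipe`, `from_table`, `column`, `where`, `render`) is an assumption made for illustration:

```python
from dataclasses import dataclass, field

@dataclass
class Seed:
    """Starts out blank; steps fill in the members."""
    table: str = ""
    columns: list = field(default_factory=list)
    conditions: list = field(default_factory=list)

def from_table(name):
    def step(seed):
        seed.table = name
        return seed
    return step

def column(name, alias):
    def step(seed):
        seed.columns.append(f"{name} AS {alias}")
        return seed
    return step

def where(condition):
    def step(seed):
        seed.conditions.append(condition)
        return seed
    return step

def pipe(steps):
    """Return a function that applies each step to a seed, in order."""
    def run(seed):
        for step in steps:
            seed = step(seed)
        return seed
    return run

def render(seed):
    """Turn a populated seed into SQL text."""
    sql = f"SELECT {', '.join(seed.columns)} FROM {seed.table}"
    if seed.conditions:
        sql += " WHERE " + " AND ".join(seed.conditions)
    return sql

seed = pipe([
    from_table("purchase_order"),
    column("po_id", "po_id"),
    column("po_number", "PO_No"),
    where("status = 'open'"),
])(Seed())
```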
Ultimately, it turned out to be a very complicated builder pattern. The calling
sequences concerning how the objects were put together were very complex. The
library failed to separate the interface from the implementation, so a lot of
strange syntactic sugar had to be introduced for the code to look passable.
For example, here is code to create a select query:
select_pipe([
    select.l((self.table, "po_id", "po_id")),
    select.l((self.table, "po_number", "PO_No")),
    select.l(("orders", "dt_number", "DT_No")),
    select.l((self.table, "phone", "Phone")),
    select.l((self.table, "status", "Status")),
    expr.l(" DATE_FORMAT(purchase_order.date, 'YYYYMMDD') as Date"),
    expr.l(" CONCAT(employee.last_name,', ',employee.first_name) as Employee"),
    select.l(("vendor", "company", "Vendor"))
])(self.baseSeed())
The baseSeed was a seed that had a lot of other information added to it. This
expression only specified what columns to return. The baseSeed() provided the
actual tables, join structures and conditionals.
As I’ve said, a simple builder would’ve worked better.
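For comparison, a simple builder might look something like the sketch below. The `Select` class and its method names are hypothetical, the join condition is invented for the example, and it builds SQL by plain string concatenation, so it is not safe for untrusted input:

```python
class Select:
    """A tiny SELECT builder; every method returns self so calls chain."""

    def __init__(self, table):
        self.table = table
        self.columns = []
        self.joins = []
        self.conditions = []

    def column(self, table, name, alias=None):
        text = f"{table}.{name}"
        self.columns.append(f"{text} AS {alias}" if alias else text)
        return self

    def expr(self, sql):
        # Raw SQL expression used as a column, e.g. a CONCAT(...) call.
        self.columns.append(sql)
        return self

    def join(self, table, on):
        self.joins.append(f"JOIN {table} ON {on}")
        return self

    def where(self, condition):
        self.conditions.append(condition)
        return self

    def sql(self):
        parts = ["SELECT " + ", ".join(self.columns), "FROM " + self.table]
        parts.extend(self.joins)
        if self.conditions:
            parts.append("WHERE " + " AND ".join(self.conditions))
        return "\n".join(parts)

query = (
    Select("purchase_order")
    .column("purchase_order", "po_id", "po_id")
    .column("purchase_order", "po_number", "PO_No")
    .column("vendor", "company", "Vendor")
    # vendor_id is an assumed key; the real join structure is not shown here.
    .join("vendor", "vendor.vendor_id = purchase_order.vendor_id")
    .where("purchase_order.status = 'open'")
)
```

The interface (chained method calls) is kept apart from the implementation (lists of SQL fragments), which is the separation the pipe version lacked.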
The Dicer
The dicer was a fun little project that built a minilanguage to query complex
Python data structures. For example: lists of dicts, whose values had lists. An
example unit test:
data = {
    "foo": [{"bar": 1}, {"bar": 2}, {"bar": 3}],
    "spam": [{"ham": {"eggs": "monty!"}}],
    "neo": "5"
}

assert dicer("foo[:].bar")(data) == [1, 2, 3]
assert dicer("foo.bar")(data) == [1, 2, 3]
assert dicer("foo[2].bar")(data) == 3
assert dicer("spam.ham.eggs")(data) == ["monty!"]
assert dicer("neo")(data) == "5"
Looking back, the dicer really does seem like a neat utility with a very
focused purpose and clean interface. I believe it was originally written to
parse out complex configuration data. I should really remember to try and use
the dicer some more.
One thing I don’t like about the dicer is its use of lex and yacc packages,
which create parsetab.py files everywhere. This would have to be retooled if I
wanted to release the dicer to the public.
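One possible retooling is a small hand-written parser, which generates no table files at all. The sketch below is not the original implementation: the path grammar and the rule that a key lookup on a list maps over its items are both inferred from the unit test above, and `_STEP` and `_get` are hypothetical helper names.

```python
import re

# One path step: a key name, optionally followed by [index] or [start:stop].
_STEP = re.compile(r"(\w+)(?:\[(-?\d+)?(:)?(-?\d+)?\])?$")

def _get(value, key):
    # Looking a key up on a list maps the lookup over its items.
    if isinstance(value, list):
        return [_get(item, key) for item in value]
    return value[key]

def dicer(path):
    """Compile a dotted path such as 'foo[2].bar' into a query function."""
    steps = []
    for part in path.split("."):
        m = _STEP.match(part)
        if m is None:
            raise ValueError(f"bad path step: {part!r}")
        steps.append(m.groups())

    def query(data):
        for key, start, colon, stop in steps:
            data = _get(data, key)
            if colon:
                # Slice form: either bound may be omitted, as in [:].
                data = data[slice(int(start) if start else None,
                                  int(stop) if stop else None)]
            elif start:
                data = data[int(start)]
        return data

    return query
```

With this version the five assertions from the unit test should still hold, and there is no generated parsetab.py to clean up.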
The Stamper
The stamper was an exercise in writing a validation library that had basic
logic support. I believe it was inspired by some examples in SICP.
For example, the stamper has a predicate expression when:
r = when(have('first_name'), have('last_name'))
assert r({'first_name': 'a', 'last_name': 'b'}) == True
Here, ‘r’ becomes the stamper and the dict literal is the record which is
tested against. There were also stamps that allowed for custom procedures to be
plugged in and used, like match, which would take a procedure and return the
result of applying that procedure to the value for the field.
def is_phone_number_alt1(str):
    return re.match(r'^\(\d{3}\) \d{3}-\d{4}$', str)

rec = {
    'first_name': 'John',
    'last_name': 'Smith',
    'phone_number': '444-444-4444',
    'cell_phone': '333-333-3333'
}

assert match(is_phone_number, 'phone_number')(rec) == True
The checking could get decently complex:
rule = when(all(have('length'), have('width')),
            check(lambda x, y: x == y, ['length', 'width']))

assert rule({'length': '2', 'width': '2'}) == True
assert rule({'length': '2', 'width': '1'}) == False
assert rule({'length': '1', 'width': '2'}) == False
This rule ensures that the record describes a square when dimensions are provided.
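Stamps like these can be written as plain closures over a record. The following is a sketch of how they might work, not the stamper's actual source: it assumes `when` is logical implication, and it renames `all` to `all_of` to avoid shadowing the builtin.

```python
def have(field):
    """True when the record contains a non-empty value for `field`."""
    return lambda rec: rec.get(field) not in (None, "")

def all_of(*stamps):
    """True when every stamp passes."""
    return lambda rec: all(stamp(rec) for stamp in stamps)

def check(func, fields):
    """Apply `func` to the values of `fields` and coerce the result to bool."""
    return lambda rec: bool(func(*(rec[f] for f in fields)))

def when(condition, consequence):
    """Implication: the rule only fails when `condition` holds
    and `consequence` does not."""
    return lambda rec: (not condition(rec)) or consequence(rec)

rule = when(all_of(have('length'), have('width')),
            check(lambda x, y: x == y, ['length', 'width']))
```

Under the implication reading, a record with no dimensions at all passes the rule, because there is nothing to check.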
You know, the stamper is pretty powerful, but one thing it’s lacking is any
notion of types. I believe the stamper is type agnostic. It would need to have
much better error reporting capabilities than returning True or False for it to
be useful in a real system.
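One way to get there, sketched under the assumption that each stamp returns a list of error messages instead of a boolean (an empty list means the record passed). The `message` parameter and the wording of the messages are hypothetical:

```python
def have(field):
    def stamp(rec):
        if rec.get(field) in (None, ""):
            return [f"missing field: {field}"]
        return []
    return stamp

def check(func, fields, message):
    def stamp(rec):
        values = [rec.get(f) for f in fields]
        return [] if func(*values) else [message]
    return stamp

def all_of(*stamps):
    def stamp(rec):
        # Collect every error rather than stopping at the first failure.
        return [err for s in stamps for err in s(rec)]
    return stamp

def when(condition, consequence):
    def stamp(rec):
        if condition(rec):
            return []  # condition not met, so the rule does not apply
        return consequence(rec)
    return stamp

rule = when(all_of(have('length'), have('width')),
            check(lambda x, y: x == y, ['length', 'width'],
                  "length and width must be equal"))
```

The calling code can then show the messages to a user or log them, which a bare True or False cannot support.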
I also wonder what the payoff is of going through the trouble of building this
functional representation of these rules. It would seem that the stamper should
really be a middle man between some declarative data of rules and the
processing of those rules. In this event, why have all the syntactic sugar?
The stamper might be an example of me trying to turn Python into Scheme.
And I’ve saved the best for last:
Casrel
Casrel was an exploration in designing data files. Casrel is short for
CAScading RELation. Each data file consists of a set of data definitions,
grouped as properties on objects. The layout was inspired by CSS. For example,
an account table definition pulled from a database would be:
account
    schema-name: public
    relation-type: table
    read-only: no
account.account_name
    label: account_name
    sql-domain: varchar
    type: varchar
    read-only: no
    sql-required: no
    required: no
    maximum-length: 255
account.label
    label: label
    sql-domain: varchar
    type: varchar
    read-only: no
    sql-required: yes
    required: yes
    maximum-length: 255
account.owner_name
    label: owner_name
    sql-domain: varchar
    type: varchar
    read-only: no
    sql-required: no
    required: no
    maximum-length: 255
account.account_type
    label: account_type
    sql-domain: account_type
    type: account_type
    read-only: no
    sql-required: no
    required: no
The object specifier consists of a series of dotted properties, forming a
hierarchy to a parent object. account.account_name describes properties on the
account_name object, which is a member of the account object.
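A rough sketch of reading such a file, assuming that attribute lines contain ": " and object specifier lines do not. `parse_casrel` and `as_tree` are hypothetical names, and an attribute with an empty value would be misread as a specifier by this simple rule:

```python
def parse_casrel(text):
    """Parse casrel text into {object specifier: {attribute: value}}."""
    objects, current = {}, None
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, value = line.partition(": ")
        if sep and current is not None:
            current[name] = value
        else:
            current = objects.setdefault(line, {})
    return objects

def as_tree(objects):
    """Nest the flat specifiers: 'account.label' becomes a child of 'account'."""
    root = {"attributes": {}, "children": {}}
    for spec, attrs in objects.items():
        node = root
        for part in spec.split("."):
            node = node["children"].setdefault(
                part, {"attributes": {}, "children": {}})
        node["attributes"].update(attrs)
    return root

TEXT = """
account
    schema-name: public
    relation-type: table
account.account_name
    label: account_name
    maximum-length: 255
"""
```

The flat dictionary is convenient for merging, while the nested form makes the implicit hierarchy explicit.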
One interesting property of this file format is that it effectively builds a
tree of data without the multi-level indentation or curly-brace structure that
is common in other file formats. The hierarchy is implicit in how the object
labels are written.
Casrel was designed to define metadata for applications to use to create forms,
SQL data definitions and other information. Its purpose was to serve as a
central point of truth for schema data.
The cascading nature of casrel was that it supported merging the attribute
values on objects from multiple definitions (the same way CSS works).
This could allow specific properties to be overridden and object hierarchies to
be built piecemeal, depending on the situation.
Take, for example, a customer table definition that is created by inspecting a
SQL table.
customer.first_name
    label: first_name
    type: varchar
    maximum-length: 255
On a particular form where you want to show a field for first_name, you can
override the label attribute to be more user friendly, and also add in another
attribute:
customer.first_name
    label: First Name
    help: The common name of the person
This merges to form:
customer.first_name
    label: First Name
    type: varchar
    maximum-length: 255
    help: The common name of the person
In the source code, this operation is called a “join”, though I think that
“merge” would be a better term to use.
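As a sketch, the merge reduces to layered dictionary updates once each definition has been parsed into a mapping of object specifier to attributes. The function name `merge` and the variable names are illustrative, not taken from the casrel source:

```python
def merge(*definitions):
    """Merge casrel-style definitions left to right; attributes in later
    definitions override earlier ones, and new attributes are added."""
    merged = {}
    for definition in definitions:
        for spec, attrs in definition.items():
            merged.setdefault(spec, {}).update(attrs)
    return merged

# The two definitions from the example above, as parsed mappings.
from_database = {
    "customer.first_name": {
        "label": "first_name",
        "type": "varchar",
        "maximum-length": "255",
    },
}
for_form = {
    "customer.first_name": {
        "label": "First Name",
        "help": "The common name of the person",
    },
}
```

The inputs are left untouched, so the same database-derived definition can be reused with different overrides for different forms.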
There are many more ideas that could be incorporated into the data format:
wildcards or regular expression matching, for example:
# Set the default type on customer fields to be varchar
customer.*
    type: varchar
    maximum-length: 255
or more XPath-like expressions:
# Set the maximum-length on customer fields with the type attribute of varchar
customer.[type=varchar]
    maximum-length: 25
Using wildcards, regular expressions or XPath-esque expressions, you could
build collections of attributes that could be bundled, and then applied by
using a special matching attribute that had no bearing on the real data
definition. I would have to come up with a jargon term for this attribute,
maybe a class, since its function is the same as using a CSS class to define a
common set of attributes for use on specific objects.
# a custom type to be used for fields
*.[x-type=zipcode]
    type: varchar
    maximum-length: 10
    match-expression: \d{5}(?:-\d{4})?
# This would expand
customer.zip_code
    x-type: zipcode
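None of these selectors were implemented, so the following is only a sketch of how they might be applied. It assumes that later rules override earlier ones and that an object's own attributes always win. It uses `fnmatch` for the wildcard part and a small regular expression for the `[attribute=value]` part; `selector_matches` and `apply_rules` are hypothetical names:

```python
import re
from fnmatch import fnmatchcase

# Matches selectors such as "customer.[type=varchar]" or "*.[x-type=zipcode]".
_ATTR_SELECTOR = re.compile(r"^(.+)\.\[([\w-]+)=([^\]]*)\]$")

def selector_matches(selector, spec, attrs):
    """Return True when `selector` applies to the object named `spec`."""
    m = _ATTR_SELECTOR.match(selector)
    if m:
        prefix, name, value = m.groups()
        return fnmatchcase(spec, prefix + ".*") and attrs.get(name) == value
    return fnmatchcase(spec, selector)

def apply_rules(objects, rules):
    """Expand (selector, attributes) rules onto every matching object."""
    result = {}
    for spec, attrs in objects.items():
        combined = {}
        for selector, defaults in rules:
            if selector_matches(selector, spec, attrs):
                combined.update(defaults)
        combined.update(attrs)  # the object's own attributes win
        result[spec] = combined
    return result

rules = [
    ("customer.*", {"type": "varchar", "maximum-length": "255"}),
    ("customer.[type=varchar]", {"maximum-length": "25"}),
    ("*.[x-type=zipcode]", {"type": "varchar", "maximum-length": "10"}),
]
objects = {
    "customer.first_name": {"label": "first_name"},
    "customer.city": {"type": "varchar"},
    "customer.zip_code": {"x-type": "zipcode"},
    "account.label": {"type": "text"},
}
expanded = apply_rules(objects, rules)
```

Attribute selectors are tested against the attributes written on the object itself, not against values supplied by other rules, which keeps the expansion a single pass.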
All of these are exciting ideas, but I’m constantly reminded that there are
many similar technologies out there that, with some post processing, could
provide all of the functionality mentioned above. XML, JSON, XPath, XSLT, YAML,
INI, CSS. The list is long and varied.
This project was actually abandoned because too much time was going into it,
another victim of feature creep. I just settled on YAML to define data values
instead.
Using YAML turned out to be the worst of both worlds. It provided no real
benefit over just using Python code. The lesson here is that, unless you are
creating new means of combination and more functionality than simple data
definitions, you’re wasting your time using a data language.