Monday, April 27, 2009

Revisiting Pyshards

So I had this idea last year after reading several white papers on database scaling techniques. I also had about a week's time between paid projects to spend however I liked. So I turned my ideas into pyshards, a quick and dirty horizontal database partitioning library. At the end of that week, I published my effort on Google Code for a number of reasons:


  1. I really couldn't find a Python-based toolkit like the one I was imagining at the time, and I needed it.

  2. I was looking for Python gigs and wanted to be able to easily refer hiring technologists to something I had written in Python. (Most of my previous work had been written in Java or C++, or could not be made public.)

  3. After years of using great free and open source software, I was ready to give something back.

  4. I was curious if others would volunteer to help me build the library.



I received a number of messages from other coders saying they were looking for a tool like the one I was building and would be interested in joining the project. But the messages were about as far as it went. Actual participation from the outside was nil.

I went on to use pyshards in my next project, but not quite in the capacity I had originally envisioned. I did use the tool to configure my shards, and I used its distribution mechanism to evenly spread data across the many databases. I didn't end up using it for querying, as I needed something a little different. In the following months I went on to create a new page (in the Django sub-project) that visually communicated the shard organization and remaining capacity, but that was the only new work done.
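
For readers unfamiliar with the idea of spreading data evenly across databases, here is a minimal sketch of hash-based distribution. This is not the pyshards code or its API. The shard names and the functions shard_for and distribution are made up for illustration, and the md5-modulo scheme is only an assumption about one common way to do it.

```python
import hashlib
from collections import Counter

# Hypothetical shard names for illustration; not taken from pyshards.
SHARDS = ["shard_0", "shard_1", "shard_2", "shard_3"]


def shard_for(key, shards=SHARDS):
    """Pick a shard for a key using a stable hash modulo the shard count."""
    # md5 is used only as a stable, well-mixed hash, not for security.
    digest = hashlib.md5(str(key).encode("utf-8")).hexdigest()
    return shards[int(digest, 16) % len(shards)]


def distribution(keys, shards=SHARDS):
    """Count how many of the given keys land on each shard."""
    counts = Counter(shard_for(key, shards) for key in keys)
    return {shard: counts.get(shard, 0) for shard in shards}


if __name__ == "__main__":
    # The same key always maps to the same shard.
    print(shard_for("user:42"))
    # A large batch of keys should land roughly evenly across shards.
    print(distribution(range(10000)))
```

One known weakness of a plain modulo scheme is that adding a shard remaps most existing keys, which is part of why a real partitioning library needs more machinery than this.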

Though the library was imperfect and incomplete, it certainly worked for my purposes. I gave it little thought over the next several months. My hands were full building a new system for the company I had started with my partners.

Jump ahead to PyCon 2009 in Chicago. I had a few hours to kill on the last day before catching my plane and decided to attend an OpenSpaces session called "Is my code Pythonic?" I had intended to simply listen in, but when no one offered up their code for review, I volunteered. A lot of my Python code is proprietary, so I decided to offer up a file from pyshards for review, since it was public. There was a LOT of feedback.

At this point you may be wondering what they mean by Pythonic. I generally understood it to mean that you should follow the Zen of Python coding principles and stick to the "pythonic" coding style.

In case you missed it, the Zen of Python is always near and dear if you are working in a Python interpreter.


me@mrroboto:~$ python
Python 2.5.2 (r252:60911, Jul 31 2008, 17:28:52)
[GCC 4.2.3 (Ubuntu 4.2.3-2ubuntu7)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> import this
The Zen of Python, by Tim Peters

Beautiful is better than ugly.
Explicit is better than implicit.
Simple is better than complex.
Complex is better than complicated.
Flat is better than nested.
Sparse is better than dense.
Readability counts.
Special cases aren't special enough to break the rules.
Although practicality beats purity.
Errors should never pass silently.
Unless explicitly silenced.
In the face of ambiguity, refuse the temptation to guess.
There should be one-- and preferably only one --obvious way to do it.
Although that way may not be obvious at first unless you're Dutch.
Now is better than never.
Although never is often better than *right* now.
If the implementation is hard to explain, it's a bad idea.
If the implementation is easy to explain, it may be a good idea.
Namespaces are one honking great idea -- let's do more of those!



So that takes care of the Zen, but what is the Pythonic coding style? There are lots of opinions, but in general it is whatever the core developers and expert users say it is, and that evolves over time as the community itself evolves.

Jump to the present. This morning I'm finally sitting down to take a good look at the patch that Jack Diederich submitted, as well as the notes I took while discussing the code with Moshe Zadka. And as I bring the project back up for testing, I remember that the setup and configuration steps were very incomplete. Okay, Devin, quit blogging and get to work on it.
