03/20/13

Permalink 11:01:52 pm, by fumanchu Email , 18 words   English (US)
Categories: IT, Python, Architecture

PyData 2013 Slides

The presentation deck from my talk at PyData 2013 is up! Thanks to everyone for their interest and feedback.

02/12/13

Permalink 10:22:04 am, by fumanchu Email , 108 words   English (US)
Categories: IT

Addictive to check out

From 37 Signals, about their Basecamp iPhone app launch:

Our top priority was fast access to news. You’ll find the app makes it addictive to check in and feel the pulse of your projects throughout the day. You can quickly bounce in and out of projects. Project screens on the phone show the latest news first rather than static project contents.

Cool. As a manager, that's exactly what I want: to feel the pulse.

As an architect and designer and developer, I want the opposite. Now, can someone make an app that makes it addictive to get in the flow instead of to be interrupted all the time?

02/24/11

Permalink 02:58:42 pm, by fumanchu Email , 1214 words   English (US)
Categories: Python, CherryPy

Wow. Does isinstance blow up with ABC's?

Python 2.6.1. Here's a call to "isinstance(value, basestring)":

--[  (_cprequest:782)
--]  (_cprequest:782)  0.044ms

versus "isinstance(value, io.IOBase)":

--[  (_cprequest:791)
----> __instancecheck__ (abc:117)
----. __instancecheck__ (abc:120)
------[  (abc:120)
------]  (abc:120)  0.046ms
----. __instancecheck__ (abc:121)
----. __instancecheck__ (abc:123)
----. __instancecheck__ (abc:124)
----. __instancecheck__ (abc:125)
----. __instancecheck__ (abc:126)
----. __instancecheck__ (abc:127)
----. __instancecheck__ (abc:130)
------> __subclasscheck__ (abc:134)
------. __subclasscheck__ (abc:137)
------. __subclasscheck__ (abc:140)
------. __subclasscheck__ (abc:144)
------. __subclasscheck__ (abc:147)
--------[ ABCMeta.__subclasshook__ (abc:147)
--------] ABCMeta.__subclasshook__ (abc:147)  0.043ms
------. __subclasscheck__ (abc:148)
------. __subclasscheck__ (abc:156)
--------[  (abc:156)
--------]  (abc:156)  0.043ms
------. __subclasscheck__ (abc:160)
------. __subclasscheck__ (abc:165)
--------[ ABCMeta.__subclasses__ (abc:165)
--------] ABCMeta.__subclasses__ (abc:165)  0.045ms
------. __subclasscheck__ (abc:166)
--------[  (abc:166)
----------> __subclasscheck__ (abc:134)
----------. __subclasscheck__ (abc:137)
----------. __subclasscheck__ (abc:140)
----------. __subclasscheck__ (abc:144)
----------. __subclasscheck__ (abc:147)
------------[ ABCMeta.__subclasshook__ (abc:147)
------------] ABCMeta.__subclasshook__ (abc:147)  0.043ms
----------. __subclasscheck__ (abc:148)
----------. __subclasscheck__ (abc:156)
------------[  (abc:156)
------------]  (abc:156)  0.046ms
----------. __subclasscheck__ (abc:160)
----------. __subclasscheck__ (abc:165)
------------[ ABCMeta.__subclasses__ (abc:165)
------------] ABCMeta.__subclasses__ (abc:165)  0.043ms
----------. __subclasscheck__ (abc:166)
------------[  (abc:166)
--------------> __subclasscheck__ (abc:134)
--------------. __subclasscheck__ (abc:137)
--------------. __subclasscheck__ (abc:140)
--------------. __subclasscheck__ (abc:144)
--------------. __subclasscheck__ (abc:147)
----------------[ ABCMeta.__subclasshook__ (abc:147)
----------------] ABCMeta.__subclasshook__ (abc:147)  0.043ms
--------------. __subclasscheck__ (abc:148)
--------------. __subclasscheck__ (abc:156)
----------------[  (abc:156)
----------------]  (abc:156)  0.043ms
--------------. __subclasscheck__ (abc:160)
--------------. __subclasscheck__ (abc:165)
----------------[ ABCMeta.__subclasses__ (abc:165)
----------------] ABCMeta.__subclasses__ (abc:165)  0.042ms
--------------. __subclasscheck__ (abc:170)
----------------[ set.add (abc:170)
----------------] set.add (abc:170)  0.043ms
--------------. __subclasscheck__ (abc:171)
--------------< __subclasscheck__ (abc:171): False 1.690ms
------------]  (abc:166)  1.887ms
----------. __subclasscheck__ (abc:165)
----------. __subclasscheck__ (abc:170)
------------[ set.add (abc:170)
------------] set.add (abc:170)  0.042ms
----------. __subclasscheck__ (abc:171)
----------< __subclasscheck__ (abc:171): False 3.745ms
--------]  (abc:166)  3.952ms
------. __subclasscheck__ (abc:165)
------. __subclasscheck__ (abc:166)
--------[  (abc:166)
----------> __subclasscheck__ (abc:134)
----------. __subclasscheck__ (abc:137)
----------. __subclasscheck__ (abc:140)
----------. __subclasscheck__ (abc:144)
----------. __subclasscheck__ (abc:147)
------------[ ABCMeta.__subclasshook__ (abc:147)
------------] ABCMeta.__subclasshook__ (abc:147)  0.044ms
----------. __subclasscheck__ (abc:148)
----------. __subclasscheck__ (abc:156)
------------[  (abc:156)
------------]  (abc:156)  0.044ms
----------. __subclasscheck__ (abc:160)
----------. __subclasscheck__ (abc:165)
------------[ ABCMeta.__subclasses__ (abc:165)
------------] ABCMeta.__subclasses__ (abc:165)  0.045ms
----------. __subclasscheck__ (abc:166)
------------[  (abc:166)
--------------> __subclasscheck__ (abc:134)
--------------. __subclasscheck__ (abc:137)
--------------. __subclasscheck__ (abc:140)
--------------. __subclasscheck__ (abc:144)
--------------. __subclasscheck__ (abc:147)
----------------[ ABCMeta.__subclasshook__ (abc:147)
----------------] ABCMeta.__subclasshook__ (abc:147)  0.042ms
--------------. __subclasscheck__ (abc:148)
--------------. __subclasscheck__ (abc:156)
----------------[  (abc:156)
----------------]  (abc:156)  0.043ms
--------------. __subclasscheck__ (abc:160)
--------------. __subclasscheck__ (abc:165)
----------------[ ABCMeta.__subclasses__ (abc:165)
----------------] ABCMeta.__subclasses__ (abc:165)  0.043ms
--------------. __subclasscheck__ (abc:166)
----------------[  (abc:166)
------------------> __subclasscheck__ (abc:134)
------------------. __subclasscheck__ (abc:137)
------------------. __subclasscheck__ (abc:140)
------------------. __subclasscheck__ (abc:144)
------------------. __subclasscheck__ (abc:147)
--------------------[ ABCMeta.__subclasshook__ (abc:147)
--------------------] ABCMeta.__subclasshook__ (abc:147)  0.044ms
------------------. __subclasscheck__ (abc:148)
------------------. __subclasscheck__ (abc:156)
--------------------[  (abc:156)
--------------------]  (abc:156)  0.049ms
------------------. __subclasscheck__ (abc:160)
------------------. __subclasscheck__ (abc:165)
--------------------[ ABCMeta.__subclasses__ (abc:165)
--------------------] ABCMeta.__subclasses__ (abc:165)  0.044ms
------------------. __subclasscheck__ (abc:166)
--------------------[  (abc:166)
----------------------> __subclasscheck__ (abc:134)
----------------------. __subclasscheck__ (abc:137)
----------------------. __subclasscheck__ (abc:140)
----------------------. __subclasscheck__ (abc:144)
----------------------. __subclasscheck__ (abc:147)
------------------------[ ABCMeta.__subclasshook__ (abc:147)
------------------------] ABCMeta.__subclasshook__ (abc:147)  0.043ms
----------------------. __subclasscheck__ (abc:148)
----------------------. __subclasscheck__ (abc:156)
------------------------[  (abc:156)
------------------------]  (abc:156)  0.042ms
----------------------. __subclasscheck__ (abc:160)
----------------------. __subclasscheck__ (abc:165)
------------------------[ ABCMeta.__subclasses__ (abc:165)
------------------------] ABCMeta.__subclasses__ (abc:165)  0.042ms
----------------------. __subclasscheck__ (abc:170)
------------------------[ set.add (abc:170)
------------------------] set.add (abc:170)  0.042ms
----------------------. __subclasscheck__ (abc:171)
----------------------< __subclasscheck__ (abc:171): False 1.574ms
--------------------]  (abc:166)  1.772ms
------------------. __subclasscheck__ (abc:165)
------------------. __subclasscheck__ (abc:170)
--------------------[ set.add (abc:170)
--------------------] set.add (abc:170)  0.042ms
------------------. __subclasscheck__ (abc:171)
------------------< __subclasscheck__ (abc:171): False 4.394ms
----------------]  (abc:166)  4.592ms
--------------. __subclasscheck__ (abc:165)
--------------. __subclasscheck__ (abc:166)
----------------[  (abc:166)
------------------> __subclasscheck__ (abc:134)
------------------. __subclasscheck__ (abc:137)
------------------. __subclasscheck__ (abc:140)
------------------. __subclasscheck__ (abc:144)
------------------. __subclasscheck__ (abc:147)
--------------------[ ABCMeta.__subclasshook__ (abc:147)
--------------------] ABCMeta.__subclasshook__ (abc:147)  0.042ms
------------------. __subclasscheck__ (abc:148)
------------------. __subclasscheck__ (abc:156)
--------------------[  (abc:156)
--------------------]  (abc:156)  0.044ms
------------------. __subclasscheck__ (abc:160)
------------------. __subclasscheck__ (abc:165)
--------------------[ ABCMeta.__subclasses__ (abc:165)
--------------------] ABCMeta.__subclasses__ (abc:165)  0.044ms
------------------. __subclasscheck__ (abc:166)
--------------------[  (abc:166)
----------------------> __subclasscheck__ (abc:134)
----------------------. __subclasscheck__ (abc:137)
----------------------. __subclasscheck__ (abc:140)
----------------------. __subclasscheck__ (abc:144)
----------------------. __subclasscheck__ (abc:145)
----------------------< __subclasscheck__ (abc:145): False 0.350ms
--------------------]  (abc:166)  0.553ms
------------------. __subclasscheck__ (abc:165)
------------------. __subclasscheck__ (abc:170)
--------------------[ set.add (abc:170)
--------------------] set.add (abc:170)  0.043ms
------------------. __subclasscheck__ (abc:171)
------------------< __subclasscheck__ (abc:171): False 2.682ms
----------------]  (abc:166)  2.876ms
--------------. __subclasscheck__ (abc:165)
--------------. __subclasscheck__ (abc:170)
----------------[ set.add (abc:170)
----------------] set.add (abc:170)  0.042ms
--------------. __subclasscheck__ (abc:171)
--------------< __subclasscheck__ (abc:171): False 9.633ms
------------]  (abc:166)  9.855ms
----------. __subclasscheck__ (abc:165)
----------. __subclasscheck__ (abc:166)
------------[  (abc:166)
--------------> __subclasscheck__ (abc:134)
--------------. __subclasscheck__ (abc:137)
--------------. __subclasscheck__ (abc:140)
--------------. __subclasscheck__ (abc:144)
--------------. __subclasscheck__ (abc:147)
----------------[ ABCMeta.__subclasshook__ (abc:147)
----------------] ABCMeta.__subclasshook__ (abc:147)  0.042ms
--------------. __subclasscheck__ (abc:148)
--------------. __subclasscheck__ (abc:156)
----------------[  (abc:156)
----------------]  (abc:156)  0.043ms
--------------. __subclasscheck__ (abc:160)
--------------. __subclasscheck__ (abc:165)
----------------[ ABCMeta.__subclasses__ (abc:165)
----------------] ABCMeta.__subclasses__ (abc:165)  0.043ms
--------------. __subclasscheck__ (abc:170)
----------------[ set.add (abc:170)
----------------] set.add (abc:170)  0.042ms
--------------. __subclasscheck__ (abc:171)
--------------< __subclasscheck__ (abc:171): False 1.562ms
------------]  (abc:166)  1.755ms
----------. __subclasscheck__ (abc:165)
----------. __subclasscheck__ (abc:166)
------------[  (abc:166)
--------------> __subclasscheck__ (abc:134)
--------------. __subclasscheck__ (abc:137)
--------------. __subclasscheck__ (abc:140)
--------------. __subclasscheck__ (abc:144)
--------------. __subclasscheck__ (abc:147)
----------------[ ABCMeta.__subclasshook__ (abc:147)
----------------] ABCMeta.__subclasshook__ (abc:147)  0.043ms
--------------. __subclasscheck__ (abc:148)
--------------. __subclasscheck__ (abc:156)
----------------[  (abc:156)
----------------]  (abc:156)  0.043ms
--------------. __subclasscheck__ (abc:160)
--------------. __subclasscheck__ (abc:165)
----------------[ ABCMeta.__subclasses__ (abc:165)
----------------] ABCMeta.__subclasses__ (abc:165)  0.042ms
--------------. __subclasscheck__ (abc:170)
----------------[ set.add (abc:170)
----------------] set.add (abc:170)  0.043ms
--------------. __subclasscheck__ (abc:171)
--------------< __subclasscheck__ (abc:171): False 1.569ms
------------]  (abc:166)  1.772ms
----------. __subclasscheck__ (abc:165)
----------. __subclasscheck__ (abc:166)
------------[  (abc:166)
--------------> __subclasscheck__ (abc:134)
--------------. __subclasscheck__ (abc:137)
--------------. __subclasscheck__ (abc:140)
--------------. __subclasscheck__ (abc:144)
--------------. __subclasscheck__ (abc:147)
----------------[ ABCMeta.__subclasshook__ (abc:147)
----------------] ABCMeta.__subclasshook__ (abc:147)  0.042ms
--------------. __subclasscheck__ (abc:148)
--------------. __subclasscheck__ (abc:156)
----------------[  (abc:156)
----------------]  (abc:156)  0.043ms
--------------. __subclasscheck__ (abc:160)
--------------. __subclasscheck__ (abc:165)
----------------[ ABCMeta.__subclasses__ (abc:165)
----------------] ABCMeta.__subclasses__ (abc:165)  0.042ms
--------------. __subclasscheck__ (abc:170)
----------------[ set.add (abc:170)
----------------] set.add (abc:170)  0.042ms
--------------. __subclasscheck__ (abc:171)
--------------< __subclasscheck__ (abc:171): False 1.647ms
------------]  (abc:166)  1.842ms
----------. __subclasscheck__ (abc:165)
----------. __subclasscheck__ (abc:170)
------------[ set.add (abc:170)
------------] set.add (abc:170)  0.043ms
----------. __subclasscheck__ (abc:171)
----------< __subclasscheck__ (abc:171): False 18.252ms
--------]  (abc:166)  18.443ms
------. __subclasscheck__ (abc:165)
------. __subclasscheck__ (abc:166)
--------[  (abc:166)
----------> __subclasscheck__ (abc:134)
----------. __subclasscheck__ (abc:137)
----------. __subclasscheck__ (abc:140)
----------. __subclasscheck__ (abc:144)
----------. __subclasscheck__ (abc:147)
------------[ ABCMeta.__subclasshook__ (abc:147)
------------] ABCMeta.__subclasshook__ (abc:147)  0.043ms
----------. __subclasscheck__ (abc:148)
----------. __subclasscheck__ (abc:156)
------------[  (abc:156)
------------]  (abc:156)  0.044ms
----------. __subclasscheck__ (abc:160)
----------. __subclasscheck__ (abc:165)
------------[ ABCMeta.__subclasses__ (abc:165)
------------] ABCMeta.__subclasses__ (abc:165)  0.044ms
----------. __subclasscheck__ (abc:166)
------------[  (abc:166)
--------------> __subclasscheck__ (abc:134)
--------------. __subclasscheck__ (abc:137)
--------------. __subclasscheck__ (abc:140)
--------------. __subclasscheck__ (abc:144)
--------------. __subclasscheck__ (abc:147)
----------------[ ABCMeta.__subclasshook__ (abc:147)
----------------] ABCMeta.__subclasshook__ (abc:147)  0.044ms
--------------. __subclasscheck__ (abc:148)
--------------. __subclasscheck__ (abc:156)
----------------[  (abc:156)
----------------]  (abc:156)  0.043ms
--------------. __subclasscheck__ (abc:160)
--------------. __subclasscheck__ (abc:165)
----------------[ ABCMeta.__subclasses__ (abc:165)
----------------] ABCMeta.__subclasses__ (abc:165)  0.044ms
--------------. __subclasscheck__ (abc:166)
----------------[  (abc:166)
------------------> __subclasscheck__ (abc:134)
------------------. __subclasscheck__ (abc:137)
------------------. __subclasscheck__ (abc:140)
------------------. __subclasscheck__ (abc:144)
------------------. __subclasscheck__ (abc:147)
--------------------[ ABCMeta.__subclasshook__ (abc:147)
--------------------] ABCMeta.__subclasshook__ (abc:147)  0.045ms
------------------. __subclasscheck__ (abc:148)
------------------. __subclasscheck__ (abc:156)
--------------------[  (abc:156)
--------------------]  (abc:156)  0.043ms
------------------. __subclasscheck__ (abc:160)
------------------. __subclasscheck__ (abc:165)
--------------------[ ABCMeta.__subclasses__ (abc:165)
--------------------] ABCMeta.__subclasses__ (abc:165)  0.043ms
------------------. __subclasscheck__ (abc:170)
--------------------[ set.add (abc:170)
--------------------] set.add (abc:170)  0.043ms
------------------. __subclasscheck__ (abc:171)
------------------< __subclasscheck__ (abc:171): False 1.624ms
----------------]  (abc:166)  1.867ms
--------------. __subclasscheck__ (abc:165)
--------------. __subclasscheck__ (abc:170)
----------------[ set.add (abc:170)
----------------] set.add (abc:170)  0.041ms
--------------. __subclasscheck__ (abc:171)
--------------< __subclasscheck__ (abc:171): False 3.866ms
------------]  (abc:166)  4.063ms
----------. __subclasscheck__ (abc:165)
----------. __subclasscheck__ (abc:170)
------------[ set.add (abc:170)
------------] set.add (abc:170)  0.043ms
----------. __subclasscheck__ (abc:171)
----------< __subclasscheck__ (abc:171): False 5.968ms
--------]  (abc:166)  6.159ms
------. __subclasscheck__ (abc:165)
------. __subclasscheck__ (abc:170)
--------[ set.add (abc:170)
--------] set.add (abc:170)  0.042ms
------. __subclasscheck__ (abc:171)
------< __subclasscheck__ (abc:171): False 31.110ms
----< __instancecheck__ (abc:130): False 32.160ms
--]  (_cprequest:791)  32.350ms
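
To compare the two checks without a line tracer attached, here is a minimal sketch using timeit. It assumes a modern Python 3, not the 2.6.1 used for the trace above, so str stands in for basestring; the value and the iteration count are arbitrary choices of mine, and the numbers it prints will not match the trace.

```python
import abc
import io
import timeit

value = "not a file"

# Plain type check: no metaclass hook is involved.
plain = timeit.timeit(lambda: isinstance(value, str), number=10000)

# ABC check: io.IOBase uses ABCMeta, so this goes through
# ABCMeta.__instancecheck__ (and its caches) instead.
abc_check = timeit.timeit(lambda: isinstance(value, io.IOBase), number=10000)

print("str:       %.6f s per 10000 calls" % plain)
print("io.IOBase: %.6f s per 10000 calls" % abc_check)
```

The first ABC check for a given type walks the subclass tree as in the trace; later checks for the same type can be answered from the ABC's caches, so a timing loop like this mostly measures the cached path.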

11/19/10

Permalink 01:08:45 am, by fumanchu Email , 1007 words   English (US)
Categories: Python, CherryPy

logging.statistics

Statistics about program operation are an invaluable monitoring and debugging tool. How many requests are being handled per second, how much of various resources are in use, how long we've been up. Unfortunately, the gathering and reporting of these critical values is usually ad-hoc. It would be nice if we had 1) a centralized place for gathering statistical performance data, 2) a system for extrapolating that data into more useful information, and 3) a method of serving that information to both human investigators and monitoring software. I've got a proposal. Let's examine each of those points in more detail.

Data Gathering

Just as Python's logging module provides a common importable for gathering and sending messages, statistics need a similar mechanism, and one that does not require each package which wishes to collect stats to import a third-party module. Therefore, we choose to re-use the logging module by adding a statistics object to it.

That logging.statistics object is a nested dict:

import logging
if not hasattr(logging, 'statistics'): logging.statistics = {}

It is not a custom class, because that would 1) require apps to import a third-party module in order to participate, 2) inhibit innovation in extrapolation approaches and in reporting tools, and 3) be slow. There are, however, some specifications regarding the structure of the dict.

    {
   +----"SQLAlchemy": {
   |        "Inserts": 4389745,
   |        "Inserts per Second":
   |            lambda s: s["Inserts"] / (time() - s["Start"]),
   |  C +---"Table Statistics": {
   |  o |        "widgets": {-----------+
 N |  l |            "Rows": 1.3M,      | Record
 a |  l |            "Inserts": 400,    |
 m |  e |        },---------------------+
 e |  c |        "froobles": {
 s |  t |            "Rows": 7845,
 p |  i |            "Inserts": 0,
 a |  o |        },
 c |  n +---},
 e |        "Slow Queries":
   |            [{"Query": "SELECT * FROM widgets;",
   |              "Processing Time": 47.840923343,
   |              },
   |             ],
   +----},
    }

The logging.statistics dict has strictly 4 levels. The topmost level is nothing more than a set of names to introduce modularity. If SQLAlchemy wanted to participate, it might populate the item logging.statistics['SQLAlchemy'], whose value would be a second-layer dict we call a "namespace". Namespaces help multiple emitters to avoid collisions over key names, and make reports easier to read, to boot. The maintainers of SQLAlchemy should feel free to use more than one namespace if needed (such as 'SQLAlchemy ORM').

Each namespace, then, is a dict of named statistical values, such as 'Requests/sec' or 'Uptime'. You should choose names which will look good on a report: spaces and capitalization are just fine.

In addition to scalars, values in a namespace MAY be a (third-layer) dict, or a list, called a "collection". For example, the CherryPy StatsTool keeps track of what each worker thread is doing (or has most recently done) in a 'Worker Threads' collection, where each key is a thread ID; each value in the subdict MUST be a fourth dict (whew!) of statistical data about each thread. We call each subdict in the collection a "record". Similarly, the StatsTool also keeps a list of slow queries, where each record contains data about each slow query, in order.
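
As a rough illustration of the four levels, here is a sketch of an emitter that keeps a 'Worker Threads' collection keyed by thread ID. The namespace name 'Demo Server', the note_request helper, and the record keys are all made up for this example; this is not the StatsTool's actual code.

```python
import logging
import threading
import time

# Level 1: the shared repository.
if not hasattr(logging, 'statistics'):
    logging.statistics = {}

# Level 2: a namespace ('Demo Server' is a hypothetical name).
ns = logging.statistics.setdefault('Demo Server', {})

# Level 3: a collection, mapping thread ID -> record.
ns.setdefault('Worker Threads', {})


def note_request(path):
    """Update the current thread's record (level 4) with plain scalars."""
    record = ns['Worker Threads'].setdefault(threading.get_ident(), {})
    record['Last Path'] = path
    record['Last Start'] = time.time()
    record['Requests'] = record.get('Requests', 0) + 1


note_request('/index')
note_request('/about')
```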

Values in a namespace or record may also be functions, which brings us to:

Extrapolation

def extrapolate_statistics(scope):
    """Return an extrapolated copy of the given scope."""
    c = {}
    for k, v in scope.items():
        if isinstance(v, dict):
            v = extrapolate_statistics(v)
        elif isinstance(v, (list, tuple)):
            v = [extrapolate_statistics(record) for record in v]
        elif callable(v):
            v = v(scope)
        c[k] = v
    return c

The collection of statistical data needs to be fast, as close to unnoticeable as possible to the host program. That requires us to minimize I/O, for example, but in Python it also means we need to minimize function calls. So when you are designing your namespace and record values, try to insert the most basic scalar values you already have on hand.

When it comes time to report on the gathered data, however, we usually have much more freedom in what we can calculate. Therefore, whenever reporting tools fetch the contents of logging.statistics for reporting, they first call extrapolate_statistics (passing the whole statistics dict as the only argument). This makes a deep copy of the statistics dict so that the reporting tool can both iterate over it and even change it without harming the original. But it also expands any functions in the dict by calling them. For example, you might have a 'Current Time' entry in the namespace with the value "lambda scope: time.time()". The "scope" parameter is the current namespace dict (or record, if we're currently expanding one of those instead), allowing you access to existing static entries. If you're truly evil, you can even modify more than one entry at a time.

However, don't try to calculate an entry and then use its value in further extrapolations; the order in which the functions are called is not guaranteed. This can lead to a certain amount of duplicated work (or a redesign of your schema), but that's better than complicating the spec.
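
Here is a small demonstration of that behavior. The extrapolate_statistics function is copied from above; the 'Demo App' namespace and its entries are invented for the example.

```python
import time


def extrapolate_statistics(scope):
    """Return an extrapolated copy of the given scope."""
    c = {}
    for k, v in scope.items():
        if isinstance(v, dict):
            v = extrapolate_statistics(v)
        elif isinstance(v, (list, tuple)):
            v = [extrapolate_statistics(record) for record in v]
        elif callable(v):
            v = v(scope)
        c[k] = v
    return c


# Hypothetical statistics dict with one namespace.
stats = {
    'Demo App': {
        'Start Time': time.time() - 10.0,
        'Requests': 50,
        # Expanded at report time; 's' is the enclosing namespace.
        'Uptime': lambda s: time.time() - s['Start Time'],
        'Slow Queries': [{'Query': 'SELECT 1', 'Processing Time': 2.5}],
    },
}

report = extrapolate_statistics(stats)

# The copy holds a number where the original holds a function...
print(report['Demo App']['Uptime'])

# ...and changing the copy leaves the original alone.
report['Demo App']['Requests'] = 0
```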

After the whole thing has been extrapolated, it's time for:

Reporting

A reporting tool would grab the logging.statistics dict, extrapolate it all, and then transform it to (for example) HTML for easy viewing, or JSON for processing by Nagios etc (and because JSON will be a popular output format, you should seriously consider using Python's time module for datetimes and arithmetic, not the datetime module). Each namespace might get its own header and attribute table, plus an extra table for each collection. This is NOT part of the statistics specification; other tools can format how they like.
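
To make that concrete, here is one possible plain-text renderer. It is only a sketch: render_report and the sample data are my own names, it expects a dict that has already been extrapolated, and a real tool would more likely emit HTML or JSON.

```python
def render_report(stats):
    """Render an already-extrapolated statistics dict as plain text."""
    lines = []
    for ns_name, namespace in sorted(stats.items()):
        lines.append(ns_name)
        collections = []
        # Scalars first, one per line.
        for key, value in sorted(namespace.items()):
            if isinstance(value, (dict, list)):
                collections.append((key, value))
            else:
                lines.append("  %s: %s" % (key, value))
        # Then one block per collection, one line per record.
        for key, collection in collections:
            lines.append("  %s" % key)
            if isinstance(collection, dict):
                records = sorted(collection.items())
            else:
                records = enumerate(collection)
            for record_id, record in records:
                fields = ", ".join(
                    "%s=%s" % item for item in sorted(record.items()))
                lines.append("    %s: %s" % (record_id, fields))
    return "\n".join(lines)


sample = {
    'My Stuff': {
        'Enabled': True,
        'Important Events': 3,
        'Table Statistics': {
            'widgets': {'Rows': 10, 'Inserts': 2},
        },
    },
}
print(render_report(sample))
```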

Turning Collection Off

It is recommended each namespace have an "Enabled" item which, if False, stops collection (but not reporting) of statistical data. Applications SHOULD provide controls to pause and resume collection by setting these entries to False or True, if present.

Usage

    import logging
    import time
    # Initialize the repository
    if not hasattr(logging, 'statistics'): logging.statistics = {}
    # Initialize my namespace
    mystats = logging.statistics.setdefault('My Stuff', {})
    # Initialize my namespace's scalars and collections
    mystats.update({
        'Enabled': True,
        'Start Time': time.time(),
        'Important Events': 0,
        'Events/Second': lambda s: (
            (s['Important Events'] / (time.time() - s['Start Time']))),
        })
    ...
    for event in events:
        ...
        # Collect stats
        if mystats.get('Enabled', False):
            mystats['Important Events'] += 1

09/22/10

Permalink 03:36:05 pm, by fumanchu Email , 1233 words   English (US)
Categories: IT, Python

A replacement for sessions

I'm tired of sessions. They lock for too long, reducing concurrency, and in my current case, don't fail gracefully when a request takes longer than the session timeout.

Problem: Session locks

Session implementations typically lock very near the beginning of a request, and unlock near the end of a request. They tend to do this even if the current request handler does no writing to the session. Why so aggressive? Because the typical test case trotted out for sessions is that of a page hit counter: session.counter += 1. What if the user opens two tabs pointing at the same page at once? The count might be off by one!

But if you don't do any counting, what's the benefit of such aggressive, synchronous locking? What we could really use is a system that used atomic commits instead of large, pessimistic locks.

Problem: Session timeouts

Sessions are often used for sites with thousands, even millions, of users. When any one of those users walks away from their computer, the servers usually try to free up resources by expiring any such inactive sessions. But lots of my admin-y sites have a few dozen users, not thousands. I'm just not that concerned with expiration of session state. I'm a little bit concerned, still, with cookies, so I still want to expire auth tokens, but there's no need to aggressively expire user data. Yet my current apps are so aggressive at expiring data that we frequently get errors in production: request A locks the session and, while it is processing a large job, request B takes over the session lock because A was taking too long. B finishes normally, but then A chokes because it had the session lock forcibly taken away from it. Not fun.

What we could really use is a system that allows tokens to expire, or be reused concurrently, without forcing user data to expire or other, concurrent processes to choke.

Problem: Session conflation

Sessions are used for more than one kind of data. In my current apps, the session is used to store:

  1. Cookie tokens. In fact, the session id is the cookie name.
  2. Common user information, like user id, name, and permissions, and
  3. Workflow state, such as when a user builds up an action over multiple pages using multiple forms.

The problem is that each of these three kinds of data has a different lifecycle. The session id tends to get recreated often as sessions and cookies time out (taking all of the rest of the data with it). The user info tends to change very rarely, being nearly read-only, but is often read on every page request (for example, to display the user's name in a corner, or to apply the user's timezone to time output). Workflow data, in contrast, persists for a few seconds or minutes as the user completes a particular task, and is then discardable at the end of the process; it never needs concurrency isolation, because the user is working synchronously through a single task.

Sessions traditionally lump all of these together into a single bag of attributes, and place the entire bag under a single large lock. What we could really use is a solution that had finer-grained control over locking for each kind of data, even for each kind of info or workflow!

Solution: Slates

We can achieve all of the above by abandoning sessions. Let's face it: sessions were cool when they were invented but they're showing their age. And rather than try to patch them up and keep calling them "sessions", I'm inventing something new: "slates".

I'm implementing slates in MongoDB, but you don't have to in order to get the benefits of slates. All you need is some sort of storage that uses atomic commits, and that allows you to partition such that you have a moderate number of "collections" (one for each user, plus a special "_auth" collection), and a moderate number of "documents" (one for each use case) in each collection. Let's look at an example:


$ mongo
MongoDB shell version: 1.6.2
connecting to: 127.0.0.1/test
> use slates
switched to db slates
> show collections
_auth
admin
> db.admin.find()
{ "_id" : "user", "userid" : 999, "readonly" : false,
  "timezone" : null, "panels" : [
    [1, "pollingpoint"],
    [2, "unsampled"],
    [4, "test_redirect"],
    [6, "test_redirect_manual"]
], "staff" : true }
{ "_id" : "new_id_set", "name" : "My set",
  "ids" : [ 84095, 3943, 39845, 112, 9458, ... ] }

As you can see, there is a collection for the username "admin". It contains 2 documents.

User info

The first returned document is what I called "user info" above: things most pages want to know about the logged-in user. They're read for almost every request but changed hardly ever, and when they're read, it's very near the beginning of the request. Here's the Python code I use to grab the whole document:

request.user = Slate(username).user

...which is API sugar for:

request.user = pool.slates[username].find_one('user') or {}

Most pages perform this quick read and never write it back.
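
For readers without my codebase, the sugar might look roughly like the sketch below. Everything here is an assumption made for illustration: FakeCollection is an in-memory stand-in for a MongoDB collection (it is not pymongo), the module-level slates dict stands in for pool.slates, and this Slate class is a guess at the shape of the idea, not the real implementation.

```python
class FakeCollection(object):
    """In-memory stand-in for one user's collection of documents."""

    def __init__(self):
        self._docs = {}

    def find_one(self, doc_id):
        doc = self._docs.get(doc_id)
        return dict(doc) if doc is not None else None

    def save(self, doc):
        self._docs[doc['_id']] = dict(doc)


# Hypothetical stand-in for pool.slates: username -> collection.
slates = {'admin': FakeCollection()}


class Slate(object):
    """Attribute access reads one whole document from the user's collection."""

    def __init__(self, username):
        self._collection = slates[username]

    def __getattr__(self, name):
        # Only called for attributes not found normally.
        if name.startswith('_'):
            raise AttributeError(name)
        return self._collection.find_one(name) or {}


slates['admin'].save({'_id': 'user', 'userid': 999, 'staff': True})
user = Slate('admin').user          # one read, nothing written back
missing = Slate('admin').new_id_set  # no such document yet
```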

Workflow data

The second document returned above is workflow data for a domain-specific process I called 'new_id_set': the user uploads a large number of id's in a CSV file and gives them a name. But if there are problems with a few of the id's, we want to ask the user whether to discard the conflicts or continue anyway. But we don't want to go making records in our Postgres database tables until the numbers are confirmed, and it's prohibitive to have the client upload the same file again after confirmation. So we need a temporary place to stick this data while the user is in the middle of the activity.

Slates to the rescue! Unlike sessions, which tend to dump all their data into a single big bag, when we use slates we store our data in multiple 'bags'. That means that our user can upload their ids, be prompted for confirmation, go elsewhere to investigate the conflicts further, and come back and confirm the ids. The time they spend investigating incurs no performance penalty, because those pages don't load and re-save the 'new_id_set' slate--only the pages directly concerned with that particular slate do. Once the user has confirmed the upload, the slate is deleted.
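
The lifecycle of that workflow slate can be sketched as follows. The store is a plain dict instead of MongoDB, the helper names are invented, and treating a "conflict" as an id that already exists in known_ids is my simplification for the example.

```python
# Hypothetical in-memory store: {username: {slate_name: document}}
slates = {}


def save_slate(username, name, doc):
    slates.setdefault(username, {})[name] = dict(doc, _id=name)


def load_slate(username, name):
    return slates.get(username, {}).get(name)


def delete_slate(username, name):
    slates.get(username, {}).pop(name, None)


def upload_ids(username, set_name, ids, known_ids):
    """Step 1: park the upload in its own slate and report conflicts."""
    save_slate(username, 'new_id_set', {'name': set_name, 'ids': list(ids)})
    return [i for i in ids if i in known_ids]


def confirm_ids(username, discard_conflicts, known_ids):
    """Step 2: read the parked upload back, then delete the slate."""
    doc = load_slate(username, 'new_id_set')
    if doc is None:
        raise LookupError('no upload in progress')
    ids = doc['ids']
    if discard_conflicts:
        ids = [i for i in ids if i not in known_ids]
    delete_slate(username, 'new_id_set')
    return doc['name'], ids


known = {112}
conflicts = upload_ids('admin', 'My set', [84095, 3943, 112], known)
# ...the user can browse other pages here without touching this slate...
name, ids = confirm_ids('admin', True, known)
```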

Auth tokens

Most of the use cases for slates fit nicely into "user slates"; that is, a collection that is identified by the user's username. But when you receive an auth token in a cookie, how do you match it to a username so you can look up the slate?

The answer is to create a special, global slate which I named "_auth" in my implementation. You can name it whatever you like. This collection contains a map from tokens to usernames:


> db._auth.find()
{ "_id" : "abcdef09345", "token" : "94ee8f572",
  "username" : "admin",
  "expires" : "Wed Sep 22 2010 13:39:51 GMT-0700 (PDT)"}

When a user visits a page, their token is searched for in the "_auth" collection, the username is retrieved, and that value is stored for the request. Typically, their "user info" slate is then retrieved. Finally, if they are visiting a page that participates in a slate-based workflow, that slate is retrieved (and saved if any changes are made).
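
A minimal sketch of that first lookup step, again with a dict standing in for the "_auth" collection. The function name is invented, and storing "expires" as epoch seconds is an assumption made to keep the comparison simple (the document shown above stores a date string).

```python
import time

# Hypothetical stand-in for the "_auth" collection: token -> record.
auth = {
    '94ee8f572': {'username': 'admin', 'expires': time.time() + 3600},
}


def username_for_token(token, now=None):
    """Return the username for a live token, or None."""
    now = time.time() if now is None else now
    record = auth.get(token)
    if record is None:
        return None
    if record['expires'] <= now:
        # Expire only the token; the user's slates are left alone.
        auth.pop(token, None)
        return None
    return record['username']
```

Note that expiring a token touches nothing in the user's own collection, which is the point of keeping auth data separate.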

Conclusion

Slates provide finer-grained locking than sessions in order to meet the varying needs of auth tokens, user info, and workflow data. They lock for much shorter durations, over smaller scopes, and take advantage of the native atomicity of the storage layer (MongoDB, in my case) allowing much more parallelism between requests.

09/01/10

Permalink 02:26:27 pm, by fumanchu Email , 130 words   English (US)
Categories: IT

Shoji Catalog Protocol version 2

I've updated the Shoji Catalog Protocol to draft version 02. See http://www.aminus.org/rbre/shoji/shoji-draft-02.txt

The only significant change is that shojiCatalogs, shojiFragments, and shojiViews elements now use an object instead of an array for their IRI's. That is, instead of:

{"element": "shoji:catalog",
 "self": "http://example.org/users",
 "catalogs": ["bills", "sellers", "sellers{?sold_count}"]
}

one would now write something like:

{"element": "shoji:catalog",
 "self": "http://example.org/users",
 "catalogs": {"bills": "bills",
              "sellers": "sellers",
              "sellers by sold count": "sellers{?sold_count}"
              }
}

This allows clients to bind to a more meaningful name across varying documents rather than a potentially opaque and varying URI. In this way, the names function somewhat like link relation types (e.g. the "rel" attributes in HTML, or the relation types in Link headers).
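To make the difference concrete, here is a small sketch of a client binding to a catalog by name. The helper `catalog_iri` is hypothetical, and resolving the relative IRI against the document's "self" member is an assumption made for this example; check the draft for the actual resolution rules.

```python
import json
from urllib.parse import urljoin

# The draft-02 style document from above, with "catalogs" as an object.
document = json.loads("""
{"element": "shoji:catalog",
 "self": "http://example.org/users",
 "catalogs": {"bills": "bills",
              "sellers": "sellers",
              "sellers by sold count": "sellers{?sold_count}"}
}
""")


def catalog_iri(doc, name):
    """Look up a catalog by its name and resolve its IRI.

    Resolution against doc["self"] is an assumption for illustration.
    """
    return urljoin(doc["self"], doc["catalogs"][name])
```

A client calls `catalog_iri(document, "bills")` and keeps working even if the server later changes the IRI behind that name, or the order in which catalogs are listed.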

03/11/10

Permalink 08:29:40 pm, by fumanchu Email , 11 words   English (US)
Categories: IT, Python, CherryPy

Zen of CherryPy video

My PyCon 2010 talk video is up. Enjoy: The Zen of CherryPy

07/14/09

Permalink 08:10:47 pm, by fumanchu Email , 43 words   English (US)
Categories: IT, Python, Dejavu, CherryPy

The Ronacher Manifesto

Link: http://lucumr.pocoo.org/2009/7/14/free-vs-free

I heartily agree with the bold bits at least:

So dear users: Use my stuff, have fun with it. And letting me know that you're doing is the best reward I can think of. And if you can contribute patches, that's even better.

06/03/09

Permalink 01:47:57 pm, by fumanchu Email , 624 words   English (US)
Categories: IT, General

Code overload

I'm tired of codes.

By "code" I mean a mapping from one set of terms to another.

ID   Value
1    Active
2    Inactive
3    Closed

That's a code.

Codes are good for reducing space and/or time if you really need to. A 4-byte integer takes less space than an 8-byte+overhead string. 'grep -u' takes less typing than 'grep --unix-byte-offsets'.

Codes are good if names vary. Internationalization techniques like gettext map various translations to a single key (often the phrase as rendered in the dominant language). But even within the same language, people change the names they use to refer to things all the time.

Codes are good at hiding information.

Whether you want them to or not.

That's a problem.

Because codes hide information, the user of the code, whether willing or not, has to have access to the code. That means either a copy of the mapping table in its entirety, or a copy of an algorithm for performing the mapping.

Some of these you can keep in your head, but there's only so much space in your head.

We invented paper to keep more of these than could fit in our brains, but paper is slower than brain.

We invented computers to manage the volume of paper but 'command --help' and 'man command' are still slower than brain.

If a code exists to save space but space becomes microscopically cheap, do you still need a code?

If a code exists to save a person time but the person wastes more time looking up the code than they save using it, do you still need a code?

If a code exists to save a computer time but the computer wastes more time looking up the code than it saves using it, do you still need a code?

Codes don't just introduce the cost of mapping. They're far worse. Codes take a domain A which has its own syntax (the relationship of one thing in the domain to another thing in the domain) and introduce a second domain B with its own syntax (again, within the domain), in addition to the new semantic (the relationship between domains). (A) <-> (B). That's 3 analytic elements in place of 1.

But it's even worse in information systems since domain A is probably already a set of names with its own set of referents to things in the real world R. So instead of (R) <-> (A) we now have (R) <-> (A) <-> (B). If I have to map from B to R, that's 6 sets of interactions I now need to understand. You're pushing the 7±2 boundary.

Names refer to things.

If you need a name to refer to a name, that's a code.

Codes add complexity.

If you have a choice, expose directly. Many of you don't have a choice because you still think the unix command line is the best UI ever. You need to get out more. (There are UIs out there that can show you the mapping without interrupting your flow.) Many of you don't have a choice because you think in C or some other close-to-the-metal language which requires manual memory management and lots of numbered wires. Please keep using codes there. But please don't bring them into high-level languages: we're better off without them.
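A tiny sketch of the difference in a high-level language, reusing the status table from the top of this post. The account records and the `describe` helper are made up for illustration.

```python
# With a code: anyone reading the stored record needs the mapping table too.
STATUS_CODES = {1: "Active", 2: "Inactive", 3: "Closed"}
coded_account = {"name": "Acme", "status": 2}

# Exposed directly: the record explains itself.
direct_account = {"name": "Acme", "status": "Inactive"}


def describe(account):
    """Render an account, translating the status only if it is a code."""
    status = account["status"]
    if isinstance(status, int):
        status = STATUS_CODES[status]  # the extra lookup a code imposes
    return "%s is %s" % (account["name"], status)
```

Both records describe the same account, but only the coded one forces every reader, human or program, to carry `STATUS_CODES` around.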

My brain is full and I'm tired of being slowed down by codes that return worse than nothing for their investment. Please stop inventing new ones. I know, you're a computer scientist and that's what computer scientists do. But you're good at it (aren't you?), both authoring and using them. Most people aren't. The rest of us are busy.

05/05/09

Permalink 10:24:41 am, by fumanchu Email , 356 words   English (US)
Categories: IT

Somebody needs to discover JSON

In The text/plain Semantic Web, Benjamin Carlyle argues:

Perhaps the most important media type in an enterprise-scale or world-scale semantic web or REST architecture is text/plain. The text/plain type is essentially schema free, and allows a representation to be retrieved or PUT with little to no jargon or domain-specific knowledge required by server or client. It is applicable to a wide range of problems and contexts, and is easily consumed by tools and humans alike.

Substitute 'application/json' and that paragraph starts to make sense. But then, the author also says "To my mind the best resource in formatting and processing of simple text-compatible data types can be found in the specification for XML Schema." So perhaps I shouldn't be too hard on the poor refugee. He comes tantalizingly close:

Part of the problem that emerges is that text/plain is not specific enough. It doesn't have sub-types that are clearly tied to a specification document or standards body. This makes interoperability a potential nightmare of heuristic detection.

...and...

Another problem with using text/plain in its bare form is its default assumption of a US-ASCII character type. This can lead to obvious problems in a modern internationalised world.

Both of which JSON solves nicely: it has basic types and SHALL be encoded with a Unicode encoding (UTF-8 by default).
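A quick sketch of both points using Python's standard json module. The record contents are invented for the example.

```python
import json

# A record mixing JSON's basic types, with a non-ASCII character.
record = {"name": "Zoë", "sold_count": 3, "active": True, "tags": ["a", "b"]}

# Serialize to JSON text, then encode as UTF-8 for the wire.
body = json.dumps(record, ensure_ascii=False).encode("utf-8")

# The receiver decodes the bytes and gets the same basic types back:
# string, number, boolean, array. No heuristic detection required.
received = json.loads(body.decode("utf-8"))
```

With text/plain, the receiver would have to guess both the character encoding and whether "3" was meant as a number or a string.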

Again, ideally we would be making use of a well-defined standards body to own and maintain the media types used to communicate very basic information.

The IANA and IETF sound like well-defined standards bodies to me...

Perhaps the clearest indication that you are overusing text/plain is that you are experiencing an explosion in hyperlinks. When you start to need a document to provide links for consumers to find these text/plain-centric resources, you should probably consider incorporating the information directly into these documents themselves.

A. Hyperlinks are a Good Thing.

B. You should first consider providing hyperlinks in a machine-discoverable fashion; text/plain is not it. A nice version of "it" is using XHR to GET/PUT application/json resources.

C. Allow comments on your blog.
