Most programs, especially libraries and frameworks, need "configuration". But exactly how to implement that is a murky subject, mostly because the boundary between "configuration" and "code" is itself ill-defined.
So let's try to define it. The first thing you might notice is that the dictionary definition of "configuration", an arrangement of parts, is quite different from what you typically find in a modern "configuration file". For example, take a typical Apache httpd.conf file. It does contain several directives which identify components: LoadModule, for example. But far outweighing these are directives which set attributes, usually on an object or on the system as a whole. Directives like "Listen 80", "ThreadsPerChild 250", and "LogLevel debug", even though they could be implemented via arrangements of pieces, probably aren't. Instead, the values are most likely implemented as permanent cell variables which never appear, or move, or disappear, but instead only change in value.
Even the LoadModule directive doesn't really arrange any pieces within a space; it merely identifies and includes them in an abstract set of "loaded modules". One might argue that the Location context directive deals with arrangements of URLs, but those aren't really arranged; they simply exist. You can't rearrange /path/to/resource to be above /path. No, the dictionary definition of "configuration" as "arrangement" is a holdover from our mostly-hardware past, where even the most dynamic "configuration system" still required moving cards and jumpers around in physical space.
There are some notable exceptions, of course, but the vast majority of software on the market today that is "configurable" consists of a fairly static set of objects, plus a formalized means of tweaking a subset of attributes of those objects. The most common exception to this, the "plugin", is also rarely arranged with respect to other plugins or components; instead, it is merely "turned on" or included. I believe this tendency is due to a natural human limitation: we just don't reason about graphs and networks very well yet, at least not nearly as well as we reason about vectors (of instructions) and sets. We feel good when working on serial problems, and bad when working on parallel ones. As Chris Alexander said:
There is little purpose, then, in saying: It would be better if this force did not exist. For if it does exist anyway, designs based on such wishful thinking will fail.
So then, let's discuss ways to implement this kind of "configuration". Again, let's look at Apache's httpd.conf: here we find almost a DSL, in that http_config.h defines functions to tokenize and parse a config file into another representation, a config vector. Then that intermediate structure is transformed into the actual used values like, say, request_rec->server->keep_alive_timeout.
Or take a typical postgresql.conf file. The entries therein are translated (via the ConfigureNamesBool array) to their internal variable names, and set globally. For example, check_function_bodies is implemented as an extern in guc.h. When a block of code needs to switch on the value of check_function_bodies, it #includes that header and reads the global value directly.
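To make the shape of that translation layer concrete, here is a rough Python analogue. It is only a sketch of the general pattern, not Apache's or PostgreSQL's actual code: the `internal` namespace, the `DIRECTIVES` table, the external directive names, and the `load` helper are all invented for illustration.

```python
from types import SimpleNamespace

# Hypothetical internal state, standing in for globals such as
# check_function_bodies; none of this mirrors the real projects' code.
internal = SimpleNamespace(check_function_bodies=True, keep_alive_timeout=15)


def parse_bool(text):
    """Convert an external on/off style string to a bool."""
    return text.strip().lower() in ("on", "true", "yes", "1")


# Hand-maintained table: external directive name -> (internal name, converter).
DIRECTIVES = {
    "CheckFunctionBodies": ("check_function_bodies", parse_bool),
    "KeepAliveTimeout": ("keep_alive_timeout", int),
}


def load(lines):
    """Translate each 'Name value' line and store it under its internal name."""
    for line in lines:
        name, raw = line.split(None, 1)
        internal_name, convert = DIRECTIVES[name]
        setattr(internal, internal_name, convert(raw))


load(["CheckFunctionBodies off", "KeepAliveTimeout 5"])
```

Note that every new setting needs its own row in the table, with its own external name and converter, in addition to the attribute it finally sets.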
These designs carry with them several problems:
server package, for example, config.c has more lines of code than any other C module except core.c.
There is a way to implement "configuration" as we have defined it above (setting values on named attributes) which avoids the above problems. Rather than defining a layer where external names, types, and values get translated to internal names, types, and values in an ad-hoc mapping, we can define a better translation step by obeying three very simple constraints:
For example, if you have an internal "database" object with a "default_encoding" string attribute, the conventional approach might yield a config file entry like:
DatabaseDefaultEncoding: utf8
But if we follow the above constraints, we instead see config entries like this:
database.default_encoding = 'utf8'
We can generalize that to:
(path.to.object).key = value
...and in fact, we can write a simple parser which performs just that mapping. In the simplest implementation, only the set of objects is defined, and the set of keys is open-ended (that is, any attribute of the given object(s) is overridable):
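for key, value in config.pairs():
    # Split "path.to.object.key" into the object path and the attribute name.
    objname, attrname = key.rsplit(".", 1)
    # Look up the target object by its path, then set the attribute directly.
    obj = configurables[objname]
    setattr(obj, attrname, value)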
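A slightly fuller, self-contained version of that loop might look like the sketch below. Several things here are assumptions rather than part of the pattern as described: the `database` and `server` objects, the `parse_config` and `apply_config` helpers, and the choice to read values as Python literals with `ast.literal_eval` (one possible answer to how typed values get parsed).

```python
import ast
from types import SimpleNamespace

# Hypothetical application objects; the names are illustrative only.
database = SimpleNamespace(default_encoding="latin1", pool_size=5)
server = SimpleNamespace(port=8080)

# Registry of objects that may be configured, keyed by dotted path.
configurables = {"database": database, "server": server}


def parse_config(text):
    """Yield (key, value) pairs from lines of the form 'path.key = literal'."""
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, _, literal = line.partition("=")
        # literal_eval accepts only Python literals, so no code is executed.
        yield key.strip(), ast.literal_eval(literal.strip())


def apply_config(text, configurables):
    """Set each configured value directly on the object it names."""
    for key, value in parse_config(text):
        objname, attrname = key.rsplit(".", 1)
        setattr(configurables[objname], attrname, value)


CONFIG_TEXT = """
# example entries
database.default_encoding = 'utf8'
server.port = 80
"""

apply_config(CONFIG_TEXT, configurables)
```

In this sketch an unknown object path raises `KeyError` and a malformed value raises an error from `ast.literal_eval`; a real implementation would probably want friendlier messages.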
In contrast to the conventional approach, in the "Direct Attribute Configuration" pattern:
That's enough for now; feel free to expand in the comments.
You might be interested in "The Nature of Lisp" over at defmacro: http://www.defmacro.org/ramblings/lisp.html
In order to explain the deal with lisp, the author looks deeply at configuration. In Java, people use xml for ANT, whereas the same thing could be lisp in lisp.
I like your hierarchical names.
You might have glossed over the issue of how you might parse expression values for those names.
When does configuration need scripting capabilities? Sometimes the configuration format's requirements grow into the need for a scripting language in its own right - RESIST THE URGE TO WRITE YOUR OWN mini scripting language.
And finally, check whether you can reuse another's configuration code/libraries.
- Paddy.
My takeaway lesson for this post is that configuration files can be implemented as valid python code.
config.py
=========
database = "mysql"
username = "lord"
password = "commander"
In framework.py
===============
import config
connect(config.database, config.username, config.password)