Python concurrency syntax


02:19:41 pm, by admin, 408 words, English (US)
Categories: IT, Python

Via Bill de hÓra, I ran across this thread on LtU, wherein Peter Van Roy comments:

The real problem is not threads as such; it is threads plus shared mutable state. To solve this problem, it's not necessary to throw away threads. It is sufficient to disallow mutable state shared between threads (mutable state local to one thread is still allowed).

...and Allan McInnes adds:

The "problem with threads" lies in the current approach to sharing state by default, and "pruning away nondeterminism" to get a correctly functioning system.

...and "dbfaken" adds:

Perhaps we should have strong syntax distinctions for mutation.

Since the first versions of Dejavu (my Python mediated-DB/ORM), I've noticed that this "pruning away nondeterminism" approach is exactly the wrong direction for systems which are designed to be thread-safe; we could instead explore languages and systems which allow us to "prune away determinism". By that I mean, mutable state should not be shared between threads by default; any mutable state which needs to be shared should be explicitly declared as such. This would make systems like Dejavu much simpler to create, use, and maintain.

I've often wondered what a "strong syntax distinction for [shared] mutation" would look like in Python. The simplest solution would probably have to:

  1. Make class.__dict__'s immutable. This is a natural choice given the normal usage patterns of classes by developers in the wild: generally, a class exists to share methods between instances. There are valid use cases for classes which are mutable, but they are rare; perhaps a sentinel of some kind provided by object could re-enable mutability for classes, but it should be off by default.
  2. Make all module.__dict__'s immutable. This has already been suggested on python-dev (IIRC by GvR himself), although I believe it was suggested as a way to reduce monkeypatching.
  3. Provide a @shared annotation for explicitly declaring shared mutable data.
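
A minimal sketch of how points 1 and 3 might be approximated in today's Python, under stated assumptions: `FrozenClass`, `Shared`, and `Counter` are hypothetical names invented for illustration, not an existing API, and a wrapper object stands in for the proposed @shared annotation. It covers neither point 2 nor thread-private-by-default semantics.

```python
import threading


class FrozenClass(type):
    """Hypothetical metaclass: class attributes cannot be rebound or deleted
    once the class has been created."""

    def __setattr__(cls, name, value):
        raise AttributeError(
            f"{cls.__name__}.{name} is immutable; declare it shared instead"
        )

    def __delattr__(cls, name):
        raise AttributeError(f"{cls.__name__}.{name} is immutable")


class Shared:
    """Hypothetical explicit marker for mutable state that threads may share.
    It carries its own lock so every mutation site is easy to find."""

    def __init__(self, value):
        self.value = value
        self.lock = threading.Lock()


class Counter(metaclass=FrozenClass):
    # Opt-in: the only state on this class that threads mutate together.
    hits = Shared(0)

    @classmethod
    def record(cls):
        # Mutates the Shared instance, not the class namespace,
        # so the frozen metaclass is not involved.
        with cls.hits.lock:
            cls.hits.value += 1
```

With this in place, `Counter.record()` increments the shared counter, while an assignment such as `Counter.hits = 5` or `Counter.record = None` raises `AttributeError`. The class body itself still works because class creation fills the namespace directly rather than going through `__setattr__`.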

This is just one solution to a small set of use cases: threaded programs where the explicit shared state is small compared to the total lines of code. I haven't the experience to state whether such a model is inherently damaging to other concurrent needs and designs. It has the benefit, however, of having little impact on single-threaded programs.

Would such a feature help catapult Python into the "large systems" space?


Comment from: Doug Napoleone [Visitor] · http://www.dougma.com/

Question: What makes you think Python is not already big in the "large systems" space?
There are numerous examples of it already being there (Google, YouTube, SoE, DreamWorks, VMWare, EVE, etc, etc, etc, etc).

I do believe that a strict 'thread private by default' system would be fantastic for python. My only concern would be for those of us who like python for prototyping, and implement in other languages. The GIL can be a stumbling block already, and such a change would be a death blow to prototyping threaded systems for implementation in C++.

06/05/07 @ 16:02
Comment from: Collin Winter [Visitor] · http://oakwinter.com/code/

Any of the "make such-and-such immutable" ideas would greatly limit what I regard to be Python's greatest strength: testability. Currently I can manipulate a module's namespace to influence how it perceives (say) the os module, perhaps replacing rmdir() with a function that will test an otherwise-hard-to-test code path in my library.

I wish people would just accept that threading is not one of Python's strengths and move on -- or better, write a decent interprocess queue so that MP code is easier.

06/05/07 @ 16:03
Comment from: admin [Member]

Doug, I realize Python is already used widely for large systems. I simply mean that when concurrency-aware code is mentioned, Erlang et al are mentioned far more often than Python.

06/05/07 @ 16:06
Comment from: admin [Member]

Collin, could that not be solved with cooperative declaration between authors and users? You're right; allowing authors the sole power to @share would be to inter-thread sharing what Java's "private" modifier is to inter-class sharing--certainly not Pythonic. But in the cases where I've monkeypatched someone else's module, I would not have minded adding an explicit declaration to do so--in fact, I can see that being a beneficial construct for understanding and documentation:

import anotherlib

with shared(anotherlib):
    anotherlib.os.rmdir = myrmdir

...especially if the name is fairly easy to grep for. Maybe we could name it "monkey" instead of "shared"? ;)
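
As a rough sketch of how such a declaration could behave, assuming modules were frozen by default: `FrozenModule`, `shared`, and `fake_rmdir` below are hypothetical names, and `anotherlib` is a stand-in module built in place rather than a real library. The only real machinery relied on is `types.ModuleType`, `contextlib.contextmanager`, and assigning a module's `__class__` to a `ModuleType` subclass.

```python
import contextlib
import types


class FrozenModule(types.ModuleType):
    """Hypothetical module type: rebinding attributes fails unless unlocked."""

    _unlocked = False

    def __setattr__(self, name, value):
        if name != "_unlocked" and not self._unlocked:
            raise AttributeError(
                f"module {self.__name__!r} is immutable outside shared()"
            )
        super().__setattr__(name, value)


@contextlib.contextmanager
def shared(module):
    """Explicitly (and greppably) allow rebinding attributes of a frozen module."""
    module._unlocked = True
    try:
        yield module
    finally:
        module._unlocked = False


# Stand-in for a third-party module; populated first, then frozen.
anotherlib = types.ModuleType("anotherlib")
anotherlib.rmdir = lambda path: f"removed {path}"
anotherlib.__class__ = FrozenModule


def fake_rmdir(path):
    return f"pretended to remove {path}"


with shared(anotherlib):
    anotherlib.rmdir = fake_rmdir
```

The patch persists after the `with` block; only the permission to rebind is scoped. Nothing here prevents a caller from flipping `_unlocked` by hand, so this illustrates the declaration rather than enforcing it.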

06/05/07 @ 16:54
Comment from: mike bayer [Visitor]

Hi Robert -

can you expound upon what the "shared" syntax would look like? suppose I declare "x=[]" at the top of my module. Now I use the "thread" module to run two threads, each using a worker function that randomly appends values to "x". How does the "shared" syntax prevent that list from being modified by both threads? does "x" create copies of itself local to each thread when it detects a modify operation? how does making __dict__ on classes/modules immutable have anything to do with that (unless you mean no more module-level globals or class-level variables)?

06/05/07 @ 16:59
Comment from: admin [Member]

Hi Mike,

In my very humble musing, yes, each thread would get its own "x", but only because the "object" to which the name "x" refers is a mutable object. If "x" referred to an immutable, all threads could share it without any further intervention. This is admittedly tricky with Python's dynamic typing, but I think this automatic creation of threadlocals would only have to occur for globals and object attributes (and possibly cell references), not locals, and not function arguments.

I would imagine the copy would occur on the first get/set/del operation, just as it does for the current threading.local implementation.
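
For comparison, here is a small sketch of what per-thread state looks like with the existing `threading.local`. The names `local`, `worker`, and `results` are illustrative only. Note that `threading.local` does not copy an existing global; each thread simply starts with an empty namespace, so the sketch creates the per-thread "x" explicitly on first use.

```python
import threading

local = threading.local()

# Deliberately shared between threads: each worker writes one distinct key.
# Under the proposal, this is the part that would need an explicit declaration.
results = {}


def worker(name, values):
    # First access in this thread: create this thread's own "x".
    if not hasattr(local, "x"):
        local.x = []
    for value in values:
        local.x.append(value)  # touches only this thread's list
    results[name] = list(local.x)


threads = [
    threading.Thread(target=worker, args=("a", [1, 2])),
    threading.Thread(target=worker, args=("b", [3])),
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

After both threads finish, each entry in `results` holds only the values appended by its own thread, and the main thread has no `local.x` at all.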

In this scheme, if you don't make __dict__ on classes/modules immutable, you end up localizing almost everything; you might as well go with multiple processes at that point. Maybe I should run through Dejavu (and perhaps CherryPy) and put some numbers to these vague intuitions of mine... ;)

06/05/07 @ 17:44
Comment from: mike bayer [Visitor]

oh. why not have __dict__ on classes/modules be marked as "shareable"? other than you've now opened up a "non-thread-safe" collection... or is that the reason?

also what does it really mean for class.__dict__ to be immutable... does it mean I can't say MyClass.foo = something outside of the "class" declaration itself?

06/05/07 @ 18:14
Comment from: admin [Member]

Right; avoiding "non-thread-safe" collections is the purpose of the idea, and class/module/instance dicts are collections. Proposing that classes and modules be immutable is an attempt to maximize the volume of code that can be shared.

And, yes, that would mean you can't say "MyClass.foo = something" outside of the class, at least not without an explicit "monkey" declaration either by the caller (user) or the callee (author) to mean, "this attribute is now thread-global". That seems a horrible thing for a user to be able to do, until you realize they already do it (see the os.rmdir example above).

06/05/07 @ 18:52
