Saturday, March 17, 2007

Note on Transactional Memory

As the computer industry transitions to the multicore world, parallel programming is becoming more and more important. What has perplexed generations of parallel programmers is how difficult it is to write a correct parallel program. Not to mention the complex lock/unlock mechanisms needed to protect shared data, relaxed memory models make parallel programmers' lives even harder.

To address this problem and make parallel programming simpler, a new way of expressing parallelism called transactional memory (TM) has been proposed. A TM system provides programmers with a new keyword, 'atomic', to express the atomicity of a block of code. With this keyword, advocates of TM believe that parallel programs become much easier to write and to scale. In addition, it makes it easier to integrate parallel software components with one another.

TM involves techniques ranging from the language runtime through the operating system down to the underlying hardware. Currently there are purely software implementations, called STM; mostly hardware implementations, called HTM; hybrid software-hardware implementations, called HTM-STM; and hardware-accelerated software implementations, called HATM.

For more details on TM, refer to http://acmqueue.com/modules.php?name=Content&pa=showpage&pid=444&page=2

Here are my points on TM:

1) TM is essentially a new keyword 'atomic', nothing else, really!

2) With this keyword, synchronization becomes a first-class concept (not a library call, but part of the language) that is built into the programming language and visible to the compiler. Programmers just express what they want rather than how to do it. This way, a clear interface is formed between programmers and the underlying environment, so the environment's implementation can apply whatever optimizations it wants, provided it does not change the program's semantics.

3) So TM is just a concept that defines the interface between programmers and the system. We can expect future programming languages to provide more separation between programmers (semantics) and the system (underlying implementation), so that programmers just express the semantics and the system figures out how to achieve them. This will surely lead to various higher-level constructs appearing in programming languages, in place of lower-level primitives that force programmers to figure out how to achieve a specific objective. We can think of pthread_mutex_lock() as a programmer-figured-out implementation of the atomic keyword. In any case, hardware is becoming more intelligent these days. We must tell the underlying hardware/software what we want done, rather than telling it to do this and do that, so that it can make its own decisions on how to do it best. This is especially true when others are asking the hardware for service at the same time. Think about EPIC!

4) What do we mean by atomicity? Is it the atomicity of the execution of a block of code, or the atomicity of the access to some shared data? Obviously, it is the latter. So why do we express the atomicity of access to shared data through the atomicity of execution of a block of code? Maybe the atomic keyword should be applied to data structures instead, and compilers would understand us better.
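To make points 2) to 4) a bit more concrete, here is a rough sketch in Python. Python has no 'atomic' keyword, so everything named here (atomic(), Account, AtomicCell, the transfer functions) is a hypothetical stand-in of my own, and atomic() is simply backed by one global lock. A real TM system would be free to pick any strategy behind the same interface, which is exactly the point.

```python
import threading
from contextlib import contextmanager

# Hypothetical stand-in for the 'atomic' keyword. This sketch implements it
# with a single global re-entrant lock; a real TM system could do something
# smarter without the calling code changing at all.
_tm_lock = threading.RLock()


@contextmanager
def atomic():
    with _tm_lock:
        yield


class Account:
    """Hypothetical shared object with its own lock for the manual style."""

    def __init__(self, balance):
        self.balance = balance
        self.lock = threading.RLock()


def transfer_with_locks(src, dst, amount):
    # Manual style: the programmer decides HOW. Locks are taken in a fixed
    # order (by object id) so that two opposite transfers cannot deadlock.
    first, second = sorted((src, dst), key=id)
    with first.lock, second.lock:
        src.balance -= amount
        dst.balance += amount


def transfer_atomic(src, dst, amount):
    # Declarative style: the programmer only says WHAT must be atomic.
    # Note: this must not be mixed with transfer_with_locks on the same
    # accounts, because the two styles do not exclude each other.
    with atomic():
        src.balance -= amount
        dst.balance += amount


class AtomicCell:
    """Data-centred variant of point 4): atomicity belongs to the data."""

    def __init__(self, value):
        self._value = value
        self._lock = threading.Lock()

    def update(self, fn):
        # Every read-modify-write of this cell is atomic, wherever it is
        # called from; callers never mark code blocks themselves.
        with self._lock:
            self._value = fn(self._value)
            return self._value

    def get(self):
        with self._lock:
            return self._value
```

With transfer_atomic the caller never names a lock, so the lock-ordering trick in transfer_with_locks disappears from user code. AtomicCell goes one step further in the direction of point 4): the caller does not even mark a block, it only touches data that was declared atomic.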

set-uid and set-gid on directories

Although the effect of the set-user-ID and set-group-ID bits on files is well known to Unix users, their effect on directories is not widely understood.

In Linux, you can issue 'chmod u+s item' to set the suid bit on 'item', and 'chmod g+s item' to set the sgid bit. You can then check the permissions using 'ls -l'. The suid/sgid bit is displayed as an 's' in the execute position if the item also has execute permission there, or as an 'S' otherwise. Suid and sgid can also be set using the numeric (octal) form of chmod, i.e. 'chmod 4xxx item' to set the suid bit and 'chmod 2xxx item' to set the sgid bit. In addition, you can use 'chmod +t item' or 'chmod 1xxx item' to set the sticky bit. BTW, the sticky bit on a plain file is usually ignored in modern Unixes.
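If you want to see how the octal digits map to what 'ls -l' prints without touching any real file, Python's standard stat.filemode() formats a mode the same way. The helper name ls_mode below is my own, purely for illustration.

```python
import stat


def ls_mode(octal_mode, is_dir=False):
    """Return the 'ls -l' style permission string for an octal mode.

    ls_mode is a hypothetical helper; stat.filemode() does the real work.
    """
    kind = stat.S_IFDIR if is_dir else stat.S_IFREG
    return stat.filemode(kind | octal_mode)


print(ls_mode(0o4755))               # -rwsr-xr-x  (suid plus owner execute: 's')
print(ls_mode(0o4644))               # -rwSr--r--  (suid without execute: 'S')
print(ls_mode(0o2775, is_dir=True))  # drwxrwsr-x  (sgid directory)
print(ls_mode(0o1777, is_dir=True))  # drwxrwxrwt  (sticky bit, as on /tmp)
```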

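The effect of the sgid bit on a directory is that items created under it are automatically owned by the directory's group rather than by the creator's primary group, and new subdirectories inherit the sgid bit as well. The suid bit is a different story: Linux ignores it on directories, and only a few other Unix systems give it the analogous effect of passing the directory's owner on to new items.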

This is especially useful when a group of users wants to share some directories with each other but with nobody else. In that case the users should set their umask to 007 and make the shared directories owned by their group, with the set-group-ID bit set. Example applications include cvs/svn repositories.
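Here is a small sketch of that setup in Python, run inside a throwaway temporary directory so nothing real is touched. The function name shared_dir_demo and the names "shared" and "notes.txt" are made up for illustration. Note that the directory here keeps whatever group the temporary directory gets by default, so the group inheritance only becomes visible if you also change the directory's group to one that differs from your primary group.

```python
import os
import stat
import tempfile


def shared_dir_demo():
    """Create a set-group-ID directory and report how a new file in it looks."""
    old_umask = os.umask(0o007)  # group keeps full access, others get nothing
    try:
        with tempfile.TemporaryDirectory() as parent:
            shared = os.path.join(parent, "shared")  # hypothetical name
            os.mkdir(shared)
            # rwxrws---: the leading 2 is the set-group-ID bit.
            os.chmod(shared, 0o2770)

            path = os.path.join(shared, "notes.txt")  # hypothetical name
            with open(path, "w") as handle:
                handle.write("hello\n")

            dir_info = os.stat(shared)
            file_info = os.stat(path)
            return {
                "dir_mode": stat.filemode(dir_info.st_mode),
                "dir_gid": dir_info.st_gid,
                "file_mode": stat.filemode(file_info.st_mode),
                "file_gid": file_info.st_gid,
            }
    finally:
        os.umask(old_umask)  # restore the caller's umask


if __name__ == "__main__":
    print(shared_dir_demo())
```

The file comes out with the directory's group and, thanks to umask 007, is readable and writable by the group but inaccessible to everyone else. For a real shared repository you would additionally give the directory the team's group, for example with chgrp, before setting the bit.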