The Low Point — Jeremy Allison Column Archive

Jeremy Allison Column Archives

A Tale of Two Standards

as Published in Open Sources 2.0

“It was the best of protocols, it was the worst of protocols, it
was the age of monopoly, it was the age of Free Software, it was the
epoch of openness, it was the epoch of proprietary lock-in, it was the
season of GNU, it was the season of Microsoft, it was the spring of
Linux, it was the winter of Windows….”

Samba is commonly used as the ‘glue’ between the separate worlds
of Unix and Windows, and because of that Samba developers have to intimately
understand the design and implementation decisions made in both systems.
It is no surprise that Samba is considered one of the most difficult
Free Software projects to understand and to join, only outclassed in
complexity by the voodoo black art of Linux kernel development. It really
isn’t that hard, however, once you look at the different standards implemented
in the two systems, (although some of the more decisions in Windows
can cause raised eyebrows).

In developing Samba we’re creating a bridge between the most popular
standards currently deployed in the computing world, the Unix/Linux
standard of POSIX and the Microsoft developed defacto standard of Win32.
In this article I want to examine these two standards from an application
programmer’s perspective. In doing so I thought it might be instructive
to look at the reasons why each of them exist, what the intention creating
the particular standard might have been, and how well they have stood
the test of time and the needs of programmers. A historical perspective
is very important as we look to the future and decide what standards
should we encourage governments and business to support and what effect
this will have on the software landscape in the early 21st Century.

“Standards; (noun) A flag, banner, or ensign, especially: An emblem
or flag of an army, raised on a pole to indicate the rallying point
in battle.”

The POSIX Standard

POSIX was named (like so many things in the Unix software world)
by Richard Stallman. It stands for “Portable Operating System Interface”
– X ; meaning a portable definition of a Unix like operating system
API. The reason for the existence of the POSIX standard is interesting,
and lies in the history of the Unix family of operating systems.

As is commonly known, Unix was first created at AT&T Bell labs
by Ken Thompson and Dennis Richie in 1969. Not originally designed for
commercialization, the source code was shipped to universities around
the world, most notably Berkeley in California. One of the world’s first
truly portable operating systems, Unix soon splintered into many different
versions as people modified the source code to meet their own requirements.
Once companies like Sun Microsystems and the original, pre-litigious
SCO (Santa Cruz Organization) began to commercialize Unix, the original
Unix system call application programming interface (API) remained the
core of the Unix system, but each company added proprietary extensions
to differentiate their own version of Unix. Thus began the first of
“the Unix wars” (I’m a veteran, but don’t get disability benefits
for the scars they caused). For independent software vendors (ISV’s)
such proprietary variants were a nightmare. You couldn’t assume that
code that ran correctly on one Unix would even compile on another.

During the late 1980’s, in an attempt to create a common API for
all Unix systems to fix this problem the POSIX set of standards was
born. Due to the fact that no one trusted any of the Unix vendors, the
Institute of Electrical and Electronic Engineers (IEEE) shepherded the
standards process and created the 1003 series of standards, known as
POSIX. The POSIX standards cover much more than the operating system
API’s, going into detail on system commands, shell scripting and many
other parts of what it means to be a Unix system. I’m only going to
discuss the programming API standard part of POSIX here, because as
a programmer that’s really the only part of it I care about on a day-to-day
basis.

Few people have actually seen an official POSIX standard document,
as the IEEE charges money for copies. Back before the Web became really
popular I bought one , just to take a look at what the real thing looked
like. It wasn’t cheap (a few hundred dollars as I recall). Amusingly
enough I don’t think Linus Torvalds ever read one or referred to it
when originally creating Linux; he used other vendors’ references to
it and manual page descriptions of what POSIX calls were supposed to
do.

Reading the paper POSIX standard however is very interesting. It
reads like a legal document; every line of every section is numbered
so it can be referred to in other parts of the text. It’s detailed.
Really detailed. The reason for such detail is that it was designed
to be a complete specification of how a Unix system has to behave when
called from an application program. The secret is that it was meant
to allow someone reading the specification to completely re-implement
their own version of a Unix operating system starting from scratch,
with nothing more than the POSIX spec. The goal is that if someone writes
an application that conforms to the POSIX specification, then the resulting
application can be compiled with no changes on any system that is POSIX
compliant. There is even a POSIX conformance suite, which allows a system
passing the tests to be officially branded a POSIX-compliant system.
This was created to reduce costs in government and business procurement
procedures. The idea was you specified “POSIX compliant” in your
software purchasing requests, and the cheapest system that had the branding
could be selected and it would satisfy the system requirement.

This ended up being less useful than it sounds, given that Microsoft
Windows NT has been branded POSIX-compliant and generic Linux has not.

Sounds wonderful, right? Unfortunately reality intruded its ugly
head somewhere along the way. Vendors didn’t want to give up their proprietary
advantages and so all of them pushed to get their particular implementation
of a feature into POSIX. As all vendors don’t have implementations of
all parts of the standard this means that many of the features in POSIX
are optional, usually just the one you need for your particular application.
How can you tell if a particular implementation of POSIX has the feature
you need? If you’re lucky, you can test for it at compile time.

The GNU project suffered from these “optional features” more
than most proprietary software vendors, as their software is intended
to be portable across as systems as possible. In order to make their
software portable across all the weird and wonderful POSIX variants
the wonderful suite of programs known as GNU autoconf were created.
The GNU autoconf system allows you to test to see if a feature exists,
or works correctly, before even compiling the code; thus allowing an
application programmer to degrade missing functionality gracefully (ie.
not failing at runtime).

Unfortunately not all features can be tested this way, as sometimes
a standard can give too much flexibility, thus causing massive runtime
headaches. One of the most instructive examples is in the pathconf()
call. The function prototype for pathconf() looks like this :

long pathconf(char *path, int name);

Here, “char *path” is a pathname on the system and “int name”
is a defined constant giving a configuration option you want to query.
The ones causing problems are the :

_PC_NAME_MAX
_PC_PATH_MAX

constants. _PC_NAME_MAX queries for the maximum number of characters
that can be used in a filename within a particular directory (specified
by “char *path”) on the system. _PC_PATH_MAX queries for the maximum
number of characters that can be used in a relative path from the particular
directory. On the surface there seems nothing wrong with this, until
you consider how Unix file systems are structured and put together.
A typical Unix file system looks like this :

Any of the directory nodes such as /usr/bin or /mnt could be a different
file system type, not the standard Unix file system (maybe even network
mounted). In the diagram, the “/mnt/msdos_dir” path has been mounted
from a partition containing an old MS-DOS style FAT file system type.
The maximum directory entry length on such a system is the old DOS 8.3
eleven character name. But below the Windows directory could be mounted
a different file system type with yet other differing maximum name restrictions
than 8.3, maybe an NFS mount from a different machine, for example on
the path : “/mnt/msdos_dir/nfs_dir“. Now the pathconf() can accommodate
these restrictions and tell your application about it, if you remember
to call it on every single possible path and path component your application
might use ! Hands up all application programmers who actually do this……
Yes, I thought so (you at the back, put your hand down. I know how you
do things in the USA “Star Wars” missile defense program but no
one programs in ADA anymore, plus your tests never work, OK ?). This
is an example of something that looks good on paper but in practical
terms almost no one would use in an actual application. I know we don’t
in Samba, not even in the “re-written from scratch with correctness
in mind” Samba4 implementation.

Now let’s look at an example of where POSIX gets it spectacularly
wrong, and why it happened.

First Implementation Past the Post

Any application program dealing with multiple access to files has
to deal with file locking. File locking has several potential strategies,
ranging from the “lock this file for my exclusive use” method, to
the “lock these 4 bytes at offset 23 as I’m going to be reading from
them soon” level of granularity. POSIX does implement this kind of
functionality via the fcntl() call, a sort of “jack of all trades”
for manipulating files (hence “fcntl -> file control”). It’s
not important exactly how to program this call, suffice it to say that
a code fragment to set up a byte range lock as described above looks
something like :

int fd = open("/path/to/file", O_RDWR);

…. set up “struct flock” structure to describe the kind of
byte range lock we need…

int ret = fcntl(fd, F_SETLKW, &flock_struct);

and if ret is zero, we got the lock. Looks simple, right? The byte
range lock we got on the region of the file is advisory. This means
that other processes can ignore it and are not restricted in reading
or writing the byte range covered by the region (that’s a difference
from the Win32 way of doing things, in which locks are mandatory; if
a lock is in place on a region no other process can write to that region
even if it doesn’t test for locks). An existing lock can be detected
by another process doing its own fcntl() call asking to lock its own
region of interest. Another useful feature is that once the file descriptor
open on the file (int fd in the example above) is closed then the lock
is silently removed. This is perfectly acceptable and a rational way
of specifying a file locking primitive, just what you’d want.

However, modern Unix processes are not single threaded any more,
they commonly consist of a collection of separate threads of execution,
separately scheduled by the kernel. Because the lock primitive has a
per-process scope, this means that if separate threads in the same process
ask for a lock over the same area it won’t conflict. In addition, because
the number of lock requests by a single process over the same region
is not recorded (according to the spec) then you can lock the region
ten times, but you only need to unlock it once. This is sometimes what
you want, but not always: consider a library routine that needs to access
a region of a file but doesn’t know if the calling processes has the
file open. Even if an open file descriptor is passed into the library,
the library code can’t take any locks, it can never know if it is safe
to unlock again without race conditions.

This is an example of a POSIX interface not being future-proofed
against modern techniques such as threading. A simple amendment to the
original primitive allowing a user-defined “locking context” (like
a process id) to be entered in the struct flock structure used to define
the lock would have fixed this problem, along with extra flags allowing
the number of locks per context to be recorded if needed.

But it gets worse. Consider the following code:

    int second_fd;
    int ret;
    struct flock lock;

    int fd = open("/path/to/file", O_RDWR);

    /* Set up the "struct flock" structure to describe the
    kind of byte range lock we need. */

    lock.l_type = F_WRLCK;
    lock.l_whence = SEEK_SET;
    lock.l_start = 0;
    lock.l_len = 4;
    lock.l_pid = getpid();

    ret = fcntl(fd, F_SETLKW, &lock);

    /* Assume we got the lock above (ie. ret == 0). */

    /* Get a second file descriptor open on
    the original file. Assume this succeeds. */

    second_fd = dup(fd);

    /* Now immediately close it again. */

    ret = close(second_fd);

What do you think the effect of this code on the lock created on
the first file descriptor should be (so long as the close() call returns
zero) ? If you answered “it should be silently removed when the second
file descriptor was closed”, congratulations, you have the same warped
mind as the people who implemented the POSIX spec. Yes, that’s correct.
Any successful close() call on any file descriptor referencing a file
with locks will drop all the locks on that file, even if they were obtained
on another, still open, file descriptor.

Let me be clear to everyone: this behavior is never what you would
want. Even experienced programmers are surprised by this behavior, because
it makes no sense. Even after I’ve described this to Linux kernel hackers
their response has been one of stunned silence, followed by a “but
why would it do that”?

In order to discover if this functionality was actually correctly
used by any application program or anything really depended on it, Andrew
Tridgell, the original author of Samba once hacked the kernel on his
Linux laptop to write a kernel debug message if ever this condition
occurred. After a week of continuous use he found one message logged.
When he investigated it turned out be be a bug in the “exportfs”
NFS file exporting command, where a library routine was opening and
closing the /etc/exports file that had been opened and locked by the
main exportfs code. Obviously the authors didn’t expect it to do that
either.

The reason is historical and reflects a flaw in the POSIX standards
process, in my opinion, one that hopefully won’t be repeated in the
future. I finally tracked down why this insane behavior was standardized
by the POSIX committee by talking to long-time BSD hacker and POSIX
standards committee member Kirk McKusick (he of the BSD daemon artwork).
As he recalls, AT&T brought the current behavior to the standards
committee as a proposal for byte-range locking, as this was how their
current code implementation worked. The committee asked other ISVs if
this was how locking should be done. The ISVs who cared about byte range
locking were the large database vendors such as Oracle, Sybase and Informix
(at the time). All of these companies did their own byte range locking
within their own applications, none of them depended on or needed the
underlying operating system to provide locking services for them. So
their unanimous answer was “we don’t care”. In the absence of any
strong negative feedback on a proposal, the committee added it “as-is”,
and took as the desired behavior the specifics of the first implementation,
the brain-dead one from AT&T.

The “first implementation past the post” style of standardization
has saddled POSIX systems with one of the most broken locking implementations
in computing history. My current hope is that eventually Linux can provide
a sane superset of this functionality that may then be adopted by other
Unixes and eventually find its way back into POSIX.

OK, having dumped on POSIX enough, let’s look at one of the things
that POSIX really got right, and is an example to follow in the future.

Future Proofing

One of the great successes of POSIX is the ease in which it has adapted
to the change from 32-bit to 64-bit computing. Many POSIX applications
were able to move to a 64-bit environment with very little or no change,
and the reason for that is abstract types.

In contrast to the Win32 API (which even has a bit-size dependency
in its very name), all of the POSIX interfaces are defined in terms
of abstract data types. A file size in POSIX isn’t described as a “32
bit integer” or even as a C language type of “unsigned int“, but
as the type of “off_t“. What is “off_t“? The answer depends
completely on the system implementation. On small or older systems it
is usually defined as a signed 32 bit integer (it’s used as a seek position
so it can have a negative value), on newer systems (Linux for example)
it’s defined as a signed 64-bit integer. So long as applications are
careful to only cast integer types to the correct “off_t” type and
use these for file size manipulation then the same application will
work on both small and large POSIX systems.

This wasn’t done all at once, as most commercial Unix vendors have
to provide binary compatibility to older applications running on newer
systems, so POSIX had to cope with both 32-bit file sized applications
running alongside newer 64-bit capable applications on the new 64-bit
systems. The way to make this work was decided at the Large File Support
working group, which finished its work during the mid 1990’s.

The transition to 64-bits was seen as a three stage process. Stage
one was the original old 32-bit applications; stage two was seen as
a transitional stage, where new versions of the POSIX interfaces were
introduced to allow newer applications to explicitly select 64-bit sizes,
and finally; stage three where all the original POSIX interfaces default
to being 64 bit clean.

As is usual in POSIX, the selection of what features to support was
made available using compile-time macro definitions that could be selected
by the application writer. The macros used were :

_LARGEFILE_SOURCE

If defined a few extra functions were made available to applications
to fix the problems in some older interfaces, but the default file access
was still 32-bit. This corresponds to stage one described above.

_LARGEFILE64_SOURCE

If this is defined then a whole new set of interfaces are available
to POSIX applications that can be explicitly selected for 64-bit file
access. These interfaces explicitly allow 64-bit file access and have
’64’ coded into their names. So open() becomes open64(),
lseek() becomes
lseek64(), and a new abstract data type called off64_t is
created and used instead of the off_t file size data type in such structures as
struct stat64. This corresponds to stage two.

_FILE_OFFSET_BITS

This represents stage three, and this macro can be either undefined
or set to the values 32 or 64. If undefined or set to 32 it corresponds
to stage one (_LARGEFILE_SOURCE). If set to 64 all the original interfaces
such as open() and lseek() are transparently mapped
to the 64-bit clean
interfaces. This is the end stage of porting to 64-bits, where the underlying
system is inherently 64-bit, and nothing special need be done to make
an application 64-bit aware. On a native 64-bit system that has no older
32-bit binary support this becomes the default.

As you can see, if a 32-bit POSIX application had no dependencies
on file size embedded in it, then simply adding the compile time flag
“-D_FILE_OFFSET_BITS=64” would allow a transparent port to a 64-bit
system. There are few such applications though, and Samba was not one
of them. We had to go through the stage two pain of using 64-bit interfaces
explicitly (which we did around 1998) before we could track down all
the bugs associated with moving to 64-bits. But we didn’t have to re-write
completely, and that I consider a success of the underlying standard.

This is an example of how the POSIX standard was farsighted enough
to define some interfaces that were so portable and clean that they
could survive a transition of underlying native CPU word length. Few
other standards can make that claim.

Wither POSIX ?

The POSIX standard has not stayed static; it has managed to evolve
(although some would argue too slowly) over time. A major step forward
was the establishment of the “Single Unix Specification” (SUS) which
is a superset of POSIX developed in 1998 and adopted by all the major
Unix vendors, shepherded by the Unix standards body “the Open Group”.
It was a great leap forward when this specification was finally made
available for free on the Web from the OpenGroup Web site at http://www.unix.org.
It certainly saved me from having to hunt down cheap POSIX specifications
in second hand bookshops in Mountain View, California.

The expanded SUS now covers such things as real-time programming,
concurrent programming via the POSIX thread (pthread) interfaces, internationalization
and localization, but unfortunately not file Access Control Lists (ACLs).
Sadly that specification was never fully agreed on, and so has never
made it into the official documents. Interestingly enough the SUS doesn’t
cover such things as the graphical user interface (GUI’s) elements,
as the history of Unix as primarily a server operating system meant
that GUI’s were never given the importance needed for Unix to become
a desktop system.

Looking at what happened with ACLs is instructive in considering
the future of POSIX and the SUS. Because ACLs were sorely needed in
real-world environments the individual Unix vendors such as SGI, Sun,
HP and IBM added them to their own Unix variants. But without a true
standards document they fell into their old evil ways and added them
with different specifications. Then along came Linux….

Linux changed everything. In many ways, the old joke is true that
Linux is the Unix defragmentation tool inspired by novice system administrators
coming to Unix from the Windows platform for the first time and asking
“where is the system defragmentation tool?”, the concept of a file
system designed well enough not to need one being outside their experience.
As Linux became more popular programs originally written for other Unixes
were first ported to it, then after a while were written for it and
then ported to other platforms. This happened to Samba, where Sun’s
SunOS on SPARC system was at first our primary user platform but after
five years or so rapidly migrated to Linux on Intel x86 systems. We
now develop almost exclusively on Linux, and from there port to other
Unix systems.

What this means is the Linux interfaces are starting to take over
as the most important standards for Unix-like systems to follow, in
some ways supplanting POSIX and the SUS. The ACL implementation for
Linux was added into the system at first via a patch by Andreas Grünbacher,
held externally to the main kernel tree. Finally it was adopted by the
main Linux vendors SuSE (now Novell) and Red Hat and has become part
of the official kernel. Other free Unix systems such as FreeBSD quickly
followed with their own implementation of the last draft of the POSIX
ACL specification, and now there are desktop GUI and other application
programs that use the Linux ACL interfaces. As this code is ported to
other systems the pressure is on them to conform to the Linux API’s,
not to any standards document. Sun have announced that their Solaris
10 on Intel release will run Linux applications “better than Linux”
and will be fully compatible at the system call level with Linux applications.
This means they must have mapped the Linux ACL interface onto the Solaris
one. Is that a good thing ?

In a world where Linux is rapidly becoming the dominant version of
Unix, does POSIX still have relevance, or should we just assume Linux
is the new POSIX ?

The Win32 (Windows) Standard

Win32 was named for an expansion of the older Microsoft Windows interface,
renamed the Win16 interface once Microsoft was shipping credible 32-bit
systems. I have a confession to make; in my career I completely ignored
the original 16-bit Windows on MS-DOS. At that time I was already working
on sane 32-bit systems (68000 based) and having to deal with the original
insane 8086 segmented architecture was too painful to contemplate. Win32
was Microsoft’s attempt to move the older architecture beyond the limitations
of MS-DOS and into something that could compete with Unix systems, and
to a large extent they succeeded spectacularly.

The original 16-bit Windows API added a common GUI on top of MS-DOS,
and also abstracted out the lower level MS-DOS interfaces so application
code had a much cleaner “C” interface to operating system services
(not that MS-DOS provided many of those). The Win32 Windows API was
actually the “application” level API (not the system call level;
I’ll discuss that in a moment) for a completely new operating system
that would soon be known as Windows NT (“New Technology”). This
new system was designed and implemented by Dave Cutler, the architect
of Digital Equipment Corporation’s VMS system, long a competitor to
Unix. It does share some similarities with VMS. The interface choice
for applications was very interesting, sitting on top of a system call
interface that looks like this :

The original idea behind the Windows NT kernel was that it could
host several different “subsystem” system call interfaces, providing
completely different application behavior from the same underlying kernel.
Thus it was meant to be a completely customizable operating system,
providing different kernel “personalities” any ISV might require.
The DOS subsystem and the (not shown) 16-bit Windows subsystem were
essential as they provided backwards compatibility for applications
running on MS-DOS and 16-bit Windows; the new operating system would
have gathered little acceptance had it not been able to run all the
old MS-DOS and Windows applications. The OS/2 subsystem was designed
to allow users of text mode OS/2 applications (which was at one time
a Microsoft product) to port them to Windows NT.

The two interesting subsystems are the original POSIX subsystem and
the new Win32 subsystem. The POSIX subsystem was added as the POSIX
standard had become very prevalent in procurement contracts. Many of
these valuable contracts were only available to systems that passed
the POSIX conformance tests. So Microsoft added a minimal POSIX subsystem
into the new Windows NT operating system. This original subsystem was,
I think it’s fair to say, deliberately crippled to make it not useful
for any real-world applications. Applications using it had no network
access and no GUI access, so although a POSIX compliant system might
be required in a procurement contract, there usually was no requirement
that the applications running on that system had to also be POSIX compliant.
This allowed new applications using the Microsoft preferred Win32 subsystem
to be used instead. All might not have been lost if Microsoft had documented
the internal subsystem interface, allowing third party ISVs to create
their own Windows NT kernel subsystems, but Microsoft kept this valuable
asset purely to themselves (there was one exception to this which I’ll
discuss below).

So let’s examine the Win32 standard API, the interface designed to
run on top of the Win32 kernel subsystem. It would be logical to assume
that, like the POSIX system calls, the calls defined in the Win32 API
would closely map to kernel level Win32 subsystem system calls. But
that would be incorrect. It turns out that, when released, the Win32
subsystem system call interface was completely undocumented. The calls
made from the application level Win32 API were translated, via various
shared libraries (“DLLs” in Windows parlance) mainly the NTDLL.DLL
library, into the real Win32 subsystem system calls.

Why do this, you might ask? Well the above board reason is that it
allows Microsoft to tune and modify the system call layer at will, improving
performance and adding features without being forced to provide backwards
compatibility application binary interfaces (or “ABI’s” for short).
The more nefarious reasoning is that it allows Microsoft applications
to cheat, and call directly into the undocumented Win32 subsystem system
call interface to provide services that competing applications cannot.
Several Microsoft applications were subsequently discovered to be doing
just that of course. One must always remember that Microsoft is not
just an operating systems vendor, but also the primary vendor of applications
that run on its own platforms. These days this is less of a problem,
as there are several books that document this system call layer, and
there are several applications that allow snooping on any Windows NT
kernel calls being made by applications, allowing any changes in this
layer to be quickly discovered and published. But it left a nasty taste
in the mouths of many early Windows NT developers (myself included).

The original Win32 application interface was on the surface very
well documented, and cheaply available in paper form (five books at
only twenty dollars each; a bargain compared to a POSIX specification).
Like most things in Windows, on the surface it looks great. It covers
much more than POSIX tries to standardize, and so offers flexible interfaces
for manipulating the GUI, graphics, sound, pen computing, as well as
all the standard system services such as file I/O, file locking, threading,
and security. Then you start to program with it. If you’re used to the
POSIX specifications you almost immediately notice something is different.
The details are missing. It’s fuzzy on the details. You notice it the
first time you call an API at runtime and it returns an error that’s
not listed anywhere in the API documentation. “That’s funny….?”
you think. With POSIX, all possible errors are listed in the return
codes section of the API call. In Win32, the errors are a “rough guide”.

The lack of detail is one of the reasons that the Wine project finds
it difficult to create a working implementation of the Win32 API on
Linux. How do you know when it’s done ? Remember that Linus with some
help was able to create a decent POSIX implementation within a few years.
The poor Wine developers have been laboring at this for twelve years
and it’s still not finished. There’s always one more wrinkle, one more
undocumented behavior that some critical application depends on. Reminds
me of Samba somehow, and for very similar reasons.

It’s not entirely Microsoft’s fault. They haven’t documented their
API because they haven’t needed to. POSIX was documented to this detail
due to need: the need of the developers creating implementations of
the standard. Microsoft know that whatever they make the API do in the
next service pack, that’s still the Win32 standard. “Where ever you
go, there you are”, so to speak.

However, the Win32 design does some things very well; security, for
instance. Security isn’t the number one thing people think of when considering
Windows, but in the Win32 API security is a very great concern. In Win32,
every object can be secured, and a property called a “Security Descriptor”
which contains an access control list (ACL) can be attached to it. This
means objects like processes, files, directories, even Windows can have
ACLs attached. This is much cleaner than POSIX, where only objects in
the file system can have ACLs attached to them.

So let’s look at a Win32 ACL. Like in POSIX, all users and groups
are identified by an unique identifier. On POSIX it’s a uid_t type for
users, and a gid_t type for groups. In Win32, both are of type SID or
security identifier. A process or thread in Win32 has a token attached
to it which lists the primary SID of the process owner and a list of
secondary group SID entries this user belongs to. Like in POSIX, this
is attached to a process at creation time and the owner can’t modify
it to give themselves more privileges. A Win32 ACL consists of a list
of SID entries with an attached bit mask identifying the operations
this SID entry allows or denies. Sounds reasonable, right? But the devil
is in the details.

Each SID entry in an ACL can be an allow entry, or a deny entry.
The order of them is important. Re-order a list of entries and swap
an deny entry with an allow entry and the meaning of the ACL can completely
change. POSIX ACLs don’t have that problem because the evaluation algorithm
defines the order in which entries are examined. In addition, the flags
defining the entry (marked as [f*] above) control if an entry is inherited
when the ACL is attached to a “container object” (such as a directory
in the file system) and may also affect other attributes of this particular
entry.

The bit mask enumerates the permissions that this entry is allowing
or denying. But the permissions are (naturally) different depending
on what object the ACL is attached to. Let’s look at the kind of permissions
available for a file object :

    DELETE                : Delete the object.
    READ_CONTROL          : Read the ACL on an object.
    WRITE_DAC             : Write the ACL on an object.
    FILE_READ_DATA        : Read from the file.
    FILE_READ_ATTRIBUTES  : Read file meta-data.
    FILE_READ_EA          : Read extended attributes 
                            (if the file has any)
    FILE_WRITE_DATA       : Write to the file
    FILE_WRITE_EA         : Write extended attributes 
                            (if the file has any)
    FILE_EXECUTE          : Open for execute 
                            (why do we need the .EXE tag then?)
    SYNCHRONIZE           : A permission related to an open 
                            file handle, not the file.

And this is one of the more simple kinds of permission bearing object
in Win32.

If the Win32 API treats security so seriously, why does Windows fail
most security tests in the real world? The answer is that most applications
ignore this wonderful, flexible security mechanism; because it’s just
too hard to use. Just like the problem with the POSIX pathconf() call.
No one can use it correctly; your application would degenerate into
a mess. It doesn’t help that Microsoft, having realized the APIs controlling
security were too hard to use, has been adding functions to simplify
this mess, sometimes adding new APIs with a new service pack. In addition,
they’ve been extending the underlying semantics of the security mechanism,
adding new flags and new behaviors as they moved into the “Active
Directory” world.

Try taking a look at the “file security dialog” in Windows 2000.
It’s incomprehensible. No one, especially a system administrator, can
keep track of this level of detail across their files. Everyone just
sets one default ACL on the root of a directory hierarchy and hopes
for the best. Most administrators usually want to do two simple things
with an ACL. Allow group “X” but not user “Y”, and allow group
“X” and also user “Z”. This is just about comprehensible with
POSIX ACLs, although they’re near the limit of the complexity people
can deal with. The Win32 security system is orders of magnitude more
complex than that; it’s hopelessly over designed. Computer scientists
love it, as it’s possible to do elegant little proofs of how secure
it is, but in the real world it’s simply too much to deal with effectively.
A great idea, adding ACLs to every system object, but a real shame about
the execution.

Just to spread the blame around, the networking “experts” who
designed the latest version of Sun’s network file system, NFS version
4, fell in love with this security mechanism and decided it would be
a great idea to add it into the NFSv4 specification. They probably thought
it would make interoperability with Windows easier. Of course they didn’t
notice that Microsoft had been busily extending the security mechanism
as Windows has developed, so they standardized on an old version of
the Windows ACL mechanism, as Microsoft documented it (not as it actually
works). So now the Unix world has to deal with this mess, or rather,
a new network file system with an ACL model that is almost but not quite
compatible with Windows ACLs, and completely alien to anything currently
found on Unix. I sometimes feel Unix programmers are their own worst
enemies.

The Tar Pit: Backwards Compatibility

Now, as an example of where Win32 got things spectacularly wrong,
I want to look at a horror from the past, that unfortunately got added
into the Win32 interfaces due to the MS-DOS heritage. My pet hate with
Win32 is the idea of “share modes” on open files. In my opinion,
this one single legacy design decision has probably done more to hold
back the development of cluster aware network file systems on Win32
systems than anything else.

Under POSIX, an open() call is very simple. It takes a pathname to
open, the way in which you want to access or create the file (read,
write or both with various create types) and a permission mask that
gets applied to files you do create. Under Win32, the equivalent call
CreateFile() takes seven parameters, and the interactions between them
can be ferociously complex. The parameter that causes all the trouble
is the “ShareMode” parameter. This can take values of any of the
following constants OR’ed together :

  FILE_SHARE_READ   : Allow others to open for read
  FILE_SHARE_WRITE  : Allow others to open for write
  FILE_SHARE_NONE   : Don't allow any other opens
  FILE_SHARE_DELETE : Allow open for delete intent

In order to make the semantics here work this means that any Windows
kernel dealing with a file open has to know the about every other application
on the system that might have this file open. This was fine back in
the single machine MS-DOS days, when these semantics were first designed,
but is a complete disaster when dealing with a clustered file system
where a multitude of connected file servers may want to give remote
access to the same file, even if they’re only serving out the file read-only
to applications. They have to consult some kind of distributed lock
management system in order to keep these MS-DOS inherited semantics
working. While this can be done, but it complicates the job enormously
and means cluster communication on every CreateFile() and
CloseHandle()
call.

This is the bane of backwards compatibility at work. This idea of
“share modes” arbitrating what access concurrent applications can
have to a file is the cause of many troubles on a Windows system. Ever
wonder why Windows has a mechanism built in to allow an application
to schedule a file to be moved, but only after a reboot? Share modes
in action. Why are some files on a Windows server system impossible
to back up due to “Another program is currently using this file”
errors? Share modes again. There is no security permission that can
prevent a user opening a file with effectively “deny all” permissions.
If you can open the file for read access, you can get a share mode on
it by design. Consider a network shared copy of Microsoft office. Any
user must be able to open the file “WINWORD.EXE” (the binary file
containing Microsoft Word) in order to execute it. Given these semantics
any user can open the file with READ_DATA access with the “ShareMode”
parameter set to FILE_SHARE_NONE and thus block use of the file., even
over the network. Imagine on a Unix system being able to open the /etc/passwd
file with a share mode and denying all other processes access. Watch
the system slowly grind to a halt as the other processes get stuck in
this tar pit….

World Domination, Fast

Now I’ve heaped enough opprobrium on Win32, let’s give it a break
and consider something the designers really did get right, and one of
the advantages it has over POSIX. I’m talking about the early adoption
of the UNICODE standard in Win32. When Microsoft was creating Win32
one of the things they realized was that this couldn’t just be another
English-only, American and European-centric standard, it had to be able
to not only cope with, but encourage, applications written in all world
languages (never accuse Microsoft of thinking small in their domination
of the computing world).

Given that criteria, their adoption of UNICODE as the native character
set for all the system calls in Win32 was a stroke of genius. Even though
the Asian countries aren’t particularly fond of UNICODE as it merges
several character sets they consider separate into one set of code points,
UNICODE is the best way to cope with the requirements of internationalization
and localization in application development.

In order to allow older MS-DOS and Win16 applications to run, the
Win32 API is available in two different forms, selectable by a compiler
#define of -DUNICODE (it also helps if you own the compiler market for
Windows, as Microsoft does, as you can standardize tricks like this).
The older code page based applications call Win32 libraries that internally
convert any string arguments to 16-bit UNICODE and then call the real
Win32 library interface, which like the Windows NT kernel, is UNICODE
only.

In addition to this Win32 comes with a full set of library interfaces
to split out the text messages an application may need to display into
resource files so ISVs can easily have them translated for a target
market. This eases the internationalization and localization burdens
considerably for vendors.

What is more useful, but not as obvious, is that making the Win32
standard natively use UNICODE meant developers were immediately confronted
with the requirements of multilingual code development. So many applications
written in English speaking or Western European 8-bit character set
compatible countries are badly written, making the assumption that a
character will always fit within one byte. The early versions of Samba
definitely had that mistake and retro-fitting multi-byte character set
handling into old code is a real bear to get right. I know as I was
the person who first had to work on this for Samba (later I got some
much needed help from Andrew) so I may be a little touchy on this subject.

Whenever I did Win32 development I immediately designed with non-English
languages in mind, and wrote everything with the abstract type TCHAR
(one of the few useful abstract types in Win32), which is selectable
at compile time using the UNICODE define to be either wchar_t with UNICODE
turned on or unsigned char with UNICODE turned off. Getting yourself
in the right multi-byte character set mindset from the beginning eliminates
a whole class of bugs that you get when having to convert a quick “English
only” hacked up program into something maintainable for different
languages. POSIX has been catching up over the years with the iconv()
functionality to cope with character set conversions, and the SUN designed
gettext() interfaces for localization but Win32 had it all right from
the start.

Wither Win32 ?

As with POSIX, the Win32 standard has not stayed static over time.
Microsoft have continued to develop and extend it, and have the advantage
that anything they publish immediately becomes the “standard”, as
is the case with all single vendor defined standards.

However Microsoft is attempting to deemphasize Win32 as they move
into their new .NET environment and the new world of “managed code”.
Managed code is code running under the control of an underlying virtual
machine (called the Common Language Infrastructure, or CLI in .NET)
and can be made to prevent the direct memory access that is the normal
mode of operation of an API designed for “C” coding, such as Win32
or POSIX. Free Software is also making a push into this area too, with
the “Mono” project which implements the Microsoft C# language and
.NET managed code environment on Linux and other POSIX systems.

Even if Microsoft are as successful as they hope in pushing ISV programmers
to convert to .NET and managed code using their new C# language, the
legacy of applications developed in C using the Win32 API will linger
for decades to come. ISV programmers are an ornery lot, especially people
who have mastered the Win32 API, due to it’s less than complete documentation.

What seems to happen over the years is that experienced Win32 programmers
gain this sort of folk-knowledge about the Win32 APIs, how they really
work versus what the documentation says. I often hang out on Usenet
Windows discussion groups and it’s very interesting to watch the attitudes
of the experienced Windows programmers. They usually hate telling novices
how stuff works, it’s almost like learning it was a badge of honor,
and they don’t want to make it too easy for the neophytes. The exude
an air of “They must suffer as I did”.

As Microsoft becomes less interested in Win32 with the release of
their new “Longhorn” Windows client and the move to managed code,
is it possible for them to lose control of it? The POSIX standard is
so complete because it was designed to allow programmers reading the
standards documents to re-create a POSIX system from scratch. The Win32
standard is nowhere near as well documented as that. However there is
hope in the Wine project, which is attempting to re-create a version
of the Win32 API that is binary compatible with the Windows on Intel
x86 systems. Wine is in effect a second implementation of the Win32
system making it closer to a true vendor independent standard. Efforts
taking place at companies like CodeWeavers and Transgaming Technologies
are very promising; I just finished playing the new Windows-only game
Half Life 2 on my desktop Linux system, using the Wine technology. This
is a significant achievement for the Wine code and bodes well for the
future.

Choosing a Standard

Between two evils, I always like to take the one I’ve never tried
before. –Mae West

So what should we choose when examining what standards to support
and develop applications for? What should we recommend to business and
governments who are starting to look closely at the Open Source/Free
Software options available?

What is important is when business and governments are selecting
products based on standards, they pay attention to open standards. No
more Microsoft Word “.DOC” format standards (which suffers from
the same problem as Win32 as being single vendor controlled). No de-facto
vendor standards, no matter how convenient. They need to select standards
that are at the same level as POSIX, namely standards to the level that
other implementations can be created from the documentation. It’s simple
to tell when a standard meets that criteria because other implementations
of it exist.

The interesting thing is that both POSIX and Win32 standards are
now available on both systems. On Linux we have the POSIX standard as
native, and the Wine project provides a binary compatible layer for
compiled Win32 programs, that can run many popular Win32 applications.
Perhaps more interestingly for programmers, the Wine project also includes
a Linux shared library “winelib”, which allows Win32 applications
to be built from source code form on POSIX systems. What you end up
with is an application that looks like a native Windows application,
but can be run on non-Intel platforms; something that early versions
of Windows NT used to support, but now is restricted to x86 compatible
processors. Taking your Win32 application and porting it using winelib
is an easy way to get your feet wet in the POSIX world, although it
won’t look like a native Linux application (this may be a positive thing
it your users are used to a Windows look and feel).

If you’ve already gone the .NET and C# route, then using the Mono
project may enable your code to run on POSIX systems.

On Windows, there is now a full POSIX subsystem, supported by Microsoft
and available for free. I alluded to Microsoft’s reluctance to release
the information on how to create new subsystems for the Windows NT kernel
above, but it turns out earlier in their history they were not so careful.
A small San Francisco based company, Softway Systems, licensed the documentation
and produced a product called OpenNT (later renamed Interix), which
was a replacement for Microsoft’s originally crippled POSIX subsystem.
Unfortunately OpenNT didn’t sell very well, someone cruelly referred
to it as having “all the application availability of Linux, with the
stability of Windows”. As the company was failing, Microsoft bought
it (probably to bring the real gem of the Windows kernel subsystem interface
knowledge back in house) and used it to create their “Services for
Unix” (SFU) product. SFU contains a full POSIX environment, with a
Software development kit allowing applications to be written that have
access to networking and GUI API’s. The applications written under it
run as full peers with the native Win32 applications and users can’t
tell the difference.

Recently Microsoft made SFU available as a free download to all Windows
users. I like to think the free availability of Samba had something
to do with this, but maybe I’m flattering the Samba Team too much. As
I like to say in my talks, “if you’re not piloting Samba on Linux
in your organization, you’re paying too much for your Microsoft software”.
But what this means is if you want to write a completely portable application,
the one standard you can count on to be there and fully implemented
and supported on Windows, Linux, Solaris, Apple MacOS X, HPUX, AIX,
IRIX and all the other Unix systems out there is POSIX. So if you’ll
excuse me, I’m going to look at porting parts of Samba to Windows……

Jeremy Allison,
Samba Team.
San Jose, California.
5th April 2005

Source link