Programming Wilderness: April 2011

Practically all programs depend on libraries to execute. In most modern Unix-like systems, including Linux, programs are by default compiled to use dynamically linked libraries (DLLs). That way, you can update a library and all the programs using that library will use the new (hopefully improved) version if they can.

Dynamically linked libraries are typically placed in one of a few special directories. The usual directories include /lib, /usr/lib, /lib/security for PAM modules, /usr/X11R6/lib for X-windows, and /usr/local/lib. You should follow these standard conventions in your programs. In particular, except during debugging, you shouldn't use a value computed from the current directory as a source for dynamically linked libraries (an attacker may be able to plant a "library" of their own choosing there).

There are special conventions for naming libraries and having symbolic links for them, with the result that you can update libraries and still support programs that want to use old, non-backward-compatible versions of those libraries. There are also ways to override specific libraries or even just specific functions in a library when executing a particular program. This is a real advantage of Unix-like systems over Windows-like systems; I believe Unix-like systems have a much better system for handling library updates, one reason that Unix and Linux systems are reputed to be more stable than Windows-based systems.

On GNU glibc-based systems, which includes most Linux distributions, the list of directories automatically searched during program start-up is stored in the file /etc/ld.so.conf. Many Red Hat-derived distributions don't normally include /usr/local/lib in /etc/ld.so.conf. I consider this a bug, and adding /usr/local/lib to /etc/ld.so.conf is a common "fix" required to run many programs on Red Hat-derived systems.

If you want to override just a few functions in a library but keep the rest of it, you can enter the names of the overriding shared objects (.so files) in /etc/ld.so.preload; these "preloading" libraries will take precedence over the standard set. This preloading file is typically used for emergency patches; a distribution usually won't include such a file when delivered.

Searching all of these directories at program start-up would be too time-consuming, so a caching arrangement is actually used. The program ldconfig(8) by default reads the file /etc/ld.so.conf, sets up the appropriate symbolic links in the dynamic link directories (so they'll follow the standard conventions), and then writes a cache to /etc/ld.so.cache that is then used by other programs. So ldconfig has to be run whenever a DLL is added, when a DLL is removed, or when the set of DLL directories changes; running ldconfig is often one of the steps performed by package managers when installing a library. On start-up, then, a program uses the dynamic loader to read the file /etc/ld.so.cache and then load the libraries it needs.

Various environment variables can control this process, and in fact there are environment variables that permit you to override it (so, for example, you can temporarily substitute a different library for one particular execution). In Linux, the environment variable LD_LIBRARY_PATH is a colon-separated list of directories that are searched for libraries first, before the standard set of directories; this is useful when debugging a new library or using a nonstandard library for special purposes, but be sure you trust those who can control those directories. The variable LD_PRELOAD lists shared objects with functions that override the standard set, just as /etc/ld.so.preload does. The variable LD_DEBUG displays debugging information; if set to "all", voluminous information about the dynamic linking process is displayed while it's occurring.
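The search order described above (LD_LIBRARY_PATH entries first, then the standard directories) can be sketched in a few lines. This is only an illustration, not how the dynamic loader is implemented: the real loader consults /etc/ld.so.cache and applies further rules that are not modeled here, and this sketch simply skips empty path entries. The names LibrarySearchSketch, SearchOrder, and Resolve are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

static class LibrarySearchSketch
{
    // A few of the standard directories named in the text. Assumption: a
    // hard-coded list stands in for the loader's cache lookup.
    static readonly string[] StandardDirs = { "/lib", "/usr/lib", "/usr/local/lib" };

    // Returns candidate directories in the order described above:
    // LD_LIBRARY_PATH entries first, then the standard set.
    public static List<string> SearchOrder(string ldLibraryPath)
    {
        var dirs = new List<string>();
        if (!string.IsNullOrEmpty(ldLibraryPath))
        {
            foreach (string dir in ldLibraryPath.Split(':'))
            {
                if (dir.Length > 0) dirs.Add(dir); // simplification: skip empty entries
            }
        }
        dirs.AddRange(StandardDirs);
        return dirs;
    }

    // Returns the first existing path for the given file name, or null.
    public static string Resolve(string fileName, string ldLibraryPath)
    {
        foreach (string dir in SearchOrder(ldLibraryPath))
        {
            string candidate = Path.Combine(dir, fileName);
            if (File.Exists(candidate)) return candidate;
        }
        return null;
    }

    static void Main()
    {
        string env = Environment.GetEnvironmentVariable("LD_LIBRARY_PATH");
        Console.WriteLine(string.Join(Environment.NewLine, SearchOrder(env)));
        Console.WriteLine(Resolve("libc.so.6", env) ?? "(not found in these directories)");
    }
}
```

Because the directories from LD_LIBRARY_PATH come first, anyone who can write to them can shadow a standard library, which is why the text warns about trusting whoever controls those directories.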

Permitting user control over dynamically linked libraries would be disastrous for setuid/setgid programs if special measures weren't taken. Therefore, in the GNU glibc implementation, if the program is setuid or setgid these variables (and other similar variables) are ignored or greatly limited in what they can do. The GNU glibc library determines if a program is setuid or setgid by checking the program's credentials; if the UID and EUID differ, or the GID and the EGID differ, the library presumes the program is setuid/setgid (or descended from one) and therefore greatly limits its abilities to control linking. If you read the GNU glibc source code, you can see this; see especially the files elf/rtld.c and sysdeps/generic/dl-sysdep.c. This means that if you cause the UID and GID to equal the EUID and EGID, and then call a program, these variables will have full effect. Other Unix-like systems handle the situation differently but for the same reason: a setuid/setgid program should not be unduly affected by the environment variables set. Note that graphical user interface toolkits generally do permit user control over dynamically linked libraries, because executables that directly invoke graphical user interface toolkits should never, ever, be setuid (or have other special privileges) at all.
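The credential comparison described above (UID against EUID, GID against EGID) can be sketched as follows. This is not glibc's code; it only mirrors the comparison the text describes. It assumes a runtime on Linux (for example Mono or .NET) that can resolve "libc" for P/Invoke, and the names PrivilegeCheckSketch and LooksSetId are hypothetical.

```csharp
using System;
using System.Runtime.InteropServices;

static class PrivilegeCheckSketch
{
    // Standard POSIX calls imported from the C library.
    // Assumption: the runtime maps "libc" to the system C library.
    [DllImport("libc")] static extern uint getuid();
    [DllImport("libc")] static extern uint geteuid();
    [DllImport("libc")] static extern uint getgid();
    [DllImport("libc")] static extern uint getegid();

    // True when the real and effective IDs differ, which is the condition
    // the text says makes the library presume a setuid/setgid program.
    public static bool LooksSetId(uint uid, uint euid, uint gid, uint egid)
    {
        return uid != euid || gid != egid;
    }

    static void Main()
    {
        bool restricted = LooksSetId(getuid(), geteuid(), getgid(), getegid());
        Console.WriteLine(restricted
            ? "Real and effective IDs differ: linking overrides would be limited."
            : "Real and effective IDs match: linking overrides have full effect.");
    }
}
```

The comparison also shows why the caveat in the text holds: once a program sets its real IDs equal to its effective IDs, this test no longer detects anything unusual.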

Here’s a good discussion about reading an array of integers with a BinaryReader.

Jon says:

I don't know of anything within BinaryReader which will read an array of integers, I'm afraid. If you read into a byte array you could then use Buffer.BlockCopy to copy those bytes into an int[], which is probably the fastest form of conversion - although it relies on the endianness of your processor being appropriate for your data.

Have you tried just looping round, calling BinaryReader.ReadInt32() as many times as you need to, and letting the file system do the buffering? You could always add a BufferedStream with a large buffer into the mix if you thought that would help.
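Jon's Buffer.BlockCopy suggestion might look roughly like this. It is a minimal sketch rather than code from the discussion: the names BlockCopySketch and ReadInt32Array are hypothetical, and it assumes the caller already knows how many integers to read and that the data is stored in the processor's native byte order.

```csharp
using System;
using System.IO;

static class BlockCopySketch
{
    // Reads `count` 32-bit integers by pulling the raw bytes in one call and
    // copying them straight into an int[]; no per-element conversion is done,
    // so the bytes must already be in the machine's native byte order.
    public static int[] ReadInt32Array(BinaryReader reader, int count)
    {
        int byteCount = checked(count * sizeof(int));
        byte[] raw = reader.ReadBytes(byteCount);
        if (raw.Length != byteCount)
            throw new EndOfStreamException("Stream ended before all integers were read.");

        var result = new int[count];
        Buffer.BlockCopy(raw, 0, result, 0, byteCount);
        return result;
    }

    static void Main()
    {
        // BinaryWriter writes little-endian, so this round trip is only
        // correct when BitConverter.IsLittleEndian is true.
        var ms = new MemoryStream();
        var writer = new BinaryWriter(ms);
        foreach (int value in new[] { 1, 2, 3, 4 }) writer.Write(value);
        writer.Flush();
        ms.Position = 0;

        int[] copy = ReadInt32Array(new BinaryReader(ms), 4);
        Console.WriteLine(string.Join(", ", copy));
    }
}
```

The length check matters because ReadBytes can return fewer bytes than requested when the stream ends early.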

Marc is of the opinion:

using System.IO;

int[] original = { 1, 2, 3, 4 }, copy;
byte[] bytes;

// Write a length prefix followed by each element.
using (var ms = new MemoryStream())
{
    using (var writer = new BinaryWriter(ms))
    {
        writer.Write(original.Length);
        for (int i = 0; i < original.Length; i++)
            writer.Write(original[i]);
    }
    // ToArray still works after the writer has closed the stream.
    bytes = ms.ToArray();
}

// Read the length prefix back, then that many integers.
using (var ms = new MemoryStream(bytes))
using (var reader = new BinaryReader(ms))
{
    int len = reader.ReadInt32();
    copy = new int[len];
    for (int i = 0; i < len; i++)
    {
        copy[i] = reader.ReadInt32();
    }
}

Although personally I'd just read from the stream without BinaryReader. Actually, strictly speaking, if it was me I would use my own serializer, and just:

[ProtoContract]
public class Foo {
    [ProtoMember(1, Options = MemberSerializationOptions.Packed)]
    public int[] Bar {get;set;}
}

since this will have known endianness, handle buffering, and will use variable-length encoding to help reduce bloat if most of the numbers aren't enormous.
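To make the point about variable-length encoding concrete, here is a minimal sketch of a base-128 "varint", the scheme Protocol Buffers uses for integers. It is an illustration only, not the serializer's actual code: it handles non-negative values only, and the names VarintSketch, Encode, and Decode are hypothetical.

```csharp
using System;
using System.Collections.Generic;

static class VarintSketch
{
    // Encodes a value 7 bits at a time, least significant group first; the
    // high bit of each byte signals that another byte follows.
    public static byte[] Encode(uint value)
    {
        var bytes = new List<byte>();
        while (value >= 0x80)
        {
            bytes.Add((byte)((value & 0x7F) | 0x80));
            value >>= 7;
        }
        bytes.Add((byte)value);
        return bytes.ToArray();
    }

    // Reassembles the 7-bit groups until a byte without the high bit is seen.
    public static uint Decode(byte[] bytes)
    {
        uint value = 0;
        int shift = 0;
        foreach (byte b in bytes)
        {
            value |= (uint)(b & 0x7F) << shift;
            shift += 7;
            if ((b & 0x80) == 0) break;
        }
        return value;
    }

    static void Main()
    {
        foreach (uint n in new uint[] { 1, 300, 70000 })
        {
            byte[] encoded = Encode(n);
            Console.WriteLine("{0} -> {1} byte(s), decodes to {2}", n, encoded.Length, Decode(encoded));
        }
    }
}
```

With this scheme 1 takes one byte and 300 takes two, compared with the fixed four bytes that BinaryWriter.Write(int) always emits, which is the saving Marc refers to when most of the numbers are small.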

Thursday, April 14, 2011

Dynamic Linked Libraries in Unix Systems

Thursday, April 7, 2011

Read an array of integers with a Binary Reader