Category: Kernel

Magical container_of() Macro

When you begin with the kernel, and you start to look around and read the code, you will eventually come across this magical preprocessor construct. What does it do? Well, precisely what its name indicates. It takes three arguments — a pointer, type of the container, and the name of the member the pointer refers to. The macro will then expand to a new address pointing to the container which accommodates the respective member. It is indeed a particularly clever macro, but how the hell can this possibly work? Let me illustrate …

The first diagram illustrates the principle of the container_of(ptr, type, member) macro for who might find the above description too clumsy.

Illustration of how containter_of macro works

Illustration of how containter_of macro works

Bellow is the actual implementation of the macro from Linux Kernel:

#define container_of(ptr, type, member) ({                      \
        const typeof( ((type *)0)->member ) *__mptr = (ptr);    \
        (type *)( (char *)__mptr - offsetof(type,member) );})

At first glance, this might look like a whole lot of magic, but it isn’t quite so. Let’s take it step by step.

Statements in Expressions

The first thing to gain your attention might be the structure of the whole expression. The statement should return a pointer, right? But there is just some kind of weird ({}) block with two statements in it. This in fact is a GNU extension to C language called braced-group within expression. The compiler will evaluate the whole block and use the value of the last statement contained in the block. Take for instance the following code. It will print 5.

int x = ({1; 2;}) + 3;
printf("%d\n", x);

typeof()

This is a non-standard GNU C extension. It takes one argument and returns its type. Its exact semantics is throughly described in gcc documentation.

int x = 5;
typeof(x) y = 6;
printf("%d %d\n", x, y);

Zero Pointer Dereference

But what about the zero pointer dereference? Well, it’s a little pointer magic to get the type of the member. It won’t crash, because the expression itself will never be evaluated. All the compiler cares for is its type. The same situation occurs in case we ask back for the address. The compiler again doesn’t care for the value, it will simply add the offset of the member to the address of the structure, in this particular case 0, and return the new address.

struct s {
	char m1;
	char m2;
};

/* This will print 1 */
printf("%d\n", &((struct s*)0)->m2);

Also note that the following two definitions are equivalent:

typeof(((struct s *)0)->m2) c;

char c;

offsetof(st, m)

This macro will return a byte offset of a member to the beginning of the structure. It is even part of the standard library (available in stddef.h). Not in the kernel space though, as the standard C library is not present there. It is a little bit of the same 0 pointer dereference magic as we saw earlier and to avoid that modern compilers usually offer a built-in function, that implements that. Here is the messy version (from the kernel):

#define offsetof(TYPE, MEMBER) ((size_t) &((TYPE *)0)->MEMBER)

It returns an address of a member called MEMBER of a structure of type TYPE that is stored in memory from address 0 (which happens to be the offset we’re looking for).

Putting It All Together

#define container_of(ptr, type, member) ({                      \
        const typeof( ((type *)0)->member ) *__mptr = (ptr);    \
        (type *)( (char *)__mptr - offsetof(type,member) );})

When you look more closely at the original definition from the beginning of this post, you will start wondering if the first line is really good for anything. You will be right. The first line is not intrinsically important for the result of the macro, but it is there for type checking purposes. And what the second line really does? It subtracts the offset of the structure’s member from its address yielding the address of the container structure. That’s it!

After you strip all the magical operators, constructs and tricks, it is that simple :-).

References

Advertisements

Custom Kernel on Fedora

Why would someone want to have a custom kernel? Well, maybe you like the cutting-edge features or maybe you want to hack on it! Anyway, this post explains step-by-step, how to download, build and install your custom kernel on Fedora. I’ll be building kernel from the mainline tree for i686.

Warning: Keep in mind that if something goes wrong you might end up irreversibly damaging your system! Always keep a fail-safe kernel to boot to, just in case!

1. Getting the kernel tree

Linux kernel tree is developed using git revision control system. There are a LOT of different versions and different trees of the kernel maintained by various people. The development tree is linux-next, Linus’ mainline is called simply linux, there is also linux-mm which had a similar purpose as linux-next today. You need to pick one and clone it locally. In this how-to, I’ll stick with the linux tree from Linus that is at least a tiny bit more stable than linux-next. Use the following to get the source:

$ git clone git://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git

2. Patching the source

Since this is really bleeding-edge code, there might be some problems with compilation or other bugs. Now is the time to patch the tree to resolve all these issues. Also if you want to hack a little and do some of your own changes to the kernel, now is the time! Don’t be afraid, it’s not that hard …

3. Configure the kernel

Kernel is a pretty big piece of software and it provides a shitload of  configuration options. Various limits and settings can be adjusted in this phase. You can also decide, what parts of kernel such as what drivers will be included in your build. Like I said, there is a lot of options and a couple of ways of setting them:

$ make config     # Go through each and every option by hand
$ make defconfig  # Use the default config for your architecture
$ make menuconfig # Edit the config with ncurses gui
$ make gconfig    # Editing with gtk GUI app

Sometimes, the kernel is built with an option that it saves it’s configuration in /proc/config.gz. If this is your case, you can copy it inside the tree and use make oldconfig. This will ask only for the new features, that were not present in the previous version.

$ zcat /proc/config.gz > .config
$ make oldconfig

On Fedora, the configuration of currently installed kernels can be found in the /boot directory:

$ ll /boot/config*
-rw-r--r--. 1 root root 123540 Mar 20 17:31 /boot/config-2.6.42.12-1.fc15.i686
-rw-r--r--. 1 root root 125193 Apr 21 15:54 /boot/config-2.6.43.2-6.fc15.i686
-rw-r--r--. 1 root root 125204 May  8 14:23 /boot/config-2.6.43.5-2.fc15.i686

4. Build it

When you’re done configuring, you can advance to the build. The only advice I can give you here to take advantage of -j option that make offers if you’re on a system with multiple cores. To build the kernel using 2 jobs per core on a dual-core processor use:

$ make -j4

It will significantly improve the build times. Either way, it will take some time to build, so it’s time to get a coffee!

5. Installation

Result of successful build should be a bzImage located in arch/i386/boot/bzImage and a bunch of built modules *.ko (kernel object). It’s vital to point out, that bzImage isn’t just a bzip2-compressed kernel object. It’s a specific bootable file format, that contains compressed kernel code along with some boot code (like a stub for decompressing the kernel etc).

Anatomy of bzImage (source: wikipedia.org)

To install the new kernel, rename and copy the bzImage to boot and install the modules by writing:

# cp arch/i386/boot/bzImage "/boot/vmlinuz-"`make kernelrelease`
# make modules_install

The modules will be installed to /lib/modules. Also a file called System.map will be created in the root of kernel source tree which is a symbol look-up table for kernel debugging. You can place it into /boot along with the image:

# cp System.map "/boot/System.map-"`make kernelrelease`

The file contents looks like this:

$ head System.map
00000000 A VDSO32_PRELINK
00000040 A VDSO32_vsyscall_eh_frame_size
000001d5 A kexec_control_code_size
00000400 A VDSO32_sigreturn
0000040c A VDSO32_rt_sigreturn
00000414 A VDSO32_vsyscall
00000424 A VDSO32_SYSENTER_RETURN
00400000 A phys_startup_32
c0400000 T _text
c0400000 T startup_32

6. Initramfs

When all the files are in place, you need to generate the initial ramdisk (initramfs). The initial filesystem that is created in RAM is there to make some preparations before the real root partition is mounted. For instance if you’re root is on a RAID or LVM, you’ll need to pre-load some drivers etc. It usually just loads the block device modules necessary to mount the root.

There’s an utility called dracut, that will generate this for you.

# dracut "" `make kernelrelease`

This will generate the image and store it into /boot/initramfs-kernelrelease.img. To inspect the file use

# lsinitrd /boot/initramfs-kernelrelease.img

7. Bootloader Settings

In the final step, before you can actually boot the new kernel is to configure your bootloader and tell it that the new kernel is there. There was a transition between Fedora 15 and Fedora 16 from GRUB 0.97 (nowdays known as grub-legacy) to new GRUB2 so I’ll explain both.

grub-legacy

In the old version (which I am currently using on F15) you need to edit the /boot/grub/menu.lst file and add new entry with paths to your kernel and initrd. The entry might look like the following:

title Fedora (2.6.41.10-3.fc15.i686)
      root (hd0,0)
      kernel /boot/vmlinuz-3.1.0-rc7 ro root=UUID=58042206-7ffe-4285-8a07-a1874d5a70d2 rd_NO_LUKS rd_NO_LVM rd_NO_MD rd_NO_DM LANG=en_US.UTF-8 SYSFONT=latarcyrheb-sun16 KEYTABLE=cz-us-qwertz rhgb quiet
      initrd /boot/initramfs-3.1.0-rc7.img

grub2

In grub2 you should be able to do this automatically by this command

# grub2-mkconfig -o /boot/grub2/grub.cfg

After this step you can reboot and let your machine chew on some fresh meat directly from the developers! Probably as fresh as it’ll ever get! Boot up and enjoy ;-).

Sources


If you liked this post, make sure you subscribe to receive notifications of new content by email or RSS feed.
Alternatively, feel free to follow me on Twitter or Google+.