Yocto is an open source project from the Linux Foundation for building custom Linux distributions. It is supported by some of the big names in computing, such as Intel. The build uses an LD_PRELOAD tool called Pseudo (pronounced like ‘sudo’) to simulate root user privileges during the build. This is similar to the way Breeze traces applications, so I am going to try to explain how it works.
I have traced a simple rebuild of zlib. Yocto uses a make-like build system called Bitbake, implemented in Python. After sourcing the oe-init-build-env script, I typed ‘trace bitbake zlib’ in the Breeze command line. The build was quick because I’d built it before, but it was enough to see Pseudo in action. In the trace I can see that the top-level Bitbake script calls Pseudo. By clicking on the program node in the graph, I can see that the path to the main Bitbake implementation and the argument ‘zlib’ are passed to Pseudo. After setting up the build environment, Pseudo calls exec() to replace itself with Bitbake, which then runs inside the Pseudo environment.
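The launch step can be sketched in Python. This is a minimal sketch that assumes the wrapper only needs to add one library to LD_PRELOAD and then exec the real program. The function names, and the library path in the usage line, are hypothetical. Pseudo's own launcher is written in C and does more setup than this.

```python
import os
import re


def preload_env(base_env, preload_lib):
    """Return a copy of base_env whose LD_PRELOAD includes preload_lib.

    The library is added at the front if it is missing, so its symbols are
    searched before those of any library that was already listed.
    """
    env = dict(base_env)
    # ld.so accepts LD_PRELOAD entries separated by spaces or colons.
    entries = [p for p in re.split(r"[:\s]+", env.get("LD_PRELOAD", "")) if p]
    if preload_lib not in entries:
        entries.insert(0, preload_lib)
    env["LD_PRELOAD"] = " ".join(entries)
    return env


def launch_under_preload(argv, preload_lib):
    """Replace the current process with argv, running under preload_lib."""
    env = preload_env(os.environ, preload_lib)
    # On success execvpe never returns: the new program image starts with env,
    # so the dynamic linker loads preload_lib before the normal libraries.
    os.execvpe(argv[0], argv, env)
```

For example, launch_under_preload(["bitbake", "zlib"], "/path/to/libpseudo.so") would replace the calling process with Bitbake. The environment is inherited across exec, so every child process Bitbake starts also loads the library, unless that child edits LD_PRELOAD itself.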
What is LD_PRELOAD?
If I look at the libraries loaded by Bitbake, I can see the libpseudo.so library in addition to the other libraries I expect to see. Under the environment tab I can see that the LD_PRELOAD and LD_LIBRARY_PATH variables are set.
The LD_LIBRARY_PATH variable just tells the dynamic linker where to look for libraries when the program starts. LD_PRELOAD is far more powerful: it tells the linker to load the listed libraries before any of the others. This means that any functions in the LD_PRELOAD libraries override functions with the same name in the normal libraries. You can even override the C library functions that applications use to communicate with the kernel, and so intercept calls that open or manipulate files.
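To make the lookup order concrete, here is a toy Python model of how the dynamic linker picks a definition for a symbol. It is a simplification. It assumes a single global search order: the executable first, then the LD_PRELOAD libraries, then the program's normal dependencies. The library names and symbol sets are made up for illustration.

```python
from typing import FrozenSet, List, Optional, Tuple

# A "library" in this toy model: its name and the symbols it defines.
Library = Tuple[str, FrozenSet[str]]


def search_order(executable: Library, preloads: List[Library],
                 needed: List[Library]) -> List[Library]:
    # Simplified global lookup scope: the executable first, then the
    # LD_PRELOAD libraries, then the libraries the program was linked against.
    return [executable] + preloads + needed


def resolve(symbol: str, order: List[Library]) -> Optional[str]:
    """Name of the first library in search order that defines symbol."""
    for name, symbols in order:
        if symbol in symbols:
            return name
    return None
```

With a hypothetical libshim.so that defines only open, resolve("open", ...) picks the shim over libc.so.6. A symbol the shim does not define, such as getuid, still falls through to the C library.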
Breeze uses this mechanism to override library calls and trace what the application does. The override functions record the event in the trace and then delegate to the ‘real’ function in the GNU C library. For example, when a program calls open(‘myfilename’), Breeze intercepts the call, records the attempt and then calls the real open() function so that the program behaves normally.
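The record-then-delegate pattern can be illustrated in Python by temporarily replacing os.open. This is an analogue that uses monkeypatching rather than LD_PRELOAD, and traced_open and trace_events are hypothetical names. The wrapper records each attempt before calling the saved real function, so the caller sees exactly the behaviour it would have seen without the wrapper.

```python
import os
from contextlib import contextmanager

# Hypothetical in-memory trace: one tuple per intercepted call.
trace_events = []


@contextmanager
def traced_open():
    real_open = os.open  # keep a handle on the real function

    def open_wrapper(path, flags, mode=0o777, *, dir_fd=None):
        # Record the attempt first (even if it later fails) ...
        trace_events.append(("open", os.fspath(path), flags))
        # ... then delegate so the program's behaviour is unchanged.
        return real_open(path, flags, mode, dir_fd=dir_fd)

    os.open = open_wrapper
    try:
        yield trace_events
    finally:
        os.open = real_open  # always put the real function back
```

This wrapper only sees calls that go through os.open. Python's built-in open() does not call os.open, so it would slip past, much as a statically linked program or a raw system call slips past an LD_PRELOAD library.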
How does Pseudo work?
Pseudo does a similar trick to make the build system think it is running as root. It maintains a database of alternative file permissions and a log of filesystem access events. I can see these databases as well as the cached build information in the Files View tab in Breeze.
When Bitbake accesses a file under Pseudo, the call (e.g. chmod() or open()) is intercepted and Pseudo pretends to make that call as the root (superuser) account. It uses its SQL database to keep track of any changes to the file system, so the build has no idea that the root privileges are being faked. In fact, a similar alternative system for doing this kind of thing is called ‘fakeroot’!
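Here is a toy model of the fake-root idea, using Python's sqlite3 module. It assumes a single table keyed by path that overlays ownership and permission bits on the real files. The class, its methods and the schema are hypothetical and much simpler than Pseudo's real databases.

```python
import os
import sqlite3


class FakeRoot:
    """Toy overlay that fakes root ownership and permissions (hypothetical schema)."""

    def __init__(self):
        # In-memory database: path -> faked uid, gid and mode.
        self.db = sqlite3.connect(":memory:")
        self.db.execute(
            "CREATE TABLE files (path TEXT PRIMARY KEY, uid INTEGER, gid INTEGER, mode INTEGER)"
        )

    def getuid(self):
        return 0  # always claim to be root

    def _lookup(self, path):
        return self.db.execute(
            "SELECT uid, gid, mode FROM files WHERE path = ?", (path,)
        ).fetchone()

    def _entry(self, path):
        # Start from the real file's values the first time a path is changed.
        row = self._lookup(path)
        if row is None:
            st = os.stat(path)
            row = (st.st_uid, st.st_gid, st.st_mode)
            self.db.execute("INSERT INTO files VALUES (?, ?, ?, ?)", (path, *row))
        return row

    def chown(self, path, uid, gid):
        # Record the new owner instead of asking the kernel, which would
        # refuse the change for an unprivileged user.
        self._entry(path)
        self.db.execute("UPDATE files SET uid = ?, gid = ? WHERE path = ?", (uid, gid, path))

    def chmod(self, path, mode):
        # Keep the real file-type bits and overlay the faked permission bits.
        _, _, old_mode = self._entry(path)
        new_mode = (old_mode & ~0o7777) | (mode & 0o7777)
        self.db.execute("UPDATE files SET mode = ? WHERE path = ?", (new_mode, path))

    def stat(self, path):
        # Real stat result with ownership and mode replaced by the faked values.
        st = os.stat(path)
        row = self._lookup(path)
        uid, gid, mode = row if row is not None else (st.st_uid, st.st_gid, st.st_mode)
        return {"uid": uid, "gid": gid, "mode": mode, "size": st.st_size}
```

A chown to root, which an ordinary user could not perform, only updates a row here. A later stat then reports root ownership, while the file on disk keeps its real owner and permissions.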
How does Pseudo work with Breeze?
Breeze and Pseudo use the same technique to attach to the Bitbake build, but they do not conflict, because the linker provides for this case: both LD_PRELOAD libraries can be loaded at the same time, with one taking priority over the other. Breeze is hardened for use on commercial builds that are not expecting an LD_PRELOAD library. It copes when Bitbake or Pseudo modify the tracing environment because it can put back any Breeze variables that have been taken out.
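Restoring tracer variables that a build tool has removed can be sketched as follows. The variable names, library paths and function name are hypothetical, not Breeze's. Appending the tracer to the end of LD_PRELOAD, rather than the front, is also an assumption, chosen so that a library the build added itself keeps its place.

```python
import re


def restore_tracer_env(env, saved, tracer_lib):
    """Return a copy of env with removed tracer settings put back.

    saved maps hypothetical tracer variable names to their original values.
    Variables that are still present are left alone.
    """
    fixed = dict(env)
    for key, value in saved.items():
        fixed.setdefault(key, value)  # only restore what was taken out
    # ld.so accepts LD_PRELOAD entries separated by spaces or colons.
    entries = [p for p in re.split(r"[:\s]+", fixed.get("LD_PRELOAD", "")) if p]
    if tracer_lib not in entries:
        # Append so any library the build preloaded keeps its position.
        entries.append(tracer_lib)
    fixed["LD_PRELOAD"] = " ".join(entries)
    return fixed
```

Running the function twice gives the same result as running it once. That matters when every child process in a deep build tree applies it again.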
Breeze has a lot of code to cope with different kernel versions and with programs that have been built on different platforms. When we traced Bitbake for the first time, Breeze was so effective at picking the correct system library call that it bypassed the Pseudo library completely. We then made Breeze check for ‘unexpected’ extra preloaded libraries and choose the first one, in the same way the linker does, and it all worked fine.
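The difference between the first attempt and the fix can be modelled in Python. In C, the usual way to find "the next definition after mine" is dlsym(RTLD_NEXT, ...). This sketch assumes that is the behaviour being imitated, and the function names and library list are hypothetical rather than taken from Breeze's code.

```python
from typing import FrozenSet, List, Optional, Tuple

# A "library" in this toy model: its name and the symbols it defines.
Library = Tuple[str, FrozenSet[str]]


def resolve_from_libc(symbol: str, order: List[Library],
                      libc: str = "libc.so.6") -> Optional[str]:
    """First attempt: always take the C library's definition."""
    for name, symbols in order:
        if name == libc and symbol in symbols:
            return name
    return None


def resolve_next(symbol: str, order: List[Library], caller: str) -> Optional[str]:
    """Linker-style: the first definition that comes after the caller."""
    names = [name for name, _ in order]
    for name, symbols in order[names.index(caller) + 1:]:
        if symbol in symbols:
            return name
    return None
```

With the tracer loaded ahead of Pseudo, resolve_from_libc skips Pseudo entirely, which matches the symptom described above. resolve_next hands the call to Pseudo instead, and Pseudo's own next lookup then reaches the C library.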
What else does Bitbake do?
Even in a simple rebuild such as this there are various other steps that I haven’t discussed. You can see the full call graph below, in which Bitbake accesses git and runs the actual build. Yocto is an open source project, so why not try tracing it for yourself?