Running Linux in the browser
Most people are quite surprised to see an operating system boot up in their web browser. But for SUSE Studio, this is an essential part of the user experience. In this post, I’m going to tell you about my favorite feature in SUSE Studio: Testdrive. Why did we build it, and how does it work?
Testdrive screenshot from a recent Ars Technica review
SUSE Studio is a web service that makes it ridiculously easy for anyone with a web browser and a couple of years of Linux experience to build a software appliance, or their own custom Linux distribution, in less than ten minutes.
One of our main objectives in creating SUSE Studio was to give users a streamlined build-test-tweak-rebuild cycle, so they could build and improve their appliances over several iterations.
But we also wanted Studio to have an extremely low barrier to entry. If you had to install a new program to use Studio, or if you had to have a SUSE system already, we thought a lot of people wouldn’t bother to try it.
So that’s why we made SUSE Studio a web service that you can use from any computer, even if you don’t have SUSE. Even if you don’t have Linux. (Although you do need to have some Linux experience to use it.)
But this raised another problem: if Studio is a web service, then the appliance that you create is sitting on our servers, not on your computer. So to test what you’ve built, you would have to download it, and then run it in a virtual machine or burn it to a CD. This would slow down your development iterations horribly and make the tool unusable.
We solved this problem by making it possible to actually boot your appliance in the web browser, in seconds, with a single click. We call this feature “testdrive.” I made a short (1 minute) screencast of testdrive in action so you can see for yourself.
(If you’re reading this in an aggregator and don’t see the screencast, click here).
So you can watch the appliance boot, log in, poke around, and run your tests to make sure that everything is working — without downloading anything. If you find a problem, you can fix it, rebuild, and testdrive again.
Flash and VNC
The way it works is that we boot your appliance in a KVM instance on our server, and expose the virtual machine framebuffer via VNC to a Flash applet running in your browser.
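The backend side of this can be sketched as building a KVM command line that attaches the appliance image and exposes the framebuffer on a VNC display. This is an illustrative sketch, not Studio's actual launcher; the paths and display number are made up, though the `-m`, `-hda`, and `-vnc` options are standard QEMU/KVM flags.

```python
# Hypothetical sketch of how a testdrive backend might boot an appliance
# under KVM with its display exposed over VNC. Paths and display numbers
# are illustrative; check qemu-kvm(1) for your version's exact options.

def build_kvm_command(image, vnc_display, memory_mb=512):
    """Build an argv list for booting an appliance with a VNC framebuffer."""
    return [
        "qemu-kvm",
        "-m", str(memory_mb),       # testdrive instances get 512MB of RAM
        "-hda", image,              # the appliance disk image
        "-vnc", ":%d" % vnc_display,  # expose the framebuffer on VNC display N
    ]

cmd = build_kvm_command("/srv/appliances/build-1234.raw", vnc_display=7)
print(" ".join(cmd))
```

The Flash applet in the browser then speaks the VNC protocol to that display, so the user never installs anything locally.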
We also played around with the VNC protocol to see if we could improve the performance. There are some common VNC extensions to compress the framebuffer traffic with techniques like run-length encoding and JPEG compression, but we thought we might be able to do better.
Remote framebuffer optimization offers different challenges from Xvnc, where you have cooperation from the windowing system. When you move a window on the screen, Xvnc can use a CopyRect command to reduce the amount of network traffic, and when you move the mouse, it can just send a cursor-moved event. But you can’t do that with a framebuffer, where you just have pixels on the screen. So we tried some other things.
We added PNG tile compression because we thought it might look better than JPEG. Using PNG also allowed us to fall back to grayscale or monochrome on slow connections. We also added a tile cache to the client and server to avoid sending redundant tiles. And I worked on a patch to automatically detect if the console was scrolling, so that we could use CopyRect to avoid repainting the entire screen.
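The tile cache idea can be modeled in a few lines: hash each framebuffer tile and skip any tile the client already holds at that position. The tile size and hashing scheme here are assumptions for illustration, not the actual protocol extension we shipped.

```python
# Illustrative sketch of a server-side tile cache: hash each framebuffer
# tile and only send tiles that differ from what the client last received.
import hashlib

class TileCache:
    def __init__(self):
        self.seen = {}  # (x, y) -> hash of the last tile sent at that position

    def needs_send(self, x, y, tile_bytes):
        """Return True if this tile differs from what the client last got."""
        digest = hashlib.sha1(tile_bytes).hexdigest()
        if self.seen.get((x, y)) == digest:
            return False           # redundant tile: the client already has it
        self.seen[(x, y)] = digest
        return True

cache = TileCache()
print(cache.needs_send(0, 0, b"\x00" * 4096))  # first time: must send
print(cache.needs_send(0, 0, b"\x00" * 4096))  # unchanged: skip
```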
We ran these for several months, but in the end, these extensions didn’t have as much impact as we had hoped. And since they were non-standard, they were difficult to maintain and a bit hacky. So we threw all of that away and are using the more standard VNC extensions now (ZRLE and Tight).
Even though we didn’t end up using any of that code, it was a good experiment. And it is a nice example of our team spirit: the courage to try something new to improve the user experience. And the willingness to throw it all away when it doesn’t help.
As useful as it is to have an integrated testing mechanism, we also wanted to make it possible to fine-tune your appliance inside testdrive by running commands, editing files, or even installing new software.
For example, you might want to configure the desktop, set up a launcher, change the volume, or adjust other small details that are easier to do interactively than by hunting down the proper command or file to modify. Or your application might have an interactive installer which you want to run so that your users don’t have to.
When a Linux system boots for the first time it generates and modifies a lot of files that you definitely don’t want to include in the appliance you give to your users: things like SSH host keys, resolv.conf, etc. So it wouldn’t work to just make the testdrive session “read-write.”
Our target user experience was to allow the user to see all the changes that occurred on the filesystem since the testdrive session began, and to select the changes they want to include in their final, pristine appliance.
Here is a short (90 second) screencast showing this feature:
(If you’re reading this in an aggregator and don’t see the screencast, click here).
How Modified Files Works
It took a few tries to figure out how to do this.
We needed a solution that didn’t require any cooperation from the appliance itself. A special kernel module or other hack we might inject into the appliance would limit what users could build with SUSE Studio. We wanted our users to be free to choose any kernel, for example.
We also needed to be able to dynamically generate the list of filesystem modifications in one or two seconds. Anything slower would be too uncomfortable for interactive use.
I spent a week trying different approaches and went down several dead ends. Most of what I tried was just too slow.
Then late one night during a phone conversation with Miguel, we hit on a really great solution.
QEMU (and by extension KVM, more on that in a sec) has the ability to run a virtual machine with a copy-on-write disk image, or cowfile. As the KVM instance runs, the original appliance disk image is only used for reading, and all writes go to the cowfile. Whenever the virtual machine wants to read a block off the disk, it first checks to see if that block is present in the cowfile, and falls back to the original appliance image if not.
So you essentially have two copies of the disk image — the original image and the modified image — without the cost of having to make a full copy. The magic of the “if” statement.
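That read path is simple enough to model directly. The sketch below is a toy, not qcow2 itself (real COW images work at the block layer with allocation metadata), but it shows the "if" statement doing the work: reads prefer the cowfile and fall back to the base image, while writes only ever touch the cowfile.

```python
# Minimal model of a copy-on-write disk: the base image is read-only,
# all writes land in the cow overlay, and reads check the overlay first.

class CowDisk:
    def __init__(self, base_blocks):
        self.base = base_blocks   # the pristine original appliance image
        self.cow = {}             # blocks written during the session

    def read(self, block_no):
        # The "if" statement that makes it all work:
        if block_no in self.cow:
            return self.cow[block_no]
        return self.base.get(block_no, b"\x00")

    def write(self, block_no, data):
        self.cow[block_no] = data  # the base image is never modified

disk = CowDisk({0: b"boot", 1: b"data"})
disk.write(1, b"DATA")
print(disk.read(0))  # still served from the pristine base image
print(disk.read(1))  # the session's modified copy
```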
All of this is common practice.
What’s special is that we used libext2fs, a user-space implementation of the ext2 filesystem, to read the filesystem metadata of both the original and the modified filesystems. We read all the inodes and dentries into memory, compare them, and display the differences. This worked perfectly for us. The first time a diff is run, it takes a few seconds, but after that the metadata blocks are cached, and it is common to see warm diffs of multi-gigabyte appliances take less than half a second.
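A toy version of that comparison looks like this. Here plain dicts of path to (size, mtime) stand in for the inode and dentry tables that libext2fs actually reads out of the two disk images; the point is only the shape of the diff.

```python
# Sketch of the "Modified Files" comparison: diff the metadata of the
# original and modified filesystems and classify each path.

def diff_filesystems(original, modified):
    """Return (added, removed, changed) path lists."""
    added = sorted(set(modified) - set(original))
    removed = sorted(set(original) - set(modified))
    changed = sorted(p for p in original
                     if p in modified and original[p] != modified[p])
    return added, removed, changed

orig = {"/etc/motd": (20, 100), "/etc/resolv.conf": (0, 100)}
mod = {"/etc/motd": (20, 100), "/etc/resolv.conf": (85, 230),
       "/etc/ssh/ssh_host_rsa_key": (1679, 230)}
added, removed, changed = diff_filesystems(orig, mod)
print(added)    # first-boot artifacts like SSH host keys show up here
print(changed)  # resolv.conf was rewritten during boot
```

The user then picks which of these changes belong in the final appliance and which are first-boot noise to discard.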
After that conversation with Miguel it took me about two days to get this working. This was a great hack, one of my best in recent years. It was simple to do (once we figured it out) and yet I’ve never seen any virtualization software offer this kind of feature before.
By the way, we can use the same technique to find unused files and packages that the user might be able to remove from their appliance to make it smaller. This would only work if you ran a pretty complete coverage test in testdrive, of course.
You cannot make outbound network connections during a testdrive session, but we do provide a way for you to make incoming connections on a few ports so that you can SSH in or test a web application running in your appliance. For security reasons, you have to explicitly enable this feature by clicking a button, and inbound connections are restricted to those that come from your browser’s originating IP address. We implemented this restriction with a simple patch to QEMU that Alex Graf wrote in about 15 minutes. (Alex seems to do everything in about 15 minutes.)
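The policy itself is just a two-condition check. The real enforcement lives in Alex’s small QEMU patch; this sketch only models the decision it makes, with made-up addresses and port numbers.

```python
# Model of the inbound-connection policy: admit a connection only for a
# port the user explicitly enabled, and only from the IP address the
# user's browser session originated from.

def allow_inbound(conn_src_ip, session_browser_ip, port, enabled_ports):
    """True iff the port is enabled and the source matches the browser IP."""
    return port in enabled_ports and conn_src_ip == session_browser_ip

enabled = {22, 80}  # e.g. the user clicked to open SSH and HTTP
print(allow_inbound("198.51.100.7", "198.51.100.7", 22, enabled))   # same IP, enabled port
print(allow_inbound("203.0.113.9", "198.51.100.7", 22, enabled))    # wrong source IP
print(allow_inbound("198.51.100.7", "198.51.100.7", 443, enabled))  # port not enabled
```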
We give each testdrive instance 512MB of RAM and an hour to run. A modern $2000 server can host 16 simultaneous testdrive instances and has a lifespan of about two years. If we assume 30% utilization, and factor in power, cooling, and bandwidth, we estimate the cost of hosting each testdrive session is about 6 cents. To us that seemed a fair price to pay for making it easier for developers to build appliances with SUSE Linux.
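For the curious, the hardware-only share of that estimate can be reconstructed from the figures above; power, cooling, and bandwidth (numbers not given here) make up the rest of the roughly six cents.

```python
# Rough reconstruction of the cost arithmetic: a $2000 server, 16 slots,
# a two-year lifespan, and 30% average utilization.

server_cost = 2000.0           # dollars
lifespan_hours = 2 * 365 * 24  # about two years of uptime
slots = 16                     # simultaneous testdrive instances
utilization = 0.30             # assumed average occupancy

sessions = lifespan_hours * slots * utilization  # one-hour sessions served
hardware_cost_per_session = server_cost / sessions
print(round(hardware_cost_per_session, 3))  # a bit over two cents, before overhead
```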
We do run Testdrive on our own server farm that we operate ourselves, not in a cloud service like EC2. We think EC2 is really great, and in fact we plan to add EC2 AMI output support to Studio in the future. But it just wouldn’t work for this particular service.
Cost is one of the reasons. We can operate our testdrive servers at a much lower cost than Amazon charges today. There’s also the fact that each testdrive session is itself a virtual machine instance, so running it in EC2 would require nested virtualization, which EC2 doesn’t currently support. Alex Graf has, however, implemented this in KVM in his nested SVM patches.
As a short digression, a lot of our experiments with testdrive were possible because we had QEMU as a foundation. Originally written by Fabrice Bellard, QEMU is one of my favorite open source codebases. This is not only because of its power and features — it emulates dozens of devices and CPUs in software — but because the code is so simple and hacker-friendly. It is clean and modular and it is totally lacking in the confusing layer-cake of abstractions and generalities and project-specific jargon that make so much large software so unpleasant to work with. Even though I am a total beginner in virtualization technology, I found that I could understand QEMU and write basic patches within an hour of untarring the source tree (it’s certainly possible that the deeper parts I didn’t get into are harder to hack). QEMU is the basis for both KVM and Xen HVM, and is in my opinion the unsung hero of open source virtualization.
Fabrice Bellard must be one of the most talented and prolific developers working today, and I definitely recommend checking out his other projects, including ffmpeg, numcalc.com, tinygl, tinycc, and his algorithm for computing Pi.
I hope you found this post interesting. I would be glad for any feedback, and let me know what you’d like to hear about in future posts!
Other posts about SUSE Studio and Software Appliances: