This post is part of a series of articles I am writing about SUSE Studio and software appliances.
Most people are quite surprised to see an operating system boot up in their web browser. But for SUSE Studio, this is an essential part of the user experience. In this post, I’m going to tell you about my favorite feature in SUSE Studio: Testdrive. Why did we build it, and how does it work?

Testdrive screenshot from a recent Ars Technica review
SUSE Studio is a web service that makes it ridiculously easy for anyone with a web browser and a couple of years of Linux experience to build a software appliance, or your own custom Linux distribution, in less than ten minutes.
One of our main objectives in creating SUSE Studio was to give users a streamlined build-test-tweak-rebuild cycle, so they could build and improve their appliances over several iterations.
But we also wanted Studio to have an extremely low barrier to entry. If you had to install a new program to use Studio, or if you had to have a SUSE system already, we thought a lot of people wouldn’t bother to try it.
So that’s why we made SUSE Studio a web service that you can use from any computer, even if you don’t have SUSE. Even if you don’t have Linux. (Although you do need to have some Linux experience to use it.)
But this raised another problem: if Studio is a web service, then the appliance that you create is sitting on our servers, not on your computer. So to test what you’ve built, you would have to download it, and then run it in a virtual machine or burn it to a CD. This would slow down your development iterations horribly and make the tool unusable.
Introducing Testdrive
We solved this problem by making it possible to actually boot your appliance in the web browser, in seconds, with a single click. We call this feature “testdrive.” I made a short (1 minute) screencast of testdrive in action so you can see for yourself.
(If you’re reading this in an aggregator and don’t see the screencast, click here).
So you can watch the appliance boot, log in, poke around, and run your tests to make sure that everything is working — without downloading anything. If you find a problem, you can fix it, rebuild, and testdrive again.
Java vs Flash and VNC
The way it works is that we boot your appliance in a KVM instance on our server, and expose the virtual machine framebuffer via VNC to a Flash applet running in your browser.
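For readers who haven't seen the protocol: VNC speaks RFB, which is specified in RFC 6143. Below is a minimal sketch of two messages that any RFB client, Flash or otherwise, has to handle. It parses the server's 12-byte ProtocolVersion greeting and builds a FramebufferUpdateRequest. It is written in Python rather than the ActionScript a Flash applet would use, the function names are hypothetical, and it is not Studio's client code.

```python
import struct
from typing import Tuple

def parse_protocol_version(msg: bytes) -> Tuple[int, int]:
    """Parse the 12-byte RFB ProtocolVersion greeting, e.g. b"RFB 003.008\\n"."""
    if len(msg) != 12 or not msg.startswith(b"RFB ") or not msg.endswith(b"\n"):
        raise ValueError(f"not an RFB version string: {msg!r}")
    major, minor = msg[4:11].decode("ascii").split(".")
    return int(major), int(minor)

def framebuffer_update_request(x: int, y: int, width: int, height: int,
                               incremental: bool = True) -> bytes:
    """Build a FramebufferUpdateRequest (client-to-server message type 3).

    Wire layout, big-endian: U8 type, U8 incremental, U16 x, U16 y, U16 width, U16 height.
    With incremental=True the server only sends regions that changed since the last update.
    """
    return struct.pack(">BBHHHH", 3, int(incremental), x, y, width, height)
```

A client would typically send one non-incremental request for the full screen after the handshake, then keep sending incremental requests as each update arrives.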
Our first version of Testdrive used a Java applet, but we had a lot of problems with Java. The applet embedding, JavaScript bridging, and window-focus behavior varied dramatically between the different versions of Java that our users were running. Sometimes users couldn’t type into Testdrive at all. Sometimes they just got a gray rectangle. So we switched to Flash, and that has worked a lot better.
We also played around with the VNC protocol to see if we could improve the performance. There are some common VNC extensions to compress the framebuffer traffic with techniques like run-length encoding and JPEG compression, but we thought we might be able to do better.
Optimizing a raw framebuffer poses different challenges than Xvnc does, because Xvnc gets cooperation from the windowing system. When you move a window on the screen, Xvnc can use a CopyRect command to reduce the amount of network traffic, and when you move the mouse, it can just send a cursor-moved event. But you can’t do that with a bare framebuffer, where all you have is pixels on the screen. So we tried some other things.
We added PNG tile compression because we thought it might look better than JPEG. Using PNG also allowed us to fall back to grayscale or monochrome on slow connections. We also added a tile cache to the client and server to avoid sending redundant tiles. And I worked on a patch to automatically detect if the console was scrolling, so that we could use CopyRect to avoid repainting the entire screen.
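A rough sketch of the tile cache and the scroll detection described above. It assumes a toy framebuffer stored as a list of row byte strings, with one byte per pixel. The 16-pixel tile size, the SHA-1 keys, the unbounded cache, and the 50% match threshold are illustrative choices, not details from the post. The tile cache replaces a tile the client already holds with a short reference. The scroll detector looks for a vertical shift that would let the server send one CopyRect plus only the newly exposed rows.

```python
import hashlib

TILE = 16  # hypothetical tile size in pixels

def iter_tiles(frame, width, height, tile=TILE):
    """Yield (x, y, data) for each tile of a frame stored as a list of row bytes."""
    for ty in range(0, height, tile):
        for tx in range(0, width, tile):
            data = b"".join(row[tx:tx + tile] for row in frame[ty:ty + tile])
            yield tx, ty, data

class TileCache:
    """Server-side mirror of the client's tile cache, keyed by a hash of tile contents."""

    def __init__(self):
        self.seen = set()  # unbounded here; a real cache would evict entries (e.g. LRU)

    def encode(self, frame, width, height):
        ops = []
        for x, y, data in iter_tiles(frame, width, height):
            key = hashlib.sha1(data).digest()
            if key in self.seen:
                ops.append(("cached", x, y, key))  # client already holds these pixels
            else:
                self.seen.add(key)
                ops.append(("raw", x, y, data))  # send the pixels; client caches them
        return ops

def detect_vertical_scroll(old, new, min_match=0.5):
    """Return dy if `new` looks like `old` scrolled up by dy rows, else None.

    Both frames must have the same number of rows. On a hit, the server could send
    a CopyRect moving rows dy.. up to row 0 and repaint only the bottom dy rows.
    """
    height = len(old)
    best_dy, best_score = None, 0.0
    for dy in range(1, height):
        matches = sum(1 for y in range(height - dy) if new[y] == old[y + dy])
        score = matches / height  # normalize by full height so tiny overlaps can't win
        if score > best_score:
            best_dy, best_score = dy, score
    return best_dy if best_score >= min_match else None
```

The brute-force scan over every shift is quadratic in the number of rows; it is fine for a console-sized sketch but shows why a production detector would need something smarter, such as hashing rows first.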
We ran these for several months, but in the end, these extensions didn’t have as much impact as we had hoped. And since they were non-standard, they were difficult to maintain and a bit hacky. So we threw all of that away and are using the more standard VNC extensions now (ZRLE and Tight).
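ZRLE combines zlib with run-length encoding. Below is a simplified sketch of that pairing. Real ZRLE, as described in RFC 6143, splits the screen into 64x64 tiles, prefixes each tile with a subencoding byte, and uses compressed pixel values. Here pixels are single bytes and only the plain-RLE case is shown. The run-length byte format does follow the RFB specification: one more than the sum of the bytes, where any byte other than 255 ends the length. The helper names are hypothetical.

```python
import zlib
from itertools import groupby

def encode_run_length(n: int) -> bytes:
    """ZRLE-style run length: (n - 1) written as bytes of 255 plus a final byte < 255."""
    n -= 1
    out = bytearray()
    while n >= 255:
        out.append(255)
        n -= 255
    out.append(n)
    return bytes(out)

def rle_tile(pixels: bytes) -> bytes:
    """Plain RLE of a tile with 1-byte pixels: each run becomes <pixel><run length>."""
    out = bytearray()
    for value, run in groupby(pixels):  # iterating bytes yields ints
        out.append(value)
        out += encode_run_length(sum(1 for _ in run))
    return bytes(out)

def zrle_like(pixels: bytes, compressor) -> bytes:
    """RLE the tile, then push it through a zlib stream kept for the whole connection."""
    data = rle_tile(pixels)
    # Z_SYNC_FLUSH emits everything so far without ending the stream, so the
    # next update can reuse the compressor's history.
    return compressor.compress(data) + compressor.flush(zlib.Z_SYNC_FLUSH)
```

Keeping a single zlib stream alive across updates lets later frames benefit from patterns seen in earlier ones, which matters for a console where the same glyphs repeat constantly.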
Even though we didn’t end up using any of that code, it was a good experiment. And it is a nice example of our team spirit: the courage to try something new to improve the user experience. And the willingness to throw it all away when it doesn’t help.
Modified Files
As useful as it is to have an integrated testing mechanism, we also wanted to make it possible to fine-tune your appliance inside testdrive by running commands, editing files, or even installing new software.
For example, you might want to configure the desktop, set up a launcher, change the volume, or adjust other small details that are easier to do interactively than hunting down the proper command or file to modify. Or your application might have an interactive installer that you want to run so that your users don’t have to.
When a Linux system boots for the first time it generates and modifies a lot of files that you definitely don’t want to include in the appliance you give to your users: things like SSH host keys, resolv.conf, etc. So it wouldn’t work to just make the testdrive session “read-write.”
Our target user experience was to allow the user to see all the changes that occurred on the filesystem since the testdrive session began, and to select the changes they want to include in their final, pristine appliance.
Here is a short (90 second) screencast showing this feature:
(If you’re reading this in an aggregator and don’t see the screencast, click here).
How Modified Files Works
It took a few tries to figure out how to do this.
We needed a solution that didn’t require any cooperation from the appliance itself. A special kernel module or other hack we might inject into the appliance would limit what users could build with SUSE Studio. We wanted our users to be free to choose any kernel, for example.
We also needed to be able to dynamically generate the list of filesystem modifications in one or two seconds. Anything slower would be too uncomfortable for interactive use.
I spent a week trying different approaches and went down several dead ends. Most of what I tried was just too slow.
Then late one night during a phone conversation with Miguel, we hit on a really great solution.
QEMU (and by extension KVM, more on that in a sec) has the ability to run a virtual machine with a copy-on-write disk image, or cowfile. As the KVM instance runs, the original appliance disk image is only used for reading, and all writes go to the cowfile. Whenever the virtual machine wants to read a block off the disk, it first checks to see if that block is present in the cowfile, and falls back to the original appliance image if not.
So you essentially have two copies of the disk image — the original image and the modified image — without the cost of having to make a full copy. The magic of the “if” statement.
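The read path described above fits in a few lines. This is a minimal in-memory sketch under stated assumptions. The dict stands in for the cowfile, the 4096-byte block size is hypothetical, and writes are restricted to whole blocks. QEMU's real implementation uses on-disk overlay formats (for example a qcow2 image with a backing file), not this class.

```python
BLOCK_SIZE = 4096  # hypothetical block size for the sketch

class CowDisk:
    """Copy-on-write view over a read-only base image."""

    def __init__(self, base: bytes, block_size: int = BLOCK_SIZE):
        self.base = base              # original appliance image, never written
        self.block_size = block_size
        self.cow = {}                 # block index -> modified contents (the "cowfile")

    def read_block(self, i: int) -> bytes:
        if i in self.cow:             # the "if": a modified block wins
            return self.cow[i]
        start = i * self.block_size   # otherwise fall back to the original image
        return self.base[start:start + self.block_size]

    def write_block(self, i: int, data: bytes) -> None:
        if len(data) != self.block_size:
            raise ValueError("this sketch only supports whole-block writes")
        self.cow[i] = bytes(data)     # all writes land in the overlay

    def modified_blocks(self):
        return sorted(self.cow)
```

Because the base image is never touched, the "original" and "modified" disks can both be read at any moment: `base` directly, and the modified view through `read_block`.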
All of this is common practice.
What’s special is that we used libext2fs, the user-space implementation of the ext2fs filesystem, to read the filesystem metadata of both the original and the modified filesystems. We read all the inodes and dentries into memory, compare them, and display the differences. This worked perfectly for us. The first time a diff is run, it takes a few seconds, but after that the metadata blocks are cached, and it is common to see warm diffs of multi-gigabyte appliances take less than half a second.
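The comparison step itself is a set difference over per-file metadata. The post's version reads inodes and directory entries straight out of the two images with libext2fs (a C library), without mounting anything. The Python sketch below substitutes `os.walk` and `os.lstat` on a mounted tree just to have something runnable. The `Meta` fields and function names are hypothetical, and a real inode carries many more fields.

```python
import os
from typing import Dict, List, NamedTuple, Tuple

class Meta(NamedTuple):
    """The per-file fields compared in this sketch."""
    mode: int
    size: int
    mtime_ns: int

def snapshot(root: str) -> Dict[str, Meta]:
    """Map each path under root to its metadata (stand-in for reading inodes via libext2fs)."""
    result = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # lstat: record symlinks themselves, don't follow them
            result[os.path.relpath(path, root)] = Meta(st.st_mode, st.st_size, st.st_mtime_ns)
    return result

def diff_metadata(before, after) -> Tuple[List[str], List[str], List[str]]:
    """Return (added, removed, changed) paths between two snapshots."""
    added = sorted(after.keys() - before.keys())
    removed = sorted(before.keys() - after.keys())
    changed = sorted(p for p in before.keys() & after.keys() if before[p] != after[p])
    return added, removed, changed
```

Only metadata is compared, never file contents, which is what keeps a warm diff fast even on a large image.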
After that conversation with Miguel it took me about two days to get this working. This was a great hack, one of my best in recent years. It was simple to do (once we figured it out) and yet I’ve never seen any virtualization software offer this kind of feature before.
By the way, we can use the same technique to find unused files and packages that the user might be able to remove from his appliance to make it smaller. This would only work if you ran a pretty complete coverage test in testdrive, of course.
Networking
You cannot make outbound network connections during a testdrive session, but we do provide a way for you to make incoming connections on a few ports so that you can SSH in or test a web application running in your appliance. For security reasons, you have to explicitly enable this feature by clicking a button, and inbound connections are restricted to those that come from your browser’s originating IP address. We implemented this restriction with a simple patch to QEMU that Alex Graf wrote in about 15 minutes. (Alex seems to do everything in about 15 minutes.)
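The post doesn't show Alex's QEMU patch, so the following is only a sketch of the same policy applied at accept time. It is in Python with hypothetical function names. It also unwraps IPv4-mapped IPv6 addresses, so that a dual-stack listener reporting `::ffff:203.0.113.7` still matches a browser address of `203.0.113.7`.

```python
import ipaddress
import socket

def _normalize(addr: str):
    """Parse an address, unwrapping IPv4-mapped IPv6 forms like ::ffff:203.0.113.7."""
    ip = ipaddress.ip_address(addr)
    if isinstance(ip, ipaddress.IPv6Address) and ip.ipv4_mapped is not None:
        return ip.ipv4_mapped
    return ip

def peer_allowed(peer: str, allowed: str) -> bool:
    """True only if the connecting peer is the browser's originating IP."""
    return _normalize(peer) == _normalize(allowed)

def accept_filtered(listener: socket.socket, allowed: str):
    """Accept one inbound connection; close it and return None if the source IP isn't allowed."""
    conn, addr = listener.accept()  # addr is (host, port) or a 4-tuple for IPv6
    if not peer_allowed(addr[0], allowed):
        conn.close()
        return None
    return conn
```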
The Servers
We give each testdrive instance 512MB of RAM and an hour to run. A modern $2000 server can host 16 simultaneous testdrive instances and has a lifespan of about two years. If we assume 30% utilization, and factor in power, cooling, and bandwidth, we estimate the cost of hosting each testdrive session is about 6 cents. To us that seemed a fair price to pay for making it easier for developers to build appliances with SUSE Linux.
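The hardware share of that estimate can be worked out from the figures above. This sketch assumes 365-day years and that each session occupies a slot for its full hour. The post doesn't itemize power, cooling, or bandwidth, so those are left out.

```python
HOURS_PER_YEAR = 24 * 365

def hardware_cost_per_session(server_cost=2000.0, slots=16, lifespan_years=2,
                              utilization=0.30, session_hours=1.0):
    """Amortized server purchase price per testdrive session (hardware only)."""
    slot_hours = slots * lifespan_years * HOURS_PER_YEAR  # 280,320 slot-hours
    sessions = slot_hours * utilization / session_hours   # about 84,096 sessions
    return server_cost / sessions
```

With these inputs the server itself works out to roughly 2.4 cents per session. That would leave about 3.6 cents of the ~6-cent estimate for power, cooling, and bandwidth, assuming those are the only other costs included.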
We run Testdrive on our own server farm, not on a cloud service like EC2. We think EC2 is really great, and in fact we plan to add EC2 AMI output support to Studio in the future. But it just wouldn’t work for this particular service.
Cost is one of the reasons. We can operate our testdrive servers at a much lower cost than Amazon charges today. There’s also the fact that each testdrive session is itself a virtual machine instance, so running it in EC2 would require nested virtualization, which EC2 doesn’t currently offer. That said, Alex Graf has implemented nested virtualization for KVM in his nested SVM patches.
About QEMU
As a short digression, a lot of our experiments with testdrive were possible because we had QEMU as a foundation. Originally written by Fabrice Bellard, QEMU is one of my favorite open source codebases. This is not only because of its power and features — it emulates dozens of devices and CPUs in software — but because the code is so simple and hacker-friendly. It is clean and modular and it is totally lacking in the confusing layer-cake of abstractions and generalities and project-specific jargon that make so much large software so unpleasant to work with. Even though I am a total beginner in virtualization technology, I found that I could understand QEMU and write basic patches within an hour of untarring the source tree (it’s certainly possible that the deeper parts I didn’t get into are harder to hack). QEMU is the basis for both KVM and Xen HVM, and is in my opinion the unsung hero of open source virtualization.
Fabrice Bellard must be one of the most talented and prolific developers working today, and I definitely recommend checking out his other projects, including ffmpeg, numcalc.com, tinygl, tinycc, and his algorithm for computing Pi.
Ok…
I hope you found this post interesting. I would be glad for any feedback, and let me know what you’d like to hear about in future posts!
Other posts about SUSE Studio and Software Appliances:
Posted on 30 July 2009
- Leave a comment
- Subscribe with Google Reader
- Follow me on Twitter
Did you like this article?
-
Nat,
The Flash / VNC setup is nifty. On Windows I always liked that they made remote desktop available via HTTP (they used a proprietary plugin, but if you were on a Windows box the experience was great). It’d be fantastic to have it be standard that when you want to use a machine, you simply browse to it on the web and get a Flash VNC interface (also, IPMI card vendors really have got to implement this!)
About not hosting on EC2 — I couldn’t agree with you more. Amazon’s pricing is absurd. The fact that they have not cut prices in 3 years of service is quite a genius way to increase their profit margins.
The hardware pricing sounds a bit on the expensive side. For $2500 each, you should be able to get 2x quad core xeons (so 16 hardware threads) and 24 GB of ram — that should easily be able to support 25-30 VMs.
-
Ben, we are being pretty conservative on our VM to hardware ratio. We are using two socket quad core machines with good amounts of RAM. We could probably increase our VM per CPU core, but so far we have not needed to.
-
-
Hey Nat,
Thanks very much for this post. It’s really interesting to hear about the different iterations of SUSE Studio and the decisions you made throughout the development process.
Especially given I played with it a bit a while back – I didn’t even know you’d switched to Flash for Testdrive!
Cheers,
Henare
-
Can you add QEMU (or something similar) as a target so the images can run under Windows without any install at all? If you went all the way and included QEMU as part of the ISO (auto-start) then you’d be completely stand-alone.
-
Hi – great stuff!! When I saw the “modified files” screencast I thought that it would be neat if I could bundle/group modifications by name/tag instead of just time based. Sort of like:
In the testdrive web interface: start group/tag “somename”
Do a bunch of modifications
end group/tag
Result: a named set of modified files.
Maybe this set of modifications could even be saved and applied to another appliance.
Regards
/Peter
-
That looks awesome. Any chance of a Firefox add-on or similar, for those of us who refuse to use proprietary software like Flash? Or, alternatively, any chance of allowing direct VNC connections?
-
Why didn’t you just use NX / Nomachine?
Nothing you do to VNC will come close to the splendor of NX.
-
Hi Nat,
Congratulations on the launch of SUSE Studio. Kev Smith introduced me to it and what the team has achieved in the user experience is very impressive. It may also be worth my team looking further into the VNC/Flash implementation you did to see if we could benefit from re-using it on our NODS virtual machine hosting service (for demos).
PS your KVM definition link is broken (in the Java vs Flash section).
Chris
-
Wow, very impressive.
For us mere mortals that use our own KVM and would like to expose its screen to the Web, are you providing your Flash plugin for download & use by others (preferably with source)?
Thanks,
Faidon
-
Hi Nat,
For Testdrive, did you build the Flash applet from scratch, or did you use an open-source project (like flashlight-vnc) as a base?
Thanks,
–mike

17 comments