TODO list for libguestfs
======================================================================

This list contains random ideas and musings on features we could add
to libguestfs in future.

- RWMJ

FUSE API
--------

The API needs more test coverage, particularly lesser-used system
calls.

The big unresolved issue is UID/GID mapping between guest filesystem
IDs and the host.  It's not easy to automate this because you need
extra details about the guest itself in order to get to its
UID->username map (eg. /etc/passwd from the guest).

Haskell bindings
----------------

Complete the Haskell bindings (see discussion on haskell-cafe).

PHP bindings
------------

Add bindtests to PHP bindings.

Complete bind tests
-------------------

Complete the bind tests - must test the return values and error cases.

virt-inspector - make libvirt XML
---------------------------------

It should be possible to generate libvirt XML from virt-inspector
data, at least partially.  This would be just another output type so:

  virt-inspector --libvirt guest.img

Note that recent versions of libvirt/virt-install allow guests to be
imported, so this is not so useful any more.

"Standalone/local mode"
-----------------------

Instead of running guestfsd (the daemon) inside qemu, there should be
an option to just run guestfsd directly.

The architecture in this mode would look like:

    +------------------+
    |   main program   |
    |------------------|
    |    libguestfs    |
    +--------^---------+
         |   | reply
     cmd |   |
    +----v-------------+
    |     guestfsd     |
    +------------------+

Notes:

(1) This only makes sense if we are running as root.
(2) There is no console / kernel messages in this configuration, but
    we might consider capturing stderr from the daemon.
(3) guestfs_config and guestfs_add_drive become no-ops.

Obviously in this configuration, commands are run directly on the
local machine's disks.  You could just run the commands themselves
directly, but libguestfs provides a convenient API and language
bindings.
It also deals with tricky stuff like parsing the output of the LVM
commands.  Also we get to leverage other code such as virt-inspector.

This is mainly useful from live CDs, ie. virt-p2v.

Should we bother having the daemon at all and just link the guestfsd
code directly into libguestfs?

Ideas for extra commands
------------------------

  General glibc / core programs:
    chgrp
    more mk*temp calls

  ext2 properties:
    chattr
    lsattr
    badblocks
    blkid
    debugfs
    dumpe2fs
    e2image
    e2undo
    filefrag
    findfs
    logsave
    mklost+found

  SELinux:
    chcat
    restorecon
    ch???

  Oddball:
    pivot_root
    fts(3) / ftw(3)

Other initrd-* commands
-----------------------

Such as:

  initrd-extract
  initrd-replace

Simple editing of configuration files
-------------------------------------

Some easy non-Augeas methods to edit configuration files.
I'm thinking:

  replace /etc/file key value

which would look in /etc/file for any instances of

  key=...
  key ...
  key:...

and replace them with

  key=value
  key value
  key:value

That would solve about 50% of reconfiguration needs, and for the
rest you'd use Augeas, 'download'+'upload' or 'edit'.

RWMJ: I had a go at implementing this, but it's quite error-prone to
do this sort of editing inside the C-based daemon code.  It's far
better to do it with Augeas, or else to use an external language like
Perl.

Quick Perl scripts
------------------

Currently we can't do Perl "one-liners".  The current syntax for even
a short Perl one-liner would be:

  perl -MSys::Guestfs -e '$g = Sys::Guestfs->new(); $g->add_drive ("foo"); $g->launch; $g->mount ("/dev/sda1", "/"); ....'

You can see we're well beyond a single line just getting to the point
of adding drives and mounting.

First suggestion:

  $h = create ($filename,
               "/dev/sda1" => "/");

  $h = create ([$file1, $file2],
               "/dev/sda1" => "/");

To mount read-only, add ro => 1 like this:

  $h = create ($filename,
               "/dev/sda1" => "/",
               ro => 1);

which is equivalent to the following sequence of calls:

  $h = Sys::Guestfs->new ();
  $h->add_drive_ro ($filename);
  $h->launch ();
  $h->mount_ro ("/dev/sda1", "/");

Command-line form would be:

  perl -MSys::Guestfs=:all -e '$_=create("guest.img", "/dev/sda1" => "/"); $_->cat ("/etc/fstab");'

That's not brief enough for one-liners, so we could have an extra
autogenerated module which creates a Sys::Guestfs handle singleton
(the handle is an implicit global variable as in guestfish), eg:

  perl -MSys::Guestfs::One -e 'inspect("guest.img"); cat ("/etc/fstab");'

How would editing files work?

virt-rescue pty
---------------

See:
http://search.cpan.org/~rgiersig/IO-Tty-1.08/Pty.pm
http://www.perlmonks.org/index.pl?node_id=582185

Note that pty requires cooperation inside the C code too (there are
two sides to a pty, and one has to be handled after the fork).

[I tried to implement this in the new C virt-rescue, but it doesn't
work.  qemu is implementing its own ptys, and they are broken.  Need
to fix qemu.]

Windows-based daemon/appliance
------------------------------

See discussion on list:
https://www.redhat.com/archives/libguestfs/2009-November/msg00165.html

qemu locking
------------

Add -drive file=...,lock=exclusive and -drive file=...,lock=shared

Change libguestfs and libvirt to do the right thing, so that multiple
instances of qemu cannot stomp on each other.

virt-disk-explore
-----------------

For multi-level disk images such as live CDs:
http://rwmj.wordpress.com/2009/07/15/unpack-the-russian-doll-of-a-f11-live-cd/

It's possible with libguestfs to recursively look for anything that
might be a filesystem, mount-{,loop} it and look in those, revealing
anything in a disk image.

However this won't work easily for VM disk images in the disk image.
One would have to download those to the host and launch another
libguestfs instance.

[Not sure this is such a good idea.  See also live CD inspection idea
below.]

Map filesystems to disk blocks
------------------------------

Map files/filesystems/(any other object) to the actual disk
blocks they occupy.  And vice versa.

Is it even possible?

See also contrib/visualize-alignment/

Integration with host intrusion systems
---------------------------------------

Perfect way to monitor VMs from outside the VM.  Look for file
hashes, log events, login/logout etc.

http://www.ossec.net/
http://la-samhna.de/samhain/
http://sourceforge.net/projects/aide/
http://osiris.shmoo.com/
http://sourceforge.net/projects/tripwire/

Fix 'file'
----------

https://www.redhat.com/archives/libguestfs/2010-June/msg00053.html
https://www.redhat.com/archives/libguestfs/2010-June/msg00079.html

Freeze/thaw filesystems
-----------------------

Access to these ioctls:
http://git.kernel.org/linus/fcccf502540e3d7

Tips for new users in guestfish
-------------------------------

$ guestfish
Tip: You need to 'add disk.img' or 'alloc disk.img nn' to make a
new image.  Type 'notips' to disable tips permanently.
> add mydisk
Tip: You need to type 'run' before you can see into the disk image.
> run
Tip: Use 'list-filesystems' to see what filesystems are available.
> list-filesystems
/dev/vda1
Tip: Use 'mount fs /' to mount a filesystem.
> mount /dev/vda1 /
Tip: Use 'll /' to view the filesystem or ...
> ll /

Could we make guestfish interactive if commands are used without params?
------------------------------------------------------------------------

> sparse
[[Prints man page]]
Image name? disk.img
Size of image? 10M

Common problems
---------------

How can we solve these common user problems?

[space for common problems here]

Better support for encrypted devices
------------------------------------

Currently LUKS support only works if the device contains volume
groups.
If it contains, eg., partitions, you cannot access them.
We would like to add:

  - Direct access to the /dev/mapper device (eg. if it contains
    anything apart from VGs).

Display image as PS
-------------------

Display the structure of an image file as a PS.

Greater use of blkid / libblkid
-------------------------------

guestfs_zero should use wipefs.  See wipefs(8).

There are various useful functions in libblkid for listing partitions,
devices etc which we are essentially duplicating in the daemon.  It
would make more sense to just use libblkid for this.

There are some places where we call out to the 'blkid' program.  This
might be replaced by direct use of the library (if this is easier).

Visualization
-------------

Eric Sandeen pointed out the blktrace tool which is a better way of
capturing traces than using patched qemu (see
contrib/visualize-alignment).  We would still use the same
visualization tools in conjunction with blktrace traces.

guestfish parsing
-----------------

At the moment guestfish uses an ad hoc parser which has many
shortcomings.  We should change to using a lex/yacc-based scanner and
parser (there are better parsers out there, but yacc is sufficient and
very widely available).

The scanner must deal with the case of parsing a whole command string,
eg. for a command that the user types in:

  > add-drive-opts "/tmp/foo" readonly:true

and also with parsing single words from the command line:

  guestfish add-drive-opts /tmp/foo readonly:true

Note the quotes are for scanning and don't indicate types.

We should also allow variables and expressions as part of this new
parsing code, eg:

  set roots inspect-os
  set product inspect-get-product-name %{roots[0]}

% is better than $ because of shell escaping and confusion with shell
variables.

Can we combine this with ability to set and read environment
variables?  Currently guestfish uses many environment variables like
$EDITOR without any corresponding ability to set them.

  set EDITOR /usr/bin/emacs
  echo $EDITOR       # or %{EDITOR}
  edit /etc/resolv.conf

live CD inspection for Windows 7
--------------------------------

Windows 7 install CDs are quite different and pretty impenetrable.
There are no obvious files to parse.

More ntfs tools
---------------

ntfsprogs actually has a lot more useful tools than we currently
use.  Interesting ones are:

ntfslabel: display or change filesystem label (we should unify all
  set*label APIs into a single set_vfs_label which can deal with any
  filesystem)

ntfsclone: clone, image, restore, rescue NTFS

ntfsinfo: print various information about NTFS volume and files

ntfs streams: extract alternate streams from NTFS files

ntfsck: checker for NTFS filesystems

Undelete files
--------------

Two useful tools:

 - ext2undelete
 - ntfsundelete

More mkfs_opts options
----------------------

Useful options to offer:

 - Set label.
 - Set UUID.

Use /proc/self/mountinfo
------------------------

This file contains lots of interesting information about
what is mounted and where.  eg:

16 21 0:3 / /proc rw,relatime - proc /proc rw
17 21 0:16 / /sys rw,relatime - sysfs /sys rw,seclabel
18 23 0:5 / /dev rw,relatime - devtmpfs udev rw,seclabel,size=1906740k,nr_inodes=476685,mode=755
26 21 253:3 / /home rw,relatime - ext4 /dev/mapper/vg-lv_home rw,seclabel,barrier=1,data=ordered

This could be used instead of current hairy code to parse
the output of the 'mount' command.  We could add new APIs to return
kernel mount options, type of filesystem at a mountpoint etc.

guestfish drive letters
-----------------------

There should be an option to mount all Windows drives as separate
paths, like C: => /c/, D: => /d/ etc.
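
The drive letter to path rule is small enough to sketch.  This is only
an illustration in Python, assuming drive designators arrive as
strings such as "C:" and that a table of drive letter to device is
available from somewhere.  drive_mountpoint and mount_plan are
hypothetical helper names, not libguestfs or guestfish APIs.

```python
from typing import Dict, List, Tuple


def drive_mountpoint(drive: str) -> str:
    """Map a Windows drive designator ("C", "C:" or "c:\\") to "/c/"."""
    # Strip a trailing colon and/or path separator, leaving the letter.
    letter = drive.rstrip(":\\/")
    if len(letter) != 1 or not ("a" <= letter.lower() <= "z"):
        raise ValueError(f"not a drive letter: {drive!r}")
    return f"/{letter.lower()}/"


def mount_plan(drives: Dict[str, str]) -> List[Tuple[str, str]]:
    """Return (device, mountpoint) pairs ordered by drive letter.

    'drives' is a hypothetical table such as {"C": "/dev/sda1"}.
    """
    ordered = sorted(drives, key=str.upper)
    return [(drives[d], drive_mountpoint(d)) for d in ordered]


if __name__ == "__main__":
    # Print guestfish-style lines purely as an illustration.
    for device, path in mount_plan({"D": "/dev/sda2", "C": "/dev/sda1"}):
        print(f"mount {device} {path}")
```

How the /c/, /d/ directories come to exist (a scratch root, or some
other mechanism) is left open here.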

More inspection features
------------------------

 - last shutdown time
 - DHCP address
 - last time the software was updated
 - last user who logged in
 - lastlog, last, who

Integrate virt-inspector with CMDBs
-----------------------------------

Either integrate virt-inspector with Configuration Management
Databases (CMDBs) or at least check that virt-inspector produces the
right range of data so that integration would be possible.  The
standards for CMDBs come from the DMTF, see eg:

http://dmtf.org/news/pr/2009/7/dmtf-releases-cmdbf-standard-federating-configuration-management-data

Efficient way to visit all files
--------------------------------

https://rwmj.wordpress.com/2010/12/15/tip-audit-virtual-machine-for-setuid-files/#content

A naive method would look like:

  g#visit ~return_stats:true "/" (
    fun pathname stat ->
      ...
  )

However this has two disadvantages:

 - requires hand-written custom bindings in each language
 - unclear about locking, thread-safety and re-entrancy of handle g

A better way would be to have some sort of explicit "download all
filenames and stat structures", which could then be iterated over:

  let files = g#find_opts ~return_stats:true "/" in
  List.iter (
    fun (pathname, stat) ->
      ...
  ) files

The problem with this is that 'files' is going to be larger than
a protocol buffer.

This leads to thinking about changes to the protocol / generator to
make this simpler.  The proposal would be to add RBigStringList,
RBigStructList [or RBig (Ranytype ...)].  These would work like
FileOut, in that they would use file streaming to stream XDR
structures (probably written to a file on the library side).
Generated code would hide most of the implementation.

We also need to think about security issues: is it possible for the
daemon to keep sending back data forever, and if so what happens on
the library side?
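
Once the filenames and stat structures are on the library side, the
iteration itself needs no custom bindings.  A minimal sketch in
Python, using the setuid audit from the blog post linked above as the
example.  The Entry type and the in-memory list are assumptions
standing in for whatever the proposed "download everything" call would
return; nothing here is an existing libguestfs API.

```python
import stat
from typing import Iterable, Iterator, NamedTuple


class Entry(NamedTuple):
    """Hypothetical record: one pathname plus its st_mode bits."""
    pathname: str
    mode: int


def setuid_files(entries: Iterable[Entry]) -> Iterator[str]:
    """Yield pathnames of regular files that have the setuid bit set."""
    for entry in entries:
        if stat.S_ISREG(entry.mode) and entry.mode & stat.S_ISUID:
            yield entry.pathname


if __name__ == "__main__":
    # Made-up sample data for illustration only.
    sample = [
        Entry("/bin/su", 0o104755),   # regular file, setuid
        Entry("/bin/ls", 0o100755),   # regular file, not setuid
        Entry("/tmp", 0o041777),      # directory with sticky bit
    ]
    for path in setuid_files(sample):
        print(path)
```

Because setuid_files accepts any iterable, the same loop would work
whether the entries are held in memory or read back incrementally from
a streamed file.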

[Users can now use virt-ls to solve some of these problems, but it is
not a general solution at the API level]

Interactive disk creator
------------------------

An interactive disk creator program.

Attach method for disconnected operation
----------------------------------------

http://libguestfs.org/guestfs.3.html#guestfs_set_attach_method

"Librarian" has an idea that he should be able to attach to a regular
appliance, but disconnect from it and reconnect to it later.  This
would be some sort of modified attach method (see link above).

The complexity here is that we would no longer have access to
stdin/stdout (or we'd have to direct that somewhere else).

GObject Introspection
---------------------

We periodically get asked to implement gobject-introspection
(it's a GNOME thing):

  http://live.gnome.org/GObjectIntrospection

This would require a separate Gtk C API since the main guestfs handle
would have to be encapsulated in a GObject.  However the main
difficulty is that the annotations supported to define types are not
very rich.  Notably missing are support for optional arguments
(defined but not implemented) and support for structs (unless mapped
to other objects).

Also note that the libguestfs API is not "object oriented".

libosinfo mappings for virt-inspector
-------------------------------------

Return libosinfo mappings from inspection API.

virt-sysprep ideas
------------------

 - touch /.unconfigured ?
 - other Spacewalk / RHN IDs (?)
 - Kerberos keys
 - Puppet registration
 - user accounts
 - Windows sysprep
   (see: https://github.com/clalancette/oz/blob/e74ce83283d468fd987583d6837b441608e5f8f0/oz/Windows.py )
 - blue skies: change the background image
 - (librarian suggests ...)
   . install a firstboot script    virt-sysprep --script=/tmp/foo.sh
   . run an external shell script
   . run external guestfish script virt-sysprep --fish=/tmp/foo.fish
   . rm /var/cache/apt/archives/*
 - /var/run/* and pam_faillock's data files
 - homedirs/.ssh directory, especially /root/.ssh (Steve Grubb)
 - if drives are encrypted, then dm-crypt key should be changed
   and drives all re-encrypted
 - /etc/pki
   (Steve says ...)
   Rpm uses nss. Nss sets up its crypto database in /etc/pki. Depending
   on how long the machine ran before cloning, you may have picked up
   some certificates or things. This is an area that you would want to
   look into.
 - secure erase of inodes etc using scrub (Steve Grubb)
 - other directories that could require cleaning include:
     /var/cache/gdm/*
     /var/lib/fprint/*
     /var/run/*
     /var/lib/AccountService/users/*
     /var/lib/sss/db/*
     /var/lib/samba/*
     /var/lib/samba/*/*
   (thanks Marko Myllynen, James Antill)
 - remove or modify UUIDs in /etc/fstab (eg. on Ubuntu)
   (thanks Joshua Daniel Franklin)

Launch remote sessions over ssh
-------------------------------

We had an idea you could add a launch method that uses ssh, ie. all
febootstrap and qemu commands happen the same as now, but prefixed by
ssh so it happens on a remote machine.

Note that proper remote support and integration with libvirt is
different from this, and people are working on that.  ssh would just
be "remote-lite".

virt-make-fs and virt-win-reg need to not be in Perl
----------------------------------------------------

Probably they should be in C or OCaml.

Integrate snap-type functionality in inspection tools
-----------------------------------------------------

Mo Morsi's "snap" program lets you describe a guest as the list of
packages (eg. RPMs) installed + changes made to those RPMs + files
added.

http://projects.morsi.org/wiki/Snap

This results in a compact description of the guest.  He even managed
to do a kind of migration of guests by simply recreating the guest
from the description on the target machine.

It would be ideal to integrate this and/or use inspection to do this.