Sunday, December 16, 2007
GroupWise
Some time this week NC State University will officially announce that a new email and calendaring system has been chosen to unite the two different email and calendaring systems used on campus. In a 4 to 3 vote, Novell GroupWise will be recommended as our one communications tool to rule them all. NCSU will transition to the new system over the summer, at a cost of upwards of 1 million dollars.
It's amazing that an "Open Source University" that has been charged with reducing IT budgets and pressed strongly to be more efficient with resources will not even consider an Open Source solution for email and calendaring. Our Open Source, Cyrus based email system for most staff and all students has been a strong model for the success of Open Source, its scalability, and its cost effectiveness in both man hours and money. It's a sad time at NCSU.
Tuesday, November 06, 2007
Linux Woes
I'm bored. There are two things that I'm looking at working on. The first is a long standing quest to turn NCSU's Linux website into a dynamic site that is easier for me and others to maintain. I've been poking at building the site completely in the existing MoinMoin wiki or redoing it in MediaWiki, although there don't seem to be many tools to convert a Moin wiki into MediaWiki. I've also been looking at using WordPress and Drupal. WordPress is inviting as it's easy to maintain and seems to put a lot of effort into looking really good. The problem is theming the CMS properly. Unfortunately, CSS and similar magic isn't really my cup of tea. All solutions will take quite a bit of work.
In a similar vein, what value exactly does a separate wiki get me? Is there value in having an "official" website with a tacked-on wiki? Should the website be one or the other completely?
The second project is configuration management. I've been researching the existing tools such as Bcfg2 and Puppet and have been fairly impressed, but neither seems to fit what I need. I have some brief requirements written up here:
I've also been keeping tabs on Func, which is very interesting. It may well play a role in my monitoring. But it doesn't seem to be heading in the CM direction either. It's close...perhaps if I thought enough about how to implement some CM ideas on top. Perhaps after Func can pull as well as push.
Wednesday, August 29, 2007
RHEL 5 Vesa Bug
For a couple of days I've been looking at a bug in RHEL 5 where the VESA driver is unable to drive a Dell LCD panel connected to a Dell Optiplex 745 using a rather new ATI X1300 card. I was getting bad resolutions, X starting once and then failing to restart until the machine was rebooted, and lots of errors with "no screens found." On the 745 in my office running in x86_64 mode I could not duplicate the problem even when using the identical monitor and configuration.
It turns out there is a bug that affects how the X server gets the DDC information about the monitor. This is overly well documented as #10238 in Freedesktop.org's Bugzilla and as #236416 in Red Hat's Bugzilla. All non-i386 arches and virtual machines (Xen) use emulation to get the DDC information. Using the same emulation works in this case for i386 RHEL 5 machines as well.
To work around the bug, edit your xorg.conf file and add the following to the "ServerLayout" section:
Option "Int10Backend" "x86emu"
Wednesday, July 18, 2007
Collecting Usage Statistics
One of many goals I have is to be able to collect usage statistics for Linux machines at the university. I want to see the utilization of a computer lab over time and calculate the amount of usage of a group of computers. Which means I need to find out how long each login session on each computer lasts and eventually get that into a graph of the computers I'm interested in. Sounds easy.
However, it's rather hard to find other bits of code on the internet from people collecting usage statistics from a group of Linux machines. I had to write a PAM module not long ago, and it was easy to use the session functionality to report how long each login session lasted and send that to the XMLRPC interface that collects all this mess. It works, but why was adding this functionality into a PAM module the easiest thing to do? Isn't there an easy way to lift this information from the system with a bit of Python?
I've been looking at wtmp. It stores the data I want but it's horrible to work with. I either need yet more C code to work with a bad API, or a C Python module that still has to work with a bad API and grok wtmp's strangeness. I could screen scrape the "last" command, but that's really prone to error with the way it represents dates.
How do other folks mine this kind of data?
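One possible answer, as a minimal sketch only: read the wtmp records directly with Python's struct module and pair logins with logouts. The record layout below is an assumption (the 384-byte glibc `struct utmp` on little-endian x86 Linux), and the helper names `read_records`, `sessions`, and `sessions_from_file` are made up for illustration. It also ignores reboot records, which a real collector would need to treat as closing every open session.

```python
import struct

# ASSUMPTION: glibc `struct utmp` on little-endian x86 Linux, 384 bytes per record.
# Fields: type, pid, line, id, user, host, exit(2 shorts), session, tv_sec, tv_usec,
# addr_v6(4 ints), unused padding.
UTMP = struct.Struct("<h2xi32s4s32s256shhiii4i20s")
USER_PROCESS, DEAD_PROCESS = 7, 8  # ut_type values for login and logout


def _text(raw):
    """Decode a NUL-padded fixed-width field."""
    return raw.split(b"\0", 1)[0].decode("utf-8", "replace")


def read_records(data):
    """Yield (type, line, user, host, seconds) for each record in raw wtmp bytes."""
    for offset in range(0, len(data) - UTMP.size + 1, UTMP.size):
        f = UTMP.unpack_from(data, offset)
        yield f[0], _text(f[2]), _text(f[4]), _text(f[5]), f[9]


def sessions(data):
    """Pair each login with the next logout on the same terminal line.

    Yields (user, host, start_epoch, duration_seconds).
    """
    open_logins = {}
    for ut_type, line, user, host, when in read_records(data):
        if ut_type == USER_PROCESS:
            open_logins[line] = (user, host, when)
        elif ut_type == DEAD_PROCESS and line in open_logins:
            login_user, login_host, start = open_logins.pop(line)
            yield login_user, login_host, start, when - start


def sessions_from_file(path="/var/log/wtmp"):
    """Convenience wrapper: list the completed sessions found in a wtmp file."""
    with open(path, "rb") as fh:
        return list(sessions(fh.read()))
```

Calling `sessions_from_file()` on a machine would give a list of completed sessions that could then be shipped to whatever collects the statistics. Sessions still open when the file is read are simply not reported.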
Tuesday, July 03, 2007
More T61 Goodness
I have built patched modules to drive the Intel 4965 wireless card in the ThinkPad T61. A lesser version of the same code is available in later Fedora 7 kernels but the PCI ID is different for the card in the T61. I've built Fedora Kmod packages that build the iwl4965 module from the iwlwifi-0.0.32 package. It's not the neatest package I've built, but it works.
Also, there is a small bug in the new Intel 965GM video driver where the driver attempts to scale the image even though the LCD panel is running at its native resolution. This produces an image that doesn't appear "crisp" or appears out of focus. So here are some rebuilt xorg-x11-drv-i810 packages with the proper patch. I found this out from the following post: http://www.spinics.net/lists/xorg/msg25117.html
The packages are here: http://linuxczar.net/code/t61
Saturday, June 30, 2007
ThinkPad T61 and Fedora
I'm the proud owner of a new Lenovo ThinkPad T61. It has the new 965GM graphics chipset as well as the Intel 4965 a/b/g/n wireless. The T61 is currently only available in widescreen and I have the 14.1" 1440x900 model. So far I've been fairly impressed, but as with any new laptop there are always a few tricks to get it working.
Fedora 7 has the new Intel drivers that drive the 965GM and X seems to work fine. However, it appears as if Gnome doesn't understand the widescreen resolution. GDM, the graphical boot loader, and Gnome itself seem to only want to work in a 1024x768 window in the upper left corner of the screen. In Gnome, I can re-adjust the panels (except the top panel), move windows, and use the space outside the box, but the image below is what things look like by default. You can see that the resolution selection app seems to believe that 1024x768 is as high as we go. Does anyone know a solution to this?
The Intel 4965 wireless does not work out of the box. However, Intel has released drivers as part of the iwlwifi kernel modules. The most recent Fedora 7 kernel (2.6.21-1.3228.fc7) has an older set. It looks like using a newer snapshot of this project and getting the microcode for the wireless card should enable this to work fairly well. This article has some details for getting the iwlwifi code to work on Fedora 7.
Sunday, June 17, 2007
Configuration Management
Configuration Management (CM) is a critical part of doing large scale, or even small, systems administration. It's vitally important that your various machines pick up new and updated configuration files easily and in a timely manner. At NCSU, I've been doing what counts as CM using a Python project I call Realmconfig. Okay, Realmconfig has been around at NCSU managing Linux machines longer than I've been its maintainer. As Realmconfig developed, it gained more and more CM-like features, such as an arbitrary collection of modules that run at boot to handle initial configuration. One of these modules "manages" a selection of files: if a file isn't identical to the gold copy, it's replaced with the gold version. Generally, it's worked well for initial configuration, but pushing out changes hurt. They hurt bad.
Said module that "manages" files can either run once, run every boot, or run only when I bump its version, which requires a new Realmconfig package. Run once handles initial configuration but ignores any updates. Run every boot sees updates, provided I've included them in a new package, but is draconian in applying those updates. Certain systems have modified configuration files in place and need to keep them. Only running when the version is bumped is a compromise, but I end up with the worst of all the problems.
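To make the three policies concrete, here is a minimal sketch of a gold-copy check. This is not Realmconfig's actual code: the function name `enforce`, the stamp file, and the mode names "once", "always", and "version" are all assumptions made up for illustration.

```python
import filecmp
import shutil
from pathlib import Path


def enforce(gold, target, mode, state_dir, version=1):
    """Replace target with the gold copy when the policy allows it.

    mode is "once" (only the first run), "always" (every boot), or
    "version" (only when `version` differs from the last applied one).
    Returns True if the target was replaced.
    """
    if mode not in ("once", "always", "version"):
        raise ValueError("unknown mode: %r" % (mode,))
    gold, target, state_dir = Path(gold), Path(target), Path(state_dir)

    # Hypothetical bookkeeping: a stamp file records the last version applied.
    stamp = state_dir / (target.name + ".stamp")
    last = stamp.read_text().strip() if stamp.exists() else None

    if mode == "once" and last is not None:
        return False  # already ran once; later updates are ignored
    if mode == "version" and last == str(version):
        return False  # nothing new since the last bump

    state_dir.mkdir(parents=True, exist_ok=True)
    stamp.write_text(str(version))

    # Byte-for-byte comparison against the gold copy.
    if target.exists() and filecmp.cmp(gold, target, shallow=False):
        return False
    shutil.copy2(gold, target)  # local modifications are lost here
    return True
```

The sketch shows the trade-off directly: "once" never sees an update, while "always" and a bumped "version" both overwrite whatever a local administrator changed.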
Obviously, I need to move away from a haphazard collection of simplistic scripts to something that can scale to an environment of thousands of machines, does not require a new RPM package for every update (unless one solely decides to do CM by RPM), propagates updates easily without weird scripts, handles restarting services and running small scriptlets, and allows administrators to override or replace aspects of the configuration I've provided.
The last point sounds a bit odd. Allow my configuration to be overridden? Most CM systems scale well but are designed around centralized administration. That's a little different than what I will call centralized management. Centralized administration is one or a group of system administrators that act as one, unified entity to manage machines. In this case they are all trusted and working together to build and maintain their infrastructure. Any configuration changes made from outside the centralized administration are, by definition, not approved and should be quickly reconciled with the known good configuration. Or, more simply, you have a compromised machine.
However, I work at a university with many fiefdoms. Seldom is anything done that may be perceived as giving away direct control of something to another fiefdom. Fortunately, systems administrators are normally smart folks and understand that by working together across fiefdoms they can achieve bigger and better things. Some, however, don't. So what we have at the university are modified versions of Solaris, Windows, and Linux that we (the central IT folks) make available to the university. The colleges and departments can deploy these "kits" as they need and leverage centralized management of the machines to deploy their own labs, workstations, and services. Most importantly, they can still be "in control" of the machines themselves.
So, where does this leave us in the realm of Configuration Management? I require a system where I can push out changes to all the managed Linux machines on campus. Also, local systems administrators who may not be trusted with all the configuration of all machines may wish to add configuration and have it enforced on their machines. It's possible that there might be a third layer as well. Also, if a local administrator decides to manage a file I also manage, we need to do something a lot smarter than replace it with the global copy. We need to merge, or make sure that their files remain intact and ignore the global changes.
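As a rough sketch of the simpler of those two options (local files stay intact and the global copy is ignored), layered configuration can be modeled as an ordered list of file maps where the most local layer wins. The function name `resolve` and the layer names are hypothetical, and this does not attempt a real merge of file contents.

```python
def resolve(layers):
    """Pick, for every managed path, the content from the most local layer.

    `layers` is a list of (layer_name, {path: content}) pairs ordered from
    most general (global) to most specific (local). A later layer overrides
    an earlier one whenever both manage the same path.
    Returns {path: (layer_name, content)}.
    """
    result = {}
    for name, files in layers:
        for path, content in files.items():
            result[path] = (name, content)
    return result


# Example with three hypothetical layers: campus-wide, college, and one lab.
layers = [
    ("global", {"/etc/ntp.conf": "server ntp.campus", "/etc/motd": "Welcome"}),
    ("college", {"/etc/motd": "College of Engineering"}),
    ("lab", {"/etc/printcap": "lab-printer"}),
]
effective = resolve(layers)
```

Here the college keeps its own /etc/motd no matter how the global copy changes, while /etc/ntp.conf still follows the global layer. Recording the owning layer alongside the content makes it easy to report who is responsible for each file.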
I'm not aware of any CM tool that is this flexible. I've been looking at Bcfg2 and will spend some more time with it as well. For a CM tool it seems well designed, stays away from inventing new languages, looks scalable, and is written in Python. We'll see how it plays out in my testing. An important part of a useful CM tool is that there is a community around it rather than some random code I wrote. Bcfg2 has a very active community and maintainer.
Now we get into my crazy ideas. Toss in the hopper that configuration should be managed by some sort of SCM so that we have backups, machines can have their configuration rolled back, and a log is kept of why the configuration was updated. Suddenly, to me at least, we have the use case of a distributed SCM. Each machine has its own repository where configuration changes can be made locally and the machine can pull its configuration from any other machine, by default a master repository. An easy way to make a configuration hierarchy. We just need to be smart about automation and conflicts.
Using Git and pretending we have a useful data schema and tools to make use of it, how do we manage the magic local repository based on another machine's?
- git clone SOURCE_REPO
- git branch upstream origin
- git pull origin :upstream
- git merge -s ours upstream
- System distributes configuration files. Local admins can commit their own configuration to HEAD.
- Go to step 3 (the git pull).
Probably complete crack.
