April 2003 Archives

April 22, 2003

Knowledge work and programming as craft work

Two good pointers in an article by Jim McGee today. Jim's note points to an interesting article by Alan Cooper asserting that "commercial programming is clearly a craft", not an engineering activity. McGee also points to an even more insightful article on knowledge work as craft.

Cooper is the author of one of my favorite books: The Inmates Are Running the Asylum: Why High Tech Products Drive Us Crazy and How To Restore The Sanity. I talked about his work in an article last year linking Cooper's work with extreme programming.

rsync: one of the wonders of the CS world

One of the pleasures of studying computer science is coming upon a really clever way of doing something. I've been feeling that pleasure lately about the rsync algorithm.

Andrew Tridgell invented the rsync algorithm. I'll quote from his excellent paper (co-authored with Paul Mackerras) to describe the problem:

Imagine you have two files, A and B, and you wish to update B to be the same as A. The obvious method is to copy A onto B.

Now imagine that the two files are on machines connected by a slow communications link, for example a dial up IP link. If A is large, copying A onto B will be slow. To make it faster you could compress A before sending it, but that will usually only gain a factor of 2 to 4.

Now assume that A and B are quite similar, perhaps both derived from the same original file. To really speed things up you would need to take advantage of this similarity. A common method is to send just the differences between A and B down the link and then use this list of differences to reconstruct the file.

The problem is that the normal methods for creating a set of differences between two files rely on being able to read both files. Thus they require that both files are available beforehand at one end of the link. If they are not both available on the same machine, these algorithms cannot be used (once you had copied the file over, you wouldn't need the differences). This is the problem that rsync addresses.

This may sound a bit abstract, but it's a problem that occurs all over:

  • Keeping large software mirrors in sync
  • Incremental backups
  • Web caches (consider a university/company where everyone is loading up CNN.com: the home page is almost (but not quite) the same for each person.)

Eric Raymond famously opined that "Every good work of software starts by scratching a developer's personal itch." That's certainly true in Andrew Tridgell's case: Andrew lives in Australia, and he describes Internet connectivity from there as "very high latency, very low bandwidth link... a typical Internet link, at least if you're in Australia. So, a piece of wet string, a really pathetic link.." That description comes from a transcript of an absolutely splendid talk Andrew gave at the Ottawa Linux Symposium in 2000.

The rsync algorithm is embodied in the rsync tool, which is available under Unix and Windows (if you're using Cygwin). I've recently been playing with rsync to back up the core files from my home Linux box to another machine. The basic idea is to first make a copy of the files on the other machine, and then run rsync periodically to keep up with any changes. The most striking example occurred when I uploaded some files to a machine I have access to out on the Internet. I have a DSL connection. Upload speeds max out at about 200 kbits/sec. (Downloads go at about 1 Mbit/sec.) When I did the initial copy, it took me about 6.5 hours to upload 1.5 gigabytes of data. After another day or so, I used rsync to update. I had changed some rather large email files in the interim. rsync's stats claimed that 33 files had changed. If I'd had to recopy those files, I would have transferred 176 Mbytes. But rsync was able to sync up those files by transferring only 2.4 megabytes of data, and did the update in 4 minutes 44 seconds. Wonderful!
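A quick back-of-the-envelope check on those numbers, as a rough sketch only. It assumes decimal megabytes and takes the approximate 200 kbits/sec upload ceiling at face value; the variable names are just for illustration.

```python
# Figures quoted above. Decimal units (1 Mbyte = 1,000,000 bytes) are an assumption.
changed_bytes = 176_000_000    # total size of the 33 changed files
sent_bytes = 2_400_000         # what rsync actually transferred
uplink_bits_per_sec = 200_000  # approximate DSL upload ceiling

# How much less data went over the wire than a full recopy would need.
savings_factor = changed_bytes / sent_bytes

# Time to recopy the changed files whole if the link ran at that ceiling.
full_copy_minutes = changed_bytes * 8 / uplink_bits_per_sec / 60

print(f"rsync sent about 1/{savings_factor:.0f} of the data")
print(f"a full recopy would take roughly {full_copy_minutes:.0f} minutes")
```

Under those assumptions, a full recopy would have tied up the link for something like two hours, against the 4 minutes 44 seconds the rsync update took.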

If you're interested in these sorts of things, Tridgell's rsync paper is excellent. The talk is great fun as well; Tridgell leads you down some of the false alleys he went down while trying to develop rsync.
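For a rough feel of the trick, here is a minimal Python sketch of the general idea: the side holding the old file sends per-block checksums, and the side holding the new file slides a window over it with a cheap rolling checksum, sending only block references plus the bytes that don't match. This is an illustration under stated assumptions, not how the rsync tool itself is written. The function names, the tiny 4-byte block size, and the use of MD5 as the strong checksum are all stand-ins chosen for this example.

```python
import hashlib

BLOCK = 4        # tiny block size for illustration only (an assumption)
MOD = 1 << 16


def weak(data):
    """Cheap checksum as two 16-bit sums (a, b) that can be rolled."""
    a = sum(data) % MOD
    b = sum((len(data) - i) * byte for i, byte in enumerate(data)) % MOD
    return a, b


def roll(a, b, out_byte, in_byte, size):
    """Slide the window one byte: drop out_byte, take in in_byte."""
    a = (a - out_byte + in_byte) % MOD
    b = (b - size * out_byte + a) % MOD
    return a, b


def signatures(old):
    """Side with the old file: checksum each full block."""
    table = {}
    for start in range(0, len(old) - BLOCK + 1, BLOCK):
        block = old[start:start + BLOCK]
        entry = (hashlib.md5(block).digest(), start // BLOCK)
        table.setdefault(weak(block), []).append(entry)
    return table


def delta(new, table):
    """Side with the new file: literal bytes plus references to old blocks."""
    ops, literal, pos = [], bytearray(), 0
    csum = weak(new[:BLOCK]) if len(new) >= BLOCK else None
    while pos + BLOCK <= len(new):
        window = new[pos:pos + BLOCK]
        # Only pay for the strong checksum when the weak one already matches.
        strong = hashlib.md5(window).digest() if csum in table else None
        match = next((n for s, n in table.get(csum, []) if s == strong), None)
        if match is not None:
            if literal:
                ops.append(bytes(literal))
                literal = bytearray()
            ops.append(match)  # an int means "copy old block number n"
            pos += BLOCK
            if pos + BLOCK <= len(new):
                csum = weak(new[pos:pos + BLOCK])
        else:
            literal.append(new[pos])
            if pos + BLOCK < len(new):
                csum = roll(*csum, new[pos], new[pos + BLOCK], BLOCK)
            pos += 1
    literal.extend(new[pos:])  # whatever is left is shorter than one block
    if literal:
        ops.append(bytes(literal))
    return ops


def patch(old, ops):
    """Side with the old file: rebuild the new file from the delta."""
    out = bytearray()
    for op in ops:
        if isinstance(op, int):
            out += old[op * BLOCK:(op + 1) * BLOCK]
        else:
            out += op
    return bytes(out)


old = b"the quick brown fox jumps over the lazy dog"
new = b"the quick red fox jumps over the very lazy dog"
ops = delta(new, signatures(old))
literal_bytes = sum(len(op) for op in ops if not isinstance(op, int))
print(f"{literal_bytes} literal bytes sent for a {len(new)}-byte file")
```

The point of the rolling checksum is that moving the window by one byte costs a couple of additions rather than a full recomputation, so the new file can be checked at every byte offset. That is what lets matches be found even after an insertion has shifted everything that follows.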

April 14, 2003

A great thing: hotels with ethernet

Hotels with free DSL are wonderful. I spent the night at the Hilton Garden Inn in Arcadia, CA; I'd read that the hotel had Internet access, but it was a pleasant surprise to find out it was free. Speeds are pretty good, too - a DSL speed test last night said I was getting 975 kbits/sec both down and back up to the net. That upload speed is impressive; my Bellsouth ADSL service gives me something like 1.2 Mbits/sec down, but only 256 kbits/sec back up. (If this weblog isn't responsive, that's why.)

After I check out, I guess I could also go out and find a Starbucks with T-Mobile's WiFi service. Looking at their site this morning, they do have a pay-as-you-go plan at $.10/minute. That's not too bad, but they have a 60 minute minimum, meaning it's at least $6 every time you log in. They also have a $50 pre-pay plan with no minimum, but the minutes expire after 120 days. That's not so good for me. I don't drink coffee, so I can't see spending 8.3 hours in coffee shops with my laptop over four months.
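The arithmetic behind those two figures, as a small sketch. It assumes the pre-pay plan is billed at the same 10 cents a minute as pay-as-you-go, which is what the 8.3-hour figure implies; the variable names are only for illustration.

```python
# Prices as quoted above, kept in cents to avoid floating-point rounding.
per_minute_cents = 10
minimum_minutes = 60
prepay_cents = 5000  # assumed to be billed at the same per-minute rate

min_session_dollars = per_minute_cents * minimum_minutes / 100
prepay_hours = prepay_cents / per_minute_cents / 60

print(f"minimum per login: ${min_session_dollars:.2f}")
print(f"pre-pay buys about {prepay_hours:.1f} hours")
```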

Pet peeve: flight attendants who give the wrong time

Here's a pet peeve: flight attendants who come over the PA system to tell you the local time in the city you're landing in, but then get the time wrong by a couple of minutes. Ok, so that's pretty picky, but if you're telling 100 people what time it is, shouldn't you at least reset your watch once in a while? Coming into Phoenix and then Los Angeles yesterday, on both legs they got the time wrong by four minutes. Not much, but when you're trying to make a tight connection, you want to know how much time you've got.

There really is a blue penguin

There actually is a species of penguin known as the blue penguin. The New Zealand Penguins site has a page on blue penguins, and claims they are the world's smallest penguins, standing about 25cm (~ 10") tall. The page also has a button to hear the call of the blue penguin.

All quite curious. I didn't pick my domain name because of any great interest in penguins; I just liked how it sounds.

April 11, 2003

Blissful ignorance: Radio's lack of logs hides my sins

Paul Beard will no doubt mutter the West Coast equivalent of "duh!" when he reads this, but since I moved this weblog to my own box and can look at the web server logs, I'm discovering a number of bad links in items I've published over the last year. Not dead links - links that were once valid, but no longer are - but links that were incorrectly constructed in the first place.

When I was using Radio and hosting with Userland, I never saw the logs, and I was blissfully ignorant.

Of course, it was my fault that these links were bad in the first place; I published the items but never bothered to check them after they were published. As my friend John Sampson says, "Your gun, your bullet, your foot."

I know there are things that aren't right yet on this site. Part of that is caused by the way I moved over from Userland: I used wget to fetch a copy of all my pages from Userland (attempting to fix local links along the way), and published a copy of those pages on my new site. I then told Radio to add a Meta Refresh tag to all my pages at Userland, directing them to the same URL at my new site. The result is that people who hit my old Radio pages see the same pages on the new box. But as I've pointed out, there were problems with some of the links originally, and I've added to them with the move.
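For anyone who hasn't seen one, a Meta Refresh tag is just a line in the page's head that tells the browser to load another URL. Here is a small sketch of building one in Python. The helper name is hypothetical, the example URL is only illustrative, and this is not a record of what Radio actually generates.

```python
from html import escape


def meta_refresh(target_url, delay_seconds=0):
    """Build a Meta Refresh tag that sends the browser to target_url."""
    content = f"{delay_seconds}; url={target_url}"
    return f'<meta http-equiv="refresh" content="{escape(content, quote=True)}">'


# Illustrative target only: an old-style path served from the new hostname.
tag = meta_refresh("http://weblog.bluepenguin.us/0106188/2003/01/29.html")
print(tag)
```

A delay of 0 sends the visitor on immediately. Unlike a server-side redirect, the old page still has to be fetched first, which is part of why real HTTP redirects are the better long-term fix.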

What I really want is for the old URLs to work if people change radio.weblogs.com to weblog.bluepenguin.us in the URL. Over time, the next step will be to put redirects in place, redirecting from the Radio-formatted pages in the old paths to the same content published from Movable Type. If I'm willing to dig into mod_rewrite again, I can probably make it all work in a couple of lines in my httpd.conf.

April 9, 2003

I can't help myself (cat pictures)

I know I really shouldn't bother posting this ... but I just can't help it. Any reference to an album full of pictures of real cats (wild and not-so-wild) has to get a mention from me. Phil Windley writes:

To combat my boredom, I started playing with these new things called web servers. We set one up at http://lal.cs.byu.edu. We even set up a web site for the University (without any explicit permission) and we'd teaching professors and grad students from all over campus about HTML, browsers, and web servers. One of the most popular things we did was called the LAL Cat Archive.

Because all the machines in my lab were named after cats (like panther, jaguar, etc.) one of my grad students, Kelly Hall, started to collect pictures of cats from various places. For a time, this collection was the most popular content at BYU and even got a mention in Newsweek magazine. Eventually, we had to take it down because of the bandwidth drain, but the pictures have been preserved and I've made some of them available in my photo gallery.

I've taken a few cat pictures, but nothing like the hundreds of pictures on Phil's site.

April 8, 2003

Insightful thoughts on weblogs and KM from Jim McGee

Jim McGee of McGee's Musings has had a run of great posts lately, mostly around the area of knowledge management and web logs. Rather than repeat everything he has to say, I'll point to some of the things I liked best of late:

Investing in knowledge sharing - starting on the weblog learning curve
Weblogs are only the latest in a long line of tools aimed at getting people to work together. Touches near a favorite point of mine: almost any tool will work for some.
Knowledge work, weblogs, and fair process
Pointer to and comment on a Harvard Business Review article by Chan and Mauborgne with a very telling premise: "employees will commit to a manager's decision--even one they disagree with--if they believe that the process the manager used to make the decision was fair."
Thinking in public, part 2
A reader suggests that what we need is tools for "thinking together". McGee suggests that this is too big a step: thinking in public is hard, and thinking collaboratively is hard. "Thinking together" implies both, and that's too big a hill for most.

Thoughtful stuff. Along the lines of knowing who you're listening to, McGee's bios suggest that he's been at this a while. (His reference to The Network Nation was enough to convince me.)

GoDaddy - get your domains cheap!

Back in January when I finally decided on a domain name for my own use, I used Dotster to register it; $8.95 to transfer an already existing domain. Dotster seemed like a good deal; $12.95 for a new registration. This last weekend I had to register three domains for my wife. I was ready to turn to Dotster again, but she said she'd already done some research, and thought GoDaddy was cheaper. Sure enough, GoDaddy will register a .com domain for $8.95/year. I registered three domains with no particular hassle. Seems like a good deal.

GeoURL - see who's near you

I've added a GeoURL button on my page. If you put a couple of meta tags in your page and get GeoURL to ping you, you can add a button to your page that will list sites that are geographically close to you. Fun!

My list includes at least one site I know: b.cognosco, Terry Frazier's site. It doesn't include Nicest of the Damned, and I'm pretty sure Frank Steele lives within a few miles, so he has to get on the stick.

April 11: Poem In Your Pocket Day

April 11 is Poem In Your Pocket Day. The New York Times has been running a series of ads to promote National Poetry Month. The ads suggest that you keep a poem with you on Friday, April 11th, and share it with others throughout the day.

While looking through our bookshelves for something to read, I came across a wonderful book called Committed to Memory: 100 Best Poems to Memorize. It's full of poems that beg to be read aloud. I think I'll find a poem from this book for my pocket.

(Thanks to Dan Lobby for posting the NYT ad on his door.)

April 7, 2003

Google says 0xDECAFBAD likes me ..

I got a new domain back in January. I've been meaning to move this blog, but I've been lazy about moving off Userland's web hosting. Userland's recent email telling me I had until May 1 to send them $40 gave me the incentive I needed to get this done.

Google knows nothing about my new site, but it knows lots about my old site. Since the best way to get noticed by Google is to have other people point to you, I'm using Google to find out who's linked to me. Doing an Advanced Search looking for who links to radio.weblogs.com/0106188 gets the job done. But if you take a look at the list of pages Google claims link to me, you'll notice something interesting: Les Orchard's 0xDECAFBAD seems to be inordinately fond of me. Of the 276 links Google claims to find, 0xDECAFBAD seems to represent at least 80% of them. But if you go look at any of those pages from 0xDECAFBAD that Google says point to me, you won't find me mentioned anywhere. What gives?

The answer lies in looking at the cached page for each one of those. In Les's old site design, he had a blogroll on every page, and I was part of that list. That's flattering, but it makes the move kind of difficult: since 0xDECAFBAD represents so many links to my old site, Google might prefer the old site over the new.

And here's where my procrastination pays off. If I don't renew with Userland, the old Radio-hosted content should vanish at some point. When it does, Google will ignore all those old links to me. I hope.

This weblog has moved (the other side)

This weblog has moved to a new, permanent location: weblog.bluepenguin.us.

I have two new RSS feeds: an RSS 2.0 feed with the full contents of each post, and an RSS 1.0 feed with abbreviated posts.

Important note: the old location where my weblog was hosted (radio.weblogs.com/0106188/) will disappear on May 1st, 2003 when my annual hosting agreement with Userland expires. Please update to my new site ASAP.

I've moved all my content over to this new site, where it appears in somewhat different locations. However, to make it easy to preserve links, you can also use the old URLs (with the new hostname, of course) at my new site. For example, my essay on VMware which was at http://radio.weblogs.com/0106188/stories/2002/10/08/vmwareMyNewBestFriend.html will also be found at http://weblog.bluepenguin.us/0106188/stories/2002/10/08/vmwareMyNewBestFriend.html.
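If you have a pile of old links to fix, the hostname swap is easy to script. A minimal sketch in Python, assuming every old link uses exactly the radio.weblogs.com hostname; the function name is made up for this example.

```python
from urllib.parse import urlsplit, urlunsplit

OLD_HOST = "radio.weblogs.com"
NEW_HOST = "weblog.bluepenguin.us"


def new_location(old_url):
    """Swap the old hostname for the new one, keeping the rest of the URL."""
    parts = urlsplit(old_url)
    if parts.netloc != OLD_HOST:
        return old_url  # not one of the old pages, so leave it alone
    return urlunsplit(parts._replace(netloc=NEW_HOST))


moved = new_location(
    "http://radio.weblogs.com/0106188/stories/2002/10/08/vmwareMyNewBestFriend.html"
)
print(moved)
```

Parsing the URL rather than doing a plain string replace means only the hostname is touched, so a path or query string that happens to contain the old hostname is left alone.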

Uh, who writes that weblog?

Lots of weblogs have catchy names that have stuck in my head: a klog apart, In My Experience, slam, b.cognosco (formerly Blunt Force Trauma - a great name!), and so on. But with the exception of blogs run by old colleagues (quotidian and Nicest of the Damned), I find it hard to remember the names of the people who run these blogs. Jon's Radio (now titled Jon Udell's Weblog) works, because I can remember it's Jon Udell. This lack of connection to the person's name is unfortunate, because one of the draws of weblogs is attaching a name to all the elements of a website - the written style, the look and feel, the essence of the person that comes out in what and how they write.

I toyed briefly with calling this weblog From the Blue Penguin. That's cute, but I'm not a blue penguin, and although I like them well enough, penguins have no real fascination for me.

I'm going to punt. If it's good enough for Jon Udell, Paul Holbrook's Weblog is good enough for me.

Oh, and just to give myself a shot at remembering these: "a klog apart" is Phil Wolff, "in my experience" is Daniel Kapusta, "slam" is Marc Barrot, and "b.cognosco" is Terry Frazier. (Actually, I did remember Terry, but that's almost certainly because we've met in person.)

A new weblog (sort of)

I've been threatening to move my weblog off Userland Radio ever since I got my own domain name. The imminent May 1 expiration of my Radio hosting pushed me into taking the plunge. Somewhat less than three weeks isn't long to let the changes propagate around, but so be it.

The subtitle of my Radio blog was Worth $40 a year? You decide ... It was certainly worth $40 to get a jump-start into blogging, but both the limitations of Userland's tool and the inflexibility of hosting on their server (no logs!) pushed me in this direction.

I'm moving over to Movable Type. I've been using MT at work since last summer, and I've been very pleased with it, so this will feel more comfortable to me than Radio. And using Radio was always dependent on being able to have it up and running on my home PC.

I've done my best to preserve all the old content. I've moved all the content into the new structure, and I've also copied the Userland pages into the same URL structure they had before (e.g. /0106188/2003/01/29.html). Over time, I'll replace those old references with HTTP redirects to the newer version of the content.