I recently switched employers (four months ago), and am already switching again. This is in no small part due to how my current employer manages problems. Every single issue is a crisis, because every single issue causes mass confusion. Well, mass confusion is probably a little harsh, but let me give you an example:
1) Conference call/Phone call
2) Sametime Meeting/Sametime side chat
3) Lotus Notes email - Multiple Recipients (tech pages) / Single Recipient
4) Lotus Databases (too many to count)
5) Lotus Universal Transmittal System (yes, ticketing systems convey information)
6) ServiceCenter 5 & 6
7) Shift Turnover (in Lotus Notes)
8) Web pages (one per group: http://ssg http://ots http://ccg etc.)
9) Run Books/Word documents/Excel spreadsheets such as contingency test plans and server locations
10) Meetings/Talking person to person

There are more, but if these aren't enough to make my point, then more won't help. Furthermore, the majority of these communication channels are "dirty", which means that a significant percentage of bad information gets conveyed along with the good, and there is no mechanism to correct it after the fact. There are websites and run books over 6 years old with no date on them anywhere, so the reader can't even tell whether the information is stale. So when bad information is conveyed, people start picking up the phone and calling to get clarification. The result in a large organization is that the majority of the time is spent trying to figure out what needs to be done, or even what is going on. Even the simplest problems quickly escalate to incidents, as more and more people are contacted through one of these narrowcast/broadcast mediums until the one person who knows what is going on and how to resolve it is contacted, and fixes it.

Does this sound familiar? Well that's great, but what can be done to fix it? They've been doing things this way so long they're used to it.

1) Use a Wiki. All documentation that needs to stay around more than a week (Web pages, Run Books, server layouts, etc.) needs to be in a wiki. Why? Because this information will need to be updated (really, it will) and when readers find an error, they need to be able to annotate the documentation as needed.
Having the reader tell their manager to tell the author's manager to tell the author is insane. With a wiki, they can fix the error "on the fly", and the author can be notified (by email) that someone has edited his or her page, and review it without getting 2 other people involved. Wikis also keep track of the complete history of a document and tell you how old the document is, and who made what changes in every revision. As a result, your documentation gets better over time, and you decrease the number of people awakened in the middle of the night to clarify bad documentation. It is also a one-stop searchable repository for your documentation.

2) Stop using email groups and set up a listserv. Email is a great collaborative tool, but it fragments your conversations into dozens or more small inboxes all over your network. With Lotus Notes, you can't even sort by subject or do a global search of the contents of the disparate folders, and I can't search your email for something I might have removed. A listserv will pool all of the emails on a given problem and archive them, and they can be indexed for searching by everyone. This will immediately cut down on "clarification calls", and anything covered in the list that needs to be more permanent can be added to the wiki. Entire list threads can be added to the wiki. You will never have to solve the same problem twice.

3) Lose Sametime and set up an IRC server. Sametime has to be one of the worst chat clients ever produced. Anytime you have to message someone to ask them to invite you into a chat, you should realize something is wrong. Every time you accidentally hit <esc> and it closes your window without prompting, you should realize something is wrong. But those annoyances aside, there is a reason IRC has been around as long as it has: It's a great utility. You can create chats on the fly, so you can even have one chat per problem ticket, log the chat (automatically), and save its contents to the wiki.
Don't even get me started on the usage of bots. They are very useful. And users can use the IRC client of their choice to connect. Believe me when I tell you that user customization increases efficiency; that's a whole rant in itself. But being able to log the chats lets you go back and see "how did we solve that problem the last time?" and then, yes, you guessed it, add it to the wiki.

4) Get a ticketing system with an open API. If you haven't figured it out yet, the theme of this rant is to unify your documentation. Having a ticketing system that makes you load a fat client to get information out of it is ridiculous. You should be able to link to open and closed tickets in a browser; then you can reference them in the wiki, and your tickets aren't just thrown out after they are implemented/resolved. They instead become a living part of your documentation system. An open standards web-based ticketing system is your best bet. Something like bugzilla or trac (which has a wiki built in) would work nicely.

5) Stop using the phone. As for the phone calls, it would be nice if those would go away altogether. They can't be logged, searched later or used to update your documentation after the fact unless someone is transcribing them. Phone calls are a horrible problem management tool. But if effort is spent unifying the wiki/listserv/IRC into a single collaborative tool, you can minimize your need for phone calls. Maybe a system for logging phone calls could be developed using a tool like Asterisk, which would also give you VoIP capabilities. The entire conversation could be replayed and even time-stretched and frequency-compensated like in MythTV. The technology exists, but isn't ready for prime time like the wiki/listserv/IRC are.

If you think of your organization as a large distributed parallel computer system, with each person as a processor, then the wiki becomes your "shared memory" and every "processor" can read and write to it.
It will decrease the latency in propagating *accurate* information out to different nodes, thereby increasing accuracy and efficiency. Or is that too nerdy of a metaphor?

Suggested Reading: "The Mythical Man-Month" -- Frederick P. Brooks
Your network is out of control, or more accurately, at the edge of out of control. That may seem a little harsh, but allow me to explain. First, let's address the term "control". It has been defined as "the power to direct or determine." In a system, it would mean the ability to know (determine) what state your system is in, and the ability (power) to move (direct) that system to another state. In this context, "system" does not mean a single computer system (although it could); instead, we want to think of all of the systems on the network as a single system.
If the heating and cooling people were in different groups, then the heating instructions could be in one manual (or web site) and the cooling instructions in/on another.

    dn: cn=file01,ou=Hosts,dc=yourdomain,dc=com
    objectClass: top
    objectClass: ipHost
    objectClass: device
    objectClass: customhost
    descCpu: 1
    descRam: 524210176
    operatingSystem: Solaris 8
    uniqueId: a8c0e73d
    ipHostNumber: 192.168.1.231
    hardwarePlatform: sun4u
    filesystem: /opt/iplanet WEB
    filesystem: /opt/sybase/SYBSOMETHING DBA
    Service: TCP:8080 WEB
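As a rough sketch of how an entry like the one above could answer "who owns this?", here is a small Python example. It is only an illustration under assumptions: it parses the entry text directly rather than querying a real LDAP server, it assumes the last word of each filesystem and Service value is the owning group, and the function names (parse_entry, owners, owner_of) are made up for this example.

```python
# Sketch only: works on the LDIF-style text itself, not a live directory.
# Assumption: the trailing word of each filesystem/Service value is the owner.

HOST_ENTRY = """\
dn: cn=file01,ou=Hosts,dc=yourdomain,dc=com
objectClass: top
objectClass: ipHost
objectClass: device
objectClass: customhost
filesystem: /opt/iplanet WEB
filesystem: /opt/sybase/SYBSOMETHING DBA
Service: TCP:8080 WEB
"""


def parse_entry(text):
    """Return {attribute name (lowercased): [values]} for one entry."""
    entry = {}
    for line in text.splitlines():
        if not line.strip():
            continue
        name, _, value = line.partition(":")
        entry.setdefault(name.strip().lower(), []).append(value.strip())
    return entry


def owners(entry):
    """Map each filesystem path or service to its owning group."""
    result = {}
    for attr in ("filesystem", "service"):
        for value in entry.get(attr, []):
            resource, _, group = value.rpartition(" ")
            result[resource] = group
    return result


def owner_of(entry, path):
    """Return the group owning `path` (exact match or longest parent), or None."""
    best = None
    for resource, group in owners(entry).items():
        matches = path == resource or path.startswith(resource.rstrip("/") + "/")
        if matches and (best is None or len(resource) > len(best[0])):
            best = (resource, group)
    return best[1] if best else None
```

With this, owner_of(parse_entry(HOST_ENTRY), "/opt/sybase/SYBSOMETHING/data") would point a failed job under that path at the DBA group. A real tool would pull the entry from the directory instead of a string.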
Now if a problem comes in involving file01, such as a failed maestro/cron/at job or a refused connection, we can use an LDAP search utility to find out whether the problem is owned by the DBA group or the WEB group, depending on which host or file system is in question. Furthermore, there are several utilities that can access and update the information in an LDAP directory, so tools can be written (quickly) to index and search the information.

    dn: cn=weblogic, ou=Sensor,dc=yourdomain,dc=com
    objectClass: sensor
    objectClass: actuator
    host: file01
    descSensor: Weblogic Threads
    cmdSensor: /bin/ps -ef | /bin/grep "wls" | wc -l
    valueSensor: 3
    descActuator: Bounce Weblogic
    cmdActuator: rsh ${HOST} /etc/init.d/weblogic restart
    tryActuator: 5
    delayTry: 60
    ownerEmail: james
This way, there is a clear definition of what should be running where, and how to correct it, how many times to try, how long to wait before attempting another restart, and what group/user to contact if it fails all attempts. This is only an example. How much information we tie to a sensor and/or actuator is completely arbitrary. And this is really going above and beyond the scope of identity management. This would be the desired endgame to completely streamline problem resolution, if not completely automate it.
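To show how little code it would take to act on a sensor/actuator entry like that, here is a minimal Python sketch. It rests on a few assumptions of mine: valueSensor is the reading the sensor should return when things are healthy, delayTry is in seconds, ${HOST} is filled in from the host attribute, and "contacting" the owner is just a callback. The names run_shell and check_and_repair are invented for the example.

```python
import subprocess
import time

# The sensor/actuator attributes from the entry, as a plain dict.
SENSOR = {
    "host": "file01",
    "descSensor": "Weblogic Threads",
    "cmdSensor": '/bin/ps -ef | /bin/grep "wls" | wc -l',
    "valueSensor": "3",
    "descActuator": "Bounce Weblogic",
    "cmdActuator": "rsh ${HOST} /etc/init.d/weblogic restart",
    "tryActuator": 5,
    "delayTry": 60,
    "ownerEmail": "james",
}


def run_shell(command):
    """Run a shell command and return its stdout with whitespace stripped."""
    done = subprocess.run(command, shell=True, capture_output=True, text=True)
    return done.stdout.strip()


def check_and_repair(sensor, run=run_shell, sleep=time.sleep, notify=print):
    """Return True if the sensor reads the expected value, retrying the
    actuator up to tryActuator times; notify the owner and return False
    if every attempt fails."""

    def healthy():
        # Assumption: healthy means the reading equals valueSensor exactly.
        return run(sensor["cmdSensor"]) == str(sensor["valueSensor"])

    if healthy():
        return True

    actuator = sensor["cmdActuator"].replace("${HOST}", sensor["host"])
    for _ in range(int(sensor["tryActuator"])):
        run(actuator)
        sleep(int(sensor["delayTry"]))  # give the service time to come back
        if healthy():
            return True

    notify(
        f"{sensor['descActuator']} failed after {sensor['tryActuator']} "
        f"tries; contact {sensor['ownerEmail']}"
    )
    return False
```

The run, sleep and notify parameters are there so the loop can be exercised without touching a real host; in practice notify would send mail or page the owning group.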
Some of you might not have heard, I wrecked the pod.
It didn't really occur to me when I got the pod that there might be hidden advantages to effectively camping all year. Well, there are disadvantages as well. Winter pretty much sucks, especially when you lose the injector to your primary propane heater, followed by your secondary. Luckily there is a non-vented tertiary heater that can be used for short periods of time without fear of asphyxiation. It was enough to heat up the pod so the ceramic heaters (electric) could maintain the temperature. But now spring is here, which means I can open the pod windows (hatches) and enjoy the weather. Of course the majority of my neighbors are on vacation or retired, so it's pretty much a festive atmosphere year-round. But now that spring is here, it's even more so. So I work a late day today, and I get home and sit down at the console and prepare to make sure the snapshots have run properly. Before I know it, it's 9 PM and I realize I haven't eaten. I get up to make some soup (I don't really cook much. It's a waste of good system time) and one of my neighbors brings me a tenderloin and peas. They were having a cookout. This wasn't the first time this has happened either. It was pretty awesome. I mean, how often do perfect strangers come by and bring you a steak? I mean c'mon, free steak.
Well, it has been over six months since I last updated the site, so I figured it's time. By now I'm sure no one is even looking at the site anymore, having long quit caring. That makes now the perfect time to update it. I have been contracting as a Unix Systems Admin for CAT Financial in Nashville, TN since July, and they have been keeping me pretty busy. Le Charget is having bodywork done, which means spending more money on it, and it is technically "inoperable": with no paint anywhere, no vinyl on the roof, and the bumpers and trim missing, it isn't much to drive. Pod Life has become pretty routine. I am now referred to as "The Pod Guy" around the office. So I registered the domain and one of my co-workers made me a logo. Apparently I look a great deal like Peter Griffin from Fox's Family Guy. Eh? What can you do? I took some more pod photos (at the bottom), mainly of the new MythTV/Powerfile jukebox system.
So I'm getting out of my car last night and just as I'm slamming my car door, my phone slips. It manages to fall between the car door and the door frame, just in time for the door to close. I hear a *crunch*. Well, I get bad service where the pod is located, so I figure why not try to switch from TDMA to GSM. It's not like the cell phone companies are expanding the TDMA networks anyway. So I can either buy a new phone, or process it as a relocation and get a discount. Being that I am out of work, discounts are good. So $150 later I have a spiffy new flip-phone and a shiny new number. (256) 566-5866. I will try and contact as many people as I can, but I have to start with *EVERY DAMN JOB SITE ON THE WEB*. But I'll rant about them a little bit later. Keep in mind that I probably don't have your number if I don't call you once a week, as all of that information is in my dead phone.
Well, in my own little personal hell, that is. I called the mechanic to see what the status is on Le Charget. I put it in the shop on Feb 25th, so I am happy to hear that it's ready to go. I'm not so happy to hear that one of the mechanics wrecked it into something. So I get a ride (3 hours) down to Montgomery. Now three hours doesn't seem like a long time, but when all you can think about is how badly your car has been wrecked, it's an eternity. So I get there and the damage is pretty damn bad and I am not a happy camper. I need to know how much the damage is, so I drive to get a damage appraisal. Here's an interesting fact: Most body shops will not even talk to you if you have a vintage car. Who knew? The third body shop I tried directed me to a shop that specializes in classic muscle cars. I run out of gas on my way to this shop. Now it's important to note that the car had *at least* a half tank of gas when I brought it in to the shop. I was in the mechanized infantry for 4 years and am permanently mentally scarred to the point of being paranoid when it comes to running out of "class 3". As a result, I never have less than a half tank of gas. I also feel it is important to note that there is a $21 GAS itemization on my bill. This means they went through *at least* 25 gallons of gasoline while my car was in their possession. That could be another rant. I "top the car off", and the paranoia subsides. I make my way to the only place in town where I can get a quote: $1,759.50 + $325 for the driver's-side floor carpet the mechanics tore. The chrome on the bumper has a groove in it, so it needs to be re-chromed or replaced. The pictures tell the rest of the story. Now on the way back to the mechanic's shop, I notice my amp meter is pegged. That means it's pushing more than 40 AMPs. Not volts, AMPs. Now I can run everything in my pod on 30 AMPs, if that is any indication of the severity of the problem. And the driver's side window will not go down.
This is important too, because I had brought the car to the shop, four months ago, to have the near-melted wiring harness replaced, and to get the windows working. So it took them four months and a little over $2000 to not fix my car, and wreck it, *twice*. But I'm not bitter. So I have to leave the car there, because if it ran 40+ amps for the entire 3 hour trip home, I would die in a fiery explosion. That's not high on my TODO list. But if this keeps up, I might raise it a few pegs. So Larry and I will be going down next week with the trailer, so we can bring the Charger back in its natural state, on the bed of a trailer in tow.
I took some photos of the pod. It still needs some work, but I'm pretty much living in it now. I plan to lock down the computers so they will not move during transit, but other than that it's ready to move. So should I find work anywhere, I can be moved and set up in about 48 hours. The computers are operational, but I'm currently leeching broadband from the parents. It's Starband, and as a UNIX guy who spends most of his time in a remote shell, I must say the latency is unbearable. There is up to three seconds of lag between keystrokes. So I type blind and backspace when there is an error. It's tedious. If it wasn't for openwebmail, I'd have to stop using email; pine is my primary client and it's unusable in a remote shell.
Well, I've got the public content back up. The bookshelf will be down until I get work or find someone willing to host it. My UML linode has limited space and such. I'll see what I can do to get it back up, though looking for work is my priority right now. I'm going to try to get some pod pictures up this weekend, if anyone cares. Mostly they will be provided to answer the "How can you live in a pod?" questions and the like.
Due to the financial situation at my previous place of employment, I am no longer employed. It will take me a little bit to get the content off of the old site, what with moving out of Montgomery and all. The server previously serving this site is powered down and in storage. I spend most of my day looking for work, and the weekends renovating the pod. If anyone knows someone in need of a systems administrator who pretty much eats, breathes and sleeps for this computer stuff, pass my resume along to them. Have pod, will travel. Thanks.
I was rousted out of bed at 5:30 this morning by the chirping of computer-lab crickets: every UPS in the house was beeping. I have a flashlight that I keep right next to my bed in the event of a power outage. I realized at this point that I had moved my flashlight so that I could see into a computer in the living room, grrr. So I look for my lighter, a Zippo I carry because I'm comforted by the fact that I can make fire, I guess. It's out of fuel, grarrrr. So the 7 candles I have at hand are now useless. Now I normally keep my lighter fuel on the counter near the kitchen sink, but I had moved it so that I might cut some plastic with string. Don't ask. Luckily I know I have another flashlight in Le Charget, so I head out to the garage, locate it, and it won't come on, *ugh*, so I toss the batteries and stumble around the kitchen until I find some fresh ones. No joy. The bulb is bad. *rage building* So I have to grope around in the dark in my workspace using my cell phone backlight for illumination, if you can call it that, until I find my lighter fluid. I manage to locate it and promptly refill the Zippo blind. Now it has plenty of fuel. It still won't light. The wick is scorched. Now I'm livid, and the UPSes are beeping like crazy. I manage to shut down most of them with ctrl-alt-del, wait for floppy light and power off. I locate my Leatherman and manage to pull the wick out, and it lights. I now have candlepower. And as soon as I light the first candle, I see my flashlight. I would go back to sleep, but I'm now furious. So I drive to work, only to realize I left my driver side rear window down last night, so I get to splash in the puddle that is my floorboard all the way to the office. At least it's not Monday.
Following my current involvement in a flame war on the MALU.org mailing list, I decided to type up the following for the simile-impaired. If I offended people on the list, then they probably needed to be offended. But religious devotion should not be a replacement for logic... even if it is for a cause as good as Linux advocacy.

PREMISE A: Any two objects that share a similar behavior or property can be said to be alike.
PREMISE B: (object one) The GPL applies itself to any code to which it is linked. (object two) A virus applies itself to any tissue to which it is conjoined.

The two lines above describe *similar* behavior between two objects (not exact, explicit behavior, but similar behavior). The GPL and a virus share a similar behavior.

Conclusion: The two objects, the GPL and a virus, can be said to be alike.
Therefore: The statement "The GPL is like a virus." is correct.

Neither consent, desire, nor the properties of the LGPL, Apache, or any other non-GPLed license, nor pro-opensource ideals have any effect on the premises or logic above. Note that usage of a simile does not mean that the GPL is a virus, just that it shares a property or behavior with one. If one of the premises above is incorrect, by all means, let me know. Hell, I'll even post an addendum right here. But don't email me if you cannot refute a premise above. Telling me how great the GPL is doesn't affect the logic above. I'm a big fan of the GPL. I use Linux on 7 of my 8 computers, including my primary workstation. But I am also capable of emotional detachment on this topic, which allows me to analyse the facts. Sad to say, this is not true for some of the people on the list.
Derek and I have discussed this more since the flamewar. The conclusion we've come to is that while it is true that the GPL can be said to behave like a virus, it is not the most apt analogy, and the negative stigma attached to the word 'virus' makes the statement inflammatory. It is just as apt (if not more so, depending on your point of view) to say, "The GPL is like Hershey's syrup swirling about in the delicious creamy milk that is your code." But both statements have a bias.
Well, I'm finally moving over to the new system: basically a bunch of homegrown mod_perl scripts, as slashcode has become so bloated as to be useless. It's too bad, I really liked the look and feel of Slashdot, but I felt I needed to get away from it. So expect everything to be broken for the next couple of weeks as I get organized. I don't have a lot of free time to migrate the functions, so I'll be doing it when I can. While the migration is going on, the old (and undoubtedly broken) site will be available at: http://www.jameswhite.org:8888 *note: this server will be going up and down*