Internal borders

Created: 11 Mar 2010

I’ve had this idea in my head for some time that the way people run IT organisations is wrong: they’re too fragmented into subject-specific areas. Then the DevOps guys came along and started encouraging developers to work with ops people, which is a start. But I’m not satisfied with dev-ops collaboration; I want ops-ops collaboration, and I had a good old rant to the Build Doctor about it over an ale or two. Kris Buytaert followed up with a blog post describing some of the tensions he sees.

Splitting your operations people up into teams of DBAs, Systems Administrators, Network Engineers, Storage Engineers and all the other ops disciplines is probably causing you pain. You probably don’t even know that it’s a sickness; you probably just think the symptoms are part of everyday life in IT: glacial progress on projects, slow issue resolution and a lack of interest in the business goals.

The first symptom is that getting some things done seems to require enormous willpower and dedication. These are simple things that happen to need work from multiple teams. There’s a delay each time the task crosses the internal border into another team and they try to understand the request. Sometimes people forget what the process is and the request doesn’t get routed correctly to the next team in the chain. It’s unlikely that any of the teams has the inclination or the understanding to check that what the customer wanted has actually been done. What each team checks is that its own tiny piece has been done.

It seems as if every time work crosses an internal border between teams there’s a latency cost while you wait for them to do their thing. There’s also a cost of doing business with your colleagues, something familiar to anyone who has had to fill out numerous mandatory fields in a form to ask someone sitting a few feet away to make a relatively insignificant change. The more team boundaries you cross, the more these delays and costs mount up.

The second symptom is that operational issues can take a long time to resolve. In my experience this is particularly true of performance problems. Maybe your users are complaining about a report taking minutes to complete, so you pass the ticket on to your application support people, and they say no, there’s nothing wrong with the application, maybe it’s the database? So the problem baton is passed to the DBAs, who have a look at the database, declare that it’s OK and ask the network guys to have a look. The network guys mutter about 5% utilisation or something, say the network is fine and pass it to the sys admins, who mutter something about 20% CPU usage and say everything is fine. And now, probably a day or two later, you have a bunch of technical people who have checked that their personal fiefdoms are fine and a bunch of angry users who still have reports that take too long to run.

Of course, all these experts could be right, but it doesn’t matter, because optimising the network, the databases, the storage or the servers in isolation is absolutely useless. The report could be running slowly because there are a few extra milliseconds of latency between the server and the database, and the report makes 10,000 database queries that return a lot of data, so those milliseconds add up to a long delay. Unless someone who understands the whole technology stack sits down and works out what is going on, your users are going to have to put up with it.
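To make that arithmetic concrete, here is a back-of-the-envelope sketch. The 3 ms figure is an assumption chosen purely for illustration (the scenario above only says "a few milliseconds"), and the calculation assumes the queries run one after another, each paying the extra latency once.

```python
# Illustrative figures only: 10,000 queries comes from the scenario above;
# the 3 ms of extra latency per query is an assumed value.
EXTRA_LATENCY_MS = 3.0
QUERY_COUNT = 10_000


def added_delay_seconds(queries: int, extra_latency_ms: float) -> float:
    """Extra wall-clock time if every query pays the extra latency once,
    assuming the queries run sequentially (no batching or parallelism)."""
    return queries * extra_latency_ms / 1000.0


if __name__ == "__main__":
    delay = added_delay_seconds(QUERY_COUNT, EXTRA_LATENCY_MS)
    print(f"{QUERY_COUNT} queries x {EXTRA_LATENCY_MS} ms = {delay:.0f} s extra")
```

A per-query delay that small is invisible to someone looking only at network utilisation or CPU usage; it only becomes visible when you multiply it by the number of round trips the application makes.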

The third symptom is that none of your operations staff seem to actually care what the business wants to achieve. The human mind is odd. As soon as you put people in discipline-specific teams you seem to be sending them a subtle message that their job is not to meet business needs, but to look after some arbitrary technical resource. Don’t, therefore, be surprised if the DBAs care more about databases than the business goals. By putting them into the “DBA team” you’re telling them that their job is to look after databases. By implication, the business goals are secondary.

Putting people into a bunch of specialist teams doesn’t seem to be the right way to do things. There has to be a better way.