Lessons from the Mainframe

Mainframes are a part of computing history that we don’t often think about in our modern world of mobile, social and cloud computing. But mainframes are still very much around, we owe a lot to this foundational technology, and valuable lessons can still be learnt from the disciplines that mainframes instilled in those who use(d) them.

The mainframe today is very much part of the overall computing landscape, as demonstrated by the second-quarter results posted by IBM, where mainframe sales increased by 11 percent while overall revenues dropped 3.3 percent [1]. So mainframes are still going strong.

But what can we learn from mainframes that might be relevant today? Mainframes were extremely expensive, technically complex, and you had to work with very limited computing resources. By way of example, the first mainframe I worked on was an IBM 4331 with 8 MB of memory, and this merrily ran the computing requirements for the company, costing several million rand at the time. But with such limited resources, a lot of attention was paid to optimisation and to making effective use of what you had.

As we architect our computer solutions, we can meet computing demand by scaling up (larger servers) or scaling out (many servers). It has been popular to use the scale-out approach as hardware is inexpensive and it is easier to just allocate more. And that is true. Provisioning another server is quick, and the actual running cost of one server is fairly insignificant.

Over time, this thinking has resulted in a plethora of servers and systems because, after all, you can just fire up another one. Then we came to realise that these server farms were cumbersome, and we borrowed some technology from the mainframes in the form of virtualisation (a technology mainframes have had since the early 1970s). This has been a fairly large focus in recent years, with vendors expounding the value of virtualisation, and we all chased the cost savings and management benefits. But we did not fundamentally challenge our consumptive behaviour. All we did was apply a technical solution to the problem to make it a bit more efficient.

My early computer career was as a systems programmer, where one of the most important functions was to performance tune and troubleshoot the system. Gluttonous consumption of resources by users or programmers was seriously frowned upon. A successful application was one which fulfilled functional requirements AND performed efficiently whilst consuming as few resources as possible. This is what I think we lack in today’s computing environment. We just buy more, and yes, the individual cost is not so high, but put all the systems together, with all the people required to run and manage them, and you end up with the large IT budgets of today.

This consumptive approach is rampant in the industry today. Buy any server-based application and, more often than not, it requires its own server. Ask the vendor if you can install it on an existing server with a bunch of other things and you will generally get the reaction of “er…no, we don’t have any customers who do that, it is meant to run on its own.”

There are so many times that I have seen significant savings achieved by just having one person look closely at usage patterns and drive optimisation efforts. Business demands faster delivery which drives us towards completing the current project and moving on to the next. No time to optimise. Accept the bloat and move on.

Virtualisation allows us to get the best usage of the hardware, but it does not really reduce the number of people required to manage all these servers. We also don’t really know how systems interact with each other, so they are treated like black boxes and we don’t challenge vendors to make their products coexist with each other.

Being an old mainframe minimalist, I think we should challenge ourselves to go back to the disciplines of optimisation and treat resources as scarce. But it does take time and effort. My experience has been that CPU savings can be achieved in relatively short time frames (days to weeks) with dedicated focus and developers who are prepared to revisit their code. Disk space is another story though. Think how hard it is to clean up the hard drive of your PC or laptop: it takes hours and hours to work out what to keep and what to safely delete. Scale that problem to a corporate computer system with hundreds of owners of data and it becomes very challenging; you are probably looking at many months before you realise meaningful savings.
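By way of a purely illustrative sketch (the /data starting point and the one-year threshold below are assumptions, not details from any real system), a short Python script like the following is often enough to start that disk conversation: walk a directory tree, find the largest files nobody has touched in a year, and hand the list to the data owners to review.

    import os
    from datetime import datetime, timedelta

    # Hypothetical starting point and age threshold; adjust for your own environment.
    root = "/data"
    stale_after = timedelta(days=365)
    now = datetime.now()

    candidates = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                info = os.stat(path)
            except OSError:
                continue  # skip files that vanish or cannot be read
            age = now - datetime.fromtimestamp(info.st_atime)
            if age > stale_after:
                candidates.append((info.st_size, age.days, path))

    # Largest, least recently touched files first: a concrete starting list for the data owners.
    for size, age_days, path in sorted(candidates, reverse=True)[:20]:
        print(f"{size / 1024 ** 2:8.1f} MB   untouched for {age_days} days   {path}")

It will not make the clean-up decisions for you, but it replaces guesswork with a ranked list, which is usually enough to get the first few data owners engaged.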

The trick is to start: don’t boil the ocean, but look a little closer and develop a culture of lean computing. If you want to extend the life of expensive computer resources, treat them as such and you are guaranteed to find wastage you did not know about, and money you could be saving.


Really? Call that a Disaster?

The main computer system is down and has been for the last four hours. Business is at a total standstill and the tech support team are sure they will get the problem sorted out soon, which is what they have been telling you for the last three hours. So, is this the time to declare a disaster and recover to that other computer centre they were talking about last month in the workshop? Really? Are you sure you have reached a point where the very existence of your business is at stake if the systems are not online in the next hour, or do you merely have an operational outage of your computer system that is just very painful and costly?

This is an aspect of IT Disaster Recovery Planning that is often confusing – when do you really have a disaster on your hands and when is it just that extended outage that will get fixed, given enough time?

Context is king, so a safe answer would be: it depends. What is the business? What are the business operations affected by the outage? What are the realistic expectations of recovery in the next few hours? These questions will obviously guide one’s thinking. But there is a big difference between the calm, considered answers given to these and other questions in a simulation workshop and the reality of key systems being down with real pressure to invoke a disaster recovery plan.

At this point I must make one clarification. Declaring a disaster usually means invoking the plans and activities to recover the computer systems and does not necessarily mean you actually switch to the recovered systems. For the purposes of this discussion I mean a disaster declaration to INCLUDE the switch of processing activities to the alternate site.

Some businesses will have well developed DR plans which have been well rehearsed; each person essentially ‘knows the drill’ and gets on with their specific role. Criteria for declaring a disaster have been carefully thought through and are well documented. All roles and procedures are well known and the DR teams function like a well-oiled machine. The technical solutions have all been set up and tested countless times, and all recovery procedures are well documented. Sound like your organisation? No? Don’t be too surprised: the ideal situation described above probably exists only in the PowerPoint slides of the consulting firms wanting to sell you BCP and DR services.

The reality for many organisations is more likely to be found on a continuum from having nothing to having a fairly well developed solution. You would have noticed that technology did not feature in the previous discussion, so it is not about whether you are using flash copy incremental backups or real-time synchronous replication of data to a standby system. It is about the maturity of the whole solution: people, process and technology. The easy part is the technology. Understanding your business and thinking critically about what is truly a life-threatening event to its survival is far more difficult.

The Disaster Recovery Journal defines a disaster as:

“1) A sudden, unplanned catastrophic event causing unacceptable damage or loss. 2) An event that compromises an organization’s ability to provide critical functions, processes, or services for some unacceptable period of time.”

Whilst the definition is correct, it leaves much to interpretation and does not really help one distinguish when you truly have a disaster on your hands. I would argue that the difference between a disaster and an operational outage will be plainly obvious in the event of a true disaster. Simply put, you will absolutely know when you are dealing with one: there will be no prospect of recovery in the near future, such as when a fire destroys the computer room. The decision to declare a disaster is obvious to anyone. In this scenario, you have no choice, and anything is going to be better than the burnt-out husk of your computer room.

If that is NOT the case, tread with caution and consider the maturity of your IT DR capability. Invoking a DR plan is not dissimilar to abandoning ship and getting into the lifeboat. So you need to know how good that lifeboat is in the first place. If you are in the fortunate position of the well-oiled DR capability described earlier, then it is an easier decision as you know what is and is not achievable. If you are not really that sure of your DR capability, should you be committing your entire business to it? You need to know and be confident about your DR capability otherwise you are more likely to create the disaster you were trying to avoid in the first place.

Whilst my focus has been largely on people and process thus far, knowledge of your IT systems becomes a critical factor in the decision-making process. To determine the Recovery Point Objective you will need to assess when the event occurred and how that affected the normal day’s processing schedule. An intimate knowledge of the processing cycles, cut-off times, data transmission schedules, business activity peaks for the given day, week and month, and the data interchanges between systems is just some of the information you will need to have at your fingertips. And this is not going to be the time when you can phone a friend; it needs to be on hand and up to date. If it is not, then you are likely to start guessing the answers, and believe me, this is not the time to be guessing. If you don’t have the answers, or are not confident of them, then you have to ask yourself: are you feeling lucky? Not the best criteria for committing your organisation to the proverbial lifeboat.
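To make the arithmetic behind that concrete, here is a minimal sketch in Python. The timestamps and the four-hour RPO target are invented for illustration; in reality they would come from your replication or backup logs and your business impact analysis. The point is simple: the data you stand to lose by failing over is everything processed since the last checkpoint you know completed cleanly.

    from datetime import datetime, timedelta

    # All figures are hypothetical, for illustration only.
    rpo_target = timedelta(hours=4)                      # maximum tolerable data loss
    last_good_checkpoint = datetime(2013, 8, 14, 2, 0)   # last backup/replication known to have completed
    incident_time = datetime(2013, 8, 14, 9, 30)         # when the outage began

    # Everything processed since the last good checkpoint is at risk if you fail over.
    data_loss_window = incident_time - last_good_checkpoint

    print(f"Potential data loss if we switch to the DR site now: {data_loss_window}")
    if data_loss_window > rpo_target:
        print("Exceeds the RPO target: expect to re-capture or re-run the cut-offs,")
        print("transmissions and batch work from that window.")
    else:
        print("Within the RPO target, but confirm the day's batch schedules and")
        print("inter-system transfers before committing to the lifeboat.")

If you cannot state the last good checkpoint with confidence, the calculation is meaningless, which is exactly the point about guessing.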

The computer systems at your DR site are most likely not an exact copy of your primary site. There will be numerous little differences that only become apparent when you start trying to recover your IT systems and process the daily workload. The best evidence of this is the post-test reports of the last DR exercise. They typically list lots of little niggles that were identified, with a similarly long list of corrections and updates to procedures to make the next test more reliable. To make matters worse, you will typically find that your DR test scope is quite limited and does not reflect the full might and complexity of the entire organisation’s IT processing needs. So, I ask you again, are you feeling lucky?

Investing in IT DR solutions is sometimes just seen as insurance, so committing scarce financial and people resources to deal with a possible future crisis is often hard to justify. Once the need is recognised and you start to make that investment, the next question is how to make the best use of the servers and systems set aside for disaster recovery. Having standby systems which are not used is a very expensive luxury few can afford.

One of the side effects of having invested in DR facilities is that it becomes very tempting to actually use them; after all, you have spent good time and money on them. The next time the systems go down, it is not unreasonable to ask whether one should be invoking the DR plan and using these very systems you have been paying for. But it is, and always will be, a lifeboat. If you are not absolutely confident in your DR capability, and do not have fact-based knowledge of how your systems will operate from that DR centre, you should be very cautious about using your DR facilities to address an operational outage. Rather get past the current problem, then invest in hardening the current solutions to reduce the risk and impact of downtime, and leave DR to the real disasters.

Hopefully I have made you a little unsettled or, even better, quite nervous. A good DR capability comes with regular practice and with developing the knowledge about your IT systems so that you can be confident about making the decision to invoke the DR plans. Ideally, there should be as little guesswork involved as possible, so to build that knowledge you need to focus on the people and the processes, and less so on the technology.


Transition Time

I bet you hate traffic. Especially that type of traffic that seems just to stand still, barely moving, grinding at your patience as you think of what else you could be doing with your time. Sadly, for all those who commute to work (which is the vast majority of us), traffic is a reality we have to deal with. Of course, we do hear about Flexible Working Arrangements which include things like flexitime and working from home, otherwise called Telework.

And of course it sounds wonderful: no traffic to contend with, you recover all that lost time you normally spend commuting, and when the weather is awful and it’s raining, well, you don’t have to go anywhere as your office is in your home. The benefits of Telework are numerous and deserve a blog post of their own to do them justice.

So what advantage does the poor commuter have over that ever so fortunate person who works from home? Well, it’s not social interaction, unless you count the pleasant exchange of obscenities with taxi drivers as they cut in front of you. And it’s probably not the productive use of your time where you catch up on emails (unless you are the passenger).
It is actually something that occurs without you realising it, but is a vital part of your ability to cope with life. It is called transition time.

We typically operate in different roles, so for the ladies these might be ‘Mom’ and ‘Wife’ at home and then ‘Project Manager’ at work. We create boundaries like ‘work’ and ‘home’ to help us order our world into social domains which make sense to us and have relevance. And then we enact the different roles appropriate to each domain. Kissing the boss good morning might get you in trouble just as much as shaking your wife’s hand would when you say goodbye to go to work.

Changing from one role to another requires you to disengage psychologically and physically from one role and re-engage in another. This process is known as transition. Your daily commute to and from work serves as an important transition ritual which allows you to adjust from being the ‘Project Manager’ back to being ‘Mom’ (most kids would argue they don’t see the difference, but that is another subject altogether).

Some years ago I worked very close to home and it took me all of 3 minutes to drive from work to my house. Nice as it was, I had great difficulty adjusting and often found myself still very absorbed in work issues when I got home and it took a while to switch off. The problem was that the transition time from Work to Home was too short and I had not properly disengaged from my work role.

Teleworkers suffer from the same problem. The boundaries between Work and Home become completely blurred, and one of the real drawbacks of this form of work is that work time encroaches into family time. It also becomes very difficult to switch off from work because it is always there and always accessible. So Teleworkers need to develop transition rituals to help preserve the boundaries between work time and family time. Sometimes this is having a separate office area where you go to work, or it may be a specific routine you follow before starting work.

Some people are able to make many micro-transitions where they switch between roles multiple times. This just sounds like plain old-fashioned multi-tasking, but true role transition is much deeper than that. It is a true ‘switch off’ from the work mode and a full psychological, emotional and physical engagement in the new role.

So next time you are in your car going home, put on some good music and leave the cares of the office behind. Spare a thought for those poor Teleworkers who never switch off and cannot enjoy a colourful social exchange with their local taxi driver.

For more on Boundary Theory see

Ashforth, B. E., Kreiner, G. E., & Fugate, M. (2000). All in a day’s work: Boundaries and micro role transitions. Academy of Management Review, 25(3), 472–491.


Does organizational culture matter?

Most of us are only vaguely aware of organizational culture; you might have heard it mentioned by the CEO in his annual speech. But it actually affects us in more ways than one, and for the IT professional it is something you should be thinking about carefully for your next IT project.

So what is organizational culture? Organizational culture forms part of organizational theory and has numerous definitions, but for the sake of this discussion we can define organizational culture as:

“The set of shared values, beliefs, norms and morals which influence how employees think, feel and behave”

As such, it creates a sense of identity, increases commitment, reinforces values and shapes behaviour. It is also connected to and touches many other issues, as shown in the following concept map.

[Concept map: organizational culture and the issues it touches]

Follow the green arrows in the diagram as we take a closer look at how Organizational Culture affects your project. Organizational Culture sets the norms and values of the organisation and thus shapes what is considered acceptable behaviour. The norms of employees and managers are the habits and ways of working that develop over time within the organization. When a new IT system is implemented, we typically systemise a lot of the work (that, after all, is the aim, i.e. to reduce manual work) and will most often follow best practice. However, in doing so, we also systemise the knowledge of the employees and managers to a greater or lesser degree. We also change the work processes, and by implication we can end up changing some of those work norms, either intentionally or unintentionally.

In essence, our new IT system has had an effect on the culture of the organisation. Well, so what? Where that can start to become significant is in the potential for resistance to the new system. Part of the problem with systemising knowledge is that it can leave users feeling disempowered and robbed of their perceived ability to contribute. Often the option to make discretionary decisions is purposely removed by computerised processes, and this can create huge amounts of frustration for the user.

The second problem is that a lot of the tacit knowledge of employees is not captured or computerised in the new system. So there will be times where the user will know how to handle a particular situation and the ‘computer system’ will not give them the ability to act. A classic case of ‘the computer says no’ (see the comedy sketches of Little Britain).

So now your new system has affected the organisational culture in several ways: it has changed the work patterns, affected how employees feel through increased uncertainty, removed discretionary decision making and probably played right into the hands of organisational politics. If the effects are felt widely enough, you may well find that the users’ grumbles turn into full-blown organisational resistance to the new system.

Back to the original question: does organisational culture matter? I would argue yes. Firstly, be aware of it, and then assess the organizational culture into which you will introduce your system. Try to anticipate how you are going to affect that culture and develop a plan for addressing the challenges that arise, through a strong change management program.

Change management is not about ice breakers and cupcakes, it is about engaging with issues such as organizational culture and working with the people aspects of the change we so readily detonate in the workplace.
