MVSFORUMS.com

Himesh · Posted: Tue Jan 21, 2003 1:09 am Post subject: Why Mainframes rarely crash?

Why Mainframes rarely crash?

An excerpt from the same article.

Grant · Posted: Tue Jan 21, 2003 5:11 pm Post subject:

That's very interesting Himesh.
Our overnight incident report is full of server crashes/freezes/reboots, but the mainframe just keeps on keeping on....

zatlas · Beginner Joined: 17 Dec 2002 Posts: 43 Topics: 4

Hi
Like the old Volkswagen, after 40 years of producing virtually the same model, all the kinks have been worked out. On the other hand, the thing was designed as an 18-wheeler from the start. It is not a desktop that was put on its side and relabeled as a server (hmmm... built-in scalability, what a concept)
ZA

DaveyC · Posted: Tue Jan 28, 2003 10:07 am Post subject:

It wasn't always that way. I remember the dreaded MVS/XA spin loop like it was only yesterday. ESA with PAF and auto ACR took care of that nasty. The old machines would crash all the time because of environmental problems like a chiller unit failure (pop goes the TCM). Those environmentals cost a fortune to run, which is probably why companies wanted a cheaper alternative. Thank god for CMOS.

I used to be an MVS operator back in the '80s. If I had a dollar for each time we lost the machine, I could retire.
_________________
Dave Crayford

Himesh · Posted: Thu Jan 30, 2003 4:26 am Post subject:

Dave,

What you said has taken me by surprise.
I would highly appreciate it if you could explain just a little bit more about those problems that you talked about (especially the "spin loop").

regards,
Himesh

DaveyC · Posted: Thu Jan 30, 2003 6:52 am Post subject:

Spin loops are caused by a CPU locking a resource required by another CPU; the waiting CPU goes into a spin loop until the resource is available. When the threshold is exceeded, CPU recovery is required. Spin loops are still quite common, but now the CPU recovery is automated. Back then a WTOR was issued to the operators. If they were quick and switched on, they could answer the message and invoke CPU recovery; they had about 120 seconds to answer. Most of the time the operators were watching TV and the machine crashed.
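For anyone who wants to see the idea in code, here is a rough sketch of a spin loop with a threshold and an automated recovery step. It is only an illustration in Python, not how MVS implements it: the function name `spin_for_lock`, the `threshold_s` parameter and the `recover` callback are all made up for the example, and a `threading.Lock` stands in for the resource held by the other CPU.

```python
import threading
import time

def spin_for_lock(lock, threshold_s, recover):
    """Busy-wait (spin) until `lock` can be taken.

    If the spin lasts longer than `threshold_s` seconds, call the
    hypothetical `recover()` callback once, standing in for automated
    CPU recovery. Returns True if recovery was invoked, False otherwise.
    Assumption: `recover()` eventually frees the lock; if it does not,
    this sketch keeps spinning forever.
    """
    start = time.monotonic()
    recovered = False
    # Non-blocking acquire in a tight loop: this is the "spin".
    while not lock.acquire(blocking=False):
        if not recovered and time.monotonic() - start > threshold_s:
            recover()          # threshold exceeded: trigger recovery
            recovered = True
    return recovered
```

For example, if the lock is already held and `recover` is `lock.release`, the call spins for `threshold_s` seconds, runs recovery, then acquires the lock and returns True. In the old days the `recover()` step was effectively a WTOR waiting on an operator reply.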

As for environmentals, before CMOS, MVS mainframes were water-cooled. We had 3 chiller units, 2 active and 1 spare. If you had a chiller failure the spare was supposed to kick in, but they were not reliable and often needed manual intervention. By the time you got to the chiller units, the machine had crashed due to a thermal trip.

IBM hardware was also prone to nasty failures. We had 18 3390/3 HDA crashes in a month due to chemical contamination at the manufacturing plant in Germany... EMC was the choice after that fiasco.

However, the last 10 years have seen mainframes deliver the five nines mentioned in that article.
_________________
Dave Crayford

Himesh · Posted: Thu Jan 30, 2003 7:06 am Post subject:

Dave,

Thanks for your valuable input.
It must have been quite interesting to have worked on the mainframes "of the past".

regards,
Himesh