
Cray - building a supercomputer is getting harder

Tuesday, August 2, 2016

If you want a really efficient supercomputer for seismic processing, you can no longer build one that handles the expansive data sets coming out of the field simply by adding more processors, explains Bert Beals of Cray Inc.

For about 20 years, one of the general ideas behind supercomputing was that as faster processors came onto the market, you could purchase them, install them in your computing centre, and immediately take full advantage of the regular cadence of clock upgrades that came with each new generation of microprocessors, says Bert Beals, Global Lead, Energy Industries, Cray Inc.

But that doesn't work any longer, because Dennard Scaling no longer applies, he says.

This means that faster microprocessors now require more and more power to run, and more and more cooling to take away the heat they generate.

Dennard Scaling has been one of the key theories behind the gradually increasing power of computers over the past few decades. It says that you can pack more and more transistors into the same space while keeping the power density (the amount of power per unit area of chip) constant.

In other words, as transistors got smaller, you could get more processing capacity from the same space without using much more power.

The theory was first written about in a 1974 paper by Robert H. Dennard.

But the law stopped working around 2005 to 2007, mainly due to an increase in current leakage, which caused chips to overheat as the transistors got smaller.

This means that if you want to increase computing capacity, power consumption will also increase, even if you can fit more transistors into the same physical space. That in turn means more heat, and so more cooling is required.
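
As a rough back-of-envelope illustration of the difference this makes, the dynamic power of a chip is roughly capacitance times voltage squared times clock frequency. The short Python sketch below uses only the classic textbook scaling factors (not figures for any particular chip) to show why power density stayed flat while supply voltage could still shrink, and why it climbs once it cannot:

# Rough sketch of relative power density after a linear transistor shrink by factor k.
# All scaling factors are textbook assumptions, not data for any real chip.
def relative_power_density(k: float, voltage_still_scales: bool) -> float:
    transistors_per_area = k ** 2                 # k^2 more devices fit in the same area
    capacitance = 1.0 / k                         # per-transistor capacitance shrinks
    frequency = k                                 # clock rate historically rose with k
    voltage = 1.0 / k if voltage_still_scales else 1.0   # post-2005: voltage stuck (leakage)
    power_per_transistor = capacitance * voltage ** 2 * frequency
    return transistors_per_area * power_per_transistor

print(relative_power_density(2.0, voltage_still_scales=True))    # ~1.0: Dennard era, density flat
print(relative_power_density(2.0, voltage_still_scales=False))   # ~4.0: same shrink now runs hotter

In the second case the only lever left is the clock, which is why frequencies levelled off and the extra transistors went into more cores instead.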

This forces companies to think much more carefully about how they put their supercomputers together, and has led to microprocessor designs that put many cores on a single chip rather than increasing the core clock rate.

Meanwhile Moore's Law, which says that the number of transistors you can fit on a single integrated circuit (or microchip) will double approximately every 2 years, is 'not quite dead yet', Mr. Beals says.

Gordon Moore himself (co-founder of Intel) said in 2015 that he imagines the law 'dying in the next decade or so', according to the Moore's Law Wikipedia entry.

But due to the end of Dennard Scaling, even though you might get more transistors on a microchip, clock rates will not increase much, and chip designers will find other uses for all those additional transistors.

This means that companies are starting to think much harder about how to get more efficiency from each core they already have, rather than depending on greater performance from simply adding new ones.

If they want more processing power, they are more likely to get it from more physical computers, rather than from more densely packed microchips, so as to spread the heat generated over a larger volume.

This means more demands on interconnects and network architectures to support efficient intercommunication between an ever growing number of compute nodes.
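
A standard rule of thumb for why node counts alone do not solve the problem is Amdahl's law: whatever fraction of a job remains serial, or tied up in communication, caps the speedup that any number of nodes can deliver. A minimal sketch:

# Amdahl's law: speedup on n nodes = 1 / ((1 - p) + p / n),
# where p is the fraction of the work that actually runs in parallel.
def amdahl_speedup(parallel_fraction: float, n_nodes: int) -> float:
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n_nodes)

for n in (10, 100, 1_000, 10_000):
    # even with 99% of the work parallel, 10,000 nodes give less than 100x speedup
    print(f"{n:>6} nodes -> {amdahl_speedup(0.99, n):6.1f}x")

This is why the interconnect, and the software that keeps communication off the critical path, matter as much as the raw node count.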

This is reminiscent of the early days of high performance computing for seismic processing in the 1970s and 1980s, when oil companies employed teams of people to work out how best to manage and interconnect many individual computers to gain performance efficiencies.

Massively parallel

This means that supercomputers must support an architecture which is much more 'parallel' - often with several separate machines working on the same computational problem at once.

This requires a different kind of interconnect, memory hierarchy, and input-output strategy compared with a serial optimization approach.

'You have to think about the overall systems architecture, combined with software architecture, combined with the people skills, necessary to deal with processing requirements at massive scale.'

'We have to carefully design our system architectures to keep all the cores 'fed'. It is very different from buying 1,000 machines on the internet and cabling them together yourself with Ethernet switches.'
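
To give a flavour of the traffic such a system has to keep up with, here is a minimal halo-exchange sketch, the standard pattern in domain-decomposed codes such as seismic solvers. It assumes the mpi4py and numpy libraries and a launcher such as mpirun, and the sizes and data are placeholders rather than anything Cray-specific:

# Minimal 1-D halo exchange: each MPI rank owns a slab of the model and swaps
# boundary ('ghost') values with its neighbours, typically every time step.
# Run with e.g.:  mpirun -n 4 python halo_exchange.py
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

n_local = 1000                            # interior points owned by this rank
u = np.zeros(n_local + 2)                 # one ghost cell at each end
u[1:-1] = rank                            # placeholder data

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# send my first interior value left while receiving my right ghost, and vice versa
comm.Sendrecv(sendbuf=u[1:2], dest=left, recvbuf=u[-1:], source=right)
comm.Sendrecv(sendbuf=u[-2:-1], dest=right, recvbuf=u[0:1], source=left)

print(f"rank {rank}: left ghost = {u[0]}, right ghost = {u[-1]}")

Every rank does this at every step, so as node counts grow the network, not the arithmetic, becomes the thing that has to be designed carefully.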

About Cray

Cray is a supercomputer manufacturer based in Seattle, Washington.

A Wikipedia page of the largest supercomputers in the world shows that 5 of the top 10 are provided by Cray - including systems at Oak Ridge National Laboratory (US), Los Alamos National Laboratory (US), the Swiss National Supercomputing Centre, the Höchstleistungsrechenzentrum (Germany) and King Abdullah University of Science and Technology (Saudi Arabia).

The company does not manufacture its own general purpose microprocessors, but it is a centre of expertise on how these processors can be used most efficiently in a massively parallel infrastructure.


'An integrated supercomputing environment with appropriate software and expertise is a much wiser investment than just buying the lowest dollars-per-flop machine you can find.'

The company has a performance engineering organization which provides expertise to Cray's customers and partners to help them optimize their applications to take full advantage of the parallelism inherent in Cray's platforms.

Cray has designed a special interconnect between the different computing nodes, which is far more efficient than Ethernet or even InfiniBand. The interconnect topology is known as 'dragonfly', with direct, dynamically routable connections between any node and any other node in the system.
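
What makes the dragonfly idea attractive is how short the minimal routes are: at most one local hop inside the source group, one global hop between groups, and one local hop inside the destination group. The toy model below uses assumed, simplified parameters (not Cray's actual configuration) just to illustrate that bound:

# Toy dragonfly model: routers within a group are fully connected, and each
# group has a direct global link to every other group, held by one of its routers.
ROUTERS_PER_GROUP = 8   # assumed group size, for illustration only

def gateway_router(group: int, remote_group: int) -> int:
    """Which router in 'group' holds the global link towards 'remote_group' (toy mapping)."""
    return remote_group % ROUTERS_PER_GROUP

def min_hops(src: tuple, dst: tuple) -> int:
    """Router-to-router hops on a minimal route; src and dst are (group, router) pairs."""
    (sg, sr), (dg, dr) = src, dst
    if sg == dg:
        return 0 if sr == dr else 1                  # all-to-all inside a group
    hops = 1                                         # the global group-to-group link
    if sr != gateway_router(sg, dg):
        hops += 1                                    # local hop to reach the gateway
    if dr != gateway_router(dg, sg):
        hops += 1                                    # local hop after arriving
    return hops                                      # never more than 3

print(min_hops((0, 3), (5, 6)))   # 3 hops, even between far-apart groups

Adaptive ('dynamic') routing then chooses between the minimal path and slightly longer alternatives to steer around congested links.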

Cray produces its own hardware and software to manage the communications between the nodes and route data. It has its own chassis design and cooling systems.

These systems can be air-cooled or liquid-cooled - a liquid-cooled computer enables much greater power and cooling efficiency for the data centre.

Oil and gas supercomputing

The oil and gas industry has been using high performance computing for seismic processing since the 1970s, and that use has continued to grow. For example, it is estimated that several large oil companies and seismic processing service companies have seismic processing power measured in single and double digit petaflops.

'One key example of supercomputing usage within the oil and gas industry is the system employed by PGS to process some of the largest and most complex deep-water surveys ever collected.'

The Cray system at PGS, named 'Abel' after the famous Norwegian mathematician Niels Henrik Abel, is 14th on the Nov 2015 'Top 500' list and is the top commercial system on the list. (The list is online at www.top500.org/list/2015/11/)

Seismic processing

Seismic processing algorithms exist that will require supercomputers that are dramatically faster than what is available today, Mr Beals says.

'We have requirements from the oil and gas industry which show a need in the next 3-5 years for machines that are 10x what we're running on today. In the next 10-15 years, we're going to need machine capabilities that are 100x what we're running today,' he says.
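
A quick back-of-envelope calculation, using only the ranges in that quote, shows what those targets imply as compound annual growth in machine capability:

# Implied compound annual growth: a factor F over Y years means F ** (1 / Y) per year.
targets = [(10, 3), (10, 5), (100, 10), (100, 15)]   # (growth factor, years), from the quote
for factor, years in targets:
    per_year = factor ** (1.0 / years)
    print(f"{factor}x in {years} years -> about {per_year:.2f}x per year")

So capability needs to grow by roughly 1.4x to 2.2x every year, which clock rate improvements alone can no longer deliver.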

Current seismic processing methods still have to accommodate the limitations of today's supercomputing technology.

How will we design and deploy systems to support the explosion of algorithmic complexity, and the massive increase in the amount and resolution of data, needed to accurately and realistically model the earth's subsurface?

To maximise drilling predictability, improve recovery ranges, and optimize production, you need as good an understanding of the subsurface as possible. 'We want to be able to develop a highly accurate, high resolution elastic model of the subsurface of the earth which is as easy to navigate as Google Earth is for the earth's surface,' Mr Beals says.

By having more processing power, geophysicists can do more processing iterations of their data before their deadlines, and the more iterations, the better the quality of the final result.

Seismic shot records are also getting longer. It is not unusual for recording times to be 3 or 4 times longer than in the past. This leads to a tsunami of data to be processed.
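
To put a rough scale on that, the raw trace volume grows linearly with recording time. The parameters below are assumed purely for illustration, not figures for any actual survey:

# Rough raw-trace volume for a marine survey, before any processing copies are made.
# All parameters are illustrative assumptions, not figures from PGS or Cray.
def raw_volume_tb(shots: int, channels: int, record_s: float,
                  sample_ms: float = 2.0, bytes_per_sample: int = 4) -> float:
    samples_per_trace = int(record_s * 1000 / sample_ms)
    return shots * channels * samples_per_trace * bytes_per_sample / 1e12

print(raw_volume_tb(200_000, 40_000, record_s=8))    # conventional record length, ~128 TB
print(raw_volume_tb(200_000, 40_000, record_s=32))   # 4x longer recording time, ~4x the data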

One Cray customer has said that current full waveform inversion processing would have taken 'thousands of years' to run on the computers available 'not so long ago'; now that time can be reduced to days or weeks.

The requirements for computer processing continue to increase. The industry is looking at 'more and more complex drilling targets, and the techniques and technology necessary to get a clear picture of reservoirs in those complex structures need powerful processing resources to enable exploitation of those areas,' he says.


