High Performance Computing (HPC) – Computerphile

This is two-factor authentication: to get in I need my card and then I need a PIN. This is a scrambler pad, so every time you look at it the numbers are in a different order. This is the High Performance Computing facility for the University of Nottingham.

SEAN>>What do you use it for?

All sorts of things. It’s basically to do with high-compute research. For example, students and researchers will use this for doing calculations based on things like fluid dynamics, aerospace, genomics, astronomy… anything that requires a large amount of compute.

SEAN>>And you’ve got earplugs in today, for obvious reasons.

Yes, it’s a little bit noisy in here, yes.

SEAN>>So we will do some talking outside (LINK IN DESCRIPTION), but can you show us a bit of it before we go outside?

Certainly, yes. The main HPC facility, which we call Minerva, is… …and then we’ve got some extensions in, on the racks on here.

SEAN>>All of these blinking lights, what’s going on? Is this data activity… or processing? What’s going on there?

Both! The lights that you can see there, the brighter ones, those are actually the storage, the disk storage. The actual compute nodes don’t blink very much. The ones at the bottom there, that’s the network activity. We do shut it down for maintenance once a year for a day or so. This at the moment is the third
generation of HPC – the first one… …which was installed about eleven years ago, and then we regularly refresh this.

SEAN>>So this one’s been going for how long?

This one’s been going for about four years.

SEAN>>Okay, and then I hear rumors of a new one on the horizon?

Yes, we’re in procurement at the moment to put a replacement in.

SEAN>>And will that mean this gets ripped out and the whole new one just gets put in?

Good question. We would like to utilize as much as possible because, although it is old, there is still life left in it, and we do try to “sweat the assets”, as they say. But certainly some of this will be replaced.

SEAN>>What’s it running? Would we recognise any of the operating system or any of that?

Yes, most of the nodes are running a version of Linux, and the storage is fairly standard, but above that we use PBS as our main scheduler.

SEAN>>How many people might be using this at one time?

At any one time they’re probably running hundreds of jobs.

SEAN>>Do they run for a long time? Might they be running for years? How does it work?

We wouldn’t have jobs that run for years, but certainly we could have jobs which are running for months. Most of the jobs, you know, are probably only running for days.

SEAN>>Okay, and so when you look at a system like this, can you put a figure on how much it costs?

Capital cost for a system like this
we’re probably talking in terms of about one and a half to two million pounds ($2.1m – $2.8m). The ongoing costs: we have about 250 kilowatts of air conditioning. This particular block here, running flat out, pulls about 70 kilowatts of power, and you’re drawing that all the time. So to run this whole facility you’re talking about thousands of pounds just purely in power costs, and then of course there’s all the ongoing licensing and the support for that… So it’s not insignificant.

SEAN>>So that’s a lot of power. Is there a big red switch somewhere someone has to pull to turn it on?

Yes there is, and no, I’m not going to press it for you.

SEAN>>So it’s obviously a lot of equipment and looks like it might be quite complicated. Does it ever go horribly wrong? Does it ever have big problems?

Generally speaking it is pretty reliable. Individual nodes will fail. Individual disks will fail. But generally speaking the equipment itself is relatively… …modern computer equipment is inherently reliable. We probably have more problems with the air conditioning than we do with the actual compute itself.

SEAN>>So the other thing I was thinking about when you look at this: is this totally bespoke, or is it like a template? How would you go and buy a high-performance computer?

That’s the $64,000 question
basically you have to start to think “What do we need it for?” because there is no one generic high performance compute job. Different departments and different research have different computing requirements. Some are very, very high performance computing, you know, it’s a lot of number crunching. For others it’s about manipulating data, so there’s a lot of data movement. For other things it’s about visualization. So the first thing you’ve got to do is to say, right, “What is our mix of jobs?”, because the way in which you set it up for high analytics is a different hardware set to what you set up for visualization and things like that. So that’s the first thing you’ve got to do. You’ve then basically got to say, okay, these are the jobs that we want to run. Once you’ve actually got that, you then go to a supplier to say, right, this is what we want to do, this is how much money we’ve got to spend. What can you give us?

Although this is fairly old now, there is still quite a lot of life left in here. Okay, it’s not cutting edge, but it’ll still do a lot of the jobs, because a lot of the jobs are purely about number crunching. This is perfect for that. So basically we will put the new one in. We will try and keep as much as we can of the old one so that we “sweat our assets”, and that also means that we’ve got additional capacity for our researchers to use as well
and then basically we will then go for a gradual replacement. So as new processors come online and as new research projects come, the balance of the jobs will change. That means we may have to strip out a particular type of node and replace it with a different type of node, so that will be far more organic in the future. We’re not expecting in the future to do a complete rip and shred, unless something comes up and, you know, we build a new data center, but that’s not on the cards at the moment.

The equipment itself is fairly generic. These are standard blade enclosures. The storage is standard storage. We have about two hundred and forty terabytes in this block here, and it’s all connected up by InfiniBand.

SEAN>>Is InfiniBand a speed of network?

It’s a standard. This is 40 gigabit InfiniBand.

SEAN>>So at home you might have Gigabit, and this is 40 of those?

Yes, 40 Gigabit, yes, and of course it’s also multi-path as well… …because, you know, there’s no point
in doing a lot of calculations if you can’t then get the result of those calculations off. There are effectively two types of jobs. There are parallel jobs, where you’ve got a job running on multiple nodes, and then you’ve got single-node jobs, where basically it’s all running on one node. So again, with the parallel jobs you need network connectivity to make sure you’re not processing the same bit twice.

SEAN>>So for a researcher or someone who’s part of a project, what’s the big benefit of doing this rather than letting their office computer do it? Is it the speed of compute? Is it the fact that they can set it off and come back another day? What’s the main benefit?

Yes, it’s the capacity. Because basically the job will start to run and will then continue to run. So, for example, Christmas is a very, very busy time for us, because a lot of researchers will start a job going, then come back after Christmas and pick up the data. As I say, you could do these things at home, it’s just that it would take you months or years to do what this can do in days or hours.

SEAN>>Are they ‘hot’ swappable then?

Yes they are.

SEAN>>(Joking) Come on then, let’s pull one out…

No! They’re all single-phase power, but because the phase on this rack is different to the phase on this rack
there is the possibility of having a potential difference of more than 400 volts across the two racks. It’s unlikely because each of the… but from a “health and safety” point… And it’s exactly the same reason why you’ll see a lot of these have got laser [warning stickers], because we use laser optics.

SEAN>>For your networking?

Er, yes, the fibre…

SEAN>>And what is that, the aircon?

Nope, that is the fire suppression.

SEAN>>Oh, let’s go and have a look at that then.

The fire suppression system that we have in here is an IG55 system, which is an inert gas. It’s 50% argon, 50% nitrogen. Basically, if there is a fire in here, all of the gas in there is released in one go. That replaces about half the atmosphere in here, which takes the oxygen level down to a point where it doesn’t support combustion. It is just about breathable, but you wouldn’t want to run a marathon in it. It’s like trying to run at the top of Mount Everest.

SEAN>>So it suppresses the fire without damaging the kit?

Yes. The gas is released through these nozzles here.

SEAN>>They look like sprinklers but they’re actually gas…

Gas nozzles, yes.

SEAN>>And how does it work with the cooling? Does it go in hot one side and out
cold the other?

Yes, basically we use aisle containment. This is the cold aisle, where we put cold air in. It then goes through the equipment. We’d expect to see a delta T of 20-odd degrees, and on the other side basically it gets vented through…

SEAN>>So through that glass is going to be 20 degrees warmer? Can we go in?

Yeah. OK, I think I’d like to spend my time on this side… If you come down here you can definitely feel the temperature difference. So these are compute nodes.

SEAN>>…and how many computers are in each one of those blocks then?

In this particular one you’ve got 1, 2, 3, 4… …8 individual blades in this blade enclosure here. You asked about the big red button? That’s the big red button.

SEAN>>That would turn it off and on?

No, that would turn it off.

SEAN>>Ah, that’s like a “Danger danger!” –
press that?

Basically, if I press that then everything will die immediately.

SEAN>>Let’s stay away from the big red button then…

But that is the big red button, yes…
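
Several of the figures quoted on the tour can be sanity-checked with simple arithmetic. The Python sketch below works through four of them: the energy drawn by the 70 kW block, the time to move 240 TB over a 40 gigabit link compared with a 1 gigabit home link, the oxygen level after the inert gas replaces about half the room's atmosphere, and the voltage between two racks on different phases. The electricity tariff, the 230 V and 240 V supply voltages, the 20.9% oxygen fraction of normal air, and all function names are assumptions made for illustration; none of them come from the interview.

```python
import math

HOURS_PER_YEAR = 24 * 365  # ignores leap years and the annual maintenance day


def annual_energy_kwh(power_kw):
    """Energy used in a year by a load drawing power_kw continuously."""
    return power_kw * HOURS_PER_YEAR


def annual_cost_gbp(power_kw, pence_per_kwh):
    """Yearly electricity cost; pence_per_kwh is an assumed tariff."""
    return annual_energy_kwh(power_kw) * pence_per_kwh / 100.0


def transfer_time_hours(terabytes, gigabits_per_second):
    """Ideal transfer time at the nominal line rate (no protocol overhead)."""
    bits = terabytes * 1e12 * 8
    return bits / (gigabits_per_second * 1e9) / 3600


def oxygen_after_discharge(fraction_replaced, normal_oxygen=0.209):
    """Oxygen fraction left once part of the room air is replaced by inert gas."""
    return normal_oxygen * (1.0 - fraction_replaced)


def phase_to_phase_voltage(phase_to_neutral_volts):
    """RMS voltage between two phases of a three-phase supply (120 degrees apart)."""
    return math.sqrt(3) * phase_to_neutral_volts


if __name__ == "__main__":
    # 70 kW is the figure quoted for one block; 10p/kWh is an assumed tariff.
    print("Energy per year (kWh):", annual_energy_kwh(70))
    print("Cost per year (GBP, assumed 10p/kWh):", annual_cost_gbp(70, 10))

    # 240 TB of storage over 40 Gbit/s InfiniBand versus 1 Gbit/s at home.
    print("240 TB at 40 Gbit/s (hours):", round(transfer_time_hours(240, 40), 1))
    print("240 TB at 1 Gbit/s (days):", round(transfer_time_hours(240, 1) / 24, 1))

    # "About half the atmosphere" replaced by IG55.
    print("Oxygen fraction after discharge:", round(oxygen_after_discharge(0.5), 4))

    # Assumed single-phase supply voltages.
    for volts in (230, 240):
        print(volts, "V phase-to-neutral ->", round(phase_to_phase_voltage(volts)), "V between phases")
```

Under these assumptions the 70 kW block alone uses 613,200 kWh a year, and moving 240 TB takes a little over 13 hours at 40 gigabit against roughly 22 days at 1 gigabit. The transfer figures are best-case, since real links carry encoding and protocol overhead.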


  1. Post
    Some Things In Life

    You can make your own high performance computer by connecting a bunch of Pies…Or Playstations or Xboxes.

  3. Post
    John McMillen

    I was thinking that the random keypad to get into the room would be a great feature for smartphones and tablets, as it would make it harder to shoulder surf someone's password: they would have to be close enough to see what numbers you were actually pressing, not just where they were on the screen.

  4. Post
    Mario Cavicchi

    I would suggest a trick, based on my experience on big servers farm, to decrease dramatically the cost of electricity during the winter … just open the window.

  18. Post
    Mechanical Menace

    Any chance of a vid on HTC? I know to most it's the same thing as HPC but it does solve some very different problems and imho is more interesting.

  22. Post

    I'm wondering if they mine cryptocurrency when there are spare GPU blades at a given moment. That would probably make sense. But maybe it just doesn't happen.

  26. Post
    Ben Waardenburg

    So interesting. I want to know what model cpu's they are using but I guess those are Linus questions and not computerphile questions.

  28. Post
    Karl Young

    After showing the big machines that go bing, I’m not sure I get the point of standing around yelling about the architecture in a noisy server room, where the dialogue is barely audible.

  35. Post
    Noel Goetowski

    Is it ironic that they have this incredibly powerful, state-of-the-art computing behemoth, and the sign for the big red emergency killswitch is a piece of paper stuck to the wall with duct tape?

  38. Post
    Andy Brice

    I feel like in cold countries, every building should have a supercomputer, rent out the processing time, and use it to heat the air.

  40. Post

    "This is the high performance computing center".
    "And what do you use it for?"
    "Uh, high performance computing."

  41. Post
    Josh Sisson

    I bet he sneaks into work at the weekends and mines cryptocurrency on it.
    "Hey Chris how did you afford your new McLaren?…"
    "Errm… Won it in a raffle."

  42. Post

    * cyberdyne_systems.exe stopped responding *
    – "Funny, this never happened before."


  51. Post

    Nice computer you got there University of Nottingham! It'd be a shame if some meltdown/spectre were to happen to it….

  53. Post

    I took a tour of a similar facility in my university last year, it was pretty much like this, couldn't hear what anyone was saying, couldn't hear yourself think XD. Ours is pretty cool too because the heat output from all the computers actually gets used to heat two nearby buildings in the winter, to save energy.

  55. Post

    Linux? That is asking for stuff-ups. For starters, Linux is NOT a true Real-Time Operating System. Far from it. It has gaping holes in security. HP-UX is, for example. (But it costs money.) Dreadful. At least they are not using it for any safety-critical applications!

  56. Post

    It's very interesting. But after 4 minutes, I couldn't watch it anymore and just started skipping through to see if the noise would get any better.
    In my opinion, a short introduction to show the hardware would be perfectly fine. But most of the interview should have been conducted in an area where shouting over fans wasn't necessary. In editing you could have superimposed images from the parts of the hardware that were being discussed.

  60. Post
    Jameson Palmer

    It's worth $2.5M and about ready for an upgrade. Oh, I'm jelly. At least we should be seeing 7nm soon™, that'll be the time to switch!

  62. Post

    You wanna talk High Performance Computing!? Mate, I've got a Ryzen 7 – I can run Shadow Of War, on max graphics! Don't think this stupid block of flashing lights can do that…

  65. Post
    Isaac Boates

    do universities like nottingham use idle time to mine cryptocurrencies to help pay for the overhead costs? if not, why not?

  68. Post
    Ilja Sara

    Yay. InfiniBand!
    I have second-hand 20Gbit InfiniBand hardware. The connection is between my home server and my desktop. Grabbing videos from an RDMA-capable NFS share on the server (RDMA avoids the CPU bottlenecking the speed) is like having the file on my desktop PC. This is important to me since my home server keeps the backups and has the storage, while my desktop has the power but not so much storage. At the moment the link is working at close to 10Gbit/s, since the InfiniBand card in my desktop is in too slow a PCIe slot. Still, it's already fast enough for my use since the transfers don't strain the CPUs.

  69. Post

    “This is the high Performance Computing facility for the University of Nottingham”
    “What do you use it for”
    “… high performance computing…”

  70. Post

    I'm looking for dr steve bagley's playlist for cpu essentials if anyone has it please post it. Links to all the videos related to the cpu will also work. Thanks.

  73. Post

    I'm surprised it's air conditioned. The air conditioning seems to be using 3.5x the power of the actual compute.
    Rather than chilling the air, why not just use more normal air?

  80. Post

    @3:45 70 KW? That's A Lot? HPC Installs with recent nodes can easily pull upwards of 25 KW per RACK, so this Cluster must be pretty tame.

  81. Post
    Sina Madani

    The problem I have with HPC clusters like those based on Sun Grid Engine is that they're inconvenient for evaluating the performance of multi-threaded applications.

  83. Post
    Eidetic Ex

    The computing power in that room is truly remarkable. Several years ago I wrote a program to produce a look up texture that is the falloff of sunlight through an atmosphere, handles causing colors to fall off at just the right amounts depending on view angle, sun intensity and a few parameters fast enough to provide photo realistic sky coloration without actually doing the heavy math that a AAA game would use. I based it upon a guy's work whom ran it through something similar to this but with computing technology of that period. He bragged hard about how it only took a couple minutes to calculate in their data center. I had a better understanding of Direct3D9 HLSL 3.0 performance, further optimized his approach and ran it on a computer that was roughly on par with an Xbox360. It took a couple months to finish producing that lookup texture despite better optimization to hardware that was technically better suited to that form of math. I can only imagine what that system in the video is capable of considering some of our greatest computing enhancements have come in the last several years.

  91. Post
    A Nother

    Wow Chris, you look very different to when I worked in your team! Good stuff. But they let you loose near the HPC? Are they mad? All that money spent when we obviously know the answer is 42. 😉

  92. Post
    Sean Macfoy

    0:22 I am astonished, upset and mostly disappointed that ma boi didn't mention the electronic structure calculations that probably occupy most of their HPC center's processor time

  93. Post

    brilliant. comment on how noisy the place is and talk about going outside to ask the questions, then proceed to conduct the whole interview screaming on the inside.

  94. Post

    You should use 3-phase electricity, cheaper and more abundant, also the fans should not go 100% all the time but have temperature regulated profiles, so you can efficiently save some power, increase the equipment lifespan and the noise will go down

  96. Post

    Why don't supercomputers use liquid cooling? Is it because of the possibility of a short circuit? I suspect it would significantly lower the power drawn by cooling equipment.

  98. Post

    I have done a lot of my PhD calculations on this Minerva (HPC). Thanks to the University of Nottingham and my sponsor.

  99. Post
    Alpha Delta

    If you want to specialize in high performance computing, know Linux, Assembler, C and C++, and naturally have knowledge of computing systems microarchitecture
