byteLAKE’s CFD Suite (AI-accelerated CFD) — recommended hardware for AI training at the Edge (part 1/3)

--

A few years ago, when we started extensive research at byteLAKE in the space of CFD (Computational Fluid Dynamics) simulations, we naturally drifted towards experiments with various hardware configurations. Our focus has always been on performance, and that has not changed, although we may expand our research in the future towards other equally exciting aspects and the broader question of how AI (Artificial Intelligence) can deliver value for industries that rely on CFD simulations. I have been describing these efforts and results, including case studies, in a blog post series which can be found here: www.byteLAKE.com/en/AI4CFD-toc. This post, however, opens a miniseries in which I will share byteLAKE’s current conclusions and recommendations about edge hardware configurations for the AI Training phase, which byteLAKE’s CFD Suite requires before it can deliver the predictions that significantly reduce time to results in CFD simulations.

Specifically, I will describe byteLAKE’s recommended edge hardware platform for the AI training of the CFD Suite, with a special focus on clients who need something more powerful than a desktop PC but are not yet ready for, or do not yet need, more complex multi-node HPC architectures.

What hardware do I need to efficiently train AI so that it can accelerate CFD simulations in the chemical industry?

For those of you who are not quite familiar with what byteLAKE’s CFD Suite is, let me quickly start with a brief introduction to the product.

What is byteLAKE’s CFD Suite?

It is a collection of innovative AI models for Computational Fluid Dynamics (CFD) acceleration. It is a deep learning, data-driven solution, currently available for the chemical industry, that reduces mixing simulation time from hours to minutes. You can read more about the product on byteLAKE’s website at www.byteLAKE.com/en/CFDSuite or find us listed in independent industry overviews such as Discover 5 Top Startups working on Computational Fluid Dynamics (startus-insights.com). In short, CFD Suite works in the following way:

  • first, it needs to be trained. We typically need a set of historic simulations to teach the embedded AI models about the physical phenomenon, its parameters, corner cases, etc. The exact number varies across phenomena, but as a rule of thumb, 30–100 such simulations are enough to cover the various possible input configurations (mixing speed, viscosity, pressure, etc.) and geometries;
  • once the AI Training completes, CFD Suite can be used for inference, meaning it can predict the results of CFD simulations. It starts by calling a traditional CFD solver, analyzes its initial results, and when its AI “feels” it is ready to predict, it takes over the simulation and generates the final (steady-state) result. It does so within seconds, and even including that overhead, CFD Suite has been able to reduce the time of chemical mixing simulations from 4–8 hrs. to 10–20 mins while keeping the accuracy of predictions above 93% (a minimal sketch of this two-phase workflow follows the figure below).
CFD simulation results vs. byteLAKE’s CFD Suite’s AI predictions
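To make the hand-over idea above more concrete, here is a minimal, purely illustrative sketch. Every name in it (run_cfd_step, SteadyStatePredictor, is_ready, hybrid_simulation) is a hypothetical placeholder rather than byteLAKE’s actual API, and both the “solver” and the “model” are toy stand-ins:

```python
# Illustrative sketch of the two-phase workflow described above.
# All names are hypothetical placeholders, not byteLAKE's actual API.
import numpy as np

class SteadyStatePredictor:
    """Stand-in for a trained deep-learning model that maps early
    transient fields to the final steady-state field."""
    def predict(self, early_fields):
        # A real model would run inference here; we simply return the
        # last observed field as a dummy "prediction".
        return early_fields[-1]

def run_cfd_step(field):
    # Placeholder for one iteration of a traditional CFD solver;
    # here just a simple smoothing step.
    return 0.5 * field + 0.5 * np.roll(field, 1, axis=0)

def is_ready(history, window=10):
    # Hypothetical hand-over criterion: enough early iterations have
    # been collected for the AI to take over the simulation.
    return len(history) >= window

def hybrid_simulation(initial_field, model):
    history = [initial_field]
    while not is_ready(history):
        history.append(run_cfd_step(history[-1]))   # classic solver phase
    # AI phase: predict the steady state instead of iterating for hours.
    return model.predict(np.stack(history))

result = hybrid_simulation(np.random.rand(64, 64), SteadyStatePredictor())
print(result.shape)
```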

Industry of focus and how we picked a reference hardware

CFD is a very broad topic as these simulations have been successfully used across a large number of industries. For the purpose of this blog post miniseries, I will mainly focus on the chemical industry.

Some of our clients and partners in this space have already acquired powerful hardware capable of AI Training. The majority, however, run their traditional CFD simulations either on laptops or on various desktop PC configurations. This article is for the latter group. Based on the studies we concluded at byteLAKE, most of our clients in the chemical industry can run CFD simulations, including the AI predictions offered by byteLAKE’s CFD Suite, on such machines. However, those who want to perform the AI training phase in-house might require an upgrade to ensure adequate performance and to optimize the AI training process itself.

With that in mind, we reached out to our hardware partners for recommendations. We were looking for something along the lines of:

  • a standalone configuration that does not require any server-specific or advanced infrastructure (racks, server rooms): something you just plug into a power outlet and it is ready to work;
  • an edge-type device, but definitely stronger than a laptop and definitely more flexible than a desktop PC (multiple CPUs, multiple GPUs, a large amount of RAM, large and fast storage);
  • ready for some of the most powerful GPUs, as AI training is what we focus on right now. NVIDIA was our preferred vendor, as we have had great experience with almost all of their GPUs and APIs.

Plus, we needed:

  • an optimal price-to-performance ratio (installed at the edge, edge-sized, edge-priced);
  • security that keeps the data in-house, without a need to send anything to the Cloud;
  • no external data transfer, with all computing performed locally;
  • a variety of communication ports for connection flexibility;
  • and a modular structure for easy hardware upgrades.

Ultimately, we needed a single hardware option that we could recommend to clients who need something more than a laptop but are not yet ready for, or might not need, complex server configurations, let alone HPC (High-Performance Computing) as such. We did not want to consider Cloud options at that time, but we might do so in the future as we expand our offering towards aaS (as-a-Service) models.

To cut a long story short, Lenovo came back with a very interesting offer. At that time the product was yet to be announced, but we were among the first to learn about its specifications and various configuration options: Lenovo Delivers AI-Enhanced Edge Computing with NVIDIA GPUs | eWEEK. The more we learned about the product, the more we liked it, as it simply met all our requirements. Lenovo’s SE450, which is the name of the product, is a very flexible, standalone, edge HPC hardware platform. In Lenovo’s portfolio it is one of the recommended platforms for AI Inference (execution of already trained AI algorithms; in the context of byteLAKE’s CFD Suite, that means shortening CFD simulations by generating AI-predicted CFD results). We customized its configuration specifically for AI Training purposes. The most important part was to equip the SE450 with two powerful NVIDIA A100 80GB Tensor Core GPUs.

We picked the 80GB variant of NVIDIA’s A100 Tensor Core GPU to ensure the cards can handle relatively large mesh sizes during the AI training of the CFD Suite. Although we currently focus on simulations of around 5 million cells, we will definitely go north of that number in the near future.
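As a rough illustration of why GPU memory matters here, the back-of-envelope calculation below estimates the size of a single snapshot of a 5-million-cell mesh. The number of field variables and the precision are assumptions made for the sake of the example, not byteLAKE’s actual sizing model:

```python
# Back-of-envelope estimate (illustrative only, not byteLAKE's sizing model):
# memory needed to hold one snapshot of a 5-million-cell mesh with a handful
# of field variables stored in single precision.
cells = 5_000_000          # mesh size mentioned above
fields = 5                 # e.g. pressure + 3 velocity components + one scalar (assumed)
bytes_per_value = 4        # float32

snapshot_gb = cells * fields * bytes_per_value / 1024**3
print(f"~{snapshot_gb:.2f} GB per snapshot")   # ~0.09 GB

# Training batches stack many such snapshots (plus activations and gradients),
# which is why an 80 GB card leaves headroom for larger meshes and batches.
```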

“We have significantly changed the architecture of the underlying AI within the CFD Suite. With mechanisms like dynamic generation of learning samples, we are now able to fully utilize multiple GPU cards within one node and provide better accuracy. In previous versions, CFD Suite’s AI training performance could only be increased by adding more nodes; now we can greatly benefit from having more accelerators within a single node,” said byteLAKE’s CTO (DSc, PhD).
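For readers curious what “utilizing multiple GPU cards within one node” can look like in practice, below is a minimal single-node, multi-GPU training sketch using PyTorch’s DistributedDataParallel. It is only a generic illustration of the pattern; the model, the data, and the training loop are toy placeholders and do not represent CFD Suite’s internals:

```python
# Generic single-node, multi-GPU training sketch (PyTorch DDP).
# Toy model and data; not byteLAKE's implementation.
import os
import torch
import torch.distributed as dist
import torch.multiprocessing as mp
from torch.nn.parallel import DistributedDataParallel as DDP

def train(rank: int, world_size: int):
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group("nccl", rank=rank, world_size=world_size)
    torch.cuda.set_device(rank)

    # One model replica per GPU; DDP keeps them synchronized.
    model = DDP(torch.nn.Linear(1024, 1024).cuda(rank), device_ids=[rank])
    opt = torch.optim.Adam(model.parameters(), lr=1e-3)

    for step in range(100):
        # Dynamically generated toy learning sample per GPU (placeholder
        # for the solver-derived samples mentioned in the quote above).
        x = torch.randn(32, 1024, device=rank)
        loss = model(x).pow(2).mean()
        opt.zero_grad()
        loss.backward()          # gradients are all-reduced across GPUs
        opt.step()

    dist.destroy_process_group()

if __name__ == "__main__":
    n_gpus = torch.cuda.device_count()   # e.g. 2x A100 80GB in the SE450
    mp.spawn(train, args=(n_gpus,), nprocs=n_gpus)
```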


“We at byteLAKE picked Lenovo’s SE450 edge HPC server as the recommended edge platform for the AI Training phase of the CFD Suite (AI-accelerated CFD) product. We love its design and flexibility and are convinced our clients in the chemical industry will appreciate how easily it enables AI Training capabilities at the edge with NVIDIA GPUs,” said Marcin Rojek, byteLAKE’s Co-Founder.

What’s next?

The answer is quite obvious, I guess. Once Lenovo provided our team with the SE450 equipped with two NVIDIA A100 80GB Tensor Core GPUs, we immediately started benchmarking it. We had done similar tests in the past with NVIDIA V100 Tensor Core GPUs, so we were very curious about the performance we could get with their successor. Also, having completed a number of client deployments and internal research projects at byteLAKE since that previous benchmark, we decided to expand the benchmark’s scope. The goal was to define a byteLAKE-recommended hardware platform for the AI Training phase of CFD Suite, with a special focus on clients who want to perform the AI training on an edge hardware platform and are not yet willing or ready to migrate to more complex HPC architectures. Well, by now you kind of know what that platform is. But let me share more details about the performance analysis in part 2 of this blog post miniseries. I will also share more details about how the new version of CFD Suite can leverage more GPUs within a single-node configuration, which I believe will be a very welcome feature for all our clients who prefer such setups.

Continue: part 2.

Download the full report: https://www.bytelake.com/en/download/4018/.

--

Co-Founder @ byteLAKE | Turning Data Into Information for Manufacturing, Automotive, Paper, Chemical, Energy sectors | AI-accelerated CFD | Self-Checkout for Retail