PSC and XSEDE Do AI
ECSS Experts Support Users in Deep Learning with Deep Expertise
April 3, 2018
Artificial intelligence (AI) research is tackling increasingly complicated problems, creating a need to converge AI, high performance computing (HPC) and Big Data analytics. PSC has been at the forefront of this transition, offering expert support to its users through the XSEDE Extended Collaborative Support Service (ECSS). At the February 2018 ECSS Symposium, Paola Buitrago, director of PSC’s Artificial Intelligence and Big Data group, and Joel Welling, senior scientific specialist at PSC and an ECSS consultant, reviewed the state of the art of deep learning, arguably the most successful avenue of AI research to date, as well as the deep learning support available to XSEDE users through the ECSS.
AI 101
AI software makes a computer function intelligently in a specific context—a specific task. Early approaches to AI, which concentrated on statistical analysis of sample data, had to address very tightly limited problems to have any chance of success.
Machine learning, Buitrago explained, is a subset of AI developed to overcome those limitations. By first "training" the software on a sample dataset, a machine-learning AI can better answer questions about a much larger body of study data. Deep learning, in turn, is a subset of machine learning built on "neural networks": networks of processors acting like nerve cells in a brain, which figure out for themselves which details to focus on to arrive at the right answer.
One of the chief challenges to progress in AI, she noted, has been to broaden the contexts in which AIs succeed.
“From the beginning, computers have shown that they were very good at doing tasks that were easy to formally define and which can be challenging for humans,” Buitrago said, “but were not so good at doing tasks that were intuitive for humans.”
She gave the example of describing a picture. Any toddler can do it, but the computer struggles because what it “sees” is just a set of pixels. It’s not clear how to derive shapes, let alone objects, from that kind of information.
Deep Learning: Solving the Problem Layer by Layer
Deep learning tackles that problem by creating several "layers" of increasing conceptual complexity. This allows the software to address the simplest features of the image first. For example, an initial layer may focus on shades and colors; a subsequent layer, straight lines; then corners and contours; and finally, parts of objects that can be used to recognize them, such as wheels on a bicycle. The task then becomes one of weighing probabilities: given a set of colors, straight lines and curves (two curves, for example, might be wheels), the software can come up with a "weighting" that gives a good chance the object is a bicycle.
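To make the idea concrete, here is a minimal sketch of such a layered image classifier written with the Keras API, one of the frameworks mentioned later in this article. The input size, layer widths and the ten output classes are illustrative assumptions, not a configuration from the talk.

# A minimal sketch of a layered image classifier using the Keras API.
# The input size, layer widths and ten output classes are illustrative assumptions.
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential([
    keras.Input(shape=(224, 224, 3)),          # raw pixels: shades and colors
    layers.Conv2D(32, 3, activation="relu"),   # early layer: simple edges and lines
    layers.MaxPooling2D(),
    layers.Conv2D(64, 3, activation="relu"),   # middle layer: corners and contours
    layers.MaxPooling2D(),
    layers.Conv2D(128, 3, activation="relu"),  # later layer: parts of objects, such as wheels
    layers.GlobalAveragePooling2D(),
    layers.Dense(10, activation="softmax"),    # weighted probabilities over object classes
])
model.summary()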
“The big takeaway of why deep learning is popular is that it’s able to create its own representation and change the problem to a different space, where it is easier to answer,” Buitrago said.
Part of the beauty of deep learning is that the AI decides for itself what the layers will do and how they will connect to each other, through the weights that link each layer to those above and below it. These connections inevitably start out sub-optimal. But as the AI trains itself on the sample data, good connections strengthen while bad ones fade away. One reason the term "neural networks" applies to deep learning is that the virtual "neurons" in the software learn, through practice, how to better connect with one another.
Neural networks got their name, in fact, because they began as an attempt to achieve AI by copying how the brain works. AI neural networks don’t behave exactly like real networks of brain cells do, but they do share that developmental strategy of strengthening connections that work and severing those that don’t.
“The object of the game for optimizing a neural network … is to find the set of weights that gives you the best answer,” Welling said. Of course, that means we need a rigorous definition of what constitutes a “good” answer. “You have to come up with a measure that tells you how good an answer you’re getting.”
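Continuing the earlier Keras sketch, the snippet below shows, in hedged form, where that measure enters the picture: a loss function scores how good each answer is, and an optimizer nudges the weights to reduce that score during training. The optimizer, loss and dataset names are generic examples, not the speakers' specific choices.

# Continuing the sketch above: the loss function is the "measure of a good answer,"
# and stochastic gradient descent adjusts the weights to improve it.
model.compile(
    optimizer="sgd",                         # gradient descent over the network's weights
    loss="sparse_categorical_crossentropy",  # scores how far predictions are from the labels
    metrics=["accuracy"],
)
# Training on a labeled sample set (placeholder names for the data):
# model.fit(train_images, train_labels, epochs=5, validation_split=0.1)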
One future challenge for deep-learning AIs, Buitrago added, is the question of transparency. In deep learning, the AI determines the layers, and then optimizes their connections to get the best answer. While we can see the effectiveness of the output, it’s not possible in many cases to understand how the AI works. This hasn’t been a problem with experimental applications, but as AIs gain more real-world application—particularly in solving problems in the medical sphere—the consequences of not understanding what’s under the AI’s hood can be serious.
XSEDE Brings Big Data and HPC into the Picture
While deep-learning algorithms have traditionally run on commodity computers and used modest datasets, one of the insights offered by the work to date is that bigger data and bigger computers are needed for AIs to take on harder problems. This is where XSEDE enters the picture: the organization's common allocations system, serving a large ecosystem of high performance computers, puts it in an ideal position to support researchers studying or applying AI. And through ECSS, XSEDE can offer high-powered support for users, from entry level to expert, in choosing the hardware and frameworks best suited to their specific research.
Hardware decisions can be non-trivial, according to Welling. The availability of GPUs, which excel at the weighted computations at the heart of neural networks, certainly helped supercharge advances in deep learning research over the last decade. But GPUs are not always the best choice.
“Generally, GPUs give you a lot of speedup,” he said. “But only very recently have the various toolsets allowed you to scale out beyond a single node.” Also, because GPUs remain far scarcer than CPUs in the HPC ecosystem, CPU allocations tend to be more generous than GPU allocations. A researcher may get more work done by using a larger number of CPUs for a longer time than by using GPUs.
One of the main goals of an ECSS consultant is to help a user optimize these tradeoffs.
ECSS experts have also helped users by getting major deep-learning frameworks running effectively on XSEDE’s HPC resources. TensorFlow and Caffe, the two most popular, are available for GPU or CPU use on Bridges, the XSEDE resource at PSC. TensorFlow supports parallel processing as part of its core design and is enormously flexible, but those strengths come at the cost of some of the user-friendliness of Caffe, which offers a set of plug-and-play layers. Again, individual users and projects may do better with one or the other, and ECSS consultants can help determine which is the best fit. Bridges also offers a number of other popular frameworks, including Caffe2, Keras, PyTorch, Torch, Theano and scikit-learn.
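As one example of what parallel processing as part of TensorFlow's core design looks like in practice, the sketch below uses TensorFlow's MirroredStrategy to replicate a small model across whatever GPUs a node exposes. It is a generic TensorFlow usage sketch under assumed settings, not a Bridges-specific configuration.

# A generic sketch of TensorFlow's built-in data parallelism, not a Bridges-specific setup.
import tensorflow as tf

strategy = tf.distribute.MirroredStrategy()    # replicates the model across visible GPUs
print("Replicas in sync:", strategy.num_replicas_in_sync)

with strategy.scope():
    # Variables created in this scope are mirrored across devices.
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(784,)),
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10, activation="softmax"),
    ])
    model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")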
As AIs mature and take on more complex tasks, XSEDE platforms such as Bridges offer the powerful hardware and software these projects require. Just as importantly, XSEDE ECSS consultants provide valuable help at every stage, from formulating an approach to developing models. Additional AI services are also available at PSC for industry.
Watch the ECSS symposium on machine learning.
Learn more about ECSS.
Learn more about Bridges at PSC.