Unleashing the Power of Deep Learning

MIT Associate Professor Vivienne Sze - Credit: Lillie Paquette / MIT

Communications of the ACM,
BLOG@CACM
By Vivienne Sze

“To achieve local AI, we need to change how we think about designing both our processing hardware and our deep learning software.”

 

Although deep learning (a core technology in AI-enabled applications) clearly delivers tangible results, its applications remain constrained in several ways. Many of the limitations in accuracy and capability can be addressed in the coming years as programmers and designers refine their algorithms and add more training data.

 

There is one constraint, though, that seems fundamental to the nature of deep learning, to the extent that many developers almost accept it as the price of doing business: the sheer computing power required to run it.

Tied to the Cloud

Deep learning involves running hundreds of millions or even billions of compute operations (for example, multiplications and additions). The GPT-3 model, the foundation for the wildly successful ChatGPT tool, is reported to have 175 billion parameters and to have required more than 10^23 compute operations to train (which translates to millions of dollars), and the finished product requires clusters of powerful, expensive processors to run effectively.
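
To give a feel for where a number like 10^23 comes from, here is a rough back-of-the-envelope sketch. It relies on assumptions that are not in this article: a widely used rule of thumb that training takes about 6 operations per parameter per training token, a publicly reported figure of roughly 300 billion training tokens for GPT-3, and 16-bit (2-byte) weights for the memory estimate. The function names are hypothetical, and the results are order-of-magnitude estimates only.

```python
# Back-of-the-envelope estimates only; see the stated assumptions above.

def training_ops(num_params: float, num_tokens: float) -> float:
    """Approximate training operations using the ~6 * params * tokens rule of thumb."""
    return 6.0 * num_params * num_tokens


def weight_memory_gb(num_params: float, bytes_per_param: int = 2) -> float:
    """Memory needed just to store the weights, in gigabytes (10^9 bytes)."""
    return num_params * bytes_per_param / 1e9


GPT3_PARAMS = 175e9      # 175 billion parameters, as cited in the article
ASSUMED_TOKENS = 300e9   # assumption: roughly 300 billion training tokens

ops = training_ops(GPT3_PARAMS, ASSUMED_TOKENS)
mem = weight_memory_gb(GPT3_PARAMS)  # assumption: 2 bytes per weight

print(f"Estimated training operations: {ops:.2e}")
print(f"Weight storage at 16 bits per parameter: {mem:.0f} GB")
```

Under these assumptions the estimate lands at a few times 10^23 operations, consistent with the figure quoted above, and the weights alone occupy hundreds of gigabytes, far more than the memory of a typical phone or laptop.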

 

While this hunger for processing power is not going to surprise anybody familiar with the technology, the limitations it imposes extend far beyond the need to buy more processors. It also makes it extremely difficult to run deep learning on a portable device—the kind of thing that people are likely to have in their home, bag, or pocket.

About the Author:

Vivienne Sze is Associate Professor of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology (MIT), co-author of the book Efficient Processing of Deep Neural Networks, and lead instructor of the MIT Professional Education course Designing Efficient Deep Learning Systems.