“Fifty Years of P vs. NP and the Possibility of the Impossible”
Communications of the ACM, January 2022, Vol. 65 No. 1, Pages 76-85
Contributed Articles
By Lance Fortnow
“The P vs. NP problem, and the theory behind it, has not changed dramatically, but the world of computing most certainly has.”
On May 4, 1971, computer scientist/mathematician Steve Cook introduced the P vs. NP problem to the world in his paper, “The Complexity of Theorem Proving Procedures.” More than 50 years later, the world is still trying to solve it. In fact, I addressed the subject 12 years ago in a Communications article, “The Status of the P versus NP Problem.”
The P vs. NP problem, and the theory behind it, has not changed dramatically since that 2009 article, but the world of computing most certainly has. The growth of cloud computing has helped to empower social networks, smartphones, the gig economy, fintech, spatial computing, online education, and, perhaps most importantly, the rise of data science and machine learning. In 2009, the top 10 companies by market cap included a single Big Tech company: Microsoft. As of September 2020, the first seven are Apple, Microsoft, Amazon, Alphabet (Google), Alibaba, Facebook, and Tencent. The number of computer science (CS) graduates in the U.S. more than tripled and does not come close to meeting demand.
Rather than simply revise or update the 2009 survey, I have chosen to view advances in computing, optimization, and machine learning through a P vs. NP lens. I look at how these advances bring us closer to a world in which P = NP, the limitations still presented by P vs. NP, and the new opportunities of study which have been created. In particular, I look at how we are heading toward a world I call “Optiland,” where we can almost miraculously gain many of the advantages of P = NP while avoiding some of the disadvantages, such as breaking cryptography.
As an open mathematical problem, P vs. NP remains one of the most important; it is listed on the Clay Mathematical Institute’s Millennium Problems (the organization offers a million-dollar bounty for the solution). I close the article by describing some new theoretical computer science results that, while not getting us closer to solving the P vs. NP question, show us that thinking about P vs. NP still drives much of the important research in the area.
The P vs. NP Problem
Are there 300 Facebook users who are all friends with each other? How would you go about answering that question? Let’s assume you work at Facebook. You have access to the entire Facebook graph and can see which users are friends. You now need to write an algorithm to find that large clique of friends. You could try all groups of 300, but there are far too many to search them all. You could try something smarter, perhaps starting with small groups and merging them into bigger groups, but nothing you do seems to work. In fact, nobody knows of a significantly faster solution than to try all the groups, but neither do we know that no such solution exists.
This is basically the P vs. NP question. NP represents problems that have solutions you can check efficiently. If I tell you which 300 people might form a clique, you can check relatively quickly that the 44,850 pairs of users are all friends. Clique is an NP problem. P represents problems where you can find those solutions efficiently. We don’t know whether the clique problem is in P. Perhaps, surprisingly, Clique has a property called NP-complete—that is, we can efficiently solve the Clique problem quickly if and only if P = NP. Many other problems have this property, including 3-Coloring (can a map be colored using only three colors so that no two neighboring countries have the same color?), Traveling Salesman (find the shortest route through a list of cities, visiting every city and returning to the starting place), and tens to hundreds of thousands of others.
Formally, P stands for “polynomial time,” the class of problems that one can solve in time bounded by a fixed polynomial in the length of the input. NP stands for “nondeterministic polynomial time,” where one can use a nondeterministic machine that can magically choose the best answer. For the purposes of this survey, it is best to think of P and NP simply as efficiently computable and efficiently checkable.
For those who want a longer informal discussion on the importance of the P vs. NP problem, see the 2009 survey or the popular science book based on that survey. For a more technical introduction, the 1979 book by Michael Garey and David Johnson has held up surprisingly well and remains an invaluable reference for those who need to understand which problems are NP-complete.
About the Author:
Lance Fortnow is a professor and dean of the College of Computing at Illinois Institute of Technology, Chicago, IL, USA.
See also: