Etymologically, the term *convex* originates from the Latin word *convexus*, which translates to *arched*. It is natural to wonder **why convexity** seems to accompany the word “optimization” in almost every context. Why are convex functions, sets, and formulations so important? Why is there a *convex relaxation* craze (LASSO being a prominent example) sweeping the machine learning community, in both academia and industry?

R. T. Rockafellar’s *Convex Analysis* contains over 400 concise pages on the remarkable properties and consequences of convexity. That is to say: **there are too many reasons to count.** Perhaps most notably, convex functions are unimodal: *any local optimum of a convex function is in fact a global optimum.*
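This unimodality is easy to see numerically: plain gradient descent on a convex function reaches the same minimizer no matter where it starts. A minimal sketch (the function `x**2 + exp(x)` and the step size are just illustrative choices, not from the text):

```python
import numpy as np

def f(x):
    # A smooth convex function: sum of two convex terms.
    return x**2 + np.exp(x)

def grad(x):
    # Derivative of f.
    return 2 * x + np.exp(x)

def gradient_descent(x0, lr=0.1, steps=500):
    # Basic fixed-step gradient descent.
    x = x0
    for _ in range(steps):
        x -= lr * grad(x)
    return x

# Starting from very different points, descent converges to the same minimizer.
minima = [gradient_descent(x0) for x0 in (-10.0, 0.0, 5.0)]
```

For a non-convex objective, the same experiment would generally land in different basins depending on the starting point.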

This explains why convexity and optimization go hand in hand: convex functions are, roughly speaking, a class of functions that are readily optimizable. Unlike general unimodal (or *quasiconvex*) functions, however, convex functions are closed under non-negative combination and scaling, which underpins the powerful theory of convex duality. Moreover, convex functions *always* lie on or below the secant line connecting any two points on their graph; *always* lie above *any* tangent plane; and are differentiable almost everywhere—that is, the set of points at which a convex function fails to be differentiable has measure zero.
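The secant and tangent properties are straightforward to check numerically. A small sketch, again using `exp(x) + x**2` as an illustrative convex function (not one named in the text):

```python
import numpy as np

rng = np.random.default_rng(0)
f = lambda x: np.exp(x) + x**2        # a convex function (sum of convex terms)
df = lambda x: np.exp(x) + 2 * x      # its derivative

# For convex f, both gaps below are non-negative at every sampled point.
secant_gap = []   # t*f(a) + (1-t)*f(b) - f(t*a + (1-t)*b): chord lies above f
tangent_gap = []  # f(y) - [f(x0) + f'(x0)*(y - x0)]: tangent lies below f
for _ in range(1000):
    a, b = rng.uniform(-3, 3, size=2)
    t = rng.uniform(0, 1)
    secant_gap.append(t * f(a) + (1 - t) * f(b) - f(t * a + (1 - t) * b))
    x0, y = rng.uniform(-3, 3, size=2)
    tangent_gap.append(f(y) - (f(x0) + df(x0) * (y - x0)))
```

The tangent inequality is exactly what makes gradient information so useful: each gradient evaluation yields a global lower bound on the function.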

By contrast, non-convex functions can be non-differentiable everywhere and, in the general case, locating even a local minimum (which can be, and typically is, a highly suboptimal point) is utterly intractable! These virtues of convexity, along with many others, have ignited considerable research activity as practitioners search for convex formulations of previously non-convex problems.