Jeremy Kun

Trying out Medium

2015-02-15T09:31:53-08:00

I have no problems with the Svbtle platform. I have just seen the majority of new writers moving to Medium, and I figure I should give it a fair comparison.

So I published an article there. Check out the profile and the article.

What's in a blackboard? On mathematical authenticity in movies and TV.

2015-01-03T08:00:01-08:00

Dear producers and directors,

For $100 per scene, I will verify the authenticity of all mathematical lines, props, documents, and boardwork used in that scene. In the event that said math is inauthentic, I will suggest an authentic replacement.

Your friendly neighborhood mathematician,

Jeremy Kun

A lack of authenticity can ruin a tense mood and make a group of supposed experts seem like fools. Designers go to great pains to make costumes or a set authentically French or authentically 1920s, to avoid anachronisms, and to use contemporary idiom. Medical television shows are applauded by my medical school and resident friends for their accurate jargon. But when it comes to anything mathematical, despite it sometimes being crucial to the plot or characters or setting, television shows and movies hardly seem to try to get it right.

It used to be this way with technology, causing the “zoom and enhance” charade that eventually turned into a cheap shot for comic effect. Now that software developers are billionaires, television suddenly seems to paint a shockingly accurate picture of Silicon Valley software.

Sometimes “zoom and enhance” is a crucial plot device. Fine, I’m not in a position to deny plot devices to writers who don’t have the time or paygrade to come up with new ones. But as time has gone on, even technological one-liners and references have gotten more accurate. Take this analysis of the hacking shown in The Girl with the Dragon Tattoo. What’s shown on the terminal closely resembles a database query, rather than the speedily scrolling nonsense or flying binary that was standard only five years earlier. That is remarkable, considering that what I’m about to show you is typical of television and movie mathematics.

My colleague recently snapped this photograph from an episode of the show Resurrection, which in the interest of disclosure I’ll admit I haven’t seen.

The quality is poor, but in the background you can clearly see a large detailed drawing of the unit circle, complete with the usual marked points at the angles of zero, π/6, π/4, etc. This is the same thing you’d expect to see on the whiteboard of a freshman/sophomore high school student’s math class. The scene’s setting, however, is a top secret government lab. The government is ostensibly paying brilliant scientists millions of dollars to solve the world’s hardest problems. And what do they fill their whiteboards with? Math within the reach of a bright fifth grader.

This is hardly an isolated incident. I used to save images I saw of bad TV math but it seemed pointless except to depress me. Countless television shows, movies, and video games flood their blackboards, whiteboards, and walls with silly and obviously irrelevant mathematics. From my perspective they’d have to actually try to fail as badly as they do. If your goal is to pick something mysterious looking (but complete gibberish), then just go to the Wikipedia page for a random area of mathematics, follow a few random links (http://en.wikipedia.org/wiki/Riemannian_manifold) until you find some confusing-looking equations, and behold, scribbles and jargon worthy of a blackboard.

Movies whose subject is a mathematician tend to be better (e.g., Good Will Hunting, A Beautiful Mind, or Proof). These often don’t display believable mathematics appropriate to the stated experience of the characters, but in the same way that The Girl With the Dragon Tattoo was good enough, so are these movies. Rather, I’m talking about the hundreds of movies and episodes that incorporate mathematics into the plot, use the word “equation,” or employ a mathematician (or general nerd/scientist).

Some shows do it better than others. Here’s a still from Elementary, during an episode in which a reclusive mathematician is working on a problem that could actually change the world in a big way, and hence he is compelled to conceal his work with invisible ink.

But even Elementary isn’t perfect. The lines and attitudes the characters use to describe the mathematics are often slightly misleading. Take, for example, the title of that episode (Solve for X), which is essentially a caricature signaling that something about math happens in the episode. Bill Gasarch gives more substantial examples at his blog while gracefully ignoring minor issues. But even more troubling than any of these to me is that Sherlock Holmes, a character who prides himself above all else on the purity of deductive reasoning, seems to know absolutely nothing about mathematics. He even says as much in the episode. This is totally incongruous with his character; he has enough time to master a martial art he never uses, tend exotic species of bees, or study historical cartography, but he doesn’t seem to have any clue about the different areas of mathematics, something one could get a flimsy grasp on by browsing Wikipedia for an hour. Rather than have him claim total ignorance with “The maths are beyond me” (which is in reality a manifestation of the writer’s ignorance), he could, in the above scene, gesture to different parts of the wall saying something like this:

I only vaguely understand how it all fits together, but the parts are shockingly different. [Motioning to various parts of the wall] See here he borrows from algebraic geometry, over here is clearly graph theory, and here he’s using what looks like a Galois-type correspondence. A Galois correspondence! If, as it appears, our friend single-handedly found a way to connect these disparate mathematical ideas in analyzing algorithms, then it may very well be the real thing. Mathematical elegance like this does not occur idly. And as such, he may be in more danger than we thought.

Certainly it’s believable that the smartest person in their fictional world is a little bit mathematical. And then, fine, Hollywood, you can follow it up with the standard reply, “English, please?” But the point is it took me all of five minutes to come up with that line. Being a relatively penniless graduate student among many penniless graduate students, I’d take all the review/scripting work I could manage. I know I’m not alone, and I’m certainly not particularly talented at it. For a fraction of what you might pay a full-time consultant, you could get a small army of graduate students doing quality work. So the only reason I can fathom that studios with multi-million-dollar budgets can’t get authentic lines and boardwork is that they don’t try.

That’s why I’m officially opening my shop for business. For my modest fee, I will remove any mathematical “GUI interface using visual basic” snafu that sneaks into your script, ensure your top secret scientists aren’t doing high school trigonometry, and even provide you with realistic, substantive mathematical meat for your scenes. Hell, I can even show you the detailed quirks that mathematicians tend to have when, say, giving a talk. Drop me a line at jkun2@uic.edu, and we’ll talk.

Why I hate (and love) visualizations of mathematics

2014-11-20T08:06:31-08:00

I have a love-hate relationship with visualizations of mathematical ideas.

Let’s say I’m trying to learn about a difficult mathematical concept. For this example I’ll use Markov chains because I recently saw a highly appreciated visualization of Markov chains by Victor Powell and Lewis Lehe. For now I’ll pretend I’m the typical person who claims to be a visual thinker and that the only reason I don’t get math is that nobody is patient enough to explain things in a way I can understand. (Such people are everywhere.)

So I’ve heard the mysterious term Markov chain, and tried to learn about it previously by reading a book. Maybe I even want to write a computer program to “do” a Markov chain, whatever that means. I go check out the Powell-Lehe visualization and at the end I think “Wow! That was so easy to understand! A Markov chain is just a little diagram with a ball bouncing around, where the ball represents the state of a system, and the thickness of the lines is how likely the ball is to use that line to travel.”

Then I go to whatever forum linked me to the visualization and I say something like “Man, I never really understood Markov chains until now. I had tried to learn them, but my impatient mathematics teachers were so terrible at explaining anything in a way I could understand.” Job well done, all in a day’s work, time to go off and write some programs.

Here’s the problem with this scenario. All I really understand from the visualization is the definition of a Markov chain. In fact, I don’t even understand that all that deeply. The authors of that visualization make a wispy connection between a Markov chain and a matrix (just to “tally” the transition probabilities, they say). But why is a matrix appropriate for that? They claim it’s for efficiency, but as a practiced mathematician I know the answer is much deeper, in fact much closer to the heart of why Markov chains are interesting.
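To make the matrix remark a little more concrete, here is a minimal Python sketch; the three-state transition matrix is made up for illustration. It shows one standard reason a matrix is more than a convenient tally: if the current distribution over states is a row vector, one step of the chain is a vector-matrix product, and taking two steps is the same as multiplying once by the matrix product P·P. Questions about long-run behavior therefore become questions about powers of the matrix.

```python
def step(dist, P):
    """Advance a distribution (row vector) one step: new[j] = sum_i dist[i] * P[i][j]."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def matmul(A, B):
    """Plain matrix product; (P*P)[i][j] is the probability of going from i to j in two steps."""
    rows, inner, cols = len(A), len(B), len(B[0])
    return [[sum(A[i][r] * B[r][j] for r in range(inner)) for j in range(cols)]
            for i in range(rows)]


# Hypothetical 3-state chain: row i lists the probabilities of moving out of state i.
P = [
    [0.9, 0.1, 0.0],
    [0.2, 0.5, 0.3],
    [0.0, 0.4, 0.6],
]
start = [1.0, 0.0, 0.0]  # the "ball" starts in state 0 with certainty

two_steps_iterated = step(step(start, P), P)       # bounce twice
two_steps_via_matrix = step(start, matmul(P, P))   # multiply by P squared once
print(two_steps_iterated)
print(two_steps_via_matrix)
```

Both lines print the same distribution (up to floating-point rounding), which is the algebraic fact the ball-and-arrows picture hides.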

The truth is that the definition of a Markov chain by itself is not at all deep or complicated. That’s part of why the visualization is so effective, because anyone who understands Markov chains could explain what one is to a willing fifth grader, in five minutes, with just a pencil and paper. I couldn’t explain why they’re interesting to a fifth grader, but visualizations don’t do that either. The true difficulty comes when you actually want to do something with Markov chains. Whether it’s a mathematical analysis or a useful computer program, you need more than a single definition and a picture.

And here’s one place visualizations reveal their uglier side. You can’t analyze Markov chains with a visualization. You can use visualizations to get ideas, but you can’t check if those ideas are valid. Markov chains are inherently quantitative but visualizations are qualitative. And visualizations are essentially limited to small examples, because as soon as you turn to any nontrivial large example, they become a useless mess.

Here is what typical visualizations of networks tend to look like:

And identical looking networks can have completely different Markov chain dynamics. So there’s no hope in distinguishing between them just by looking at pictures.

I know what you’re thinking: if I get interested in Markov chains because I saw a neat visualization, then isn’t that all that matters?

Yes and no.

In the hypothetical scenario, I had tried to learn about Markov chains once the “normal” way (by reading a book or taking a class). But the book or teacher didn’t explain it visually for me so I gave up. I just couldn’t wrap my head around it. And now that I am comfortable with the definition of a Markov chain, I need to learn a new concept: how the convergence rate of a Markov chain to a stationary distribution depends on the magnitudes of the eigenvalues of the transition matrix.
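The eigenvalue relationship can at least be checked on the smallest nontrivial example. Below is a sketch for a two-state chain with transition matrix P = [[1-a, a], [b, 1-b]], where the values of a and b are arbitrary choices. Its eigenvalues are 1 and 1-a-b, its stationary distribution is (b/(a+b), a/(a+b)), and the distance to stationarity shrinks by exactly a factor of |1-a-b| at every step. For larger chains the second-largest eigenvalue magnitude plays the analogous role, though the precise statement takes more care.

```python
def step(dist, P):
    """One step of the chain for a row-vector distribution."""
    n = len(P)
    return [sum(dist[i] * P[i][j] for i in range(n)) for j in range(n)]


def convergence_table(a, b, steps):
    """Compare the observed distance to stationarity with |1 - a - b|**t times the initial distance."""
    P = [[1 - a, a], [b, 1 - b]]
    pi0 = b / (a + b)       # stationary probability of state 0
    lambda2 = 1 - a - b     # the non-trivial eigenvalue; the other eigenvalue is 1
    dist = [1.0, 0.0]       # start in state 0 with certainty
    initial_error = abs(dist[0] - pi0)
    rows = []
    for t in range(1, steps + 1):
        dist = step(dist, P)
        observed = abs(dist[0] - pi0)  # with two states, state 1's error is the same size
        predicted = abs(lambda2) ** t * initial_error
        rows.append((t, observed, predicted))
    return rows


for t, observed, predicted in convergence_table(a=0.3, b=0.1, steps=8):
    print(f"t={t}  observed={observed:.6f}  predicted={predicted:.6f}")
```

The observed and predicted columns agree, and the closer |1-a-b| is to 1, the slower the chain forgets where it started.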

WAT

So I look up the animations for what an eigenvalue is, and I look up visualizations of what convergence means, and I look up visualizations of probability theory. Even with all that understanding, it would take a team of visualization experts spoon-feeding me for hours to get me to understand why intuitively these particular numbers govern this particular dynamic. (As far as I “know,” matrices are just for the convenience of writing down transition probabilities.) And it’s almost guaranteed that the perspectives you’d gain from visualizations of the disparate concepts are incompatible or at least don’t mesh well. Instead I could do things the old fashioned way: write down some small examples, practice the algebra skills I hemorrhaged, and ask questions when I get stuck. In order to gain a deep understanding I need to actively engage the material in a way that visualizations don’t allow. And the ridiculous part is not how inefficient it would be to make visualizations for the mysterious-sounding relationship I described, but that we’re still at the beginning of an introduction to Markov chains!

What I’m saying is that visualization can help, of course it can. But if I’m not willing to put in real work to understand a topic, then I will never get past the visualization. I will just keep complaining that my math teachers were all terrible and that I can’t wrap my head around an idea, when the truth is that I’m being impatient. That’s the number one reason that an otherwise capable person fails to learn math. Maybe if they understood that being confused is the natural state of a mathematician then they’d realize what the rollercoaster of gaining a deep mathematical understanding is actually like. I would argue that this applies to understanding anything, but out of all things people tend to be the least patient with mathematics.

I understand why visualizations are so appealing, I really do. I even make them myself to explain and synthesize ideas. They’re appealing because our eyes tend to glaze over when we see too much mathematical notation in one place. Pictures and animations give us a break from the syntax, and help us connect the general definitions to a simple example.

But notice that nowhere do I suggest (and I argue nobody should suggest) that these pictures replace the notation. You need both and they need to interact with each other. Visualizations and pictures allow you to be specific and vague, whereas more typical mathematical analysis allows you to be precise and general. The two complement each other. So visualizations that try to omit all notation are doing you a disservice, practically ensuring a steeper learning curve when you finally need to translate between syntax and idea. Likewise, mathematics authors that provide no examples and no intuition are also doing you a disservice. Just imagine trying to teach programming where you never show syntax but just draw pictures of the “man inside the machine.” And now imagine you just hand out a list of syntax forms with no connection to an intuitive understanding of their semantics. Both are ridiculous, but the pop-math-visualization crowd are basically demanding the former as a method to become competent in mathematics.

But the real problem, I’d say, is that math literature is too close to the latter kind of author, the kind who is too terse and provides no examples. This is a much subtler issue than whether mathematics is hard, having to do with the culture of mathematics, unwritten expectations of authors, and the limited time and incentive of experts. I don’t have a solution to help someone overcome these barriers. Maybe if we paid math graduates a reasonable fraction of what they could make on Wall Street or at the NSA there would be better resources available. Today any mathematician who blogs or writes a textbook about any topic more advanced than calculus does it as a labor of love; it generally detracts from their career by giving them less time for research, they generally don’t get much (or often, any) income from it, and it takes years to do something substantial. With all of this in mind it’s no wonder they’re so terse. And that’s not even mentioning that any mathematician who wants to devote their lives to teaching guarantees themselves a comparatively puny salary and even less autonomy.

What Microsoft lost when it closed MSR Silicon Valley

2014-09-22T09:00:35-07:00

Since I started thinking about my own job opportunities, I have always heard Microsoft described as the best place for research in industry, and I have considered it so myself. Other companies are also considered pretty excellent, but Microsoft tends to make the top of the list in terms of who they hire and how they make it easy for great people to do great work.

For example, when Yahoo closed their New York research lab two years ago, Microsoft offered every fired researcher a job, and even opened a new lab in New York so they didn’t have to move! And though I haven’t verified this, from what I’ve heard Microsoft has never (before now) fired a researcher. They vet their candidates and hire people they intend to keep for the long haul. Microsoft puts people in charge of the research labs who understand that the primary goal is to further the state of the art. And they have a strong track record of doing just that.

So shock and awe that Microsoft would close MSR Silicon Valley (and fire dozens of fantastic researchers overnight) is the only reasonable response. They fired almost everyone, offering to retain only a couple of the very highest caliber researchers provided they’re willing to move. But make no mistake, we’re talking about giants among people who are still extremely tall, here, so to kick anyone out is literally to say, “We don’t want Microsoft to be associated with awesome breakthroughs and innovations in computer science.”

I’m familiar with the theory folks, and if I can convince you that theoretical work is important, then certainly the applied researchers are doing similarly impactful work. So let me give a quick, and by the nature of a short article totally underappreciative, overview of the work done by theory folks at MSR.

Let’s start with Leslie Lamport. Since the early 70’s Lamport has had a steady stream of impressive algorithms and impossibility results in distributed systems. His seminal paper on the “bakery algorithm” gave a beautifully simple solution to the “mutual exclusion problem” of multiple processors corrupting shared memory, which improved over prior solutions by adding fault tolerance and priority access. Things only get better from there. Lamport invented consensus protocols and the Paxos algorithm, invented Byzantine fault tolerance, logical clocks, and fantastically impossible-sounding ways to maintain global state in a distributed system that doesn’t have global communication capabilities.
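For readers who want to see the bakery algorithm itself, here is a minimal Python sketch of its standard textbook formulation. The thread count, iteration count, and shared counter are illustrative choices, not anything from Lamport’s paper. Each thread takes a numbered ticket, then waits until every thread holding a smaller (ticket, thread id) pair has been served, using only ordinary reads and writes of shared variables. The sketch assumes every thread sees those reads and writes in a single global order, which CPython’s global interpreter lock effectively provides; on real multiprocessor hardware the algorithm also needs memory fences.

```python
import threading
import time


def run_bakery_demo(num_threads=3, iterations=30):
    """Increment a shared counter from several threads, guarded only by the bakery lock."""
    choosing = [False] * num_threads  # choosing[i]: thread i is in the middle of picking a ticket
    number = [0] * num_threads        # number[i]: thread i's ticket; 0 means "not waiting"
    shared = {"counter": 0}

    def lock(i):
        # Doorway: take a ticket one larger than every ticket currently visible.
        choosing[i] = True
        number[i] = 1 + max(number)
        choosing[i] = False
        # Wait for every thread with a smaller (ticket, id) pair to be served first.
        for j in range(num_threads):
            while choosing[j]:
                time.sleep(0)  # yield to other threads instead of spinning hard
            while number[j] != 0 and (number[j], j) < (number[i], i):
                time.sleep(0)

    def unlock(i):
        number[i] = 0

    def worker(i):
        for _ in range(iterations):
            lock(i)
            # Critical section: a read-modify-write that would race without the lock.
            value = shared["counter"]
            time.sleep(0)  # encourage a thread switch in the middle of the update
            shared["counter"] = value + 1
            unlock(i)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return shared["counter"]


print(run_bakery_demo())  # 3 threads x 30 increments each
```

If mutual exclusion holds, the total equals num_threads * iterations (90 with the defaults); the deliberate thread switch inside the critical section would make lost updates likely without the lock.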

Leslie Lamport’s research has changed the way we think about distributed computing many times in his career, and at the ripe age of 73, he’s still going strong. For a better overview of his research than I can give here, see this blog post.

Lamport may seem impressive (and he is!) but revolutionizing computer science is par for the course at Microsoft Research SV. Cynthia Dwork has changed the discussion around privacy, inventing a new way to make statistical information public without compromising the identity of any individual in a database. She has spearheaded an entire subfield of cryptography known as differential privacy. This is not even to mention all of her other impressive contributions to cryptography, including the first lattice-based cryptosystem paving the way for fully homomorphic encryption, and the ideas that formed the basis of cryptocurrencies. Dwork’s work on privacy is a reaction to contemporary incidents of de-anonymization of medical and internet data, so to imagine that these researchers are living in some abstract world devoid of application is a fantasy. Dwork’s work influences Microsoft’s products (and everyone else’s) by changing the way we think about privacy.
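Differential privacy has a standard entry-level construction, the Laplace mechanism from Dwork and coauthors: answer a counting query with its true value plus Laplace noise of scale 1/ε, where ε is the privacy parameter. Because adding or removing one person changes a count by at most 1, the noisy answer reveals little about any single individual. Below is a minimal sketch; the helper names and the toy data are hypothetical, and a real deployment would also have to track the total privacy budget spent across many queries.

```python
import random


def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def private_count(records, predicate, epsilon, rng=random):
    """Laplace mechanism for a counting query (hypothetical helper, not a library API).

    A count has sensitivity 1: adding or removing one record changes it by at most 1,
    so noise with scale 1/epsilon gives epsilon-differential privacy for this single query.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + laplace_noise(1.0 / epsilon, rng)


# Toy database (made up): ages of 100 people. Query: how many are over 40?
rng = random.Random(7)
ages = [rng.randint(18, 90) for _ in range(100)]
print(private_count(ages, lambda age: age > 40, epsilon=0.5, rng=rng))
```

Smaller ε means more noise and stronger privacy; averaged over many independent answers the noise cancels, which is exactly why repeated queries have to be charged against a budget.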

And then there are people like Omer Reingold, who furthered the study of space-efficient algorithms by solving a long-standing open problem about undirected graph connectivity. He also changed the way we think about randomness through his work on expander graphs, which has permeated science all the way to discussions about the physics and philosophy of consciousness. As if that weren’t enough, Reingold has made countless other contributions to cryptography and fault tolerance, and he’s just getting started! With so much impressive work under his belt, this is what he had to say about his colleagues at MSR:

In a place with no borders between research areas, I was free to follow my intellectual curiosity with colleagues I wouldn’t normally have the great fortune of working with. My non-theory colleagues have left me a much more complete computer scientist than I ever been. My theory colleagues left me in absolute awe! Being surrounded by the creativity and brilliance of a unique collection of young scientists was such a rush. I am confident that they will make many departments and research groups much better in the following months and years. My only regret is every free minute I didn’t spend learning from these wonderful colleagues and friends.

His blog post containing that quote is showered with comments by powerhouses of research all lamenting the loss of the hub of innovation and science.

There are just too many accomplished people to do justice to them all, but the point is that Microsoft stands to lose more than just world-class researchers. They stand to lose (and to some degree already lost) their image among academics. Serious academic work needs stability, time, and collaboration, and can’t happen with only two out of the three. Important researchers are already questioning the security of industry lab jobs.

And this seems to be a trend across industry: big company gets a new CEO and downsizes research. The interesting thing is that Microsoft could have saved face and maintained stability by giving researchers a year’s notice, enough time to secure academic positions (which are notoriously slow to materialize). Was the cost of keeping a relatively small research lab for one year really worth the dent to their image?

If the trend continues and industry research is no longer an obvious goal for young researchers, and when the few good university slots fill up, where will they go? And by they I mean we, because I will soon be forced to decide for myself. Will we work abroad (I’m certainly considering that option), and weaken the country’s relative level of innovation? Politicians don’t seem to like that, but they won’t increase science funding to compensate. Will we switch careers, go into finance and potentially cause another economic meltdown? Or waste our collective skill and talent designing iPhone games or CRUD apps? Or worst of all go work for the NSA and undermine modern security?

I can think of a few worthwhile endeavors specific to my interests (read: not evil or selfish, and also with a reasonable salary), but I know a lot of PhD students don’t have such clear backups they can be enthusiastic about. They would be much more likely to fall for the salary of Goldman Sachs or the stability of the NSA instead of navigating the new frontiers of computer science. Skeptics might say that’s just how the market works, but chances are that skeptic’s job wouldn’t exist if it weren’t for the pioneers of computer science. Innovation breeds better opportunities for everyone. Behind Snapchat and Google are Fourier analysis and distributed networks. Behind Amazon is a century of combinatorial optimization. Behind every banking and finance firm is a mountain of cryptography keeping their secrets and clients safe.

Theory informs practice by mapping out what can and can’t be done, and industry labs have some of the best track records of producing great research. So when Microsoft sends a message implying they don’t care, they risk deterring talent, but we all risk losing the fruits of the work great researchers might have done at a hub like MSR Silicon Valley.

Why don't mathematicians write great code?

2014-08-25T06:00:23-07:00

In the discussion surrounding a series of recent articles on the question of how mathematics relates to programming (one of my favorite navel-gazing topics), the following question was raised multiple times:

If mathematics is so closely related to programming, why don’t professional (research) mathematicians produce great code?

The answer is quite a simple one: they have no incentive to.

It’s pretty ridiculous to claim that a mathematician, someone who typically lives and breathes abstractions, could not learn to write well-organized and thoughtful programs. To give a simple example, I once showed my advisor a little bit about the HTML/CSS logical flow/style separation paradigm for webpages, and he found it extremely natural and elegant. And the next thing he said was along the lines of, “Of course, I would have no time to really learn and practice this stuff.” (And he says this as a relatively experienced programmer.)

That’s the attitude of most researchers. Most programming tools are cool and would be good to have expertise in, but it’s not worth the investment. Mostly that comes off as, “this is a waste of time,” but what’s keeping them from writing great code is their career.

Mathematics and theoretical computer science researchers (and many other researchers) are rewarded for one thing: publications. There is no structure in place to reward building great software, and theoretical computer scientists in particular are very aware of this. There have even been some informal proposals to change that, because everyone understands how valuable good software libraries are to progress in our fields.

But as it currently stands, the incentives for mathematicians reward one thing and one thing only: publishing influential papers. Very little emphasis is given to things like teaching, software, or administrative duties, and even then they don’t replace publications. So time spent on anything other than publications is time taken away from papers. Everyone understands this about the job market. Say you have two candidates whose work is equally good, but the first candidate has one more top-tier paper while the second has contributed an equivalent amount of work to open source software. Though I have never seen this happen firsthand, every career panel I have posed this question to has agreed the first candidate would be chosen with high probability.

So when mathematicians or theoretical computer scientists do write code, they have an incentive to get it working as quickly and cheaply as possible. They need the results for their paper and, as long as it’s correct, all filthy hacks are fair game. This is most clearly illustrated by the relationship between mathematicians and their primary paper-writing tool, the typesetting language TeX. All mathematicians are proficient with it, but almost no mathematicians actually learn TeX. Despite everyone knowing that TeX is a true programming language (it has a compiler and a Turing-complete macro system), everyone prefers to play guess-and-check with the compiler or find a workaround because it’s way faster than determining the root problem.

With this in mind, it’s hard to imagine your average mathematician having a deep enough understanding of a general-purpose language to produce code that software engineers would respect. So something like adequate testing, version control, or documentation is that much more unlikely. Even if they do write programs, most of it is exploratory, discarded once a proof is found achieving the same result. Modern software engineering practices just don’t apply.

For the majority of mathematicians, I claim this is mostly as it should be. Building industry-strength tools is not the core purpose of academic research, and much of mathematical research is not immediately (or ever) applicable to software. And most large companies who want to utilize bleeding-edge research for practical purposes form research teams. For example, Google does this, and from what I’ve heard many of their researchers spend a lot of time working with engineers to test and deploy new research. At places like Google (and Yahoo, Microsoft, IBM, Toyota), researchers negotiate with their company how their time is split between academic-style paper writing and engineering pursuits, and there are researchers at both extremes.

But even there, where coding is part of the goal, the best industry research teams still hire based on publication history. I can only hypothesize why: a great researcher can be taught programming practices trivially, so a strong research history is more important.

Programming is not math, huh?

2014-07-18T11:41:55-07:00

You’re right, programming isn’t math. But when someone says this, chances are it’s a programmer misunderstanding mathematics.

I often hear the refrain that programmers don’t need to know any math to be proficient and have perfectly respectable careers. And generally I agree. I happen to think that programming only becomes fun when you incorporate mathematical ideas, and I happen to write a blog about the many ways to do that, but that doesn’t stop me from realizing that the vast majority of programmers completely ignore mathematics because they don’t absolutely need it.

So when Sarah Mei argues in her article “Programming is not Math” that math skills should not be considered the only indicator of a would-be programmer’s potential, I wholeheartedly agree. I’ve never heard anyone make that argument, but I’m much younger than she is. Having faith in Mei’s vast life experience, I’ll assume it was this way everywhere when she was writing Fortran in school, and it seems plausible that the attitude lingers at the most prestigious universities today.

But then she goes on to write about mathematics. As much as I respect her experience and viewpoints, her article misses the title’s claim by a long shot. It’s clear to me that it’s because she doesn’t understand the mathematics part of her argument. Here’s the best bit:

Specifically, learning to program is more like learning a new language than it is like doing math problems. And the experience of programming today, in industry, is more about language than it is about math.

This is the core of her misunderstanding: being good at math is not about being good at “doing math problems” (from the context of her article it’s clear that she equates this with computation, e.g. computing Riemann sums). And the experience of programming in your particular corner of industry is not representative of what programming is about. The reality of the mathematics/programming relationship is more like this:

1. Mathematics is primarily about conjecture, proof, and building theories, not doing slews of computations.
2. Learning to do mathematics is much more like learning language than learning to program is like learning language.
3. Large amounts of effort are spent on tedious tasks in industry for no reason other than that we haven’t figured out how to automate them yet. And novel automations of tedious tasks involve interesting mathematics by rule, not exception.
4. That doesn’t change how crucially reliant every programmer (and every company) is on the mathematical applications to programming that allow them to do their work.

Mathematics is closer to language

Item 2 is probably why Mei isn’t able to find any research on the similarities between math and programming. There is a ton of research relating mathematics to language learning. For an extended bibliography with a nice narrative, see Keith Devlin’s book The Math Gene.

One big reason that mathematics is much more like language than programming is that doing mathematics involves resolving ambiguities. In programming you have a compiler/interpreter that just dictates how an ambiguity resolves. But in mathematics, as in real language, you have to resolve them yourself based on context. This happens both in the modeling side of mathematics and in the hard-core theory side. Contrary to the most common internet wisdom, almost no working mathematicians do math from a purely axiomatic standpoint. The potential for ambiguities arises in trying to communicate a proof from one person to another in an elegant and easy-to-understand way. Note the focus on communicating. This is essentially the content of a first course in proofs, which, by the way, is usually titled something like “A transition to advanced mathematics.” The reason this never shows up when you’re computing Riemann sums is that in that context you’re playing the role of the computer and not the mathematician. It’s like getting the part of a spear carrier in a play and claiming, “acting is just about standing around looking fierce!” It’s a small, albeit important, part of a much larger picture.

Having studied all three subjects, I’d argue that mathematics falls between language and programming on the hierarchy of rigor.

Human language
Mathematics
Programming

and the hierarchy of abstraction is the exact reverse, with programming being the most concrete and language being the most abstract. Perhaps this is why people consider mathematics a bridge between human language and programming: it allows you to express more formal ideas in a more concrete language than human language does, without making you worry about hardware details such as whether your integers are capped at 32 or 64 bits. Indeed, if you think that the core of programming is expressing abstract ideas in a concrete language, then this makes a lot of sense.

This is precisely why learning mathematics is “better” than learning a language at teaching you the kind of abstract thinking you want for programming: mathematics is closer to programming on the hierarchy. It helps even more that mathematics and programming readily share topics. You teach graph coloring for register allocation, linear algebra and vector calculus for graphics, combinatorics for algorithms. That’s not because you need to know graph coloring or how to count subsets of permutations, but because it shows the process of reasoning about an idea so you can understand the best way to organize your code. If you want to connect language to programming you almost always have to do so through mathematics (direct modeling of sentence structure via programming is a well-tried and unwieldy method for most linguistic applications).

Big-O is “pretty much meaningless” #

Another issue I have with Mei’s article is her claim that “big-O” is meaningless in the real world. More specifically, she says all that matters is the runtime of an algorithm on “your data.”

Let’s get the obvious thing out of the way. I can name many ways in which a result in improving the worst-case asymptotic complexity of an algorithm has literally changed the world. Perhaps the biggest is the fast Fourier transform. So if you’re applying to work at a company like Google, which deservingly gets credit for changing the world, it makes total sense for interviewees to be familiar with the kind of mathematical content that has changed the world in the past. Maybe it’s a mistake for smaller companies to emulate Google, but you can’t blame them for wanting to hire people who would do well at Google.
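To make the FFT example concrete, here is a minimal Python sketch (my illustration, not code from Mei or from any library) comparing the direct discrete Fourier transform, which does O(n^2) work, with the textbook recursive radix-2 Cooley-Tukey FFT, which computes the same result in O(n log n). The function names are illustrative, and the FFT assumes the input length is a power of two.

```python
import cmath


def dft(x):
    """Direct discrete Fourier transform: n outputs, each a sum of n terms, so O(n^2)."""
    n = len(x)
    return [
        sum(x[t] * cmath.exp(-2j * cmath.pi * t * k / n) for t in range(n))
        for k in range(n)
    ]


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT, O(n log n). Assumes len(x) is a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    if n % 2:
        raise ValueError("length must be a power of two")
    even = fft(x[0::2])  # transform of the even-indexed samples
    odd = fft(x[1::2])   # transform of the odd-indexed samples
    out = [0j] * n
    for k in range(n // 2):
        # Combine the two half-size transforms with a "twiddle factor".
        twiddle = cmath.exp(-2j * cmath.pi * k / n) * odd[k]
        out[k] = even[k] + twiddle
        out[k + n // 2] = even[k] - twiddle
    return out
```

For a million samples, that is roughly 10^12 versus 2×10^7 basic operations, ignoring constant factors.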

But at a deeper level I don’t believe Mei’s argument. Her example is this.

An algorithm that is O(n^2) for arbitrary data may actually be constant time (meaning O(1)) on your particular data, and thus faster than an algorithm that is O(n log n) no matter what data you give it.

First, the chance is absolutely negligible that you will come across a nontrivial problem where a standard algorithm hits its worst case on “your” data while a generally considered worse algorithm does much better. Second, there is a very rich mathematical theory of “algorithms that run extremely fast and return correct answers to queries with high probability.” So again, you can turn to mathematics, where the expectations are quantifiable rather than arbitrary and guessed.
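As one small example of that theory (my choice of illustration, not one from the original discussion), Freivalds’ algorithm checks whether AB = C for n×n matrices using O(n^2) work per trial instead of recomputing the O(n^3) product. If the claimed product is wrong, each trial catches it with probability at least 1/2, so the chance of a wrong answer after k trials is at most 2^-k, no matter what “your” data looks like. The function and parameter names below are hypothetical.

```python
import random


def freivalds(a, b, c, trials=20, rng=None):
    """Randomized check of whether the matrix product AB equals C.

    Matrices are n x n lists of lists. Each trial multiplies by a random 0/1
    vector r and compares A(Br) with Cr, costing O(n^2). If AB != C, one trial
    detects it with probability at least 1/2, so all trials miss with
    probability at most 2**-trials. If AB == C, it always returns True.
    """
    rng = rng or random.Random()
    n = len(a)

    def matvec(m, v):
        return [sum(m[i][k] * v[k] for k in range(n)) for i in range(n)]

    for _ in range(trials):
        r = [rng.randint(0, 1) for _ in range(n)]
        if matvec(a, matvec(b, r)) != matvec(c, r):
            return False  # r is a witness that AB != C
    return True  # AB == C with probability at least 1 - 2**-trials
```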

But more deeply, nobody in industry has any clue what it is that characterizes “real world data” that would let you make guarantees better than the worst case. They have a fuzzy idea (real social networks are usually sparse, etc.), but little in the way of a comprehensive understanding. This is a huge topic, but it’s a topic of active research, which is not coincidentally filled to the brim with mathematics. The takeaway is that even if you have an algorithm that seems to run quickly on “your” data, “seems” is the best you’ll be able to say without answering big open research questions. You won’t be able to guarantee anything, which means you’ll be stuck telling your manager that you’re introducing more points of failure into the system, and you risk being paged in the middle of the night because the company has expanded to China and their data happens to break your algorithm on average.

But you’re a smart engineer, so what do you do? You run your clever algorithm and track how long it takes; if it takes too long, you abort and do the standard O(n log n) solution instead. Problem solved. But wait! You needed to know the difference between your algorithm’s worst case complexity and the baseline complexity, and you had to determine how long to wait before aborting.
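Here is a minimal sketch of that fallback strategy. As a stand-in for “your clever algorithm,” the hypothetical `adaptive_sort` uses insertion sort, which is fast on nearly sorted input but O(n^2) in the worst case, and aborts once its work exceeds a budget proportional to n log n, the cost of the standard sort. The budget constant is an arbitrary assumption; the point is that you can’t even pick the budget without knowing the baseline complexity.

```python
import math


def adaptive_sort(items, budget_factor=2.0):
    """Try insertion sort, but fall back to the built-in O(n log n) sort
    once the number of element moves exceeds budget_factor * n * log2(n).

    Returns (sorted_list, which_path_was_used).
    """
    original = list(items)
    a = original.copy()
    n = len(a)
    budget = budget_factor * n * math.log2(n) if n > 1 else 0
    moves = 0
    for i in range(1, n):
        key = a[i]
        j = i - 1
        while j >= 0 and a[j] > key:
            a[j + 1] = a[j]  # shift a larger element one slot right
            j -= 1
            moves += 1
            if moves > budget:
                # Too much work: this input is far from sorted, so abort.
                return sorted(original), "fallback"
        a[j + 1] = key
    return a, "insertion"
```

On a list that is sorted except for one swapped pair, this stays on the insertion-sort path; on a reversed list it quickly exceeds the budget and falls back.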

The fact is, you can’t function without knowing the baselines, and asymptotic runtime (big-O) is the gold standard for comparing algorithms. Certainly you can mix things up as appropriate, as the fictional engineer in our story did, but if you’re going to do a more detailed analysis you have to have a reference frame. At a company where a one-in-a-million error happens a hundred times a day, mathematical guarantees are (literally) what help you sleep at night. Not every programmer deals with these questions regularly (which is why I don’t think math is necessary to be a programmer), but if you want to be a great programmer you can bet you’ll need it. Companies like Google, Amazon, and Microsoft face these problems, aspire to greatness, and want to hire great programmers. And great programmers can discuss the trade-offs among various algorithms.

That said, Sarah Mei is right that there might be some interesting ways to model algorithms running better on “your” data than in the worst case (and if I were interviewing someone I would gladly entertain such a discussion). But I can say with relative certainty that even an above-average math-phobic interviewee is not going to have any new and deep insights there. And even if one does, one needs to be able to answer the question of how this relates to what is already known about the problem. Without that, how can you know your solution is better?

A “minor specialization” #

Now my biggest beef is with her conclusive dismissal of mathematics.

If a small and shrinking set of programming applications require math, so much so that we cordon you off into your own language to do it, then it’s pretty clear that heavy math is, these days, a minor specialization.

Oh please. You can’t possibly think that every mathematician who programs does so in Fortran or Haskell. I’m a counterexample: I’m proficient in C, C++, Java, Python, JavaScript, HTML, and CSS. I have only really dabbled in Haskell and Racket and other functional languages (I like them a lot, but I just get more done in Python).

But what’s worse is that I have so many programming applications of mathematics that I don’t know what to do with them all. It’s like they’re sprouting from my ears!

Let’s take the examples of what Mei thinks are purely unmathematical uses of programming: “ease of use, connectivity, and interface.” I’m assuming she means the human-computer interaction version of these questions. So this is like, how to organize a website to make it easy for users to find information and streamline their workflow. I’d question whether anyone in the industry can really be said to be “solving” these problems rather than just continually debating which solution they arbitrarily think is best. In fact, I’m more inclined to argue that companies change their interface to entice users to pay for updates more than to make things easier to use (I’m looking at you, Microsoft Word).

In any case, it’s clear that Mei is biased toward one very specific kind of programming, which does have mathematical aspects (see below). But moreover, she blurs the distinction between an application of mathematics to programming and what she finds herself and her colleagues actively doing in her work. Let me counter with my own, more quantifiable examples of the mind-bogglingly widespread applications of mathematics to industry, both passive and active.

Optimization: the big Kahuna of mathematical applications to the real world. Literally every industrial company relies on state-of-the-art optimization techniques to optimize their factories, shipping lines, material usage, and product design. And I’m not even talking about the software industry here. I’m talking about Chevron, Walmart, Goldman Sachs. Every single Fortune 500 company applies more “heavy” math on a daily basis than could be taught in four years of undergraduate education. They don’t care about ease of use, they care about getting that extra 0.05% profit margin. And as every mathematician knows, there is a huge theory of optimization that ranges from linear programming to functional analysis to weird biology-inspired metaheuristics.
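To give a taste of the simplest end of that theory, here is a sketch that solves a toy two-variable linear program (the product-mix numbers are made up for illustration). For a bounded, feasible linear program the optimum is attained at a vertex of the feasible region, so with two variables we can simply check every vertex. Real solvers use the simplex method or interior-point methods instead; this brute force only scales to tiny problems, and `solve_2d_lp` is a hypothetical name.

```python
from fractions import Fraction
from itertools import combinations


def solve_2d_lp(c, constraints):
    """Maximize c[0]*x + c[1]*y subject to a*x + b*y <= r for each (a, b, r),
    plus x >= 0 and y >= 0. Assumes the feasible region is nonempty and bounded.

    Returns (best_value, x, y) as exact fractions.
    """
    rows = [tuple(map(Fraction, row)) for row in constraints]
    # Encode x >= 0 and y >= 0 as -x <= 0 and -y <= 0.
    rows += [(Fraction(-1), Fraction(0), Fraction(0)), (Fraction(0), Fraction(-1), Fraction(0))]
    best = None
    for (a1, b1, r1), (a2, b2, r2) in combinations(rows, 2):
        det = a1 * b2 - a2 * b1
        if det == 0:
            continue  # parallel boundary lines do not meet in a single vertex
        # Intersection of the two boundary lines (Cramer's rule).
        x = (r1 * b2 - r2 * b1) / det
        y = (a1 * r2 - a2 * r1) / det
        if all(a * x + b * y <= r for a, b, r in rows):  # keep only feasible vertices
            value = c[0] * x + c[1] * y
            if best is None or value > best[0]:
                best = (value, x, y)
    return best
```

For example, `solve_2d_lp((3, 5), [(1, 0, 4), (0, 2, 12), (3, 2, 18)])` maximizes 3x + 5y subject to x ≤ 4, 2y ≤ 12, and 3x + 2y ≤ 18, and finds the optimum value 36 at x = 2, y = 6.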

Signal processing: No electronic device or wireless communication system would exist without signal processing. The entire computer industry relies on digital signal processing techniques and algorithms developed through mathematics. Literally every time you type a key on your keyboard, you’re relying on applications of mathematics to programming. Sure, you don’t need to know how to build a car to drive it, but signal processing techniques extend to other areas of programming, such as graphics, data mining, and optimization. Moreover, a large portion of the software industry is disguised as the hardware industry because they use languages like VHDL instead of Ruby. They really need to know this topic, and it’s not fair to forget them. Nor should we forget all the engineers who do signal processing in Matlab. Our list just keeps getting bigger and bigger, huh?
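As a minimal illustration of the kind of algorithm involved (my example, not one from the post), a moving average is about the simplest digital filter: a finite impulse response (FIR) low-pass filter that smooths a noisy signal by averaging the most recent samples. This sketch keeps a running sum so each output costs O(1) work; `moving_average` is a hypothetical name.

```python
def moving_average(signal, width):
    """FIR low-pass filter: y[n] = (x[n] + x[n-1] + ... + x[n-width+1]) / width,
    treating samples before the start of the signal as zero."""
    if width < 1:
        raise ValueError("width must be positive")
    out = []
    running = 0.0
    for n, x in enumerate(signal):
        running += x
        if n >= width:
            running -= signal[n - width]  # drop the sample leaving the window
        out.append(running / width)
    return out
```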

Statistics: Every company needs to manage their risk and finances via statistics, and every application of mathematics and statistics to risk and finance is done via programming. Whether you use SAS, JMP, R, or just Excel, it’s all programming and all requires mathematical understanding. This is not even to mention all of the statistical modeling (via programming) that goes on in a non-financial setting. For example, in Obama’s presidential campaign and in sports forecasting. Even as I write this, NPR is reporting on the Malaysia flight that was shot down in Ukraine, and how technicians are using “mathematics and algorithms” to pinpoint the location of the crash.

Machine Learning: A hot topic these days, but for a long time engineers have been trying to answer the question, “what does it mean for a computer to learn?” Surprise, surprise, the generally accepted answer these days came from mathematicians. The theory of PAC-learning, and more generally its relationship to the many widely-used machine learning techniques, paved the way for things like boosting and the study of statistical query algorithms. Figuring out smart ad serving? Try bandit-learning techniques. It’s mathematics all the way down.
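For a flavor of bandit learning, here is a sketch of UCB1, one standard bandit algorithm, choosing which of several ads to show. The click-through rates are hypothetical and are only used to simulate users; the algorithm itself sees only the clicks. It balances exploiting the ad with the best observed click rate against exploring ads it has rarely shown.

```python
import math
import random


def ucb1(click_rates, rounds, seed=0):
    """Simulate UCB1 on ads whose (hypothetical) click probabilities are given.

    Returns how many times each ad was shown.
    """
    rng = random.Random(seed)
    k = len(click_rates)
    pulls = [0] * k
    clicks = [0] * k
    for t in range(1, rounds + 1):
        if t <= k:
            arm = t - 1  # show every ad once to initialize the estimates
        else:
            # Optimistic estimate: empirical click rate plus an exploration
            # bonus that shrinks as an ad is shown more often.
            arm = max(
                range(k),
                key=lambda i: clicks[i] / pulls[i] + math.sqrt(2 * math.log(t) / pulls[i]),
            )
        pulls[arm] += 1
        if rng.random() < click_rates[arm]:  # simulated user click
            clicks[arm] += 1
    return pulls
```

Over time most impressions go to the ad with the highest true click rate, while the others are still shown occasionally in case the early estimates were unlucky.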

Graphics/Layout: You want ease of use in human computer interaction? You want graphics. You want special effects in movies? You need linear algebra, dynamical systems, lots of calculus, and lots of graphics programming. You want video games? Data structures, computational geometry, and twice as much graphics as you thought you’d ever need. You want a dynamic, adaptive, tile-based layout on your website? Get ready for packing heuristics, because that stuff is NP-hard! Information trees, word clouds, rankings, all of these layout concepts have rich mathematical underpinnings.
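Here is a sketch of one classic packing heuristic, first-fit decreasing, applied to a simplified one-dimensional version of the layout problem: packing tiles of given (hypothetical) widths into rows of fixed width, using as few rows as possible. The exact problem (bin packing) is NP-hard, but this heuristic is fast and is known to stay within roughly 11/9 of the optimal number of rows, plus a small additive constant.

```python
def first_fit_decreasing(widths, row_width):
    """Sort widths in decreasing order and put each item into the first row
    with enough room, opening a new row only when none fits."""
    rows = []  # each row is [remaining_space, [widths placed in this row]]
    for w in sorted(widths, reverse=True):
        if w > row_width:
            raise ValueError(f"item of width {w} cannot fit in any row")
        for row in rows:
            if row[0] >= w:
                row[0] -= w
                row[1].append(w)
                break
        else:
            rows.append([row_width - w, [w]])  # no room anywhere: open a new row
    return [placed for _, placed in rows]
```

For example, tiles of widths 4, 8, 1, 4, 2, 1 fit into two rows of width 10: `[[8, 2], [4, 4, 1, 1]]`.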

You see, Mei’s fundamental misconception is that the kind of applications that we haven’t yet automated and modularized constitutes what programming is all about. We don’t know how to automate the translation of obscure and ambiguous business rules to code. We don’t know how to automate the translation from a picture you drew of what you want your website to look like to industry-strength CSS. We don’t know how to automate the organization of our code so as to allow us to easily add new features while maintaining backwards compatibility. So of course most of what goes on in the programming industry is on that side of the fence. And before we had compilers we spent all our time tracking memory locations and allocating registers by hand, too, but that’s no more the heart and soul of programming than implementing business rules.

And by analogy, most of writing is not literature but fact reporting and budget paperback romance novels, but we teach students via Twain and Wilde. And most cooking is reheating frozen food, not farm-to-table fine cuisine, so should a culinary student study McDonald’s?

But if you wanted to genuinely improve on any of these things, if you wanted to figure out how to automate the translation of drawn sketches to good HTML and CSS, you can count on there being some real mathematical meat for you to tenderize. I hope you try, because without mathematics we programmers are going to have an extremely hard time making real progress in our field.

The world will never need more than five quantum computers

2014-06-24T13:18:44-07:00

I have been gradually making my way through Scott Aaronson’s wonderful book, “Quantum Computing Since Democritus.” The book is chock-full of deep insights phrased in just-technical-enough language (the kind which I want to relay to the world through an internet megaphone). Scott really has learned how to apply the good and bad attitudes of the past to the problems of today.

For example, did you know that originally computers had so many problems with errors that many people argued fault-tolerant computers would never exist? This was before the transistor, of course, but it was believed that the external world would always have such an adverse interference with the physical machine that one could not reliably use the outputs. John von Neumann proved to the contrary that even with the error-prone hardware of the time it was possible to design a machine whose overall error rate could be made arbitrarily small. But his accomplishment was largely forgotten after the transistor was invented and shown to be so reliable as not to need any extra error-correction scaffolding.

But the idea that computers would never be error-tolerant enough was probably the origin of the famous slew of quotes that the world would never need more than five computers. It’s not so ridiculous a proposition in that context, since the world also only has need for around five particle accelerators. Scott notices the parallel for quantum computers, the worry that the outside world would interfere with the computations so as to render them useless, and discusses the existence of quantum fault-tolerance in the same vein as von Neumann’s theorem.

Nevertheless, the question of whether the world will ever need more than five quantum computers (assuming they’re feasible to scale) is still a pertinent one. It’s not because of error, but because of what kinds of problems quantum computers are believed to be better at than classical computers.

You see, it’s widely known that quantum computers aren’t more powerful than classical computers in the sense that they can compute things that classical computers cannot. The real question is one of efficiency, and by efficiency I mean the difference between polynomial time and worse-than-polynomial time as the problem scales. The truth is we know of only a few key problems that quantum computers can solve efficiently and for which we don’t know for sure that classical computers can’t.

One example is factoring integers. We know that quantum computers can factor integers quickly, but we don’t know for sure that classical computers cannot. In fact, many researchers believe that, because of recent advances in computer science and cryptography, we will find a polynomial-time algorithm for factoring integers relatively soon.

The question is, who really needs to factor integers on a regular basis? The only answer I can come up with is number theorists (trying to prove theorems) and the government (trying to break encryption). But these days people are moving away from factoring-based encryption. So who’s left to care?

There are, admittedly, other ways that quantum computers can speed things up, but the gains are not as drastic as the mainstream media would have one believe. For example, the best known speedup for solving NP-complete problems, which include most scheduling, packing, and routing problems (an efficient algorithm for these would revolutionize the world), is on the order of a square root. That is, it reduces the time from an exponential to the square root of an exponential, which is still egregiously slow.
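To put numbers on the square-root speedup, assume the problem is solved by brute-force search over all assignments to n Boolean variables. Grover’s algorithm, the source of this quadratic speedup, needs on the order of the square root of the classical number of steps:

```latex
T_{\text{classical}}(n) = O\left(2^{n}\right)
\qquad
T_{\text{Grover}}(n) = O\left(\sqrt{2^{n}}\right) = O\left(2^{n/2}\right) \approx O\left(1.414^{n}\right)
```

For n = 100 that is still about 2^50 ≈ 1.1 × 10^15 steps: better, but still exponential.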

This is not to downplay the importance of quantum computing. It’s a multifaceted subject providing a vast trove of interesting problems, answers, and discussions. It excites me that one day I might actually contribute some small fact to nudge forward human knowledge about quantum computing. But the set of useful problems we know how to solve efficiently with quantum computers is just so minuscule. In order to convince me that quantum computers may someday become commonplace, one would need to present a problem that quantum computers can solve with applications on the scale of Facebook. It needs to be something that potentially every human could have use for. And while I am not an expert in quantum computing, if such a problem and solution existed I’d probably have heard of it by now (it would be trumpeted along with factoring and the hidden subgroup problem as triumphs of the model).

So unless there are extreme revolutions in theoretical computer science, which is certainly possible, it seems safe to reuse that infamous quote here: the world will never have need for more than five quantum computers.

Reductions are the Mathematical Equivalent of Hacks

2014-05-05T10:44:10-07:00

Though I don’t remember who said it, I once heard a prominent CS researcher say the following:

Reductions are the lifeblood of theoretical computer science.

He was totally right. For those readers who don’t know, a reduction is a systematic way to transform instances of one problem into instances of another, so that solutions to the latter translate back to solutions to the former.

Here’s a simple example. Say you want to generate a zero or a one at random, such that you’re equally likely to get either outcome. You can reduce this problem to the problem of generating a zero or a one with some biased probability (as long as the coin is not completely biased, that is, both outcomes are possible).

In other words, you can simulate a fair coin with a biased coin. How do you do it? You just flip your biased coin twice. If the outcome is “heads then tails,” you call the outcome of the fair coin “heads.” If the outcome is “tails then heads,” you call the outcome of the fair coin “tails.” In any other event (TT or HH), you try again. This works because if you know you flipped one heads and one tails, then you’re just as likely to get the heads first as you are to get the tails first. If your coin comes up heads with probability p, these two events each happen with probability p(1-p).
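Here is that procedure as a short Python sketch (the trick is usually credited to John von Neumann). The helper `biased_flip` is a hypothetical stand-in for your biased coin, simulated with Python’s `random` module; the loop assumes 0 < p < 1, since otherwise the two flips never differ and it never terminates.

```python
import random


def biased_flip(p, rng):
    """Hypothetical biased coin: returns 1 (heads) with probability p."""
    return 1 if rng.random() < p else 0


def fair_flip(p, rng):
    """Fair coin from a biased one (requires 0 < p < 1).

    Flip twice: HT -> heads (1), TH -> tails (0), HH or TT -> try again.
    HT and TH each occur with probability p * (1 - p), so they are equally likely.
    """
    while True:
        first = biased_flip(p, rng)
        second = biased_flip(p, rng)
        if first != second:
            return first
```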

Even more fascinating is that you can go the other way too! Given a fair coin, you can simulate coins with any bias you want! This is a quantifiable way to say, “biased coins and unbiased coins are computationally equivalent.” Theoretical computer science is just bursting with these cool proofs, and they are the mathematical equivalent of a really neat “hack.”
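One standard way to do this (a sketch; the post does not specify a method) is to treat a sequence of fair flips as the binary digits of a uniformly random number U in [0, 1) and compare it, digit by digit, with the binary expansion of p. At the first digit where they disagree you know whether U < p, which happens with probability exactly p, and on average only two fair flips are needed. The function name is hypothetical.

```python
import random


def biased_from_fair(p, rng):
    """Return 1 with probability p (0 <= p <= 1) using only fair coin flips.

    The fair flips b1, b2, ... are the binary digits of a uniform U = 0.b1b2...,
    compared with the binary digits of p. The first disagreement settles whether
    U < p. Each round ends the loop with probability 1/2.
    """
    while True:
        p *= 2
        digit = 1 if p >= 1 else 0  # next binary digit of p
        p -= digit
        bit = rng.getrandbits(1)    # one fair coin flip
        if bit != digit:
            return 1 if bit < digit else 0
```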

Why do I call it a hack? The word is primarily used for bad but effective solutions to programming problems (avoiding bugs without fixing their root cause and such). But another use of the word is to successfully use a thing for a purpose against or beyond its original intention. Like exploiting a buffer overflow to get access to sensitive data or using building lights to play Tetris, hacks have a certain unexpectedness about them. And most of all hacks are slick.

Reductions come in many colors, the most common of which in computer science is the NP-hardness reduction. This is a reduction from a specific kind of problem (believed to be hard) to another problem while keeping the size “small,” by some measure (typically polynomial in the size of the original instance). The reason it’s important is that if you show a problem is NP-hard (has a reduction from a known NP-hard problem), then you are including it in a class of problems that are believed to have no efficient solution. So in this case a reduction is one way to measure the difficulty of a problem you’re studying.
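To see what such a reduction looks like in code, here is a sketch of a classic textbook example (not one from the post): reducing Independent Set, a known NP-complete problem, to Clique. A graph has k vertices with no edges among them exactly when its complement graph has k vertices with all edges among them, so the reduction just complements the edge set, which takes polynomial time. The brute-force deciders are exponential and are only there to check the reduction on tiny graphs; all names are hypothetical.

```python
from itertools import combinations


def complement(n, edges):
    """Edges of the complement graph on vertices 0..n-1."""
    edge_set = {frozenset(e) for e in edges}
    return [(u, v) for u, v in combinations(range(n), 2) if frozenset((u, v)) not in edge_set]


def reduce_independent_set_to_clique(n, edges, k):
    """Map an Independent Set instance (G, k) to the Clique instance (complement of G, k).
    Runs in O(n^2) time, i.e. polynomial in the input size."""
    return n, complement(n, edges), k


def has_clique(n, edges, k):
    """Brute-force Clique decider (exponential time)."""
    edge_set = {frozenset(e) for e in edges}
    return any(all(frozenset(p) in edge_set for p in combinations(s, 2))
               for s in combinations(range(n), k))


def has_independent_set(n, edges, k):
    """Brute-force Independent Set decider (exponential time)."""
    edge_set = {frozenset(e) for e in edges}
    return any(all(frozenset(p) not in edge_set for p in combinations(s, 2))
               for s in combinations(range(n), k))
```

On a 5-cycle, for instance, both sides answer “yes” for k = 2 and “no” for k = 3, as the reduction promises.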

One really fun example is that the rule-sets of most classic Nintendo games are NP-hard. That is, you can design a level of Donkey Kong Country (or Super Mario Brothers, or Pokemon Red) so that getting to the end of the level would require one to solve a certain kind of logic problem. So if you could write a program to beat any Donkey Kong level (or even tell if there is a way to beat it), you could solve these hard logic problems.

The key part of the reduction is that, given any such logic problem, you can design a level that does this. That is, there is an algorithm that transforms descriptions of these logic problems into Donkey Kong levels in an efficient manner. The levels are quite boring, to be sure, but that’s not the point. The point is that Donkey Kong is being used to encode arbitrary logic, and that’s a sweet hack if I’ve ever seen one.

If you enjoy the hacker mindset, and you want to get more into mathematics, you should seriously try reading about this stuff. You have to wade through a little bit of big-O notation and know that a Turing machine is roughly the same thing as a computer, but the ideas you unlock are really fun to think about. Here’s an article I wrote about P vs NP, actually implementing one of the famous reduction proofs in code.

Even better, once you understand a few basic NP-hardness reductions, you can already start contributing to open research problems! For example, nobody knows if the problem of factoring integers is NP-hard. So if you could find a way to encode logic in a factoring problem the same way you can for a Donkey Kong level, you’d be pretty famous. On the easier side, it just so happens that potentially NP-hard problems show up a lot in research. Two of my current research projects are about problems which I suspect to be NP-hard, but for which I have no proof. And once you prove they’re NP-hard then you can start asking the obvious follow-ups: can I find good approximate solutions? How much easier do I need to make the problem before it becomes easy? The list goes on, giving more and more open questions and, the best part, more opportunities for great hacks.

My New LinkedIn Summary: Breaking the Fourth Wall of a Resume

2014-05-01T15:38:56-07:00

LinkedIn is a weird niche in the internet: it’s a place for recruiters to reach out to candidates without a completely cold-email approach, along with a smattering of other relatively unimportant things going on (lots of “congrats” notes and the occasional unsubstantiated endorsement).

It’s not clear whether it’s a good niche or a bad one, but what is clear is that the most likely person to get their first introduction to me via my LinkedIn profile is a recruiter. So I can target my resume more effectively. I know exactly where to aim in terms of the reader being familiar with me and my work. With that in mind I recently rewrote my profile summary:

If you’re looking at my LinkedIn profile (as opposed to my academic CV [1] or my blog [2]), then chances are you’re a recruiter at a software company. Chances are also good that you haven’t got the first impression most people have of me: I love math.

Let me say it again: I really love math. I like doing it [3], learning it, talking about it, and writing about it [4]. So it would be foolish to try to get me into a job where I’m not spending at least 20% of my time thinking about math.

That being said, my favorite kinds of math are the kinds that unlock fascinating programs. I was originally trained as a software engineer, and so I love it when mathematical ideas and programs together allow one to, for example, recognize faces [5], design economic markets [6], or create fun games [7]. That’s part of the beauty of math: it can apply in wild and unexpected places.

If I’m going to work for a company that isn’t explicitly mathematical, this would be my dream job: finding ways to apply mathematics to improve existing features or add new ones. It doesn’t have to involve genuinely original mathematics, it doesn’t even have to involve particularly clever mathematics. But I require some minimal amount of mathematical engagement in whatever I do.

Finally, I’m guaranteed to decline all job offers before I finish my PhD. But if we have a chat and your company seems to fit, I’d be glad to contact you once I’m on the market.

What "Counts" as a Mathematician?

2014-04-18T09:30:17-07:00

The “Best Job” of 2014 #

A few days ago the website CareerCast (Adicio, Inc) released a list of the top jobs in 2014 which put “Mathematician” as number 1. Most news sites have used this as a platform to discuss the centrality of mathematics and technology in the world economy, or the importance of STEM (Science, Technology, Engineering, and Mathematics) in education. I’m not against such discussions — indeed I spend a large fraction of my time writing long and detailed posts explaining mathematics to anyone who will listen — but I do suspect the ranking is misleading.

As every mathematician knows, definitions are extremely important, so I wonder how “mathematician” is defined for the purpose of this ranking. After a bit of snooping it appears at least part of their analysis comes directly from the US Bureau of Labor Statistics website, which in turn uses an aggregation of two occupational classification models.

The first is the North American Industry Classification System, which has no record solely for a mathematician. The closest they get is the following

The second source, the 2010 Standard Occupational Classification, is more useful. It distinguishes between a variety of mathematical fields, but still gives no information about how the data was collected for any occupation. The entry for mathematicians just has a list of examples that are clearly biased:

But on the other hand the major group category gives some more pleasing discriminations, such as operations research analyst and “mathematical technician” (which I think is a wonderfully useful category).

The search hits a dead end here: neither the Census Bureau nor the Bureau of Labor Statistics states how they derived their statistics from the classification (Did they use job title? Did they ask the people surveyed what they consider themselves?), and CareerCast does not state how it chose its 200 jobs out of the 500+ jobs for which the BLS aggregated statistics.

Mathematicians also have an odd place in a survey of occupations because mathematics is so intertwined with other disciplines. Two people with the same job title (say, “Security Expert”) could have different enough jobs that one is a mathematician while the other isn’t. Indeed, even the Bureau of Labor Statistics agrees,

Most people with a degree in mathematics or who develop mathematical theories and models are not formally known as mathematicians.

So the question is what counts as a mathematician? Or better, since nobody has seemed to ask this question: what should count as a mathematician? I’ll give my answer with a few examples to ramp up to my final answer. I want to preface my examples with a large and bold claim: I am not making a value judgement either about mathematicians being good or about other jobs being bad. I like being a mathematician, but I don’t consider people in mathematical fields (whom I don’t consider mathematicians) to be lesser in any way. I am simply trying to come up with a well-defined (if somewhat informal) classification rule that aligns with my idea of what a mathematician is. So here goes.

Examples and a Definition #

I don’t consider an actuary to be a mathematician (and I’m glad to see that CareerCast appears to agree). Actuaries certainly need to know and use a lot of mathematics in what they do; they manage risk and risk is naturally mathematical. But they use mathematics as opposed to doing mathematics.

In the same vein, many data scientists are not mathematicians. Why? Because despite their analytical skills and statistical know-how, data scientists largely apply known statistical and machine learning models as black boxes to their data sets. This of course depends on the scientist, and there are many critics of posers in the land of data science. As Cathy O'Neil puts it:

My basic mathematical complaint is that it’s not enough to just know how to run a black box algorithm. You actually need to know how and why it works, so that when it doesn’t work, you can adjust.

I would take this even further: a data scientist should not be considered a mathematician unless their job requires them to significantly modify standard models and algorithms to suit their needs. Even better, they should be creating new models.

More generally, I can now make the following definition of a mathematician.

Definition: A mathematician is someone who, as part of their occupation, devotes a nontrivial portion of their time to the invention of new mathematics.

“Inventing new mathematics” also requires a definition, and I would consider it to be one of the two following things:

Any original model, algorithm, problem, heuristic, or mathematical definition.
An original theorem, proof, conjecture, or analysis pertaining to one of the above.

One might protest: nothing can truly be original anymore! I don’t mean to use original in the sense that something has never been done before, but that it is something that you have never seen or done before. You cannot be called a mathematician if your “new model” is knowingly (by you) and trivially derivative of someone else’s work. You cannot be called a mathematician if you use or implement someone else’s algorithm. You can be called a mathematician if you prove the correctness or efficiency of an algorithm. This is regardless of whether the algorithm is widely known, considered interesting or important, or even whether an identical analysis was done in the 70’s. All that matters is whether you’re the one producing original mathematics.

This definition also rules out the typical measures: you are not a mathematician just by having a mathematical publication, nor are you not a mathematician if you have no publications. But if you publish regularly in mathematical journals or conferences, then you are a mathematician.

As a thought experiment to illustrate my point, say you were being paid, as an occupation, to reinvent basic geometry while you were devoid of contact with the outside world. You might spend much of your time puzzling over trivial facts taught in high schools every day, and you might come up with definitions and theorems that are wholly worse than Euclid’s. But you are still a mathematician.

The finer point is that doing mathematics is equivalent to inventing mathematics. That’s why I think the classification “mathematical technician” is such a wonderful category: it gives a name to the people who apply standard techniques to solve problems that don’t require new mathematics. They are the engineers, financial analysts, and operations researchers using satisfiability-solvers to optimize their chip designs and Black-Scholes to price their options.

A perhaps displeasing consequence for mathematicians hoping to keep the #1 spot on that list is that this definition declares graduate students in mathematics to be mathematicians. And I would argue this is rightly so: the purpose of a PhD program is to induct one into the research community as a peer. So the second you start trying to tackle original problems is when you don the title of mathematician. (Full disclosure: I am a PhD student in mathematics.)

The unfortunate part is that the median salary of a graduate student is quite low. Many, including myself, make roughly the federal minimum wage. Considering that mathematics PhDs take 4-6 years and that drop-out rates are nontrivial and career switches are common (many PhDs end up teaching without inventing any new mathematics), it seems deceitful not to factor that into the analysis.