Deep Learning and Cat Photos

So I've been learning about "deep learning" for work, which in layman's terms (and I am very much a layman) is a way of getting computers to beat the CAPTCHA: to understand things we understand intuitively, only sometimes better.

I should explain.

To a computer, a picture isn't "a picture" the way we think about it (we could argue that even we don't think of it the way we think we do). To a computer, a picture is a "grid" of pixels, and each pixel is a color. A color is just a number (okay, a vector): often a three-part code called RGB, where (0,0,0) is black and (255,255,255) is white. Many computers don't even have eyes, but they're all overgrown calculators, so numbers they understand.
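For the curious, here's roughly what that grid looks like if you type it out by hand. This is a toy sketch, not how any real image library stores things: the 2x2 "picture" and the brightness helper are made up for illustration.

```python
# A tiny 2x2 "picture": a grid (list of rows) of pixels,
# where each pixel is an (R, G, B) triple of numbers from 0 to 255.
BLACK = (0, 0, 0)
WHITE = (255, 255, 255)

image = [
    [BLACK, WHITE],
    [WHITE, BLACK],
]


def brightness(pixel):
    """Average the three color channels into one number (hypothetical helper)."""
    r, g, b = pixel
    return (r + g + b) / 3


# The same picture boiled down to one number per pixel.
gray = [[brightness(p) for p in row] for row in image]
print(gray)
```

A checkerboard, as far as the calculator is concerned, is just four numbers in a square.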

They also understand differences in numbers. When two similar numbers are close together, that's probably two bits of the same thing. If two different numbers are close together, that's an edge.
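Here's a bare-bones sketch of that idea on a single row of brightness numbers. The cutoff of 50 is an arbitrary number I picked for the example, and real edge detection is a good deal fancier than comparing neighbours.

```python
def find_edges(row, threshold=50):
    """Return positions where a pixel differs sharply from its right-hand neighbour.

    The threshold is an assumed cutoff chosen for this toy example.
    """
    return [
        i for i in range(len(row) - 1)
        if abs(row[i] - row[i + 1]) > threshold
    ]


# Three dark pixels followed by three bright ones.
row = [10, 12, 11, 200, 205, 203]
edges = find_edges(row)
print(edges)  # the jump sits between positions 2 and 3
```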

That's where cat pictures come in.

Suppose you were to tell a computer that a cluster of edges was a cat (1). That'd be swell, and if that cat sat in that position at that distance from the computer, it might even recognize it. Maybe you'd also show it a picture of an ice cream, or a dinosaur, and inform it that this was not-a-cat (0). If you did that a couple thousand times, the computer could come up with a better than even guess of what a cat "looked" like, based on the relationship between all those edges.

Like the sadistic parent you are, you then give your beaming machine 100 pictures and don't tell it what they are. It has to guess. Then you tell the poor thing it got 40% of the answers wrong and it says "01100110011101010110001101101011" which is a curse word in binary.
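The report card itself is just counting. A small sketch with five made-up pictures instead of 100, where 1 means cat and 0 means not-a-cat (both lists are invented for the example):

```python
def fraction_wrong(guesses, answers):
    """Share of pictures where the machine's guess doesn't match the real answer."""
    wrong = sum(1 for g, a in zip(guesses, answers) if g != a)
    return wrong / len(answers)


answers = [1, 0, 1, 1, 0]  # what the pictures really are
guesses = [1, 1, 1, 0, 0]  # what the machine said
score = fraction_wrong(guesses, answers)
print(score)
```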

Scientists don't actually know how our brain works. A simple neuron does a lot more than fire or not fire, and the brain's storage capacity is very very big (or at least much bigger than our attention span). And the task is frankly unfair; computers can't take 2 years of pulling the family pet's ears, smelling kitty litter, and getting scratched to figure out what a cat is, mostly because we're not that patient.

Fortunately for them, computers can do a lot of math really quickly. So the computer takes a look at what change in its assumptions would make its guess more accurate (it takes the slope and subtracts it, blah blah blah, calculus stuff) and does that a couple thousand times, for each individual pixel, all at once. Then it tries again. Then it wipes the slate and tries again from another direction. It tosses its assumptions to the wind, ignores some pixels one round and stares at them another, because a computer doesn't really care what a cat is, but it's pretty dedicated to being right.
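The "takes the slope and subtracts it" bit, shrunk down to a single assumption instead of millions, might look something like this. It's a minimal sketch of gradient descent on one number `w`; the data, the step size, and the function name are all invented for illustration, and a real network juggles far more numbers than this.

```python
def train(xs, ys, steps=1000, learning_rate=0.1):
    """Nudge one assumption, w, until w * x lands close to y."""
    w = 0.0  # start out knowing nothing
    for _ in range(steps):
        # Slope of the average squared error with respect to w:
        # which way, and how steeply, the error changes if w grows.
        slope = sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)
        # Take the slope and subtract (a little of) it.
        w -= learning_rate * slope
    return w


# Made-up data where the right answer is "multiply by 2".
xs = [0.0, 0.5, 1.0]
ys = [0.0, 1.0, 2.0]
w = train(xs, ys)
print(round(w, 3))
```

Each pass makes the guess a little less wrong, which is the whole trick, just repeated a ridiculous number of times.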

And once it knows what a cat is, it can crawl around the web collecting thousands of cat pictures for your perusal without ever taking a break.

So when you're pissed your screen won't unlock, or facial recognition doesn't work, or your spellchecker missed a missing word, keep in mind all the work some programmer had to go through to get your fancy calculator to surf cat pictures.


