We Really Don't Know How to Compute!

by Gerry Sussman
at Strange Loop conference on Oct 27, 2011

Note: This document is a slightly edited transcript of Sussman's talk, with the associated slides. It is organized around the topics discussed.

All slides can be found here.

We are in real trouble and have not the foggiest idea how to compute very well. There might be some glimmer of hope on the horizon, in a number of different directions, some of which I might point out.

Most things are obsolete

Most things we are talking about, even here, are obsolete. As I said to somebody at yesterday's Haskell session: this is the most advanced of the obsolete languages.

You now see the triangle that is not there (the Kanizsa triangle illusion shown on the slide). What did you do? It took only 100 or maybe 200 ms, or a few tens of neuron times. I do not care how parallel the machine is. I do not have the foggiest idea how to write a program that organizes a complex process like that within a latency of about 30 or 40 steps. That is an example of something we do not understand how to do.

Human genome

And more serious than that, the human genome is about a GB. That is the instructions for making you from a cell. You are a pretty complicated machine. A GB is about the same size as the source code for making, say, Microsoft Windows. It is not much bigger or smaller; it is something like that. It has all the possible things it can do, all possible drivers which might be relevant. It is probably of that size. It is not much bigger, not 3 orders of magnitude bigger. A GB is not that big, but what is interesting is that with a small change you make a ? out of the person, so it is flexible too. So we have an enormously flexible language there that we do not have the foggiest idea what it is like. I am not talking about DNA. DNA is like gates, or something. What I am interested in is the higher-level language that is necessary to do things like that, and we certainly do not know how to do that.

Salamander example

Biologists do nasty experiments on animals. Suppose you took a salamander. A salamander can regrow its limbs if you cut them off. So all these biologists do nasty experiments. They cut a salamander's arm off between the shoulder and the elbow, and between the elbow and the hand, flip that segment around and sew it back on. What do you think you get? You get 3 elbows. It grows them because the wrong thing is connected to the wrong thing. That's pretty weird. But that is a clue to something that we should be thinking about. How do you talk about how to make a bazillion cells - a human is 10^12 cells, a salamander is probably 10^11 or 10^10 - but the fact is, how do you make something big like that grow itself? Little neighbours talking to each other, and they say: I am supposed to connect to something like this but it ain't there, so we will make some of that. That is the sort of amazing process that is going to be necessary if we want to approach the future in our computing, because it is almost impossible to set up a situation in which we know everything that is going to happen in the future.

Let me go into the past a bit. When I started computing in 1961 I was computing on the IBM 650 at Columbia University. That machine cost half a million bucks. It had 2000 words of memory, 10 decimal digits each, on a drum that rotated, so the access time was 12.5 ms, with a 10 ms cycle time. The total amount of memory was probably about 10 KB. For the price of that machine now I can buy 50 thousand PCs, each with a GB of RAM (that is 10^4 times as much) and a million times faster. So that is one of the things that has been happening over the years (I am not trying to worry about Moore's law or anything like that). What I am worried about is that all our intuitions from being programmers have come from a time of assuming a kind of scarcity, a world in which computation is somehow expensive, where memory is somehow expensive, where we have to optimize them. Well, there are some applications like that, but the answer is that most applications are not like that at all. Right now, memory is free, computing is free. It is our problem to figure out how to use computing like that. And if I have 50 thousand PCs in a network, which will probably cost about a million (modern) dollars, I can make a very serious computer out of that. It is no longer necessary to minimize operations, to worry about how much memory we take up. It is very important to minimize latency. That is critical; you have to understand that.

The real cost of programming is the cost of programmers. Here is a quote from Mr. Evans made a while ago. It turns out that almost all of the cost of computing is you and me. That's a different world. That's the real problem.

Things people worry about

There are lots of things people worry about. People worry about correctness. That is a problem. Is it a real problem? Maybe, probably not. Most things do not have to work. Look at Google. If it makes a mistake, it does not matter, so long as the next time it gets the right answer, or maybe something close to the right answer, a reasonable answer. People worry about security. Well, they have no idea how to handle security. We will find, for example, that humans manage to survive for about 70 years while continuously attacked by parasites. It must be that there is something right about what we are doing. It is all in that GB of code. You have to think about that. We certainly have no good ideas about how to make things last for a very long time while continuously being attacked by mutating parasites.

Evolvability and modifiability of code

The other thing that seems really important to me is that we spend all our time modifying our existing code. The problem is that our code is not adequately evolvable and modifiable to fit the future. In fact, what we want is to make systems that have the property that they are good for things that the designer did not even think of or intend. That is a big problem. That is a real problem I want to go after a little bit. The problem is that when we build systems we program ourselves into holes. I have done it a number of times. I have been programming since 1961, and I have found more ways to screw up than anybody I know. I have learned a lot, I hope, but not an infinite amount. So here is this problem. How do we keep ourselves from programming ourselves into holes? What does that mean? It means that we make decisions early in some process and they spread all over our system. The consequences of those decisions are such that the things we want to change later are difficult to change, because we would have to change a very large amount of stuff. That is the problem. I want to figure out ways we can organize systems so that the consequences of the decisions we make are not expensive to change. We cannot avoid making decisions, but we can try to figure out ways to minimize the cost of changing the decisions we have made.

Let me start with a simple example, one that is so familiar that we all do it these days, certainly in good dynamically typed languages like Lisp or Python or whatever, or good statically typed languages like Haskell or whatever. We always have generic operators; we call it operator overloading and things like that. Let's talk about that for a second, because I am going to be very extreme about this. Here is the traditional arithmetic you will find in Scheme. I can add integers together, I can add integers to rationals, I can make new rationals, I can add integers to reals (floating-point thingies that are very dangerous; nothing brings more fear to me than a floating-point number, and numerical analysis is the blackest art I know). We can deal with complex numbers, etc. But those things are built in. How about dynamically extensible generic operations? Now, we can do that. Of course if you separate compile time and runtime, something weird is happening here. I want to be able to dynamically extend things while my program is running. I want a program to compute up what it is going to do. Remember, in the old days we used to program in assembly language. It was a great language because you could write the next instruction you are going to execute. You could put it into the place you are going to go to. It just was. Many interesting programs were written that way and it was very flexible. There are too many fearful people in the audience. I see a lot of people cringing, but the fact is, you can do it in Lisp also: you can eval something you cons up. That is very useful.
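As a concrete illustration, here is a minimal sketch of a runtime-extensible generic operator in plain Scheme. It is not the scmutils implementation; the names make-generic, add and extend-add! are made up for illustration.

```scheme
;; Minimal sketch (not the scmutils implementation): a generic
;; operator keeps a list of (predicate . handler) pairs that can be
;; extended while the program is running.
(define (make-generic name)
  (let ((handlers '()))
    (define (dispatch . args)
      (let loop ((hs handlers))
        (cond ((null? hs) (error "No handler for" name args))
              ((apply (caar hs) args) (apply (cdar hs) args))
              (else (loop (cdr hs))))))
    (define (extend! predicate handler)
      (set! handlers (cons (cons predicate handler) handlers)))
    (list dispatch extend!)))

(define generic-add (make-generic 'add))
(define add (car generic-add))
(define extend-add! (cadr generic-add))

;; Teach it about numbers first, then, at runtime, about symbols.
(extend-add! (lambda (a b) (and (number? a) (number? b))) +)
(add 2 3)                                        ; => 5
(extend-add! (lambda (a b) (or (symbol? a) (symbol? b)))
             (lambda (a b) (list '+ a b)))
(add 'x 3)                                       ; => (+ x 3)
```

The point is only that the dispatch table is ordinary data, so the running program can attach new behaviour to addition and take it off again.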

I can expand this to arithmetic on functions. I can add sin and square and I get a new function, because that is a classical thing people do in algebra: if you have 2 functions with the same kind of input, then their sum is the function whose output is the sum of their outputs. That works for things that are typed procedures with the same number of arguments. I can talk about the sum of the squares of sin and cos, applied, and I get 1. That is an example of a simple extension. You have all these nice things. I was watching somebody talking about scalaz today, and of course you can do that.
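Continuing the hypothetical sketch above, the same generic add can be taught about procedures, giving pointwise addition of functions:

```scheme
;; Extend the sketch above: adding two procedures yields the
;; procedure that adds their outputs pointwise.
(extend-add! (lambda (a b) (and (procedure? a) (procedure? b)))
             (lambda (f g) (lambda (x) (add (f x) (g x)))))

(define (square x) (* x x))
((add (lambda (x) (square (sin x)))
      (lambda (x) (square (cos x))))
 1.57)                                           ; => 1.0 (approximately)
```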

Then you get to things where Lisp does things better. There is symbolic algebra. Now, this is also a generic extension. I am not going to explain how it works, except that I can tell subtraction, multiplication, division, all those things, to deal with symbolic entities, literal numbers, as well. And what I get is a symbolic expression as a result of such a computation, and I can of course ... that is a fragment of a program that came out. So I can write a program that writes a program, and I can use it. I can tell you a good example of the use of that. It is a numerical analysis example: the Gear integrator for stiff differential equations. It turns out there are some polynomials you have to compute up. If you compute them ahead of time you will double your step size. But if you want to be able to make the step size any size you want, you want to be able to compute the polynomial at the moment. So you write something that symbolically does that, then you pass it through the compiler as you need it, and then you run it and use it, in the same program that is running, integrating along. The point is that you can get essentially fully native code efficiency. I like Lisp because it has a uniform representation of programs and data, so that symbolic manipulation produces stuff that I can execute.
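To make "a program that writes a program" concrete, here is a toy sketch in plain Scheme; the expression is written by hand here, where in the talk it comes out of the symbolic-algebra system (e.g. the Gear-integrator polynomials), and the environment argument assumes the standard R5RS/R7RS interaction-environment.

```scheme
;; A symbolic expression is just a list...
(define expr '(lambda (h) (+ 1 (* h (+ 1 (* h 0.5))))))  ; hypothetical polynomial in h

;; ...which the running program can turn into executable code and use.
(define step-poly (eval expr (interaction-environment)))
(step-poly 0.1)                                  ; => 1.105
```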

You can do stuff that is even more wild. Did you know that automatic differentiation can be done as a generic extension? What you do is create hyperreal numbers, numbers that look like x+Δx, in the same way that a complex number is a real part plus "j" times an imaginary part ("i" if you are a mathematician, "j" if you are an electrical engineer). You can say: the derivative of the sum of the cos and square functions, applied to x. Now I am using my symbolic part as well, so that these things are mixed together. I can look at something more complicated, the derivative of a function of 2 variables, which is a structure of 2 partial derivatives, which is what it should be.
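A minimal sketch of that trick in plain Scheme: dual numbers x + 1·Δx pushed through hand-written d+, d* and dcos. The real system instead extends the ordinary generic +, *, cos, so user code does not change; the names here are made up.

```scheme
;; Forward-mode automatic differentiation with dual numbers (x . dx).
(define (dual x dx) (cons x dx))
(define (primal d) (if (pair? d) (car d) d))
(define (tangent d) (if (pair? d) (cdr d) 0))

(define (d+ a b)
  (dual (+ (primal a) (primal b)) (+ (tangent a) (tangent b))))
(define (d* a b)
  (dual (* (primal a) (primal b))
        (+ (* (primal a) (tangent b)) (* (tangent a) (primal b)))))
(define (dcos a)
  (dual (cos (primal a)) (* (- (sin (primal a))) (tangent a))))

;; The derivative of f at x is the tangent part of f applied to x + 1·Δx.
(define (derivative f)
  (lambda (x) (tangent (f (dual x 1)))))

((derivative (lambda (x) (d+ (dcos x) (d* x x)))) 3)  ; => 6 - sin 3 ≈ 5.859
```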

I am going to show you a trick about how to do this. This is producing compositions in the same way you have combinators. So that is a useful kind of thing. I actually use this all the time, because I teach a class in advanced classical mechanics. I use computer programming in my class to clarify what we write in equations, because mathematical notations are both awful and, it turns out, ambiguous. To be perfectly honest, it is impressionistic. When I am talking to you in English it is the same phenomenon. Mathematicians can talk to each other because almost all of them are alike. The reason you can understand me is that we are almost all identical. Only a few bits have to be transferred in order to get an idea across. Mathematicians are painting the idea in the platonic world of the other mathematician's mind by talking to them, and they write down basically impressionistic scribbles, in the same way my speech is not grammatical, beautiful English.

But when we write programs we have to be precise, because computers are dumb at this point, and as a consequence a great way to show someone the real story is that I can write things like [this](slides/page_10.pdf): the definition of the method of computing the Lagrange equations from a Lagrangian, given a configuration-space path. For those of you who do not know about these things it does not matter; I am not going to dwell on it. That is easy stuff. What if I am dealing with something with a couple of rigid bodies that are wobbly and have all sorts of ... Then I cannot write the equations on this board. If you do celestial mechanics like I have done, sometimes equations are 20 pages long, and they had better be done by the computer. In the old days people did it by hand. They got it right most of the time. It is amazing. I cannot do things that people did in 1911 or so. Now, we can do this easily, and it turns out this is very simple because of simple generic extension. I am not trying to say that this solves any important problem. Be very careful about that.
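For flavor, here is roughly what such a definition looks like in scmutils, the system used for the class and the slides. This is a sketch in the style of the SICM book, assuming its names (Lagrange-equations, literal-function, coordinate, velocity, square) and MIT Scheme's curried define.

```scheme
;; Lagrangian for a harmonic oscillator: L = 1/2 m v^2 - 1/2 k q^2.
(define ((L-harmonic m k) local)
  (let ((q (coordinate local))
        (v (velocity local)))
    (- (* 1/2 m (square v))
       (* 1/2 k (square q)))))

;; The Lagrange equations, as a residual applied to a symbolic path x(t):
(((Lagrange-equations (L-harmonic 'm 'k))
  (literal-function 'x))
 't)
;; => something like (+ (* k (x t)) (* m (((expt D 2) x) t))),
;;    i.e. m x''(t) + k x(t) = 0.
```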

A little tiny feeling of what is needed to make things more flexible

This is a little tiny feeling of what is needed to make things more flexible. Consider the value here. The value is that I am dynamically allowed to change the meaning of all my operators, to add new things they can do - dynamically, not at compile time, at runtime. Our programs can produce things which are a new way to add, that sort of stuff. And then they can attach this to addition, take it off. It can look like binding. I can bind that idea. That is wonderful. What is it doing? It is giving me tremendous flexibility, because the program I wrote to run on a bunch of numbers, with only a little bit of tweaking, suddenly runs on matrices, as long as I did not make the mistake of computing something wrong. If I did, I am in trouble. This makes proving theorems about things very hard, in fact maybe impossible. The cost and the benefit are very extreme. I can pay with correctness, or proof of correctness, or belief in correctness, and buy tremendous flexibility.

Now, we worry about ideas like whether correctness is the essential idea I want. I am not opposed to proofs; I like proofs, I am an old mathematician. The problem is the situation, which Mr. Dijkstra got us into about 30 or 40 years ago, where you are supposed to prove everything to be right before you write the program. Getting yourself into that situation puts us into a world where we make our specifications for parts as tight as possible, because it is hard to prove general things, except occasionally sometimes it is easier. Usually it is harder to prove a general theorem about something. You make a very specialized thing that works for a particular case. You build this very big tower of stuff, and boy is that brittle: change the problem a little and it all falls over. We did not learn something from the electrical engineering world, the digital abstraction, whereby the inputs to something accept a much bigger range than the outputs they are allowed to produce, and that way you get rid of noise at every stage. That is the kind of flexibility I am expecting to get. By the way, none of the old stuff ... when you do this kind of thing, generic extensions, it does not mean that the old stuff breaks; it means that the new stuff is not proved. So, many of the techniques I am advocating make proof much more difficult and perhaps impossible. Now, I am going to be a little more extreme. I haven't been extreme yet. This is the beginning of stuff.

Everything in the world looks like my 650

I think that one of the problems is the kind of architecture we are assuming. Everything in the world looks like my 650. Occasionally people have made wonderful experiments. The GPU is different. The Connection Machine was an experiment which was different. But in the future it is going to be the case that computers are so cheap, and it is so easy to make them the size of a grain of sand with a MB of RAM, that you buy them by the bushel I suppose, you pour them into your concrete, you buy your concrete by the megaflop, and you have a wall that's smart. So long as you can get power to them, they can do something. That is going to happen. Remember, your cells are pretty smart. They have about a GB of ROM, a few bytes of RAM probably, or maybe a few hundred (we do not know how many). But they seem to talk to each other and do useful things. So, we have to begin to worry about that kind of world.

Teaching circuit theory at MIT

I am going to tell you a story that started around 1975 and recently had some advances. It is called propagators. It started when I was teaching my first classes in electrical engineering at MIT, in circuit theory - actually the first class was field theory, but then I taught the circuit theory class with a real wizard of circuitry, Paul Penfield. And I observed that what we taught to students was not at all what students were actually expected to learn, i.e. what an expert person does when presented with a circuit that looks like that was quite different from what we tell them: write down the node equations, and there are certain equations for resistors and capacitors and inductors and the transistor, which has a complicated non-linear equation, and you are supposed to grind these equations somehow and solve them to find out what is going on. That is not what a really good engineer does. What a really good engineer does is look at this and say: let's consider the bias analysis here. That means these capacitors are open circuits, etc. (follows analysis of the circuit). Therefore I can tell you all the bias conditions in this transistor. You can use this to check whether my assumptions are right, and they are. (Follows more circuit analysis ...) Within 5% that is right, and every real engineer does that. That is not the sort of thing that we were teaching the students. So I tried to figure out how we can teach students that sort of thing. Of course, my vision of things is that you write a program and give it to the students to read. What I did is I hired Richard Stallman in 1975, who worked for me, and he and I wrote that program together. It was quite a success.

Modern version of the program for circuits

I will show you a modern version of it. The old code does not seem to exist anymore. I went looking for it; it was on the old ITS operating system, and it is gone - probably it is on some backup tape somebody has and it could be found, but it is not around now. Here is a new version of the same program. Here is a representation of that amplifier. I am not going to read it to you. It is a wiring diagram. The same lambda calculus combination structure that you use for functions works for other kinds of things too. All that is required is a consistent naming scheme, and it says: let Rb1 be a resistor, etc. A wiring diagram is a description of how these things are put together. That is a really nice thing. Remember, underneath this is lambda calculus, nothing more. There is no more magic in there. It is just naming. As one very smart mystic once said: if you have the name of the spirit, you have power over it. So then, once you do that, you can for example make a test for this that looks like that. And I can tell it what kinds of parameters I want. Notice that I am allowed to use ... all my values are also allowed to have units. That is another generic extension. It does not matter. It is just simple - once you have generic operators you can have units and it works just great. I can make various other assumptions ... And I can ask what is the potential of the base of the amplifier in the bias region (shown on this slide; the (value (the potential b amp bias)) expression). And it does some deduction and comes up with the answer. The really interesting thing is here. I can ask why?: Mr. Computer, why do you believe that the potential of the amplifier in the bias region is what it is? And the outcome is a remarkable QED that has nothing to do with proving. It is just there to be arrogant. What it is telling is ... because remember, all those things are lies anyway. The model for the transistor I told you is a lie. All of physics and EE is based on useful lies which are appropriate approximations for particular jobs. When Galileo invented modern physics, for all practical purposes what he discovered is the value of a lie. Aristotle understood that when you roll a ball on the floor it always stops. Galileo said forget about the friction, let's figure it out, and then put the friction back in. That was a required lie to understand what was really going on. That changed the world. That was the big breakthrough of Galileo. So, this QED does not mean anything.

What really matters is that I am chasing the provenance of every value. As the values move around in this thing ... what is really going on here is that I am doing what the expert did. I am seeing something very obvious. What I am seeing is that if I know the value here, then I know the value here, because of local phenomena. I am propagating information around. Information is propagating around. We call these propagators. That was a big idea in 1975. However, it does have some power. Imagine that propagators are independent little stateless machines that are connected to a number of these little things called cells (not biological). Information moves around and it is allowed to go in any direction. So there are the cells that have state, and there are these propagator machines. Guess what, this is very nice for a very parallel machine. But even better, Alexey Radul, my graduate student, who got his doctor's degree two years ago, made a breakthrough here. The great breakthrough was that we do not actually put values in these cells; we put information about a value. I will get to that a little later. What it means is that what the cell contains is monotonically increasing information about a value. It gets better and better information, and as a consequence things like synchronization problems go away, for a parallel machine. This becomes a very parallel mechanism for building a very large complex machine in a simple way. But it is an electrical engineer's point of view, it is not a computer science point of view. It is not dataflow. It is closely related, but it is not. It is a very different kind of thing, because we are worried about information that is being accumulated in these cells.
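Here is a minimal sketch of the basic machinery in plain Scheme, assuming the simplest possible cells (a cell holds either nothing or a single value; the monotonic "information about a value" refinement comes later in the talk). This is not Radul and Sussman's actual system; see "The Art of the Propagator" linked below for that.

```scheme
;; A cell holds either 'nothing or a value; adding content wakes up
;; the propagators attached to the cell.
(define nothing 'nothing)
(define (nothing? x) (eq? x nothing))

(define (make-cell)
  (let ((content nothing) (neighbors '()))
    (define (add-content! increment)
      (cond ((nothing? increment) 'ok)
            ((nothing? content)
             (set! content increment)
             (for-each (lambda (run!) (run!)) neighbors))
            ((equal? content increment) 'ok)
            (else (error "Contradiction!" content increment))))
    (define (new-neighbor! run!)
      (set! neighbors (cons run! neighbors))
      (run!))
    (lambda (message)
      (cond ((eq? message 'content) content)
            ((eq? message 'add-content!) add-content!)
            ((eq? message 'new-neighbor!) new-neighbor!)
            (else (error "Unknown message" message))))))

(define (content cell) (cell 'content))
(define (add-content! cell value) ((cell 'add-content!) value))

;; A directional propagator: when all inputs have content,
;; compute f and write the result into the output cell.
(define (propagator f inputs output)
  (define (run!)
    (let ((args (map content inputs)))
      (if (not (memq nothing args))
          (add-content! output (apply f args)))))
  (for-each (lambda (cell) ((cell 'new-neighbor!) run!)) inputs))

;; A one-directional adder, and its use.
(define (adder a b sum) (propagator + (list a b) sum))
(define a (make-cell)) (define b (make-cell)) (define s (make-cell))
(adder a b s)
(add-content! a 3)
(add-content! b 4)
(content s)                                      ; => 7
```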

But let me go a little further. What are the other problems? I hate computer languages now, almost all of them, including those I invented. And part of the reason is that they are full of expressions. It could be worse - it could be statements - but they are expressions. And what I mean by that is ... an expression is a tree. You have got a bunch of values and they only go up the tree and they are collected in one output. That is what we are expected to think about. And you do not have names for the intermediate parts, as in the circuit diagram of an electrical circuit. Those are useful places to put names on. Sometimes they are a pain in the butt; you have to be careful not to get overwhelmed with names. One difference in this case (here is its usage) is that I am going to change things so that in propagators all connections are explicit: they have names. That is a computation for the square root by Heron's method. It is a wiring diagram. I am looking at something that looks like an assembly language. I do not have a good language for this yet, but I am using lambda glue to put it together. Lambda glue is excellent for getting going before we understand what the real language is.

Each of these things is an autonomous machine running continuously. If I can burn processor, I can do that. Suppose I have as much processor as I have memory. Suppose that for every memory module I have a big processor associated with it. That will happen, because there is no other way to make the future happen. You can also make it iterative. This is a wiring-diagram view of the world. What we are seeing is a wiring diagram for the machine. The machine is potentially infinite, because it has a copy of itself inside of itself. This is a way to think about things which is quite different. These things might have multiple outputs. For example, when you divide integers, you get a quotient and a remainder. Why would I get one or the other? Why not even think about that as the normal way to think? Most processes have that property. What if you have things where many things go in and many go out? The way square-iter knows whether or not to expand itself - that is an implementation question. This is the question between applicative order and normal order in lambda calculus. Am I choosing to expand something when all inputs are there, or only one input, or when you want the output, or anything like that ... that is a different question. I am not going to worry about that right now. We can make machines do it any way we like, and we could have policies that allow both.

Remember, a real engineer does not want just a religion about how to solve a problem, like object-oriented or functional or imperative or logic programming. This piece of the problem wants to be a functional program, this piece wants to be imperative, etc., and they all want to work together usefully, because of the way the problem is structured, whatever it is. I do not want to think there is any correct answer to any of those questions. It would be awfully bad to write a device driver in a functional language. It would be bad to write a symbolic manipulator in a thing with complicated syntax. It is bad to write a numerical program in anything but Fortran. That is a joke. Fortran is a wonderful language in some ways. It is the ancestor of them all, besides Lisp. Fortran and Lisp are the 2 oldest languages that are still in use. I think they are the keys to everything else, the current versions of languages.
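For reference, the computation that wiring diagram describes - square roots by Heron's method - written here as an ordinary recursive procedure rather than as a propagator network:

```scheme
;; Heron's method: improve a guess g for sqrt(x) by averaging g and x/g.
(define (heron-step x g) (/ (+ g (/ x g)) 2))

(define (sqrt-heron x)
  (let loop ((g 1.0))
    (if (< (abs (- (* g g) x)) 1e-10)
        g
        (loop (heron-step x g)))))

(sqrt-heron 2)                                   ; => 1.4142135...
```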

Why am I pushing this idea?

What can you do with these propagators? I am pushing this idea not because I think it is the right answer. I am trying to twist us, so that we can say: this is a different way to think. We have to think 52 different ways to fix this problem. I do not have the answer to get out of this; I do not know how to make a machine that can build a person out of a cell. But I think the problem is that we have been stuck for too long diddling with our details, worrying about our type systems, when we should be worrying about how to get flexible machines and flexible programming.

One of the things I can do with propagators is make constraint propagators out of directional ones. I can make a thing that is an adder where it does not matter in which direction information flows. I can build a multidirectional propagator out of 3 directional propagators. Multiplication here is a lie. It is more complicated than that, because multiplication by zero has special properties: what you can conclude from a zero depends on the other argument.
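Continuing the cell-and-propagator sketch above, here is the adder as a bidirectional constraint built out of three directional propagators, so that knowing any two of the three cells determines the third:

```scheme
;; A sum constraint: a + b = total, information flows in any direction.
(define (sum-constraint a b total)
  (propagator + (list a b) total)      ; a, b       -> total
  (propagator - (list total a) b)      ; total, a   -> b
  (propagator - (list total b) a))     ; total, b   -> a

(define x (make-cell)) (define y (make-cell)) (define z (make-cell))
(sum-constraint x y z)
(add-content! z 10)
(add-content! x 3)
(content y)                                      ; => 7
```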

Once I can do that I can go back to my electricity stuff. A resistor really is a two-terminal device with a constraint ... (follows description of the resistor).
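A hedged sketch of that idea, continuing the constraint machinery above, with the resistor simplified to a single voltage-across cell obeying v = i * r (and ignoring the multiplication-by-zero subtleties mentioned earlier):

```scheme
;; A product constraint, then a resistor as a constraint among
;; its resistance, the voltage across it, and the current through it.
(define (product-constraint a b prod)
  (propagator * (list a b) prod)
  (propagator / (list prod a) b)
  (propagator / (list prod b) a))

(define (resistor r v i) (product-constraint i r v))   ; v = i * r

(define r1 (make-cell)) (define v1 (make-cell)) (define i1 (make-cell))
(resistor r1 v1 i1)
(add-content! r1 1000)
(add-content! v1 5)
(content i1)                                     ; => 1/200, i.e. 5 mA
```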

But the crucial thing here is what I said before, what Mr. Radul invented 2 or 3 years ago for his doctoral thesis: the idea that cells merge information monotonically. I am going to tell you the story about how, if you have got a barometer, a stopwatch and a ruler, you measure the height of a building. There are lots of ways of doing that. So, here is one thing: I can use similar triangles. (Example follows ...) I can go to the top of the building, drop the barometer off and see on the stopwatch how long it takes to fall. I get a building height which is slightly different, because the measurements are different. But I can do something better. I can combine these measurements. The best thing is that that information propagates back to the measurements. If the guy made a slight mistake we can find it, because each of these measurements constrains the others. That is always true in real science. Each of these values is improved from the values we had before, the ones we measured. Now we can bribe the superintendent by giving him a stopwatch, and we expect an honest answer out of this. So we get a value which is not an interval. He claims that the building height is 45. All the measurements are improved.
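A sketch of the monotonic merge using intervals in plain Scheme (the numbers are hypothetical, not the ones from the talk): each new measurement can only narrow the interval in a cell, so the information only ever increases.

```scheme
;; Intervals as (lo . hi); merging takes the intersection.
(define (make-interval lo hi) (cons lo hi))
(define (interval-lo i) (car i))
(define (interval-hi i) (cdr i))

(define (merge-intervals a b)
  (let ((lo (max (interval-lo a) (interval-lo b)))
        (hi (min (interval-hi a) (interval-hi b))))
    (if (> lo hi)
        (error "Contradictory measurements" a b)
        (make-interval lo hi))))

;; Hypothetical building-height estimates, in meters:
(define from-shadow    (make-interval 44.5 48.0))  ; similar triangles
(define from-fall-time (make-interval 41.0 45.5))  ; barometer drop timing
(merge-intervals from-shadow from-fall-time)       ; => (44.5 . 45.5)
```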

I can do better than that, because in the same way you invented these hairy monads to be lambda calculus plumbing - which is what they are; it is wonderful stuff by the way, but they are very hairy for people who haven't understood them very carefully and thought about them for a long time - in the same way you want to be able to carry extra information with something, and what a monad allows you to do is to pipe around the value you wish to carry also ... it is hidden plumbing. It is a lambda calculus trick. Most Lisp-type programmers knew about it long ago but did not have a name for it. It is very useful to realize that it is connected to category theory and things like that. Here we are tracking provenance. I am going to attach other information besides things like units and dimensions. I want to carry with the value where the information came from, so that I can have arguments about where it comes from. You saw that in the electrical circuit case. That is very helpful in debugging a complicated problem. I am going to make my values be what I call contingent values. So this value is a contingent value which is based on the premise called "shadow". It could be several premises, and this is why there is a list there. I am assuming that a premise is a symbol right now. Premises do not have interesting structure, other than that you can tell whether two of them are the same. These are all "shadow" premises, and I get answers which are contingent on the shadow. I can do the same thing with the fall time, but this one I am going to make dependent on a premise I call "time". The building height is contingent now on both "time" and "shadow". The super gives me an authoritative result. Then I can look at the building height. The barometer height is now dependent on 3 things: the superintendent's measurement, the time estimate and the shadow estimate. But the building shadow estimate is still contingent on the shadow, and the fall time has been improved. Information is monotonically increasing in each of these cells. Monotonic increase is crucial. One way of doing that kind of thing is with things like intervals, but imagine you have unification algorithms, for example. That is a way of combining information. And there are lots of things like that. Suppose I have an approximate Fourier transform of a signal and an approximate time-domain version of the signal, both with noise. You can combine those and get better estimates. This gives us even more power, because I am going to assume that information is usually inconsistent. Almost all information in my head is inconsistent. I believe lots of things that are false. We all do. But the important thing is that I do not fall apart every time I think, like some robot in some sci-fi story. Why? It is because when I run into one of these inconsistencies I giggle. It is because I can keep apart locally consistent subsets of my thinking. With those I can make deductions that I know where they come from, I hope, and that are dependent only on things that I specify.
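A sketch of contingent values, continuing the interval example above: a value carries the set of premises it depends on, and merging two estimates unions their premise sets. The real system's representation is richer than this; these names are made up for illustration.

```scheme
;; A contingent value is a value plus the premises it depends on.
(define (contingent value premises) (cons value premises))
(define (value-of c) (car c))
(define (premises-of c) (cdr c))

(define (union a b)
  (cond ((null? a) b)
        ((member (car a) b) (union (cdr a) b))
        (else (cons (car a) (union (cdr a) b)))))

;; Merging two interval estimates: the result depends on every
;; premise that went into either estimate.
(define (merge-contingent a b)
  (contingent (merge-intervals (value-of a) (value-of b))
              (union (premises-of a) (premises-of b))))

(define shadow-estimate (contingent (make-interval 44.5 48.0) '(shadow)))
(define fall-estimate   (contingent (make-interval 41.0 45.5) '(fall-time)))
(merge-contingent shadow-estimate fall-estimate)
; => ((44.5 . 45.5) shadow fall-time)
```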

I am now going to make TMSes, truth maintenance systems. They were originally invented partly by Stallman and me, but they were given a name by Jon Doyle, who was one of my graduate students and wrote a master's thesis about them; David McAllester did some work on them and made another version, and there is a whole book by two of my former students, Johan de Kleer and Ken Forbus, called "Building Problem Solvers", which is pretty much about truth maintenance systems. A TMS is a collection of facts with appropriate dependencies, such that it is possible to say what is the best estimate you can give of the consequences of those facts, be they logical or anything else. In every cell I am going to put a TMS, which is a thing that is going to maintain the best estimates of what is going on, with the ability to back out. I am not going to go through this, this and this, but if you see what is inside the building height at the end, there are 3 possible contingencies depending on different dependencies, and this can grow quite large - but it does not grow very large, because anything which is subsumed by some other deduction goes away. And eventually we can change our point of view. I can change my worldview. I want to kick out the time measurement (as shown here). Suppose I want to kick out the time measurement because it was not so good. Well, I can get a value for the building height which is dependent only on the shadow measurement. I can bring back the time measurement and kick out the shadow measurement, etc. But better than that, I can get contradictions. Maybe somebody made a mistake. Maybe there is a lie. Maybe the superintendent is a liar. Or maybe the ruler was not held quite right when I measured the height. Suppose I have a corpus of all the medical records in the world, from all the journals. Do you know that most of those articles are wrong? There is a good reason for that. The reason is that people do experiments with p less than .05. That is their measure of whether something is a good experiment or not. The chance of being wrong is 1 in 20. Now, since they do far more than 20 experiments and they only publish the ones that come out positive, you can make a deduction from that. There are some great papers about why most medical research is wrong. We still live better than we lived 50 years ago. Here I am going to look at a contradiction. So I can maintain inconsistent worldviews by having locally consistent worldviews.

Even better, I can also make the machine automatically find for me sub-worldviews that are consistent. That is called dependency-directed backtracking. Unlike chronological backtracking, which you would get by using the sequence monad and a failure mode, I can do much more efficient backtracking because I know the provenance of every fact. So I can say: if I have a contradiction over here and I have another contradiction there, and the intersection is this guy, it is highly probable that he is the bad guy. But even so, if I just find something to rule out, I can rule out a whole subtree without undoing anything and without redoing computations which were already correct but chronologically later. Remember, this could be a parallel machine in principle (I am simulating them now of course; I have a little time-sharing system) ... a parallel machine where all these things are independently looking at information and carrying it from one place to the next.

Here is the example of the famous story. It is very efficient. If I did that with a chronological process it could take something like 3000 backtracks to investigate this space. Not that search is a good thing. Search is always exponential and always eats you if you let it. This is not saying that we should be doing search; it is just showing that I can automatically find consistent subsystems of a complex system.

Let me go back to the original thing I was talking about at the beginning. When I was an undergraduate at MIT, I used to walk by Jerome Lettvin's lab. He was a famous neurophysiologist, who first measured the potentials of neurons. He made great amplifiers. He was a professor of EE, Biology and Humanities at MIT. I used to go by the door of his lab, which said "Laboratory of Experimental Epistemology". I was employed as one of Minsky's hackers, and I would go over there, pass by Lettvin's lab, drop in and listen to him talking about wonderful things about neurophysiology and things like that. One thing he would explain quite often is that almost all the cells in the central nervous system are very tiny. It is very hard to see them, and it is very hard to measure potentials from them. He had this idea that maybe these neurons are not explicitly directional, that they are actually things like these constraints, that feel a bunch of other things and adjust the sensitivity of the other neurons they touch. This idea might be false; I am not trying to suggest that it is true or false. Getting back to the Kanizsa triangle illusion: is it possible that almost all the computation that we see, that is really fast, is filling in details? What was going on in that electric circuit example I showed you? It is filling in details. The diagram is a memory, and I am writing in specific, well-defined places the answers that I can deduce from local information. Sometimes I have to invent a variable and propagate it around and get an equation, but I hope not very often. Most of the time, if I pick my models correctly, I am choosing to make the problem simple enough so that I can do this kind of propagation which fills in details. What is happening here is that this guy is giving me evidence that there is something like this black circle, etc. Maybe that is done by some magic process that looks something like that. I do not know this is true, I do not claim this is true. It is a different way to think. Maybe it is worth investigating. It might be complete nonsense. On the other hand, we have to throw away our current ways of thinking if we ever expect to solve these problems.

I am claiming that the problem facing us as computer engineers is not correctness, it is evolvability. A trivial example of making things more evolvable is extensible generic operations, because they allow us to extend functions without modification, but they make proofs very difficult. A more radical proposal is that maybe there are freedoms we can unlock by throwing away our idea of architecture (how machines have to work). Here is one particular example which is concurrent and parallel in an essential way. It is redundant and degenerate, meaning it is possible to compute the same thing many times and put the results into the same place and it does not cause trouble. If one thing dies, you can compute the answer another way. Degenerate means I can compute it a different way, the same way I can compute in my physics class the motion of a planet by two methods; I get the same answer, but they reveal different phenomena. Degeneracy means many ways of doing something. If I cannot eat sugar because I am screwed up in some way, then I can get my calories out of proteins. Every biological system has many ways of doing things. I can maintain provenance. I can know where data comes from, follow it around, and where data combines I can combine provenance. Sometimes I have to subtract, as in conditional proof: you say, assume A to conclude B, therefore you can conclude A implies B without the assumption of A. There are certain kinds of rules where I have to delete ... but you have to know that. Means of combining provenance have to be taken care of. And I can tell you about dependency-directed backtracking, which is sometimes a useful way of controlling non-deterministic search. These are examples of things I am just pushing forward as possible ways to break out, and I want to hear about every way that we can, and that is all I have to say today.

Some related links:

Quora question
Propagator-Based Computing, A Programming Foundation for Decentralized Systems
The Art of the Propagator
Propagation Networks: A Flexible and Expressive Substrate for Computation
Propagator System Documentation
Propagator System code and documentation tarball
A conversation with Sussman on AI and asynchronous programming
Propagating, artfully
Talk notes: We Really Don't Know How to Compute! Gerald Sussman 2011
Towards Context-Aware Propagators
My comments