Applying Theory to Practice — and Practice to Theory
Aug 28, 2020 18:22 · 10655 words · 51 minute read
JEFF WELSER: Hi, team, and welcome. I'm Jeff Welser, the vice president of Exploratory Science here at IBM Research, and I'm really happy today to start what's going to be a series of periodic talks we're going to be doing over the next several months. Hopefully, many of you got to join our first Distinguished Speaker Series with Yoshua Bengio a few weeks ago. And, of course, we'll continue bringing in distinguished speakers from outside, but as IBM researchers, we actually have the pleasure each day of working and collaborating with an amazing array of the world's most brilliant minds right here in the labs. Some of our colleagues have been recognized with very high honors by the external scientific community for making foundational advances that have transformed a technical area or an entire industry, or even for starting an entirely new field of research. In some cases we may work with these individuals for years without fully coming to appreciate the contributions that they've actually made to science and to IBM.
00:52 - So these stories are an inspiring example, even internally, of the eminent careers and research we can all learn from. Therefore, on occasion we'll be inviting distinguished members of our community to tell their story in what we're calling an IBM Research Special Seminar, describing their notable technical accomplishments, reflecting on their career path, and sharing the inspiration that made these possible. So today I'm excited to welcome Ron Fagin, IBM Fellow at the Almaden Research Center. Ron is recognized in the computer science community as a founder of relational database theory and a creator of the field of finite model theory. He has also authored seminal work in the fields of information integration and aggregation and of reasoning about knowledge.
01:33 - His work advanced both the theory and practice of modern computing systems, especially data management systems. Ron's key inventions used in IBM products include extendible hashing, differential data backup, and critical tools for database design. Ron has earned a long list of distinctions through his 47-year career at IBM Research. He is an ACM Fellow, an IEEE Life Fellow, and a fellow of the American Association for the Advancement of Science. The work Ron has published with his colleagues has been awarded the 2014 Gödel Prize and the 2020 Alonzo Church Award, some of the world's highest honors for research in theoretical computer science and logic, and he's a member of three prestigious national academies: the National Academy of Engineering, the American Academy of Arts and Sciences, and, most recently, the National Academy of Sciences.
02:20 - So that's a three-fer for the National Academies for him. Ron is also known among his coworkers, anyone in Almaden in particular, for his unfailing cheer and kindness. He's always a favorite with the interns who are in town. When we used to have all-hands in the auditorium, he had a special seat right in the front, where he could always ask the very first question after every talk. He's always open to answering questions as well.
02:42 - He's known as well for his remarkably quick wit and his remarkably slow driving speed, as anyone who's ever followed him up the Almaden Hill can attest. But today Ron will share some perspectives on applying theory to practice and practice to theory, gained through the course of his notable career. In this talk, Ron will describe two case studies from his career in which mutually beneficial, close collaboration between theoreticians and practitioners propelled significant scientific and technological advances. I'd also like to introduce Alex Gray, the VP of Foundations of AI at the IBM T.J. Watson Research Lab. He'll be my cohost. Alex will be moderating the question and answer period at the end of Ron's seminar.
03:23 - Please use the Q&A panel on your console to submit any questions you have throughout today's session, and then we'll get to them at the end. So with that and no further ado, Ron, welcome, and thanks for sharing your experiences and inspiring story with us today. >> RON FAGIN: Great. Well, thank you so much, Jeff. Appreciate it. I was not ready for the slow driving part, but that's interesting to hear anyway. So hello, everybody, and welcome to my talk. Now, a purpose of today's talk is to encourage collaboration between theoreticians and system builders. I call them practitioners, but Laura Haas objected. She told me to call them system builders; it sounds better. So system builders it is.
04:03 - And I'm going to do this, as Jeff mentioned, by way of two case studies. One of these case studies was initiated by the theoreticians and the other by the practitioners. Now, there are several messages here. For theory people, one message is how to apply theory to practice and why this leads to better theory; and for practitioners, or system builders, the value of theory and the value of involving theoreticians in your work. So the first case study started way back in 1996, with Garlic, when Laura Haas came up to me. Now, Laura Haas at the time was just a first-level manager, but later on she became an IBM Fellow.
04:47 - She became the director of computer science. But she knocked on my door one day way back in 1996 and said, "Okay, Mr. Database Theoretician, we've got a problem with Garlic, our multimedia database system." Now, what was Laura's problem? Well, here's what it was. Garlic is a middleware system; it sits on top of other systems. So it's on top of things like Db2, which is a relational database system; on top of QBIC, which is Query By Image Content, where you can query on the basis of color, shape, or texture; and on top of other stuff too.
05:27 - So now the problem Laura had was that the answers she was getting were not compatible. The answers in Db2 were sets – we'll see an example in a second – whereas the answers in QBIC are sorted lists, sort of like what you get in Google, meaning you get a list of answers. You don't get a set. So let's look at an example. Let's say we have a CD database, and you're searching for artist equals the Beatles in, say, Db2. Db2 won't have this fancy thing appearing, but you would get back a set of albums. Now, this is from MusicBrainz, which has 12 million CDs in its collection.
06:10 - Now, let's say we have the query album color equals red. Well, you get a sorted list, and here's what you get back from MusicBrainz: the reddest thing first – I guess counting red pixels – with a redness score there. So now, how would you make sense of a query like artist equals Beatles and album color equals red? Well, you'll say, "Come on, Ron, big deal. We'll just have a list of the albums by the Beatles sorted by how red they are. What could be easier than that?" Okay, smart person, what about artist equals Beatles or album color equals red? Not at all clear what you do, is it? And how about, let's say we're doing this multimedia query: color is red and shape is round.
06:59 - How would you make sense of that? How would you give an answer to that? So that's what I dealt with. So what was my solution? Well, first of all, it's often the case that when a practitioner brings a theoretician a problem, they leave out something important. And what Laura left out was that these weren't just sorted lists, they were scored lists. You saw with the redness thing, there was a score there. I thought, ah-hah, scores. And so I thought, gee, now we're in business. We can match these things up. Sets are scored lists also: 0 if it's not in, 1 if it's in.
07:32 - So they're both scored lists, so I can combine these scored lists. Wonderful. And that reminded me of real-valued logic. Some flavors of that are called fuzzy logic, where things have scores – not necessarily 0 or 1, but in between 0 and 1 – and you deal with that. Now, in real-valued logic, conjunction is often taken to be the min. That was the original rule when Zadeh introduced fuzzy logic: he took conjunction to be the min and disjunction to be the max. But people use other rules in other real-valued logics. So then I said, "Great." I went back to Laura, I said, "Laura, good news. I've got a solution for you. Use real-valued logic." I was happy. I thought I was done. And Laura said, "Well, hmm. You know, I like your solution, but we need a practical algorithm. We can't just look at every object in the database and get its, say, redness score and roundness score, if that's what you're looking for – its scores in every attribute –
08:36 - and use real-valued logic to get the answer. We just can't afford that, Ron. We need an efficient algorithm." So I thought, hmm. I scratched my head, came back to her a couple days later and said, "Laura, good news. I've got an efficient algorithm. Instead of linear time n, where n is the number of items in the database and you have to look at everything, it's the square root of n. You can use square root of n accesses and get the top k." So that's great. I was very happy. But Laura said – and I'll never forget her answer – she said, "Good. It's certainly better than linear. But you know, we database people are spoiled.
09:12 - We're used to log n and things like B-trees." So she said to me – and I'll never forget it – "Be smarter. Go get me a log n algorithm." So I thought, hmm. I scratched my head, came back to her a couple days later and said, "Guess what, Laura, I can prove you can't do any better than square root of n. That's it. No better algorithm." She said, "Fine. Okay. We'll take it." And they implemented it in Garlic. Now, let's just see the difference between n and square root of n in the time for the accesses. Let's go back to MusicBrainz with its 12 million CDs.
09:45 - Now, if you assume a thousand accesses per second, then searching every one of the n items in the database – the naive algorithm – would take you about three hours: 12 million accesses at that rate is 12,000 seconds. Square root of n is about 3,500 accesses – about three seconds. So it's a huge win to do square root of n versus n. Now, generalizing the algorithm: well, we theoreticians like to generalize things, so the algorithm works not just for min, but for any monotone scoring function – one where if you increase the individual scores, you can't decrease the overall score. So, influence. Well, first of all, the algorithm was implemented in Garlic, and it's influenced a number of IBM products, including the ones on this list: Watson, the Bundled Search system, InfoSphere Federation Server, WebSphere Commerce, and others.
10:42 - So it influenced a number of IBM products. And the paper that introduced the algorithm – which, by the way, is now called Fagin's Algorithm – has about 2,000 citations, according to Microsoft Academic. By the way, I used Microsoft Academic instead of Google Scholar because it gives a bigger number. They search more stuff. So, you know, why not give the bigger number. Okay. I see a comment on my screen. I can't read it, but okay. So now, something happened, though, in 2001.
11:21 - So Amnon Lotem, Moni Naor, and I came up with a new algorithm, which we called the Threshold Algorithm. And instead of just telling you the algorithm, I'll first tell you what the problem is, just to be more concrete. So there are m attributes – let's say redness and roundness; we'll take m to be 2 in our example – and every object has a score for each of these attributes: it's got a redness score and a roundness score. So we have m sorted lists, like a redness list and a roundness list, and now our goal is to find the top k objects according to some monotone scoring function while minimizing database accesses. That's our goal. That's the problem. So here's an example.
12:08 - Here is a table with redness and roundness. Object 177 is the very reddest object. It's got a score of .993, as you see in the upper left. In the bottom right, you see it's not so round: it's got a score of .406. So it's a little bit round. In the upper right corner, you see object 235 with a roundness of .999. That's probably a circle. But in the bottom left, you see it's not so red – there's its redness score. So now let's talk about scoring functions. Let f be a scoring function.
12:41 - So, as I said, the one commonly used – like Zadeh used in fuzzy logic – was min. There are many others you can use, one of which is average. It turns out that in real life – they've done psychological studies on people – when you use average and give the answer with the highest average, people like the answer better. So average is a good choice. So let x1 through xm be the scores of object R – its redness score and its roundness score in our example. Then you apply your scoring function: f(x1, ..., xm) is the overall score. We may write that as f(R).
13:16 - So that's the score of object R under this function f. And now here's the formal definition of monotone: a function f is monotone if, whenever xi is less than or equal to yi for every i, then f(x1, ..., xm) is less than or equal to f(y1, ..., ym). So, modes of access. Well, QBIC is limited in its access. It has two flavors of access. Sorted access, where you say: give me the reddest thing, the next reddest, the third reddest, and so on. That's sorted access. And also random access: maybe you just saw some object with a high redness score, and you say, "What is its roundness score?" So you can go ask for that.
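To make the two access modes concrete, here is a minimal sketch in Python. The interface is hypothetical – it is not Garlic's or QBIC's actual API – but it captures the two operations the algorithms here rely on:

```python
# Hypothetical interface for one attribute's scored list (not Garlic/QBIC's API).
class ScoredList:
    """One attribute's list (e.g., redness), supporting both access modes."""

    def __init__(self, scores):
        # scores: dict mapping object id -> score in [0, 1]
        self.scores = scores
        # Precompute descending order once, so sorted access is a simple lookup.
        self._sorted = sorted(scores.items(), key=lambda kv: -kv[1])

    def sorted_access(self, depth):
        """Sorted access: the (object, score) pair at rank `depth` (0 = best)."""
        return self._sorted[depth]

    def random_access(self, obj):
        """Random access: look up one object's score in this list directly."""
        return self.scores[obj]
```

In a real middleware system the sorted list would come from the underlying engine – QBIC handing back the next-reddest item – rather than an in-memory dict; the point here is only the two operations.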
13:55 - Now, our goal was, as I said, to minimize the number of accesses to the database. We want an algorithm to find the top k objects – take k as ten, for example. Now, the naive algorithm goes and looks at absolutely everything, finds the top k, and gives the answer back, but that's way too expensive, as Laura made very clear to me. So we wanted to do better than that. So here's the algorithm that we came up with – Amnon Lotem, Moni Naor, and I – the Threshold Algorithm. So how does it work? Well, it does sorted access in parallel to each of the lists.
14:31 - So it does a sorted access to the redness list and to the roundness list. And as soon as you see an object under sorted access in one of the lists, you go get its score in the other list. So if you get an object appearing high in the redness list, you go get its roundness score, and then you can compute its overall score, because you have its values in all of its attributes. And then if this is one of the top k objects you've seen, great, remember it. This is a candidate for the final top k.
15:05 - It's one of the top k so far, and you throw away anything that it replaces. And so you have your current top k list. And now we need a stopping rule. When do we stop? Well, for each of the lists i, let ti be the score of the last object you saw under sorted access in that list – so you have the last score you saw in the redness list and in the roundness list. And then you define the threshold t by taking your scoring function f and applying it to those values: t = f(t1, ..., tm). Now, that is not the score of any object, but it still makes mathematical sense.
15:44 - You can apply f to these numbers, and we'll call that the threshold t. And here's our stopping rule: as soon as you've seen k objects whose overall score is at least t, stop, and output your current top k list. That's the algorithm. So let's see why it's good. Well, first, let's see an example. So here we go. Here's the redness list and the roundness list. Object 177 – we've got the top object in the redness list, and the top object in the roundness list.
16:13 - And now for the top red object, you go get its score in the roundness list – down there, bottom right. And for the top object you've seen so far in the roundness list, you go and get its redness score. And then you compute their overall scores by applying your scoring function f to them. We're using min for simplicity in our example. So object 177's overall score is the min of .993 and .406, which is .406 – that's its score. And this is the threshold so far: you take the scoring function f, which is the min, applied to the last things you saw under sorted access. So it's the min of these two numbers, .993 and .999, which is .993.
16:55 - And you keep on going, and you keep doing that process. And you'll notice the threshold keeps dropping. Now it's the min of these two smaller numbers. And you keep going, and now it's the min of these two smaller numbers. So the threshold keeps on dropping. And you're watching carefully to see whether your top k objects' scores are at least that threshold.
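Here is a runnable sketch of the Threshold Algorithm as just described, built on the toy ScoredList above – a reconstruction from the talk's description, not the paper's pseudocode. The demo uses min as the scoring function, and the demo's scores are partly invented (object 512 and some values do not come from the talk):

```python
import heapq

def threshold_algorithm(lists, f, k):
    """Find the top-k objects under a monotone scoring function f.

    lists: m ScoredList objects, one per attribute (e.g., redness, roundness).
    f:     monotone scoring function on a list of m scores, e.g., min.
    k:     how many objects to return.  Assumes at least k objects exist.
    """
    top_k = []    # min-heap of (overall_score, object_id): current top k
    seen = set()
    depth = 0
    while True:
        last_scores = []
        # Sorted access, in parallel, one position deeper in each list.
        for lst in lists:
            obj, score = lst.sorted_access(depth)
            last_scores.append(score)
            if obj not in seen:
                seen.add(obj)
                # Random access for this object's score in every list,
                # then compute its overall score.
                overall = f([l.random_access(obj) for l in lists])
                if len(top_k) < k:
                    heapq.heappush(top_k, (overall, obj))
                elif overall > top_k[0][0]:
                    heapq.heapreplace(top_k, (overall, obj))
        # The threshold: f applied to the last scores seen under sorted access.
        threshold = f(last_scores)
        # Stopping rule: k objects whose overall score is at least the threshold.
        if len(top_k) == k and top_k[0][0] >= threshold:
            return sorted(top_k, reverse=True)
        depth += 1

redness   = ScoredList({177: .993, 235: .420, 512: .876})
roundness = ScoredList({177: .406, 235: .999, 512: .910})
print(threshold_algorithm([redness, roundness], min, k=2))
# [(0.876, 512), (0.42, 235)]
```

On a large database the loop can stop after a few rounds of sorted access, without ever computing a score for most objects – which is the whole point.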
17:15 - So now, why is this halting rule correct? Why is the Threshold Algorithm safe in stopping then, not making a mistake? Well, suppose you have k objects so far whose scores are at least t, the threshold. What could go wrong? Well, what could go wrong is that you have some object R you have not seen yet, somewhere deep in one of these lists, and some object S in your top k, but the score of R is bigger than the score of S. Then you've screwed up. So let's show this bad thing cannot happen – you cannot screw up like that. Well, let's say R has scores x1 through xm: its redness score and its roundness score. Now, since you haven't seen object R yet, in each of the lists i its score xi is less than the threshold value ti for that list. So now we apply monotonicity. The score of object R is f(x1, ..., xm), which by monotonicity is less than or equal to f(t1, ..., tm). Well, by definition, f(t1, ..., tm) is the threshold t.
18:20 - And now we know that the score of S is at least t, because that was our stopping condition: we stopped when we had k objects whose scores were at least t. But now you see there's a contradiction. Up above we have f(R) bigger than f(S), and down here we have f(R) less than or equal to t, which is less than or equal to f(S). Contradiction – so this bad thing cannot happen. So the stopping rule is, indeed, correct. Now, in working on the Threshold Algorithm, we found this wonderful notion, something we called instance optimality – a wonderful, magical property of an algorithm. So let me tell you what it is. Let's say bold A is a class of algorithms, legal algorithms for a problem, and bold D is a class of legal inputs. I'm thinking of these as databases, so I may call them databases. Now, for each algorithm A in the class bold A and each input D in bold D, there's a cost of applying algorithm A to input D. This cost may be the number of accesses, it may be time, it may be whatever – but you have some cost.
19:24 - And now here's what it means to be instance optimal. An algorithm A in bold A is instance optimal over bold A and bold D – the set of possible algorithms and the set of possible databases – if there are constants c1 and c2 such that, if the adversary picks an algorithm A′ and the adversary picks a database D (that's his choice), the cost of running your algorithm on the adversary's database is at most c1 times the cost of the adversary running his algorithm on his own database, plus c2. I mean, that's pretty heavy-duty, because the adversary can fine-tune his algorithm to his database. He can fine-tune his database to his algorithm.
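In symbols, the definition just given (writing cost(A, D) for the cost of running algorithm A on database D):

```latex
% A is instance optimal over a class of algorithms \mathbf{A} and a class of
% databases \mathbf{D} if, for some constants c_1 and c_2,
\exists\, c_1, c_2 \;\; \forall A' \in \mathbf{A} \;\; \forall D \in \mathbf{D}:
\quad \mathrm{cost}(A, D) \;\le\; c_1 \cdot \mathrm{cost}(A', D) + c_2
```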
20:09 - A pretty brutal condition to have to be able to satisfy. And, furthermore, we refer to the number c1 as the optimality ratio – that multiplicative factor. So now I'm claiming the Threshold Algorithm is instance optimal. Now, you might think that follows because you might say, well, the next objects you see might have scores equal to both threshold values, so that's why you can't do any better. But, unfortunately, that doesn't quite go through. Life is a little more delicate, so we need another notion that I'll tell you about, something we're going to forbid, called wild guesses.
20:45 - What a wild guess is, is you ask for the score, under random access, of an object you've never seen before – not something you've seen under sorted access, but for some reason, just out of thin air, you say: you know, out of the blue, I've never seen object 393, but tell me its redness score. That's a wild guess. You have no good reason to do it; it hasn't occurred in any of the lists. Now, neither Fagin's Algorithm nor the Threshold Algorithm makes these wild guesses. And, in fact, your system may not even allow wild guesses. So now here's the theorem. For every monotone f, let bold A be the class of algorithms that are legal.
21:24 - That is, they always give the right answer for every input, and they do not make any wild guesses. And bold D is the class of all inputs, or databases. Then the Threshold Algorithm is instance optimal over these. So we actually have a theorem proving it is instance optimal under this assumption. Right? And the only tricky assumption is no wild guesses. Now – oops, let me back up a second. Sorry about that. Okay.
21:54 - So now the late David Johnson came to my office one day, and I very proudly told him this theorem about instance optimality, and he said, "You know, Ron, I would have a sterner notion of instance optimality: not only the stuff you say, but it's got to have the best possible optimality ratio, that constant c1." I thought, oh, man, that's an interesting, hard problem. So I went back to my office, and the next day I found the optimality ratio. It's this complicated thing, in terms of m, the number of lists (like two for redness and roundness), cR, the cost of a random access, and cS, the cost of a sorted access. This is the optimality ratio for the Threshold Algorithm, and I proved it's best possible.
22:36 - Best possible, that is, among all instance optimal algorithms – so the Threshold Algorithm is instance optimal in that strong David Johnson sense. So then, great, I went to Laura. I said, "Laura, new algorithm, the Threshold Algorithm, even better than Fagin's Algorithm, because it's optimal in a stronger sense: not in the worst-case sense, the usual measure, or the average case, but in every case – it's optimal on every instance." And Laura said, "Ron, we can't keep putting your algorithms in all over the place. You told me it was optimal.
23:06 - What's going on?" I said, "Well, Laura, there's optimal and then there's optimal." So that was the way it was. Influence. Well, we submitted this paper to PODS 2001 – that's the top database theory conference – and I was really worried. You saw the Threshold Algorithm: it takes about 11 lines to write it down. And I thought, man, oh man, the program committee's going to look at this and say, "Are you kidding me? We're the top database theory conference, and we're going to accept a paper with an 11-line algorithm? Forget it. Reject." So I thought, how am I going to get around this? Ah-hah. What I did is, in both the abstract and the introduction, I called it "a remarkably simple algorithm" – turning a bug into a feature.
23:49 - That's something good to remember if you're trying to get a paper accepted at a conference. Turn the bug into a feature. A remarkably simple algorithm. And not only was the paper accepted, it won the Best Paper Award – so it really, really worked. The paper was very influential. It got 3,800 citations. It won the Test of Time Award, you know, ten years after the conference. And I won the IEEE Technical Achievement Award for Fagin's Algorithm and the Threshold Algorithm together. And then the biggie: the Gödel Prize. This is the highest prize for a paper in theoretical computer science, and this paper won the Gödel Prize.
24:26 - You can't do any better than that for a paper in theoretical computer science. And I would mention that there's something called Gems of PODS, where they looked back over the last 20 or 30 years and picked the best papers that have ever appeared in PODS. They did it in 2016, the very first time, and this was one of the two papers selected – and they told me it was the top rated among those two for Gems of PODS. So it's got a gazillion applications. I'm certainly not going to read this list to you, but one reason why it's been so influential, and one reason it won the Gödel Prize, is that people use it all over the place, in many, many, many different applications. Measures of success.
25:02 - Well, making our products better – that's certainly an ultimate measure of success for a practitioner. Creating a new subfield – that is an ultimate measure of success for a theoretician. And we did both of these. And if I were to make an argument to a theoretician that you will do better theory if you just talk to those practitioners, I cannot make a stronger argument than this: here's a paper that arose from a very practical real-world problem, and, by golly, it won the Gödel Prize.
25:33 - What more could you ask? I mean, that's the strongest argument I can give to a theoretician: talk to those messy old practitioners. Second case study: Clio. Now, Clio deals with data exchange – I'll explain in a moment what that means – where you convert data from one format to another. Now, Laura Haas left Garlic, after it was successful and implemented, and started Clio. And I followed her. She was never my manager – I was in the theory group – but I thought, you know, I had a wonderful success with Laura before; I will start going to Laura's meetings. And I actually sat in Clio meetings for a full year. And then something happened.
26:15 - The four of us – Phokion Kolaitis, Renee Miller, Lucian Popa, and I – said, look, at the end of the day, how would we do this from scratch? If we didn't have all the work that has gone into Clio and just started from scratch, what would we do? By the way, Renee Miller is very sensitive about her picture. She will not allow her picture to be public, so that's a picture of Renee Miller at age 3. She allowed me to use that. And no, she was not a child prodigy – that's just the picture she allowed me to use. So, data exchange. You have a source database and a target, and you're trying to convert data from one format to another. I'll give you an example in a second. Well, here's the example.
26:58 - Let's say you're merging two different companies. One of them has an employee-manager relation, and the other has an employee-department and department-manager relation, and we want to merge these two – we want to somehow combine them – so here's where we're going to use data exchange. What do we do? Now, there's something called tuple-generating dependencies, or TGDs, which have been used in the past in a number of contexts, including normalization. And this is what we're going to use to describe it. As I said, they were originally used for other purposes.
27:31 - So here is the tuple-generating dependency, or TGD, used in this case. We say: if tuple (E, M) is in the employee-manager relation – so employee E has manager M – then there is some department D such that employee E is in department D, and department D is managed by manager M. With illustrative relation names, that's EmpMgr(E, M) → ∃D (EmpDept(E, D) ∧ DeptMgr(D, M)). So now here's an example. We had this database here: Gödel is reporting to Hilbert, Turing is reporting to Hilbert, and Hilbert's reporting to Gauss. Now, I'm going to give you three possible databases that you can convert this to, and you're going to think about which one you want.
28:06 - Now, if I were doing this in person – and I might give this talk in person someday – I'd make people vote. I'd say, "Here are the three answers. Who votes for number one, number two, number three?" And people don't do very well on that. But here you'll have to vote privately yourself and see how you did. Okay. So here's solution number one: we name the department after the manager. We say Gödel is in the Hilbert department, which is managed by Hilbert. Turing is in the Hilbert department, managed by Hilbert.
28:29 - And Hilbert's in the Gauss department, managed by Gauss. That's solution number one. Solution number two: you create dummy values D1 and D2, and you say, okay, Gödel and Turing are in department D1, which is managed by Hilbert, and Hilbert's in department D2, managed by Gauss. That's solution number two. Solution number three says: wait a second, just because Hilbert is the manager of both Gödel and Turing, that doesn't mean they're in the same department – maybe this guy's managing several departments. So we have D1, D2, and D3. Gödel's in D1. Turing's in D2. Hilbert's in D3.
29:09 - And D1 and D2 are managed by Hilbert, and D3 is managed by Gauss. So decide for yourself: would you pick number one, number two, or number three? Now, we have a good reason for the one we choose. We have this notion that we call a universal solution. A universal solution is as general as possible: it makes as few assumptions as possible. And in this case, solution number three is the universal solution. For example, it's not assuming that each manager manages only one department. Now, if we had an additional constraint that says each manager manages at most one department, then in that case solution number two would be the universal solution. Okay. Now, how do we obtain a universal solution? Well, there's this well-known method called the chase that's been used for many, many years in database design – in normalization theory and other places. And we're going to use the chase to create the universal solution.
30:16 - So how does it work? Let's go back to our TGD, the tuple-generating dependency that says: if employee E has manager M in the employee-manager relation, then there's some department D where E is in department D, and D is managed by M. What we do is take each tuple in that first relation – like (Gödel, Hilbert) – and create a new dummy value D, a so-called labeled null, and say Gödel is in department D, and D is managed by Hilbert. These invented values are called labeled nulls. That's what we do. And then if there happens to be an EGD – an equality-generating dependency – like "each manager manages at most one department," you would equate some of these labeled nulls: you would equate the labeled nulls dealing with the same manager.
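Here is a minimal sketch of that chase step for this one TGD – the function and relation names are illustrative, and a real chase engine handles arbitrary TGDs and EGDs:

```python
import itertools

def chase_emp_mgr(emp_mgr):
    """One chase step for the TGD EmpMgr(e, m) -> ∃d (EmpDept(e, d) ∧ DeptMgr(d, m)).

    emp_mgr: list of (employee, manager) tuples.
    Returns (emp_dept, dept_mgr), inventing a fresh labeled null per source tuple.
    """
    fresh = (f"D{i}" for i in itertools.count(1))   # labeled nulls D1, D2, ...
    emp_dept, dept_mgr = [], []
    for emp, mgr in emp_mgr:
        d = next(fresh)              # a new department value: a labeled null
        emp_dept.append((emp, d))
        dept_mgr.append((d, mgr))
    return emp_dept, dept_mgr

source = [("Gödel", "Hilbert"), ("Turing", "Hilbert"), ("Hilbert", "Gauss")]
print(chase_emp_mgr(source))
# ([('Gödel', 'D1'), ('Turing', 'D2'), ('Hilbert', 'D3')],
#  [('D1', 'Hilbert'), ('D2', 'Hilbert'), ('D3', 'Gauss')])
```

That output is solution number three, the universal solution; applying the EGD "each manager manages at most one department" would then equate D1 and D2, yielding solution number two.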
31:10 - So now something else happened – when you solve one problem, more problems arise afterwards. One day Lucian Popa came to my office and said, "Ron, we've got another real-world problem with these mappings. We use these TGDs to go from Schema 1 to Schema 2, and we have TGDs going from Schema 2 to Schema 3. How do we describe how to go from 1 to 3 directly?" My first thought was, well, there's going to be another TGD – more complicated, you know, but still some other TGD will describe directly how to go from 1 to 3. Now, let me remark that TGDs are very special cases of formulas of first-order logic.
31:42 - If you know what first-order logic is, you saw it as we wrote these down: they use first-order quantifiers, "for every X" and stuff like that. But here's what we discovered. Phokion Kolaitis, Lucian Popa, Wang-Chiew Tan, and I studied these things, and we found the following very surprising thing: when you compose these mappings, you leave first-order logic. You can use simple TGDs – a special case of first-order logic – to go from Schema 1 to Schema 2, and very simple TGDs from 2 to 3, but to go directly from 1 to 3, you need second-order logic, where you quantify over sets and over relations and over functions. That was quite a surprise.
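To see what goes wrong, here is a simplified illustration reconstructed in the spirit of the paper's example (the schemas and relation names here are illustrative, not necessarily the paper's):

```latex
% Mapping from schema 1 to schema 2: every employee has some manager.
\forall e \,\big(\mathrm{Emp}(e) \rightarrow \exists m\, \mathrm{Mgr}_1(e, m)\big)

% Mappings from schema 2 to schema 3: copy the relation; flag self-managers.
\forall e\, \forall m \,\big(\mathrm{Mgr}_1(e, m) \rightarrow \mathrm{Mgr}(e, m)\big)
\qquad
\forall e \,\big(\mathrm{Mgr}_1(e, e) \rightarrow \mathrm{SelfMgr}(e)\big)

% The composition from 1 to 3 needs an existentially quantified function f:
\exists f\, \forall e \,\big( (\mathrm{Emp}(e) \rightarrow \mathrm{Mgr}(e, f(e)))
  \;\wedge\; (\mathrm{Emp}(e) \wedge e = f(e) \rightarrow \mathrm{SelfMgr}(e)) \big)
```

The existentially quantified function f – a Skolem function naming each employee's unknown manager – is exactly the kind of device the second-order TGDs described next are built from.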
32:22 - And so we invented something we called second-order TGDs, which are a special case of second-order logic. And we showed that this exactly does the job: if you compose schema mappings given by TGDs, you get a second-order TGD, and every second-order TGD is the result of composing first-order TGDs – normal TGDs. And we gave an algorithm for computing these compositions. So, measures of success. Well, first of all, our stuff is used in the various IBM products that I've listed here: Db2 Control Center, Rational Data Architect, and Control Manager. By "used," I mean the following. First of all, they use universal solutions. They decided that's a good answer.
33:17 - The result we're going to take will be a universal solution. People weren't doing that before. Even at Almaden, when I worked on this originally, they were producing kind of crazy-ass – crazy solutions, sorry – and they weren't universal. So they not only use our universal solutions, they use our algorithms: they use our algorithm for producing a universal solution, and they use our algorithm for composition. So our stuff was really heavily used. Now, our initial paper won the Test of Time Award for the International Conference on Database Theory.
33:59 - That's like the second-best database theory conference – that's the one we submitted that paper to. And now something funny happened. I got a letter one day from the editor-in-chief of the journal Theoretical Computer Science, which is where the journal version appeared. He said, "Congratulations, your paper was the second-most highly cited paper of the decade among papers that appeared in Theoretical Computer Science." So, great. But just out of curiosity I sent him an email saying, "Okay, what was first?" He said, "Well, it was a survey paper.
34:27 - We felt a little bit bad about giving it the number one spot, but we had to. It was in the journal, so what can we do." So that was the top-rated paper. Now, our paper on composition won the Test of Time Award for the International Conference on Database Theory ten years after it appeared. And then I wrote a follow-up paper with two other people, Marcelo Arenas and Alan Nash, that won the Best Paper Award at another International Conference on Database Theory. This was the follow-up paper on composition that showed some surprising things – like that a second-order TGD can be obtained by composing only two first-order TGDs; you don't need some large number of them. But, anyway, it won the Best Paper Award. Influence. Well, we managed to create a new subfield.
35:15 - You know, I told you that's a great measure of success for theoreticians. In fact, every major database conference suddenly had sessions on data exchange. And now, here's something that Jeff mentioned that made us very happy: we just won this award, the Alonzo Church Award, which is given for outstanding work in logic and computation. They can look at a paper, or a body of papers, over a 25-year period and see what's had the most influence. And we won. By the way, I was told by people who serve on these committees that normally they look at things near the borderline.
35:57 - Like, if it's a 25-year period, they pick a paper that's about to expire, because they want to, you know, give that paper a chance. Ours was way less than that – like half of that, roughly – and yet we still won, which is really great. So, before I go on to very recent work – I was told to put in this slide here on very recent stuff, so I did that – first I want to make one more comment, and it's the following. One important issue is to figure out what the real problem is. That's something both theoreticians and practitioners have to do.
36:33 - What is the real problem? So I want to read you a quote from my academic grandfather, Albert Einstein. Pretty impressive academic grandfather. And in the Q&A I'm hoping someone will ask me, "Ron, how was Einstein your academic grandfather?" And I will answer that question. But let me read you Einstein's quote that shows why figuring out what the problem is is so important. Einstein said, "If I had an hour to solve a problem and my life depended on the answer, I would spend the first 55 minutes figuring out the proper questions to ask, for if I knew the proper questions, I could solve the problem in less than five minutes." That's what Einstein said. What more do you want? This is a quote from the great man, Albert Einstein, himself. Okay. So now, recent work.
37:26 - So I was told to add one more slide about what I'm doing very recently, and I'm very happy to do that. Very recent work – very, very recent, like in the last few months. So I'm working with Alex Gray's department on neuro-symbolic computation. And one of the important issues that arises there is dealing with real-valued logics, because you have uncertainties. You have [indiscernible] neurons, and they have scores between 0 and 1 – it's like a measure of certainty, or a probability. And so we wanted a logic for dealing with these, for combining them: how much does this uncertainty affect that uncertainty? Now, it turns out that what people had done in the past was the following. For all these real-valued logics, they had a very weak form of axiomatization: they found a bunch of things that were tautologies in their logic.
38:21 - That is, things that were always true in their logic, no matter what truth values you assigned. And from that they could prove everything else that was a tautology. Not good enough for us. We wanted, really: how does this uncertainty imply that uncertainty? And I did that. I found the very, very first example. I gave a sound and complete axiomatization – sound and complete means that everything it proves is correct, and whenever something is an implication of other things, you can prove it in your logic. So I gave the very first one that shows, for example: this amount of uncertainty here, that amount of uncertainty there – how much uncertainty does it yield here?
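To make the real-valued connectives concrete, here is a tiny sketch of two standard choices – Zadeh's min/max connectives and Łukasiewicz's clamped sums. These are textbook definitions, not code from the paper; they also demonstrate the two failures pointed out in the Q&A below:

```python
# Two standard real-valued logics (textbook definitions, not the paper's code).

def z_and(a, b): return min(a, b)              # Zadeh / Gödel conjunction
def z_or(a, b):  return max(a, b)              # Zadeh / Gödel disjunction

def l_and(a, b): return max(0.0, a + b - 1.0)  # Łukasiewicz conjunction
def l_or(a, b):  return min(1.0, a + b)        # Łukasiewicz disjunction

def neg(a): return 1.0 - a                     # negation in both logics

a = 0.5
print(z_or(a, neg(a)))   # 0.5 -- under Zadeh, "A or not A" need not equal 1
print(l_and(a, a))       # 0.0 -- under Łukasiewicz, "A and A" differs from A
```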
39:04 - And, by the way, I ran this by some experts in the field – experts in real-valued logic and fuzzy logic – and asked, "Has anyone ever done this?" "No, no, no. It's the very first time anyone's ever done that." So I was very, very happy: the very first sound and complete axiomatization that deals with how this uncertainty implies that uncertainty. So that is my recent stuff. So that's it. And it will be in an arXiv paper – we haven't quite put the final version on arXiv, but if anyone wants a copy, I'm thrilled to send you one.
39:39 - And any questions, of course, I’d be happy to answer, you know, either now or, you know, later on. So that’s it. That’s my talk. Thank you very much. >> ALEX GRAY: Okay. Jeff, did you want to say some closing words or… >> JEFF WELSER: Let’s jump into the Q&A, and then we’ll go from there. >> ALEX GRAY: Okay. All right. Thanks for a great talk, Ron. >> RON FAGIN: Thank you, Alex. >> ALEX GRAY: So, looking at the questions, I received one offline as well. So about your recent work, you said that – >> RON FAGIN: Oh, good.
40:27 - >> ALEX GRAY: – about the class of logics that you proved this for. >> RON FAGIN: Okay, that's great. So it turned out I dealt with arbitrary real-valued logics. You know, there's something about real-valued logics: for example, in Zadeh's logic, also called Gödel logic, you take the "and" by taking a min, and you take the "or" by taking a max. But other real-valued logics have very different rules. They have more complicated formulas. The "and" is obtained – or the "or" is obtained – by adding the scores of the two disjuncts and then clamping it off, so it stays between 0 and 1.
41:03 - And there are many, many real-valued logics out there in the world that people use for various reasons. They have various properties – it turns out no one real-valued logic has all the great properties you want. Like, for example, tautologies of propositional logic are in general not going to remain tautologies in many of these real-valued logics. It turns out there's a famous theorem called the Bellman-Giertz Theorem, which says that if you want to preserve tautologies among formulas that have no negation in them, the only choice you can make is min and max: min for "and" and max for "or." And so any of these other real-valued logics will fail on one of those tautologies.
41:42 - For example, in one commonly used logic, "A and A" is not equivalent to A. That's pretty bad, but that's the way it is. You know, they always have some flaw. And the problem in the Zadeh logic is that "A or not A" does not have value 1, because you take the max: if A has score .5, then not A has score .5, and "A or not A" is the max of those, which is .5, not 1. So "A or not A" does not have the value 1. So, you know, there are many, many real-valued logics. Actually, one of the things I did in my sound and complete axiomatization is that I parameterized it by whatever the rules are of that logic, so it applies to every real-valued logic. So you plug in your real-valued logic.
42:28 - You plug in your rules for how you take the "and," how you take the "or," how you take the "implies," all that stuff, and it gives you the sound and complete axiomatization. So I'm very happy that it's a very, very general one that applies to any real-valued logic whatsoever. Thank you for the question. >> ALEX GRAY: Great. And I can personally attest, being very close to that work: I think this is the future big award winner. It's very, very fundamental. Another question. So, Ron – the question is: sometimes it is very hard to explain theoretical ideas to practitioners in business units. >> RON FAGIN: That's true. >> ALEX GRAY: How do you go about that, especially when, many times, if they don't get it or they feel it's too complex, they won't adopt it?
43:19 - >> RON FAGIN: That's a great question. In fact, let me just remark, there's one reason why practitioners don't like to hang around with theoreticians. They view the theoreticians as pointy-headed people who say these wild things that are very hard to understand. They want nothing to do with them. And theoreticians, by the way, want nothing to do with practitioners. They say, "Yuck. Ooh, they're doing this gory stuff. They're not proving theorems. Who cares?"
43:41 - So it's a real challenge to get theoreticians and practitioners to talk to each other. And the answer to how you talk to practitioners is: very patiently. You try to explain it in terminology they will understand – and presumably it's a problem of interest to them, so they understand all the fundamental concepts in their area – so you say, in a very careful way, exactly what you're doing, exactly how it relates to the notions they understand, and then you work with that. So, how do you talk to them? Clearly and patiently, and you explain everything very, very clearly. >> ALEX GRAY: I'm imagining, Ron, that was a big part of your success over the years: the ability to bridge those two kinds of conversations. >> RON FAGIN: Thank you, Alex. >> ALEX GRAY: Another question.
44:38 - So, Ron, you had an impressive number of citations to your papers. >> RON FAGIN: Thank you. >> ALEX GRAY: Aside from the simplicity and extent of impact of the algorithms you described, stepping back, is there anything you did to specifically drive the external visibility of your work? >> RON FAGIN: Wow – >> ALEX GRAY: That's a good question – you know, I think what we'd all like the answer to is how – >> RON FAGIN: That's a great question. >> ALEX GRAY: – how do we follow in your footsteps? >> RON FAGIN: Wow, so what was the secret to it? Well, one secret is: do good research, but also try to make sure your colleagues know about it. So, for example, I presented this stuff at major conferences, and that's an important thing – present at a major conference and people will hear about it. And it certainly doesn't hurt to win a Best Paper Award or a Test of Time Award. That certainly gets publicized. And then people get to know about it.
45:28 - So that's an external factor you have no control over, but it helps people to know about it. Another thing is: give lectures. So I give lectures – here I'm giving a lecture. So if you didn't know about my work before on data exchange or any of this stuff – the Garlic work, the Clio work – now you know. So give talks, talk to people. And, you know, in those coffee breaks – "What are you doing, Ron?" – you tell them about it and try to convince them it's interesting. So it's real salesmanship; it's not easy. And I told you how, for example, on that one paper my salesmanship was to call it a remarkably simple algorithm. You sell these things. So you have to sell it. Nothing is automatic.
46:13 - And there are some very, very good papers out there that are barely ever cited, barely ever noticed. It's a real challenge. There's no easy answer for how to get people to know about your work. So talk about it, be open about it, try to put it in good conferences. Winning a Best Paper or Test of Time Award helps, but it's not easy. That's a great question. >> ALEX GRAY: And what about the writing of the papers – this is from Michael – any tips there? >> RON FAGIN: Oh, yeah. My coauthors will all tell you I'm a fanatic about writing. I'm a fanatic about clarity. You know, they will write a sentence, and I will brutally change the wording. I'll say, "This could possibly be misinterpreted in the following way." So any of my coauthors can tell you that writing a paper with Ron Fagin is an interesting experience, because Ron will go over the version over and over – write it, rewrite it – until it's, in his view, brutally clear.
47:12 - So that is something I do when I'm writing a paper. And people tell me sometimes, "Oh, I can tell this is a Ron Fagin paper; it's so clear." So it's really a huge, huge goal of mine. It's not good enough to prove good results. It's not good enough to have nice results – you've got to make it clear. You've got to make it totally unambiguous, totally understandable. And I spend a huge amount of time making my work as clear as possible. >> ALEX GRAY: Great. I have a follow-up to that, but let me get to some other questions. There's another one. How often does it happen that you observe things from practice – for example, computational experiments – that inspire you to create and prove some theory? >> RON FAGIN: Well, unfortunately, it's not that often. I mean, in the two cases in today's talk, in one of them Laura Haas just came to me: "Ron, Mr. Database Theoretician, we've got a problem. Help, help."
48:16 - And in the other case I gave you today, you know, I just was attending Clio meetings. I thought it was interesting. I thought it might lead to something. And then we decided on our own to do it. So, you know – I don't remember exactly the statement of the question. What was the question again, Alex? I sort of drifted a little. >> ALEX GRAY: How often does it happen that you observe things from practice – >> RON FAGIN: Oh, unfortunately, not that often. Plus, I have to tell you, these things very often don't succeed. I mean, people will come to my office with a question, and I'll give an answer, and sometimes it's satisfactory to them and sometimes it's not. So you can't count on it. Today I gave you two success stories.
48:54 - There are a lot of failure stories out there, unfortunately, including ones involving me, where I just didn't come up with something the practitioner could use, or it just didn't pan out. So it's certainly not the case that, oh, great, you work on it, boom, it happens. It's a challenge. So it's unfortunately not a common event. If it were such a common event, I would not be going back and telling some of these stories from the past – although I picked these two in particular because they won major awards. Still, it's a challenge, and it, unfortunately, does not happen that often. But this is one reason I give a talk like this. My talk is to encourage theoreticians and practitioners to talk to each other – to convince the theoreticians: look, if you talk to a practitioner, you're getting a problem that's low-hanging fruit. No one's worked on it before. You'll write the first paper. You won't write the 20th paper in an area, subtly improving someone else's algorithm by subtly improving its running time. It's low-hanging fruit. So talk to those practitioners.
50:02 - You might just get a problem that can really lead somewhere. And convince practitioners, look, talk to those theoreticians. I know they’re a little hard to talk to, but there’s some theoretician down the hall from you who might be interested in talking to you about this. And it could really help you. So that’s one of the main reasons I give this talk. I really want to appeal to both theoreticians and practitioners. Talk to each other. You will – you can help each other.
50:28 - The theoretician will produce better theory, and the practitioners will make better products. So it's a challenge, a real challenge. >> ALEX GRAY: Great. Maybe I'll do my – well, here's another question, and then I have some follow-up. So you mentioned the paper that got first place and that it was a survey paper, but – >> RON FAGIN: Oh, yeah. I remember – >> ALEX GRAY: – do you remember what it was? >> RON FAGIN: I'm afraid I don't. I remember it was something goofy, though.
51:01 - The title was something – I remember I made fun of the paper a little bit. I should look it up. It was some goofy topic, not some clear-cut, easy thing. So I thought, man, oh man, I got beaten out by a paper on this goofy topic. But it was a survey paper. Survey papers tend to be cited a lot, because – even though it's goofy, a lot of people work on goofy stuff, so people cited that paper. Unfortunately, I don't remember. You inspired me, though: I will go look it up and see. But I can't remember what it was, unfortunately. So… >> ALEX GRAY: Okay.
51:37 - A question: What is your sense about quantum-inspired breakthroughs in this field of data exchange and information extraction? >> RON FAGIN: Quantum in data exchange? Wow, I don't know about that. I mean, I certainly follow quantum to some extent. I don't do quantum, but I know there are wonderful algorithms in quantum. I just don't know of any quantum work in data exchange. So if someone has it, please send an email and give me a link. I'd love to see that. I know nothing whatsoever about using quantum stuff to do data exchange.
52:06 - So if there's some work out there, I'd love to hear about it. I don't know anything at all about that. >> ALEX GRAY: Okay. That was a question from Edward Chen, so, Edward, maybe – >> RON FAGIN: Edward, send a link to my email. I read my email – I'm not so good on Slack – so send me a link to that paper by email. I'm really curious now. And if you've written the paper, Edward, I apologize for not knowing about it. >> ALEX GRAY: Did you ever successfully work on a practical problem where IBM wanted to keep the resulting work internally and you could not publish? >> RON FAGIN: Oh, wow. Actually, that never happened.
52:43 - So in all these cases – I mean, let me mention a case. In my introduction, Jeff mentioned differential backup. Differential backup arose because Norm Pass (ph) – Glen Day (ph) – put his arm around me in the hall and said, "Ron, we've got a problem with Tivoli Storage Manager, our backup system: if you make a tiny change to a file, then you have to back up the entire file. We want to make it simple." So I formed a little group – myself, Miki Ajtai (ph), and Larry Stockmeyer (ph) – and we did this thing, differential backup, which only backs up the change, and showed them how to do that. And the practitioners at first were a little bit wary of it. They thought, oh, it's going to break all our code. We don't want to use it.
53:30 - I wasn't sure whether they would even make this public. Then – well, we had somebody else we were working with who implemented it, and, by golly, it fit right into their system very easily. And they said, "Fine. Okay. We will implement it, and it's now a part of our system. We'll make it public. It's a feature." And, by the way, right after they announced differential backup, the sales of Tivoli Storage Manager tripled. People suddenly went, like, wow. It really was a differentiator among all the backup systems. We were the first to get it.
54:08 - I think other people have done it since, but we were the first to do it. And it wasn't clear to me whether that would ever see the light of day, but it did. So I've never had it be the case – among other things, I feel like if it's such a valuable contribution, they'll want to sell it. Like here, they sold it. They said, "Look, we've got this wonderful new feature, differential backup," and it tripled the sales within a very short period of time. The number of licenses tripled in a very short period of time.
54:35 - So it's to their advantage to make it public. So I've never had it happen that something was kept secret, but, you know, I suppose it could happen to me. It's never happened to me before – the work is so good that they want to sell it. >> ALEX GRAY: All right. Great. >> RON FAGIN: I'm pretty sure that they wanted to publicize it. >> ALEX GRAY: So we have just two more questions before we run out of time. So, Ron, for how long do you allow yourself to work on a given problem? Do you ever decide that something is not worth pursuing anymore? >> RON FAGIN: I have worked on the P versus NP problem for about 50 years.
55:09 - So when the challenges were formed – and the challenges were supposed to be very far-out, high-risk stuff – I thought that would be awfully nice. Now, we don't work on it full-time; that would not be a good use of my time. But I thought, okay, P versus NP is the challenge I'll introduce. And I had some people working with me on it, and we had fun phone calls every day. So I can't guarantee anything at all – we probably will not resolve P versus NP – but, you know, it's worth a shot.
55:40 - And, furthermore, we may get interesting intermediate results along the way. So I am patient: 50 years working on a problem. I did not give up. Even though it's been 50 years, I am still driven by it – I am actually, at this moment, in daily talks about that problem. Now, I think I know what your last question is going to be, Alex, but I may be wrong. If not, I'll ask it myself. >> ALEX GRAY: Yes. It's a big surprise, actually: How was Einstein your academic grandfather? >> RON FAGIN: Great. I was waiting for that question.
56:13 - So how was he my academic grandfather? Well, when I was an undergraduate at Dartmouth College, I was a research assistant to John Kemeny, who was then the head of the math department. And he later on became president of Dartmouth College. Kemeny, when he was at Princeton, was the research assistant to Albert Einstein. Now, it's a little bit of a stretch to use "research assistant" rather than, like, "PhD adviser," but, hey, for Einstein, I'll cheat. Now, one day, by the way – Kemeny had a bust of Einstein on his desk – I said to him, "Tell me more about this connection with Einstein."
56:47 - I said, "What did you do for Einstein?" And Kemeny's answer was – this will surprise you – "I did Einstein's math." I said, "What? You did Einstein's math? You mean Einstein couldn't do his own math? Why was that necessary?" And Kemeny said, "Einstein was a great physicist but just an okay mathematician." And since that time, I've read a number of biographies of Einstein, and, by golly, it's true. He always needed help with his math, including from his wife, who was a mathematician – or one of his wives. So Einstein always needed help with his math, and that's what Kemeny did.
57:20 - So I'm very proud to call Einstein my academic grandfather. >> JEFF WELSER: Hey, that's terrific. That is really terrific, Ron. And I really appreciate you spending the time going through your career and some of the interesting things you've done. And thank you very much, Alex, for hosting us in the Q&A. Thanks, everyone, for joining. And our next talk from Ron, about a year from now – he'll give us the answer to P equals NP, and that'll be a very exciting talk as well. >> RON FAGIN: Does not equal NP. >> JEFF WELSER: So thanks for that. >> RON FAGIN: It'll be "does not equal NP." >> JEFF WELSER: Ah, probably. Very good. Thanks, everyone. >> RON FAGIN: Thank you. Thanks, everybody. Really appreciate your joining in. And thanks to Alex and Jeff for helping coordinate.
58:01 - And Will Breed (ph), who coordinated the whole thing.