Microsoft Research India Podcast


A technology and research podcast from Microsoft Research India

  1. 05/20/2024

    Evaluating LLMs using novel approaches. With Dr. Sunayana Sitaram

[Music]

Sunayana Sitaram: Our ultimate goal is to build evaluation systems, and also other kinds of systems in general, where humans and LLMs can work together. We're really trying to get humans to do the evaluation, get LLMs to do the evaluation, use the human data in order to improve the LLM, and then this just continues in a cycle. And the ultimate goal is: send the things to the LLM that it's good at doing, and send the rest of the things that the LLM can't do to humans, who are the ultimate authority on the evaluation.

Sridhar Vedantham: Welcome to the Microsoft Research India podcast, where we explore cutting-edge research that’s impacting technology and society. I’m your host, Sridhar Vedantham.

[Music]

Sridhar Vedantham: LLMs are perhaps the hottest topic of discussion in the tech world today, and they're being deployed across domains, geographies, industries and applications. I have an extremely interesting conversation with Sunayana Sitaram, principal researcher at Microsoft Research, about LLMs: where they work really well, and also the challenges that arise when trying to build models with languages that may be under-resourced. We also talk about the critical work she and her team are doing in creating state-of-the-art methods to evaluate the performance of LLMs, including those LLMs that are based on Indic languages.

[Music]

Sridhar Vedantham: Sunayana, welcome to the podcast.

Sunayana Sitaram: Thank you.

Sridhar Vedantham: And I'm very excited to have you here, because we get to talk about a subject that seems to be top of mind for everybody right now, which is obviously LLMs. And what excites me even more is that I think we're going to be talking about LLMs in a way that's slightly different from what the common discourse is today, right?

Sunayana Sitaram: That's right.
Sridhar Vedantham: OK. So before we jump into it, why don't you give us a little bit of background about yourself and how you came to be at MSR?

Sunayana Sitaram: Sure. So it's been eight years now since I came to MSR. I came here as a postdoc after finishing my PhD at Carnegie Mellon. And so yeah, it's been around 15 years now for me in the field, and it's been super exciting, especially the last few years.

Sridhar Vedantham: So, I'm guessing that these eight years have been interesting, otherwise we wouldn't be having this conversation. What areas of research- I mean, have you changed course over the years, and how has that progressed?

Sunayana Sitaram: Yeah, actually, I've been working pretty much on the same thing for the last 15 years or so. So I'll describe how I got started. When I was an undergrad, I actually met the principal of a blind children's school who himself was visually impaired. And he was talking about some of the technologies that he uses in order to be independent. And one of those was using optical character recognition and text to speech in order to take documents or letters that people sent him and have them read out without having to depend on somebody. And he was in Ahmedabad, which is where I grew up, and his native language was Gujarati. And he was not able to do this for that language, whereas for English, the tools that he required to be independent were available. And so, he told me it would be really great if somebody could actually build this kind of system in Gujarati. And that was, you know, an aha moment for me. And I decided to take that up as my undergrad project. And ever since then, I've been trying to work on technologies trying to bridge that gap between English and other languages- under-resourced languages. And so, since then, I've worked on very related areas. So, my PhD thesis was on text to speech systems for low resource languages.
And after I came to MSR, I started working on what is called code switching, which is a very common thing that multilinguals all over the world do. So they use multiple languages in the same conversation, or sometimes even in the same sentence. And so, you know, this was a project called Project Melange that was started here, and that really pioneered the code switching work in the NLP research community. And after that it's been about LLMs and evaluation, but again from a multilingual, under-resourced languages standpoint.

Sridhar Vedantham: Right. So I have been here for quite a while at MSR myself, and one thing that I always heard is that there is, in general, a wide gulf in terms of the resources available for a certain set of languages to do, say, NLP-type work. And the other languages are just the tail- it's a long tail, but the tail just falls off dramatically. So, I wanted you to answer me in a couple of ways. One is, what is the impact that this generally has in the field of NLP itself and in the field of research into language technologies, and what's the resultant impact on LLMs?

Sunayana Sitaram: Yeah, that's a great question. So, you know, the paradigm has shifted a little bit after LLMs have come into existence. Before this- so this was around, say, a few years ago- the paradigm would be that you would need what is called unlabeled data. So, that is raw text that you can find on the web, say Wikipedia or something like that, as well as labeled data. So, this is something that a human being has actually sat and labeled for some characteristic of that text, right? So these are the two different kinds of text that you need if you want to build a text-based language model for a particular language. And so there were languages where, you know, you would find quite a lot of data on the web, because it was available in the form of documents or social media, etc. for certain languages.
But nobody had actually created the labeled resources for those languages, right? So that was the situation a few years ago. And the paradigm at that time was to use both these kinds of data in order to build these models, and our lab actually wrote quite a well-regarded paper called ‘The State and Fate of Linguistic Diversity and Inclusion’, where they grouped different languages into different classes based on how much data they had, labeled as well as unlabeled.

Sridhar Vedantham: Right.

Sunayana Sitaram: And it was very clear from that work that only around 7 or 8 languages of the world can actually be considered to be high resource languages which have this kind of data. And most of the languages of the world, spoken by millions and millions of speakers, don't have these resources. Now with LLMs, the paradigm changed slightly, so there was much less reliance on this labeled data and much more on the vast amount of unlabeled data that exists, say, on the web. And so, you know, we were wondering what would happen with the advent of LLMs now to all of the languages of the world- which ones would be well represented, which ones wouldn't, etc. And so that led us to do the work that we've been doing over the last couple of years. But the story is similar: even on the web some of these languages dominate, and so many of these models have quite a lot of data from only a small number of languages, while the other languages don't have much representation.

Sridhar Vedantham: OK. So, in real terms, in this world of LLMs that we live in today, what kind of impact are we looking at? I mean, when you're talking about inequities and LLMs in this particular field, what's the kind of impact that we're seeing across society?

Sunayana Sitaram: Sure. So when it comes to LLMs and language coverage, what we found from our research is that there are a few languages that LLMs perform really well on.
Those languages tend to be high resource languages for which there is a lot of data on the web, and they also tend to be languages that are written in the Latin script, because of the way LLMs are currently designed, with the tokenization. For the other languages, unfortunately, there is a large gap between the performance in English and the performance in those languages, and we also see that a lot of capabilities that we see in LLMs in English don't always hold in other languages. So a lot of capabilities, like really good reasoning skills, etc., may only be present in English and a few other languages. And this is also true when you go to smaller models: you see that their language capabilities fall off quite drastically compared to the really large models that we have, like the GPT-4 kind of models.

So when it comes to the real world impact of this: if you're trying to actually integrate one of these language models into an application and you're trying to use it in a particular language, chances are that you may not get as good performance in many languages compared to English. And this is especially true if you're already used to using these systems in English and you want to use them in a second language. You expect them to have certain capabilities which you've seen in English, and then when you use them in another language, you may not find the same capabilities. So in that sense, I think there's a lot of catching up to do for many languages.

And the other issue is that we don't even know how well these systems perform for most languages of the world, because we've only been able to evaluate them on around 50 to 60, or maybe 100, languages. So for the rest of the 6000-odd languages of the world, many of which don't even have a written form, and most of which are not there on the web, we don't even know whether these language models are able to do anything in them at all.
So I think that is another, you know, big problem that is there currently.

Sridhar Vedantham: So, if you want to change the situation where we say that you know even if you're a speaker of a language that might be small, maybe sa

    33 min
  2. 08/21/2023

HyWay: Enabling Mingling in the Hybrid World. With Dr. Venkat Padmanabhan and Ajay Manchepalli

Podcast- HyWay: Enabling Mingling in the Hybrid World

Ajay Manchepalli: One thing we have learned is that, you know, as they say, necessity is the mother of invention. This is a great example of that, because it's not that we didn't have remote people before. And it's not that we didn't have technology to support something like this. But we had this Black Swan moment with COVID, which required us to not be in the same physical location at all times, and that accelerated the adoption of digital technologies. You can build all the technology you want. But having it at the right time and right place matters the most.

[Music]

Sridhar Vedantham: Welcome to the Microsoft Research India podcast, where we explore cutting-edge research that’s impacting technology and society. I’m your host, Sridhar Vedantham.

[Music]

Sridhar Vedantham: The COVID pandemic forced most of us into a new paradigm of work from home, and a number of tools to cater to remote work became popular. However, the post-pandemic environment has seen interesting scenarios, with some people preferring to continue to work from home, some preferring to return full time to the office, and a number of people adopting something in between. This hybrid work environment exists today in the workplace as well as in other scenarios such as events. While tools such as Microsoft Teams do extremely well in supporting scheduled and agenda-driven work meetings, there is a need for a tool that supports a mix of virtual and in-person gatherings in an informal or semi-structured work environment, such as in hallways or at water coolers. In this edition of the podcast, I speak to Venkat Padmanabhan, Deputy Managing Director of MSR India, and Ajay Manchepalli, Principal Research Program Manager, about a project called HyWay. HyWay is a system to support unstructured and semi-structured informal hybrid interactions between groups of in-person and remote participants.
Venkat Padmanabhan is Deputy Managing Director at Microsoft Research India in Bengaluru. He was previously with Microsoft Research Redmond, USA for nearly 9 years. Venkat’s research interests are broadly in networked and mobile computing systems, and his work over the years has led to highly-cited papers and paper awards, technology transfers within Microsoft, and also industry impact. He has received several awards and recognitions, including the Shanti Swarup Bhatnagar Prize in 2016, four test-of-time paper awards from ACM SIGMOBILE, ACM SIGMM, and ACM SenSys, and several best paper awards. He was also among those recognized with the SIGCOMM Networking Systems Award 2020, for contributions to the ns family of network simulators. Venkat holds a B.Tech. from IIT Delhi (from where he received the Distinguished Alumnus award in 2018) and an M.S. and a Ph.D. from UC Berkeley, all in Computer Science, and has been elected a Fellow of the INAE, the IEEE, and the ACM. He is an adjunct professor at the Indian Institute of Science and was previously an affiliate faculty member at the University of Washington. He can be reached online at http://research.microsoft.com/~padmanab/.

Ajay Manchepalli, as a Research Program Manager, works with researchers across Microsoft Research India, bridging research innovations to real-world scenarios. He received his Master’s degree in Computer Science from Temple University, where he focused on database systems. After his Master's, Ajay spent the next 10 years shipping SQL Server products and managing their early-adopter customer programs.

For more information about the HyWay project, click HyWay - Microsoft Research. For more information about Microsoft Research India, click here.
Transcript

[Music]

Sridhar Vedantham: So, Venkat and Ajay, welcome to the podcast.

Venkat Padmanabhan: Good to be here again.

Ajay Manchepalli: Yeah, likewise.

Sridhar Vedantham: Yeah, both of you guys have been here before, right?

Venkat Padmanabhan: Yeah, it's my second time.

Sridhar Vedantham: OK.

Ajay Manchepalli: Same here.

Sridhar Vedantham: Great! So anyway, we wanted to talk today about this project called HyWay, which, unlike the way the name sounds, is not related to one of your earlier projects, which was called HAMS, which actually had to do with road safety. So, tell us a bit about what HyWay is all about, and especially where the name comes from.

Venkat Padmanabhan: Right. Yeah. So, HyWay- we spell it H Y W A Y- is short for hybrid hallway. It's really about hybrid interaction. What we mean by that is interaction between people who are physically present in a location- think of a conference venue or an office floor- and people who are remote. So that's where hybrid comes from, and it's really about sort of enabling informal, mingling-style, chitchat kind of interaction in such settings, which perhaps other platforms don't quite support.

Sridhar Vedantham: OK. And why come up with this project at all? I mean, there are plenty of other solutions and products and ways to talk to people that are already out there. So why do we really need something new?

Venkat Padmanabhan: Yeah, yeah. So maybe I can give you a little bit of background on this. In the very early days of the pandemic, in March or April of 2020, all of us were locked up in our respective homes. And obviously there were tools like Teams at Microsoft, and equivalent ones like Zoom and so on elsewhere, that allowed people to stay connected and participate in work meetings and so on.
But it was very clear very soon that what was missing was these informal interactions- bumping into someone in the hallway and just chatting with them. That kind of interaction was pretty much nonexistent because, if you think of something like a Teams call or a Zoom call, any of those, it's a very sanitized environment, right? If, let's say, the three of us are on a Teams call, no one else in the world knows we are meeting, and no one else in the world can overhear us or have an opportunity to join us unless they're explicitly invited. So, we said, OK, we want to sort of make these meetings porous, not have these hard boundaries. And that was the starting point. And then as the months went along, we realized that, hey, the world is not going to be just remote all the while. People are going to come back to the office and come back to having face-to-face meetings. And so how do you sort of marry the convenience of remote with the richer experience of being in person? That's where hybrid comes in. And that's something that, in our experience, existing tools, including the new tools that came up in the pandemic, don't support. There are tools that do all-virtual experiences. But there is nothing that we have seen that does hybrid the way we are trying to do in HyWay.

Sridhar Vedantham: Right. So, I wanted to go back to something you said earlier- when you use the term porous, what does that actually mean? Because, like you said, the paradigm in which we are used to generally conducting meetings is that it's a closed, sanitized environment. So, what exactly do we mean by porosity, and if you are in a meeting environment, why do you even want porosity?

Venkat Padmanabhan: OK. Maybe I can give an initial answer and then maybe Ajay can add. I think we're not saying every meeting is going to be porous, just to be clear, right.
You know, when you have a closed-door meeting and maybe you're talking about sensitive things, you don't want porosity, right? You want to maintain the privacy and the sanctity of that environment. But when you are trying to enable mingling in, say, a conference setting- where you're bumping into people, joining a conversation, and while you're having the conversation, you overhear some other conversation or you see someone else and you want to go there- there we think porosity and other elements of the design of HyWay, which we can get to in a moment, allow for awareness. Essentially, they allow you to be aware of what else is going on and give you that opportunity to potentially join other conversations. So that's where we think porosity is really important. It's not something that we are advocating for all meetings.

Ajay Manchepalli: One way to think about this is: if you are in a physical space and you want to have a meeting with somebody on a specific topic, you pick a conference room, you get together, and it's a closed-door conversation. However, when you're at a workplace, or any location for that matter, you tend to have informal conversations, right? Where you're just standing by the water cooler or standing in the hallway and you want to have discussions. And at that point in time, what you realize is that, even though you're having conversations with people, there are people nearby that you can see, and you can overhear their conversations. It's a very natural setting. However, if you're remote, you're missing out on those conversations- how do you bring them into play, right? Where it is not a predefined or planned conversation, and you just happen to see someone or happen to hear someone and join in. And what we talk about is the natural porous nature of air, and we are trying to simulate something similar in our system.

Sridhar Vedantham: OK.
So, it's kind of trying to mimic an actual real-life physical interaction kind of setting, where you can combine some degree of formality and informality.

Ajay Manchepalli: Correct! And many of these platforms, like Teams or Zoom and things like that, are built on this notion of virtual presence, so you could be anywhere, and you could join and have discussions. However, our concept is more aligned with, ho

    36 min
  3. 06/13/2022

HAMS: Using Smartphones to Make Roads Safer. With Dr. Venkat Padmanabhan and Dr. Akshay Nambi

Episode 013 | June 14, 2022

Road safety is a very serious public health issue across the world. Estimates put the traffic-related death toll at approximately 1.35 million fatalities every year, and the World Health Organization ranks road injuries among the top 10 leading causes of death globally. This raises the question- can we do anything to improve road safety? In this podcast, I speak to Venkat Padmanabhan, Deputy Managing Director of Microsoft Research India, and Akshay Nambi, Principal Researcher at MSR India. Venkat and Akshay talk about a research project called Harnessing Automobiles for Safety, or HAMS. The project seeks to use low-cost sensing devices to construct a virtual harness for vehicles that can help monitor the state of the driver and how the vehicle is being driven in the context of the road environment it is in. We talk about the motivation behind HAMS, its evolution, its deployment in the real world and the impact it is already having, as well as their future plans.

Akshay Nambi is a Principal Researcher at Microsoft Research India. His research interests lie at the intersection of systems and technology for emerging markets, broadly in the areas of AI, IoT, and edge computing. He is particularly interested in building affordable, reliable, and scalable IoT devices to address various societal challenges. His recent projects are focused on improving data quality in low-cost IoT sensors and enhancing the performance of DNNs on resource-constrained edge devices. Previously, he spent two years at Microsoft Research as a post-doctoral scholar, and he completed his PhD at the Delft University of Technology (TU Delft) in the Netherlands.

More information on the HAMS project is here: HAMS: Harnessing AutoMobiles for Safety - Microsoft Research. For more information about Microsoft Research India, click here.

Transcript

Venkat Padmanabhan: There are hundreds of thousands of deaths and many more injuries happening in the country every year because of road accidents. And of course it's a global problem, and the global problem is even bigger. The state of license testing is such that, by some estimates in public reports, over 50% of licenses are issued without a test or a proper test. So we believe a system like HAMS, that improves the integrity of the testing process, has huge potential to make a positive difference.

[Music]

Sridhar Vedantham: Welcome to the Microsoft Research India podcast, where we explore cutting-edge research that’s impacting technology and society. I’m your host, Sridhar Vedantham.
[Music]

Sridhar Vedantham: Venkat and Akshay, welcome to the podcast. I think this is going to be quite an interesting one.

Venkat Padmanabhan: Hello Sridhar, nice to be here.

Akshay Nambi: Yeah, hello Sridhar, nice to be here.

Sridhar Vedantham: And Akshay is of course officially a veteran of the podcast now, since it's your second time.

Akshay Nambi: Yes, but the first time in person, so looking forward to it.

Sridhar Vedantham: Yes, in fact I am looking forward to this too. It's great to do these things in person instead of sitting virtually and not being able to connect physically at all.

Akshay Nambi: Definitely.

Sridhar Vedantham: Cool. So we're going to be talking about a project that Venkat and you are working on, and this is something called HAMS. To start with, can you tell us what HAMS means or what it stands for, and give a very brief introduction to the project itself?

Venkat Padmanabhan: Sure, I can take a crack at it. HAMS stands for Harnessing Automobiles for Safety. In a nutshell, it's a system that uses a smartphone to monitor a driver and their driving, with a view to improving safety. So we look at things like the state of the driver- where they're looking, whether they're distracted, and so on. That's sort of looking at the driver. But we also look at the driving environment, because we think that to truly attack the problem of safety, you need both the internal context inside the vehicle and the external context. So that's a brief description of what HAMS tries to do.

Sridhar Vedantham: OK. So, you spoke about a couple of things here, right? One is the safety aspect of driving, both internal and external. When you're talking about this, can you be more precise? And especially, how did this kind of consideration feed into, say, the motivation or the inspiration behind HAMS?

Akshay Nambi: Yeah, so as you know, road safety is a major concern, not just in India but globally, right?
And when you look at the factors affecting road safety, there is the vehicle, there's the infrastructure, and there's the driver. And the majority of the incidents today center on the driver. For instance, the key factors affecting road safety include over-speeding, driving without seatbelts, drowsy driving, and drunken driving- all centering around the driver. And that motivated us to look at the driver more carefully, which is where we built the system HAMS, which focuses on monitoring the driver and also how he's driving.

Sridhar Vedantham: And India in particular has an extremely high rate of deaths per year, right, in terms of road accidents.

Akshay Nambi: Yes, it's at the top of the list. In fact, around 80,000 to 1.5 lakh people die every year, according to the stats from the government. Yeah, it's an alarming thing, and hopefully we are taking baby steps to improve that.

Venkat Padmanabhan: In fact, if I may add to that, if you look at the causes of death- not just road accidents, but diseases and so on- road accidents are in the top 10. And if you look at the younger population, people under 35 or 40, it's perhaps in the top two or three. So it is a public health issue as well.

Sridhar Vedantham: And that's scary. OK, so how does this project actually work? I mean, the technology and the research that you guys developed and the research that's gone into it- talk to us a little bit about that.

Venkat Padmanabhan: Sure, yeah. Let me actually wind back maybe 10-15 years to when we first started on this journey, and then talk more specifically about HAMS and what's happened more recently. Smartphones, as you know, have been around for maybe 15 years- a bit longer, maybe. And when smartphones started emerging in the mid-2000s and late 2000s, we got quite interested in the possibility of using a smartphone as a sensor for road monitoring, driving monitoring and so on.
And we built a system here at Microsoft Research India back in 2007-08 called Nericell, where we used a leading-edge smartphone of that era to do sensing. But it turned out that the hardware then was quite limited in its capabilities in terms of sensors- even an accelerometer was not there; we had to pair an external accelerometer, and so on. And so our ability to scale that system and really have interesting things come out of it was quite limited. Fast forward about 10 years: not only did smartphone hardware get much better, the AI and machine learning models that could process this information became much better, and among the new sensors in the newer smartphones are the cameras, the front camera and the back camera. And machine learning models for computer vision have made tremendous progress. So that combination allowed us to do far more interesting things than we were able to back then. Maybe Akshay can talk a bit more about the specific AI models and so on that we built.

Akshay Nambi: Yeah, so if you compare the systems of the past to HAMS, what was missing was the context. In the past, systems like what Venkat mentioned- Nericell- were collecting the sensor data, but lacking context. For example, such a system could tell whether the driver braked harshly, but it could not tell whether he did it because somebody jumped in front of the vehicle, or because he was distracted. The cameras that new smartphones have can provide this context, which makes these systems much more capable and able to provide valuable insights. And in terms of the specific technology itself, we go with commodity smartphones, which have multiple cameras today- the front camera looking at the driver, the back camera looking at the road- and we have built numerous AI models to track the driver state, which includes driver fatigue and driver gaze, that is, where the driver is actually looking. And with the back camera we also look at how the driver is driving with respect to the environment.
That is, is he over-speeding, is he driving on the wrong side of the road, and so on.

Sridhar Vedantham: So, this is all happening in real time.

Akshay Nambi: The system can support both rea

    27 min
  4. 05/30/2022

    A Random Walk From Complexity Theory to Machine Learning. With Dr. Neeraj Kayal and Dr. Ravishankar Krishnaswamy

Episode 012 | May 30, 2022

Neeraj Kayal: It’s just a matter of time before we figure out how computers can themselves learn like humans do. Human babies have an amazing ability to learn by observing things around them. And currently, despite all the progress, computers don't have that much ability. But I just think it's a matter of time before we figure that out- some sort of general artificial intelligence.

Sridhar Vedantham: Welcome to the MSR India podcast. In this podcast, Ravishankar Krishnaswamy, a researcher at the MSR India lab, speaks to Neeraj Kayal. Neeraj is also a researcher at MSR India and works on problems related to, or at the intersection of, computational complexity and algebra, number theory and geometry. He has received multiple recognitions through his career, including the Distinguished Alumnus award from IIT Kanpur, the Gödel Prize and the Fulkerson Prize. Neeraj received the Young Scientist Award from the Indian National Science Academy (INSA) in 2012 and the Infosys Prize in Mathematical Sciences in 2021. Ravi talks to Neeraj about how he became interested in this area of computer science and his journey till now.

For more information about Microsoft Research India, click here.

Transcript

Ravi Krishnaswamy: Hi Neeraj, how are you doing? It's great to see you after two years of working from home.

Neeraj Kayal: Hi Ravi, yeah, thank you. Thank you for having me here, and it's great to be back with all the colleagues in office.

Ravi Krishnaswamy: First of all, congratulations on the Infosys Prize- it's an amazing achievement. And it's a great privilege for all of us to have you as a colleague here. So, congratulations on that.

Neeraj Kayal: Thank you.

Ravi Krishnaswamy: Yeah, so maybe we can get started on the podcast.
So, you work in complexity theory, which is I guess one extreme of, I mean, it's the very theoretical end of the spectrum in computer science, almost bordering mathematics. So hopefully by the end of this podcast we can, uh, I mean, convince the audience that there's more to it than intellectual curiosity. Before that, right, let me ask you about how you got into theoretical computer science and the kind of problems that you work on. So, could you maybe tell us a bit about your background and how you got interested in this subject? Neeraj Kayal: Yeah, so in high school I was doing well in maths in general and I also wrote some computer programs to play some board games, like a generalized version of Tic Tac Toe where you have a bigger board, say 20 by 20, and you try to place five things in a row, column, or diagonal continuously, and then I started thinking about how could a computer learn to play or improve itself in such a game? So, I tried some things and didn't get very far with that, but at that time I was pretty convinced that one day computers will be able to really learn like humans do. I didn't see how that will happen, but I was sure of it and I just wanted to be in computer science to eventually work on such things. But in college, in the second year of my undergrad, I enrolled in a course in cryptography taught by Manindra Agrawal at IIT Kanpur, and the course started off with some initial things which are fairly predictable, something called symmetric key cryptosystems, where essentially it says that, let's say we two want to have a private conversation, but anyone else can listen to us. So how do we have a private conversation? Well, if we knew a language, a secret language which no one else did, then we could easily just converse in that language, and no one will understand us. And so, this is made a little more formal in this symmetric key cryptosystem. 
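The shared-secret-language idea can be sketched as a toy symmetric-key cipher. This is purely illustrative (a repeating-key XOR scrambler, nothing like a real cipher such as AES): both parties hold the same key, and the same operation both encrypts and decrypts.

```python
# Toy symmetric-key cipher: both parties share one secret key.
# XOR-ing a message with the key scrambles it; XOR-ing again recovers it.
# Illustrative only -- real symmetric ciphers (e.g. AES) are far more involved.

def xor_cipher(message: bytes, key: bytes) -> bytes:
    """Encrypt or decrypt by XOR-ing each byte with the repeating key."""
    return bytes(m ^ key[i % len(key)] for i, m in enumerate(message))

shared_key = b"secret"                            # known only to the two parties
ciphertext = xor_cipher(b"meet at noon", shared_key)
plaintext = xor_cipher(ciphertext, shared_key)    # the same operation decrypts
assert plaintext == b"meet at noon"
```

The catch, as the next part of the conversation explains, is that this only works if the two parties already share a secret key.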
And then, one day, Manindra ended one of the lectures with the following problem: but now suppose we did not know a secret language. Then we just know English, and everyone knows English, and then how do we talk privately when everyone can hear us? I thought about it for a few days. It seemed completely impossible. And then Manindra told us about these wonderful cryptosystems, called the Diffie-Hellman cryptosystem and the RSA cryptosystem, where they achieved this, and it was very surprising. And the key thing that these cryptosystems use is something that lies at the heart of computer science, a big mystery still, even to this day. There are these problems which we believe are hard for computers to solve in the following sense: even if we give a computer a fairly long, reasonable amount of time, it cannot solve them. But if we give it time like till the end of the universe, it can in principle solve such problems. So that got me interested in which problems are hard and can we prove they are actually hard or not? And to this day, we don't know that. Ravi Krishnaswamy: So, I'm guessing that you're talking about the factoring problem, right? Neeraj Kayal: Yes, factoring is one of the big ones here. And the RSA cryptosystem uses factoring. Ravi Krishnaswamy: So, it's actually very interesting, right? You started out by trying to show that some of these problems are very, very hard, but I think, looking back, your first research paper, which happens to be a breakthrough work in itself, is in showing that a certain problem is actually easier to solve than we had originally thought, right? So, it is this seminal work on showing that primality testing can be solved in deterministic polynomial time. I mean, that's an amazing feat and you had worked on this paper with your collaborators as an undergrad, right? Neeraj Kayal: Yes. 
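The Diffie-Hellman exchange Neeraj describes can be sketched in a few lines. The parameters below are illustrative only; the scheme's security rests on the assumed hardness of the discrete-logarithm problem, exactly the kind of "hard for computers" problem discussed above.

```python
# Toy Diffie-Hellman key exchange: two parties agree on a shared secret
# over a completely public channel. Numbers chosen here are illustrative;
# real deployments use carefully vetted parameters.

p = 2**127 - 1          # public prime modulus (a Mersenne prime)
g = 3                   # public generator

alice_secret = 123456789    # private to Alice
bob_secret = 987654321      # private to Bob

# Each side publishes g^secret mod p -- an eavesdropper sees only these,
# and recovering the secret exponent is the discrete-logarithm problem.
alice_public = pow(g, alice_secret, p)
bob_public = pow(g, bob_secret, p)

# Each side combines its own secret with the other's public value.
alice_shared = pow(bob_public, alice_secret, p)   # (g^b)^a mod p
bob_shared = pow(alice_public, bob_secret, p)     # (g^a)^b mod p

assert alice_shared == bob_shared   # both arrive at the same shared key
```

Both sides compute g^(ab) mod p without ever transmitting their secrets, which is exactly the "talking privately while everyone listens" trick.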
Ravi Krishnaswamy: Yeah, that's an incredible achievement. So maybe to motivate others who are in undergrad and who have an interest and inclination in such topics, could you maybe share with us the story of how you got working on that problem and what sort of led you to this spark that eventually got you to this breakthrough result? Neeraj Kayal: So, my advisor Manindra, who also was the professor who taught us cryptography - he had been working on this problem for a long time and there were already algorithms that existed which are very good in practice- very, very fast in practice- but they had this small chance that they might give the wrong answer. The chance was so small that practically it did not matter, but still, as a mathematical challenge, it remained whether we could remove that small chance of error, and that's what the problem was about. So, Manindra had this approach and he had worked with other students also- some of our seniors- on it, and in that course, he came up with a conjecture. And then when my colleague Nitin and I joined this project, we came across this conjecture and my first reaction was that the conjecture is false. So, I tried to write a program which would find a counterexample and I thought we would be done in a few days- just find that counterexample and the project would be over. So, I wrote a program- it ran for some time, didn't find a counterexample, so I decided to parallelize it. A huge number of machines in the computer center in IIT Kanpur started looking for that counterexample. And then to my surprise, we still couldn't find the counterexample. So there seemed to be something to it. Something seemed to be happening there which we didn't understand, and in trying to sort of prove that conjecture, we managed to prove some sort of weaker statement which sufficed for obtaining the polynomial time algorithm to test if a number is prime or not. But it was not the original conjecture itself. 
Many days after this result came out, we met a mathematician called Hendrik Lenstra who had worked on primality testing, and we told him about this conjecture. And after a few days he got back to us and showed that if you assume a certain number-theoretic conjecture, which we really, really believe is true, then our original conjecture is actually false. Ravi Krishnaswamy: Ok, I see. So, the original conjecture, which you had hoped to prove, is false, but the weaker statement was actually true- you proved it to be true, and that was enough for your eventual application. Neeraj Kayal: Yes, so in some sense we are very lucky that in trying to prove something false we managed to prove something useful. Ravi Krishnaswamy: Yeah, I mean it's a fascinating story, right? All the experiments that you ran pointed you towards proving it, and then you actually went and proved it. I imagine what would have happened if you had found a counterexample at that time, right? Neeraj Kayal: So yeah, Hendrik's proof was very interesting. He showed that modulo this number-theoretic conjecture, a counterexample existed. But it would have to be very, very large and that's why we couldn't find it. So, he explained it beautifully. Ravi Krishnaswamy: Yeah, thanks for that story Neeraj. So, I guess from then on you've been working in complexity theory, right? Neeraj Kayal: That's right, yeah. Ravi Krishnaswamy: So, for me at least, the Holy Grail in complexity theory that I've often encountered or seen is the P versus NP problem, which many of us might know. But you've been working on an equally important, very close cousin of that problem, which is called the VP versus VNP problem, right? So, I'm going to take a stab at explaining what I understand of the problem. So, correct me whenever I'm wrong. So, you are interested in trying to understand the complexity of expressing polynomials using small circuits. 
So, for example, if you have a polynomial of the form X^2 + Y^2 + 2XY, you could represent it as a small circuit which has a few addition operations and a few multiplication operations- you could express it as X^2 + Y^2 + 2XY itself, or you could express it as (X + Y)^2, which may have a smaller
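The two representations compute the same polynomial with different numbers of arithmetic gates- a rough sketch of the circuit-size idea (real arithmetic-circuit size is defined more carefully than this operation count):

```python
# Two circuits for the same polynomial X^2 + Y^2 + 2XY.
# Expanded form: three products (X*X, Y*Y, X*Y), one scaling by 2,
# and two additions. Factored form: one addition and one product.

def expanded(x, y):
    return x * x + y * y + 2 * x * y   # more gates

def factored(x, y):
    s = x + y
    return s * s                       # fewer gates, same polynomial

# Both circuits agree on every input, so they represent the same polynomial.
for x in range(-5, 6):
    for y in range(-5, 6):
        assert expanded(x, y) == factored(x, y)
```

VP versus VNP asks, roughly, which polynomial families admit small circuits like the factored form and which provably do not.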

    22 min
  5. 01/17/2022

    Collaborating to Develop a Low-cost Keratoconus Diagnostic Solution. With Dr. Kaushik Murali and Dr. Mohit Jain

    Episode 011 | January 18, 2022 Keratoconus is a severe eye disease that affects the cornea, causing it to become weak and develop a conical bulge. Keratoconus, if undiagnosed and untreated, can lead to partial or complete blindness in people affected by it. However, the equipment needed to diagnose keratoconus is expensive and non-portable, which makes early detection of keratoconus inaccessible to large populations in low- and middle-income countries. This makes it a leading cause of partial or complete blindness amongst such populations. Doctors from Sankara Eye Hospital, Bengaluru and researchers from Microsoft Research India have been working together to develop SmartKC, a low-cost and portable diagnostic system that can enable early detection and mitigation of keratoconus. Join us as we speak to Dr. Kaushik Murali from Sankara Eye Hospital and Dr. Mohit Jain from Microsoft Research India. Dr. Kaushik Murali is President, Medical Administration, Quality & Education, Sankara Eye Foundation India (Sri Kanchi Kamakoti Medical Trust), which is among the largest structured community eye hospital networks in India (www.sankaraeye.com), with an objective of providing world-class eye care with a social impact.  A paediatric ophthalmologist, Dr. Kaushik has completed a General Management Programme and is an alumnus of INSEAD. He has done a course on Strategic Management of Non-Profits at the Harvard Business School. He has been certified in infection control, risk management for health care and digital disruption. He is a member of Scalabl, a global community promoting entrepreneurship.   Dr. Kaushik is a member of the Scientific Committee of Vision 2020: The Right to Sight India. He is currently involved in collaborative research projects, among others, with the University of Bonn and Microsoft. Dr. 
Kaushik has received many recognitions, key among them being the Bernadotte Foundation for Children's Eyecare Travel Grant, the Mother Teresa Social Leadership Scholarship, International Eye Health Hero, the All India Ophthalmological Society Best Research award, the International Association for the Prevention of Blindness (IAPB) Eye Health Hero, and the Indian Journal of Ophthalmology Certificate of Merit.  Beyond the medical world, he is part of the National Management Team of Young Indians – Confederation of Indian Industry (CII). He represented India at the G20 Young Entrepreneur Alliance 2018 in Argentina and led the Indian delegation for the inaugural India-Israel Young Leaders Forum in 2019. More recently, he led the first citizens’ cohort for a workshop on Strategic Leadership at LBSNAA (Lal Bahadur Shastri National Academy of Administration).  Mohit Jain is a Senior Researcher in the Technology and Empowerment (TEM) group at Microsoft Research India. His research interests lie at the intersection of Human-Computer Interaction and Artificial Intelligence. Currently, he focuses on developing end-to-end systems providing low-cost smartphone-based patient diagnostic solutions for critical diseases. Over the past decade, he has worked on technological solutions for developing regions, focusing on health, accessibility, education, sustainability, and agriculture. He received his Ph.D. in Computer Science & Engineering from the University of Washington, focusing on extending the interactivity, accessibility and security of conversational systems. While pursuing his Ph.D., he also worked as a Senior Research Engineer in the Cognitive IoT team at IBM Research India. Prior to that, he graduated with a Master's in Computer Science from the University of Toronto, and a Bachelor's in Information and Communication Technology from DA-IICT. For more information about the SmartKC project, and for project-related code, click here. For more information about Microsoft Research India, click here. 
Related Microsoft Research India Podcast: More podcasts from MSR India | iTunes: Subscribe and listen to new podcasts on iTunes | Android | RSS Feed | Spotify | Google Podcasts | Email Transcript    Dr. Kaushik Murali: Sitting in an eye hospital, often we have ideas, but we have no clue whom to ask. But honestly, now we know that there is a team at MSR that we can reach out to saying that hey, here is a problem, we think this warrants attention. Do you think you guys can solve it? And we found that works really well. So, this kind of a collaboration is, I think, a phenomenal impact that this project has brought together, and we hope that together we will be able to come up with a few more solutions that can align with our founders’ dream of eliminating needless blindness from India.  [Music] Sridhar Vedantham: Welcome to the Microsoft Research India podcast, where we explore cutting-edge research that’s impacting technology and society. I’m your host, Sridhar Vedantham. [Music] Sridhar Vedantham: Keratoconus is a severe eye disease that affects the cornea, causing it to become weak and develop a conical bulge. Keratoconus, if undiagnosed and untreated, can lead to partial or complete blindness in people affected by it. However, the equipment needed to diagnose keratoconus is expensive and non-portable, which makes early detection of keratoconus inaccessible to large populations in low- and middle-income countries. This makes it a leading cause of partial or complete blindness amongst such populations. Doctors from Sankara Eye Hospital, Bengaluru and researchers from Microsoft Research India have been working together to develop SmartKC, a low-cost and portable diagnostic system that can enable early detection and mitigation of keratoconus. Join us as we speak to Dr. Kaushik Murali from Sankara Eye Hospital and Dr. Mohit Jain from Microsoft Research India. [Music]   Sridhar Vedantham: So, Dr. Kaushik and Mohit, welcome to the podcast.    Mohit Jain: Hi, Sridhar.  Dr. 
Kaushik Murali: Hi Sridhar, pleasure to be here.    Sridhar Vedantham: It's our pleasure to host you, Dr. Kaushik, and for me this is going to be a really interesting podcast for a couple of reasons. One is that the topic itself is kind of so far afield from what I normally hear at Microsoft Research, and the second is I think you're the first guest we are having on the podcast who's actually not part of MSR, so basically a collaborator. So, this is really exciting for me. So let me jump right into this. We're going to be talking about something called keratoconus, so could you educate us a little bit as to what keratoconus actually is and what its impact is?    Dr. Kaushik Murali: So, imagine that you were a 14-year-old who was essentially nearsighted. You wore glasses and you were able to see. But with passing time, your vision became more distorted rather than being blurred, which is what you would have expected if just your minus power kept increasing, especially for distance. And to add to your misery, you started seeing more glare and more halos at nighttime. Words that you started to read had shadows around them or even started to look doubled. This essentially is the world of a person with keratoconus. Literally, it means cone shaped. Keratoconus is a condition of the cornea, which is the transparent front part of the eye, similar to your watch glass, where instead of normally retaining its dome shape, it is characterized by progressive thinning and weakening of the central part, what we call the stroma, and this makes the cornea take on a conical shape. In a few, this can actually even progress beyond what I described, where the central cornea over time becomes scarred and the person can no longer be corrected with just optical devices like glasses or a contact lens, but may actually end up requiring a corneal transplant.     Sridhar Vedantham: I see, and what are the causes of this?    Dr. 
Kaushik Murali: So there have been very many causes that have been attributed, so it's thought to be multifactorial. So, this again makes it a little tricky in terms of us not being able to prevent the condition, so to speak. But multiple risk factors are known: ultraviolet exposure, chronic allergies; a habitual eye rubber is thought to be more prone to this. Essentially, you end up seeing it more during the pubertal age group, and more in men.    Sridhar Vedantham: I see. And how widespread is this problem, really? Because frankly, I'm of course as lay a person as you can get, and I hadn't really heard of an eye disease called keratoconus until I spoke to Mohit at some point, and then of course after reading papers and so on. But what is the extent of the issue and is it really that widespread a problem?  Dr. Kaushik Murali: So, unlike most other conditions, there is no real population-based survey where we have screened every household to arrive at numbers. But largely, we base our estimation on small surveys that have been done across different parts of the world. Based on this, we estimate that it is affecting approximately one in 2000 individuals. So, in the US, for example, it is thought that about 55 people in about 100,000 have been diagnosed with keratoconus. But in countries like India, it is thought to be more widespread. So there was actually a survey in central India where they found almost 2300 people out of 100,000 people being affected with keratoconus. So, the numbers are quite large. And again, all of this could be underestimated simply because we don't have enough ability to screen. And what makes this number even scarier is this is a disease that typically affects people between the age group of 10 and 25. So, once they're affected and their vision progressively comes down, they're going to spend most of their productive years not being able to see clearly.    Sridhar Vedantham: OK, that is kind of scary.   
 Mohit Jain: I would just like to add to that is that there is actually a combination of demographics, genetic and weather condition which makes India a really good host for this disease.

    28 min
  6. 09/29/2021

    Accelerating AI Innovation by Optimizing Infrastructure. With Dr. Muthian Sivathanu

    Episode 010 | September 28, 2021 Artificial intelligence, Machine Learning, Deep Learning, and Deep Neural Networks are today critical to the success of many industries. But they are also extremely compute intensive and expensive to run in terms of both time and cost, and resource constraints can even slow down the pace of innovation. Join us as we speak to Muthian Sivathanu, Partner Research Manager at Microsoft Research India, about the work he and his colleagues are doing to enable optimal utilization of existing infrastructure to significantly reduce the cost of AI. Muthian's interests lie broadly in the space of large-scale distributed systems, storage, and systems for deep learning, blockchains, and information retrieval. Prior to joining Microsoft Research, he worked at Google for about 10 years, with a large part of the work focused on building key infrastructure powering Google web search — in particular, the query engine for web search. Muthian obtained his Ph.D. from the University of Wisconsin-Madison in 2005 in the area of file and storage systems, and a B.E. from CEG, Anna University, in 2000. For more information about Microsoft Research India, click here. Related Microsoft Research India Podcast: More podcasts from MSR India | iTunes: Subscribe and listen to new podcasts on iTunes | Android | RSS Feed | Spotify | Google Podcasts | Email  Transcript Muthian Sivathanu: Continued innovation in systems and efficiency and costs are going to be crucial to drive the next generation of AI advances, right. And the last 10 years have been huge for deep learning and AI, and the primary reason for that has been the significant advance in both hardware, in terms of the emergence of GPUs and so on, as well as software infrastructure to actually parallelize jobs, run large distributed jobs efficiently and so on. And if you think about the theory of deep learning, people knew about backpropagation and neural networks 25 years ago. And we largely use very similar techniques today. 
But why have they really taken off in the last 10 years? The main catalyst has been sort of advancement in systems. And if you look at the trajectory of current deep learning models, the rate at which they are growing larger and larger, systems innovation will continue to be the bottleneck in sort of determining the next generation of advancement in AI. [Music] Sridhar Vedantham: Welcome to the Microsoft Research India podcast, where we explore cutting-edge research that’s impacting technology and society. I’m your host, Sridhar Vedantham. [Music] Sridhar Vedantham: Artificial intelligence, Machine Learning, Deep Learning, and Deep Neural Networks are today critical to the success of many industries. But they are also extremely compute intensive and expensive to run in terms of both time and cost, and resource constraints can even slow down the pace of innovation. Join us as we speak to Muthian Sivathanu, Partner Research Manager at Microsoft Research India, about the work he and his colleagues are doing to enable optimal utilization of existing infrastructure to significantly reduce the cost of AI. [Music] Sridhar Vedantham: So Muthian, welcome to the podcast and thanks for making the time for this. Muthian Sivathanu: Thanks Sridhar, pleasure to be here. Sridhar Vedantham: And what I'm really looking forward to, given that we seem to be in some kind of final stages of the pandemic, is to actually be able to meet you face to face again after a long time. Unfortunately, we've had to again do a remote podcast which isn't all that much fun. Muthian Sivathanu: Right, right. Yeah, I'm looking forward to the time when we can actually do this again in office. Sridhar Vedantham: Yeah. Ok, so let me jump right into this. You know we keep hearing about things like AI and deep learning and deep neural networks and so on and so forth. 
What's very interesting in all of this is that we kind of tend to hear about the end product of all this, which is kind of, you know, what actually impacts businesses, what impacts consumers, what impacts the health care industry, for example, right, in terms of AI. It's a little bit of a mystery, I think to a lot of people as to how all this works, because... what goes on behind the scenes to actually make AI work is generally not talked about. Muthian Sivathanu: Yeah. Sridhar Vedantham: So, before we get into the meat of the podcast you just want to speak a little bit about what goes on in the background. Muthian Sivathanu: Sure. So, machine learning, Sridhar, as you know, and deep learning in particular, is essentially about learning patterns from data, right, and deep learning system is fed a lot of training examples, examples of input and output, and then it automatically learns a model that fits that data, right. And this is typically called the training phase. So, training phase is where it takes data builds a model how to fit. Now what is interesting is, once this model is built, which was really meant to fit the training data, the model is really good at answering queries on data that it had never seen before, and this is where it becomes useful. These models are built in various domains. It could be for recognizing an image for converting speech to text, and so on, right. And what has in particular happened over the last 10 or so years is that there has been significant advancement both on the theory side of machine learning, which is, new algorithms, new model structures that do a better job at fitting the input data to a generalizable model as well as rapid innovation in systems infrastructure which actually enable the model to sort of do its work, which is very compute intensive, in a way that's actually scalable that's actually feasible economically, cost effective and so on. 
Sridhar Vedantham: OK, Muthian, so it sounds like there's a lot of compute actually required to make things like AI and ML happen. Can you give me a sense of what kind of resources or how intensive the resource requirement is? Muthian Sivathanu: Yeah. So the resource usage in a machine learning model is a direct function of how many parameters it has, so the more complex the data set, the larger the model gets, and correspondingly it requires more compute resources, right. To give you an idea, the early machine learning models which perform simple tasks like recognizing digits and so on, they could run on a single server machine in a few hours, but just over the last two years, for example, the size of the largest model that achieves state-of-the-art accuracy has grown by nearly three orders of magnitude, right. And what that means is today to train these models you need thousands and thousands of servers and that's infeasible. Also, accelerators, or GPUs, have really taken over in the last 6-7 years. A single V100 GPU today- a Volta GPU from NVIDIA- can run about 140 trillion operations per second. And you need several hundreds of them to actually train a model like this. And they run for months together. To train a 175-billion-parameter model- the recent GPT-3- you need on the order of thousands of such GPUs and it still takes a month. Sridhar Vedantham: A month, that sounds like a humongous amount of time.  Muthian Sivathanu: Exactly, right? 
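The scale Muthian describes can be sanity-checked with a back-of-envelope calculation, using the common approximation of roughly 6 floating-point operations per parameter per training token. Every figure below is an illustrative assumption (token count, utilization, GPU count), not a number from the episode:

```python
# Back-of-envelope training-time estimate for a very large model, using
# the common ~6 FLOPs per parameter per training token approximation.
# All figures below are illustrative assumptions, not episode data.

params = 175e9          # 175-billion-parameter model
tokens = 300e9          # assumed number of training tokens
flops_needed = 6 * params * tokens

gpu_peak = 140e12       # ~140 trillion ops/sec peak per GPU
utilization = 0.3       # assumed sustained fraction of peak throughput
num_gpus = 3000         # assumed cluster size ("thousands of GPUs")

seconds = flops_needed / (gpu_peak * utilization * num_gpus)
days = seconds / 86400
print(f"~{days:.0f} days on {num_gpus} GPUs")   # roughly a month
```

Under these assumptions the estimate lands at about a month of continuous training on thousands of GPUs, which is consistent with the scale described in the conversation.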
So that's why I think, just as I told you how the advance in the theory of machine learning, in terms of new algorithms, new model structures, and so on, has been crucial to the recent advance in the relevance and practical utility of deep learning, equally important has been this advancement in systems, right, because given this huge explosion of compute demands that these workloads place, we need fundamental innovation in systems to actually keep pace, to actually make sure that you can train them in reasonable time, you can actually do that with reasonable cost. Sridhar Vedantham: Right. Ok, so you know for a long time, I was generally under the impression that if you wanted to run bigger and bigger models and bigger jobs, essentially you had to throw more hardware at it because at one point hardware was cheap. But I guess that kind of applies only to the CPU kind of scenario, whereas the GPU scenario tends to become really expensive, right? Muthian Sivathanu: Yep, yeah. Sridhar Vedantham: Ok, so in which case, when there is basically some kind of a limit being imposed because of the cost of GPUs, how does one actually go about tackling this problem of scale? Muthian Sivathanu: Yeah, so the high-level problem ends up being, you have limited resources, so let's say you can view this in two perspectives, right. One is from the perspective of a machine learning developer or a machine learning researcher, who wants to build a model to accomplish a particular task, right. So, from the perspective of the user, there are two things you need. A, you want to iterate really fast, right, because deep learning, incidentally, is this special category of machine learning, where the exploration is largely by trial and error. 
So, if you want to know which model actually works which parameters, or which hyperparameter set actually gives you the best accuracy, the only way to really know for sure is to train the model to completion, measure accuracy, and then you would know which model is better, right. So, as you can see, the iteration time, the time to train a model to run inference on it directly impacts the rate of progress you can achieve. The second aspect that the machine learning researcher cares about is cost. You want to do it without spending a lot of dollar cost. Sridhar Vedantham: Right. Muthian Sivathanu: Now from the perspective of let's say a cloud provider who runs this, huge farm of GPUs and then offers this as a service for researchers, for users to run machine learning models, their objective function is cost, right. So, to support a given workload you need to support it with as minimal GPUs as possible. Or in other words, if you have a certain amo

    28 min
  7. 06/14/2021

    Dependable IoT: Making data from IoT devices dependable and trustworthy for good decision making. With Dr. Akshay Nambi and Ajay Manchepalli

    Episode 009 | June 15, 2021 The Internet of Things has been around for a few years now and many businesses and organizations depend on data from these systems to make critical decisions. At the same time, it is also well recognized that this data- even up to 40% of it- can be spurious, and this obviously can have a tremendously negative impact on an organization's decision making. But is there a way to evaluate if the sensors in a network are actually working properly and that the data generated by them are above a defined quality threshold? Join us as we speak to Dr. Akshay Nambi and Ajay Manchepalli, both from Microsoft Research India, about their innovative work on making sure that IoT data is dependable and verified, truly enabling organizations to make the right decisions. Akshay Nambi is a Senior Researcher at Microsoft Research India. His research interests lie at the intersection of Systems and Technology for Emerging Markets, broadly in the areas of AI, IoT, and Edge Computing. He is particularly interested in building affordable, reliable, and scalable IoT devices to address various societal challenges. His recent projects are focused on improving data quality in low-cost IoT sensors and enhancing performance of DNNs on resource-constrained edge devices. Previously, he spent two years at Microsoft Research as a post-doctoral scholar, and he completed his PhD at the Delft University of Technology (TU Delft) in the Netherlands.  Ajay Manchepalli, as a Research Program Manager, works with researchers across Microsoft Research India, bridging research innovations to real-world scenarios. He received his Master’s degree in Computer Science from Temple University, where he focused on Database Systems. After his Master’s, Ajay spent the next 10 years shipping SQL Server products and managing their early adopter customer programs. For more information about Microsoft Research India, click here. 
Related Microsoft Research India Podcast: More podcasts from MSR India | iTunes: Subscribe and listen to new podcasts on iTunes | Android | RSS Feed | Spotify | Google Podcasts | Email Transcript Ajay Manchepalli: The interesting thing that we observed in all these scenarios is how the entire industry is trusting data, and using this data to make business decisions, and they don't have a reliable way to say whether the data is valid or not. That was mind-boggling. We're calling data the new oil, we are deploying these things, and we're collecting the data and making business decisions, and you're not even sure if that data that you've made your decision on is valid. To us it came as a surprise that there wasn't enough already done to solve these challenges, and that in some sense was the inspiration to go figure out what it is that we can do to empower these people, because at the end of the day, your decision is only as good as the data. [Music] Sridhar Vedantham: Welcome to the Microsoft Research India podcast, where we explore cutting-edge research that’s impacting technology and society. I’m your host, Sridhar Vedantham. [Music] The Internet of Things has been around for a few years now and many businesses and organizations depend on data from these systems to make critical decisions. At the same time, it is also well recognized that this data- even up to 40% of it- can be spurious, and this obviously can have a tremendously negative impact on an organization's decision making. But is there a way to evaluate if the sensors in a network are actually working properly and that the data generated by them are above a defined quality threshold? Join us as we speak to Dr. Akshay Nambi and Ajay Manchepalli, both from Microsoft Research India, about their innovative work on making sure that IoT data is dependable and verified, truly enabling organizations to make the right decisions. [Music] Sridhar Vedantham: So, Akshay and Ajay, welcome to the podcast. 
It's great to have you guys here.

Akshay Nambi: Good evening Sridhar. Thank you for having me here.

Ajay Manchepalli: Oh, I'm excited as well.

Sridhar Vedantham: Cool, and I'm really keen to get this underway because this is a topic that's quite interesting to everybody, you know. When we talk about things like IoT in particular, this is a term that's been around for many years now, and we've heard a lot about the benefits that IoT can bring to us as a society, or as a community, or as people at an individual level. Now you guys have been talking about something called Dependable IoT. So, what exactly is Dependable IoT and what does it bring to the IoT space?

Ajay Manchepalli: Yeah, IoT is one area we have seen that is growing exponentially. I mean, if you look at the number of devices being deployed, it's going into the billions, and most industries are now relying on this data to make their business decisions. And so, when they go about doing this, we have seen from our own experience that there are a lot of challenges that come into play when you're dealing with IoT devices. These are deployed in far-off, remote locations and in harsh weather conditions, and all of these things can lead to reliability issues with these devices. In fact, the CTO of GE Digital mentioned that about 40% of all the data they see from these IoT devices is spurious, and KPMG had a report saying that over 80% of CEOs are concerned about the quality of data that they're basing their decisions on. We observed that in our own deployments early on, and that's when we realized that there is a fundamental requirement to ensure that the data being collected is actually good data, because all these decisions are based on that data.
And since data is the new oil, we are basically focusing on what it is that we can do to help these businesses know whether the data they're consuming is valid or not, and that starts at the source of truth, which is the sensors and the sensor devices. And so Akshay has built this technology that enables you to understand whether the sensors are working fine or not.

Sridhar Vedantham: So, 40% of data coming from sensors being spurious sounds a little frightening, especially when we are saying that businesses and other organizations base a whole lot of their decisions on the data they're getting, right?

Ajay Manchepalli: Absolutely.

Sridhar Vedantham: Akshay, was there anything you wanted to add to this?

Akshay Nambi: Yeah, so if you see, reliability and security are the two big barriers limiting the true potential of IoT, right? And over the past few years you would have seen the IoT community, including Microsoft, make significant progress on improving the security aspects of IoT. However, techniques to determine data quality and sensor health remain quite limited. Like security, sensor reliability and data quality are fundamental to realizing the true potential of IoT, which is the focus of our project- Dependable IoT.

Sridhar Vedantham: Ok, so once again, we've heard these terms like IoT for many years now. Just to demonstrate the various scenarios in which IoT can be deployed, could you give me a couple of examples where IoT use is widespread?

Akshay Nambi: Right, so let me give an example of air pollution monitoring. Air pollution is a major concern worldwide, and governments are looking for ways to collect fine-grained data to identify and curb pollution. To do this, low-cost sensors are being used to monitor pollution levels. They have been deployed in numerous places and on moving vehicles to capture pollution levels accurately.
The challenge with these sensors is that they are prone to failures, mainly due to the harsh environments in which they are deployed. For example, imagine a pollution sensor measuring high pollution values at a particular location. Given that air pollution is such a local phenomenon, it's impossible to tell whether this sensor data is an anomaly or valid data without any additional contextual information or sensor redundancy. And due to these reliability challenges, the validity and viability of these low-cost sensors have been questioned by various users.

Sridhar Vedantham: Ok, so it sounds kind of strange to me, because sensors are being deployed all over the place now and, frankly, we all carry sensors on ourselves all the time. Our phones have multiple sensors built into them, and so on. But when you talk about sensors breaking down, or being faulty, or not providing the right kind of data back to the users, what causes these kinds of things? I mean, I know you said in the context of, say, air pollution sensors, it could be harsh environments and so on, but what are the other reasons for which sensors could fail or sensor data could be faulty?

Akshay Nambi: Great question. So, sensors can go bad for numerous reasons, right? This could be due to sensor defect or damage- think of a soil moisture sensor deployed in an agricultural farm being run over by a tractor. Or it could be sensor drift due to wear and tear of sensing components, sensor calibration, human error, and also environmental factors like dust and humidity. And the challenge in all these cases is that the sensor does not stop sending data, but continues to send some data which is garbage or dirty, right? And the key challenge is that it is nontrivial to detect whether a remote sensor is working or faulty, for the following reasons. First, a faulty sensor can mimic non-faulty sensor data, which is very hard to distinguish.
Second, to detect sensor faults, you can use sensor redundancy, which becomes very expensive. Third, the cost and logistics of sending a technician to figure out the fault are expensive and also very cumbersome. Finally, time series a
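Akshay's first point- that a faulty sensor can mimic valid data- can be illustrated with a small sketch. This is hypothetical code, not the Dependable IoT implementation; the function names and thresholds are made up for illustration. A "stuck" sensor keeps emitting values inside the physically valid range, so a naive range check never flags it, while even a simple flatline check (looking for output that stops varying) catches it.

```python
# Hypothetical illustration: a stuck low-cost sensor emits plausible
# in-range values, so a simple range check sees nothing wrong.
def range_check(readings, lo=0.0, hi=500.0):
    """Return readings that fall outside the physically valid range."""
    return [r for r in readings if not (lo <= r <= hi)]

def flatline_check(readings, window=5):
    """Flag a sensor whose output stops varying (a common 'stuck' fault)."""
    return any(
        len(set(readings[i:i + window])) == 1
        for i in range(len(readings) - window + 1)
    )

healthy = [42.1, 43.0, 41.7, 44.2, 42.9, 43.5]
stuck = [42.1, 43.0, 41.7, 41.7, 41.7, 41.7, 41.7]  # fault mimics valid data

print(range_check(stuck))       # [] -- every faulty value is still "in range"
print(flatline_check(healthy))  # False
print(flatline_check(stuck))    # True -- repeated identical values betray the fault
```

Real deployments face subtler faults (drift, calibration error), which is exactly why, as the conversation notes, simple checks and expensive redundancy are not enough on their own.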

    28 min
  8. 04/19/2021

    Research @Microsoft Research India: interdisciplinary and impactful. With Dr. Sriram Rajamani

    Episode 008 | April 20, 2021

Microsoft Research India is constantly exploring how research can enable new technologies that positively impact the lives of people while also opening new frontiers in computer science and technology itself. In this podcast we speak to Dr. Sriram Rajamani, Distinguished Scientist and Managing Director of the Microsoft Research India lab. We talk about some of the projects in the lab that are making fundamental changes to computing at Internet scale, computing at the edge, and the role he thinks technology should play in the future to ensure digital fairness and inclusion. Sriram also talks to us about a variety of things: his own journey as a researcher, how the lab has changed from the time he joined it many years ago, and his vision for the lab.

Sriram's research interests are in designing, building and analyzing computer systems in a principled manner. Over the years he has worked on various topics including Hardware and Software Verification, Type Systems, Language Design, Distributed Systems, Security and Privacy. His current research interest is in combining Program Synthesis and Machine Learning. Together with Tom Ball, he was awarded the CAV 2011 Award for "contributions to software model checking, specifically the development of the SLAM/SDV software model checker that successfully demonstrated computer-aided verification techniques on real programs." Sriram was elected ACM Fellow in 2015 for contributions to software analysis and defect detection, and Fellow of the Indian National Academy of Engineering in 2016. Sriram was General Chair for POPL 2015 in India and Program Co-Chair for CAV 2005. He co-founded the Mysore Park Series and the ISEC conference series in India. He serves on the CACM editorial board as co-chair for special regional sections, to bring computing innovations from around the world to CACM.
Sriram has a PhD from UC Berkeley, an MS from the University of Virginia, and a BEng from the College of Engineering, Guindy, all with specialization in Computer Science. In 2020, he was named a Distinguished Alumnus by the College of Engineering, Guindy.

For more information about Microsoft Research India, click here.

Transcript

Sriram Rajamani: We are not like an ivory tower lab. You know, we are not a lab that just writes papers. We are a lab that gets its hands and feet dirty- we sort of get in there, you know, we test our assumptions, see whether it works, learn from that, and in that sense the problems that we work on are a lot more real than in a purely academic environment.

[Music]

Sridhar Vedantham: Welcome to the Microsoft Research India podcast, where we explore cutting-edge research that’s impacting technology and society. I’m your host, Sridhar Vedantham.

[Music]

Sridhar Vedantham: Microsoft Research India is constantly exploring how research can enable new technologies that positively impact the lives of people while also opening new frontiers in computer science and technology itself. In this podcast we speak to Dr. Sriram Rajamani, Distinguished Scientist and Managing Director of the Microsoft Research India lab. We talk about some of the projects in the lab that are making fundamental changes to computing at Internet scale, computing at the edge, and the role he thinks technology should play in the future to ensure digital fairness and inclusion. Sriram also talks to us about a variety of things: his own journey as a researcher, how the lab has changed from the time he joined it many years ago, and his vision for the lab.

Sridhar Vedantham: So today we have a very special guest on the podcast, and he is none other than Dr.
Sriram Rajamani, who is the Managing Director of the Microsoft Research lab in India. So Sriram, welcome to the podcast.

Sriram Rajamani: Yeah, thank you. Thank you for having me here, Sridhar.

Sridhar Vedantham: OK, you've been around in Microsoft Research for quite a while, right? Can you give me a brief background as to how you joined, and when you joined, and what your journey has been in MSR so far?

Sriram Rajamani: Yeah, so I joined in 1999. And, oh man, it's now 22 years, I guess. I've been here for a while.

Sridhar Vedantham: That's a long time.

Sriram Rajamani: I joined Microsoft Research in Redmond right after I finished my PhD at Berkeley. My PhD was in formal verification, so my initial work at Microsoft in Redmond was in the area of formal verification, and then at some point I moved to India, around 2006 or something like that. So I think I spent about six or seven years in Redmond and my remaining time- another 15 years- in India. So that's been my journey, yeah.

Sridhar Vedantham: OK, so this is interesting, right, because we constantly hear about India as being this great talent pool for software engineers, but we certainly don't hear as often that it is a great place for a computer science research lab. Why do you think a Microsoft Research lab in India works, and what drew you to the lab here?

Sriram Rajamani: I'm a scientist and I joined MSR because I wanted to do high quality science work that is also applicable in the real world, you know. That's why I joined MSR, and the reason why I moved to India was because at some point I just wanted to live here- I have family here and so on- and then Anandan started the lab, and so somehow things came together, and that's why I personally moved. But if you ask me why it makes sense for MSR to have a lab here, the reasons are quite clear. I think we are such a big country, we have enormous talent.
I think talent is the number one reason we are here. Particularly unique to India is that we have really strong undergraduate talent, which is why we have programs like our Research Fellow program. But over the past many years, the PhD talent has also been getting better and better. As you know, initially when we started, we recruited many PhDs from abroad- people who had done their PhDs abroad and then returned, just like me. But over the years we've also recruited many PhDs from Indian institutions. So, I think talent is the number one reason. The second reason is that the local tech ecosystem is very different. It started out as a service industry for the West- essentially all of the software we were doing was servicing companies in the Western Hemisphere. But over time, India has also become a local consumer of technology, right? If you think about, you know, Ola or Flipkart, the country is now using technology for its own local purposes. And because of the size and scale of the country, and the amount the government and industry are pushing digitization, there's a huge opportunity there as well. And finally, I would say another reason to have a lab in a place like India is that it's a very unique testbed. Cost is a huge concern in a place like India- technology has to be really low cost for it to be adopted here. There are very severe resource constraints, be it bandwidth… you know, if you think about NLP, many of our languages don't have data resources. Very unreliable infrastructure- things fail all the time. And so I've heard it said that if you build something so that it works in India, it works anywhere. So it's a testbed: if you can deploy it and make it work here, you can make it work anywhere. In that sense, that's another reason.
Sridhar Vedantham: OK, so basically if it works here, it's a good certification that it'll work anywhere in the world.

Sriram Rajamani: Yeah, yeah.

Sridhar Vedantham: All right. OK Sriram, so here's something I'm very curious about. How does a research scientist end up becoming the managing director of a lab?

Sriram Rajamani: So the short answer is that it was rather unplanned, but maybe I can give a longer answer. You know, I started out being a researcher like anyone else who joins MSR. My initial projects were all in the area of formal verification- I built, together with Tom Ball, something called the Static Driver Verifier, which used formal methods to improve Windows reliability. Then I worked on verifiable design- how can you do better design so that you produce better systems? Then I worked on security, and now I work on machine learning and program synthesis. A common thread in my work has always been the use of programming languages and formal methods to understand how to build various kinds of systems- be it drivers, be it secure systems, be it machine learning systems. That has been the theme underlying my research. But to answer your question as to how I became lab director: some years after I moved to MSR India, Anandan, who was the lab director then, left. There was a leadership churn, and at the time I was asked whether I would consider being the lab director. The first time I declined, because I had many other technical projects going on. But I got the opportunity a second time, when Chandu and Jeanette really encouraged me after Chandu decided to move on. I had been at MSR maybe 15-16 years when that happened. And one of the reasons why I decided to take this up was that I felt very strongly for MSR- I thought that MSR had given me a lot, and I wanted to give back to MSR and MSR India.
And MSR India is easily one of the best computer science industrial labs in this part of the world. And, you know, it made sense that I actually devote my time to support my colleagues

    28 min
