Designing for Accessibility (Google I/O'19)

You see this? This is my old hearing aid, and this is the one that I wear today. What's different? This one's flesh colored and this one's red. It may seem like a small design tweak, but it changed my life. It made me feel as if I belonged again.

You see, right before fifth grade my mom sat me down and she said, "You need to tell Michelle about your hearing loss." Michelle was my best friend, but I hadn't seen her all summer long, and two weeks before I had been told that I was losing my hearing and that it was going to just get worse and worse and worse. But I was 10 years old. I didn't know how to deal with heavy stuff like this, and so when I called her up before school started, we talked about our summers, we talked about sports, we talked about everything but my hearing loss.

On the second day of school she was standing behind me in line, and she tapped me on the shoulder, pointed to my hearing aid and said, "What's that?" It was an innocent question, but I didn't know how to respond, and so I said, "It's a hearing aid," as if she was stupid not to know. And then there was just silence, and in that silence went the rest of our relationship. She never asked me again about my hearing loss, and as I watched her slowly drift away, I felt different, as if I didn't belong. And she, being just ten years old, didn't know how to deal with this difference. Many kids avoid difference because they're just not sure what to do with it.

And so I found out quickly that I didn't want to be seen as different, and I began this long struggle to try to prove that although I had a hearing loss, it didn't change me; I was still normal. I did this by overachieving: when I went to college, playing just one sport wasn't enough, I had to play two. I had to go to an Ivy League school. I became one of the first few deaf lawyers in the U.S.
I did some work at the United Nations, and then I became a designer. But somewhere along the way I realized something: I am the new normal. You know that TV show Orange Is the New Black? Well, I am the new normal. Difference is the new normal. Difference, even if it seems like a limitation, is what makes us fly, what makes us valuable.

Now, I would like you to think about this: disability encompasses all of us. If we all live long enough, we will all get a disability at some point in our lives. Who here has broken their legs or their arms? Really, that's it? Come on. That's an example of a temporary disability. But what comes next is key: we also all experience something called momentary disabilities. Now I'd like to ask a volunteer to come up and help me demonstrate what they are. What I'd like you to do is to pick up that box and those books, and then come over and, while carrying the box, take a sip of the water. You're going to have to open it, though. Yep. That was pretty impressive, but was it easy or hard? Yes, so thank you very much for your help. As we go about our lives, we encounter situations where we'll be momentarily disabled, whether we're carrying a box and trying to open up a door. And so disability really encompasses all of us; it's just that some of us experience it a lot more than others.

Now, as a lawyer I fought for equality in race, gender, and disability, and you would think that I would have been outfitted with the skills necessary to feel accepted and valued by society. But to my surprise, I found my strongest tools when I transitioned from law to design. Design has this powerful ability to shift perceptions, but it's up to you to use it.

Finally it happened: after law school I went back to the audiologist to get a new hearing aid, and I was floored, because they weren't just these flesh-colored things anymore; they made red ones and blue ones and green ones. So I opted for the bright red one, and then something magical happened: my hearing aid became cool. People started saying things like, "Love the red." This little thing created this huge shift in my life. It allowed me to celebrate my difference, and it allowed others to join in celebrating this difference with me, because it opened up the door to conversation about difference without being focused on limitations.

Okay, thank you, Elise, that was a beautiful talk, and it was a very good introduction to our story, which we call Project Euphonia. We're going to start by telling you a story about one of our colleagues at Google. This is Dimitri Kanevsky. Dimitri, it turns out, is a mathematician; he's worked at some of the great institutions for mathematics in the world, but for the last two decades he's really been thinking primarily about designing for accessibility, that is, trying to invent technology that is helpful in some way or other. Dimitri himself has a disability: he's deaf, and he also has a very strong Russian accent. So the first time that I, at least, met Dimitri, I found it very hard to understand what he was talking about, but, you know, hanging out with Dimitri, eventually you get the idea.

It turns out that our computers have the same problem. That is, when Dimitri speaks to his phone, as I might speak to my phone, his phone doesn't understand him very well, and this is a clip in which he explains that himself: we have good digital speech recognition, but if you do not sound like most people, it will not understand you.
So what you see from this is that the phone being shown was running the Google Cloud speech recognition model, and what I would claim is that, if you only looked at the phone, you would not really be able to understand the thread of what Dimitri was trying to communicate. And so we asked ourselves the question: why is that the case? Why is it that the phone was not able to understand Dimitri, but, for example, it is able to understand me? In order to explain this, I need to tell you a little bit about how speech recognition works, and why it is that speech recognition has gotten so much better over the past number of years.

When we speak, what we're doing is creating a waveform. A waveform is just a sound wave, and it looks rather unintelligible. The job that we're asking a computer to do is to take the picture on the left and somehow turn it into the words that are being said. As you all know, computers have gotten very good at interpreting pictures, and so the way that speech recognizers work is that we first take the waveform and turn it into a picture. The picture is called a spectrogram, and it's just a picture of colors; it's still unintelligible as to what was being said. Then what we do is take the picture and stick it into a neural network, which is a big computer program that has lots of parameters in it, and the idea is to tune the computer program so that it outputs what was being said.

Now of course, just like us, if you don't train the computer program, it has no idea what was being said. So what we do is take all of the numbers in this computer program, and there are millions of numbers that you have to tune, and we give it one sentence at a time: somebody saying something. The computer predicts what is being said, and then it gets it wrong, and we bang the computer over the head, twiddle the parameters around a little bit, until eventually, by giving it lots and lots of sentences, it gets better at speech recognition, and we have phones that work for the people whom the computer has heard. Now, in order to do that, it takes huge numbers of sentences: tens of millions, say, of sentences need to be given to the computer for it to develop a general type of understanding.
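To make the pipeline described above a little more concrete, here is a minimal sketch in Python of the same idea: turn a recording into a spectrogram "picture" and feed it to a small neural network that is trained on labeled recordings. This is only an illustration, not Google's production recognizer; it assumes librosa and TensorFlow are installed, and for simplicity it classifies a recording as one of a few fixed phrases rather than producing free-form text. The file paths and variable names are hypothetical.

```python
# Minimal sketch of the idea described above (NOT Google's actual model):
# waveform -> spectrogram ("picture") -> neural network -> predicted phrase.
import numpy as np
import librosa
import tensorflow as tf

PHRASES = ["what is the temperature today", "turn on the lights", "call my daughter"]

def to_spectrogram(wav_path, sr=16000, n_mels=64, frames=128):
    """Turn a waveform into a fixed-size log-mel spectrogram image."""
    audio, _ = librosa.load(wav_path, sr=sr)
    mel = librosa.feature.melspectrogram(y=audio, sr=sr, n_mels=n_mels)
    logmel = librosa.power_to_db(mel)
    # Pad or crop the time axis so every example has the same shape.
    logmel = librosa.util.fix_length(logmel, size=frames, axis=1)
    return logmel[..., np.newaxis]  # shape: (n_mels, frames, 1)

def build_model(n_mels=64, frames=128, n_phrases=len(PHRASES)):
    """A small convolutional network: the 'big program with lots of parameters'."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(n_mels, frames, 1)),
        tf.keras.layers.Conv2D(16, 3, activation="relu"),
        tf.keras.layers.MaxPool2D(),
        tf.keras.layers.Conv2D(32, 3, activation="relu"),
        tf.keras.layers.GlobalAveragePooling2D(),
        tf.keras.layers.Dense(n_phrases, activation="softmax"),
    ])

# Training is the "show it a sentence, see what it predicts, twiddle the
# parameters when it is wrong" loop, repeated over many labeled recordings.
# wav_files and labels below are hypothetical placeholders for a real dataset.
# model = build_model()
# model.compile(optimizer="adam", loss="sparse_categorical_crossentropy")
# x = np.stack([to_spectrogram(f) for f in wav_files])
# y = np.array(labels)  # index into PHRASES for each recording
# model.fit(x, y, epochs=20)
```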
But the problem is that for people like Dimitri, or indeed anyone who speaks in a way that is different from the pool of examples that the computer was given, the phone can't understand them, just because it has never heard examples like that before. And so the question that we asked, and this was a question that we started asking in collaboration with an ALS foundation that we've been working with, ALS TDI, who gave me this t-shirt, was whether or not it's possible to basically fix the speech recognizers to work for people who are hard to understand.

Dimitri is amazing, and he decided to take this on. Remember what I said: it takes tens of millions of sentences to train a speech recognizer, and it's completely crazy to ask someone to sit and record tens of millions of sentences. But Dimitri has a great spirit, and so he sat in front of his computer and just started recording sentences. For example, here is a sentence: "What is the temperature today?" The computer would show "What is the temperature today?" and Dimitri would read "What is the temperature today?" He sat there for days recording these sentences, until he had recorded upwards of 15,000 of them, and we then decided to train the speech recognizer to see if it was able to understand him. I should tell you that none of us knew whether it was even conceivable that this could work, because, as I said, it took many more sentences to train the thing in the first place for people who speak in a way that is more typical for speech recognizers. Here's Dimitri at the end; he was still happy after doing this. Now I'm going to show you a quick clip of what happened: we need to make all interactive devices able to understand any person who speaks to them. What you see is that the device on the right was able to understand Dimitri, whereas the device on the left, which was running the Google Cloud recognizer, was not. This really gave us confidence that it was possible to make progress on this task.

And so we started working in earnest with our collaborators at ALS TDI, who recruited a large number of people with ALS to start recording sentences, to see if this works more broadly. Now of course, getting someone to record 15,000 sentences is completely crazy; that's never going to work at scale. So instead we have been investigating, technically, whether it's possible to make progress with smaller numbers of sentences, and what I can report to you is that we're making progress. We're not there yet; we do not feel that we've solved this problem in any way, but we're working hard, and there are groups of engineers at Google who are working hard. Here is just a little example: the leftmost column is the ground-truth phrases, the rightmost column is what Google Cloud recognizes for this particular person, who happens to have ALS, and the middle column is what our recognizer is doing right now. We're hard at work trying to figure out whether it is possible to make this work for people without requiring so much training data.

This is Dimitri as of this week. Dimitri now carries around with him about five different phones in his pocket, each of which has a different speech recognizer on it, and he's testing them, trying to figure out the best approach. It is our hope that, with Dimitri's help and with all of your help, we can get this to work; hopefully people will make recordings for us. The reason for the call for data that Sundar made is that we need more data, just recordings from people, to be able to make this work. Hopefully we will get there; that is our goal. And that is the general mission of Project Euphonia: we would like to improve communication technology by including as many people as possible, whatever features those people have and whatever means they use to communicate. Of course, speaking is an important way of communicating, but it is not the only way that we communicate. We communicate with each other by looking, by feeling, by doing so many different things, and there are people who don't have the ability to speak. So now I'm going to turn it over to Irene, who will talk about other modalities.
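The talk does not spell out exactly how the team adapts a recognizer with so little data, but a standard way to picture "making progress with smaller numbers of sentences" is transfer learning: start from a model already trained on many speakers, freeze most of it, and fine-tune only the top layers on one person's recordings. The Keras sketch below is only an illustration of that general idea; the function, dataset names, and layer counts are hypothetical.

```python
# Sketch of one standard personalization technique (transfer learning /
# fine-tuning), offered as an illustration rather than Project Euphonia's
# actual method.
import tensorflow as tf

def personalize(base_model: tf.keras.Model,
                personal_x, personal_y,
                trainable_layers: int = 2,
                epochs: int = 10):
    """Fine-tune the top of a pre-trained model on a small personal dataset."""
    # Freeze everything except the last few layers, so a handful of personal
    # recordings adjusts the model without erasing what it already learned
    # from tens of millions of sentences.
    for layer in base_model.layers[:-trainable_layers]:
        layer.trainable = False
    for layer in base_model.layers[-trainable_layers:]:
        layer.trainable = True

    # A small learning rate keeps the updates gentle.
    base_model.compile(optimizer=tf.keras.optimizers.Adam(1e-4),
                       loss="sparse_categorical_crossentropy",
                       metrics=["accuracy"])
    base_model.fit(personal_x, personal_y, epochs=epochs, batch_size=8)
    return base_model

# Hypothetical usage: load a model trained on typical speech, then adapt it
# with a few hundred of one speaker's spectrograms instead of millions.
# model = tf.keras.models.load_model("general_recognizer.keras")
# model = personalize(model, dimitri_spectrograms, dimitri_labels)
```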
All right, thanks, Michael. So far we've talked about Dimitri and about speech, but what about other forms of communication? What about folks who can't communicate verbally? We want to show you how we're approaching the research for those types of cases as well. For that, I'd like to introduce our second protagonist for the day, the amazing Steve Saling. He's an incredible person. He had a brilliant career as a landscape architect, and when he learned that he has ALS, he set about rethinking how people with his condition get care. He also started thinking about how he could leverage technology to create more independence for himself, so that he didn't have to rely as much on other people to take care of him. One thing he helped do was create a smart-home-like system that lets him request an elevator, close the blinds, turn on the music, all by using his computer. It's really amazing. Steve happened to be one of the perfect people to partner with for this research, because he is a technologist himself.

And speaking of computers, we want to show you how many folks who have ALS communicate today. They use something called an eye-gaze pointer to type out letters one by one. These are two different systems that they can use: either a keyboard, or something on the right called Dasher. It works, it does the job, but as you can imagine, it's just a little bit slow, and what's missing is a layer of communication that all of us are familiar with: interruptions, mannerisms, jokes, laughs, synchronous communication that comes by quickly. That's something that's really hard for Steve and people with his condition to do.

So something we wanted to try with him was to see if he could train his own personal machine learning models to classify different facial expressions, and the thought was: is this even useful for him, to be able to trigger things more quickly, so that he might open his mouth and trigger something on the computer, or raise his eyebrows and trigger something else? It was a research question, and we didn't know the answer.

With Steve's feedback, his ideas, and a lot of testing, we developed a machine learning tool that anybody can use to train classification models in the browser, and by classification I mean a model that tries to predict what category a certain type of input belongs to. Let me show you an example to see how it works. This is my colleague Barron, and he's training two classes: one to detect his face, and one to detect this really cute cat pillow that he has. He's giving the computer a bunch of data, he's training it, waiting for it to finish, then he's testing the model on the right, and then he publishes the model. All of this is happening in the browser in real time, and the processing is happening on his computer, so the images aren't being sent to a server; it's all happening on his computer, in the browser. We're calling this Teachable Machine. It's a tool for anybody to train machine learning models in the browser without having to know how to code, and it's actually built on top of TensorFlow.js, so all of the underlying technology is free and open source for you to use.
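Teachable Machine does this with TensorFlow.js directly in the browser; as a rough illustration of the same recipe, here is a hypothetical Python sketch using Keras: reuse an image network pre-trained on generic photos as a frozen feature extractor and train a tiny classifier head on a handful of webcam frames per class (for example, "face" versus "cat pillow"). The directory layout and file names are assumptions made for the sake of the example.

```python
# Rough Python analogue of the browser-based Teachable Machine recipe:
# a frozen pre-trained backbone plus a tiny head trained on very little data.
import tensorflow as tf

IMG_SIZE = (224, 224)

# Pre-trained MobileNetV2 body, frozen: it already knows how to turn an
# image into a useful feature vector, so only a little new data is needed.
base = tf.keras.applications.MobileNetV2(
    input_shape=IMG_SIZE + (3,), include_top=False, weights="imagenet")
base.trainable = False

model = tf.keras.Sequential([
    tf.keras.layers.Rescaling(1.0 / 127.5, offset=-1),  # MobileNet expects [-1, 1]
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(2, activation="softmax"),  # two classes
])
model.compile(optimizer="adam", loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# "Giving the computer a bunch of data": one folder of webcam frames per class,
# e.g. data/face/*.jpg and data/cat_pillow/*.jpg (hypothetical paths).
train_ds = tf.keras.utils.image_dataset_from_directory(
    "data", image_size=IMG_SIZE, batch_size=16)
model.fit(train_ds, epochs=5)

# "Testing the model": predict the class of a new frame.
# frame = tf.keras.utils.load_img("new_frame.jpg", target_size=IMG_SIZE)
# probs = model.predict(tf.expand_dims(tf.keras.utils.img_to_array(frame), 0))
```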
So how is Steve using this? Well, as I mentioned, he's training face classification models for cases where he might want a faster response time than what he can achieve with his eye-gaze pointer, and Teachable Machine is the prototyping tool that's allowing him to do this and to explore what types of use cases are actually helpful for him.

So why is this useful? Well, Teachable Machine is situational in two ways. First, ALS actually changes over time; people with the condition deteriorate, so Steve might be able to make an expression today that he can't make in a year. He has to be able to retrain those models on his own, perhaps week by week or month by month, as he needs to. And the second thing is that you might imagine he would want different models for different use cases. One thing he actually tried was training a model that would trigger the sound of an air horn when he opens his mouth and a boo when he raises his eyebrows, and he used it one night to watch a basketball game with one of his favorite teams, to react quickly to the game as it progressed. Unfortunately, that night his team didn't win, but it was actually really fun to set up.

We've got a long way to go with this research; this is really only the beginning, and we hope to expand the tool to support many more modes of input. The tool itself will be available later this year for anyone to train their own classification models, but as I said before, all of the underlying technology is already available in TensorFlow.js. We're committed to working with people like Steve and Dimitri to make their communication tools better, and the idea really is to start with the hardest problems that might unlock innovations for everyone. It is our sincere hope that this kind of research might help people with other types of speech impairments, people with cerebral palsy or Parkinson's or multiple sclerosis, and maybe perhaps one day it could be helpful to even more people, people who freely communicate today, maybe folks who have an accent in a second language.

In fact, we started calling this approach to building "start with one, invent for many." We think anybody can work this way, and you can apply it to many more types of problems. The idea is actually quite simple: start by working together with one person to solve one problem, and that way you can be sure that what you make for them will be impactful to them and the people in their lives. It doesn't always happen, but sometimes what you make together can go on to be useful to many more people. Start with one, invent for many.

If you'd like to hear more about this project and "start with one," if you'd like to hear more about Dimitri and Steve and actually play with Teachable Machine, we have all these projects in the Experiments Sandbox tent, which is really close to the stage. And finally, we'd like to invite you to help this research effort. As Michael was saying, we don't expect people to record fifteen thousand phrases in order to get a model like this, so we actually need volunteers to share their voice samples with us, so that we may one day generalize these models. If you or anyone you know has hard-to-understand speech, we'd like to invite you to go to this link and submit some samples, and hopefully one day we can make these models more widely accessible to everyone. Thank you. [Applause]

One Reply to “Designing for Accessibility (Google I/O'19)”

  1. Google Developers, how can I help correct the closed caption when I see "hearing laugh" and "hearing lapse?" I think she said "hearing loss." It seems the automatic closed captioning is not picking up the "ss" sound when she says "hearing loss."

    4:03 "Disability encompasses ah of us" should be "Disability encompasses all of us." Seems to me the automatic closed caption thinks she said "ah" instead of "all."

    4:15 For confirmation, did she say "yeah" in "who yeah has broken their legs or arms?"

    6:26 "From na2 design." I don't know if I was lipreading right, but didn't she say "from my to design?" Could automatic closed captioning system do lipreading in the future?

    7:20 Again, I did some more lipreading and I think she said "it allowed me to celebrate" and the dynamics of her voice caused automatic closed caption to miss "me" after "allowed" as in "it allowed to celebrate."

    7:58 "So this is Dimitri connects key" should be "So this is Dimitri Kanevsky."

    20:57 "Start with one invent for money" should be "start with one invent for many." At least she said "start with one, invent for many" the second time and automatic closed caption got it right.
