OrCam Hear: Leveraging AI Sound Source Separation to Tackle the Hearing in Noise Problem

HHTM
July 2, 2024

OrCam is leveraging its expertise in AI-based devices to address the cocktail party problem in hearing technology with the introduction of OrCam Hear. The device, which made a notable debut at the Consumer Electronics Show (CES), uses advanced AI to isolate and enhance specific voices in noisy environments.

This innovation aims to significantly improve the speech-to-noise ratio, making it easier for users to participate in conversations even in loud settings. The OrCam Hear consists of earbuds, a smartphone dongle, and an app that allows users to select and focus on particular speakers while filtering out background noise. The technology, rooted in OrCam’s successful development of assistive devices for the visually impaired, offers a groundbreaking solution for those with hearing difficulties, including individuals who do not require traditional hearing aids but struggle in noisy environments.

To learn more about the OrCam Hear device, visit: https://orcam.co/ThisWeekinHearing

Full Episode Transcript

Hello, everyone, and welcome to This Week in Hearing. I don’t need to tell anyone that solving the cocktail party problem is one of the biggest challenges for hearing device makers. Many times I’ve written or spoken about advancements in AI-based approaches, which in the end all share the same goal: to dramatically increase the speech-to-noise ratio for the voices one wishes to hear. The ability to identify and isolate a desired speaker when another, perhaps equally loud, voice is also present is an important part of the challenge. One company working on exactly that is OrCam, a company built on creating AI-based devices for people with low vision that can identify objects, recognize faces, and read text. It is that experience OrCam is leveraging to enter the hearing space with a device called the OrCam Hear. OrCam made a splash at CES this year, demonstrating a prototype that one publication’s gadget review editor called surprisingly effective. With me to share more are OrCam’s director of marketing and business development, Inbal Saban, and CTO Tal Rosenwein. Welcome to you both. Let me begin by asking you to share a bit on your own background and that of OrCam. Tal, let’s start with you. Hi. Nice to meet you. Thanks for having us here. I’m Tal Rosenwein. I’m the CTO of OrCam Hear and also serve as an adjunct lecturer at Tel Aviv University on the combination of audio and artificial intelligence. Thank you. Inbal? Hey, of course. I’m really happy to be here and to join your amazing podcast. I’m Inbal, the director of marketing and business development at OrCam Hear. I’ve worked in the medical devices industry for the last several years, focusing mainly on marketing and business development activities. Okay, thank you. And would you also describe a little bit of OrCam’s history? How long has OrCam been in business? The evolution of the products for blind and low vision people, and how that led to the Hear device? 
OrCam was co-founded in 2010 by Professor Amnon Shashua and Mr. Ziv Aviram, who are also the co-founders of Mobileye, the largest exit ever to happen in Israel. They are also entrepreneurs and co-founders of other state-of-the-art AI products in fields such as natural language processing and generative AI. OrCam’s vision is to harness the power of AI to create high-end assistive technology that empowers people. We started with the OrCam MyEye, an award-winning wearable with a camera, a speaker, and a processor. The OrCam MyEye assists the blind and visually impaired by transforming the visual world into auditory content. Basically, the device is a wearable that clips onto any pair of eyeglasses and serves as an AI assistant. In general, I can share briefly that our products are distributed worldwide, and you can really see how this high-end technology, translated into a tailor-made solution for these specific problems, transforms the way our users experience the world around them. That’s an amazing set of technologies for blind and low vision people. And now, of course, you’re bringing that same experience to the OrCam Hear. Inbal, please describe the components of the OrCam Hear, and how I would actually use it at a dinner when, say, multiple people are sitting at the table and maybe there are loud voices at the table next to me that I don’t care about. Okay, so in general, the OrCam Hear has three main components: earbuds, a dongle that you insert into your smartphone, and an app. What it essentially enables is to tune out all the people that you don’t want to hear and to select the specific people that you do want to hear. For people with some kind of hearing loss, one of the most painful problems is the cocktail party problem that you described at the beginning of the podcast. 
And what this device enables is to overcome this problem. Amazingly, even people with hearing aids, for whom one of the main motivations to purchase hearing aids is to overcome this very problem, still suffer from it, and it affects them not only physically. It’s not only that they can’t hear; they can’t actively participate because of it, so it affects them emotionally, socially, and psychologically as well. This device enables you to choose the specific people around the table or in a meeting that you want to hear. It enables you to participate in family occasions, in business meetings, in situations where it is important for you to bring value, and you are able to bring value only if you can select the specific people that you want to hear and tune out all the other voices that are irrelevant to the conversation. So, essentially, the device has, as mentioned, the earbuds, the dongle, and the app. With the app, users choose the specific people they want to hear, and then the OrCam Hear automatically lets them hear those specific voices and tunes out all the rest. I was going to ask, how do you use the app? How does the app identify different voices, and how do you select them? As Inbal mentioned, the OrCam Hear is a hybrid hearable solution. It has two earbuds, which are inside my ears at the moment; it has a dongle, which has a microphone array, as you can see here, and LE Audio; and it has the mobile app, which does all the AI and the heavy lifting. Okay, so basically, in the mobile app you have an icon of a person in the middle of the screen. You press the icon and drag it toward the direction of the person you want to hear. You then release your finger, the icon stays in that direction, and you see an icon of an arrow. Now you can hear the person in the direction of the arrow. And what if you need to align it better with the speaker’s direction? 
You basically just move the arrow around on the app’s screen. After the speaker has been talking for a few seconds, we automatically learn the speaker’s voice print, and from that point forward the OrCam Hear can automatically track the person as they move during the conversation. Meaning, if I’m not stationary and I start moving around while I’m speaking, it will track me in real time, and you will be able to hear my voice, and only my voice, as I speak. And of course, you can hear more than one person; we currently support four concurrent speakers. If you want to stop hearing a person, you basically just tap on their icon, and the person goes away from the conversation. They’re tuned out. Okay, so if I’m talking with, say, one person, and I’ve identified them through the app, and then a second person walks in, I can identify them later on in the conversation, and then they, too, will be passed through. Exactly. You just need to swipe your finger toward their direction, and the OrCam Hear will do the rest. Okay, and so then you’re doing sound source separation there in order to identify individual voices from the crowd, correct? Exactly. Yeah. Andy, just to emphasize: in the hearing aid arena, having a user interface is not that common, because usually people just put hearing aids on their ears and have no way to select which voices to amplify. In this specific product, for this specific occasion, we have to have this interface to give the user the best experience in choosing specific voices. So we had a challenge during the development phase to make this user interface as simple as possible. We worked very hard to achieve this goal, and we are almost 100 percent there. 
We have a very, very simple, intuitive user interface that we have already checked with users, getting their feedback to make sure it is as fluent and as transparent as possible. Okay, and let me ask you about the audio flow, and here my main concern is about latency. Is the audio being picked up by the dongle or by the microphones in the earbuds? And then, what’s the path, the flow of the signal and the sound source separation, and what’s the total latency of the voices coming in? Okay, so basically, the audio is collected by the microphone array located on the dongle. Then it goes to the mobile app on the smartphone, currently on iPhone, so it’s MFi. The iPhone does all the heavy lifting, then transmits it back to the dongle, which sends it using LE Audio to the earbuds. This end-to-end process takes around 200 milliseconds, and in two weeks it will take 100 milliseconds. So basically, you can think of the latency as a trade-off. In order to get this performance, you need much more computational power than standard hearing aids have. Current hearing aids perform amazingly in most situations, but in this specific situation, with multiple interfering speakers and background noise, whether omnidirectional or coming from a specific direction, which is even more challenging, the challenge is much larger than in a regular scenario. In this particular situation you need more computational power, and by using it, you pay in latency. But on the other hand, it gives you clear, intelligible sound and much reduced cognitive load, meaning you are calm and you hear only the desired speaker, without hearing the interfering speakers or the background noise. That makes perfect sense. I can see the trade-off, because 200 milliseconds or even 100 milliseconds makes lip reading cues a little bit hard. 
But on the other hand, if you’re dramatically increasing the speech-to-noise ratio, many people will be less reliant on lip reading cues. So the trade-off is definitely there, but I can see it could be a very useful one if you can dramatically reduce the noise content. What about occlusion of the earbuds? One of the things I’ve seen in testing a number of hearing devices is that if there’s too much occlusion, it’s hard to hear while you are actually eating a meal, for example. How is the occlusion in this device? So, in order to overcome occlusion, we let the user, according to their own preference, mix some of the ambient audio into the enhanced audio. It’s basically like a transparency mode in hearables. We let the user decide how much of the ambient sound they want to hear, we provide the clean stream, and they can blend in the ambient sound according to their preferences. Okay. I can see a lot of useful reasons to have that aside from occlusion, too. For example, if you’re in a group of people and there’s music, you might want to dial in some of the music through the ambient path while still improving the speech of the people you’re talking to. Exactly. Yeah. Okay, so, very interesting. Now, if I’m standing at the actual cocktail party, standing with a group of people, then I would hold the phone in my hand, correct? I couldn’t put it in my pocket because the microphones are in the dongle. Right, so you are right. Currently, we support situations where you place the smartphone on a table or other stable horizontal surface. And we’re currently working on something that will let you hold the smartphone in your hand and have it work just as if it were on the table. But it’s still under development. 
The most common situations where you need this solution for the cocktail party problem are family occasions where you meet all the family members. For elderly people, the most painful situation is when they are trying to hear their grandchildren and struggling with it. Most of these situations are around the table. Business meetings, too, in most cases, involve people sitting around a table and discussing. For this case we already have a perfect solution, and as Tal mentioned, for standing mode, when you are standing and not sitting around the table, we have a good solution for now, and very soon we will have a perfect solution for that occasion as well. Okay, excellent. Thanks for explaining that. And I’ll add that more than once in a stand-up situation, I’ve said, ‘don’t mind me,’ while holding my remote microphone so I could point it at somebody with the directional mic facing forward. So it’s not out of the question that you would hold the phone in your hand, with a bit of explanation of what you’re doing there. But you’ve also alluded to the future direction you’re going to take after this first generation, including on the website, where it says, and I’m quoting here, ‘OrCam Hear technology will expand its reach, seamlessly integrating with third party hearing devices, offering users unparalleled clarity and convenience across a wider range of hearing aids.’ That seems to imply that your technology could also end up in other people’s devices in different form factors. Tell me more about that. So actually, we have two main business channels. The first is the B2C channel, a platform to offer customers, the users, the full package of the earbuds, dongle, and application, as mentioned before. The second channel, as you just mentioned, Andy, is B2B. 
In this option, the customer will have the option to purchase only the dongle and the app, which will be integrated with his hearing aids. At the moment, we are in a process with a few hearing aid companies to enable this option; hopefully it will be completed soon. And in this case, once we have these two channels, I’m taking you to the next phase: a customer who decides in the first phase to purchase the full package of earbuds, dongle, and application may, after a while, decide that it’s about time to take the next step to a hearing aid solution. He will still be able to enjoy the solution for the cocktail party problem, because hopefully we will be integrated into his hearing aids. So he will still be able to use the dongle and app that he purchased a few years ago, integrated with the hearing aid solution. Okay. And with LE Audio capable hearing aids, I suppose you could still sell it B2C, and somebody could then link to the dongle with LE Audio. Is that correct? Yeah, exactly. As long as you have LE Audio, we will be able to communicate with your earbuds. I just want to emphasize something that Inbal said, and it relates also to the latency issues you mentioned earlier. One of our target audiences is people who find it challenging to understand speech in noise. This population is not necessarily people who currently wear hearing aids. Just statistics: around 20% to 30% of people who visit audiologists have pure tone tests that are okay, meaning they don’t need hearing aids, but they still find it difficult to understand speech in noise. For those people as well, we can enable participation in multiparty conversations and socializing again. 
The reason I’m saying this is that you asked before about latency. For these people, 100 or 200 milliseconds of latency is not a lot; it’s really okay, because without help they just can’t understand what’s happening in these situations and find it confusing. Even worse, they don’t want to attend these kinds of family dinners, restaurants, and so on. We enable people to socialize again. Well, I’ll amplify that point a little bit, because when I spoke at the Australian College of Audiology conference a couple of weeks ago, I used a LinkedIn post from Nicky Chong-White of the National Acoustic Laboratories, because she spoke about trying the Apple Live Listen system with her husband, who wears hearing aids. She said that it was quite satisfactory, that it made it much easier for him to understand her speech when he was using Live Listen. And that would have been in MFi mode, with the iPhone transmitting to the hearing aids. So I asked her on LinkedIn if latency was a problem, and she said no, it wasn’t; the experience was really good. Well, I measured the latency myself. I used the GN ReSound Nexia hearing aids, because I had a set and they’re LE Audio capable. So I used the Nexias in Apple Live Listen mode, which actually would have been MFi in that case, and I measured the latency at 83 milliseconds. She and her husband were quite happy with 83. I then popped a set of AirPods Pro into my artificial ear and measured 93 milliseconds, and a lot of people are using Live Listen with AirPods. So to amplify your point: at latencies as high as 93 milliseconds, at least, I could verify that people are very happy with the experience. Also part of that discussion at the conference was how the standardization of low latency Bluetooth is going to enable a lot of innovation in the hearing space. 
And yours is a prime example, because before LE Audio, you would have had to work with each hearing aid company’s proprietary transmission system, and now you can use LE Audio and offer it more broadly. So I think this is a great example of how that will actually take place. But I’ve also been following the advancements in DSP development, and I’ve got to imagine it’s only a matter of time before you’re able to run this first on your dongle and then, perhaps down the road, natively in the earbuds. What kind of development timeline do you foresee for that? It’s a great question. The world is moving toward embedding things, toward running things on the edge. We’re currently working on our next generation, which will be, as you mentioned, all in here. I think our main focus currently is making the UI more seamless. We want it to suit elderly people, so everyone can enjoy it, and it will be just plug and play: put the earbuds in your ears and everything works seamlessly. So we’re currently working on this. I think the next directions the entire industry will follow are things related to generative AI, meaning how to operate the device using free language, how to interact across multiple languages, like translation, or how the hearing aid will enable you to have a smart assistant, and so on. So I think we’re going to concentrate on that part at the beginning and then move forward to embedding the technology into the dongle itself. Okay. So in other words, making the user experience more seamless, especially for people who may only be marginally tech adept. Okay, that makes perfect sense. Let me step back to the present for a moment and ask: when will this generation of OrCam Hear actually be available, and how would people buy it? So the product will be available very soon. We’re talking about one to two months from now. 
And I want to say another thing about that. We talked about the cocktail party, but I think that what the OrCam Hear enables, or will enable, is much more than that, because people have a few barriers to using hearing aids or hearing assistance. The first is the shape of the hearing aid; people don’t want to go around with the stigma of using hearing aids, and there is a great barrier there. The second is the price. The price of hearing aids is super high; for two hearing aids, it could reach $7,000 or more. This device overcomes those problems. The shape of the buds is like earbuds, not hearing aids, so they look much more fashionable, with the right shape, and you don’t have to carry the stigma. The price is much more attainable than the price of hearing aids. And since it overcomes the cocktail party situation, it’s an occasional solution, like reading glasses: you use your reading glasses only when you need to read, and in this case, you use your OrCam Hear only when you are in a cocktail party situation. There is an often-cited figure of five to seven years from the moment people realize they have some kind of hearing problem until they decide to go to the audiologist and purchase a solution. We talked before about B2C and B2B. This device is a perfect solution for that period, because you don’t yet feel it’s time to purchase hearing aids, but you do need a solution in the meantime. An occasional solution, at a much more attainable price, in a shape you would feel comfortable wearing, is a perfect fit for people who don’t feel it’s time to go to the audiologist and purchase their first hearing aids. So, going back to your question, this device will be available on the market in one to two months from now. 
One channel to purchase this device will be our online store on OrCam’s website, and hopefully it will also be available through other retailers that we are currently working with. Okay, thank you for that. And actually, you brought up a point earlier that fits in exactly with what you said now, and that is people who have normal audiograms but still have trouble hearing in noise. That’s not a small population. I saw two different assessments: one of them by the NAL, and the other, I think, came from Doctor Beck. The NAL version talked about 25 million people in the United States who have trouble hearing in difficult situations but have normal audiograms. Well, if you consider that the US has about 5% of the world’s population, and there are 25 million such people just there, that’s a lot of people who have difficulty hearing in noise but have normal audiograms, and for whom solutions like directional microphones or speech and noise separation are more effective than amplification with a hearing aid. So it’s a large population you’re aiming to serve here, and I wish you success in doing so, because it means a lot of people are going to find it easier to converse in social situations and therefore will not be inhibited from doing so. So thanks for the conversation, Tal and Inbal, and I appreciate you joining me to share how you’re tackling the cocktail party problem. I’m really looking forward to trying it myself when it’s out on the market. Do you have any closing thoughts? Okay, so thanks for the question. I think that we have a great solution in our hands, but more than that, I feel very fortunate to work toward a vision of making this world a little bit better to live in, alongside great co-workers who wake up in the morning with the same feeling. I really appreciate the opportunity to be part of this group of people working toward this amazing vision. 
And I have to tell you that each time we present the OrCam Hear to people, to hearing aid companies, to potential customers, the amazement they feel when they use this device for the first time makes all of us feel like we are in the right place. Tal, any closing thoughts? Yes, our vision is to enable people who struggle to understand speech in noise to rejoin the conversation, and this means a lot to us. This is why we wake, eat, and sleep thinking about how to overcome the huge challenges in developing such a product. And getting feedback from users, seeing that we change their lives and enable them to socialize again, is priceless. We hope to spread the word, including on your amazing podcast, so thank you for the opportunity. Oh, it’s my pleasure. And I love the passion you both bring to this topic, because solving the cocktail party problem means that people are more comfortable socializing; they don’t feel inhibited from doing so, and they lead a healthier and more engaging lifestyle. I love it, and I really appreciate you two spending some time with me. If people would like to reach out to you as a result of this conversation, how would they do it? So the traditional way would obviously be through OrCam’s website; there is a dedicated section for the OrCam Hear, and we would love to share the details alongside this podcast. And obviously also through LinkedIn: you can reach out to me or to Tal, and we will be more than happy to connect. Well, thanks again to you both, and thanks to everyone for watching or listening to this edition of This Week in Hearing.



Be sure to subscribe to the TWIH YouTube channel for the latest episodes each week, and follow This Week in Hearing on LinkedIn and on X (formerly Twitter).

Prefer to listen on the go? Tune into the TWIH Podcast on your favorite podcast streaming service, including Apple, Spotify, Google, and more.

About the Panel

Tal Rosenwein is the Chief Technology Officer (CTO) of OrCam Hear and an adjunct lecturer at Tel Aviv University, specializing in the integration of audio and artificial intelligence. With extensive experience in developing AI-based devices, Tal is dedicated to advancing hearing technology to solve complex auditory challenges.

Inbal Saban serves as the Director of Marketing and Business Development at OrCam Hear, bringing years of expertise in the medical devices industry. Focused on leveraging AI for innovative hearing solutions, Inbal plays a key role in promoting and expanding the reach of OrCam’s groundbreaking auditory products.

Andrew Bellavia is the Founder of AuraFuturity. He has experience in international sales, marketing, product management, and general management. Audio has been both an abiding interest and a market he has served professionally in these roles. Andrew has been deeply embedded in the hearables space since the beginning and is recognized as a thought leader in the convergence of hearables and hearing health. He has been a strong advocate for hearing care innovation and accessibility, work made more personal when he faced his own hearing loss and sought treatment. All these skills and experiences are brought to bear at AuraFuturity, providing go-to-market, branding, and content services to the dynamic and growing hearables and hearing health spaces.

