Host Andrew Bellavia takes us on an in-depth exploration of Phonak’s latest hearing aid platform, Infinio, which features a number of benefits both for the hearing professional at the fitting and for the end-user in daily life. The most exciting innovation is in the model named Sphere which incorporates a powerful deep neural network chip to perform speech and noise separation in an ear-worn device for the very first time.
Andrew speaks with three key Phonak team members to go beyond the headlines while providing a basic overview of AI in hearing devices, and why Sphere is a seminal moment in hearing device innovation.
Christine Jones, now Senior Director of Marketing and the VP of Audiology at the time Infinio was kicked off, takes us through everything new across the range of products. Henning Hasemann, Director of Deep Learning Engineering and leader of the team that developed the AI model in Sphere, describes how they remove noise in real time while maintaining natural sound across different languages, speaking styles, and environments, naming the specific benefits to hearing impaired people versus the classical acoustic methods employed in hearing devices today. Then Stefan Launer, VP of Audiology and Health Innovation, provides additional context on the broader implications of this innovation for the future of hearing care, offering a glimpse into how AI and sophisticated engineering are set to redefine the possibilities in hearing technology across a continuum of needs.
Full Episode Transcript
Hello and welcome to This Week in Hearing
I no longer remember how many times
I’ve opened the podcast
by pointing out that speech
in noise is THE last frontier
in hearing device performance
and therefore the focus of more than one
recent product announcement.
Sonova is no exception.
After a period of incremental improvements
comes a major new line for Phonak
called Infinio, with several models,
including a custom rechargeable and a CROS,
all based on a new chip called ERA.
But the big news is
the addition of a second chip
called DEEPSONIC in the Sphere model.
DEEPSONIC is designed
specifically for providing real time
separation of speech from noise,
a first for an in-ear device of any kind.
We’ll get into that,
and consider it in context of overall AI development.
But first, let’s hear from Christine Jones
what’s new across the Infinio line
and how it impacts both hearing
care professionals and their patients.
I met up with her at Phonak’s media event
in New York City a week before the launch.
I have with me Christine Jones.
She’s the senior director of marketing for Phonak
and a research audiologist by background.
Thanks for joining me, Christine.
Thanks for having me, Andy.
You’re welcome.
Tell people a little bit more
about your background. – Yeah.
So I started as a clinical audiologist.
I worked in pediatrics and adults
always had a passion for Phonak technology
working in pediatrics.
I had a lot of experience with it
clinically and joined Phonak
as a clinical trainer about 150 years ago.
And have done various jobs in audiology,
including starting and running the
Phonak Audiology Research Center [PARC],
which was our first clinical research lab
that Phonak opened in the US,
and more recently took the job
to run marketing for Phonak
in the US, where
I could take all of that clinical knowhow
and really try to apply it
to our marketing
and our brand communications
and really trying to focus
on strong clinical messaging
tied closely to the needs
and desires of our HCPs.
– So you really bring a good broad
background to what you do today.
– I hope so.
I also have a lot of people in my life
with hearing loss,
and so I live it every day
and I’m really passionate about what
the technology can do in people’s lives
and the importance of hearing care
in people’s lives.
So it’s a good job to have.
– So let’s talk about that technology now.
We’ve covered the deep neural network,
the speech and noise separation already.
But in this product line one of the six
products has that capability.
But the other five
have a range of other improvements
that make the experience better
than the predecessor models, correct?
-Yes. I think there’s a few things
to talk about here.
Sound quality was really paramount
in the design of this product.
And of course, sound quality
for speech in noise
remains a top consumer need.
But we felt like there was room
across the board to really try to optimize
the experience, both from that moment
of truth at the first fit,
and then when that repeat user goes out
and re-experiences,
you know, music and speech and,
and all of the different surroundings
and how to just wow
somebody across the board.
We did a study in PARC
when we implemented that change
and we put users in hearing aids
for the first time in the Infinios,
and we found that compared to
a competitive device, 93% of the users
had a spontaneous preference for that
APD first fit that we implemented here.
So we believe from an HCP standpoint,
where you want to put
hearing aids on a person and really
just bowl them over with delight
from that first moment that they’re
going to get that reaction.
– Okay.
So in other words, 93% of the time
people are satisfied with the first fit.
– Exactly right.
– Okay.
And then in the other cases, you come back
and do real ear and all the rest of it.
But a person’s perfectly satisfied
with the first fit
93% of the time.
– Right.
– Okay.
Which has interesting implications for
telecare as well, because if I, for
example, I’m in a rural area,
– Uh huh
– You could then theoretically
give a pretty good experience
with a remote first fitting,
even if they come back to an audiologist
in person later for fine tuning.
– Yep
– But you’re able to deliver
a pretty good experience for a person
who cannot easily get to
– Yep
– An in-person clinic.
– And for sure, you know, we enable people
to do fine tuning remotely as well
for patients that don’t
happen to be able to come in
easily, or are remote.
But yeah, very highly satisfied.
Sound quality out of the gates.
– Okay, interesting.
But you also have a thing
called the AI dome proposer.
What is that?
– Okay, so this sounds like a funny thing
for us to talk about in a world where
the other topics that you’re exploring
with this device are,
you know, 4.5 million connections in a
deep neural network with online
real time signal processing.
You know, the dome feels
like kind of a low tech
pivot from that.
But the reality is everybody fitting
hearing aids knows
that those fittings are won and lost
with acoustic coupling
and that you can give it the best fancy
signal processing in the world
and put the wrong acoustic coupling on it
and really have a mess on your hands.
And that is something that is very hard
to fix on a
remote teleconsult.
And so the AI dome predictor has been
trained across multiple dimensions.
So it’s looking at indicators
of satisfaction and use time and benefits
and also
all kinds of factors that we could pull
out of the target software
in order to create a proposal for the HCP
about what is going to be the best dome
for both audiological performance
and long term satisfaction.
And so with that kind of training in mind,
we’re giving audiologists now
the advice of what is the best possible
starting point for this patient to achieve
those outcomes, which is generally
what the HCP is after
and limit the chance that they’re going
to have to come back and address
the subsequent sound quality issues
or other tolerance issues.
And again, lead to that best impression
of sound quality possible.
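To make the idea concrete, here is a minimal, hypothetical sketch of dome selection framed as a scoring problem. The dome names, feature dictionary, and `model.predict` interface are invented for illustration and are not Phonak Target's actual logic.

```python
# Hypothetical sketch: dome selection framed as scoring candidates with a
# trained model. Dome names, features, and the model interface are invented
# for illustration; they are not the Target software's actual implementation.
DOMES = ["open", "vented", "closed", "power"]

def propose_dome(patient_features: dict, model) -> str:
    """Score each dome for predicted benefit, adherence, and satisfaction."""
    scores = {}
    for dome in DOMES:
        candidate = {**patient_features, "dome": dome}  # audiogram, gain target, use history, plus the dome
        scores[dome] = model.predict(candidate)         # higher = better predicted long-term outcome
    return max(scores, key=scores.get)                  # proposal shown to the HCP, who makes the final call
```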
– Okay.
So the dome proposer then is working on
the basis of the patient’s audiogram,
correct? And then what?
Then from the audiogram,
you know what gain profile
you’re going to deliver
and then you’re recommending
what dome would work best with
that gain profile.
– 100 percent, but also using multiple
dimensions, including long-term
use and treatment adherence,
a lot of things to layer on to even just
what is going to give you the gain
that you need, but also what’s going
to give you the long term satisfaction.
– Okay, got it.
So with the new chip,
the ERA chip, you’ve further
improved the sound quality
and now with the dome proposer,
then you’re going to make sure you deliver
the optimal experience by recommending
what would be the best dome
for that audiogram and that fitting.
– Exactly right.
– And then you also have this thing
called the acoustic optimized vent.
– Yep.
– What is that?
– So that is related to products
that we’re custom fabricating,
whether it be an earpiece or a
custom ITE. And this is a way
of looking at all of the acoustics
of the ear in order to ensure, again,
that right balance between sound quality
and audiological benefit.
And so this is a tool that’s been in place
for a while and in combination
with our biometric calibration,
which helps us to optimize
the fitting of the device and the
maintenance of directivity with the device
between those two things
solving for both sound quality
and audiological performance.
We believe that gives you
a really great starting point
for anything that you’re going
to fit custom in your patient’s ear.
– Okay, so essentially the
shape of the ear
canal is partially dictating
the construction of the vent.
– Exactly right.
– Okay – Yeah – Okay
And you mentioned,
since we’ve talked about custom products,
you also have a custom rechargeable now.
– We do! So Virto
Infinio rechargeable is a
fully rechargeable product
that has all the benefits of Infinio,
but for patients
who want that custom form factor
and also the convenience of
rechargeability.
– Okay.
And how does that physically work then?
Is it difficult
to put the hearing aids in the case?
How are you doing in a custom form factor
to, you know, make a good contact
and charge the hearing aids, especially for
somebody who might have dexterity issues?
– Yeah, thanks for asking.
Actually, the design of the charger
was one of the key priorities
in the development of that product: we
didn’t want something that was clunky
or difficult to maintain contact with.
And so you’ll see that
there’s actually a magnetic connection
between the charger and the hearing aid
that even if you were to shake the thing
and have them tumble around a little bit,
they would come back and reseat themselves
in exactly the right position
to maintain charging.
So it’s quite an easy thing
if you just sort of drop them in the right
direction, they’ll snap in
and give you reliable charging.
So very easy to use and very friendly
for anybody with dexterity issues who
often in general
can be very successful with a one piece
custom product.
So getting it in and out of their ear
is fairly simple, and then dropping it in
the charger has really been developed
with simplicity.
– Okay. And I know personally from having,
you know, started with the Marvels
that you’ve been
using Bluetooth Classic and so you have
universal connectivity.
– Yep.
– And I’ll tell you, even the Marvels
when I got the Marvels, now Bluetooth
earphones have improved a lot over time.
But when I got the Marvels in 2018,
I thought, these are the best
Bluetooth earphones I’ve ever had.
Like they paired faster and connected
more reliably
than half of the consumer products
that were out there.
But you’ve made further improvements.
What are those?
– That’s right.
So another piece of sound quality
and customer satisfaction
these days with hearing aids
very much has to do with the connectivity
to all of one’s digital universe, right?
So with the ERA chip and Infinio,
the wireless transmission
power is four times
greater than previous generations.
And so, for starters,
– Still having all day battery life.
– All day battery life.
So with an Infinio RIC,
even
with the Sphere product
where the DNN is activated,
you will get a solid
16 hours of use per day, including 8 hours
of streaming and activation
of the DNN in noise.
So a full day of use.
And the key with the connectivity
now is we took what was already
the most universally capable
hearing aid, which was the Lumity
with the universal Bluetooth.
Still the only products that have
universal Bluetooth connectivity.
And we’ve made that transmission
power four times stronger.
So the seamless connectivity, but also
the stability of that connection
has really been enhanced.
So for instance, I don’t know
if you ever went into this, but
sometimes somebody would have
their phone in a position
where there was some kind
of barrier or isolation
between the phone and the ear
that was primarily connected to the phone.
And there were times
where because of distance
or some isolating factor,
that signal would drop out.
Now we’ve got even a stronger
connection to begin with,
but we also have adaptation
to where that phone is connecting.
So either ear.
So if there’s one clear ear pathway,
it will choose that one.
So we have seen super robust connections
and then also improved
in the switching behavior
between multiple devices.
So if you have to go between devices
or between acoustic and streaming,
that can happen instantaneously
without even missing a word.
So I think the user experience
with all of the digital universe
and this product is really exceptional.
– Well, I’ll personally name
the devices – my Android phone,
this iPad, two PCs, you know
I mean, I connect to all of them.
But you mentioned something
which I think is worth explaining.
Bluetooth classic was never meant
for true wireless devices.
So what you’re doing is,
the signal goes to one ear
and then you’re passing it from
ear to ear to the second ear.
– Correct.
– That’s how Bluetooth classic works.
LE Audio has an independent stream
for each ear.
And so it begs the question,
will this device
be ready for LE audio
and will it be ready for Auracast
when Auracast transmitters
start to appear in different venues?
– Yep, great question.
We’re super excited about Auracast.
I mean, what a huge patient benefit
to have this universal accessibility
and when it is available for patients,
we want the Phonak devices to
be the enabler and the connection,
the gateway to that technology.
So the ERA chip is Auracast ready.
It’s Auracast enabled.
And when the time is right
and those installations exist,
we look forward to being able
to activate that feature for our patients.
– Okay. Got it. And what about telecoils?
How many of the six models
have telecoils on them?
– Yep, well, we know that there
are people that like telecoils
and we will always have products available
that include telecoils.
The Infinio platform is focused
on the Bluetooth Classic,
as well as about four other wireless
protocols for maintaining accessibility
among all kinds of conditions,
including TV streaming, including
Roger connectivity,
including Bluetooth low energy
for the data, communication with the app
and in the future, also Auracast.
– So a good question to ask
is whether you still have the same
compatibility with the Roger microphone
that’s in that bag over there
and the TV streamer I have at home.
– Correct.
– Okay.
But are there telecoils
in any of the models then?
– Not in the Infinios.
– Not in the Infinios, okay.
So use one of the previous models
if you want telecoil capability.
– There are a lot of Lumity options
for somebody who wants a telecoil.
– Okay, got it.
Is there anything else we should know
from an audiological
point of view about this device?
What makes it different?
Why would somebody want to choose
an Infinio over Lumity?
– One thing that we haven’t talked
about is reliability.
And reliability with
hearing aids has been something
that has not always been
the primary focus of this industry.
We know that on average, HCPs report
spending 20 to 30% of their time
troubleshooting, doing minor repairs, and
dealing with Bluetooth
connectivity issues with patients.
I mean, HCPs
will sometimes remark about
being the local genius bar
for all the different
needs of somebody wanting to connect
their digital world to their hearing aids.
And so we’ve really put a focus
on how to take exceptional care
that the reliability of these products
is setting a new standard.
And that means with the wireless stability
and trying to prevent
any breakages of that,
that results in a call to the HCP,
but also the devices themselves.
So these products have been through
thousands of hours of testing
and go through 135 different tests
in order to ensure that they can withstand
the daily life of their users.
– Okay. Very good. Well, thank you.
I really appreciate
you spending some time with me.
– Thanks, Andy.
– You’re welcome.
– As I mentioned at the beginning,
the big news is the model named Sphere
symbolizing spherical hearing
and launched creatively enough
at the Sphere in Las Vegas on August 7th.
In New York, I got a
head start on the details
from Henning Hasemann, responsible
for training and implementing the AI model.
Before we hear from Henning,
let’s have a brief look at how AI
has been implemented
in hearing aids to date.
Forgive me
if I oversimplify while keeping it brief.
Running an AI program requires
a deep neural network, or DNN,
which is a structure that mimics the way
a human brain processes information.
A DNN is trained by providing a series
of inputs and the
corresponding desired outputs.
As a result of this training
the DNN can deliver a valid output
even when presented with an input
that is not identical
to what was provided in the training,
just as a human brain can.
Imagine asking someone
who has never seen a cat to draw one.
If you give them only a basic description,
you might end up
with a child’s version.
With additional detail,
the drawing gets more realistic.
Add even more details,
and you will get quite a good rendering.
This is how DNNs work.
Each little nugget of description
is called a parameter.
The more parameters
the DNN can hold and act on,
the closer its output will be to the ideal
over a wider range of inputs.
Try to get too much done
with too few parameters,
and the result will not be good.
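To put a number like “4.5 million parameters” in perspective, here is a minimal sketch that counts the weights and biases in a small, made-up fully connected network; the layer sizes are arbitrary and not any manufacturer’s architecture.

```python
# Count the parameters (weights + biases) of a toy fully connected network.
# Layer sizes are arbitrary illustrations, not any hearing aid's architecture.
layer_sizes = [256, 512, 512, 256, 64]

def count_parameters(sizes):
    """Each layer contributes an (in x out) weight matrix plus an out-sized bias vector."""
    total = 0
    for n_in, n_out in zip(sizes[:-1], sizes[1:]):
        total += n_in * n_out + n_out
    return total

print(count_parameters(layer_sizes))  # 542,016 -- roughly half a million, far short of Sphere's 4.5 million
```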
In hearing aids
a DNN is typically incorporated
as part of the processor chip.
The size of the network,
the complexity of the training
and the number of parameters needed
depends on the task assigned to the DNN
and the number of possible outcomes.
Relatively small DNNs
can be used to match, for example,
the sound scene to the nearest
of some thousands of training samples.
This works well because
similar sound scenes will
have similar hearing aid settings.
Therefore, very large training
sets are not required.
The DNN itself does
not have to be extremely large either,
nor does it have to be particularly fast.
And yet they have greatly improved
the user experience
by automatically optimizing
hearing aid performance
as one goes about their day,
including in noise.
It sure beats trying to guess the best
settings with the app all the time.
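As a rough illustration of how such a classifier sits outside the audio path, here is a minimal nearest-centroid sketch; the scene labels, features, and once-per-second cadence are simplifications, not any manufacturer’s implementation.

```python
import numpy as np

# Minimal sketch of a sound scene classifier steering hearing aid programs.
# Scene labels and features are simplified illustrations only.
SCENES = ["quiet", "speech", "speech_in_noise", "music", "noise"]

def extract_features(frame: np.ndarray) -> np.ndarray:
    """Crude features: overall level plus the share of low-frequency energy."""
    spectrum = np.abs(np.fft.rfft(frame))
    low = spectrum[: len(spectrum) // 4].sum()
    level = 20 * np.log10(np.sqrt(np.mean(frame ** 2)) + 1e-9)
    return np.array([level, low / (spectrum.sum() + 1e-9)])

def classify_scene(features: np.ndarray, centroids: np.ndarray) -> str:
    """Pick the nearest of the stored training examples (one centroid per scene)."""
    distances = np.linalg.norm(centroids - features, axis=1)
    return SCENES[int(np.argmin(distances))]

# This can run once a second or so, adjusting gain, directionality, and noise
# reduction settings; it never touches the audio samples themselves.
```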
Signia provided an
excellent graphic illustrating
another application: identifying
which voices are nearby
and from which direction each one comes
for the purpose of
focusing the mics on each one.
Hearing aids employing this level of
AI have gotten better and better.
As I’ve seen comparing
my two-generation-old Phonak Paradise
to the recently released GN Nexia
I took to Australia
to do live LE Audio and Auracast demos
during my presentation there,
not to mention the base Infinio itself.
The performance difference
with both modern devices over
the older one was obvious.
The Sphere model adds a
completely new innovation.
A much larger, faster and more capable DNN
in a second chip called DEEPSONIC.
This DNN is placed directly in the audio
stream, identifying noise
and removing much of it in real time
while allowing nearby
speech to pass through with minimal delay.
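The key architectural difference is that the network sits in the signal path itself. Here is a minimal sketch of that idea, with a placeholder denoise_dnn standing in for a trained model and an illustrative frame size; it is not DEEPSONIC’s actual processing chain.

```python
import numpy as np

FRAME = 64  # samples per hop; small frames keep latency low for ear-worn use (illustrative value)

def denoise_dnn(noisy_frame: np.ndarray) -> np.ndarray:
    """Placeholder for a trained network; a real model would suppress the noise."""
    return noisy_frame

def process_stream(mic_samples: np.ndarray) -> np.ndarray:
    """Every frame of audio passes through the DNN before reaching the receiver."""
    out = np.copy(mic_samples)
    for start in range(0, len(mic_samples) - FRAME + 1, FRAME):
        out[start:start + FRAME] = denoise_dnn(mic_samples[start:start + FRAME])
    return out
```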
When given a server farm like what
Microsoft uses to clean up audio on Teams calls,
or even a modern smartphone,
one has a lot of processing power at hand.
But to deliver effective speech and noise
separation in-ear with all-day wear,
one has to make careful choices
on both hardware and software.
I really enjoyed the conversation
with Henning as he described
just exactly what it took to make it work.
So I have with me Henning Hasemann.
He’s the director of Deep Learning
Engineering for Sonova and he’s
one of the people primarily responsible
for the machine learning algorithm
within the Sphere.
Thank you for joining me.
– Well, thank you.
– Tell me a little bit about your background
and how we come to be at this point.
– So I started my career,
so I got a Ph.D.
in computer science
and I started as a software engineer, mostly
in the automotive industry
and working in several different
topics there.
And yeah, five years ago
I ended up at Sonova, Phonak
and worked as a deep learning
performance engineer,
more specifically
to bring this product to life that we’re
going to talk about a bit today.
Meanwhile, I’m leading the team now.
– Excellent.
Now, people who’ve seen
my earlier podcasts
and followed some of my work know
I’ve been following the pace of technology
in machine learning, speech from noise
separation and exactly what that means.
I’ve even done some demos with products
that weren’t capable
of being fit in-ear,
but show how that works.
And now you have managed
to put it in a hearing aid.
And so one of the questions I have
and I think other people will have too,
is just how much have you been
able to embed in the device?
In other words, how large is the,
was the training set you used?
How many parameters
are actually in the device itself
and what range of situations are you
capable of extracting noise from speech?
– Okay.
So yeah, I’m going to hit
you with some numbers now.
So we’re talking about 4.5 million
parameters that are on the device.
We have 22 million
sound samples
that we used for training
and this runs in a very wide range
of listening situations and can deal with
lots of different kinds of noise.
But on the hearing aid,
we only use it in the most
challenging situations,
which is speech and loud noise.
So think of a typical restaurant
situation.
Yeah.
As the battery is, of course, the
limiting factor in such a small device.
– Okay.
And for context, when I would talk
with people developing for earbuds,
it was down to how tightly
can we squeeze the number of parameters,
you know, 500,000 and 300,000
to try and get it to run on a chip
that you could actually put in an earbud.
You’re talking what did you say, 4.5 million?
– Yeah.
– And you said you’re also focusing
on the restaurant scenario.
So in other words,
you put a tremendous number
of parameters in the restaurant scenario.
And so then what that means
is you’re very accurately able
to separate out speech from noise.
Would that be a correct way of saying it?
– That’s perfectly correct.
Yes. So the more parameters you have,
the more information
can be in the network, as you can imagine,
and the more clearly and precisely
the network can distinguish
different types of noise
and different kinds of speech.
– Okay.
And let’s put some context to this
about it, okay?
So for example, I’m wearing Paradises
that have binaural directional microphones
and the usual kind of acoustic noise
reduction techniques and so on.
And I get a certain amount of SNR
improvement.
– Mm hmm.
– Okay?
What does this do in comparison
to what I’m wearing?
– When you’re talking about
what’s on the Paradise, for example,
you will have a classical algorithm,
so a rule-based system where,
as a software developer
for the hearing aid,
you provide the rules
and then the system gets some input
and computes some output.
In deep learning,
What we do is we give the input
and the desired outputs
and then we run a procedure
that’s called training.
And with the parameters that we provide
in the training process,
we derive the rules.
So the network learns itself basically.
What are the rules?
How do I distinguish speech
from noise
in every single relevant situation?
– Right.
So if I were to rephrase
that, what I have now and what everybody
has in their devices right now is really
sound scene classifiers, right?
It’s more
or less a convenience feature.
In other words, the automatic modes work
very well because you’re identifying
what sound scene I’m in and doing
classical hearing aid adjustments with it.
– Yeah, so that’s a good point.
We have had machine learning for classification
in hearing aids for quite some time.
That’s usually
more on the machine learning side,
not so much on the deep learning side.
So that’s a simpler class
of AI algorithms
where you also need more engineering
and more time tinkering to get it to work.
And it’s also a simpler problem,
because you have a number of sounds
that you can classify: you can be in a church,
or in a restaurant, or in silence,
or listening to music, or
whatever scenes you may have.
But that’s it.
And then if you look at speech,
where every fraction of a second
you sample
lots of different frequencies,
there is much more information
to get right.
So it’s a much harder problem.
– And this is in line. The sound
classifier is actually not in line
with the audio stream, it’s controlling
the hearing aid settings.
– Exactly.
You are now in line with the audio stream.
– Exactly.
The sound classifier is separate
from the audio stream.
It’s time-wise decoupled,
so you can afford to only classify
once per second or whatever.
You don’t need to classify
every single fraction of a second, right?
You don’t even want to
have it change too frequently anyway.
But with sound cleaning,
every single sound bit,
every bit of information that goes in
needs to be cleaned, and all
that needs to go through the DNN.
So you need a lot of computational power
to make that happen.
– And so then you’re able to actually
identify which component of this complex
audio spectrum that’s coming
at me is noise and which is speech;
you pass on or recreate the speech
and you reject the noise.
Is that a correct understanding?
– Yeah, that’s actually
a nice way to put it.
If you want to get really technical,
we compute a mask, a complex mask
that we multiply with the audio
stream of every single frame.
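For readers who want the “complex mask” idea spelled out, here is a minimal per-frame sketch; the mask here is a placeholder argument, whereas in the product it would come from the trained DNN.

```python
import numpy as np

def apply_mask(noisy_frame: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Multiply the frame's spectrum by a (complex) mask, then return to the time domain."""
    spectrum = np.fft.rfft(noisy_frame)          # frequency-domain view of one frame
    cleaned = spectrum * mask                    # a complex mask can scale and phase-shift each bin
    return np.fft.irfft(cleaned, n=len(noisy_frame))
```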
– And so then as a practical matter
for a hearing aid user,
let’s talk about SNR
improvement first.
And what does that mean for the ability
to understand speech
in a challenging, noisy situation
compared to what classical devices do.
– So we get 10 dB SNR improvement,
which is unprecedented.
This is 3.7 dB more than any hearing aids
so far could do.
What does it mean in practical terms?
You’ll understand
2 to 3 times
more of the words that you’re hearing.
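As a back-of-the-envelope check on what those decibel figures measure, here is a small sketch, assuming you have the speech and noise available as separate signals:

```python
import numpy as np

def snr_db(speech: np.ndarray, noise: np.ndarray) -> float:
    """Signal-to-noise ratio in decibels, from separate speech and noise signals."""
    return 10 * np.log10(np.mean(speech ** 2) / np.mean(noise ** 2))

# A 10 dB SNR improvement means the noise power relative to the speech drops
# by a factor of ten, since 10 * log10(10) = 10 dB.
```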
– Okay, so I’m getting 10 dB improvement
in SNR over nothing
and say between three and four dB
improvement over a classical hearing aid.
– Yes.
– Okay.
So yeah, that’s a hard-won
three to four dB.
I mean, typically when you
go from generation to generation,
like for example, my Marvels
to my Paradise, what were we talking,
maybe one dB improvement,
something like that.
One, one and a half.
– I don’t know, like.
– Certainly not three.
So three is very hard won.
How many years were actually involved
in making this happen?
– That was five years.
So we started in 2019.
Yeah.
And it was quite a bit of work
because we not only
built the DNN, but we built the hardware,
the DEEPSONIC chip, with it.
And it was a very challenging way
to have this co-design.
We had the first DNN on a laptop running
and that would be heating up
when I was doing the demo, right.
And then at some point
we brought it down to a phone and yeah,
now we have it in the hearing aid.
– And even training the hearing aid,
training the model, for example,
what goes into training the model
and what kind of computational complexity
was involved?
So you have to imagine hundreds of GPUs,
and that’s not
your gaming graphics card
that you would buy in the store,
but that’s professional
grade, deep learning GPUs,
and then you train for months and months.
You train multiple of
these models actually,
and then you select the best ones of them.
And then you do a lot of listening tests,
go maybe back to the drawing board,
try to figure out what the target function
for the next run should be,
because you have some idea of what
the mathematical goal is to optimize for.
But you of course
have to constantly reality check that.
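Here is a minimal sketch of what one training step over noisy/clean pairs might look like, assuming a generic model object with forward, backward, and update methods and a simple L1 target function; all of these are placeholders for illustration, not Sonova’s actual training code.

```python
import numpy as np

def training_step(model, noisy_batch: np.ndarray, clean_batch: np.ndarray, lr: float = 1e-4) -> float:
    """One optimization step: compare the model's output to clean speech and nudge the parameters."""
    predicted = model.forward(noisy_batch)                   # the network's attempt at clean speech
    loss = float(np.mean(np.abs(predicted - clean_batch)))   # the "target function" to minimize (L1 here, illustrative)
    grads = model.backward(loss)                             # gradients of the loss w.r.t. all parameters
    model.update(grads, lr)                                  # adjust millions of parameters a tiny amount
    return loss

# Repeated over millions of noisy/clean sound samples across many GPUs; the
# target function itself gets revisited when listening tests disagree with it.
```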
– Okay.
And how many GPUs did you say
you were running?
– Hundreds.
– Hundreds.
And so a lot of people
are going to be familiar with Nvidia
and what they’ve been doing.
And maybe you’re not using Nvidia’s
but just as an example.
Everybody’s been following,
you know, their AI activity
and how their stock has gone up so much
because everybody uses their GPUs.
And you’re running hundreds of those.
So this is a really sophisticated effort
to train the model.
– Yeah, that’s
that’s a lot of investment that we did,
but it’s not the only thing.
It’s also getting all the data,
getting the data into a really good shape.
So you want to have a lot of noise data
that covers a wide array of situations.
You want to have lots of speech data
that covers a wide array of situations.
You don’t
want the network to focus only on male speakers
and neglect your female speakers;
all of those should be equalized,
all these kinds of things.
Yeah.
And you have to do a lot of tuning
and engineering and of course
a lot of audiology also goes into this so.
– Well, I’m very much looking forward
to trying the device. I don’t,
I can’t even count anymore
how many different ear worn things
have gone into my ears and I’m very much
looking forward to trying this one.
Thanks for spending some time with me.
– Well, thank you.
“Spherical hearing” refers to
the fact that the DNN removes noise
regardless of direction,
enabling one to hear everyone
around them equally well,
regardless of where they’re positioned.
Some directionality
can still be used to deliver
even better performance toward the front,
but the DNN does not depend on it.
I wore the Spheres in
New York City, Las Vegas
and in various settings in my home town.
I can tell you the AI
speech and noise
separation really works well.
I believe in the future
we will look at this as a seminal moment
in hearing device evolution, just as we do
with digital hearing aids today.
With any radically new technology,
the question becomes,
where do we go from here?
In a hallway conversation
at the launch event in Las Vegas
Stefan Launer touched on their initial goals
for the development of Sphere.
Then we discussed the different directions
they could take it
as part of a concerted effort
to meet the needs of people
with all levels of hearing loss,
including those with normal audiograms
who have difficulty hearing in noise.
I’ll let him have the final word.
So I have with me here Stefan Launer
Stefan, thank you for joining me.
Please tell everyone
a few words about yourself.
– My name is Stefan Launer, as you said.
I’m the VP of audiology and
health innovation.
I’m a physicist by training.
So I did my PhD thesis on basic hearing
science and hearing impairment.
I joined Phonak in 1995,
now Sonova, and I have
always been involved in
collaborating with a lot
of external academic partners,
driving research in hearing science,
in signal processing,
and in also hearing care delivery models
and a lot of technology developments.
– Terrific. Quite a background
and experience
you bring to the table,
which really culminates
in the development of the Sphere,
in particular, which is what
I want to talk with you about.
First off, congratulations.
I’ve been wearing the Sphere
since Sunday now
and got a lot of experience with them
and the in-line DNN really works.
– Thank you.
– Very nicely done.
– Thank you.
– And I’d like to talk about that
a little bit more, because it’s clear
that you’ve been working on this
for some time, as far as I could
tell, at least since 2019
and maybe a little bit before.
How did you actually start
to think about in-line
deep neural networks and the fact
that you could apply them to hearing aids?
When did that first
thought process begin
and set off the timeline
that led up to today?
– So for me, I mean, I did my Ph.D.
in a group where they already in ’95
and then afterwards until 2000,
worked on applying neural networks
for speech enhancement
in all sorts of configurations,
but it never worked.
So neural networks as a toolset
have been around for a long time.
And by 2005, you know,
the work, the research on DNNs
really took off.
And we saw a lot of powerful developments.
And we were following this field
very closely.
We were scouting, exploring a lot.
We had our own team trying out things.
And I think it was around 2016, 2017,
when we started to realize, hmmmm, it’s
getting close to these things becoming
potentially possible.
And in 2018, 2019, we really
realized, oh, this is the moment
where we have to take a decision
and go all in
and develop such a solution.
We realized the DNNs, the large
DNNs had become powerful enough
to really provide a significant
improvement in speech intelligibility.
The only downside was
they were pretty big.
So that was the bet
that we had to take back then.
– And it’s interesting
because I’ve been watching it too,
thinking about it in the consumer world
and even at CES a couple of years ago,
I did some interviews and live demos
with some of the startups working on this,
the problem being that you couldn’t
get enough parameters
into a viable chip for in-ear,
so there were limitations.
A lot of demos are being run on
smartphones. – Yeah
– I’ve been following
the consumer chip development, wondering
when it’s going to get big enough
to do something really interesting.
You obviously had to take your own path
to shortcut that process.
– Well, what we did and that is something
we have always been doing.
We always wanted to be in control
of our own destiny.
And that’s why we kept
developing our own chips over many years.
We did that with the digital chips.
We did that with the wireless radio.
And by 2019,
we also decided, hey, these DNNs.
They have become powerful.
We should try to get a chip done
that runs on the power budget and size
constraints of a hearing instrument.
And that is powerful enough
to really run a large scale
DNN optimized for the
task of speech enhancement.
So it’s a very special network structure
with millions of parameters,
and we decided to develop this chip
and that’s what we have been doing
over the past years.
– And it’s interesting
to think about exactly how you did it
because you’re getting about
10 dB noise reduction.
And I’ve seen, you know, demos
where you could get far more than that,
but they have other limitations; for
example, you don’t have enough parameters
to be able to fit in-ear
and handle all the world’s languages.
And so what you’ve done is
you’ve made it work very well
with 10 dB noise reduction,
which in the hearing world is a lot,
but below what the
ultimate capability would be
When you have a larger model
and more parameters.
I can think of some other use cases too,
but I want to ask you,
where is this going?
And now I understand
you probably have version two of the chip
being taped out right now
and you’ve got more room to play.
In what directions
will you take this going forward?
– So first of all,
we have taken a first step,
but when you take a first step,
you always know, oops,
we could have done better here,
and here, and here.
So lots of learnings.
So one of the first, or next steps
is optimize the current solution
in terms of computational complexity,
computational efficiency
and the chip architecture.
So we are definitely evolving
the solution we have
that’s a clear pathway for us
and there is quite some room
to improve here.
But we are also thinking about lots
of other applications
and in the world of hearing care,
the one question I’ve always received
jokingly by the audience
of hearing impaired people is
when do I get the spouse enhancer?
Or sometimes the spouse canceller,
the idea of a signal processing tool
that can pick out a specific
voice and amplify it.
So speaker tracking is
an age old topic, an age old question
and things like that.
Identify specific scenes,
identify specific
target signals,
pick them out, enhance them.
That’s a next big step that we
and the whole research
community is working on.
And then you can also apply larger scale
DNN models to identify acoustic scenes
and to combine them with other sensors,
integrate different types of information.
So we now have a completely different way
of computing things.
We have this tool of powerful DNNs
in the hearing instrument
and we have lots of different
applications now ahead of us.
– So that makes perfect sense,
because in experiencing the DNN,
it’s very, very good.
The voices are very natural.
So you haven’t tried to press
for the absolute maximum
amount of noise reduction in exchange
for making the voices sound less natural.
– I mean, this is an interesting point.
When you design a hearing instrument,
you have to have a hearing instrument
that is natural and authentic.
There is no point in a noisy restaurant
to kill the entire noise floor
because it tells you something
about the environment.
It gives you information.
When the servers, the waiters are
approaching you and asking you questions
and you need to be aware
of what’s going on around you.
So it’s a subtle balance
of enhancing the speech signal
while still maintaining
environmental awareness.
– Yeah.
It makes perfect sense.
And I found in trying it,
there were times when I would,
I had created a mode where I went
really directional with the mics
because you generally want omni
so that when the server
comes up over here and addresses
you, you have heard them straight away.
But on the other hand,
there’s a loud person at the table
next to me I have no desire to hear.
So then I would take additional advantage
of the directional mics
and the DNN simultaneously.
But then that made me
think, what you just said too,
there are actually consumer
solutions coming out now
that are smartphone based, that are
actually doing voice identification.
And so you can say,
I want to attend to this voice,
I want to attend to this voice.
But I almost thought about it
in the opposite.
If you’re trying to take a user,
especially one that’s only moderately
adept with a smartphone,
most of the time, you want it
running automatically.
You don’t have to do a thing, but
a voice rejector would be interesting.
– So a voice rejector in that
you really suppress certain voices
or what do you mean?
– Yeah, exactly.
So if I’m at a table at a crowded restaurant
and it’s all good, I’ve got two
or three people here and I’m hearing them,
you know, perfectly well.
So the deep neural network running in more
or less omni directional mode.
But there’s a really loud person over here
I don’t want.
– Yeah. Yeah.
So that’s an interesting point.
You know, you could do this
by placing a strong notch behind you
in a beamformer, but that’s
then difficult if you move your body
or the person moves or you could,
you know, try to detect the voice
or the sound that you want to suppress
and briefly train a network.
And these are also solutions
that various groups in the world
are working on trying to identify.
But we’re trying to find out
how do we handle the logistics,
because when you are in this situation,
you need to have a way
that is efficient to say it’s
this source that I want to cancel.
And how do you identify it?
How do you train your network
and how do you stabilize it?
So I think the technology
is there to do it.
The question is more
how do we operationalize it and how do we
integrate it in a very usable way for the
for the user of the device.
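For context on the beamformer approach Stefan mentions, here is a textbook-style sketch of a two-microphone first-order differential beamformer with its null steered toward an unwanted direction; the mic spacing and free-field assumption are illustrative, and this is not a description of any Phonak feature.

```python
import numpy as np

def null_steer(x_front: np.ndarray, x_rear: np.ndarray, null_angle_deg: float,
               d: float = 0.012, sr: int = 16000, c: float = 343.0) -> np.ndarray:
    """Subtract a delayed rear-mic signal so a plane wave from null_angle_deg cancels.

    Angles are measured from the front; null_angle_deg should be in the rear
    half-plane (> 90 degrees) so the required delay is non-negative.
    """
    tau = -d * np.cos(np.deg2rad(null_angle_deg)) / c     # inter-mic delay for the unwanted direction
    freqs = np.fft.rfftfreq(len(x_front), 1.0 / sr)
    X1, X2 = np.fft.rfft(x_front), np.fft.rfft(x_rear)
    Y = X1 - X2 * np.exp(-2j * np.pi * freqs * tau)       # fractional delay applied in the frequency domain
    return np.fft.irfft(Y, n=len(x_front))

# The catch Stefan points out: if the wearer or the unwanted talker moves,
# the null has to be re-steered, which is why a learned, voice-specific
# approach is also being explored.
```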
– So in the DNN,
how did you bring about the development
of the testing over this timeline?
– Yeah.
So when we developed the DNN,
we had partners we worked with
and tools to help us build skills
in deep neural network
technology for speech enhancement
and in the computational optimization.
We built a lot of internal knowledge,
and we also had partners
who helped us develop the chip technology.
So we really applied our typical
open innovation model.
And the crucial point also was
if you do a cutting edge
development like this
development of this chip,
it’s never a straight development process.
It’s a pretty exciting and at times
nerve wracking rollercoaster ride.
And we had to put a lot of energy in also
testing the solution along the way.
So we also developed another DNN
that we trained on human subjective ratings.
So we had hundreds of people
on the Internet rate
sound samples in terms of sound quality
and speech intelligibility.
We used these ratings to train another network,
and that helped us
to select the optimal architecture.
We did a lot of
testing with prototypes,
with technical measures,
and we did a lot of testing with subjects
in different laboratories,
also with external people.
And this work has also been published.
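A minimal sketch of how such a rating-prediction network can be used to compare candidate denoisers; `quality_model` is a hypothetical stand-in for a DNN trained on the crowd-sourced listening scores, not Sonova’s actual tooling.

```python
import numpy as np

def rank_candidates(candidate_outputs: dict, quality_model) -> list:
    """Score each candidate model's processed clips and return the candidates best-first."""
    scores = {}
    for name, clips in candidate_outputs.items():
        scores[name] = float(np.mean([quality_model.predict(clip) for clip in clips]))
    return sorted(scores, key=scores.get, reverse=True)   # architectures ranked by predicted listener rating
```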
– Okay.
And so you’re really,
having a process
that’s running in parallel
because you’re developing the model,
which is informing the chip development,
what the needs of the chip were
and at the same time
as you’re starting to tape out the chip,
that’s going to inform what you can do
in terms of the model. -Correct.
– And so you really had to meet
these two together. – Yeah, absolutely.
So with the development of the chip
we started the development
of the integration of the algorithm
in a full blown hearing instrument
and to test it under
really realistic conditions
with prototypes at different levels:
on a laptop, on a smartphone
with integrated
hearing instrument functionality,
and at the end, in prototypes
of hearing instruments. And we
always tested that at least in two labs
with different languages, different
background noises, different acoustics.
So it was quite a bit of testing
to really be sure the thing works.
– Yeah, which certainly explains
the long development timeline.
Anything this new that’s never been done
before is going to take a while
to get it right.
– Yes
– And I can tell you,
as an actual hearing impaired person,
if you don’t get it right, things
start sounding unnatural.
You start to lose directionality.
– Yeah.
– You start, you know, the voices
start sounding unnatural,
they become less intelligible.
And that’s, we really don’t
want to go there.
Now, when you think about
hearing care more generally,
this capability in particular,
what additional needs can you meet with it
that you aren’t meeting now, for example,
I think about the 25 million people
that NAL identified
as having normal audiograms
but difficulty hearing
speech in noise.
How do you see this
technology helping there?
– See, I think there was quite
a bit of convergence in general
between consumer audio
and hearing care.
And we still have quite different
requirements
and lots in terms of usability
and things like that.
But when we think about technologies
like the active noise cancelation,
the consumer or now this very powerful
speech enhancement tool that we have,
I see quite a lot of benefit
if they converge.
And especially this technology now
is also beneficial for people
who have listening difficulties
but normal audiograms. A lot of my team
colleagues have been wearing that and said
oh that’s quite helpful in a
really noisy place, in a noisy bar.
They benefit
quite a bit from that.
So I see really quite some
application of this technology.
Also for this segment
we have to solve the wearing comfort
because you have to close the earmolds
a little bit to really
have powerful results.
But that’s things that we can handle
and then we have to increase
the acceptance of these devices.
But it offers us a lot of
opportunities moving forward.
– Well, and acceptance of these devices
is a great lead in
to what I’ve been thinking
about for a while.
And that is how do we further increase
the adoption rates reaching populations
who either don’t have access today
or are hesitant to use today’s solutions?
Looking holistically
in terms of devices, in terms
of how hearing care is delivered,
and in terms of how we message
the benefits of treating hearing loss
to the general population,
what do you see as the
key strategies going forward
to increase adoption rates
on all of those axes?
– Yeah so first
and foremost,
I would also like to appreciate
and emphasize that, as we can learn
from MarkeTrak and EuroTrak,
over the past 20 years,
especially in countries like the US,
like Germany, with a well-developed
infrastructure for hearing care delivery,
adoption has significantly increased.
It’s not 100%; it’s below 50%
in most countries,
but it has doubled over the past 20 years
and we should appreciate that.
I think what is contributing to that
is the performance of the hearing
instruments in general has increased.
It’s also something we can see
when we look at the wearing time
of hearing instruments, there are
several studies published by several
research groups from
different manufacturers,
also talking about hearing instruments
being used on average 12 hours per day
and the number of devices
in the drawer has decreased.
So it shows that we probably have
raised awareness and talked
about the importance of hearing care.
And I think that’s a general theme
that we need to drive forward
even more to emphasize
how important hearing is beyond hearing.
You know, hearing is important
for social interaction.
Hearing is the sense
that helps us as a social group
and it helps us connect with
friends, with families.
We have learned from a couple of studies
how hearing care contributes to
maintaining cognitive health, the ACHIEVE
study, especially in people at risk.
We have seen other correlations between
hearing care and healthy living and aging,
and I think that’s a major theme
we have to keep driving forward
to emphasize to the broader
population about the importance
of hearing and hearing well.
Moving forward, I think we also have
to become more specific
because we love to talk about people
with hearing loss, and then we talk about
people with a mild
hearing loss, and we put them
all in the same basket.
People with a mild hearing loss, people
with a profound hearing loss.
I think we have to become more specific.
Identify what is the target group
we are trying to reach and then identify
how do we talk to this target group
and what are the products that
we are offering to this target group?
I think that’s also an important
discussion to have in terms of different
target groups and their needs
for listening devices: the technology,
education, awareness, why it matters,
and maybe also models of care delivery.
We always talk about OTC
or not OTC,
instead of thinking about:
wait a minute, which target group,
which needs, which model,
and how can we blend different care models
to become a continuum?
So I think that’s something we should be
working on more in the future.
Be more specific.
What hearing care means
for different target groups
we are trying to reach.
– I really love that
because it’s a line.
It’s a continuous line, right.
– And to divide it up
between one and the other, I think
leaves a big hole in the middle
in the way we talk with people about it.
But I also think that’s actually probably
the biggest benefit of OTC
is that it started this conversation.
I mean, in the general public,
at least in the US,
there was a whole lot of press
around hearing care
and the importance of hearing care
with the arrival of OTC – Yeah
– And now, when you think about it,
of course you’ve got Sennheiser
on one end of the line,
so you, you know,
can do the whole continuum
of hearing care
and I’m really looking forward
to seeing how you address all levels
of hearing loss with what those people’s
needs are, how you can deliver
care to those people, and give
them a more satisfying lifestyle.
I’m really looking forward
to seeing you go forward doing that.
– I mean, this is something
we as an organization certainly drive.
But I think this is also something
the entire community has to pick up
kind of as a task.
– I completely agree with you.
Well, listen, thanks a lot.
I really appreciate you spending some time
at a very busy conference.
You had lots to do and yet
you took some time to talk with us.
I very much appreciate it.
– Thank you, my pleasure.
Thank you very much.
– Thank you.
Be sure to subscribe to the TWIH YouTube channel for the latest episodes each week, and follow This Week in Hearing on LinkedIn and on X (formerly Twitter).
Prefer to listen on the go? Tune into the TWIH Podcast on your favorite podcast streaming service, including Apple, Spotify, Google and more.
About the Panel
Christine Jones, AuD, is the Senior Director of Marketing for Phonak and a research audiologist with a background in clinical audiology, including work with both pediatric and adult patients. She has led the Phonak Audiology Research Center (PARC) and currently applies her clinical expertise to marketing and brand communications for Phonak in the U.S.
Henning Hasemann, PhD, is the Director of Deep Learning Engineering at Sonova, with a Ph.D. in computer science and extensive experience in software engineering, particularly in the automotive industry. He has been instrumental in developing the DEEPSONIC machine learning algorithm used in the Phonak Sphere, a breakthrough in real-time speech and noise separation for hearing aids.
Stefan Launer, PhD, is the Vice President of Audiology and Health Innovation at Sonova, with a background in physics and a Ph.D. in hearing science and impairment. Since joining Phonak in 1995, he has driven research in hearing science, signal processing, and hearing care delivery models, contributing to major technological advancements in the field.
Andrew Bellavia is the Founder of AuraFuturity. He has experience in international sales, marketing, product management, and general management. Audio has been both an abiding interest and a market he has served professionally in these roles. Andrew has been deeply embedded in the hearables space since the beginning and is recognized as a thought leader in the convergence of hearables and hearing health. He has been a strong advocate for hearing care innovation and accessibility, work made more personal when he faced his own hearing loss and sought treatment. All these skills and experiences are brought to bear at AuraFuturity, providing go-to-market, branding, and content services to the dynamic and growing hearables and hearing health spaces.