Exploring Phonak’s Revolutionary Infinio Hearing Aids with Those Who Made It Happen

HHTM
August 20, 2024

Host Andrew Bellavia takes us on an in-depth exploration of Phonak’s latest hearing aid platform, Infinio, which features a number of benefits both for the hearing professional at the fitting and for the end user in daily life. The most exciting innovation is in the model named Sphere, which incorporates a powerful deep neural network chip to perform speech and noise separation in an ear-worn device for the very first time.

Andrew speaks with three key Phonak team members to go beyond the headlines while providing a basic overview of AI in hearing devices, and why Sphere is a seminal moment in hearing device innovation.

Christine Jones, currently Senior Director of Marketing and VP of Audiology at the time Infinio was kicked off, takes us through everything new across the range of products. Henning Hasemann, Director of Deep Learning Engineering and the person who led the team that developed the AI model in Sphere, describes how it removes noise in real time while maintaining natural sound across different languages, speaking styles, and environments, and names the specific benefit to hearing-impaired people versus the classical acoustic methods employed in hearing devices today. Then Stefan Launer, VP of Audiology and Health Innovation, provides additional context on the broader implications of this innovation for the future of hearing care, offering a glimpse into how AI and sophisticated engineering are set to redefine the possibilities in hearing technology across a continuum of needs.

Full Episode Transcript

Hello and welcome to This Week in Hearing

I no longer remember how many times
I’ve opened the podcast

by pointing out that speech

in noise is THE last frontier
in hearing device performance

and therefore the focus of more than one
recent product announcement.

Sonova is no exception.

After a period of incremental improvements
comes a major new line for Phonak

called Infinio, with several models,
including a custom rechargeable and a CROS,

all based on a new chip called ERA.

But the big news is
the addition of a second chip

called DEEPSONIC in the Sphere model.

DEEPSONIC is designed
specifically for providing real time

separation of speech from noise,
a first for an in-ear device of any kind.

We’ll get into that,
and consider it in context of overall AI development.

But first, let’s hear from Christine Jones

what’s new across the Infinio line
and how it impacts both hearing

care professionals and their patients.

I met up with her at Phonak’s media event
in New York City a week before the launch.

I have with me Christine Jones.

She’s the senior director of marketing for Phonak
and a research audiologist by background.

Thanks for joining me, Christine.

Thanks for having me, Andy.
You’re welcome.

Tell people a little bit more
about your background. – Yeah.

So I started as a clinical audiologist.

I worked in pediatrics and adults

always had a passion for Phonak technology
working in pediatrics.

I had a lot of experience with it
clinically and joined Phonak

as a clinical trainer about 150 years ago.

And have done various jobs in audiology,
including starting and running the

Phonak Audiology Research Center [PARC],
which was our first clinical research lab

that Phonak opened in the US,
and more recently took the job

to run marketing for Phonak
in the US, where

I could take all of that clinical knowhow

and really try to apply it
to our marketing

and our brand communications
and really trying to focus

on strong clinical

messaging tied

closely to the needs
and desires of our HCPs.

– So you really bring a good broad
background to what you do today.

– I hope so.

I also have a lot of people in my life

with hearing loss,
and so I live it every day

and I’m really passionate about what

the technology can do in people’s lives
and the importance of hearing care

in people’s lives.

So it’s a good job to have.

– So let’s talk about that technology now.

We’ve covered the deep neural network,
the speech and noise separation already.

But in this product line one of the six
products has that capability.

But the other five
have a range of other improvements

that make the experience better
than the predecessor models, correct?

– Yes. I think there’s a few things
to talk about here.

Sound quality was really paramount
in the design of this product.

And of course, sound quality
for speech in noise

remains a top consumer need.

But we felt like there was room
across the board to really try to optimize

the experience, both from that moment
of truth at the first fit,

and then when that repeat user goes out
and re-experiences,

you know, music and speech and,
and all of the different surroundings

and how to just wow
somebody across the board.

We did a study in PARC

when we implemented that change

and we put users in hearing aids
for the first time in the Infinios,

and we found that compared to

a competitive device, 93% of the users

had a spontaneous preference for that
APD first fit that we implemented here.

So we believe from an HCP standpoint,
where you want to put

hearing aids on a person and really
just bowl them over with delight

from that first moment that they’re
going to get that reaction.

– Okay.

So in other words, 93% of the time
people are satisfied with the first fit.

– Exactly right.
– Okay.

And then in other cases, you come back
and do real ear and all the rest of it.

If a person’s perfectly satisfied
with the first fit

93% of the time.
– Right.

– Okay.

Which has interesting implications for

telecare as well, because if I, for
example, I’m in a rural area,

– Uh huh
– You could then theoretically

give a pretty good experience
with a remote first fitting,

even if they come back to an audiologist
in person later for fine tuning.

– Yep
– But you’re able to deliver

a pretty good experience for a person
who cannot easily get to

– Yep
– An in-person clinic.

– And for sure, you know, we enable people
to do fine tuning remotely as well

for patients that don’t
happen to be able to come in

easily, or are remote.

But yeah, very highly satisfied.

Sound quality out of the gates.

– Okay, interesting.

 

But you also have a thing
called the AI dome proposer.

What is that?

– Okay, so this sounds like a funny thing
for us to talk about in a world where

the other topics that you’re exploring
with this device are,

you know, 4.5 million connections in a

deep neural network with online
real time signal processing.

You know, the dome feels
like kind of a low tech

pivot from that.

But the reality is everybody fitting
hearing aids knows

that those fittings are won and lost
with acoustic coupling

and that you can give it the best fancy
signal processing in the world

and put the wrong acoustic coupling on it
and really have a mess on your hands.

And that is something that is very hard
to fix on a

remote teleconsult.

And so the AI dome predictor has been
trained across multiple dimensions.

So it’s looking at indicators
of satisfaction and use time and benefits

and also

all kinds of factors that we could pull
out of the target software

in order to create a proposal for the HCP

about what is going to be the best dome

for both audiological performance
and long term satisfaction.

And so with that kind of training in mind,
we’re giving audiologists now

the advice of what is the best possible
starting point for this patient to achieve

those outcomes, which is generally
what the HCP is after

and limit the chance that they’re going
to have to come back and address

the subsequent sound quality issues
or other tolerance issues.

And again, lead to that best impression
of sound quality possible.

– Okay.

So the dome proposer then is working on
the basis of the patient’s audiogram.

Correct. And then what?

Then from the audiogram,
you know what gain profile

you’re going to deliver
and then you’re recommending

what dome would work best with
that gain profile.

– 100 percent, but also using multiple
dimensions, including long term

use and treatment adherence,
a lot of things to layer on to even just

what is going to give you the gain

that you need, but also what’s going
to give you the long term satisfaction.

– Okay, got it.

So with the new chip,

the ERA chip, you’ve further
improved the sound quality

and now with the dome proposer,

then you’re going to make sure you deliver
the optimal experience by recommending

what would be the best dome
for that audiogram and that fitting.

– Exactly right.

– And then you also have this thing
called the acoustic optimized vent.

– Yep.
– What is that?

– So that is related to products

that we’re custom fabricating,
whether it be an earpiece or a

custom ITE. And this is a way

of looking at all of the acoustics
of the ear in order to ensure, again,

that right balance between sound quality
and audiological benefit.

And so this is a tool that’s been in place
for a while and in combination

with our biometric calibration,
which helps us to optimize

the fitting of the device and the
maintenance of directivity with the device

between those two things

solving for both sound quality
and audiological performance.

We believe that gives you
a really great starting point

for anything that you’re going
to fit custom in your patient’s ear.

– Okay, so essentially the
shape of the ear

canal is partially dictating
the construction of the vent.

– Exactly right.
– Okay – Yeah – Okay

And you mentioned,

since we’ve talked about custom products,
you also have a custom rechargeable now.

– We do! So Virto
Infinio rechargeable is a

fully rechargeable product
that has all the benefits of Infinio,

but for patients
who want that custom form factor

and also the convenience of
rechargeability.

– Okay.
And how does that physically work then?

Is it difficult
to put the hearing aids in the case?

How are you doing that in a custom form factor
to, you know, make good contact

and charge the hearing aids, especially for
somebody who might have dexterity issues?

– Yeah, thanks for asking.

Actually, the design of the charger

was one of the key priorities
in the development of that product that we

didn’t want something that was clunky,
difficult to maintain contact with.

And so you’ll see that
there’s actually a magnetic connection

between the charger and the hearing aid
that even if you were to shake the thing

and have them tumble around a little bit,
they would come back and reseat themselves

in exactly the right position
to maintain charging.

So it’s quite an easy thing
if you just sort of drop them in the right

direction, they’ll snap in
and give you reliable charging.

So very easy to use and very friendly
for anybody with dexterity issues who

often in general

can be very successful with a one piece
custom product.

So getting it in and out of their ear
is fairly simple, and then dropping it in

the charger has really been developed
with simplicity.

– Okay. And I know personally from having,
you know, started with the Marvels

that you’ve been

using Bluetooth Classic and so you have
universal connectivity.

– Yep.
– And I’ll tell you, even the Marvels

when I got the Marvels, now Bluetooth
earphones have improved a lot over time.

But when I got the Marvels in 2018,

I thought, these are the best
Bluetooth earphones I’ve ever had.

Like they paired faster and connected
more reliably

than half of the consumer products
that were out there.

But you’ve made further improvements.
What are those?

– That’s right.

So another piece of sound quality
and customer satisfaction

these days with hearing aids
very much has to do with the connectivity

to all of one’s digital universe, right?

So with the ERA chip and Infinio,
the wireless transmission

power is four times
greater than previous generations.

And so, for starters,

– Still having all day battery life.

– All day battery life.

So with an Infinio RIC,
even

with the Sphere product
where the DNN is activated,

you will get a solid
16 hours of use per day, including 8 hours

of streaming and activation
of the DNN in noise.

So a full day of use.

And the key with the connectivity
now is we took what was already

the most universally capable

hearing aid, which was the Lumity
with the universal Bluetooth.

Still the only products that have
universal Bluetooth connectivity.

And we’ve made that transmission
power four times stronger.

So the seamless connectivity, but also

the stability of that connection
has really been enhanced.

So for instance, I don’t know
if you ever went into this, but

sometimes somebody would have
their phone in a position

where there was some kind
of barrier or isolation

between the phone and the ear
that was primarily connected to the phone.

And there were times
where because of distance

or some isolating factor,
that signal would drop out.

Now we’ve got even a stronger
connection to begin with,

but we also have adaptation
to where that phone is connecting.

So either ear.

So if there’s one clear ear pathway,
it will choose that one.

So we have seen super robust connections
and then also improved

in the switching behavior
between multiple devices.

So if you have to go between devices
or between acoustic and streaming,

that can happen instantaneously
without even missing a word.

So I think the user experience

with all of the digital universe
and this product is really exceptional.

– Well, I’ll personally name
the devices – my android phone

this iPad, two PCs, you know
I mean, I connect to all of them.

But you mentioned something
which I think is worth explaining.

Bluetooth classic was never meant
for true wireless devices.

So what you’re doing is,
the signal goes to one ear

and then you’re passing it from
ear to ear to the second ear.

– Correct.

– That’s how Bluetooth classic works.

LE Audio has an independent stream
for each ear.

And so it begs the question,
will this device

be ready for LE audio
and will it be ready for Auracast

when Auracast transmitters
start to appear in different venues?

– Yep, great question.

We’re super excited about Auracast.

I mean, what a huge patient benefit
to have this universal accessibility

and when it is available for patients,
we want the Phonak devices to

be the enabler and the connection,
the gateway to that technology.

So the ERA chip is Auracast ready.

It’s Auracast enabled.

And when the time is right
and that those installations exist,

we look forward to being able
to activate that feature for our patients.

– Okay. Got it. And what about telecoils?

How many of the six models
have telecoils on them?

– Yep, well, we know that there
are people that like telecoils

and we will always have products available
that include telecoils.

The Infinio platform is focused
on the Bluetooth Classic,

as well as about four other wireless
protocols for maintaining accessibility

among all kinds of conditions,
including TV streaming, including

Roger connectivity,
including Bluetooth low energy

for the data, communication with the app

and in the future, also Auracast.

– So a good question to ask
is whether you still have the same

compatibility with the Roger microphone
that’s in that bag over there

and the TV streamer I have at home.
– Correct.

– Okay.

But are there telecoils
in any of the models then?

– Not in the Infinios.
– Not in the Infinios, okay.

So use one of the previous models
if you want telecoil capability.

– There are a lot of Lumity options
for somebody who wants a telecoil.

– Okay, got it.

Is there anything else we should know
from an audiological

point of view about this device?
What makes it different?

Why would somebody want to choose
an Infinio over Lumity?

– One thing that we haven’t talked
about is reliability.

And reliability with
hearing aids has been something

that has not always been
the primary focus of this industry.

We know that on average, HCPs report
spending 20 to 30% of their time

doing troubleshooting minor repairs,

dealing with Bluetooth
connectivity issues with patients.

I mean, HCPs
will sometimes remark about

being the local genius bar
for all the different

needs of somebody wanting to connect
their digital world to their hearing aids.

And so we’ve really put a focus

on how to take exceptional care

that the reliability of these products
is setting a new standard.

And that means with the wireless stability
and trying to prevent

any breakages of that,
that results in a call to the HCP,

but also the devices themselves.

So these products have been through

thousands of hours of testing
and go through 135 different tests

in order to ensure that they can withstand
the daily life of their users.

– Okay. Very good. Well, thank you.

I really appreciate
you spending some time with me.

– Thanks, Andy.
– You’re welcome.

– As I mentioned at the beginning,

the big news is the model named Sphere
symbolizing spherical hearing

and launched creatively enough
at the Sphere in Las Vegas on August 7th.

In New York I got a
head start on the details

from Henning Hasemann, responsible
for training and implementing the AI model.

Before we hear from Henning,
let’s have a brief look at how AI

has been implemented
in hearing aids to date.

Forgive me
if I oversimplify while keeping it brief.

Running an AI program requires
a deep neural network, or DNN,

which is a structure that mimics the way
a human brain processes information.

A DNN is trained by providing a series

of inputs and the
corresponding desired outputs.

As a result of this training

the DNN can deliver a valid output
even when presented with an input

that is not identical
to what was provided in the training,

just as a human brain can.

Imagine asking someone
who has never seen a cat to draw one.

If you give them only a basic description,
you might end up

with a child’s version.
With additional detail,

the drawing gets more realistic.

Add even more details,
and you will get quite a good rendering.

This is how DNNs work.

Each little nugget of description
is called a parameter.

The more parameters
the DNN can hold and act on,

the closer its output will be to the ideal
over a wider range of inputs.

Try to get too much done
with too few parameters

and the result will not be good.
In hearing aids

a DNN is typically incorporated
as part of the processor chip.
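
To make the idea of parameters concrete, here is a minimal Python sketch, not representative of any manufacturer’s actual network, that simply counts the weights and biases in a small fully connected network; the layer sizes are invented for illustration.

```python
# Minimal sketch (not any manufacturer's actual network): counting the
# parameters of a small fully connected neural network. Each weight and
# bias is one "nugget of description" the network can learn.

def count_parameters(layer_sizes):
    """layer_sizes like [inputs, hidden1, ..., outputs]."""
    total = 0
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        total += n_in * n_out  # weights connecting the two layers
        total += n_out         # one bias per unit in the next layer
    return total

# A toy scene classifier vs. a hypothetical larger network:
print(count_parameters([64, 128, 128, 8]))             # ~26,000 parameters
print(count_parameters([256, 1024, 1024, 1024, 256]))  # ~2.6 million parameters
```

The 4.5 million parameters quoted later for Sphere’s DNN dwarf the toy scene classifier in the first example.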

The size of the network,
the complexity of the training

and the number of parameters needed
depend on the task assigned to the DNN

and the number of possible outcomes.

Relatively small DNNs
can be used to match, for example,

the sound scene to the nearest
of some thousands of training samples.

This works well because

similar sound scenes will
have similar hearing aid settings.

Therefore, very large training
sets are not required.

The DNN itself does
not have to be extremely large either,

nor does it have to be particularly fast.

And yet they have greatly improved
the user experience

by automatically optimizing
hearing aid performance

as one goes about their day,
including in noise.

It sure beats trying to guess the best
settings with the app all the time.
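
As a deliberately simplified picture of that kind of scene classification, the sketch below maps a handful of made-up acoustic features to the nearest known scene and looks up a preset; real classifiers are trained networks with far richer features and many more scenes.

```python
# Illustrative sketch only (not any manufacturer's algorithm): a tiny
# scene classifier that maps acoustic features to the nearest known
# scene and applies that scene's preset hearing aid settings.
import numpy as np

# Hypothetical feature vectors (e.g., level, modulation, spectral tilt)
SCENE_CENTROIDS = {
    "quiet":            np.array([0.2, 0.1, 0.0]),
    "speech_in_quiet":  np.array([0.5, 0.7, 0.1]),
    "speech_in_noise":  np.array([0.8, 0.6, 0.5]),
    "music":            np.array([0.7, 0.3, 0.9]),
}
SCENE_PRESETS = {
    "quiet":           {"gain_db": 0, "noise_reduction": "off"},
    "speech_in_quiet": {"gain_db": 3, "noise_reduction": "low"},
    "speech_in_noise": {"gain_db": 5, "noise_reduction": "high"},
    "music":           {"gain_db": 2, "noise_reduction": "off"},
}

def classify_scene(features):
    """Return the scene whose centroid is closest to the feature vector."""
    return min(SCENE_CENTROIDS,
               key=lambda s: np.linalg.norm(features - SCENE_CENTROIDS[s]))

features = np.array([0.75, 0.65, 0.45])    # measured once per second or so
scene = classify_scene(features)
print(scene, SCENE_PRESETS[scene])         # -> speech_in_noise {...}
```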

Signia provided an

excellent graphic illustrating
another application: identifying

which voices are nearby
and from which direction each comes

for the purpose of
focusing the mics on each one.

Hearing aids employing this level of
AI have gotten better and better.

As I’ve seen comparing
my two-generation-old Phonak Paradise

to the recently released GN Nexia,
which I took to Australia

to do live LE Audio and Auracast demos
during my presentation there,

not to mention the base Infinio itself.

The performance difference
with both modern devices over

the older one was obvious.

The Sphere model adds a
completely new innovation.

A much larger, faster and more capable DNN

in a second chip called DEEPSONIC.

This DNN is placed directly in the audio
stream, identifying noise

and removing much of it in real time
while allowing nearby

speech to pass through with minimal delay.

When given a server farm like what
Microsoft uses to clean up audio on Teams calls,

or even a modern smartphone,

one has a lot of processing power at hand.

But to deliver effective speech and noise
separation in-ear with all-day wear,

one has to make careful choices
on both hardware and software.

I really enjoyed the conversation
with Henning as he described

just exactly what it took to make it work.

So I have with me Henning Hasemann.

He’s the director of Deep Learning
Engineering for Sonova and he’s

one of the people primarily responsible
for the machine learning algorithm

within the Sphere.

Thank you for joining me.
– Well, thank you.

– Tell me a little bit about your background
and how we come to be at this point.

– So I started my career,

so I got a Ph.D.

in computer science

and I started as a software engineer, mostly
in the automotive industry,

working on several different

topics there.

And yeah, five years ago
I ended up at Sonova, Phonak

and worked as a deep learning

performance engineer,
more specifically

to bring this product to life that we’re
going to talk about a bit today.

Meanwhile, I’m leading the team now.

– Excellent.

Now, people who’ve seen
my earlier podcasts

and followed some of my work know
I’ve been following the pace of technology

in machine learning, speech from noise
separation and exactly what that means.

I’ve even done some demos with products
that weren’t capable

of being fit in-ear,
but show how that works.

And now you have managed
to put it in a hearing aid.

And so one of the questions I have
and I think other people will have too,

is just how much have you been
able to embed in the device?

In other words, how large is the,

was the training set you used?

How many parameters
are actually in the device itself

and what range of situations are you
capable of extracting noise from speech?

– Okay.

So yeah, I’m going to hit
you with some numbers now.

So we’re talking about 4.5 million
parameters that are on the device.

We have 22 million

sound samples
that we used for training

and this runs in a very wide range

of listening situations and can deal with

lots of different kinds of noise.

But on the hearing aid,
we only use it in the most

challenging situations,
which is speech and loud noise.

So think of a typical restaurant
situation.

Yeah.

As the battery is, of course, the
limiting factor in such a small device.

– Okay.

And for context, when I would talk
with people developing for earbuds,

it was down to how tightly
can we squeeze the number of parameters,

you know, 500,000 and 300,000

to try and get it to run on a chip
that you could actually put in an earbud.

You’re talking what did you say, 4.5 million?

– Yeah.

– And you said you’re also focusing
on the restaurant scenario.

So in other words,
you put a tremendous number

of parameters in the restaurant scenario.

And so then what that means
is you’re very accurately able

to separate out speech from noise.

Would that be a correct way of saying it?
– That’s perfectly correct.

Yes. So the more parameters you have,

the more information
can be in the network, as you can imagine.

So the more different types
of noise and speech there are,

the more clearly and precisely
the network can distinguish them.

– Okay.

And let’s put some context
around this, okay?

So for example, I’m wearing Paradises
that have binaural directional microphones

and the usual kind of acoustic noise
reduction techniques and so on.

And I get a certain amount of SNR
improvement.

– Mm hmm.
– Okay?

What does this do in comparison
to what I’m wearing?

– When you’re talking about
what’s on the Paradise, for example,

you will have a classical algorithm,
a rule-based system, where

you, as a software developer
for the hearing aid,

provide the rules,

and then the system gets some input
and computes some output.

In deep learning, what we do is
we give the input
and the desired outputs

and then we run a procedure
that’s called training.

And with the parameters that we provide

in the training process,
we derive the rules.

So the network learns itself basically.

What are the rules?

How do I distinguish speech
from noise

in every single relevant situation?

– Right.

So if I were to rephrase
that, what I have now and what everybody

has in their devices right now is really
a sound scene classifier, which is

more or less a convenience feature.

In other words, the automatic modes work
very well because you’re identifying

what sound scene I’m in and doing
classical hearing aid adjustments with it.

– Yeah, so that’s a good point.

We have had machine learning for classification
in hearing aids for quite some time.

That’s usually

more in the machine learning,
not so much on the deep learning sides.

So that’s a simpler class
of AI algorithms

where you also need more engineering
and more time tinkering to get it to work.

And it’s also a simpler problem.

A simpler problem,

because you have a number of sounds
that you can classify: you can be in a church,

or in a restaurant, or in silence,
or listening to music, or

whatever scenes you may have.

But that’s it.

And then if you look at speech,
where every fraction of a second

you sample
lots of different frequencies,

there is much more information
to get right.

So it’s a much harder problem.

– And this is in line. The sound
classifier is actually not in line

with the audio stream, it’s controlling
the hearing aid settings.

– Exactly.

– You are now in line with the audio stream.

– Exactly.

The sound classifier is separate
from the audio stream.

It’s time-wise decoupled,
so you can afford to only classify

once per second or whatever.

You don’t need to classify
every single fraction of a second, right?

You don’t even want to

have it change too frequently anyway.

But with sound cleaning,

every single sound bit,
every bit of information that goes in

needs to be cleaned, and all
that needs to go through the DNN.

So you need a lot of computational power
to make that happen.

– And so then you’re able to actually
identify which component of this complex

audio spectrum that’s coming
at me is noise and what is speech

you pass on or recreate the speech
and you reject the noise.

Is that a correct understanding?

– Yeah, that’s actually
a nice way to put it.

If you want to get really technical,
we compute a mask, a complex mask

that we multiply with the audio
stream of every single frame.
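
For readers who want to picture what “multiplying a mask with the audio stream of every single frame” looks like, here is a heavily simplified Python sketch. The sample rate, frame length, and placeholder mask function are assumptions for illustration; the actual system computes a complex-valued mask with the DNN under tight latency and power constraints.

```python
# A minimal sketch of the masking idea described above: transform each
# audio frame to the frequency domain, multiply by a (here, placeholder)
# mask, and transform back. The real DNN, mask type, frame sizes and
# latency constraints are Phonak's and not shown here.
import numpy as np
from scipy.signal import stft, istft

FS = 16_000          # assumed sample rate
NPERSEG = 256        # assumed frame length (16 ms at 16 kHz)

def predict_mask(frame_spectrum):
    """Placeholder for the DNN: returns a per-frequency mask in [0, 1].
    Here we just attenuate low-magnitude bins as a stand-in."""
    mag = np.abs(frame_spectrum)
    return (mag > 0.1 * mag.max()).astype(float)

def enhance(noisy, fs=FS):
    _, _, Z = stft(noisy, fs=fs, nperseg=NPERSEG)   # freq bins x frames
    for i in range(Z.shape[1]):                     # frame by frame
        Z[:, i] *= predict_mask(Z[:, i])            # apply the mask
    _, clean = istft(Z, fs=fs, nperseg=NPERSEG)
    return clean

noisy = np.random.randn(FS)      # one second of stand-in audio
print(enhance(noisy).shape)
```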

– And so then, as a practical matter
for a hearing aid user,

let’s talk about SNR
improvement first.

And what does that mean for the ability
to understand speech

in a challenging, noisy situation
compared to what classical devices do?

– So we get 10 dB SNR improvement,
which is unprecedented.

This is 3.7 dB more than any hearing aid

so far could do.

What does it mean in practical terms?
You’ll understand

2 to 3 times
more of the words that you’re hearing.
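
As a quick aside on what the decibel figure itself means: SNR compares signal power to noise power on a logarithmic scale, so a 10 dB improvement corresponds to a tenfold better speech-to-noise power ratio. The small sketch below illustrates only that arithmetic; how a given SNR translates into word recognition depends on the listener and the test material, and the two-to-three-times figure comes from Phonak’s own testing.

```python
# What "10 dB SNR improvement" means in plain numbers: reducing the noise
# by 10 dB relative to the speech raises the SNR by exactly 10 dB.
# (Illustrative only; signals here are random stand-ins.)
import numpy as np

def snr_db(signal, noise):
    return 10 * np.log10(np.mean(signal**2) / np.mean(noise**2))

rng = np.random.default_rng(0)
speech = rng.standard_normal(16_000)
noise = rng.standard_normal(16_000)

before = snr_db(speech, noise)                 # ~0 dB: speech and noise equal
after = snr_db(speech, noise * 10**(-10/20))   # noise amplitude cut by 10 dB
print(round(after - before, 1))                # -> 10.0
```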

– Okay, so I’m getting 10 dB improvement
in SNR over nothing

and say between three and four dB

improvement over a classical hearing aid.
– Yes.

– Okay.

So yeah, that’s a hard-won
three to four dB.

I mean, typically when you
go from generation to generation,

like for example, my Marvels
to my Paradise, what were we talking,

maybe one dB improvement,
something like that.

One, one and a half.

– I don’t know, like.

– Certainly not three.
So three is very hard won.

How many years were actually involved
in making this happen?

– That was five years.

So we started in 2019.

Yeah.

And it was quite a bit of work
because we not only

built the DNN, but we built the hardware,
the DEEPSONIC chip, along with it.

And it was very challenging
to do this co-design.

We had the first DNN on a laptop running

and that would be heating up
when I was doing the demo, right.

And then at some point

we brought it down to a phone and yeah,
now we have it in the hearing aid.

– And even training the hearing aid,
training the model, for example,

what goes into training the model
and what kind of computational complexity

was involved?

– So you have to imagine 100 GPUs,
and that’s not

your gaming graphics card
that you would buy in the store,

but that’s professional
grade, deep learning GPUs,

and then you train for months and months.

You train multiple of
these models actually,

and then you select the best ones of them.

And then you do a lot of listening tests,
go maybe back to the drawing board,

try to figure out what the target function
for the next run should be,

because you have some idea of what
the mathematical goal is to optimize for.

But you of course
have to constantly reality check that.
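
A toy version of the supervised recipe Henning describes might look like the sketch below: noisy input in, clean target out, and an optimizer nudging the parameters to reduce a chosen loss, the “target function.” The model size, loss, and random stand-in data here are placeholders, nothing like the real training pipeline running across hundreds of GPUs.

```python
# Toy supervised training loop for a denoising mask (placeholders only,
# not Phonak's model, loss, or data pipeline).
import torch
import torch.nn as nn

model = nn.Sequential(                   # stand-in for the real DNN
    nn.Linear(256, 512), nn.ReLU(),
    nn.Linear(512, 256), nn.Sigmoid(),   # outputs a 0..1 mask per bin
)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()                   # the "target function" to optimize

for step in range(1000):                 # real training: months on many GPUs
    noisy_mag = torch.rand(32, 256)              # batch of noisy spectra
    clean_mag = noisy_mag * torch.rand(32, 256)  # pretend clean targets
    mask = model(noisy_mag)
    loss = loss_fn(mask * noisy_mag, clean_mag)  # how far from the target?
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```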

– Okay.

And how many GPUs did you say
you were running?

– Hundreds.
– Hundreds.

And so a lot of people
are going to be familiar with Nvidia

and how they’ve been.

And maybe you’re not using Nvidia’s
but just as an example.

Everybody’s been following,
you know, their AI activity

and how their stock has gone up so much
because everybody uses their GPUs.

And you’re running hundreds of those.

So this is a really sophisticated effort
to train the model.

– Yeah, that’s

that’s a lot of investment that we did,
but it’s not the only thing.

It’s also getting all the data,
getting the data into a really good shape.

So you want to have a lot of noise data
that covers a wide array of situations.

You want to have lots of speech data
that covers a wide array of situations.

You don’t

want the network to focus only on male
voices over your female speakers;

those should all be equalized,
all these kinds of things.

Yeah.

And you have to do a lot of tuning
and engineering and of course

a lot of audiology also goes into this.

– Well, I’m very much looking forward
to trying the device.

I can’t even count anymore
how many different ear-worn things

have gone into my ears and I’m very much
looking forward to trying this one.

Thanks for spending some time with me.
– Well, thank you.

“Spherical hearing” refers to

the fact that the DNN removes noise
regardless of direction,

enabling one to hear everyone
around them equally well,

regardless of where they’re positioned.
Some directionality

can still be used to deliver
even better performance toward the front,

but the DNN does not depend on it.

I wore the Spheres in
New York City, Las Vegas

and in various settings in my home town.

I can tell you the AI

speech and noise
separation really works well.

I believe in the future
we will look at this as a seminal moment

in hearing device evolution, just as we do
with digital hearing aids today.

With any radically new technology,
the question becomes,

where do we go from here?

In a hallway conversation
at the launch event in Las Vegas

Stefan Launer touched on their initial goals
for the development of Sphere.

Then we discussed the different directions
they could take it

as part of a concerted effort

to meet the needs of people
with all levels of hearing loss,

including those with normal audiograms
who have difficulty hearing in noise.

I’ll let him have the final word.

So I have with me here Stefan Launer.

Stefan, thank you for joining me.

Please tell everyone
a few words about yourself.

– My name is Stefan Launer, as you said.

I’m the VP of audiology and
health innovation.

I’m a physicist by training.

So I did my PhD thesis on basic hearing
science and hearing impairment.

I joined Phonak in 1995,

now Sonova, and I have
always been involved in

collaborating with a lot
of external academic partners,

driving research in hearing science,
in signal processing,

and also in hearing care delivery models
and a lot of technology developments.

– Terrific. Quite a background

and experience
you bring to the table,

which really culminates
in the development of the Sphere

in particular, which is what
I want to talk with you about.

First off, congratulations.

I’ve been wearing the Sphere
since Sunday now

and got a lot of experience with them
and the in-line DNN really works.

– Thank you.
– Very nicely done.

– Thank you.

– And I’d like to talk about that
a little bit more, because it’s clear

that you’ve been working on this
for some time, as far as I could

tell, at least since 2019
and maybe a little bit before.

How did you actually start
to think about in-line

deep neural networks and the fact
that you could apply them to hearing aids?

When did that first
thought process begin

and set off the timeline
that led up to today?

– So for me, I mean, I did my Ph.D.

in a group that, already in ’95
and then afterwards until 2000,

worked on applying neural networks
for speech enhancement

in all sorts of configurations,
but it never worked.

So neural networks as a toolset
have been around for a long time.

And by 2005, you know,

the work, the research on DNNs
really took off.

And we saw a lot of powerful developments.

And we were following this field
very closely.

We were scouting, exploring a lot.

We had our own team trying out things.

And I think it was around 2016, 2017,

when we started to realize, hmmmm, it’s
getting close to these things becoming

potentially possible.

And in 2018, 2019, we really
realized, oh, this is the moment

where we have to take a decision
and go all in

and develop such a solution.

We realized the DNNs, the large
DNNs had become very powerful

to really provide a significant
improvement in speech intelligibility.

The only downside was
they were pretty big.

So that was the bet
that we had to take back then.

– And it’s interesting
because I’ve been watching it too,

thinking about it in the consumer world
and even at CES a couple of years ago,

I did some interviews and live demos
with some of the startups working on this,

the problem being that you couldn’t
get enough parameters

into a viable chip for in-ear,
so there were limitations.

A lot of demos were being run on
smartphones. – Yeah

– I’ve been following
the consumer chip development, wondering

when it’s going to get big enough
to do something really interesting.

You obviously had to take your own path
to shortcut that process.

– Well, what we did and that is something
we have always been doing.

We always wanted to be in control
of our own destiny.

And that’s why we kept
developing our own chips over many years.

We did that with the digital chips.

We did that with the wireless radio.

And by 2019,
we also decided, hey, these DNNs.

They have become powerful.

We should try to get a chip done

that runs on the power budget and size
constraints of a hearing instrument.

And that is powerful enough
to really run a large scale

DNN optimized for the
task of speech enhancement.

So it’s a very special network structure

with millions of parameters,

and we decided to develop this chip

and that’s what we have been doing
over the past years.

– And it’s interesting
to think about exactly how you did it

because you’re getting about
10 dB noise reduction.

And I’ve seen, you know, demos
where you could get far more than that,

but they have other limitations; for
example, you don’t have enough parameters

to be able to fit in-ear
and handle all the world’s languages.

And so what you’ve done is
you’ve made it work very well

with 10 dB noise reduction,
which in the hearing world is a lot,

but below what the
ultimate capability would be

when you have a larger model
and more parameters.

I can think of some other use cases too,
but I want to ask you,

where is this going?

And now I understand
you probably have version two of the chip

being taped out right now
and you’ve got more room to play.

In what directions
will you take this going forward?

– So first of all,
we have taken a first step,

but when you’ve taken a first step,
you always know, oops,

we could have done better here,
and here, and here.

So lots of learnings.

So one of the first, or next steps
is to optimize the current solution

in terms of computational complexity,
computational efficiency

and the chip architecture.

So we are definitely evolving

the solution we have
that’s a clear pathway for us

and there is quite some room
to improve here.

But we are also thinking about lots
of other applications

and in the world of hearing care,

the one question I’ve always received
jokingly from audiences

of hearing-impaired people is:
when do I get the spouse enhancer?

Or sometimes the spouse canceller,
the idea of a signal processing tool

that can pick out a specific
voice and amplify it.

So speaker tracking is

an age old topic, an age old question
and things like that.

Identify specific scenes,
identify specific

target signals,
pick them out, enhance them.

That’s a next big step that we

and the whole research
community is working on.

And then you can also apply larger scale

DNN models to identify acoustic scenes

and to combine them with other sensors,
integrate different types of information.

So we now have a completely different way
of computing things.

We have this tool of powerful DNNs
in the hearing instrument

and we have lots of different
applications now ahead of us.

– So that makes perfect sense,

because in experiencing the DNN,

it’s very, very good.

The voices are very natural.

So you haven’t tried to press
for the absolute maximum

amount of noise reduction in exchange
for making the voices sound less natural.

– I mean, this is an interesting point.
When you design a hearing instrument,

you have to have a hearing instrument
that is natural and authentic.

There is no point in a noisy restaurant
to kill the entire noise floor

because it tells you something
about the environment.

It gives you information

when the servers, the waiters, are
approaching you and asking you questions,

and you need to be aware
of what’s going on around you.

So it’s a subtle balance
of enhancing the speech signal

while still maintaining
environmental awareness.

– Yeah.

It makes perfect sense.

And I found in trying it,
there were times when I would use

a mode I had created where I went
really directional with the mics,

because you generally want omni
so that when the server

comes up over here and addresses
you, you have heard them straight away.

But on the other hand,
there’s a loud person at the table

next to me I have no desire to hear.

So then I would take additional advantage
of the directional mics

and the DNN simultaneously.

But then that made me
think, what you just said too,

there are actually consumer
solutions coming out now

that are smartphone based, that are
actually doing voice identification.

And so you can say,

I want to attend to this voice,
I want to attend to this voice.

But I almost thought about it
in the opposite.

If you’re trying to take a user,

especially one that’s only moderately
adept with a smartphone,

most of the time, you want it
running automatically.

You don’t have to do a thing, but
a voice rejector would be interesting.

– So a voice rejector in that

you really suppress certain voices
or what do you mean?

– Yeah, exactly.

So if I’m at a table at a crowded restaurant
and it’s all good, I’ve got two

or three people here and I’m hearing them,
you know, perfectly well.

So the deep neural network is running in more
or less omnidirectional mode.

But there’s a really loud person over here
I don’t want.

– Yeah. Yeah.
So that’s an interesting point.

You know, you could do this
by placing a strong notch behind you

in a beamformer, but that’s
then difficult if you move your body

or the person moves or you could,
you know, try to detect the voice

or the sound that you want to suppress
and briefly train a network.

And these are also solutions
that various groups in the world

are working on
trying to identify.

But we’re trying to find out
how do we handle the logistics,

because when you are in this situation,
you need to have a way

that is efficient to say it’s
this source that I want to cancel.

And how do you identify it?

How do you train your network
and how do you stabilize it?

So I think the technology
is there to do it.

The question is more
how do we operationalize it and how do we

integrate it in a very usable way for the
for the user of the device.
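
The “strong notch behind you” Stefan mentions can be illustrated with a classic two-microphone delay-and-subtract beamformer. The sketch below is an idealized simulation with assumed spacing, sample rate, and perfectly aligned signals; real hearing aid beamformers are adaptive, binaural, and far more sophisticated.

```python
# Idealized two-mic delay-and-subtract beamformer: cancels sound arriving
# from the rear while passing sound from the front. Spacing, sample rate,
# and signals are simulated assumptions, not a real device's processing.
import numpy as np

FS = 48_000
DELAY = 2   # samples sound needs to travel between the two mics

def shift(x, delay):
    """Delay a signal by `delay` samples (zero-padded)."""
    return np.concatenate([np.zeros(delay), x])[:len(x)]

def simulate_mics(front_src, rear_src, delay=DELAY):
    """Front mic hears the front source first; rear mic hears the rear source first."""
    front_mic = front_src + shift(rear_src, delay)
    rear_mic = shift(front_src, delay) + rear_src
    return front_mic, rear_mic

def rear_null_beamformer(front_mic, rear_mic, delay=DELAY):
    """Subtract a delayed copy of the rear mic: rear-arriving sound cancels."""
    return front_mic - shift(rear_mic, delay)

rng = np.random.default_rng(1)
front_talker = rng.standard_normal(FS)
rear_talker = rng.standard_normal(FS)

# With only a rear source, the beamformer output cancels to zero:
f_mic, r_mic = simulate_mics(np.zeros(FS), rear_talker)
print(np.max(np.abs(rear_null_beamformer(f_mic, r_mic))))   # -> 0.0

# With both talkers, the front talker remains while the rear one is nulled:
f_mic, r_mic = simulate_mics(front_talker, rear_talker)
output = rear_null_beamformer(f_mic, r_mic)
```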

– So with the DNN,

how did you go about the development
and the testing over this timeline?

– Yeah.

So when we developed the DNN,
we had partners we worked with

and tools to help us build skills
in deep neural network

technology for speech enhancement
and in computational optimization.

We built a lot of internal knowledge

and we also had partners
who helped us develop the chip technology.

So we really applied our typical
open innovation model.

And the crucial point also was
if you do a cutting edge

development like this
development of this chip,

it’s never a straight development process.

It’s a pretty exciting and at times
nerve wracking rollercoaster ride.

And we had to put a lot of energy in also

testing the solution along the way.

So we also developed another DNN

that we trained on human subjective ratings.

So we had hundreds of people
on the Internet rate

sound samples in terms of sound quality
and speech intelligibility.

We used these resources to train another DNN,
and that helped us

to select the optimal architecture.

We did a lot of

testing with prototypes,
with technical measures,

and we did a lot of testing with subjects

in different laboratories,
also with external people.

And this work has also been published.
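
The proxy-rater idea, training a model on crowd-sourced listening ratings and then using it to rank candidate systems, can be sketched very simply. Everything below, including the features, ratings, and the linear model, is invented for illustration; the system Stefan describes is a DNN trained on large-scale human ratings of sound quality and intelligibility.

```python
# Sketch of a proxy rater: fit a simple model that predicts average
# listener ratings from features of processed audio, then use it to rank
# candidate outputs without a new listening test. All data is invented.
import numpy as np

rng = np.random.default_rng(0)

# Pretend dataset: 200 processed clips, 5 acoustic features each,
# and the mean 1-5 quality rating that listeners gave each clip.
features = rng.standard_normal((200, 5))
ratings = (3.0 + features @ np.array([0.5, -0.3, 0.2, 0.0, 0.1])
           + 0.1 * rng.standard_normal(200))

# Fit a linear proxy rater (least squares with an intercept term).
X = np.hstack([features, np.ones((200, 1))])
weights, *_ = np.linalg.lstsq(X, ratings, rcond=None)

def predicted_rating(clip_features):
    return np.append(clip_features, 1.0) @ weights

# Rank two candidate model outputs by predicted listener rating.
candidate_a = rng.standard_normal(5)
candidate_b = rng.standard_normal(5)
best = max([candidate_a, candidate_b], key=predicted_rating)
```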

– Okay.

And so you really
have a process

that’s running in parallel
because you’re developing the model,

which is informing the chip development,
what the needs of the chip were

and at the same time
as you’re starting to tape out the chip,

that’s going to inform what you can do
in terms of the model. – Correct.

– And so you really had to meet
these two together. – Yeah, absolutely.

So with the development of the chip
we started the development

of the integration of the algorithm
in a full blown hearing instrument

and to test it under
really realistic conditions

with prototypes at different levels:
on a laptop level, on a smartphone level

with integrated

hearing instrument functionality,
and at the end, in prototypes

of hearing instruments. And we
always tested that at least in two labs

with different languages, different
background noises, different acoustics.

So it was quite a bit of testing
to really be sure the thing works.

– Yeah, which certainly explains
the long development timeline.

Anything this new that’s never been done
before is going to take a while

to get it right.
– Yes

– And I can tell you,
as an actual hearing impaired person,

if you don’t get it right, things
start sounding unnatural.

You start to lose directionality.
– Yeah.

– You start, you know, the voices
start sounding unnatural,

they become less intelligible.

And that’s, we really don’t
want to go there.

Now, when you think about

hearing care more generally,

this capability in particular,

what additional needs can you meet with it

that you aren’t meeting now? For example,
I think about the 25 million people

that NAL identified
as having normal audiograms

but difficulty hearing
speech in noise.

How do you see this
technology helping there?

– See, I think there has been quite
a bit of convergence in general

between consumer audio
and hearing care.

And we still have quite different
requirements

and a lot of differences in terms of usability
and things like that.

But when we think about technologies
like active noise cancellation

in the consumer world, or now this very powerful

speech enhancement tool that we have,

I see quite a lot of benefit
if they converge.

And especially this technology now

is also beneficial for people
who have listening difficulties

but normal audiograms. A lot of my team
colleagues have been wearing that and said

oh that’s quite helpful in a
really noisy place, in a noisy bar.

They benefit
quite a bit from that.

So I see really quite some
application of this technology.

Also for this segment

we have to solve the wearing comfort
because you have to close the earmolds

a little bit to really
have powerful results.

But that’s things that we can handle

and then we have to increase
the acceptance of these devices.

But it offers us a lot of
opportunities moving forward.

– Well, and acceptance of these devices
is a great lead in

to what I’ve been thinking
about for a while.

And that is how do we further increase
the adoption rates, reaching populations
who either don’t have access today

or are hesitant to use today’s solutions?

Looking holistically
in terms of devices, in terms

of how hearing care is delivered,
and in terms of how we message

the benefits of treating hearing loss
to the general population,

what are the
key strategies going forward

to increase adoption rates
on all of those axes?

– Yeah so first

and foremost,
I would also like to appreciate

and emphasize that adoption
of hearing instruments over the past

20 years, as we can learn
from MarkeTrak and EuroTrak,

especially in countries like the US
and Germany with a well-developed

infrastructure for hearing care delivery,
has significantly increased.

It’s not 100%, it’s below 50%

in most countries,

but it has doubled over the past 20 years
and we should appreciate that.

I think what is contributing to that

is the performance of the hearing
instruments in general has increased.

It’s also something we can see
when we look at the wearing time

of hearing instruments. There are
several studies published by

research groups from
different manufacturers,

also talking about hearing instruments
being used on average 12 hours per day

and the number of devices
in the drawer has decreased.

So it shows that we probably have
raised awareness and talked

about the importance of hearing care.

And I think that’s a general theme
that we need to drive forward

even more to emphasize
how important hearing is beyond hearing.

You know, hearing is important
for social interaction.

Hearing is the sense
that helps us as a social group

and it helps us connect with
friends, with families.

We have learned from a couple of studies
how hearing care contributes to

maintaining cognitive health, the ACHIEVE

study, especially in people at risk.

We have seen other correlations between

hearing care and healthy living and aging,
and I think that’s a major theme

we have to keep driving forward
to emphasize to the broader

population about the importance
of hearing and hearing well.

Moving forward, I think we also have
to become more specific

because we love to talk about people
with hearing loss, and then we talk about

people with a mild

hearing loss, and we put them
all in the same basket.

People with a mild hearing loss, people
with a profound hearing loss.

I think we have to become more specific.

Identify what is the target group
we are trying to reach and then identify

how do we talk to this target group

and what are the products that
we are offering to this target group?

I think that’s also an important
discussion to have in terms of different

target groups and their needs
for listening devices so that technology,

education, awareness, why it matters
and maybe also models of care delivery.

We always talk about OTC

or not to OTC
instead of thinking about:

wait a minute, which target group,
which needs, which model,

and how can we blend different care models
to become a continuum?

So I think that’s something we should be
working on more in the future.

Be more specific.

What hearing care means
for different target groups

we are trying to reach.

– I really love that
because it’s a line.

It’s a continuous line, right.

– And to divide it up
between one and the other, I think

leaves a big hole in the middle
in the way we talk with people about it.

But I also think probably
the biggest benefit of OTC

is that it started this conversation.

I mean, in the general public,
at least in the US,

there was a whole lot of press
around hearing care

and the importance of hearing care
with the arrival of OTC – Yeah

– And now, thinking about that,
of course you’ve got Sennheiser

on one end of the line,
so you, you know,

you can do the whole continuum
of hearing care

and I’m really looking forward
to seeing how you address all levels

of hearing loss with what those people’s
needs are, how you can deliver

care to those people, and give
them a more satisfying lifestyle.

I’m really looking forward
to seeing you go forward doing that.

– I mean, this is something
we as an organization certainly drive.

But I think this is also something

the entire community has to pick up
kind of as a task.

– I completely agree with you.

Well, listen, thanks a lot.

I really appreciate you spending some time,
at a very busy conference.

You had lots to do and yet
you took some time to talk with us.

I very much appreciate it.
– Thank you, my pleasure.

Thank you very much.
– Thank you.

 


Be sure to subscribe to the TWIH YouTube channel for the latest episodes each week, and follow This Week in Hearing on LinkedIn and on X (formerly Twitter).

Prefer to listen on the go? Tune into the TWIH Podcast on your favorite podcast streaming service, including Apple, Spotify, Google and more.

About the Panel

Christine Jones, AuD, is the Senior Director of Marketing for Phonak and a research audiologist with a background in clinical audiology, including work with both pediatric and adult patients. She has led the Phonak Audiology Research Center (PARC) and currently applies her clinical expertise to marketing and brand communications for Phonak in the U.S.

Henning Hasemann, PhD, is the Director of Deep Learning Engineering at Sonova, with a Ph.D. in computer science and extensive experience in software engineering, particularly in the automotive industry. He has been instrumental in developing the DEEPSONIC machine learning algorithm used in the Phonak Sphere, a breakthrough in real-time speech and noise separation for hearing aids.

Stefan Launer, PhD, is the Vice President of Audiology and Health Innovation at Sonova, with a background in physics and a Ph.D. in hearing science and impairment. Since joining Phonak in 1995, he has driven research in hearing science, signal processing, and hearing care delivery models, contributing to major technological advancements in the field.

Andrew Bellavia is the Founder of AuraFuturity. He has experience in international sales, marketing, product management, and general management. Audio has been both an abiding interest and a market he has served professionally in these roles. Andrew has been deeply embedded in the hearables space since the beginning and is recognized as a thought leader in the convergence of hearables and hearing health. He has been a strong advocate for hearing care innovation and accessibility, work made more personal when he faced his own hearing loss and sought treatment. All these skills and experiences are brought to bear at AuraFuturity, providing go-to-market, branding, and content services to the dynamic and growing hearables and hearing health spaces.

 
