Smart Home Voice Announcements

In the very early days of our smart home development, we quickly realised that it would benefit from having a voice. It is intelligent enough to know when to make voice announcements, and its whole-home context makes it best placed to make them. Whilst notifications are incredibly powerful, they depend on a smartphone for delivery. We wanted our contextual smart home to be able to make both localised voice announcements (to a room, a zone, or a group of rooms/zones) and whole-home ones.

Our definition of "announcements" is broad. It can include pre-recorded voice announcements, dynamically generated voice announcements using text-to-speech (TTS), alert tones and other noises.

Note: Unlike notifications, voice announcements cannot be muted, so they must be used wisely. But a voice announcement might just save your life when your smartphone is in 'do not disturb' mode.

Requirements

Our requirements for smart home voice announcements are driven by the desire to improve quality of life.

The voice announcement system, as we have defined it, does not include microphones for far-field voice interaction or other smart home services; that is a separate capability, better optimised for its own purpose. Nor is the system used for entertainment (music, video, etc.). The amplifiers and speakers used for voice announcements are mono only and are positioned where they suit announcements best (typically in ceilings or walls). Speakers used for music need to be stereo, are typically much larger, and need to be positioned to deliver high-quality stereo audio.

The key challenges to be addressed by these voice announcements are all part of delivering a great user experience:

High quality, clear voice announcements with no distortion.
Zero noise when no announcement is being made.
No thumps/clicks when starting or ending an announcement.
No detectable hiss during the quiet parts of an announcement.
The ability to target announcements to a part, several parts, or all of our home. This also includes concepts such as 'upstairs', 'downstairs', or 'outside', which are supported by our zones model.
The ability to deliver announcements based on occupancy.
The ability to vary the volume of the announcements based on context.
An aesthetically pleasing hardware solution that is very low-profile or invisible in operation.
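The targeting and occupancy requirements above can be illustrated with a minimal sketch. It assumes a simple zone model in which each zone (such as 'upstairs') is just a named set of rooms. The zone names, room names, the 'whole home' keyword and the functions resolve_targets and occupied_targets are all hypothetical; in practice the occupancy data would come from presence sensors.

```python
# Hypothetical zone model: each zone name maps to the rooms it contains.
# These zone and room names are illustrative assumptions only.
ZONES: dict[str, set[str]] = {
    "upstairs": {"main bedroom", "bathroom", "landing"},
    "downstairs": {"kitchen", "living room", "conservatory", "hall"},
    "outside": {"garden", "driveway"},
}
# "whole home" is assumed here to mean every room in every zone.
ALL_ROOMS: set[str] = set().union(*ZONES.values())


def resolve_targets(targets: list[str]) -> set[str]:
    """Expand a mix of zone names, room names and 'whole home' into rooms."""
    rooms: set[str] = set()
    for target in targets:
        if target == "whole home":
            rooms |= ALL_ROOMS
        elif target in ZONES:
            rooms |= ZONES[target]
        elif target in ALL_ROOMS:
            rooms.add(target)
        else:
            raise ValueError(f"unknown zone or room: {target}")
    return rooms


def occupied_targets(targets: list[str], occupied: set[str]) -> set[str]:
    """Restrict the resolved rooms to those currently occupied."""
    return resolve_targets(targets) & occupied
```

For example, occupied_targets(["upstairs"], {"landing", "kitchen"}) returns {"landing"}, so an upstairs announcement would only be played on the landing. Safety-critical announcements would typically skip the occupancy filter and use resolve_targets directly.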

Note: Audio announcements and audio notifications are considered a 'mission critical' part of our contextual smart home and need to work during power outages, etc. The hardware used is therefore connected via our Uninterruptible Power Supply (UPS).

Implementation

Our current implementation is extremely powerful: our home can announce anything using several Text-To-Speech (TTS) services. A high-quality cloud-based TTS service is used when available, but our contextual smart home falls back to a local service when the cloud service is unavailable or the Internet connection is down. This resilience means that it can always speak to us when required.
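The cloud-first, local-fallback behaviour can be sketched as an ordered list of engines tried in turn. This is an illustration only: cloud_tts and local_tts are hypothetical stand-ins for real TTS clients (here the cloud engine always fails, simulating an Internet outage), and the TtsUnavailable exception and synthesise function are likewise assumptions.

```python
from typing import Callable, Sequence

# A TTS engine is modelled as a function taking text and returning audio bytes.
TtsEngine = Callable[[str], bytes]


class TtsUnavailable(Exception):
    """Raised when an engine (or every engine) cannot synthesise speech."""


def synthesise(text: str, engines: Sequence[tuple[str, TtsEngine]]) -> tuple[str, bytes]:
    """Try each engine in preference order and return the first success."""
    errors: list[str] = []
    for name, engine in engines:
        try:
            return name, engine(text)
        except (TtsUnavailable, OSError) as exc:
            # Service outages or network failures (ConnectionError is an
            # OSError subclass): record the failure and try the next engine.
            errors.append(f"{name}: {exc}")
    raise TtsUnavailable("all TTS engines failed: " + "; ".join(errors))


def cloud_tts(text: str) -> bytes:
    # Hypothetical cloud client; simulates the Internet connection being down.
    raise ConnectionError("no Internet connection")


def local_tts(text: str) -> bytes:
    # Hypothetical local engine; a real one would return WAV/MP3 audio.
    return f"<local audio for {text!r}>".encode()
```

Calling synthesise("Mail has been delivered.", [("cloud", cloud_tts), ("local", local_tts)]) returns the local engine's audio, because the cloud engine fails first.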

To minimise latency, it also uses a caching algorithm for the TTS speech files, so that phrases which have already been synthesised can be played immediately, ensuring the best possible user experience.
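The caching algorithm itself is not described here, so the sketch below swaps in one simple, common approach: a content-addressed cache that names each speech file after a SHA-256 hash of the voice and text, and only calls the (hypothetical) synthesise function on a cache miss. The TtsCache class, the .audio file extension and the voice label are assumptions.

```python
import hashlib
from pathlib import Path
from typing import Callable


class TtsCache:
    """Content-addressed on-disk cache of synthesised speech files (illustrative)."""

    def __init__(self, directory: Path, synthesise: Callable[[str], bytes], voice: str):
        self.directory = directory
        self.synthesise = synthesise
        self.voice = voice
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path_for(self, text: str) -> Path:
        # Including the voice in the key means a voice change never
        # returns audio generated with the old voice.
        key = hashlib.sha256(f"{self.voice}\n{text}".encode("utf-8")).hexdigest()
        return self.directory / f"{key}.audio"

    def get(self, text: str) -> Path:
        """Return the path to audio for text, synthesising only on a miss."""
        path = self._path_for(text)
        if not path.exists():
            # Write to a temporary file first, so a crash never leaves a
            # half-written file that later looks like a valid cache hit.
            tmp = path.with_suffix(".tmp")
            tmp.write_bytes(self.synthesise(text))
            tmp.replace(path)
        return path
```

This sketch never evicts entries, which is fine for a bounded set of fixed phrases but would need a size or age limit for phrases containing times, names or other changing details.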

The same TTS services are also used by our smart home's AI, so that our home always speaks with a consistent voice, which further reinforces its identity.

We consider voice announcements an important safety feature; as such, they are a mission-critical service requiring a protected power supply.

Each voice announcement is made with a priority, much like notifications. When several announcements have been queued up, the priority determines the order in which they are spoken, with higher-priority announcements made first. Priority 1 is the highest and is used for information of high importance, such as the alarm being turned on or off, or something triggering the alarm. Priority 2 is for information of medium importance and includes direct feedback to a user action, e.g. pressing a button to turn on a towel rail. Priority 3 is for information of low importance, typically general updates, e.g. it has started raining, weather forecasts, etc.
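The ordering rule can be sketched with a binary heap. This is an illustration rather than our implementation: the AnnouncementQueue and Announcement names are hypothetical, and announcements of equal priority are assumed to be spoken in arrival order, which is enforced here with a sequence counter.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Optional


@dataclass(order=True)
class Announcement:
    priority: int                      # 1 = high, 2 = medium, 3 = low
    sequence: int                      # arrival order, used to break ties
    text: str = field(compare=False)
    targets: tuple[str, ...] = field(compare=False, default=("whole home",))


class AnnouncementQueue:
    """Pop the lowest priority number first; equal priorities in arrival order."""

    def __init__(self) -> None:
        self._heap: list[Announcement] = []
        self._counter = itertools.count()

    def add(self, text: str, priority: int, targets: tuple[str, ...] = ("whole home",)) -> None:
        if priority not in (1, 2, 3):
            raise ValueError("priority must be 1, 2 or 3")
        heapq.heappush(self._heap, Announcement(priority, next(self._counter), text, targets))

    def pop(self) -> Optional[Announcement]:
        """Return the next announcement to speak, or None if the queue is empty."""
        return heapq.heappop(self._heap) if self._heap else None
```

Queuing a Priority 3 rain update, a Priority 2 towel rail confirmation and a Priority 1 smoke alert, in that order, results in the smoke alert being spoken first and the rain update last.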

Priority is also one of the factors that determine the volume of an announcement. Priority 1 announcements are always made at 100% volume.
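A minimal sketch of such a volume rule follows. Only the Priority 1 behaviour comes from the description above; the base levels for Priorities 2 and 3, the night-time window (22:00 to 07:00) and the halving of volume at night are hypothetical examples of context, as is the announcement_volume function.

```python
from datetime import time

# Hypothetical base volumes (percent) for the lower priorities.
BASE_VOLUME = {2: 70, 3: 50}
# Hypothetical night window; it wraps past midnight.
NIGHT_START, NIGHT_END = time(22, 0), time(7, 0)


def is_night(now: time) -> bool:
    return now >= NIGHT_START or now < NIGHT_END


def announcement_volume(priority: int, now: time) -> int:
    """Return the output volume (0-100%) for an announcement."""
    if priority == 1:
        return 100                     # always full volume, whatever the context
    if priority not in BASE_VOLUME:
        raise ValueError("priority must be 1, 2 or 3")
    volume = BASE_VOLUME[priority]
    if is_night(now):
        volume //= 2                   # hypothetical: quieter at night
    return volume
```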

Amazon Echo Announcements

We recently found a way to make any of our Amazon Echo devices make voice announcements, based on this page, which is in turn based upon a blog post. Whilst this has cloud dependencies, it gives us an extra set of devices for voice announcements, working in parallel with our more resilient approach.

Examples

"The hair straighteners in the main bedroom have been left on and I have switched them off for you." - Whole home, based on monitoring our hair straighteners' usage.

"A leak has been detected in the Conservatory by the Conservatory Irrigation Leak Sensor." - Whole home, one example of our many connected leak/flood sensors being triggered.

"Mail has been delivered." - Whole home, triggered by our connected letter box.

Some other examples used in our contextual smart home:

"The alarm is now on|off|armed." - Downstairs only, part of our extremely powerful alarm system.
"This is a reminder for Rob to put the recycle bins out." - Room/zone based.
"There is someone at the front door." - Whole home, when someone rings the doorbell.
"Your coffee is ready Rob." - Downstairs only, based on our connected coffee maker.
"The Internet connection has failed." - Whole home, triggered by our smart home monitoring its Internet connection.
"The Internet connection has been restored." - Whole home, triggered by our smart home monitoring its Internet connection.
"It has started raining." - Whole home, triggered by our optical rain sensor but also based on other context, so it doesn't get announced every time it starts raining!
"The security alarm has been triggered by the conservatory door in the conservatory." - Whole home, priority 1.
"Smoke detected in the kitchen." - Whole home, priority 1, triggered by our smoke sensors.

We also envisage this capability having huge value in other aspects of assisted living, such as pre-emptive safety warnings about icy roads or pavements. It could also deliver regular reminders to take medicine, attend appointments, etc. for those with dementia or poor memory.
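The rain example in the list above shows that an announcement may need to be suppressed based on context. The exact context used is not described here, so this sketch assumes two illustrative rules: only announce when someone is home, and only announce once within a cooldown period (three hours, chosen arbitrarily) so that a series of showers produces a single announcement. The RainAnnouncer class is hypothetical.

```python
from datetime import datetime, timedelta
from typing import Optional


class RainAnnouncer:
    """Decide whether 'It has started raining.' is worth announcing."""

    def __init__(self, cooldown: timedelta = timedelta(hours=3)) -> None:
        self.cooldown = cooldown
        self.last_announced: Optional[datetime] = None

    def should_announce(self, now: datetime, someone_home: bool) -> bool:
        if not someone_home:
            return False               # nobody to tell
        if self.last_announced is not None and now - self.last_announced < self.cooldown:
            return False               # announced recently: stay quiet
        self.last_announced = now
        return True
```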

Hardware

Our Home Control System processor makes the actual voice announcements and controls the zones and volume. Zone control is achieved using a multi-channel, digitally controlled audio matrix switch, which is connected to several 12V DC power amplifiers that drive the various speakers.
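The role of the matrix switch can be sketched as closing input-to-output crosspoints only for the duration of an announcement, which also relates to the 'zero noise when no announcement is being made' requirement. The switch's real control protocol is not described here, so AudioMatrix is a hypothetical in-memory model that just records closed crosspoints, and the room-to-output mapping in ROOM_OUTPUTS is invented for illustration.

```python
from contextlib import contextmanager
from typing import Iterator

# Hypothetical mapping of rooms to matrix output channels.
ROOM_OUTPUTS = {"kitchen": 1, "living room": 2, "hall": 3, "main bedroom": 4, "landing": 5}


class AudioMatrix:
    """In-memory model of a digitally controlled audio matrix switch.

    A real driver would send the switch's own control commands; this model
    only records which (input, output) crosspoints are closed.
    """

    def __init__(self) -> None:
        self.closed: set[tuple[int, int]] = set()

    def set_crosspoint(self, inp: int, out: int, on: bool) -> None:
        if on:
            self.closed.add((inp, out))
        else:
            self.closed.discard((inp, out))


@contextmanager
def routed(matrix: AudioMatrix, source_input: int, rooms: set[str]) -> Iterator[set[int]]:
    """Connect the announcement source to the rooms' outputs for one announcement."""
    # Resolve every room before switching anything, so an unknown room
    # raises KeyError without leaving a partial route behind.
    outputs = {ROOM_OUTPUTS[room] for room in rooms}
    for out in outputs:
        matrix.set_crosspoint(source_input, out, True)
    try:
        yield outputs
    finally:
        # Always disconnect afterwards, even if playback fails.
        for out in outputs:
            matrix.set_crosspoint(source_input, out, False)
```

Playback would happen inside a with routed(matrix, 1, {"kitchen", "hall"}): block; once the block exits, no crosspoints remain closed.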

We are using a combination of ceiling, in-wall and standard speakers depending on the installation environment. In all cases, we have tried to make the solution as visually low-impact as possible.