In the very early days of our smart home development, we quickly realised that it would be good for it to have a voice. It is intelligent enough to know when to make voice announcements and, because it has the whole home context, it is best placed to make them. Whilst notifications are incredibly powerful, they are dependent upon a smartphone for delivery. We wanted our contextual smart home to be able to make both localised (to a room, zone or group of rooms/zones) and whole home voice announcements.
Our definition of "announcements" is broad. It can include pre-recorded voice announcements, dynamically generated voice announcements using text-to-speech (TTS), alert tones and other noises.
Our requirements for smart home voice announcements are driven by the desire to improve quality of life.
The voice announcement system as we have defined it does not include microphones for far-field voice interaction or other smart home services. That is a separate capability, served by equipment that is better optimised for the purpose. Nor is the system used for entertainment (music, video, etc.). The amplifiers and speakers used for voice announcements are mono only and are positioned to suit spoken announcements (typically in ceilings or walls). Speakers used for music need to be stereo, are typically much larger and need to be placed more carefully to deliver high quality stereo audio.
The key challenges to be addressed by voice announcements are all part of delivering a great user experience:
Our current implementation is extremely powerful, with our home able to announce anything using several text-to-speech (TTS) services. A high quality cloud-based TTS service is used when available, but our contextual smart home falls back to a local service when the cloud service is unavailable or the Internet connection is down. This resilience means that it can always speak to us when required.
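The fallback behaviour described above can be sketched in a few lines of Python. This is a minimal illustration and not our actual implementation: the `cloud_tts` and `local_tts` functions are hypothetical stand-ins for real services, and we assume each service is a callable that returns audio bytes or raises an exception on failure.

```python
from typing import Callable, Sequence

# Assumption: a synthesiser takes text and returns audio bytes, raising on failure.
Synthesiser = Callable[[str], bytes]


def synthesise_with_fallback(text: str, services: Sequence[Synthesiser]) -> bytes:
    """Try each TTS service in order and return the first successful result."""
    errors = []
    for service in services:
        try:
            return service(text)
        except Exception as exc:  # e.g. Internet down or service unavailable
            errors.append(exc)
    raise RuntimeError(f"all {len(errors)} TTS services failed")


# Hypothetical stand-in for a cloud service that is currently unreachable.
def cloud_tts(text: str) -> bytes:
    raise ConnectionError("Internet connection is down")


# Hypothetical stand-in for a local engine that always works.
def local_tts(text: str) -> bytes:
    return b"LOCAL:" + text.encode("utf-8")


audio = synthesise_with_fallback("Mail has been delivered.", [cloud_tts, local_tts])
```

Listing the services in order of preference keeps the policy in one place, so adding a further service would only mean extending the list.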
To minimise latency, it also caches the TTS speech files, which ensures the best possible user experience.
The same TTS services are also used by our smart home's AI, so that our home speaks with a consistent voice, which further reinforces its identity.
We consider voice announcements an important safety feature and, as such, a mission-critical service requiring a protected power supply.
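To give a feel for why caching helps, here is a minimal sketch of one simple approach: storing each synthesised phrase on disk under a hash of the voice name and the text. This is an assumption made for illustration only and is not a description of the caching algorithm we actually use; the `SpeechCache` class and its parameters are hypothetical.

```python
import hashlib
from pathlib import Path
from typing import Callable


class SpeechCache:
    """Cache synthesised audio on disk, keyed by a hash of voice and text."""

    def __init__(self, directory: Path, synthesise: Callable[[str], bytes],
                 voice: str = "default") -> None:
        self.directory = directory
        self.synthesise = synthesise  # any callable that turns text into audio bytes
        self.voice = voice
        self.directory.mkdir(parents=True, exist_ok=True)

    def _path_for(self, text: str) -> Path:
        # Include the voice in the key so a change of voice does not reuse old audio.
        key = hashlib.sha256(f"{self.voice}\n{text}".encode("utf-8")).hexdigest()
        return self.directory / f"{key}.audio"

    def get(self, text: str) -> bytes:
        path = self._path_for(text)
        if path.exists():
            return path.read_bytes()   # cache hit: no TTS call needed
        audio = self.synthesise(text)  # cache miss: synthesise once, then store
        path.write_bytes(audio)
        return audio
```

Fixed phrases such as "Mail has been delivered." would then only ever be synthesised once, and could still be spoken when every TTS service is unreachable.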
Each voice announcement is made with a priority, much like notifications. The priority determines the order in which announcements are spoken when several have been queued up: higher priority announcements are made first. Priority 1 is the highest and is used for information of high importance, such as the alarm being turned on or off, or something triggering the alarm. Priority 2 is for information of medium importance and includes direct feedback to a user action, e.g. pressing a button to turn on a towel rail. Priority 3 is for information of low importance, typically general updates, e.g. it has started raining, weather forecasts, etc.
The priority is also one of the factors that determines the volume of the announcement. Priority 1 announcements are always made at 100% volume.
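The queueing behaviour can be sketched with a standard priority queue. The code below is an illustrative assumption rather than our actual implementation: the class and function names are hypothetical, announcements of equal priority are assumed to be spoken in the order they were queued, and the volume for priorities 2 and 3 is passed in by the caller because it depends on factors not covered here.

```python
import heapq
import itertools
from dataclasses import dataclass, field


@dataclass(order=True)
class Announcement:
    priority: int                     # 1 = highest, 3 = lowest
    sequence: int                     # tie-breaker: keeps queue order within a priority
    text: str = field(compare=False)  # not used when ordering announcements


class AnnouncementQueue:
    """Queue that releases the highest priority announcement first."""

    def __init__(self) -> None:
        self._heap = []
        self._counter = itertools.count()

    def push(self, text: str, priority: int) -> None:
        heapq.heappush(self._heap, Announcement(priority, next(self._counter), text))

    def pop(self) -> Announcement:
        return heapq.heappop(self._heap)

    def __len__(self) -> int:
        return len(self._heap)


def volume_for(priority: int, normal_volume: int) -> int:
    """Priority 1 is always played at 100%; other priorities use the supplied volume."""
    return 100 if priority == 1 else normal_volume


queue = AnnouncementQueue()
queue.push("It has started raining.", 3)
queue.push("The towel rail has been turned on.", 2)
queue.push("The alarm has been triggered.", 1)
spoken = [queue.pop().text for _ in range(len(queue))]
```

After this runs, `spoken` holds the alarm announcement first, then the towel rail feedback, then the weather update.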
We recently found a way to make any of our Amazon Echo devices make voice announcements. This is based on this page, which is in turn based upon a blog post. Whilst this has cloud dependencies, it adds an extra set of devices that we can use in our home for voice announcements, in parallel with our more resilient approach.
"The hair straighteners in the main bedroom have been left on and I have switched them off for you." - Whole home, based on our hair straighteners' usage being monitored.
"A leak has been detected in the Conservatory by the Conservatory Irrigation Leak Sensor." - Whole home, one example of our many connected leak/flood sensors being triggered.
"Mail has been delivered." - Whole home, triggered by our connected letter box.
Some other examples used in our contextual smart home:
We also envisage this capability having huge value in other aspects of assisted living. It can include pre-emptive safety warnings, such as icy roads/pavements, etc. It can also be used to deliver regular reminders to take medicine, attend appointments, etc. for those with dementia or poor memory.
Our Home Control System processor makes the actual voice announcements and controls the zones and volume. The zone control is achieved using a multi-channel, digitally controlled audio matrix switch. This is connected to several 12V DC power amplifiers, which are used to drive the various speakers.
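As a rough illustration of zone control, the sketch below resolves an announcement target (a single zone, a named group of zones or the whole home) to the matrix switch outputs that would need to be enabled. The zone names, group names and channel numbers are all hypothetical, and the commands actually sent to the matrix switch are not shown.

```python
from typing import Dict, List

# Hypothetical mapping of zone names to matrix switch output channels.
ZONES: Dict[str, int] = {
    "kitchen": 1,
    "conservatory": 2,
    "main bedroom": 3,
    "hall": 4,
}

# Hypothetical named groups of zones.
GROUPS: Dict[str, List[str]] = {
    "downstairs": ["kitchen", "conservatory", "hall"],
}


def resolve_outputs(target: str) -> List[int]:
    """Return the sorted output channels for a zone, a group or the whole home."""
    if target == "whole home":
        names = list(ZONES)
    elif target in GROUPS:
        names = GROUPS[target]
    elif target in ZONES:
        names = [target]
    else:
        raise KeyError(f"unknown announcement target: {target}")
    return sorted(ZONES[name] for name in names)
```

With these example mappings, `resolve_outputs("downstairs")` returns `[1, 2, 4]` and `resolve_outputs("whole home")` returns every configured channel.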
We are using a combination of ceiling, in-wall and standard speakers depending on the installation environment. In all cases, we have tried to make the solution as visually low-impact as possible.