We do what we must because…

I’ve received a few queries asking about our status. Blog entries are typically posted frequently by their authors, so I am a bit overdue for an update. It’s not for lack of progress, but because of it. We have been so busy at Hoaloha that it has been hard to take a break, but let me try to bring you up to date.

First, we made a major shift from our original strategy. Initially I had hoped to focus purely on the software development side of the solution, for two reasons: 1) there seemed to be enough challenges there, and 2) there appeared to be a number of companies already working on assistive care robots, so it didn’t seem necessary to invest in the hardware side. You can read about some of them, such as Robosoft, who participated with their robot Kompaï in the recently concluded Mobiserv project in Europe (http://www.mobiserv.info/news-events/). Another is Metra Labs, whose robot, Hector, was featured in the CompanionAble project (http://www.companionable.net/). Both of these robots were designed to be socially interactive and assistive companions. We explored the possibility of working with both companies, but for a variety of reasons, it didn’t work out. We also considered other hardware partners, but were not able to find a complementary company to work with.

However, because we had a strong conceptual model of what was required, we embarked on designing and building our own hardware platform, which we have now not only built but also refined through several increasingly improved iterations. While this has consumed a good deal of additional time and effort, it has been well worth the investment. We have been able to benefit from what the late Steve Jobs noted about Apple: the ability to blend and tune the user experience by managing both the hardware and the software design, something that has become an increasing trend in the industry.

It has not been simple, even for basic functions. For example, the drive designs that support a robot’s mobility can be implemented in many ways, from the commonly used dual-drive-wheels-plus-passive-caster differential drive, to dynamic balancing on two drive wheels (or a single sphere), to the various omnidirectional drive designs. I can’t share what we selected yet, but suffice it to say that we evaluated a number of options. What I can say is that we remain committed, for now, to a wheeled platform. That means our robot will be limited to operating on a single level (i.e. no stairs). This seems reasonable considering that, the majority of the time, our targeted users are also mobile only on a single level.
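For readers less familiar with these terms, here is a minimal sketch of the kinematics behind the common differential-drive layout mentioned above (two driven wheels plus a passive caster). It is purely illustrative and, to be clear, does not reflect the drive design we ultimately selected.

```python
# Minimal sketch of differential-drive kinematics -- the common
# dual-drive-wheels-plus-passive-caster layout mentioned above.
# Purely illustrative; not the drive design we selected.
import math

def differential_drive_step(x, y, heading, v_left, v_right, wheel_base, dt):
    """Advance a differential-drive robot's pose by one time step.

    x, y       -- current position (meters)
    heading    -- current orientation (radians)
    v_left     -- left wheel linear speed (m/s)
    v_right    -- right wheel linear speed (m/s)
    wheel_base -- distance between the drive wheels (meters)
    dt         -- time step (seconds)
    """
    v = (v_right + v_left) / 2.0             # forward speed of the chassis
    omega = (v_right - v_left) / wheel_base  # rotation rate about the center

    x += v * math.cos(heading) * dt
    y += v * math.sin(heading) * dt
    heading += omega * dt
    return x, y, heading

# Equal wheel speeds drive straight; unequal speeds produce a turn.
print(differential_drive_step(0.0, 0.0, 0.0, 0.5, 0.5, 0.4, 1.0))  # straight ahead
print(differential_drive_step(0.0, 0.0, 0.0, 0.3, 0.5, 0.4, 1.0))  # gentle left turn
```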

Meanwhile, we have also made progress on our user interaction architecture. Here, too, we’ve iterated through a number of variations on the path to a usable and satisfying user interface. In addition, we revisited some early design assumptions, such as the use of speech as an input modality. As noted in previous posts, speech recognition is a beguiling technology. It is magical when it works, but very frustrating when it does not. And despite the advances reflected in Apple’s Siri and Google’s voice search, speech has still not proven itself to be a sufficient primary form of user interface. This will be especially true for our users.

As a result, I originally considered not including speech input as part of our user interface because of its characteristic tendency to fail. While speech experts often boast of 80% or greater accuracy, that still means a failure rate of up to 20%, an experience something like your keyboard generating the wrong character every fifth keystroke. Instead, we would rely primarily on touch for input. However, there were two drawbacks to this approach. The first was that the user could only interact when within arm’s length of the robot, or would have to rely on a separate mobile controller that might also not be within the user’s reach. This would reduce the robot’s ability to be proactive and reliably initiate interaction. More importantly, though, early user tests confirmed that, even when carefully instructed to use touch rather than speech as input, users consistently attempted to interact by simply talking to the robot. This has been confirmed in other research studies on users’ preferences for input and interaction.
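To make that keyboard comparison concrete, here is a back-of-the-envelope sketch of why 80% per-word accuracy feels much worse at the sentence level. It assumes word errors are independent, which is a simplification, but it shows how errors compound across an utterance.

```python
# Back-of-the-envelope sketch of why ~80% per-word accuracy feels so
# unreliable in practice. Assumes word errors are independent (a
# simplification), to illustrate how errors compound over an utterance.

def utterance_success_rate(word_accuracy, words_per_utterance):
    """Probability that every word in an utterance is recognized correctly."""
    return word_accuracy ** words_per_utterance

for n in (1, 3, 5, 8):
    rate = utterance_success_rate(0.80, n)
    print(f"{n}-word utterance recognized perfectly: {rate:.0%}")

# Output:
# 1-word utterance recognized perfectly: 80%
# 3-word utterance recognized perfectly: 51%
# 5-word utterance recognized perfectly: 33%
# 8-word utterance recognized perfectly: 17%
```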

While we have now added speech back as an input modality, that does not resolve the typical issues that afflict all speech recognition technologies and the inevitable errors or other failures that can occur. While recent developments have started to incorporate semantic and natural language processing into the equation, speech technologies still lack the ability to consider other important contextual information that even a two-year-old learns to master to support a successful conversation: visual information (body language, facial expression, etc.), prosody (emphasis, pacing, tone, loudness), priors (previous sentences or topics), and the speaker’s general personality and behavior patterns (e.g. the user likes to ask about the weather first thing in the morning). Instead, conventional telephony-based speech interfaces tend to confine what the user can say at any time, require multiple dialogue layers that impede progress, and implement repetitive voice prompts, creating an unsatisfying experience. (Over 85% of users dislike such interfaces, often speaking the keyword early in the dialogue just to reach a human operator.)

Granted, the newer search-based speech interfaces from Apple, Google, and Microsoft allow the user a much wider, richer range of words that can be spoken, but they still lack a true conversational experience, since in most cases the dialogue exchange rarely lasts more than one or two turns. And even their humorous comebacks to odd requests like “where do I bury a dead body” or “what is the meaning of life” set an unrealistic expectation for what they can actually understand.

Our scenarios make it even more challenging because the microphone will not be near the user’s mouth, adding acoustic distortion to the incoming audio. Further, to enable interaction at a reasonable distance that may be beyond arm’s reach, we cannot require the user to tap or click a button to signal when he or she is addressing the robot. Some interfaces try to get around this by having a special keyword initiate a conversation, such as the user saying “Kinect” to start voice commands in the Xbox’s speech interface or “OK Glass” to begin a Google Glass interaction. This technique may help, but it still gets annoying if you have to precede every request with the keyword.
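As a rough illustration of that keyword-trigger approach, here is a small hypothetical sketch of the gating logic. The wake word, timeout, and class names are invented for illustration and are not part of our actual interface.

```python
# Hypothetical sketch of a keyword-trigger ("wake word") gate, like the
# "Kinect" or "OK Glass" examples above. Names and values are invented
# for illustration only.
import time

WAKE_WORD = "robot"          # hypothetical trigger word
ATTENTION_WINDOW_S = 8.0     # how long to keep listening after a trigger

class WakeWordGate:
    def __init__(self):
        self.attention_until = 0.0

    def handle_transcript(self, text, now=None):
        """Return the command to act on, or None if the robot isn't being addressed."""
        now = time.time() if now is None else now
        words = text.lower().split()

        if words and words[0] == WAKE_WORD:
            # Trigger heard: open the attention window and strip the wake word.
            self.attention_until = now + ATTENTION_WINDOW_S
            command = " ".join(words[1:])
            return command or None

        if now < self.attention_until:
            # Still inside the attention window: no need to repeat the wake word.
            return text

        # No trigger and the window has closed: probably not addressed to us.
        return None

gate = WakeWordGate()
print(gate.handle_transcript("robot what time is it", now=100.0))  # "what time is it"
print(gate.handle_transcript("and the weather", now=105.0))        # still attended
print(gate.handle_transcript("and the weather", now=120.0))        # None: window expired
```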

Fortunately, we have more at our disposal than leaving the speech recognizer in charge of the conversation. Our robot has the benefit of knowing whether a user is nearby, whether the user is currently looking at the robot, and for how long. It also tracks when the last conversation was, what it was about, and the history of other conversations with this user at this time of day. That said, it is not yet a “solved problem,” and we continue to work on improvements to our interaction model.
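To give a flavor of how such non-speech context can help, here is a hypothetical sketch that combines proximity, gaze, and conversation recency into a simple “is the user addressing me?” heuristic. The signal names and thresholds are invented for illustration and are not Hoaloha’s actual interaction model.

```python
# Hypothetical sketch of combining non-speech context -- proximity, gaze,
# and conversation history -- to decide whether an utterance is addressed
# to the robot. Signal names and thresholds are invented for illustration;
# this is not Hoaloha's actual interaction model.
from dataclasses import dataclass

@dataclass
class InteractionContext:
    user_distance_m: float          # how far away the user is
    gaze_on_robot_s: float          # how long the user has been looking at the robot
    seconds_since_last_turn: float  # time since the last conversational exchange

def likely_addressed_to_robot(ctx: InteractionContext) -> bool:
    """Crude engagement heuristic: weigh several cues and compare to a threshold."""
    score = 0.0
    if ctx.user_distance_m < 2.5:            # user is within conversational range
        score += 1.0
    if ctx.gaze_on_robot_s > 1.0:            # sustained eye contact suggests engagement
        score += 1.5
    if ctx.seconds_since_last_turn < 30.0:   # an exchange is already under way
        score += 1.0
    return score >= 2.0

# A nearby user who has been looking at the robot is probably talking to it,
# even if no conversation is currently in progress.
print(likely_addressed_to_robot(InteractionContext(1.8, 2.0, 600.0)))  # True
# A distant user glancing away, with no recent exchange, probably is not.
print(likely_addressed_to_robot(InteractionContext(4.0, 0.2, 600.0)))  # False
```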

But let me set aside the challenges we are working on and mention a few other updates since my last post. I am pleased and honored to announce that Dr. Henrik Christensen, KUKA Chair of Robotics at the College of Computing, Georgia Institute of Technology, and Dr. Clifford Nass, Thomas M. Storke Professor at Stanford University, now serve as members of Hoaloha Robotics’ academic advisory board. Both gentlemen are known throughout the academic community for their exceptional research, and they provide Hoaloha with a significant resource for design advice and insights.

In the coming days, I’ll endeavor to post more often, but there is much work yet to be done. The challenges in delivering a great solution are not trivial. Were it not so, we would already see more personal robots in the marketplace. So far, what is out there is limited to research prototypes, toys and gadgets, or robots that exceed what any consumer, aged or otherwise, could reasonably afford. Despite this, we continue to be driven forward by the growing need for technology that can empower those who face the effects of aging, disability, and chronic disease, as care costs increase and the supply of human resources for support shrinks (http://bit.ly/1cawU6b).