extract features of conversations recorded with mobile phones to analyze social interaction

Increasingly, every family member has a mobile phone. Doctors, therapists, and life coaches are recognizing that these phones can help families collect and learn from data about their habits, environment, and interpersonal dynamics. Working with the Semel Institute, we are developing technologies to document key features of a family’s daily interactions, including co-location and social interactions. In contrast to self-reporting, phone-based tools can collect data that are otherwise invisible to wellness professionals and to family members themselves. For example, families and coaches can learn about behaviors such as consistency of engagement at mealtimes by measuring members’ proximity to one another, as revealed by Bluetooth stumbling.
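As an illustration of how a proximity measure might be derived, the sketch below assumes that Bluetooth stumbling produces, on each phone, a log of periodic scans listing the device IDs seen nearby. The log format, the function name `colocation_fraction`, and the idea of scoring a mealtime as the fraction of scans in which all other family members were visible are assumptions made for this example, not the project's actual design.

```python
def colocation_fraction(scans, family, start, end):
    """Fraction of scans in [start, end) where every ID in `family` was seen.

    scans  : list of (timestamp_seconds, set_of_device_ids) from one phone
             (hypothetical log format)
    family : set of device IDs belonging to the other family members
    """
    in_window = [seen for t, seen in scans if start <= t < end]
    if not in_window:
        return 0.0  # no scans during the window, so no evidence of co-location
    together = sum(1 for seen in in_window if family <= seen)
    return together / len(in_window)


# Hypothetical scan log: one scan per minute during a 3-minute "mealtime".
scans = [
    (0, {"mom", "kid"}),
    (60, {"mom"}),
    (120, {"mom", "kid", "neighbor"}),
    (500, set()),  # outside the mealtime window
]
score = colocation_fraction(scans, {"mom", "kid"}, start=0, end=180)
```

A score near 1 would suggest the family was consistently together during that window, while a low score would suggest members came and went.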

We will collect and analyze short, anonymized audio clips captured from each member’s mobile phone to extract features of conversations, including pitch, speed, and the duration of each member’s participation. Preliminary work on classifying audio clips as speech or non-speech has shown that a K-Nearest Neighbor classifier operating on a basic set of 9 audio features, including the mel-frequency cepstral coefficients, achieves an accuracy of 93% [1]. We will build on this work. The mobile platform will anonymize the data in two ways: by calculating features on the phone and uploading only feature values where possible, and by capturing only intermittent audio clips, on the order of 1-second samples taken every 10 seconds.
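The pipeline described above can be sketched roughly as follows: pick a 1-second clip out of every 10 seconds, reduce each clip to a small feature vector on the device, and label it with a K-Nearest Neighbor vote. This is a minimal sketch under stated assumptions. The two features used here (RMS energy and zero-crossing rate) are simple stand-ins chosen for brevity, not the 9 features from [1], and all function names and training values are hypothetical.

```python
import math
from collections import Counter


def duty_cycle_windows(n_samples, rate_hz, clip_s=1.0, period_s=10.0):
    """Return (start, end) sample indices for one clip per period."""
    clip = int(clip_s * rate_hz)
    period = int(period_s * rate_hz)
    return [(s, s + clip) for s in range(0, n_samples - clip + 1, period)]


def clip_features(samples):
    """Stand-in features: RMS energy and zero-crossing rate of one clip."""
    n = len(samples)
    rms = math.sqrt(sum(x * x for x in samples) / n)
    crossings = sum(
        1 for a, b in zip(samples, samples[1:]) if (a < 0) != (b < 0)
    )
    return (rms, crossings / (n - 1))


def knn_predict(train, query, k=3):
    """Majority vote among the k training vectors nearest to `query`."""
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# 30 seconds of audio at a (hypothetical) 100 Hz rate yields three 1-second clips.
windows = duty_cycle_windows(n_samples=3000, rate_hz=100)

# Made-up labeled feature vectors; real training data would come from annotated clips.
train = [
    ((0.9, 0.1), "speech"),
    ((0.8, 0.2), "speech"),
    ((0.1, 0.9), "non-speech"),
    ((0.2, 0.8), "non-speech"),
]
label = knn_predict(train, (0.85, 0.15), k=3)
```

In a design like this, only the output of `clip_features` (or the resulting label) would leave the phone, and the raw samples for each window would be discarded after processing. Because KNN relies on distances, features on different scales would likely need normalization before voting.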
