The blossoming of a speech form through big data visualization


Anyone questioning the difference between big data analytics and run-of-the-mill business intelligence software, or wanting to experience the wow factor of big data, need only watch the recent TED Talk "The Birth of a Word" by Deb Roy, director of the Cognitive Machines Group at the MIT Media Lab.

Roy was working on robotics research and theories of child language development back in 2005, when he and his wife and collaborator, Rupal Patel, discovered they were expecting a child. Inspired by the emerging phenomenon known as big data, the two hatched a plan to use their home as a lab for studying language development in a natural setting by recording video and audio of the entire process, or at least eight hours of it per day.

Roy and his research team amassed more than 200,000 hours of audio and video recordings, after allowing for the privacy of people entering the home. By analyzing a near-complete record of the first two years of his son's life, they were able to follow how and when individual words were learned. They called these moments "word births."

Roy believes that words with unique wordscapes (patterns of when and where a word was repeatedly heard or used) tend to be learned earlier and more easily. He suggests that children could be helped to learn language more effectively by manipulating the non-linguistic contexts in which they experience it. He also discovered what he thinks are deep connections between word learning and grammar acquisition that suggest paths for future research. For example, he identified a significant halt in the pace at which his son was learning individual words, only to discover that the skill of putting words together had taken off.

This home experiment with big data showed Roy and one of his students, Michael Fleischman, that the approach could be applied to other media, and they soon applied it to television. Fleischman subsequently submitted a paper called "Grounding Language in Events" for his doctorate, in which he described a system he built that watched hundreds of hours of baseball games and learned to link sports commentary language (e.g., phrases such as "fly ball") to its associated visual meaning.

It is this kind of analysis that is now being applied to social media, which Roy calls a fundamentally new mode of communication. Together, Roy and Fleischman founded Bluefin Labs, which uses their ideas from MIT to analyze social media chatter and tie it to broadcasts of television shows, ads and political events.

Roy concludes: "Massive new flows of data coupled with practically limitless computational power are unleashing profound transformations throughout the cognitive and social sciences. And even as we advance our understanding of ourselves, the same technological forces are driving unprecedented changes in how we communicate and interact with each other. All of this I see as natural steps in our quest to become an increasingly self-aware and connected species."

