Friday, October 2, 2015
For our Google Summer of Code wrap-up this week we have The Distributed Little Red Hen Lab. A new organization for 2015, Red Hen Lab had three student projects. Read on to learn about the Lab and their effort to scan a huge repository of international television news programming.
The Distributed Little Red Hen Lab is an international consortium for research on multimodal communication. We develop open source tools for joint parsing of text, audio/speech and video, using datasets of various sorts, most centrally a very large dataset of international television news called the UCLA Library Broadcast NewsScape. Red Hen uses 100% open source software. In fact, not just the software but everything else—including recording nodes—is shared in the consortium.
The Red Hen archive is a huge repository of recordings of TV programming, processed in a range of ways to produce derived products useful for research, expanded daily, and supplemented by various sets of other recordings. Our challenge is to create tools that allow us to access audio, visual, and textual (closed-captioning) information in the corpus in various ways by creating abilities to search, parse and analyze the video files. However, as you can see, the archive is very large, so creating processes that can scan the entire dataset is time consuming, and often with a margin of error.
Our projects for Google Summer of Code 2015 (GSoC) challenged students to assist in a number of projects, including some that have successfully improved our ability to search, parse and extract information from the archive.
Ekateriana Ageeva - Multiword Expression Search and Tagging
Ekaterina built a multiword expressions toolkit (MWEtoolkit), which is a tool for detecting multi-word units (e.g. phrasal verbs or idiomatic expressions) in large corpora. The toolkit operates via command-line interface. To ease access and expand the toolkit's audience, Ekaterina developed a web-based interface, which builds on and extends the toolkit functionality.
The interface allows us to do the following:
The interface is built with Python/Django. It currently supports operations with corpora tagged with Stanford CoreNLP parser, with a possibility to extend to other formats supported by MWEtoolkit. The system uses part of speech and syntactic dependency information to find the expressions. Users may rely on various frequency metrics to obtain the most relevant search results.
Owen He - Automatic Speaker Recognition System
Owen used a reservoir computing method called conceptor together with the traditional Gaussian Mixture Models (GMM) to distinguish voices between different speakers. He also used a method proposed by Microsoft Research last year at the Interspeech Conference, which used a Deep Neural Network (DNN) and an Extreme Learning Machine (ELM) to recognize speech emotions. DNN was trained to extract segment-level (256 ms) features and ELM was trained to make decisions based on the statistics of these features on a utterance level.
Owen’s project focused on applying this to detect male and female speakers, specific speakers, and emotions by collecting training samples from different speakers and audio signals with different emotional features. He then preprocessed the audio signals and created the statistical models from the training dataset. Finally, he computed the combined evidence in real time and tuned the apertures for the conceptors so that the optimal classification performance could be reached. You can check out the summary of results on GitHub.
Vasanth built a system for detecting commercials in television programs from any country and in any language. The system detects the location and the content of ads in any stream of video, regardless of the content being broadcast and other transmission noise in the video. In tests, the system achieved 100% detection of commercials. An online interface was built along with the system to allow regular inspection and maintenance.
Initially the user uses a set of hand tagged commercials. The system detects this set of commercials in the TV segment. On detecting these commercials, it divides the entire broadcast into blocks. Each of these blocks can be viewed and tagged as commercials by the user. There is a set of 60 hand labelled commercials for one to work with. This process takes about 10-30min for a 1hr TV segment, depending on the number of commercials that have to be tagged.
When the database has an appreciable amount of commercials (usually around 30 per channel) we can use it to recognize commercials in any unknown TV segment. On making changes to the web interface, the system updates its db with new/edited commercials. This web interface can be used for viewing the detected commercials as well. For more information see Vasanth’s summary of results.
By Patricia Wayne, UCLA Communication Studies