For our Google Summer of Code wrap-up this week we have The Distributed Little Red Hen Lab. A new organization for 2015, Red Hen Lab had three student projects. Read on to learn about the Lab and their effort to scan a huge repository of international television news programming.

The Distributed Little Red Hen Lab is an international consortium for research on multimodal communication. We develop open source tools for joint parsing of text, audio/speech and video, using datasets of various sorts, most centrally a very large dataset of international television news called the UCLA Library Broadcast NewsScape. Red Hen uses 100% open source software. In fact, not just the software but everything else—including recording nodes—is shared in the consortium.

The Red Hen archive is a huge repository of recordings of TV programming, processed in a range of ways to produce derived products useful for research, expanded daily, and supplemented by various sets of other recordings. Our challenge is to create tools that let us search, parse, and analyze the audio, visual, and textual (closed-captioning) information in the corpus. However, as you can see, the archive is very large, so processes that scan the entire dataset are time-consuming and prone to error.

Our projects for Google Summer of Code 2015 (GSoC) challenged students to assist in a number of projects, including some that have successfully improved our ability to search, parse and extract information from the archive.

Ekaterina Ageeva - Multiword Expression Search and Tagging

Ekaterina worked with the multiword expressions toolkit (MWEtoolkit), a tool for detecting multiword units (e.g. phrasal verbs or idiomatic expressions) in large corpora. The toolkit operates via a command-line interface. To ease access and expand the toolkit's audience, Ekaterina developed a web-based interface, which builds on and extends the toolkit's functionality.

The interface allows us to do the following:
  • Upload, manage, and share corpora
  • Create XML patterns which define constraints on multiword expressions
  • Search the corpora using the patterns
  • Filter search results by occurrence and frequency measures
  • Tag the corpora with obtained search results

The interface is built with Python/Django. It currently supports operations with corpora tagged with Stanford CoreNLP parser, with a possibility to extend to other formats supported by MWEtoolkit. The system uses part of speech and syntactic dependency information to find the expressions. Users may rely on various frequency metrics to obtain the most relevant search results.
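As a rough illustration of the idea (not MWEtoolkit's actual pattern format), matching a pattern of part-of-speech tags against a tagged corpus can be sketched as:

```python
# Minimal sketch of multiword-expression matching over a POS-tagged corpus.
# The tag set and pattern format here are illustrative, not MWEtoolkit's own.

def find_mwe(tokens, pattern):
    """Return (start index, matched words) for each run of tokens whose
    POS tags match `pattern`, a list of POS tags."""
    hits = []
    n = len(pattern)
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if all(tag == p for (_, tag), p in zip(window, pattern)):
            hits.append((i, [word for word, _ in window]))
    return hits

# Example: detect verb + particle sequences (phrasal verbs).
tagged = [("she", "PRP"), ("gave", "VBD"), ("up", "RP"), ("smoking", "VBG")]
print(find_mwe(tagged, ["VBD", "RP"]))  # [(1, ['gave', 'up'])]
```

A real system adds the dependency and frequency filters described above; the sliding-window match is only the first step.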

Owen He - Automatic Speaker Recognition System

Owen used a reservoir computing method called conceptors together with traditional Gaussian Mixture Models (GMMs) to distinguish between the voices of different speakers. He also used a method proposed by Microsoft Research at last year's Interspeech conference, which uses a Deep Neural Network (DNN) and an Extreme Learning Machine (ELM) to recognize emotions in speech. The DNN was trained to extract segment-level (256 ms) features, and the ELM was trained to make decisions based on the statistics of these features at the utterance level.

Owen’s project focused on applying this to detect male and female speakers, specific speakers, and emotions by collecting training samples from different speakers and audio signals with different emotional features. He then preprocessed the audio signals and created the statistical models from the training dataset. Finally, he computed the combined evidence in real time and tuned the apertures for the conceptors so that the optimal classification performance could be reached. You can check out the summary of results on GitHub.
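The GMM side of this can be pictured with a deliberately tiny sketch: model each class with a single Gaussian over one made-up feature (average pitch) and classify by likelihood. Real systems use many mixture components over cepstral features; the numbers below are invented for illustration.

```python
import math

def log_gauss(x, mean, var):
    # Log-density of a 1-D Gaussian; log-space avoids underflow.
    return -0.5 * (math.log(2 * math.pi * var) + (x - mean) ** 2 / var)

# Hypothetical per-class models: (mean pitch in Hz, variance).
speakers = {"male": (120.0, 400.0), "female": (210.0, 400.0)}

def classify(pitch):
    # Pick the class whose model assigns the frame the highest likelihood.
    return max(speakers, key=lambda s: log_gauss(pitch, *speakers[s]))

print(classify(115.0))  # male
print(classify(220.0))  # female
```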

Vasanth Kalingeri - Commercial Detection System

Vasanth built a system for detecting commercials in television programs from any country and in any language. The system detects the location and content of ads in any video stream, regardless of the content being broadcast and other transmission noise in the video. In tests, the system detected 100% of the commercials. An online interface was built alongside the system to allow regular inspection and maintenance.

Initially, the user starts from a set of hand-tagged commercials. The system detects this set of commercials in the TV segment and, on doing so, divides the entire broadcast into blocks. Each of these blocks can be viewed and tagged as a commercial by the user. A set of 60 hand-labelled commercials is provided to start with. This process takes about 10–30 minutes for a one-hour TV segment, depending on the number of commercials that have to be tagged.

When the database contains an appreciable number of commercials (usually around 30 per channel), it can be used to recognize commercials in any unknown TV segment. When changes are made through the web interface, the system updates its database with new or edited commercials. The web interface can also be used for viewing the detected commercials. For more information, see Vasanth's summary of results.
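The recognition step can be pictured as fingerprint lookups against the database of known commercials. The sketch below hashes raw frame labels, which is purely illustrative; a real system fingerprints audio/video features robustly against noise.

```python
import hashlib

def fingerprint(window):
    # Hypothetical fingerprint: hash a short window of frame descriptors.
    return hashlib.sha1(" ".join(window).encode()).hexdigest()

# Toy database mapping fingerprints to known commercials.
known_ads = {fingerprint(["ad-frame-1", "ad-frame-2"]): "soda commercial"}

def scan(stream, width=2):
    # Slide a window over the stream and report matches against the database.
    hits = []
    for i in range(len(stream) - width + 1):
        fp = fingerprint(stream[i:i + width])
        if fp in known_ads:
            hits.append((i, known_ads[fp]))
    return hits

stream = ["news-1", "ad-frame-1", "ad-frame-2", "news-2"]
print(scan(stream))  # [(1, 'soda commercial')]
```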

By Patricia Wayne, UCLA Communication Studies

It’s Friday! Time for another Google Summer of Code wrap-up post. Boston University / XIA is one of the 37 new organizations to the program this year. Read below about three student projects and their work to discover the future architecture of the internet.
Linux XIA is the native implementation of eXpressive Internet Architecture (XIA), a meta network architecture that supports evolution of all of its components, which we call “principals,” and promotes interoperability between these principals. We are developing Linux XIA because we believe that the most effective way to find the future Internet architecture that will eventually replace TCP/IP is to crowdsource the search. This crowdsourced search is possible in Linux XIA.

Our organization, Boston University / XIA, received 34 proposals from 12 countries. As a first-year organization in Google Summer of Code (GSoC), we were surprised by the number of proposals, and we did our best to choose great students for each of the following projects:

XLXC is a set of scripts written in Ruby that creates network topologies using virtual interfaces and Linux containers. Testing a new network stack requires a good amount of work to set up test environments, and XLXC saves developers and tinkerers a lot of time when experimenting with Linux XIA. Our student Aryaman Gupta from India worked with mentor Rahul Kumar to enable XLXC to emulate any topology, using a language for describing topologies.

Linux XIA needs to call forwarding functions that correspond to each XID type in order to forward a packet. XID types are 32-bit identifiers associated with principals which, in turn, define the forwarding functions. Being able to hash each XID type to a unique entry in an array increases the number of packets Linux XIA can forward per second because it reduces the number of memory accesses per lookup. Our student Pranav Goswami, also from India, worked with mentor Qiaobin Fu to find the best perfect hashing algorithm for Linux XIA to use in this case, and implemented it in Linux XIA.
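The idea of a perfect hash, where every XID type maps to a distinct array slot so a lookup needs no collision handling, can be sketched with a brute-force seed search. This is not the algorithm Pranav evaluated or implemented, and the XID type names are illustrative; it only shows why the property matters.

```python
def hash_with_seed(key, seed, size):
    # Simple seeded string hash (31-multiplier polynomial, 32-bit).
    h = seed
    for byte in key.encode():
        h = (h * 31 + byte) & 0xFFFFFFFF
    return h % size

def find_perfect_seed(keys, size):
    # Search for a seed under which every key lands in a distinct slot.
    for seed in range(1, 1 << 16):
        if len({hash_with_seed(k, seed, size) for k in keys}) == len(keys):
            return seed
    raise ValueError("no seed found; try a larger table")

xid_types = ["ad", "hid", "cid", "lpm"]  # made-up principal names
seed = find_perfect_seed(xid_types, size=8)
table = {hash_with_seed(k, seed, 8): k for k in xid_types}
print(sorted(table.values()))  # every XID type in its own slot
```

With a perfect hash, forwarding needs exactly one table probe per XID type, which is the memory-access saving described above.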

We do not know how the future Internet will route packets between autonomous systems (ASes), but we are certain that Linux XIA can leverage IP's routing tables to have large deployments of Linux XIA. This is the goal of the LPM principal: leveraging routing tables derived from BGP, OSPF, IS-IS and any other IP routing protocol to forward XIA packets natively, that is, without encapsulation in IP. Thanks to the evolution mechanism built into Linux XIA, when a better way to route between ASes becomes available, we will be able to incrementally phase LPM out. Student André Ferreira Eleuterio from Brazil implemented the LPM principal in Linux XIA with the help of mentor Cody Doucette.
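Conceptually, the LPM principal performs longest-prefix matching over an IP-derived table. A toy sketch, with made-up routes and next hops:

```python
import ipaddress

# Hypothetical routing table derived from an IP routing protocol.
routes = {
    "10.0.0.0/8": "AS-A",
    "10.1.0.0/16": "AS-B",
    "0.0.0.0/0": "default",
}

def lookup(addr):
    # Among all prefixes containing the address, pick the longest one.
    ip = ipaddress.ip_address(addr)
    best = max(
        (net for net in map(ipaddress.ip_network, routes) if ip in net),
        key=lambda net: net.prefixlen,
    )
    return routes[str(best)]

print(lookup("10.1.2.3"))   # AS-B  (the /16 beats the /8)
print(lookup("10.9.9.9"))   # AS-A
print(lookup("192.0.2.1"))  # default
```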

We are going to work with our students during the fall to have their contributions merged into our repositories and to add new projects to our ideas list that build upon their contributions. We expect that this will motivate new contributors by showing how much impact they can have on Linux XIA. Finally, new collaborators do not need to wait for the next GSoC to get involved! Join our community today, and "do what you can, with what you have, where you are" to make a difference like our three students successfully did.

By Michel Machado, Organization Administrator for Boston University / XIA

"Time zones are logical and easy to use."
—no one ever

Programming with time zones is notoriously difficult and error prone. Sure, this is partially because time zones have some inherent complexity. But perhaps the bigger problem is that programmers don't have a clear conceptual model of how time and time zones work. Additionally, library support may not be what it should be. The end result is that code dealing with time zones is often overly complicated and sometimes even wrong.

A couple years ago we set out to fix these time zone programming woes within Google. We did this first by defining a greatly simplified mental model that enables programmers to understand time concepts and correctly reason about their code. We also created a C++ Time Zone library that closely matches this mental model and allows programmers to handle even the most complicated issues in a general and clear way.

And since we don't believe that time zone programming problems are unique to Google, we think our solutions may be useful to others. We presented these ideas and announced the open sourced cctz library this week at CppCon 2015. Even if you don't use C++, we hope you'll take a moment to read about the simplified mental model and perhaps flip through the slides from our talk, because those ideas are language independent.
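The core of that mental model, distinguishing absolute instants from civil (wall-clock) times, is language independent. Here is a Python sketch using fixed offsets so it needs no tz database; cctz itself handles full time-zone rules, including DST transitions:

```python
from datetime import datetime, timedelta, timezone

# An absolute time is a point on the time line; a civil time is what a
# wall clock in some zone shows. A zone is just the rule for converting
# between the two. Fixed offsets stand in for real zone rules here.
eastern = timezone(timedelta(hours=-4), "EDT")
pacific = timezone(timedelta(hours=-7), "PDT")

instant = datetime(2015, 9, 25, 16, 30, tzinfo=timezone.utc)  # absolute
print(instant.astimezone(eastern).strftime("%H:%M"))  # 12:30 civil in EDT
print(instant.astimezone(pacific).strftime("%H:%M"))  # 09:30 civil in PDT

# Two different civil times, one absolute instant: equality is defined
# on the absolute time line.
print(instant.astimezone(eastern) == instant.astimezone(pacific))  # True
```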

by Greg Miller and Bradley White, Google Engineering

At Google, we think that internet users’ time is valuable, and that they shouldn’t have to wait long for a web page to load. Because fast is better than slow, two years ago we published the Zopfli compression algorithm. This received such positive feedback in the industry that it has been integrated into many compression solutions, ranging from PNG optimizers to preprocessing web content. Based on its use and other modern compression needs, such as web font compression, today we are excited to announce that we have developed and open sourced a new algorithm, the Brotli compression algorithm.

While Zopfli is Deflate-compatible, Brotli is a whole new data format. This new format allows us to get 20–26% higher compression ratios than Zopfli. In our study, ‘Comparison of Brotli, Deflate, Zopfli, LZMA, LZHAM and Bzip2 Compression Algorithms’, we show that Brotli is roughly as fast as zlib’s Deflate implementation. At the same time, it compresses slightly more densely than LZMA and bzip2 on the Canterbury corpus. The higher data density is achieved by second-order context modeling, reuse of entropy codes, a larger memory window of past data, and joint distribution codes. Just like Zopfli, the new algorithm is named after a Swiss bakery product: Brötli means ‘small bread’ in Swiss German.

The smaller compressed size allows for better space utilization and faster page loads. We hope that this format will be supported by major browsers in the near future, as the smaller compressed size would give additional benefits to mobile users, such as lower data transfer fees and reduced battery use.

By Zoltan Szabadka, Software Engineer, Compression Team

Pencil Code is a collaborative programming site for art, music and creating games. It is also a place to experiment with mathematical functions, geometry, graphing, webpages, simulations and algorithms. Pencil Code had three Google Summer of Code students in 2015. You can read more about their project successes below.

As we return to school and look around Pencil Code in preparation for classes this fall, there are quite a few improvements created by our Google Summer of Code (GSoC) students. The first thing you see when you log in — icons everywhere! But better yet, if you have saved the program recently, the icon will be a screenshot of the program's output. This change will help students and teachers quickly identify saved projects, and will help people find interesting projects they want to share.
The icon implementation was done by Xinan Liu, a student at the National University of Singapore. He rewrote several bits of the Pencil Code server to support the icons, and then on the client side, he integrated the very cool html2canvas library to create the screenshots.

Xinan also contributed quite a bit beyond this project. He refactored our node.js-based build to switch from require.js to browserify, and he has been contributing to other sharing and scaling features on Pencil Code, helping other non-GSoC contributors get up to speed and reviewing their pull requests. We're looking forward to Xinan's continuing involvement and contributions to our little open source community.
The next contribution was by IIIT Hyderabad student Saksham Aggarwal. Saksham has implemented an HTML block mode for the Droplet block editor, which means that teachers can introduce beginners to HTML syntax using a drag-and-drop interface. And as usual with Droplet, you can toggle between blocks and text at any time. Saksham is also working on a similar Droplet-based editor for CSS syntax. The visual HTML syntax editor is a very accessible way to see and work with HTML syntax without having to type every bracket. And yet, magically, it does not hide the syntax - by toggling into text, you can work directly with traditional code. It is fully authentic, but highly accessible. You can read a paper about Saksham's work here.
The final project was a collaboration between GSoC student Jeremy Ruten from the University of Saskatchewan, and two of our summer students Amanda Boss from Harvard and Cali Stenson from Wellesley. They created an incredibly ambitious project to implement a "rewindable" debugger in Pencil Code. Although it is not quite ready for production yet, we are already using pieces of it in Pencil Code. You will see the debugger in coming months! For examples of how it transforms code, you can check out Jeremy, Amanda and Cali's writeup of their debugging work.
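One way to picture a rewindable debugger is as ordinary execution plus a recorded history of program states that can be stepped through backwards. Pencil Code's debugger instruments CoffeeScript in the browser; the Python sketch below demonstrates only the recording idea, using the standard `sys.settrace` hook.

```python
import copy
import sys

history = []  # (line number, snapshot of locals) per executed line

def tracer(frame, event, arg):
    # Record a snapshot of the locals before each line of `countdown` runs.
    if event == "line" and frame.f_code.co_name == "countdown":
        history.append((frame.f_lineno, copy.deepcopy(frame.f_locals)))
    return tracer

def countdown(n):
    while n > 0:
        n -= 1
    return n

sys.settrace(tracer)
countdown(3)
sys.settrace(None)

# "Rewind": walk the recorded values of n from newest to oldest.
print([snap["n"] for _, snap in reversed(history)])
```

Recording full snapshots is expensive; production designs typically log only the deltas needed to reconstruct earlier states, but the user-visible effect is the same.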

Did I mention that the three of them are students? And that they built this rewindable debugger over just one summer!? They made improvements that will make a real difference as we use Pencil Code to bring computer science to the next generation of students.

We'd like you to participate!

If you are interested in bringing some of this cool work into your classroom, join our discussion group. We have teachers from elementary school to college, from Texas to Singapore. And if you'd like to make an open source contribution, check out our project ideas, and join the teaching discussion group — also an area where our open source contributors hang out.

We are grateful to Google for supporting our summer open source program with GSoC. We hope the summer was as interesting for our students as it was productive for our project. We look forward to our students' continued involvement in the Pencil Code community.

By David Bau, Organization Administrator for Pencil Code

Now that the 11th year of Google Summer of Code has officially come to a close, we will devote Fridays to wrap-up posts from a handful of the 137 mentoring organizations that participated in 2015. Organizations this year represented a wide range of computing fields including artificial intelligence, featured below.


Two software libraries that originate from our laboratory, the Institute for Artificial Intelligence, and are used and supported by a larger user community are the KnowRob system for robot knowledge processing and the CRAM (Cognitive Robot Abstract Machine) framework for plan-based robot control. Our group has a very strong focus on open source software and on the active maintenance and integration of projects. The systems we develop are available under BSD and MIT licenses, and partly under the (L)GPL.

Within the context of these frameworks, we offered four projects during the summer term in 2015, which were all accepted to Google Summer of Code (GSoC).

Multi-modal Big Data Analysis for Robotic Everyday Manipulation Activities

The project "Multi-modal Big Data Analysis for Robotic Everyday Manipulation Activities" added to our ongoing work to build the robotic perception system RoboSherlock for service robots performing household chores. Our GSoC student, Alexander, made exciting progress and valuable contributions during the summer. He ported an earlier prototypical proprioceptive module from Java to C++ to integrate it into RoboSherlock, he developed tools for visualizing the module's various detections and annotations, and applied this infrastructure to detect collisions of the robot's arms with unperceived parts of the environment in a shelf reordering task. We are also very happy that Alexander decided to stay and keep on working on RoboSherlock after GSoC ended.

Kitchen Activity Games GUI

Our GSoC student, Mesut, developed a GUI to interact with the robotics simulator Gazebo. The simulator is used as a library, allowing different scenarios (worlds) to be selected and executed. Playlists can be generated in order to replay logged episodes. During replay, various plugins can be linked and executed from the GUI to allow post-processing of the data. The user interface will make it easier to organize and save simulation data that is later used for learning. You can view Mesut’s project on GitHub here.

Symbolic Reasoning Tools with Bullet using CRAM

Autonomous robots performing complex manipulation tasks in household environments, such as preparing a meal or tidying up, are required to know where different objects are located and what properties they have. This knowledge about the environment is called the “belief state”: the information that the robot believes to hold true in the surrounding world. Our GSoC student, Kunal, worked on improving the world representation of the CRAM robotic framework, which models the environment as a three-dimensional world governed by the simple physics rules of the Bullet physics engine. The goal of the project was to issue events when errors are found in the belief state, such as the robot believing its arm is inside a table, which is physically impossible. A stand-alone ROS (Robot Operating System) publisher node that notifies all its listeners about such errors was partially implemented; integration with the CRAM belief state is still in progress.
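The kind of check involved can be sketched with axis-aligned bounding boxes standing in for rigid bodies. Bullet's real collision queries are far more general, and the shapes and names below are made up; the sketch only shows how interpenetration in the believed world can be flagged as an error event.

```python
def boxes_overlap(a, b):
    """Each box is ((min_x, min_y, min_z), (max_x, max_y, max_z));
    two boxes overlap iff their extents intersect on every axis."""
    (amin, amax), (bmin, bmax) = a, b
    return all(amin[i] < bmax[i] and bmin[i] < amax[i] for i in range(3))

# Hypothetical believed poses, as axis-aligned boxes in meters.
table = ((0.0, 0.0, 0.0), (2.0, 1.0, 0.8))
arm_ok = ((0.5, 0.2, 0.9), (0.9, 0.4, 1.1))   # hovering above the table
arm_bad = ((0.5, 0.2, 0.5), (0.9, 0.4, 0.7))  # "inside" the table top

def belief_errors(bodies):
    # Report every pair of solid bodies that interpenetrate.
    names = list(bodies)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if boxes_overlap(bodies[a], bodies[b])]

print(belief_errors({"table": table, "arm": arm_ok}))   # []
print(belief_errors({"table": table, "arm": arm_bad}))  # [('table', 'arm')]
```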

Report Card Generation from Robot Mobile Manipulation Activities

Throughout the summer, our GSoC student Kacper made great progress in developing a framework for automatically generating report cards from robot experiences. We have a special focus in mobile manipulation activities in robots and are interested in anomaly detection in our rather complex systems — the developed components greatly help us save time on mundane analysis tasks, and make complicated analysis steps (looking up all aspects of a certain action, comparing different trials) easier to do.

By Jan Winkler, Organization Administrator and PhD student at the Institute of Artificial Intelligence

We're excited to announce the Beta release of Bazel, an open source build system designed to support a wide variety of different programming languages and platforms.

There are lots of other build systems out there -- Maven, Gradle, Ant, Make, and CMake just to name a few. So what’s special about Bazel? Bazel is what we use to build the large majority of software within Google. As such, it has been designed to handle build problems specific to Google’s development environment, including a massive, shared code repository in which all software is built from source, a heavy emphasis on automated testing and release processes, and language and platform diversity. Bazel isn’t right for every use case, but we believe that we’re not the only ones facing these kinds of problems and we want to contribute what we’ve learned so far to the larger developer community.

Our Beta release provides:

Check out the tutorial app to see a working example using several languages.

We still have a long way to go.  Looking ahead towards our 1.0.0 release, we plan to provide Windows support, distributed caching, and Go support among other features. See our roadmap for more details and follow our blog or Twitter account for regular updates.  Feel free to contact us with questions or feedback on the mailing list or IRC (#bazel on freenode).

By Jeff Cox, Bazel team


Another year of Google Summer of Code, our program designed to introduce university students from around the world to open source development, is drawing to a close.

In April, we accepted 1,051 university students from 73 countries. These students wrote code for 137 mentoring organizations. We also had 1,918 mentors from 70 countries help them out. We are excited to announce that 87%* (916) of the students passed their final evaluations. To see more about how that compares to previous years, check out our statistics from the last ten years of the program.   

And we’re not done yet: this November, we’ll be hosting our yearly mentor summit in Sunnyvale, California. We’ll welcome representative mentors and organization administrators from each of the mentoring organizations from this year’s program to meet and exchange ideas.

Now that the coding period has concluded, students are busy preparing their code samples for all eyes to see. Soon you will be able to visit the program site where organizations will have links to the students’ code repositories.

Thank you to all of the students, mentors and organization administrators that have helped to make this 11th year of the Google Summer of Code a great success!

By Carol Smith, Open Source Programs

* This number could change slightly in the next few weeks.

Today we have a guest post from Sam Parkinson, a 15-year-old Google Code-in 2014 grand prize winner. Sam worked with Sugar Labs for two instances of Google Code-in and tells us more about his journey navigating the world of free and open source software. We hope this is only the beginning of Sam’s contributions.
Ever since I was young, naive and enjoying my first tastes of Linux, I've wanted to contribute to the FOSS community. For me, Google Code-in (GCI) made that dream come true. I was lucky enough to be able to participate for the last two years with the mentoring organization Sugar Labs.

Sugar Labs is a “desktop environment without a desktop” that uses Python. Officially, Sugar Labs is the core component of a worldwide effort to provide every child with an equal opportunity for a quality education. Available in 25 languages, Sugar Labs activities are used every school day by nearly 3 million children in more than 40 countries.

I started my FOSS journey in GCI 2013 by completing the simple task of changing a ValueError to a logged exception. At first, my confidence level went from "yeah, I know some cool Python tricks" to "omg! how do I code?". I discovered new (and sometimes confusing) things like PEP8, git-branch and mailing lists. However, having the GCI and Sugar Labs communities as a support system made my dream of contributing to FOSS achievable by breaking it up into small, manageable tasks.

I worked on some pretty cool features, like adding a nutcracker-style mode in a Speak activity, where users could insert a picture of a face and have it talk to them.
I also worked on some not-so-fun tasks, like fixing bugs caused by GTK updates while trying not to break compatibility with ancient versions. But by the end of GCI 2014, I had learned how to pass code reviews and even completed some of my own. Hopefully I’ve programmed something that has made somebody smile.

In 2014, I was lucky enough to be chosen as a GCI winner. The grand prize trip was the cherry on top of the proverbial cake. I got to meet the amazing people I'd been hacking with, plus some pretty inspiring people from Google and other FOSS projects. I found it mind blowing to actually talk with people about programming face to face, and even better to sit around laughing about the programming culture. A highlight of the trip was meeting Walter Bender, one of the Sugar Labs mentors. Together we hacked on a project improving the Sugar Labs website. It’s not done, but it’s in better shape than it was before, and I can claim that I did some coding during the trip.

GCI was truly something that changed my life. I went from being an open source newbie to being able to contribute to really cool projects, thanks to the amazing GCI and Sugar Labs communities. It's something that I would recommend any young programmer consider doing. Participating in GCI is something that can make dreams come true.

By Sam Parkinson, Google Code-in grand prize winner

(Cross-posted from the Go Blog)

Today the Go project is proud to release Go 1.5, the sixth major stable release of Go.

This release includes significant changes to the implementation. The compiler tool chain was translated from C to Go, removing the last vestiges of C code from the Go code base. The garbage collector was completely redesigned, yielding a dramatic reduction in garbage collection pause times. Related improvements to the scheduler allowed us to change the default GOMAXPROCS value (the number of concurrently executing goroutines) from 1 to the number of available CPUs. Changes to the linker enable distributing Go packages as shared libraries to link into Go programs, and building Go packages into archives or shared libraries that may be linked into or loaded by C programs (design doc).

The release also includes improvements to the developer tools. Support for "internal" packages permits sharing implementation details between packages. Experimental support for "vendoring" external dependencies is a step toward a standard mechanism for managing dependencies in Go programs. The new "go tool trace" command enables the visualisation of  program traces generated by new tracing infrastructure in the runtime. The new "go doc" command is a substitute for the original "godoc" that provides an improved command-line interface.

There are also several new operating system and architecture ports. The more mature new ports are darwin/arm, darwin/arm64 (Apple's iPhone and iPad devices), and linux/arm64. There is also experimental support for ppc64 and ppc64le (IBM PowerPC 64-bit, big and little endian).
The new darwin/arm64 port and external linking features fuel the Go mobile project, an experiment to see how Go might be used for building apps on Android and iOS devices. (The Go mobile work itself is not part of this release.)

The only language change was the lifting of a restriction in the map literal syntax to make map literals more succinct and consistent with slice literals.

The standard library saw many additions and improvements, too. The flag package now shows cleaner usage messages. The math/big package now provides a Float type for computing with arbitrary-precision floating point numbers. An improvement to the DNS resolver on Linux and BSD systems has removed the cgo requirement for programs that do name lookups. The go/types package has been moved to the standard library from the repository. (The new go/constant and go/importer packages are also a result of this move.) The reflect package provides the new ArrayOf and FuncOf functions, analogous to the existing SliceOf function. And, of course, there is the usual list of smaller fixes and improvements.

For the full story, see the detailed release notes. Or if you just can't wait to get started, head over to the downloads page to get Go 1.5 now.

by Andrew Gerrand, Go team