uni'wissen 02-2012 ENG

sis of the video analysis. Now Brox is working on the second step: teaching the computer to compare several films and find similarities, such as images showing different cars from the same perspective. The computer can lay these images on top of one another. “Then it can determine what forms and structures are common to all cars and develop a general description of the class of ‘cars.’”

Independent Collection of Training Data

This provides a basis that enables the computer to identify, in other videos, variations of cars that it was previously unfamiliar with. Since it already knows the class, it should no longer need to be told that a car is in the film; in other words, it can get along without annotations. The new variations present an opportunity for the computer to expand its knowledge: It compares them with those it has already learned, refines the abstract description of the class, and thus becomes better at finding examples of it in the future.

In this way, the scientists aim to initiate a learning process in which the computer collects most of its own training data and develops its ability to recognize objects in pictures autonomously. In order to get this process underway, all the scientists need to do is provide the computer with initial information on the class it is supposed to learn. “The ideal thing would be to just give the computer pictures in which it finds similarities, allowing it to form the categories itself and acquire a nice representation of the world,” says Brox. “This works with humans, but for computers we have to simplify the problem.”

In the future, computers that can recognize objects in pictures could be used, among other things, to optimize driver assistance systems in cars or help robots to orient themselves better in their environment. In order to make the technology practicable for such applications, however, the scientists will need to accelerate the process: Currently, the computer needs around a second to search an image and up to two minutes to complete the learning process. “Real time is not absolutely necessary for the learning process, but it is for the recognition of objects. After all, a robot shouldn’t have to stop all the time to think.”

Thomas Brox is regarded as a pioneer in the field with his research approach. “At conferences I notice that other scientists are finding the idea of working with films for image recognition more and more attractive.” The computer scientist and his research group film most of the training videos for the computer themselves: cars, sheep, dogs, and soon humans. In order to capture an object from all sides, he walks around it once while filming. The resolution is high, and the examples of the class are representative. These are things most Internet videos can’t offer. “At the end of the learning process it is wise to show the computer a couple of YouTube videos, which usually contain non-standard examples of objects, in order to obtain new variations,” says Brox. “If we only used YouTube, the computer would develop an Internet-centric view of the world that doesn’t adequately reflect the real world.”

Prof. Dr. Thomas Brox has served as professor for pattern recognition and image processing at the Department of Computer Science of the University of Freiburg since 2010. He studied computer engineering at the University of Mannheim and earned his doctorate in computer science at the University of Saarland in 2005. He then continued his research as a member of the Computer Vision Group at the University of Bonn and taught at the Dresden University of Technology from 2007 to 2008. After two years of research activity in the Computer Vision Group at the University of California, Berkeley, USA, he accepted a position as professor in Freiburg. His main research interests include the visual understanding of computers, three-dimensional reconstructions, and the automated and intelligent analysis of spatially and temporally resolved microscopic images.

Further Reading

Brox, T./Malik, J. (2010): Object segmentation by long term analysis of point trajectories. European Conference on Computer Vision (ECCV). www.uni-freiburg.de/go/object-segmentation

Bourdev, L./Maji, S./Brox, T./Malik, J. (2010): Detecting people using mutually consistent poselet activations. European Conference on Computer Vision (ECCV). www.uni-freiburg.de/go/detecting-people
