AI Methods for Understanding Images & Pictures

There are fundamental questions to be answered about the architecture of a visual system. For nearly two decades, the field has assumed that the visual system can be decomposed into independent modules, each performing a well-defined function, like estimating color, and that their outputs are integrated at a later stage.

Is this a valid hypothesis?

- Harry G. Barrow & J. M. Tenenbaum, Retrospective on
"Interpreting Line Drawings as 3-Dimensional Surfaces"

Vision involves both the acquisition and processing of visual information. AI-powered technologies have made possible such astounding achievements as vehicles that can safely steer themselves along our superhighways and computers that can recognize and interpret facial expressions. Just consider the analysis your brain must perform to determine something as apparently simple as the fact that the black squares on a chessboard are not holes but part of the surface, and you'll have an idea of how sophisticated vision systems must be in order to perform reliably.

And there is so much more to vision than meets the eye, such as when fog or snow obscures a portion of the road ahead of you (or imagine that you are scuba diving in murky waters...). Just as you are able to fill in the missing pieces of vaguely defined areas based upon experience and your general knowledge of the environment, AI programs make possible the enhancement, interpretation, recognition, identification and other processing of partial images.

AI vision technology has made possible such applications as image stabilization, 3D modeling, image synthesis, surgical navigation, handwritten document recognition, and vision-based computer interfaces. Follow the links below to see what vision projects AI scientists are currently working on.

Definition of the Field

Computer vision: Cheat Sheet. By Natasha Lomas (December 6, 2011). "What exactly is computer vision then? Computer vision is a research field working to equip computers with the ability to process and understand visual data, as sighted humans can. Human brains process the gigabytes of data passing through our eyes every second and translate that data into sight - that is, into discrete objects and entities we can recognise or understand. Similarly, computer vision aims to give computers the ability to understand what they are seeing, and act intelligently on that knowledge."

Comprehensive Resource

CVonline: The Evolving, Distributed, Non-Proprietary, On-Line Compendium of Computer Vision. Links to newly rewritten Computer Vision articles on Wikipedia, applications, bibliographies, conferences, evaluation data, software, and more.

Introductory Readings

Machine learns games 'like a human.' By Will Knight. New Scientist News (January 24, 2005). "A computer that learns to play a 'scissors, paper, stone' by observing and mimicking human players could lead to machines that automatically learn how to spot an intruder or perform vital maintenance work, say UK researchers. CogVis, developed by scientists at the University of Leeds in Yorkshire, UK, teaches itself how to play the children's game by searching for patterns in video and audio of human players and then building its own 'hypotheses' about the game's rules. In contrast to older artificial intelligence (AI) programs that mimic human behaviour using hard-coded rules, CogVis takes a more human approach, learning through observation and mimicry, the researchers say. ... 'A system that can observe events in an unknown scenario, learn and participate just as a child would is almost the Holy Grail of AI,' says Derek Magee from the University of Leeds." Be sure to see the sidebar with related articles & web sites.

Computer system said to help stop drowning. By Ed Frauenheim. CNET (January 31, 2005). "A man swimming in a pool near Paris almost drowned last week but was rescued with the help of a computer vision surveillance system, the maker of the system said. The Poseidon drowning-detection system also helped lifeguards save the life of a teenager in France who nearly drowned in 2000, and last year it helped lifeguards in Germany rescue an elderly man who nearly drowned after a heart attack, said Poseidon's maker, Vision IQ. ... Poseidon is a computer vision surveillance system designed to recognize texture, volume and movement within a pool."

Computer Vision and Speech. Crossroads, The ACM Student Magazine. Fall 2007; Issue 13.4. As stated in the Introduction, by Niels Ole Bernsen: "If you are interested in computers with human capabilities, vision and speech open an entirely new world of computers that can see and talk like we do. Computer vision is the moody input cousin of computer graphics -- in graphics, you have all the time you can afford to program the rendering, but visual input is an unpredictable and messy reality. ... The two articles on machine vision make two equally interesting points about the present state of the field. Taking human faces as an example, Justin Solomon [The Science of Shape: Revolutionizing Graphics and Vision with the Third Dimension] compares the relative ease with which it is possible to solve complex face rendering problems with the difficulty of modeling the unique face each one of us has. Gang Gao and Paul Cockshott [Use of Motion Field Warping to Generate Cardiac Images] describe how smart use of computer image processing promises a robust shortcut solution to the integration of magnetic resonance images of the same object generated using two different imaging techniques."

Clearer computer vision. A profile of Nikos Paragios, 34, of École Centrale Paris, one of Technology Review's 2006 Young Innovators Under 35. By Shereen El-Feki (September 8, 2006). "Today Paragios is a leader in computer vision. Among his many projects is the mathematical modeling of hand gestures. The idea is to develop software to translate sign language into text, easing communication between the hearing and the deaf. The models could also allow drivers to simply point at icons printed on a dashboard -- gestures that would be interpreted by onboard cameras and computers -- rather than twisting knobs or pressing buttons. Paragios is best known for his contributions to medical imaging. As a research scientist at Siemens in Princeton, NJ, he created software to automatically detect and define the boundaries of anatomical structures."

U. Md. helping to fine-tune advanced fire alert system. By Dena Levitz. The Examiner (September 15, 2006). "University of Maryland researchers are teaming up with a Sparks, Md., company to perfect a camera system with the potential to detect fire and smoke within seconds. The high-tech SigniFire system will go a long way in saving lives and property, especially at locations like nursing homes and college dormitories where time is of the essence, those involved with the project said Thursday. ... 'If you were standing in a room, you would detect the fire quicker with your eyes than waiting for the smoke to build up to the smoke detector,' said Mack Mottley, chief operating officer of axonX LLC. 'And that’s what we’re doing, using it like a pair of eyes. It’s basically video-based artificial intelligence.'"

Recent advances in computer vision. By Massimo Picardi and Tony Jan. The Industrial Physicist (February/March 2003; Volume 9, Number 1). "Computer vision is the branch of artificial intelligence that focuses on providing computers with the functions typical of human vision. To date, computer vision has produced important applications in fields such as industrial automation, robotics, biomedicine, and satellite observation of Earth. ... The availability of affordable hardware and software has opened the way for new, pervasive applications of computer vision."

Computer Vision Homepage. Computer Science Department, Carnegie Mellon University. This site is a clearinghouse for information on everything about computer vision research. Links to groups, papers, software and more . . . and be sure to see their collection of Computer Vision Online Demos.

Machine Vision Glossaries of Terms

Automated Imaging Association spans Acquisition to Verification.

What Robots See. A slide show summarizing the various methods utilized by some of the vehicles in the 2005 DARPA Grand Challenge. Provided by NOVA to accompany their March 2006 broadcast of The Great Robot Race.

See how Kismet sees. Visit this page from the Humanoid Robotics Group at the MIT Artificial Intelligence Laboratory for a peek inside of Kismet's head.

IBM gets smart about Artificial Intelligence. By Pamela Kramer. IBM Think Research (June 2001). "Computer vision is important to speech recognition, too. Visual cues help computers decipher speech sounds that are obscured by environmental noise. Chalapathy Neti, manager of IBM's audiovisual speech technologies (AVST) group at Watson, often cites HAL's lip-reading ability in 2001 in promoting the group's work."

General Readings

Machine Vision Online. From AIA (Automated Imaging Association). News, feature articles, archives, a glossary, and a very comprehensive collection of machine vision market data.

Future Vision: BCS Thought Leadership Debate, 25 November 2004. "Nearly 50 years ago the Dartmouth Artificial Intelligence Conference at Dartmouth College in the USA brought together many leading researchers and launched much of what we now recognize as computer science.

Two papers addressed the question 'Can a computer see?' Those pioneers linked a digitizer to a computer and reported results from what we would now consider to be primitive pattern recognition. Have we progressed much since then, and 50 years from now will it be considered that we have progressed much more? If so, in what ways? ..."

Giving robots the gift of sight. By Ed Frauenheim. CNET (December 30, 2002). "Hans Moravec has completed work on a three-dimensional robotic vision system he says will allow machines to make their way through offices and homes. ... Moravec's system consists of stereoscopic digital cameras and a 3D grid set up in the robot's computer brain."

"To date, computer vision systems have been unable to emulate the full capabilities of the human visual system. The human eye-brain combination has proved able to categorise previously unseen objects with ease, using background knowledge and context. We recognise a pig as a pig because of the shape of its body and because we see it in a farmyard or field. ... [T]he VAMPIRE project seeks to enable cognitive computer vision systems to develop similar capabilities."

Cognitive Vision Systems: "Integration of intelligence into vision systems will transform them from processing engines that deliver information, to perception engines that acquire, maintain and deliver selective knowledge, i.e., to cognitive vision systems. A cognitive vision system may therefore be said to combine the acquisition and processing of visual data relating to objects, processes or events, and the mobilisation of specific knowledge for reasoning, establishing conclusions and making decisions. Put another way, it is a vision system with the capability of perceiving and interacting with its environment. ... The methods and technologies that facilitate reasoning in response to visual perception come mostly from the domain of Artificial Intelligence: *Intelligent agents.... *Self-adaptive software.... *Qualitative reasoning.... *Methods for memory and knowledge organisation and the intelligent deployment of that knowledge...."

Seeing is Believing: Computer Vision and Artificial Intelligence. By Christopher O. Jaynes. ACM Crossroads (the student magazine of the Association for Computing Machinery), 1996. "The importance of computer vision to the field of AI is fairly obvious: intelligent agents need to acquire knowledge of the world through a set of sensors. What is not so obvious is the importance that AI has to the field of computer vision. Indeed, I believe that the study of perception and intelligence are necessarily intertwined. This article will look at the role that knowledge plays in computer vision and how the use of reasoning, context, and knowledge in visual tasks reduces the complexity of the general problem."

New algorithm improves robot vision. By David Orenstein. Stanford Report (December 7, 2005). "This week, however, Stanford computer scientists will unveil a machine vision algorithm that gives robots the ability to approximate distances from single still images. ... With substantial sensor arrays and considerable investment, robots are gaining the ability to navigate adequately. Stanley, the Stanford robot car that drove a desert course in the DARPA Grand Challenge this past October, used lasers and radar as well as a video camera to scan the road ahead. Using the work of [Assistant Professor Andrew] Ng and his students, robots that are too small to carry many sensors or that must be built cheaply could navigate with just one video camera. In fact, using a simplified version of the algorithm, Ng has enabled a radio-controlled car to drive autonomously for several minutes through a cluttered, wooded area before crashing." [A related video is available via a link in the sidebar.]

Bugs Taking Over Robot Guidance. By Lakshmi Sandhana. Wired News (January 14, 2004). "Large UAVs that fly at high altitudes employ sensing mechanisms based on GPS or radar technologies, but those methods fail when it comes to scaled-down vehicles with smaller wingspans.... To create intelligent artificial-vision packages that weigh only a few grams and contain all the necessary optics, hardware and software, researchers have turned to creatures that manage it all with brains that weigh less than a milligram. 'Insects are a natural source of inspiration for aerospace, primarily since they were the first creatures to fly, 300 million or so years ago,' says Javaan Chahl of the Australian Defence Science and Technology Organisation's weapons systems division. Chahl is working along with professor M.V. Srinivasan, director of the Centre for Visual Sciences at the Australian National University, to design vision and navigation systems based on the honeybee. ... The key lies in understanding how insects perceive their world -- a concept called optic flow. ... Besides military applications, the technology has enormous commercial spinoffs, including intelligent vehicle systems, autonomous robots, intelligent toys, panoramic imaging systems and sensors that aid the blind, just to name a few."

Cars that Think. PBS television broadcast of Scientific American Frontiers show (January 26, 2005). "The fully automatic car may be down the road a ways, but cars that do your thinking for you are just around the corner -- they watch out for hazards, they listen to you, they read your lips, they even know when you're distracted."

Certainly it's not human - The machine vision process consists of four byte-intensive steps. By Nicholas Sheble. ISA - The Instrumentation, Systems, and Automation Society. (February 1, 2005). "Image interpretation consists of matching the processed image against a set of stored images to make identification. Template matching involves superposing the processed image over the stored image and measuring, for example, the percentage of pixels that do not correspond. Feature matching, a more sophisticated approach, involves calculating a weighted function of a number of features of the processed image and comparing it with the same function calculated for the stored image."
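The two matching strategies quoted above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular machine vision product: it assumes small binary images stored as lists of 0/1 rows, and the function names, the three features (area, width, height), the weights, and the sample shapes are all hypothetical choices made for this example.

```python
def mismatch_percentage(processed, stored):
    """Template matching: percentage of pixels that do not correspond."""
    same_shape = len(processed) == len(stored) and all(
        len(p) == len(s) for p, s in zip(processed, stored)
    )
    if not same_shape:
        raise ValueError("images must have the same dimensions")
    total = sum(len(row) for row in processed)
    if total == 0:
        raise ValueError("images must not be empty")
    differing = sum(
        p != s
        for p_row, s_row in zip(processed, stored)
        for p, s in zip(p_row, s_row)
    )
    return 100.0 * differing / total


def extract_features(image):
    """Hypothetical features: foreground area and bounding-box size."""
    rows = [r for r, row in enumerate(image) if any(row)]
    cols = [c for row in image for c, pixel in enumerate(row) if pixel]
    area = sum(1 for row in image for pixel in row if pixel)
    height = rows[-1] - rows[0] + 1 if rows else 0
    width = max(cols) - min(cols) + 1 if cols else 0
    return {"area": area, "width": width, "height": height}


def feature_score(image, weights):
    """Feature matching: a weighted function of the image's features."""
    features = extract_features(image)
    return sum(weights[name] * features[name] for name in weights)


def identify_by_template(processed, stored_images):
    """Return the label of the stored image with the fewest mismatches."""
    return min(
        stored_images,
        key=lambda label: mismatch_percentage(processed, stored_images[label]),
    )


def identify_by_features(processed, stored_images, weights):
    """Return the label whose weighted feature score is closest."""
    target = feature_score(processed, weights)
    return min(
        stored_images,
        key=lambda label: abs(feature_score(stored_images[label], weights) - target),
    )


# Illustrative data (assumed for this sketch): two stored shapes and a
# processed image of a "T" that has lost one pixel.
STORED = {
    "T": [[1, 1, 1], [0, 1, 0], [0, 1, 0]],
    "I": [[0, 1, 0], [0, 1, 0], [0, 1, 0]],
}
NOISY_T = [[1, 1, 1], [0, 1, 0], [0, 0, 0]]
WEIGHTS = {"area": 1.0, "width": 2.0, "height": 1.0}

if __name__ == "__main__":
    print(identify_by_template(NOISY_T, STORED))
    print(identify_by_features(NOISY_T, STORED, WEIGHTS))
```

Template matching compares every pixel, so it is sensitive to shifts and scale changes; the feature-based score reduces each image to a single number and tolerates such changes only as far as the chosen features do.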

A Summary of the 2001 North American Machine Vision Market Study. By Nello Zuech. AIA's Machine Vision Online, March 2002. "Examining the sales of 320 companies selling machine vision products into the North American market, the North American machine vision market only declined 6.8% in terms of revenues and 17.5% in terms of units in 2001 in spite of the terrible economy."

Vision as a Computational Process. Chapter 9 of Aaron Sloman's 1978 text, The Computer Revolution in Philosophy, has been recently annotated and is now available online. As stated in the Introduction: "In this chapter I wish to elaborate on a theme which Immanuel Kant found obvious: there is no perception without prior knowledge and abilities."

Related Resources

AI on the Web: Perception and Robotics. A resource companion to Stuart Russell and Peter Norvig's "Artificial Intelligence: A Modern Approach" with links to reference material, people, research groups, books, companies and much more.

Annotated Computer Vision Bibliography. Maintained by Keith Price.

Artificial Intelligence Center Perception Program at SRI International - "3D modeling and interpretation, analysis of range images, image matching and autonomous navigation, integration of multiple information sources, linear feature detection and analysis, model-based scene analysis, natural object recognition, optimization-based recognition, partitioning and perceptual organization, representation of natural scenes, stereo analysis."

Artificial Intelligence, Vision & Robotics research at Yale University's Department of Computer Science. "The name 'artificial intelligence' covers a lot of disparate problem areas, united mainly by the fact that they involve complex inputs and outputs that are difficult to compute (or even check for correctness when supplied). One of the most interesting such areas is sensor-controlled behavior, in which a machine acts in the real world using information gathered from sensors such as sonars and cameras. This is a major focus of A.I. research at Yale. The difference between sensor-controlled behavior and what computers usually do is that the input from a sensor is ambiguous."

Computer Vision at MERL, the Mitsubishi Electric Research Laboratories (the North American arm of the Corporate R&D organization of the Mitsubishi Electric Corporation). "Computer Vision is the branch of computer science concerned with the analysis of images to extract information about the world. ... Much of the computer vision research at MERL is focused on the area of surveillance. For example, MERL has pioneered a state of the art approach to detecting object classes such as human faces in cluttered scenes. This approach uses a powerful machine learning framework to automatically build very fast object detectors given a set of positive and negative examples of the object class. The same approach has been successfully applied to the problems of pedestrian detection, facial feature finding, face recognition, and gender and race classification."

Computer Vision Course Home Pages. Maintained by Qiang Ji, Department of Electrical, Computer, and Systems Engineering, Rensselaer Polytechnic Institute.

The Computer Vision Industry. By David Lowe. "This web page provides information on some commercially successful applications of computer vision with pointers to company home pages."

Introduction to Active Contours and Visual Dynamics. A very interesting collection of video clips from Andrew Blake, Visual Dynamics Group, Department of Engineering Science, University of Oxford, that answers the question: "Computers have been getting better and better at seeing movement on video. How is it that they read lips, follow a dancing girl or copy an actor making faces?" Subjects covered include Marker-free biometrics, Surveillance, Traffic monitoring, and Tracking agile motion ("A key factor in recent successes in making computers 'see' moving objects has been in getting the computer to anticipate.")

Omnidirectional Vision Home Page. "Hosted by the GRASP Laboratory, maintained by Kostas Daniilidis, part of the IEEE Robotics and Automation Society, [Technical Committee] on Computer and Robot Vision activities." A collection of links to research projects and companies from around the world.

Some Vision Groups

Other References Offline

Eyes for Computers: How HAL Could "See." By Azriel Rosenfeld. Chapter 10 of HAL's Legacy: 2001's Computer as Dream and Reality, edited by David G. Stork (MIT Press, 1996). The abstract is available online: "At the time 2001 was filmed, work was well underway on giving computers the ability to 'see' the world by analyzing images. ... The field of computer vision deals with methods a computer can use to obtain information about objects and events in a scene by analysing images of the scene. These methods need not resemble those used by humans (or animals) to see the world as long as they yield correct results. ...."

Tags: Vision
Page last modified on December 11, 2011, at 03:25 PM