Abstract
This paper presents a framework called Cresceptron for view-based learning, recognition and segmentation. Specifically, it recognizes and segments image patterns that are similar to those learned. It uses a stochastic distortion model and view-based interpolation to tolerate viewpoints that differ moderately from those used in learning. The learning phase is interactive. The user trains the system using a collection of training images. For each training image, the user manually draws a polygon outlining the region of interest and types in the label of its class. Then, from the directional edges of each segmented region, the Cresceptron uses a hierarchical self-organization scheme to grow a sparsely connected network automatically, adaptively and incrementally during the learning phase. At each level, the system detects new image structures that need to be learned and assigns a new neural plane to each new feature. The network grows by creating new nodes and connections that memorize the new image structures and their context as they are detected. Thus, the structure of the network is a function of the training exemplars. The Cresceptron incorporates both individual learning and class learning. In individual learning, each training example is treated as a distinct individual; in class learning, each example is treated as a sample of a class. In the performance phase, segmentation and recognition are tightly coupled. No separate foreground extraction is necessary: segmentation is achieved by backtracking the network's response down the hierarchy to the image parts that contributed to recognition. Several stochastic shape distortion models are analyzed to show why multilevel matching, such as that in the Cresceptron, can handle more general stochastic distortions than a single-level matching scheme can.
The system is demonstrated using images from broadcast television and other video segments to learn faces and other objects, and then later to locate and to recognize similar, but possibly distorted, views of the same objects.
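The incremental growth described in the abstract can be illustrated with a toy example. The following Python sketch is ours, not the authors' method: it models a single level in which each "neural plane" is a stored template, similarity is measured by normalized correlation, and a new plane is created whenever no existing plane responds above a fixed threshold. The class `GrowingLevel`, its `learn` method, the threshold of 0.8 and the 3x3 edge patches are hypothetical choices made only for illustration.

```python
import numpy as np


class GrowingLevel:
    """Hypothetical single level of a network that grows incrementally.

    Each stored template stands in for one "neural plane". A new plane is
    created only when no existing plane responds to the input above
    `threshold`. Normalized correlation and the fixed threshold are
    illustrative assumptions, not the Cresceptron's published rules.
    """

    def __init__(self, threshold=0.8):
        self.threshold = threshold
        self.templates = []  # one zero-mean, unit-norm template per plane

    @staticmethod
    def _normalize(patch):
        # Flatten, subtract the mean and scale to unit length, so the dot
        # product of two normalized patches is their correlation.
        x = np.asarray(patch, dtype=float).ravel()
        x = x - x.mean()
        norm = np.linalg.norm(x)
        return x / norm if norm > 1e-12 else np.zeros_like(x)

    def learn(self, patch):
        """Return the index of the plane representing `patch`, growing a
        new plane when the structure is novel; None for a flat patch."""
        p = self._normalize(patch)
        if not p.any():
            return None  # flat patch: no image structure to memorize
        responses = [float(t @ p) for t in self.templates]
        if responses and max(responses) >= self.threshold:
            return int(np.argmax(responses))  # already learned: no growth
        self.templates.append(p)  # novel structure: add a new plane
        return len(self.templates) - 1


# Usage: oriented edge patches stand in for directional edge maps.
level = GrowingLevel(threshold=0.8)
vertical = np.array([[0, 1, 0], [0, 1, 0], [0, 1, 0]])
print(level.learn(vertical))    # 0: first structure creates plane 0
print(level.learn(vertical.T))  # 1: a horizontal edge is novel, plane 1
print(level.learn(vertical))    # 0: matched by plane 0, no growth
```

Because a plane is added only when an input is poorly matched by every existing plane, the number of planes depends on what the training images contain, which mirrors the statement that the network's structure is a function of the training exemplars. The sketch omits the other elements the abstract mentions: multiple levels, connections that record context, distortion tolerance, and backtracking for segmentation.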
About this article
Cite this article
Weng, J., Ahuja, N. & Huang, T.S. Learning Recognition and Segmentation Using the Cresceptron. International Journal of Computer Vision 25, 109–143 (1997). https://doi.org/10.1023/A:1007967800668