Learning Recognition and Segmentation Using the Cresceptron

Weng, John (Juyang); Ahuja, Narendra; Huang, Thomas S.

doi:10.1023/A:1007967800668

Published: November 1997

Volume 25, pages 109–143 (1997)

International Journal of Computer Vision

Abstract

This paper presents a framework called Cresceptron for view-based learning, recognition and segmentation. Specifically, it recognizes and segments image patterns that are similar to those learned, using a stochastic distortion model and view-based interpolation, which allows viewpoints that are moderately different from those used in learning. The learning phase is interactive. The user trains the system using a collection of training images. For each training image, the user manually draws a polygon outlining the region of interest and types in the label of its class. Then, from the directional edges of each of the segmented regions, the Cresceptron uses a hierarchical self-organization scheme to grow a sparsely connected network automatically, adaptively and incrementally during the learning phase. At each level, the system detects new image structures that need to be learned and assigns a new neural plane for each new feature. The network grows by creating new nodes and connections which memorize the new image structures and their context as they are detected. Thus, the structure of the network is a function of the training exemplars. The Cresceptron incorporates both individual learning and class learning; with the former, each training example is treated as a different individual, while with the latter, each example is a sample of a class. In the performance phase, segmentation and recognition are tightly coupled. No separate foreground extraction is necessary: segmentation is achieved by backtracking the response of the network down the hierarchy to the image parts contributing to recognition. Several stochastic shape distortion models are analyzed to show why multilevel matching such as that in the Cresceptron can deal with more general stochastic distortions that a single-level matching scheme cannot.
The system is demonstrated using images from broadcast television and other video segments to learn faces and other objects, and then later to locate and to recognize similar, but possibly distorted, views of the same objects.
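
The growth rule described in the abstract (allocate new storage only when an incoming structure is not already well represented) can be illustrated with a small Python sketch. This is not the authors' implementation. The cosine-similarity measure, the `novelty_threshold` parameter, and the `GrowingLayer` class are hypothetical names chosen for illustration, and the sketch uses a single flat layer of stored templates rather than the hierarchical, edge-based network the paper describes.

```python
import numpy as np


def normalized_response(template, patch):
    """Cosine similarity between a stored template and an input patch.

    Assumption: cosine similarity stands in for whatever matching
    measure the actual system uses.
    """
    t = np.asarray(template, dtype=float).ravel()
    p = np.asarray(patch, dtype=float).ravel()
    denom = np.linalg.norm(t) * np.linalg.norm(p)
    return float(t @ p / denom) if denom > 0 else 0.0


class GrowingLayer:
    """Toy layer that grows by one template whenever an input is novel."""

    def __init__(self, novelty_threshold=0.9):
        # Hypothetical parameter: inputs whose best response falls below
        # this value are treated as new structures to be memorized.
        self.novelty_threshold = novelty_threshold
        self.templates = []  # one stored template per "feature"

    def learn(self, patch):
        """Return the index of the matching template, adding one if needed."""
        responses = [normalized_response(t, patch) for t in self.templates]
        if responses and max(responses) >= self.novelty_threshold:
            return int(np.argmax(responses))
        self.templates.append(np.array(patch, dtype=float))
        return len(self.templates) - 1


# Usage: two distinct patterns yield two templates; a near-duplicate
# of the first pattern is matched to it instead of growing the layer.
layer = GrowingLayer(novelty_threshold=0.9)
first = layer.learn([[1, 0], [0, 1]])
second = layer.learn([[0, 1], [1, 0]])
repeat = layer.learn([[1, 0.1], [0, 1]])
```

In this toy setting the number of stored templates depends entirely on the training inputs and their order, which mirrors the abstract's remark that the structure of the network is a function of the training exemplars.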

Author information

Authors and Affiliations

Department of Computer Science, Michigan State University, East Lansing, MI, 48824, USA
John (Juyang) Weng
Beckman Institute, University of Illinois, 405 N. Mathews Avenue, Urbana, IL, 61801, USA
Narendra Ahuja & Thomas S. Huang

About this article

Cite this article

Weng, J., Ahuja, N. & Huang, T.S. Learning Recognition and Segmentation Using the Cresceptron. International Journal of Computer Vision 25, 109–143 (1997). https://doi.org/10.1023/A:1007967800668

Issue Date: November 1997
DOI: https://doi.org/10.1023/A:1007967800668
