Abstract
We describe a new means of training dynamic multilayer nonlinear adaptive filters, or neural networks. We restrict our discussion to multilayer dynamic Volterra networks, which are structured so as to restrict their degrees of computational freedom, based on a priori knowledge about the dynamic operation to be emulated. The networks consist of linear dynamic filters together with nonlinear generalized single-layer subnets. We describe how a Newton-like optimization strategy can be applied to these dynamic architectures and detail a new modified Gauss-Newton optimization technique. The new training algorithm converges faster, and to a smaller value of cost, than backpropagation-through-time for a wide range of adaptive filtering applications. We apply the algorithm to modeling the inverse of a nonlinear dynamic tracking system, and demonstrate its superior performance over standard techniques.
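The paper's modified Gauss-Newton technique is developed for dynamic networks with embedded memory; as a rough illustration of the underlying update only (not the paper's formulation), a damped Gauss-Newton iteration for a generic nonlinear least-squares cost J(w) = ½‖e(w)‖² can be sketched as follows. The toy model f(x; w) = w₀(1 − exp(−w₁x)) and all parameter names here are illustrative assumptions, not drawn from the paper.

```python
import numpy as np

def residuals(w, x, d):
    # Error vector e(w) = d - f(x; w) for the illustrative model.
    return d - w[0] * (1.0 - np.exp(-w[1] * x))

def jacobian(w, x, d, h=1e-6):
    # Finite-difference Jacobian of the residual vector w.r.t. the weights.
    J = np.empty((x.size, w.size))
    for j in range(w.size):
        wp = w.copy()
        wp[j] += h
        J[:, j] = (residuals(wp, x, d) - residuals(w, x, d)) / h
    return J

def gauss_newton(w0, x, d, iters=20, mu=1e-8):
    # mu is a small Levenberg-style damping term, a common practical
    # safeguard that keeps J^T J + mu*I invertible.
    w = np.asarray(w0, dtype=float)
    for _ in range(iters):
        e = residuals(w, x, d)
        J = jacobian(w, x, d)
        # Newton-like step on the linearized least-squares cost.
        step = np.linalg.solve(J.T @ J + mu * np.eye(w.size), J.T @ e)
        w = w - step
    return w

# Noiseless toy data generated from known weights; Gauss-Newton
# should recover them from a nearby starting point.
x = np.linspace(0.1, 4.0, 50)
true_w = np.array([2.0, 1.3])
d = true_w[0] * (1.0 - np.exp(-true_w[1] * x))
w_hat = gauss_newton([1.0, 1.0], x, d)
```

Because the step solves a linearized problem at each iteration, convergence near the solution is much faster than gradient descent, which is the qualitative advantage the paper reports over backpropagation-through-time.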
Additional information
This work was supported by the Stanford Gravity Probe-B project under NASA contract AS 8-36125.
Cite this article
Rabinowitz, M., Gutt, G. M., and Franklin, G. F. An adaptive Gauss-Newton algorithm for training multilayer nonlinear filters that have embedded memory. Circuits, Systems and Signal Processing, vol. 18, pp. 407–429, 1999. https://doi.org/10.1007/BF01200791